Problem with CJK space: when a release is submitted or edited, CJK spaces are replaced by ASCII spaces

I have submitted a new release by using the web editor; it can be found here.

The track titles are in Japanese; tracks 1 and 11 contain a CJK space. After submitting the release I checked the result and found that the CJK space had been replaced by the standard ASCII space. I then tried editing the release to correct the character, but every attempt was unsuccessful. If I just replace the ASCII space with the CJK space, I cannot submit the changes, and a warning is shown with the following message: “You haven’t made any changes!”. I didn’t try to edit the recordings individually, though.

The correct track titles should be as follows (copy and paste from the official web page):

  • Track 1: 童謡奇譚~序~ さくらさくら
  • Track 11: 童謡奇譚~結~ 通りゃんせ ほたるこい ずいずいずっころばし かごめかごめ

(Edited because after submitting this post I found that the CJK spaces used in the track titles above had been replaced by standard ASCII spaces.)

But this is how they are registered in MusicBrainz:

  • Track 1: 童謡奇譚~序~ さくらさくら
  • Track 11: 童謡奇譚~結~ 通りゃんせ ほたるこい ずいずいずっころばし かごめかごめ

I know there are track titles and recordings containing a CJK space, for example in Rurutia’s album Chorion. In that example, if one edits the release, opens the track parser and accepts the data without changing anything, the CJK spaces are replaced by ASCII spaces. The same happens if I change a track title without opening the track parser and click the Next button: on the confirmation page I can see that the CJK spaces have been replaced by ASCII spaces. So I wonder how that release was submitted with its CJK spaces preserved.

Any idea what’s happening? Could it be a bug in the track parser? How could I edit the release to replace the ASCII spaces with CJK spaces?

I think this might be intentional - I think we standardize many spaces to standard spaces. Not sure whether that is a good thing or not, though - what’s the intended use of the CJK space as opposed to the ASCII one?
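To illustrate what such standardization could look like, here is a minimal sketch in Python. It is not the actual MusicBrainz Server code: the `collapse_whitespace` function and the idea that the editor compares normalized values are assumptions made only for illustration. It shows why U+3000 would disappear on submission, and why putting it back by hand could look like "no changes" if the edited value is normalized before being compared with the stored one.

```python
import re


def collapse_whitespace(title: str) -> str:
    """Hypothetical normalizer (an assumption, not MusicBrainz code).

    In Python 3, \\s in a str pattern matches any Unicode whitespace,
    which includes the ideographic space U+3000.
    """
    return re.sub(r"\s+", " ", title).strip()


# Title as the editor typed it, with an ideographic space (U+3000).
submitted = "童謡奇譚~序~\u3000さくらさくら"

# What would end up stored after normalization: a plain ASCII space.
stored = collapse_whitespace(submitted)
print("\u3000" in stored)

# Re-entering the CJK space and normalizing again gives the stored value,
# so an editor that compares normalized values would see no difference.
looks_unchanged = collapse_whitespace(submitted) == stored
print(looks_unchanged)
```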

I think there is no difference in use (at least as far as I know, since I am not a Japanese speaker). I suppose it results from the different character set used for Japanese. Japanese uses different symbols from their counterparts in Latin script (e.g. quotation mark, question mark, slash, dash, full stop, etc.). I suppose they differ so that they fit the shape and size of the Japanese script better and make reading easier. I think it would be appropriate to preserve them instead of substituting the Latin-script equivalents.

I think there might have been a discussion about it in the old forum.
I just found this again:

I did enter some releases with the CJK ideographic space U+3000 before the MBS change that now removes them.

They have the same width as other CJK characters. Given that all characters are usually the same width, this makes for a more consistent and readable presentation. It may not matter that much for a short title, but in a block of text it makes a difference.

I think this is definitely one case where the normalization of spaces is not good. I think we should treat it like quotation marks: it is OK to use ASCII spaces, but if somebody wants to use the proper character it should be possible.
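As a rough sketch of what "allow the proper character" could mean in practice, the following Python function still collapses ordinary whitespace but leaves U+3000 alone. The function name and the exact rules are hypothetical; how the server would really implement such an exception is not known from this thread.

```python
import re


def normalize_keep_cjk_space(title: str) -> str:
    """Hypothetical normalizer that preserves the ideographic space.

    [^\\S\\u3000] matches any character that is whitespace and is not
    U+3000, so runs of spaces, tabs and so on collapse to a single
    ASCII space while ideographic spaces pass through untouched.
    """
    collapsed = re.sub(r"[^\S\u3000]+", " ", title)
    # Trim only ASCII spaces at the ends.
    return collapsed.strip(" ")


kept = normalize_keep_cjk_space("  童謡奇譚~序~\u3000さくらさくら \t")
print("\u3000" in kept)
```

With this variant, a title entered with ASCII spaces stays as it is, and a title entered with U+3000 keeps it, which matches the "like quotation marks" approach suggested above.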


Related ticket: