A definitive answer on full-width vs. half-width punctuation?

japanese
style
unicode

#1

This seems to be a recurring issue, but I was looking for a more definitive answer on full-width vs. half-width punctuation in Japanese releases. This isn’t about a specific release; most of what I contribute to MusicBrainz is Japanese releases, and I keep getting caught by this particular issue, especially with exclamation points. I know it’s best to stick to what’s printed, but that’s not always very clear. The internet isn’t a reliable source for it either, because it usually shows a mixture of the two. I just wondered what the preference is when it’s not clear. Always half-width? Is it better to match whichever script the title is in? Even if it’s a mixture of scripts?

I have tried to look into this issue as well, but the documentation doesn’t really clear it up for the cases where it’s unclear, so I’m hoping somebody on here will be able to give me a hand.

Thank you!


#2

I would also like to see some clarification in the documentation about this. I think the most common full-width punctuation marks you see are these: ？！：／

But I’m not sure if there should be any hard rule about this, since in my experience, even official discography / record label websites mix these up. They may have a half-width symbol where the actual release uses a full-width one, or vice versa.


#3

I’ve been thinking about this, and came up with the following convention:

  • Use the width of the character that’s most commonly used in the release and official sources (if available).
  • If you can’t distinguish whether a punctuation character is full-width or half-width, or if the release and official sources use contradicting widths for the character:
    • Use a full-width character if the preceding character is also full-width:
      • e.g. スタートライン！, not スタートライン!
    • Use a half-width character if the preceding character is also half-width:
      • e.g. Let's Dancing!, not Let's Dancing！
    • If there are two or more consecutive exclamation/question marks, use their half-width versions:
      • e.g. アイカツスターズ！の音楽!!, not アイカツスターズ！の音楽！！
  • When a full-width punctuation character whose glyph already includes empty space after its visible part (e.g. or ) is used in the middle of a title, do not insert a separate space character after it, unless there is clearly additional whitespace after it, usually a full-width space (U+3000 IDEOGRAPHIC SPACE; I can’t demonstrate it here because Discourse apparently converts it into a normal space).

Feedback is welcome. This is based on how I usually see these characters used in titles and Japanese text in general. But knowing how unique the naming of titles in Japanese releases can be, I’m sure there are some edge cases that don’t fit in this convention.
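
The fallback rules in the list above are mechanical enough to sketch in Python. This is only a rough illustration under my own assumptions: just `! ? : /` are treated as marks of unclear width, every such mark in the input is assumed to be unclear (the first rule, checking the release and official sources, still needs a human), a mark at the very start of a title stays half-width, and `apply_width_convention` is a hypothetical helper, not something MusicBrainz provides. "Full-width" is decided with Unicode’s East Asian Width property via the standard `unicodedata` module.

```python
import unicodedata

# Assumption: the marks whose width is treated as "unclear"; extend as needed.
AMBIGUOUS = set("!?:/")
RUN_MARKS = set("!?")  # marks covered by the "two or more in a row" rule


def to_full(ch: str) -> str:
    """Map an ASCII mark (U+0021..U+007E) to its Fullwidth Form (U+FF01..U+FF5E)."""
    return chr(ord(ch) + 0xFEE0)


def to_half(ch: str) -> str:
    """Map a Fullwidth Form back to ASCII; leave every other character unchanged."""
    return chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch


def is_wide(ch: str) -> bool:
    """True for characters Unicode classes as Wide or Fullwidth (UAX #11)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def apply_width_convention(title: str) -> str:
    """Hypothetical helper: apply the fallback width rules from the post above."""
    # Fold every ambiguous mark to its half-width base so the input width is ignored.
    chars = [to_half(c) if to_half(c) in AMBIGUOUS else c for c in title]
    out: list[str] = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c in RUN_MARKS:
            j = i
            while j < len(chars) and chars[j] in RUN_MARKS:
                j += 1
            if j - i >= 2:
                # Rule 3: two or more consecutive !/? marks stay half-width.
                out.extend(chars[i:j])
                i = j
                continue
        if c in AMBIGUOUS and out and is_wide(out[-1]):
            # Rule 1: a mark after a full-width character becomes full-width.
            out.append(to_full(c))
        else:
            # Rule 2, a mark at the start, or any other character: keep half-width/as-is.
            out.append(c)
        i += 1
    return "".join(out)


for t in ["スタートライン!", "Let's Dancing！", "アイカツスターズ！の音楽！！"]:
    print(t, "->", apply_width_convention(t))
```

Looking only at the preceding character keeps the sketch simple; titles mixing scripts around a mark would still need a judgment call.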


#4

Very good!
It is a pity the old forum is not accessible, because we had lots of discussion about those.
The benefit of fullwidth punctuation is that it has the same box size as all other characters, as required, and that it also rotates properly in vertical text.
Most of them also belong to a CJK punctuation block, by the way.
I think I have sometimes also used the character called DOUBLE EXCLAMATION MARK (‼, U+203C) when I saw two exclamation marks set obliquely in the same box.
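
To see the difference at the code-point level, Python’s standard `unicodedata` module reports each character’s name and its East Asian Width property (UAX #11), where `F` means fullwidth and `Na` narrow. A minimal sketch; `describe` is just a helper made up for this post:

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return code point, Unicode name, East Asian Width and NFKC form of ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.normalize("NFKC", ch),
    )


# Half-width, full-width and the single-code-point double exclamation mark.
for ch in ["!", "！", "‼"]:
    print(describe(ch))
```

Note that ‼ is one code point whose compatibility (NFKC) form is two ordinary exclamation marks, so once text is normalized it can no longer be told apart from `!!`.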


#5

At least in Japanese releases, the characters that the Unicode Standard names full-width or half-width should not be used. There are no rules for converting these characters between Unicode and JIS X 0213.
(JIS X 0213 is the latest character set standard in Japan.)


#6

Thanks.
Please link to those rules. :slight_smile:


#7

You can search for the JIS X 0213 standard at http://www.jisc.go.jp/app/jis/general/GnrJISSearch.html .
Another source is the full list of non-kanji characters in JIS X 0213: https://ja.wikipedia.org/wiki/JIS_X_0213非漢字一覧
Almost none of the values in its “Unicode” column fall in the range of “Halfwidth and Fullwidth Forms (U+FF00-FFEF)”.
Generally, JIS X 0213 is used together with ISO-8859 or JIS X 0201. ISO-8859 does not contain halfwidth or fullwidth forms. JIS X 0201 does contain halfwidth katakana, corner brackets, the ideographic comma and full stop, and a middle dot (bullet), but it does not contain fullwidth forms.
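
To check whether a title actually contains any characters from the “Halfwidth and Fullwidth Forms” block (U+FF00-FFEF), a scan over the code points is enough. A minimal Python sketch; `halfwidth_fullwidth_forms` is a hypothetical name for illustration:

```python
import unicodedata


def halfwidth_fullwidth_forms(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for each character in U+FF00-FFEF."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in text
        if 0xFF00 <= ord(c) <= 0xFFEF
    ]


print(halfwidth_fullwidth_forms("スタートライン！"))  # fullwidth exclamation mark
print(halfwidth_fullwidth_forms("ｽﾀｰﾄ、ライン"))      # halfwidth katakana
```

Ideographic punctuation such as 、 and 。 lives in the CJK Symbols and Punctuation block (U+3000-303F), so it is not reported.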


#8

Maybe I’m missing something, but I don’t quite understand what you are getting at. Internally, MusicBrainz uses the Unicode character set for all data. The website is encoded in UTF-8, which can encode all code points in Unicode and is the default character encoding for HTML5. As far as I can see, the limitations of the JIS X 0213 standard are not a concern for us. If you use Picard and don’t want Unicode punctuation in your tags, the program offers an option to convert all punctuation to ASCII.

The style guideline for Japanese releases states: “characters should be used as-is”.

I thought about mentioning the double exclamation mark. On some fonts it looks exactly the same as two normal exclamation marks, though. The best case scenario for distinguishing the symbol is if the release uses both a single (half-width) exclamation mark and the double exclamation mark in the same typeface — but even then you wouldn’t be able to tell for sure.


#9

Hmm, so the Japanese standard should not be considered for Japanese releases. That’s one way of looking at it.

  • “Halfwidth and Fullwidth Forms” are duplicate characters
    Many characters in “Halfwidth and Fullwidth Forms” come from JIS X 0201 or JIS X 0208. Why doesn’t JIS X 0213 define rules for converting these characters to and from Unicode? Because they are duplicate characters.
    The Unicode Standard defines these characters for compatibility.
  • What is character width?
    JIS does not define character width, and neither does Unicode.
    In modern computer environments, text is rendered proportionally. If “m” looks twice as wide as “n”, do I have to use a fullwidth “m”?
    The JASRAC database (http://www2.jasrac.or.jp/eJwid/) uses only the JIS X 0208 character set for main artist names and main titles. If I followed the “characters should be used as-is” rule, I could not use any halfwidth characters, including ASCII characters.
    (JIS X 0208 is popularly, but wrongly, called the fullwidth character set.)
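
The “compatibility” point can be seen with Unicode NFKC normalization, which folds the Halfwidth and Fullwidth Forms back to their ordinary counterparts. A sketch using Python’s `unicodedata.normalize`; `fold_width` and `same_title_ignoring_width` are hypothetical names. Because NFKC is lossy (it also turns U+3000 into an ordinary space and ‼ into `!!`), it seems better suited to comparing or searching titles than to rewriting them.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold compatibility characters (incl. fullwidth/halfwidth forms) via NFKC."""
    return unicodedata.normalize("NFKC", text)


def same_title_ignoring_width(a: str, b: str) -> bool:
    """Hypothetical check: do two titles differ only in compatibility forms?"""
    return fold_width(a) == fold_width(b)


print(fold_width("ＡＢＣ！？"))  # fullwidth Latin letters and punctuation
print(fold_width("ｽﾀｰｽﾞ"))      # halfwidth katakana; the voiced mark recombines
print(same_title_ignoring_width("Let's Dancing!", "Let's Dancing！"))
```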

#10

In written Japanese, brackets and other borrowed punctuation marks such as exclamation marks, question marks and colons each get their own full box, just like all other characters.
See columns 3 to 5 in https://en.m.wikipedia.org/wiki/File:Genkoyoshi.svg from https://en.m.wikipedia.org/wiki/Japanese_punctuation#Space


#11

No problem. Punctuation marks defined outside the “Halfwidth and Fullwidth Forms” range can be rendered at full width or at any other width.


#12

I’m still not sure how JIS character standards and their conversion problems with Unicode are related to the original question of this thread: whether to use full-width or half-width punctuation. Unicode (which, again, MB uses) supports both variants.

Pretty much the only reason that something like Shift JIS(-2004) is still used is that JIS encodings used to be the only way to encode Japanese. That hasn’t been the case for a long time, though. Every single character from JIS X 0213, be it full-width or half-width, has been representable in Unicode since 2002. Japanese companies use the JIS standards because they’ve been doing so for years, it works for them, and changing all company documents and systems to a different encoding might not be straightforward and requires time and resources. Nevertheless, usage of those standards has been slowly declining. For example, as of August 2018 only 0.5 % of websites used Shift JIS; in October 2009 the percentage was still 2.7 %.

Performance rights databases like JASRAC are good for finding work identifiers (ISWC), but they are not a preferred source for titles. The release itself is the most important source for title styling. Official discography / record label websites come after that.


#13

Microsoft Windows in the Japanese locale still uses cp932 (Microsoft’s Shift JIS variant). In cp932, the wave dash is wrongly mapped to the Unicode FULLWIDTH TILDE (U+FF5E) instead of WAVE DASH (U+301C), so we can easily detect text or documents that were converted from cp932. Many documents on the Internet (including MusicBrainz) contain the full-width tilde.
The information in JASRAC’s database is registered by the artists or their agents. It is a reliable source.
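
The wave dash mapping difference can be reproduced with Python’s built-in codecs: the same Shift JIS byte pair 0x81 0x60 decodes to WAVE DASH (U+301C) with the `shift_jis` codec but to FULLWIDTH TILDE (U+FF5E) with `cp932`. The detection function below is a hypothetical heuristic following the post above, not a reliable test, since U+FF5E can also be typed on purpose.

```python
WAVE_DASH = "\u301c"        # 〜 WAVE DASH
FULLWIDTH_TILDE = "\uff5e"  # ～ FULLWIDTH TILDE


def decode_both(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes with the JIS-based codec and with Microsoft's cp932."""
    return raw.decode("shift_jis"), raw.decode("cp932")


def maybe_from_cp932(text: str) -> bool:
    """Heuristic only: U+FF5E where a wave dash is expected hints at cp932."""
    return FULLWIDTH_TILDE in text


jis, ms = decode_both(b"\x81\x60")
print(f"shift_jis: U+{ord(jis):04X}, cp932: U+{ord(ms):04X}")
print(maybe_from_cp932("タイトル～サブタイトル～"))
```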


#14

Can they be rendered as expected, or are they rendered as expected with the default settings?


#15

The default setting is proportional: neither full-width nor half-width.


#16

So we should indeed use the full box punctuation when available.


#17

Punctuation marks defined outside the “Halfwidth and Fullwidth Forms” range can be rendered in a full box. The default setting for all characters, including the “Halfwidth and Fullwidth Forms”, is proportional.


#18

As far as I can see, fullwidth parentheses are closer to the correct box spacing (and rotate nicely in vertical text).
I’m not at a PC to take screenshots at the moment. The old forum did contain my screenshots, IIRC. :thinking: