Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS

ASCII originally used some code points for multiple purposes. The best-known cases are " and ' (“typewriter quotation mark” and “typewriter apostrophe”), but - was also used for the regular hyphen, for various dashes, and for the minus sign.

Unicode “unsplit” those by making separate “”″ ‘’′ code points and a number of dashes, too. It also introduced U+2010 as an unambiguous way to designate a hyphen (whereas a - could be a legacy minus sign or dash). However, unlike the quotation/prime marks, and unlike the dashes, the hyphen and the hyphen-minus look identical, so interest in actually using U+2010 has remained rather low. Personally, I don’t think it makes much sense, either.

We should probably consider them as “quasi” canonically equivalent and convert all input to one of them consistently.
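To make the idea concrete, here is a minimal Python sketch of such a conversion. It is only an illustration: the names `normalize_hyphens`, `HYPHEN_MINUS` and `HYPHEN` are made up for this post and are not taken from any existing codebase. It also shows why an explicit step would be needed at all: U+2010 has no decomposition, so the standard Unicode normalization forms leave it untouched.

```python
import unicodedata

# Hypothetical names, chosen for this illustration only.
HYPHEN_MINUS = "\u002d"  # HYPHEN-MINUS, the ASCII "-"
HYPHEN = "\u2010"        # HYPHEN, the unambiguous Unicode hyphen


def normalize_hyphens(text: str, target: str = HYPHEN_MINUS) -> str:
    """Replace whichever of the two hyphen characters is not `target`."""
    if target not in (HYPHEN_MINUS, HYPHEN):
        raise ValueError("target must be U+002D or U+2010")
    source = HYPHEN if target == HYPHEN_MINUS else HYPHEN_MINUS
    return text.replace(source, target)


# Standard normalization does not unify the two characters,
# so the replacement above has to be done explicitly.
assert unicodedata.normalize("NFKC", HYPHEN) == HYPHEN

print(normalize_hyphens("Jean\u2010Luc"))  # U+2010 replaced by U+002D
```

The default target here is the hyphen-minus because that direction is lossless in practice: every U+2010 is a hyphen. Going the other way would also rewrite any - that was meant as a minus sign or a dash, so it would need more care than a blind replace.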

I had a look at the database: Currently, 558545 recording titles contain a hyphen-minus, and 4543 contain a U+2010 hyphen.
