Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS


#21

I don’t see any advantage in converting U+002D HYPHEN-MINUS to U+2010 HYPHEN, either manually or programmatically. As far as I’m aware, in almost all fonts the HYPHEN-MINUS glyph renders as a typographically correct hyphen.
I think @kepstin’s comment on STYLE-721 is correct:

If you’re seeing the HYPHEN-MINUS and HYPHEN with different appearances, the most likely cause is that the font you’re using has a HYPHEN-MINUS but doesn’t have a HYPHEN, so the HYPHEN is being pulled from a different, fallback font.
Based on my experience and on general usage in digital text, my recommendation would be to prefer the HYPHEN-MINUS character for normal hyphens, but to use the correct Unicode character for minus signs and other types of dashes.


#22

The advantage would be that any text processing happening on the data would know specifically whether it’s a hyphen, a minus, or some kind of dash—in line with the “be more specific than generic if possible” style principle (which is only really verbalised in Prefer Specific Relationship Types at the moment).


#23

Yes, it’s advantageous to differentiate between hyphens, dashes, and minuses; but we don’t need to use U+2010 HYPHEN to achieve this. Rather than correcting every instance of HYPHEN-MINUS as you suggest above:

it would be less time consuming to just correct the dashes and minuses.


#24

We kinda do - otherwise, there’s no way to tell whether it’s correct, or just not fixed yet.


#25

This argument (which has come up repeatedly) is based on a double assumption, namely, that people who understand the difference between the various dashes and care about it will use U+2010 HYPHEN (and U+2013 EN DASH and so on), whereas the others will use U+002D HYPHEN-MINUS. However, both parts of the assumption are wrong.

  • There are lots of people who know of and care about the difference between en-dash, hyphen etc., but will still type the hyphen as U+002D HYPHEN-MINUS because they think this should be the preferred code point for the hyphen. (They will use U+2013 EN DASH, U+2014 EM DASH etc. where appropriate, of course.)
  • On the other hand, people who have no clue or don’t bother won’t specifically use U+002D HYPHEN-MINUS. Instead, they will use whatever is easiest for them to type or copy and paste. That may or may not be U+002D HYPHEN-MINUS.

Therefore, this idea of using U+002D HYPHEN-MINUS as an indicator of “needs review” and U+2010 HYPHEN, U+2013 EN DASH etc. as an indicator of “has been reviewed” is seriously flawed.

I also don’t understand why we should want to use such flags for the hyphen/dashes issue specifically. After all, we don’t ask people who are bad at spelling to mark the track names they enter with some special character, either. And even if we did, the people who would most need to use such a “needs review” mark wouldn’t be aware of that (Dunning–Kruger effect).


#26

To illustrate my point, here are some examples where “more specific” characters were wrongly used:

Piano Sonata no.1 in E–flat major, op. 1/1, H. 8/1: 1. Allegro moderato (U+2013 EN DASH)
Save It (8–track demo) (U+2013 EN DASH)
salva nos−dialogue remix ver. (U+2212 MINUS SIGN)
君を見つめて−The time I’m Seeing You− (U+2212 MINUS SIGN)
Akt 2, Szene 6—7: Was nun! Was nun! [Morone, Bischöfe] (U+2014 EM DASH)
16 Waltzes for Piano Solo, op. 39: no. 1—8 (U+2014 EM DASH)
Kapitel 05: „Action ‒ aus der Ferne“, Teil 1 (U+2012 FIGURE DASH)
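
For anyone who wants to check which character a title actually contains, here is a minimal sketch in Python. It only uses the standard `unicodedata` module; the helper name `dash_like_chars` is made up for illustration, and treating "dash punctuation plus U+2212 MINUS SIGN" as the set of interest is an assumption, not an agreed definition.

```python
import unicodedata

def dash_like_chars(text):
    """Return (index, code point, Unicode name) for each dash-like character."""
    found = []
    for i, ch in enumerate(text):
        # General category "Pd" is dash punctuation (hyphens and dashes).
        # U+2212 MINUS SIGN is a math symbol ("Sm"), so it is checked separately.
        if unicodedata.category(ch) == "Pd" or ch == "\u2212":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found

if __name__ == "__main__":
    for title in ["Save It (8\u2013track demo)", "salva nos\u2212dialogue remix ver."]:
        print(title, dash_like_chars(title))
```

Run on the second and third examples above, it reports U+2013 EN DASH and U+2212 MINUS SIGN respectively. It cannot say whether a character is the right one; that still needs a human.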


#27

Sure. And if you don’t care about HYPHEN/HYPHEN‐MINUS, just move along and continue. Whatever @reosarevok decides on this, it won’t become wrong to use HYPHEN‐MINUS for hyphens, it’ll just potentially get more correct to use HYPHEN. And for anyone who cares, they can look up strings with HYPHEN‐MINUS in it and change it appropriately.
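
A rough sketch of what such a lookup could look like, assuming the titles are available as a plain list of strings. The function name `titles_with_hyphen_minus` and the "word character on both sides" rule are assumptions for illustration; a real search would run against the database and would need review by hand.

```python
import re

# U+002D HYPHEN-MINUS with a word character on both sides, e.g. "E-flat".
# A spaced " - " is deliberately not matched, since that is more likely a dash.
INTRA_WORD_HYPHEN_MINUS = re.compile(r"(?<=\w)-(?=\w)")

def titles_with_hyphen_minus(titles):
    """Return the titles that contain an intra-word U+002D HYPHEN-MINUS."""
    return [t for t in titles if INTRA_WORD_HYPHEN_MINUS.search(t)]

if __name__ == "__main__":
    sample = ["E-flat major", "E\u2010flat major", "A - B"]
    print(titles_with_hyphen_minus(sample))
```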

Just as there are currently people changing ʻokina 's into or introducing a lot of other errors. I don’t see how this is specifically relevant to “HYPHEN vs. HYPHEN‐MINUS”.

I don’t see where anyone has said that? There may even be cases of artist intent where a HYPHEN‐MINUS is exactly the character they wanted (e.g., using “Three-One” and relying on the ambiguity of the character to suggest several meanings).


#28

I’m perfectly happy with the status quo (both allowed, one preferred).


#29

Can we at least agree to use the normal dash (U+002D) in catalog numbers?

I’ve only seen a single editor use U+2010 in catalog numbers, which I’d consider 100% incorrect. (I know I’ve seen jesus2099 say the same in the past.)
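
If that convention were adopted, folding other hyphen and dash code points back to U+002D in catalog numbers could be sketched like this. The function name `normalize_catalog_number` and the list of code points to fold are assumptions for illustration, not part of any guideline.

```python
# Code points folded into U+002D HYPHEN-MINUS. This list is an assumption:
# HYPHEN, NON-BREAKING HYPHEN, FIGURE DASH, EN DASH, EM DASH, MINUS SIGN.
_DASHES_TO_FOLD = "\u2010\u2011\u2012\u2013\u2014\u2212"
_FOLD_TABLE = str.maketrans({ch: "-" for ch in _DASHES_TO_FOLD})

def normalize_catalog_number(catno):
    """Replace hyphen and dash variants in a catalog number with U+002D."""
    return catno.translate(_FOLD_TABLE)

if __name__ == "__main__":
    print(normalize_catalog_number("ABC\u2010123"))
```

Catalog numbers that already use U+002D pass through unchanged.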


#30

Yes, catalogue numbers are not words; IMO it’s simpler to use the hyphen-minus we all have on our keyboards.


#31

Shouldn’t the hyphen in e.g. “C-sharp” be a non-breaking hyphen (U+2011)?