Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS

I don’t see any advantage in converting U+002D HYPHEN-MINUS to U+2010 HYPHEN, either manually or programmatically. As far as I’m aware, almost all fonts will have the HYPHEN-MINUS glyph produce a typographically correct hyphen.
I think @kepstin’s comment on STYLE-721 is correct:

If you’re seeing the HYPHEN-MINUS and HYPHEN with different appearances, the most likely cause is that the font you’re using has a HYPHEN-MINUS but doesn’t have a HYPHEN, so the HYPHEN is being pulled from a different, fallback font.
Based on my experience and on general usage in digital text, I would recommend preferring the HYPHEN-MINUS character for normal hyphens, and the correct Unicode character for minus signs and other types of dashes.

The advantage would be that any text processing happening on the data would know specifically whether it’s a hyphen, a minus, or some kind of dash—in line with the “be more specific than generic if possible” style principle (which is only really verbalised in Prefer Specific Relationship Types at the moment).
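A minimal Python sketch of what that kind of text processing could look like, assuming only the standard `unicodedata` module. The `DASH_LIKE` set and the `describe_dashes` name are made up for this illustration, and the set only covers the characters mentioned in this thread:

```python
import unicodedata

# Dash-like code points discussed in this thread (illustrative, not exhaustive).
DASH_LIKE = frozenset("\u002D\u2010\u2011\u2012\u2013\u2014\u2212")


def describe_dashes(text):
    """Return (index, code point, Unicode name) for each dash-like character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in DASH_LIKE
    ]


print(describe_dashes("Olivia Newton\u2010John"))
# [(13, 'U+2010', 'HYPHEN')]
```

With HYPHEN-MINUS in the same position, the name comes back as `HYPHEN-MINUS`, which tells a program nothing about whether a hyphen or a minus was intended.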

Yes, it’s advantageous to differentiate between hyphens, dashes, and minuses; but we don’t need to use U+2010 HYPHEN to achieve this. Rather than correcting every instance of HYPHEN-MINUS as you suggest above:

it would be less time consuming to just correct the dashes and minuses.

We kinda do - otherwise, there’s no way to tell whether it’s correct, or just not fixed yet.

This argument (which has come up repeatedly) is based on a double assumption, namely, that people who understand the difference between the various dashes and care about it will use U+2010 HYPHEN (and U+2013 EN DASH and so on), whereas the others will use U+002D HYPHEN-MINUS. However, both parts of the assumption are wrong.

  • There are lots of people who know of and care about the difference between en-dash, hyphen etc., but will still type the hyphen as U+002D HYPHEN-MINUS because they think this should be the preferred code point for the hyphen. (They will use U+2013 EN DASH, U+2014 EM DASH etc. where appropriate, of course.)
  • On the other hand, people who have no clue or don’t bother won’t specifically use U+002D HYPHEN-MINUS. Instead, they will use whatever is easiest for them to type or copy and paste. That may or may not be U+002D HYPHEN-MINUS.

Therefore, this idea of using U+002D HYPHEN-MINUS as an indicator of “needs review” and U+2010 HYPHEN, U+2013 EN DASH etc. as an indicator of “has been reviewed” is seriously flawed.

I also don’t understand why we should want to use such flags for the hyphen/dashes issue specifically. After all, we don’t ask people who are bad at spelling to mark the track names they enter with some special character, either. And even if we did, the people who would most need to use such a “needs review” mark wouldn’t be aware of that (Dunning–Kruger effect).

To illustrate my point, here are some examples where “more specific” characters were wrongly used:

Piano Sonata no.1 in E–flat major, op. 1/1, H. 8/1: 1. Allegro moderato (U+2013 EN DASH)
Save It (8–track demo) (U+2013 EN DASH)
salva nos−dialogue remix ver. (U+2212 MINUS SIGN)
君を見つめて−The time I’m Seeing You− (U+2212 MINUS SIGN)
Akt 2, Szene 6—7: Was nun! Was nun! [Morone, Bischöfe] (U+2014 EM DASH)
16 Waltzes for Piano Solo, op. 39: no. 1—8 (U+2014 EM DASH)
Kapitel 05: „Action ‒ aus der Ferne“, Teil 1 (U+2012 FIGURE DASH)
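
Some of these misuses could be caught mechanically. A rough Python sketch, assuming two heuristics invented for this illustration (they are not MusicBrainz rules, and `flag_suspicious` is a hypothetical name): an EM DASH between two digits is probably meant as a range, and a MINUS SIGN directly next to a letter is probably not a minus. Cases like the EN DASH in “E–flat” are deliberately not flagged, because an en dash between letters can be legitimate.

```python
import re

# Heuristics assumed for this sketch only; each pairs a pattern with a reason.
SUSPICIOUS = [
    (re.compile(r"\d\u2014\d"), "EM DASH between digits"),
    (re.compile(r"[^\W\d_]\u2212|\u2212[^\W\d_]"), "MINUS SIGN next to a letter"),
]


def flag_suspicious(title):
    """Return the reasons (possibly none) why a title looks suspicious."""
    return [reason for pattern, reason in SUSPICIOUS if pattern.search(title)]


print(flag_suspicious("16 Waltzes for Piano Solo, op. 39: no. 1\u20148"))
# ['EM DASH between digits']
print(flag_suspicious("salva nos\u2212dialogue remix ver."))
# ['MINUS SIGN next to a letter']
```

A flagged title still needs a human to decide what the right character is; the sketch only narrows down where to look.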

Sure. And if you don’t care about HYPHEN/HYPHEN‐MINUS, just move along and continue. Whatever @reosarevok decides on this, it won’t become wrong to use HYPHEN‐MINUS for hyphens, it’ll just potentially get more correct to use HYPHEN. And for anyone who cares, they can look up strings with HYPHEN‐MINUS in it and change it appropriately.

Just as there are currently people changing ʻokina 's into or introducing a lot of other errors. I don’t see how this is specifically relevant to “HYPHEN vs. HYPHEN‐MINUS”.

I don’t see where anyone has said that? There may even be cases of artist intent where a HYPHEN‐MINUS is exactly the character they wanted (e.g., using “Three-One” with ambiguity of the character to incur several meanings).

I’m perfectly happy with the status quo (both allowed, one preferred).

Can we at least agree to use the normal dash (U+002D) in catalog numbers?

I’ve only seen a single editor use U+2010 in catalog numbers, which I’d consider 100% incorrect. (I know I’ve seen jesus2099 say the same in the past.)

Yes, catalogue numbers are not words. IMO it’s simpler to use the hyphen-minus we all have on our keyboards.

Shouldn’t the hyphen in e.g. “C-sharp” be a non-breaking hyphen (U+2011)?

I want to open a discussion here about the use of U+2010 HYPHEN in our titles, artist names, etc.,
instead of U+002D HYPHEN-MINUS, which is easier to find on a keyboard but is a legacy, ambiguous character.

I’ve been editing using HYPHENs for quite a long time already.
It is allowed by Style/Miscellaneous, together with U+2019 RIGHT SINGLE QUOTATION MARK (Unicode notes: this is the preferred character to use for apostrophe) instead of the straight typewriter U+0027 APOSTROPHE (Unicode notes: neutral (vertical) glyph with mixed usage / U+2019 is preferred for apostrophe / preferred characters in English for paired quotation marks are U+2018 & U+2019).

I’m fully convinced about curly apostrophes, quotes, etc.
I am also convinced that HYPHEN and MINUS SIGN are, by definition, more precise characters than HYPHEN-MINUS.

But I feel less comfortable imposing it on everyone with my edits when the question comes up,
like here in https://musicbrainz.org/edit/59022786 where it is pointed out that it does not render properly on a certain computer.

I will let the “grownups” decide what is the proper character to use.
But I will always be using - and ’

In fact, I (and many others) will be limited to using `~!@#$%^&*()-_=+[{]}|;:’",<.>/?
because that is all the keyboard offers. I know that there are “combinations” I can use to make other things appear. I just won’t take the time to learn.

So, keep that in mind when having the discussion. Your ‘average’ person behind the keyboard is only going to know the keyboard characters.

And don’t forget the different meanings in different languages and different usages.
Just two examples:
English Hyphen-minus
German Bindestrich-Minus

Personally I don’t mind either way. If MB wants to use prettified characters that are hard for many to understand - then that is MB’s choice.

You asked for opinions, so here are my thoughts. I personally think the hyphen is a step too far down the pedantic road. Yeah, technically correct, but awkward to work with outside of the MB screen.

It is difficult for me to even know when to use it. I haven’t done English grammar since the 1980s, so I avoid trying to work out what is needed. Thanks to scripts I can put in the apostrophes and speech marks, but beyond that I start getting confused by the rules.

I don’t find anything “ambiguous” about the hyphen on my keyboard. I know it can be used in a multitude of places. What I find ambiguous are these new rules of where to use all the different types of dashes\hyphens\etc. I come here for music, not English lessons. :smiley:

The important thing for me is both my CD ripper EAC and tagger Picard can strip it out and swap it back to a keyboard based dash\hyphen.

These can be very annoying in filenames. Nothing is more confusing than two folders side by side with different hyphens, especially as they are hard to tell apart by eye.

It also messes up searches anywhere that is not MB, as not all search tools know to treat them the same in a search. I’m also back to not being able to type these things when I don’t have the Magic Scripts.
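
For searches and filenames outside MB, one workaround is to fold the various dashes to the keyboard hyphen before comparing. A small Python sketch under that assumption: `fold_dashes` and `same_ignoring_dash_type` are hypothetical names, the list of code points is an illustrative selection rather than anything exhaustive, and this is not a description of how Picard or any particular search tool actually does it.

```python
# Dash-like code points folded to U+002D for comparison only.
# The selection is an assumption made for this sketch.
FOLD_TO_HYPHEN_MINUS = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
})


def fold_dashes(text):
    """Replace every dash-like character in the table with HYPHEN-MINUS."""
    return text.translate(FOLD_TO_HYPHEN_MINUS)


def same_ignoring_dash_type(a, b):
    """Compare two strings, ignoring case and which kind of dash was used."""
    return fold_dashes(a).casefold() == fold_dashes(b).casefold()


print(same_ignoring_dash_type("Olivia Newton\u2010John", "olivia newton-john"))
# True
```

Folding only for the comparison leaves the stored data alone, so the more specific characters are not lost.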

It is a headache when it starts to appear in my media library. I then have to find a way to treat it in there. Not all fonts have the ability to show Unicode characters so splodges and squares appear where there should be a neat little dash.

(It does make me laugh that this forum makes it impossible to really talk about ‐ or - as it sets all dashes to the same Unicode item. Or does it? Looks like it leaves those alone. I can’t tell - they confuse me)

So - yeah - I don’t like 'em in my filenames. But that is easily solved, so not really a problem to me.

But do remember that, as @justcheckingitout points out, the average person has never heard of these and would not even understand them. So please don’t be surprised when normal people like @Kid_Devine try to “correct” these oddities so they can use the data in their own applications. I know it has also popped up as a problem on the KODI media centre, when Olivia Newton‐John was causing some confusion due to the Unicode hyphen.

Are they also fully documented for people who are using the MB API? Do they know that if they type Olivia Newton-John then they will not find her in a search?

IMO that should be fixed on that computer; it’s not properly the job of MusicBrainz to worry about what characters every font on every computer in the world may or may not have.

It’s not that difficult: :wink:

  • HYPHEN (trait d’union in French) is for linking words
  • MINUS is for mathematical operations
  • EN DASH and EM DASH are the opposite of linking: like brackets, they are separators (EN is shorter than EM)

Is the U+2010 HYPHEN displaying correctly for you? If so, I wonder what font your browser is using. If it helps, I use the Font Finder extension to check what fonts are actually displayed. I think it’s available on most browsers.

I think the Unicode hyphen should be used as much as possible, because it is more typographically correct. But there’s no way to enforce it (and that’s a good thing). (Look, I didn’t use ’ but the good ol’ '.)

I can’t speak for @jesus2099, but on mine (Chrome on Windows 7 64-bit) it renders using Verdana, according to the Chrome “Inspect element” feature.

Update: I installed Bitstream Vera Sans. Inspect suggests that the artist name element on the edit page still renders properly with Bitstream Vera Sans, including the hyphen character U+2010. On the actual artist page, however, the name is rendered with Bitstream Vera Sans for 20 characters and Lucida Sans Unicode for one character (presumably the hyphen).
