Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS

Discourse prettifies them, anyway. :sweat_smile:

2 Likes

Note that the en dash is the typographically correct character for date ranges, a fairly common use case in MusicBrainz:

Band: Greatest Hits 1970–1979

I imagine most editors just use the keyboard hyphen here.

4 Likes

What would be really useful to see is a short note in the Documentation on how this prettification can be removed in Picard, for those who are tagging and using the data outside of the database. There is the “Convert Unicode Punctuation characters to ASCII” option, but it would be good to see a list of what it is actually swapping.

That should help reassure those who think this is madness.:wink:
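
As a rough illustration of the kind of list being asked for, here is a minimal Python sketch of a punctuation-to-ASCII substitution table. The table `PUNCTUATION_TO_ASCII` and the function `to_ascii_punctuation` are hypothetical names made up for this example; this is not Picard's actual replacement list, which may well cover more or different characters.

```python
# Hypothetical mapping for illustration only; NOT Picard's real table.
PUNCTUATION_TO_ASCII = {
    "\u2010": "-",    # HYPHEN
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}

# str.maketrans accepts single-character keys and string values of any length.
_TABLE = str.maketrans(PUNCTUATION_TO_ASCII)


def to_ascii_punctuation(text: str) -> str:
    """Replace the typographic characters listed above with ASCII ones."""
    return text.translate(_TABLE)


print(to_ascii_punctuation("Band: Greatest Hits 1970\u20131979"))
```

Anything not in the table is left untouched, so a published list of exactly this shape would tell taggers what to expect.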

I was also serious with my question about the API: how does it handle the fact that the same title can be entered with different dash characters? Would a search for “Easy-Star All Star” pick up all variations?

3 Likes

Isn’t the answer to this perennial and somewhat tedious argument to have MB decide?
If MB wants U+2018 or U+2019 to be used and it is given U+0027, let MB switch it.
Think of the thousands of edits you have seen where someone has spent time switching U+0027 to U+2019.
Same with hyphens: depending on context, let MB decide. Then we can all get on with something productive.

1 Like

That’s a nice idea in theory, but depending on the language and situation, ' would have to be replaced by ‘, ’ or even ‚ (which is not a comma :slight_smile:). Hyphen-minuses are even more problematic, even in English. To automatically choose the right -, – or — would be next to impossible.

4 Likes

A big stumbling point is that the average user just can’t work out how to enter this stuff. I know when I first asked about it I couldn’t get a clear answer on which ALT+NUMPAD combo to type. It is only now that I have some of @Jesus2099’s scripts that I can finally enter these.

Silly suggestions to make life easier - have a MENU available for noobs to enter these. U+2018 means nothing to the average computer user. Give them the keyboard combos and it is more likely to happen.

And a daft question, but why is it that the actual website does not follow these rules?
image
For a while when I was entering new tracks I spotted that apostrophe on the edit page and was copy/pasting that in… and then later realised it isn’t the same one. (I have to squint really hard to spot the differences, so only the “Update the recording title to match the track title” question flagged up that I was using the wrong thing.)

3 Likes

Most of the text on this website is very old, before it was decided to put typographically correct punctuation in the guidelines. I guess nobody bothered to update it. For what it’s worth, I used typographically correct punctuation in the Dutch translation, and other translations probably do the same. So that’s one place where the translations are ahead of the English website. :wink:

It would be very useful to have a box on edit pages with clickable punctuation that would insert the correct character at the current cursor location. I thought there was a ticket for it, but I can’t find it.

3 Likes

That is indeed the main issue with those characters.
I use some AutoHotkey macros on my keyboard (permanent, not only for MB).
Recently I have also been using mb.unicodechars, an awesome popup user script written by @Smeulf, with which you can find those fancy but useful characters by pressing Ctrl+m from within any MB editing text field.


I realise with shame that I have created this duplicate topic of Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS.
If any @moderators could merge it into that older topic… :bowing_man:

I think that probably shows you will never get a simple consensus on this issue. I know if I did a survey of my clients I would expect maybe two people out of three hundred would know what I was talking about.

What I like is that MB has this as a recommended but not compulsory action. This allows new users to add new data correctly. And then someone else can prettify if they want to. This should not be used to cause arguments as I have seen people driven away from MB by this. (That included me initially)

It is great when you spend your time soaked in Internet language and terminology. But this is a music database first, so accuracy of data should always come first. A lot of people who know their music won’t understand this stuff.

5 Likes

I so agree. MusicBrainz can’t expect most editors to know or care about the difference, and the priority should be making it easier to add data to MusicBrainz, not harder. As an editor I would just use the characters easily available on the keyboard, but if MusicBrainz wants to change them I don’t mind, as long as it is done automatically rather than through negative votes on my edits.

Two other things to consider:

  1. Do those who insist on using the ‘correct’ character keep to this policy when doing non-MB things such as writing a letter?
  2. As part of my Albunack project, when trying to match artists to Discogs artists I simplify the punctuation to find matches, and this works well. Insistence on using the correct characters, on the other hand, probably prevents various MusicBrainz import scripts from working as well as they could.

4 Likes

The search should treat all punctuation as irrelevant, as mere separators.
If the search is fooled by HYPHEN, please open a bug ticket, IMO.
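
To make the "mere separators" idea concrete, here is a small Python sketch. It is only an assumption about how such matching could work, not a description of the actual MusicBrainz search; the function name `search_tokens` is made up for the example.

```python
import re
import unicodedata


def search_tokens(text: str) -> list[str]:
    """Split text into lowercase word tokens, discarding all punctuation."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # [^\W_]+ matches runs of Unicode letters and digits only, so hyphens,
    # dashes, apostrophes and spaces all act as separators.
    return re.findall(r"[^\W_]+", folded)


# HYPHEN-MINUS (U+002D) and HYPHEN (U+2010) give the same tokens.
print(search_tokens("Easy-Star All Star"))
print(search_tokens("Easy\u2010Star All Star"))
```

Comparing token lists rather than raw strings means a query typed with the keyboard hyphen would still match a title stored with U+2010.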

3 Likes

  1. I do.
  2. What Jesus said.

Also see my point of automating it being impossible above.

1 Like

It’s not impossible, just tricky. If guidelines can be written clearly enough for an editor to understand them, then the same logic can be applied to the MB code.

If you use Microsoft Word then it automates a lot of this prettification. It will swap hyphens around, but I’ve always just seen that as a long dash. I don’t know if they are following any rules correctly or not. Speech marks also get tarted up. The apostrophes change too, but I’m not sure if they are the same ones as on here.

So more people use these than they realise, but that is due to MS Word’s own auto-replace.

I wonder if a version of those auto-replace rules, or even the ones in this forum’s source code, could be lifted and used in MB data entry?
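
For a sense of what the unambiguous part of such rules might look like, here is a Python sketch. It is not taken from Word, Discourse or MusicBrainz; the function `prettify` and both patterns are assumptions for illustration. It deliberately handles only two easy cases, an apostrophe between two letters and a hyphen-minus between two four-digit years, and leaves everything else alone, since the language-dependent cases raised earlier in the thread need more context than a regular expression has.

```python
import re

# An apostrophe with a word character on both sides, as in "Don't".
_INWORD_APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")
# Two four-digit numbers joined by a hyphen-minus, as in "1970-1979".
_YEAR_RANGE = re.compile(r"\b(\d{4})-(\d{4})\b")


def prettify(text: str) -> str:
    """Apply two conservative replacements; leave ambiguous cases untouched."""
    text = _INWORD_APOSTROPHE.sub("\u2019", text)
    text = _YEAR_RANGE.sub(lambda m: f"{m.group(1)}\u2013{m.group(2)}", text)
    return text


print(prettify("Band: Greatest Hits 1970-1979"))
print(prettify("Don't Stop"))
print(prettify("Easy-Star All Star"))  # unchanged: in-word hyphen is ambiguous
```

Quotation marks at word boundaries and ordinary in-word hyphens are not touched, which is where the real guideline decisions would still be needed.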

It is replacing your typewriter apostrophe with the same U+2019 RIGHT SINGLE QUOTATION MARK, correctly.
However, it does not replace your in-word U+002D HYPHEN-MINUS with a U+2010 HYPHEN.
And although it adds a space before punctuation (French), it adds a regular U+0020 SPACE instead of a U+202F NARROW NO-BREAK SPACE.

1 Like

Thanks for the feedback. The U+2010 HYPHEN doesn’t render properly anywhere for me. For input boxes on edit pages my font is Arial, which doesn’t seem to contain the HYPHEN character. Bitstream Vera Sans is being used everywhere else.

I use Vivaldi, and when I inspect the element it says:

Verdana—Local file(20 glyphs)
Lucida Sans Unicode—Local file(1 glyph)

So I guess the U+2010 HYPHEN is rendered with Lucida Sans Unicode on my office computer (Windows 7, latest Vivaldi).
Unfortunately my home PC has been unplugged for months already, so I can’t check on that one. I think I remember it kind of worked there too (Windows XP, old Vivaldi), but I’m not sure.

1 Like

I’d never leave a negative vote if you used the wrong character, as long as it wasn’t undoing the correct one.

8 Likes

Couldn’t you show character differences if the old and new words/phrases collate the same?
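
One way this could look, sketched in Python under the simplifying assumption that the old and new values have the same length and differ only character by character. The function name `describe_char_differences` is hypothetical and this is not how the MusicBrainz edit pages work today.

```python
import unicodedata


def _label(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def describe_char_differences(old: str, new: str) -> list[tuple[int, str, str]]:
    """List (position, old character, new character) for each differing position.

    Assumes both strings have the same length; returns an empty list otherwise.
    """
    if len(old) != len(new):
        return []
    return [
        (i, _label(a), _label(b))
        for i, (a, b) in enumerate(zip(old, new))
        if a != b
    ]


for position, before, after in describe_char_differences("Don't", "Don\u2019t"):
    print(position, before, "->", after)
```

Showing the code point and name next to the diff would spare editors the squinting described earlier in the thread.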

Related ticket: