Unicode apostrophe standardization


Hi @reosarevok - it is good to get an official reply. I am British English and use a British English keyboard. And going by that link to the miscellaneous page I read that as “use normal apostrophes” as they are typographically correct.

Use of basic ASCII punctuation characters such as ' and " is allowed, but typographically-correct punctuation is preferred

It is also easier than having to load up Character Map and try and find the specific silly curly thing as used in Word. :smiley:

I’m just going to therefore stick with what I know. Especially as this keeps standard searches working okay. It also seems to be more logical with the way the rest of the rules of data entry work.

It certainly made me laugh when the guy told me to “read the style guide” and all I could find was the example above, which was clearly using a standard apostrophe from the keyboard. I’ll ignore this next time and just aim to keep things consistent and aim for how the artist intended it to be seen.


You can see this standard more easily by opening any printed book around you. :stuck_out_tongue_winking_eye:


But surely this is a database and not a book?

I just want to get things right. Which is why I turned to the forum instead of the single person who complained.
So far I have had one reply from an official staff member, and that is the guidance I will stick with. Especially as no one has yet come up with a simple way of entering that curly thing. Even the guy who “corrected” my database entries pointed to a standard apostrophe when I asked him how to enter that thing.


I suspect most artists have zero intention of using one character or the other. They just don’t even know about typography (let’s be fair, many don’t seem to know about capitalization rules or even simple grammar) so there’s no decision there :slight_smile:


This is getting messy. I have just spotted somewhere else this has been done, and ended up with a non-printable character instead.

I am using EAC to rip my music, linked to MusicBrainz to fill in initial details. Set file names, folder names.

I then pass the files over to MusicBrainz Picard to do the tagging.

I have now spotted that “Easy Star All-Stars” has an unprintable character instead of the dash when displayed in certain fonts.

I did wonder why I was looking at my folders yesterday and saw what looked like two Easy Star All-Stars folders side by side. Now I realise it was someone playing with the different dashes. (A difference that is almost invisible to the eye)

Try using standard fonts like Courier in Notepad++ and these typographical tweaks cannot be displayed.

Try making up a playlist - how does one type these special characters? (I notice no one has come up with an answer on that simple part of the puzzle?)

I really don’t understand why cosmetic stuff like this is happening to common data in a database. Surely if someone wants to prettify their own version then they can adjust on their output. I don’t see the sense in doing this on data that gets used in so many other places like Media Players.

I don’t want to be a grumpy old git. I am just confused as I thought MusicBrainz was a music database for common world wide use in many projects. Now I find I am putting weird hidden characters into my files making them less usable.

So I need to find a solution for me to get round this issue for my case in tagging music files, writing music tags, manually writing play lists, using the data in other media players like KODI. Using punctuation I can find on a keyboard, whilst still keeping all the standard European \ Asian characters in the text that can be displayed in standard fonts.

I don’t want to tick the “swap unicode to ASCII” options in Picard as I want to keep my Japanese text, etc.

Is this an issue I need to take over to the Picard devs? See if an addon can be made to re-standardise this stuff? Something that can fix punctuation to a standard whilst leaving the Unicode in place for non-ASCII characters?


Talking to myself now as I realise this is my problem that I need to fix for my own usage.

I need to take my questions over to the Picard threads I guess. I’m looking through the settings and see “Convert Unicode Punctuation characters to ASCII” is a ticked setting in my copy of Picard. Now it implies to me that that tick would do exactly what I am asking about here… but I am still ending up with these odd characters in my filenames and tags.

Ah - hang on. Now I am starting to work out where the mess is in my files. It is EAC putting these bits in initially. Now I am getting aware of these things, I may be able to find a fix at my end… time to work on EAC and Picard settings I think.


It is trivial to turn typographically correct characters into their ASCII equivalents, but the other way around is in some cases impossible (the software would have to choose between “ and ” for example). That is why we prefer to store the former. You could take a look at the plugin Non-ASCII Equivalents (which is similar to the built-in option) and try to remove bits of code you don’t want (like Japanese to Latin). You can download that plugin here: https://picard.musicbrainz.org/plugins/
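To illustrate why the conversion only works in one direction, here is a minimal Python sketch. The mapping table and the function name `to_ascii_quotes` are made up for this example; this is not the table that Picard or the plugin actually uses.

```python
# Hypothetical mapping for illustration only (not Picard's or the plugin's table).
TO_ASCII = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})


def to_ascii_quotes(text):
    """Replace curly quotes with their plain keyboard equivalents."""
    return text.translate(TO_ASCII)


curly = "\u201cHeroes\u201d"    # title wrapped in opening and closing curly quotes
flat = to_ascii_quotes(curly)   # both quotes become the same " character

# The opening and closing quotes are now indistinguishable, so nothing in
# `flat` tells software which curly quote each " originally was.
print(flat)
```

Going from curly to plain is a simple lookup. Going back would need guesswork about which quote opens and which closes, and that guess breaks down for things like an apostrophe at the start of a word.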


@mfmeulenbelt I am trying to make sense of the Picard plugins and options, but there is so little written down about them.

I have seen options for “Convert Unicode Punctuation characters to ASCII” in the options that seems the best fit - but no documentation on what it swaps.

I see the addon for the NON-ASCII Equivalents - but again find no details on what it swaps. (And I don’t really want to have to start editing source code unless I really really have to)

I am pretty sure it is just that first one I need to get right. I want to see my multilingual characters as I have Turkish and Japanese artists in my collection. My modern software in the modern media centre happily displays those okay.

It is just when getting to the level of filenames it gets awkward for me.

I have already realised that part of my problem is also coming from EAC when ripping. Because it now points at MusicBrainz instead of the FreeDB I am getting these oddities in my filenames, which is a bigger headache when mixed in with previously ripped folders and files.

TBH - the main scream from me has been because I have only spotted this now after re-ripping 350+ disks in the past few months. It now means I need to go back through everything and normalise things for my system. HAHAHA - this really is a never ending mission. This was the fourth time ripping my music collection… at least cleaning the tags with Picard should be quicker this time.

In the New Year I’ll make sure I have a clearer picture of this mess that has now happened in my files. And then write it up in some way for other people who will also come across it.


I would suggest you try a few different options with just one troublesome release and see if it comes out right. The source code for that plugin isn’t very complex. I think it will speak for itself if you open the .py file in notepad (you won’t need any programming knowledge).


I’ll take a calm clean look at this all again in the new year. I know I have specific requirements that don’t fit other people’s needs. I only need the punctuation cleaned up - mainly because I can’t SEE the difference between one hyphen dash and another, but my computer file system can see it. Leading to confuddlement.

I will look at the source code deeper - not a problem with the comprehension of it as I have written Python addons for elsewhere. (C \ C++ background). The initial look into that plugin shows it is far too wide for what I want. It is removing far too many characters for me. I still want to see Ayşedeniz Gökçin and Björk but I also want to know that Easy Star All-Stars and Ed Alleyne-Johnson are not getting confused with Easy Star All‐Stars and Ed Alleyne‐Johnson!
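As a rough idea of the narrower behaviour I am after, something like the Python sketch below would do it. The table and the name `normalise_punctuation` are my own assumptions for illustration, not code taken from Picard or the Non-ASCII Equivalents plugin, and the list of characters is deliberately short and certainly incomplete.

```python
# Hypothetical punctuation-only table: letters such as ş, ö, ç or Japanese
# text are simply not listed, so they pass through unchanged.
PUNCTUATION = str.maketrans({
    "\u2010": "-",    # HYPHEN
    "\u2011": "-",    # NON-BREAKING HYPHEN
    "\u2013": "-",    # EN DASH
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
})


def normalise_punctuation(text):
    """Swap typographic punctuation for keyboard ASCII, leaving all other characters alone."""
    return text.translate(PUNCTUATION)


for name in ("Easy Star All\u2010Stars", "Ed Alleyne\u2010Johnson",
             "Ay\u015fedeniz G\u00f6k\u00e7in", "Bj\u00f6rk"):
    print(normalise_punctuation(name))
```

The point of the design is that only the listed punctuation is touched, so the two spellings of Easy Star All-Stars collapse into one folder name while Björk keeps her umlaut.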

I think my bigger issue is going to be with EAC ripping using MusicBrainz metadata as I need to manually add a list of substitutions into that program now to catch these dashes and oddities that a small number of people are entering into the database, even though the official line is clearly that they are optional.


They are optional, but correct punctuation etc is preferred. That small number of people is improving the database.
That it makes tagging life difficult for you is unfortunate, but not a good reason to water down how we store data (MB aims to be a database first, a resource for Picard to use/for you to tag your files second).

Although I’m not sure what the exact issues are? I haven’t really heard of KODI or playlists having trouble reading or displaying MB tracks.


I can see this would be confuddling. Unfortunately the two hyphens used on MusicBrainz (Unicode HYPHEN and HYPHEN-MINUS) are supposed to look identical :man_facepalming:
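Since the eye cannot tell them apart, one way to check which character a name actually contains is to print the code points. A small Python sketch using the standard `unicodedata` module (the helper name `describe_punctuation` is just something made up for this example):

```python
import unicodedata


def describe_punctuation(text):
    """List (code point, Unicode name) for every character that is not a letter, digit or space."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in text
        if not ch.isalnum() and not ch.isspace()
    ]


# The first string uses the keyboard hyphen, the second uses U+2010.
print(describe_punctuation("Easy Star All-Stars"))
print(describe_punctuation("Easy Star All\u2010Stars"))
```

The first call reports U+002D HYPHEN-MINUS (the keyboard character) and the second reports U+2010 HYPHEN, even though the two names look the same on screen.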