I mentioned this a few years ago in IRC but can’t remember the result, so thought I would bring it up here again.
MusicBrainz seems to use a weird Unicode apostrophe. Example here:
You can see the first 3 releases use a different apostrophe character than the last release.
This may seem insignificant to some people, but it actually means that a search for that album will not work correctly and will return no results in some circumstances.
I suspect this may be a problem with the source data, not really musicbrainz…
The “weird Unicode apostrophe” is actually the preferred apostrophe on MusicBrainz. That’s because the ‘'’ is butt-ugly. Any reasonably good search thingy will find both apostrophes when someone searches for either one though.
Thanks guys, now I see it’s a style guideline I can of course program around it. Unfortunately both MySQL and PHP don’t seem to treat this character as the same without special processing (I’m matching the iTunes RSS feed with the MusicBrainz Release-Group name). Again, not a problem; I was really interested in why it was sometimes different, even on the same Release-Group!
For anyone who is interested here is how I solved it in both:
MySQL
UPDATE album SET `release-group` = REPLACE(`release-group`, "’", "'");
You’ll probably need to run similar updates in mysql for the open and close single quote, and for double-quotes (which will have multiple unicode code points in use).
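For anyone doing the same matching outside the database, here is a rough Python sketch of that idea: map the typographic single and double quotes to their ASCII equivalents before comparing. The names QUOTE_MAP and normalize_quotes are made up for illustration, and the list of code points is an assumption about what shows up in the data, not an exhaustive set.

```python
# Hypothetical mapping of typographic quotes to ASCII; extend as needed.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the preferred apostrophe)
    "\u201a": "'",   # single low-9 quotation mark
    "\u201b": "'",   # single high-reversed-9 quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u201e": '"',   # double low-9 quotation mark
    "\u201f": '"',   # double high-reversed-9 quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with typographic quotes replaced by ASCII ' and "."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    # Both spellings compare equal once normalized.
    print(normalize_quotes("Don\u2019t Stop") == "Don't Stop")
```

Normalizing both sides (the feed title and the MusicBrainz title) at comparison time avoids having to rewrite the stored data.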
For editors who want to learn how to enter fancy unicode, awesome. Go for it.
For editors who don’t want to, cool. Do what you can. Another editor will probably come along and change it later.
If you look at a book or at a CD, you are more likely to see those curly (normal) apostrophes than the legacy upright (ugly) typewriter apostrophes that computers got us used to.
This ’ is not weird, it’s normal.
Yeh, the only reason I mentioned it as “weird” is that a quick Google had shown me this was mostly a problem with people copying and pasting from Microsoft Word, which also converts the apostrophe to the curly version. I had no idea it was intended. Anyway, I’m happy to convert; the consistency is a much better question.
An alternative, depending on your needs, might be to strip all non-alphanumeric characters except spaces (and collapse runs of spaces to a single space) from both sides for a simplified search.
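A minimal Python sketch of that simplified search key, assuming you compare the keys rather than the raw titles. The function name search_key is hypothetical, and "alphanumeric" here means whatever Python's str.isalnum accepts, which includes accented and non-Latin letters.

```python
import re


def search_key(title: str) -> str:
    """Build a punctuation-free key for loose title matching."""
    # Keep only letters, digits and whitespace; apostrophes of any kind are dropped.
    kept = "".join(ch for ch in title if ch.isalnum() or ch.isspace())
    # Collapse runs of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", kept).strip()


if __name__ == "__main__":
    # Curly and straight apostrophes produce the same key.
    print(search_key("Don\u2019t  Stop") == search_key("Don't Stop"))
```

The trade-off is that titles differing only in punctuation collapse to the same key, so this suits a loose lookup better than an exact match.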
Well, U+2019 being the correct (for most cases) apostrophe, is the reason why MSWord does this kind of artificial „intelligence“ in the first place.
It’s especially irritating when applied to technical documentation, where the combination of stupid software and inattentive users turns a command line like ls -l --si 'Tangerine Dream'/*
into ls -l –si ‘Tangerine Dream’/*
which will not work at all.
And even humans can make the wrong ' translation. E.g., just the other day I did Edit #42712692 - MusicBrainz – notice the ’s? Yeah. Those are not appropriate here! Such a git, that editor. Luckily a smart git that sometimes realises his mistakes: https://musicbrainz.org/edit/42767913
Why is there nothing in the style guide about using these “correct” apostrophes?
It would help if there was a page in there explaining why these curly apostrophes are in use, and how to type one in from the keyboard. I laughed when I saw that this thread says it is for “cosmetic” reasons and that Microsoft Word was being taken as a standard here. (Must be a first - lolz)
I am using a standard Windows PC with a UK keyboard and have no idea how to do a ’ so all I can do is copy and paste. When I got picked up on it elsewhere I was told to Use a Unicode Apostrophe (ALT)+039 to give ’ which is what I was already doing.
So please, can someone write a definitive page in the Style Guide explaining this?
Currently, if you look at the English Style Guide page, even the style guide itself is written with standard apostrophes, using a normal ’ from the keyboard.
I am trying to analyse the differences in a hex editor, and my keyboard puts out the standard apostrophe same as (ALT039) ’ (Hex 0x27) whereas the tilted over ’ is hex 0x92.
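For what it is worth, those two hex values can be reproduced with a short Python sketch. As far as I can tell, 0x92 is how the curly apostrophe (U+2019) is stored in the Windows-1252 code page, so the hex editor was probably looking at text in that encoding; in UTF-8 the same character takes three bytes. The variable names are only for illustration.

```python
straight = "'"        # typewriter apostrophe, U+0027
curly = "\u2019"      # right single quotation mark, U+2019

# Code points of the two characters.
print(hex(ord(straight)))          # 0x27
print(hex(ord(curly)))             # 0x2019

# Byte forms of the curly apostrophe in two common encodings.
print(curly.encode("cp1252"))      # b'\x92'
print(curly.encode("utf-8"))       # b'\xe2\x80\x99'
```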
Look down that list and you see at least three different apostrophes in use. Yet the TITLE as shown on the style guide is using (ALT)039 ’ and not that odd tilted over apostrophe ’ you have in your example.
I am confused. (Not arguing as I want to get this right, but I also don’t want to be dragged into a weird *nix vs windoze argument as I live in both of those camps. I also know my music titles are going to get read back in a number of different fonts and I don’t want things getting too weird :D)
Help us get a definitive answer here
Edit Note: Oh great… I have just realised that my carefully typed out text has been trashed and the standard apostrophes have been replaced with curly ones, meaning the above is now not as clear as it should have been…
But generally: you only need to worry about these if you want to. Otherwise, just don’t change anything that is already there and might be typographically correct, and if someone complains because you’re using the basic ASCII punctuation when adding stuff, just remind them the guideline very specifically says “usage is allowed”.