Certain characters improperly showing up in Picard

Hey all, I’ve just recently been noticing an issue with Picard that’s bothersome to me and makes me wonder if I’ve been getting undetected errors in my tags. What happens is, certain characters, though I don’t have a definitive list of which, do not appear correctly in Picard. Two examples that immediately come to mind:

  • The song “Très très « rétro »” by Fly Pan Am, which uses French guillemets (« and ») and is tagged properly in MB, appears in Picard as the ungodly “Très très << rétro >>”.

  • The artist Atom™ comes up in Picard as AtomTM. The same applies to any other instance of the trademark symbol.

These are clearly incorrect, and I’m not sure what the problem is because I swear special characters such as these have formatted properly in the past. Using the “tagger” link in a browser window and searching directly within Picard both yield the same issue. I don’t recall changing any settings that would cause such a problem. Weirdly, it seems to be only certain characters; some of the wilder Unicode characters have shown up fine.

For clarity, I’m on Picard 2.6.4, it was also happening on 2.6.3, and I’m on Windows 11. Anyone have any insight on what might be causing this or how I could fix it?

Have you ticked “Metadata \ Convert Unicode Punctuation to ASCII” to get rid of the weird apostrophes and dashes? It also takes out characters like these, as it is pretty strict. It should leave « and » alone, as they are in the extended ASCII table along with the accents, but I just think it is over strict, as it also removes ½ and ¼.
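To make the “over strict” behaviour concrete, here is a minimal sketch of a table-driven punctuation-to-ASCII conversion. The mapping is a hypothetical table assembled only from the examples in this thread (« to <<, ™ to TM, ½ to 1/2, curly quotes and dashes to ASCII); it is not Picard’s actual conversion table, and `to_ascii_punctuation` is a made-up name.

```python
# Hypothetical mapping, illustrative only; NOT Picard's real conversion table.
PUNCTUATION_TO_ASCII = {
    "\u2018": "'", "\u2019": "'",                  # curly single quotes / apostrophe
    "\u201c": '"', "\u201d": '"',                  # curly double quotes
    "\u2010": "-", "\u2013": "-", "\u2014": "-",   # hyphen, en dash, em dash
    "\u00ab": "<<", "\u00bb": ">>",                # guillemets
    "\u2122": "TM",                                # trademark sign
    "\u00bd": "1/2", "\u00bc": "1/4",              # vulgar fractions
}

# Build the translation table once; str.translate then does one pass per string.
_TABLE = str.maketrans(PUNCTUATION_TO_ASCII)


def to_ascii_punctuation(text: str) -> str:
    """Replace mapped punctuation with ASCII; leave everything else (e.g. accents) alone."""
    return text.translate(_TABLE)


print(to_ascii_punctuation("Très très « rétro »"))  # Très très << rétro >>
print(to_ascii_punctuation("Atom™"))                # AtomTM
```

Note that the accented letters survive while the guillemets and ™ do not, which matches the symptoms described in the first post.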

Ah thank you! I’m not sure how I totally skimmed past that setting, it seems to perfectly fix the issues!

I do love it when, browsing the forum, you find a solution to a problem you didn’t know you had… I no longer need to force (in scripting, for example) Tea for the Tillerman² to appear correctly!

This option is set to True by default.

But perhaps it is now time to switch it to False by default, as Unicode is much more widely supported now.

@outsidecontext what do you think?

If you switch to false by default, please don’t change current settings.

The main reason I have this enabled (and only manually turn it off for a few releases) is hyphens, dashes, apostrophes and speech marks in file names. Especially the dashes. It is hard to see the difference in a filename: you can have one artist folder named with a dash and another with a hyphen, and it is impossible to tell them apart in a file manager. That can lead to much scratching of heads as to why two folders appear side by side.

I keep promising myself to go write a script to deal with these, and maybe I’ll finally go do that now. 🙂
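A minimal sketch of the kind of script described here: fold a few dash-like characters to the plain ASCII hyphen-minus and report folder names that collide after folding. The set of folded code points is an assumption (a handful of common lookalikes, not an exhaustive list), and `find_lookalike_names` is a hypothetical helper, not part of Picard.

```python
import unicodedata
from collections import defaultdict

# Assumed set of dash-like code points that render (almost) identically to "-":
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
DASH_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2212"
_FOLD = str.maketrans({ch: "-" for ch in DASH_LIKE})


def find_lookalike_names(names):
    """Return groups of names that differ only in which dash character they use."""
    groups = defaultdict(list)
    for name in names:
        groups[name.translate(_FOLD)].append(name)
    return [group for group in groups.values() if len(group) > 1]


folders = ["Jay-Z", "Jay\u2010Z", "Motörhead"]
for group in find_lookalike_names(folders):
    # Print the Unicode names of the non-alphanumeric characters so the
    # otherwise invisible difference becomes visible.
    print([[unicodedata.name(c) for c in name if not c.isalnum()] for name in group])
# [['HYPHEN-MINUS'], ['HYPHEN']]
```

Grouping by a folded key, rather than rewriting anything, keeps the check read-only: it only tells you which folders to look at.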

Yes, agreed. I am actually surprised that this is indeed the default; I always have it disabled. Let’s turn that around.

No worries, we never change existing settings on update 🙂

Maybe a tweak is needed in the help file. It would be great to see ¼, ½ and ™ appearing, but please make people aware of the “this is why you can get two identical-looking folders side by side” issue. Hyphens and dashes can be fun to spot in file names. It especially happens when you start renaming and cleaning up tags, half of your files have old ASCII dashes, and then Picard starts changing them to Unicode hyphens.

Edit: Just looked at the help file and it is pretty misleading about what this option does. It needs to be clearer about what is stripped out. Now that the default state is to be flipped, this little bit needs rewriting.

Have completed hacking my first Picard plugin… Now I can flick this switch in my own Picard. Thanks for the push.

Actually I think we should have an option to not convert characters that can be represented in a specific character encoding. If it is not UTF-8 or some other Unicode encoding, most often you deal with either ISO-8859-1 or the Windows-1252 encoding (which are mostly identical). When I did the RIFF tags implementation for WAVE, which have the best compatibility with other software when written as Windows-1252, I had the same issue: for better compatibility I wanted to use the convert-to-ASCII functionality, because it would nicely deal with a lot of special characters. But it would also convert a lot of characters that are totally fine in Windows-1252, so I didn’t do that.

It’s unfortunately rather tricky to change these conversion functions, especially because they can actually have quite a negative impact on performance, as we learned in the past.
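A minimal sketch of the “only convert what the target encoding can’t represent” idea, using Python’s built-in `cp1252` codec as the Windows-1252 target. The `FALLBACK` table and the function names are hypothetical; characters outside Windows-1252 are first looked up in that table, then decomposed with NFKD to drop accents, and anything still unencodable becomes `?`. The per-character result is cached with `functools.lru_cache` as one possible way to soften the performance concern mentioned above; this is an assumption, not a measured fix.

```python
import unicodedata
from functools import lru_cache

ENCODING = "cp1252"  # Windows-1252
# Hypothetical fallbacks for a few common characters that Windows-1252 lacks.
FALLBACK = {"\u2010": "-", "\u2011": "-", "\u2212": "-"}


def _encodable(s: str) -> bool:
    try:
        s.encode(ENCODING)
        return True
    except UnicodeEncodeError:
        return False


@lru_cache(maxsize=None)
def _convert_char(ch: str) -> str:
    if _encodable(ch):
        return ch  # representable as-is, e.g. «, », ™, ½, ö
    if ch in FALLBACK:
        return FALLBACK[ch]
    # Decompose (e.g. "Ő" -> "O" + combining double acute) and drop combining marks.
    base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                   if not unicodedata.combining(c))
    return base if base and _encodable(base) else "?"


def to_cp1252_safe(text: str) -> str:
    """Convert only the characters that Windows-1252 cannot represent."""
    return "".join(_convert_char(ch) for ch in text)


print(to_cp1252_safe("Atom™ \u2212 Ő 東京"))  # Atom™ - O ??
```

Unlike a blanket ASCII conversion, this keeps guillemets, ™ and accented letters, because Windows-1252 can encode them.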

I created PICARD-2306 for the default value change.

Suggestions for improvement are always welcome. Please see the information about contributing for details.

Isn’t there a separate option to replace such characters when naming files?

(I quite understand the frustration of two folders that are almost, but not quite, the same!)

There are some plugins that partially do this, but they don’t catch all the different hyphens and dashes, “quotes” and apostrophes that the Unicode Pixies put into our data. So I hacked my own solution together. Easy enough.

I was referring to (under File Naming) ticking both “Replace non-ASCII characters”.

(I’m pretty sure you know Picard better than me so there’s presumably some limitation I’m unaware of)

I don’t think that flag is much different from the metadata one; the help file talks of stripping accents off characters even though those accents are in the extended ASCII table, just like « and ».

Options like that are too brutal for me. I guess it is only using the basic 95 printable characters of the old ASCII table. I like my Japanese titles, ™ in the names, and spelling Motörhead correctly. I don’t really understand why they need to be removed from a filename unless you are using a very old MP3 player.
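A quick way to see what a strict “printable ASCII only” filter would have to touch is to compare each character against printable ASCII, i.e. U+0020 (space) through U+007E (~), which is 95 characters. This is a sketch; `non_ascii_chars` is a hypothetical helper, not a Picard function.

```python
# Printable ASCII: U+0020 (space) through U+007E (~).
PRINTABLE_ASCII = {chr(i) for i in range(0x20, 0x7F)}
print(len(PRINTABLE_ASCII))  # 95


def non_ascii_chars(name: str) -> list:
    """List the characters a strict printable-ASCII filter would replace or drop."""
    return [ch for ch in name if ch not in PRINTABLE_ASCII]


print(non_ascii_chars("Motörhead"))  # ['ö']
print(non_ascii_chars("Atom™"))      # ['™']
```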

My plugin has been written to cover both tags and filenames. It is only the Unicode characters that are distinct but not visually different that are a trouble for me.