Invalid characters in the recording title


#1

I’m having a problem with a Title tag in this recording. The title is “Južanský typ”, but Picard places two additional bytes (0xC2 and 0x9E) between the “u” and the “ž” characters. The two bytes are Unicode Character ‘PRIVACY MESSAGE’ (U+009E) in the UTF-8 encoding. I’ve tried fetching the recording’s XML and it seems to be fine, so I guess this is a Picard bug…
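
For anyone who wants to check a tag value for this kind of stray byte, here is a minimal Python sketch. The `title` string is my reconstruction of the faulty value (the control character inserted between “u” and “ž”), and `find_control_chars` is a hypothetical helper, not part of Picard or mutagen. It only reports characters in the Unicode category Cc (control characters) along with their UTF-8 bytes.

```python
import unicodedata

# Assumed reconstruction of the faulty title: U+009E sits between "u" and "ž".
title = "Ju\u009ežanský typ"


def find_control_chars(text):
    """Return (index, code point, UTF-8 bytes as hex) for each control character."""
    hits = []
    for index, ch in enumerate(text):
        # Category "Cc" covers the C0 and C1 control ranges, including U+009E.
        if unicodedata.category(ch) == "Cc":
            hits.append((index, f"U+{ord(ch):04X}", ch.encode("utf-8").hex()))
    return hits


for index, code_point, utf8_hex in find_control_chars(title):
    print(f"position {index}: {code_point} encoded as 0x{utf8_hex}")
```

With the title above this should report a single hit at position 2, which matches the 0xC2 0x9E pair seen in the file.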


#2

I can’t reproduce this, but can you give more details? Where exactly does it add these bytes, in the file names or the tags? In case of tags, what file format is it?

What plugins and scripts do you have enabled?


#3

I’m on Linux, Picard 2.1.2, Python 3.7.2. No plugins, no scripts.
The bytes are in both the tags and the file name. The file format is MP3; I’ve also tried FLAC and the result is the same. I’ve uploaded the output of mutagen-inspect here. I can’t paste it directly because the non-printable bytes would vanish, but they are present in the file.
In the attached picture the problem shows up as a second “ž” character rendered in a different font. It is not supposed to be there.


#4

It’s not Picard inserting 0xc2 0x9e; the bytes already existed in the database and were returned by the web service. I have removed them in https://musicbrainz.org/edit/59857500.


#5

This explains why, in between testing this with the entire release in Picard and then trying to fix it in the database, I suddenly could no longer reproduce it :slight_smile:

But yes, the issue existed in the track listing. With the entire album I could reproduce it; previously I had only tested with the recording.


#6

Thank you both, that solved the problem. I don’t know how it got there, but I hope there are some checks implemented over what Unicode characters can and can’t be inserted into the database. :slightly_smiling_face:
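
To illustrate the kind of check I have in mind, here is a rough Python sketch. It is only an assumption about what such a filter could look like, not how the MusicBrainz server actually validates input, and `strip_control_chars` is a made-up name. It drops every character in the Unicode category Cc, which also removes tabs and newlines, so it would only suit single-line fields such as titles.

```python
import unicodedata


def strip_control_chars(text):
    """Remove all Unicode control characters (category Cc) from a string."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# The stray U+009E between "u" and "ž" is removed; the accented letters stay.
cleaned = strip_control_chars("Ju\u009ežanský typ")
print(cleaned)
```

A stricter variant could reject the input instead of silently cleaning it, so that the editor notices the problem at entry time.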


#7

It’s rather odd that nikki_bot fixed the issue on the recording, but not on the medium. I wonder how many similar cases remained in the database…