Picard ignoring mp3s with special characters in filenames

Just recently started using Picard to sort through my iTunes library as I escape its evil clutches.

About 95% of the library was read and clustered/saved just fine. However, I’ve noticed that most of the items left behind have “special” characters in their filenames, Korean for example.

I’ve tried a daily release of 1.4 and it has the same behavior. Is there a workaround for this that is commonly used or am I stuck manually editing 10 gigs of mp3 tags and renaming until Picard will automatically handle things?

This is under MacOS.

-Tom

You can enable debug mode (using the -d option on the command line or from Help > Debug/Error Log). It will give more information on what is going on.

Ran it from the command line with -d. Tried to drop in my Hüsker Dü folder. The following errors were logged:
May 5 20:00:32 nyx.local MusicBrainz Picard[3128] : 2016-05-05 20:00:32.015 MusicBrainz Picard[3128:532215] No valid sandbox extension for item: [789514] of flavor: [public.file-url] was created.
May 5 20:00:32 nyx.local MusicBrainz Picard[3128] : 2016-05-05 20:00:32.015 MusicBrainz Picard[3128:532215] Failed to get a sandbox extensions for itemIdentifier (789514). The data for the sandbox extension was NULL
May 5 20:00:32 nyx.local MusicBrainz Picard[3128] : 2016-05-05 20:00:32.033 MusicBrainz Picard[3128:532215] CFDataRef _CopyConvertedFileReferenceURLDataOrNil(CFDataRef) : Cannot convert file reference URL: file:///.file/id=7356914.424/ to file path URL due to error: The file aHuIsker DuIa couldnat be opened because there is no such file…

It really doesn’t look like Picard likes heavy metal umlauts… ( Yeah, I know Hüsker Dü isn’t metal )

-Tom

I’ve found that Picard rejects files with special characters when they are passed on the command line (they show up red and can’t be used), but does okay if they are dragged and dropped.

I’ve been dragging and dropping.

This all did get me thinking about something, so I tried renaming the Hüsker Dü subdir to just Hoo. After that I was able to drag and drop it just fine, and after clustering Picard saved the new folder out properly named. However, the new name is not byte-for-byte the same as the old one.

Old name : Hu\314\210sker\ Du\314\210/

New name : H\303\274sker D\303\274/

Visually in the gui they look the same, but the first one throws the Doesn’t Exist error and the second one works fine.

Both of those names were obtained using tab completion. I have a feeling this is some weird character set stuff going on. I have a workaround at least. It’s an interesting error… I could tar something up if someone else wants a demonstration.

FWIW, the old name contains “u” plus “combining diaeresis” (the base character and the two dots above are split into two Unicode codepoints), while the new name has the “ü” as one, pre-composed codepoint.
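To make the difference concrete, here is a small Python sketch. It assumes the octal escapes in the two names above are the raw UTF-8 bytes of the directory names (\314\210 is 0xCC 0x88, \303\274 is 0xC3 0xBC); the variable names are mine, not anything from Picard.

```python
import unicodedata

# The two directory names, rebuilt from the bytes shown by tab completion.
old_name = b"Hu\xcc\x88sker Du\xcc\x88".decode("utf-8")  # "u" + U+0308 combining diaeresis
new_name = b"H\xc3\xbcsker D\xc3\xbc".decode("utf-8")    # U+00FC, pre-composed "ü"

# They render the same but are different code point sequences.
print(old_name == new_name)          # False
print(len(old_name), len(new_name))  # 11 9

# Normalizing to NFC (composed) or NFD (decomposed) maps one onto the other.
print(unicodedata.normalize("NFC", old_name) == new_name)  # True
print(unicodedata.normalize("NFD", new_name) == old_name)  # True
```

So both strings are valid UTF-8 and look identical on screen, but a byte-wise filename comparison treats them as two different names.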

Perhaps Python is trying some magic on the first form that fizzles?

I don’t know this stuff that well, but it seems Python at least knows about and can handle the four Unicode normalisation forms.
@evilxyzzy, could you say what OS and versions you are using?
And are your files on a local or network HDD?

OS is Yosemite.

Files are stored on an SMB share.

I just tried operating on a locally stored copy of the Hüsker Dü folder as a test, and it worked. I was able to drop the folder into Picard and it recognized.

Ah-Ha! I’m making some progress understanding what’s going on.

OSX is storing filenames with the base character followed by a combining diacritical mark (the decomposed form), and Picard doesn’t seem to like this for some reason. Picard does like precomposed characters. Both are valid UTF-8.

I think my simple solution is to write a script to beat the unicode into submission, but it’s still interesting that Picard doesn’t seem to handle it when it’s on an SMB share vs local disk.
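For anyone who wants to go the script route, something like the following might do as a starting point. It is only a sketch: the function name `normalize_tree` and its `dry_run` flag are made up for illustration, and it assumes the filesystem hands names back to Python unchanged, which may not hold on every share or OS.

```python
import os
import unicodedata

def normalize_tree(root, dry_run=True):
    """Rename files and directories under root to their NFC (composed) form.

    Returns a list of (old_path, new_path) pairs. With dry_run=True the
    renames are only reported, not performed.
    """
    changes = []
    # topdown=False visits a directory's contents before the directory itself
    # is renamed by its parent, so pending paths stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            composed = unicodedata.normalize("NFC", name)
            if composed == name:
                continue  # already composed, nothing to do
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, composed)
            if os.path.exists(new_path):
                # Never overwrite an existing entry; leave this one alone.
                continue
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling `normalize_tree("TARGETDIR")` first and reading through the returned pairs before running it again with `dry_run=False` seems the safer order. Try it on a copy first.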

Just as a side note, precomposed characters might be the most used ones in MB data. For instance, I try to use them consistently for VN music.
IMEs generate precomposed characters, and so do FR keyboards, etc.
All in all, IMO, precomposed characters are a little preferable.

But Picard should indeed be able to read any form, whatever we end up writing. And if it cannot read the decomposed form (at least on SMB over the network), it probably cannot write it either, if you happen to stumble upon some MB data that uses this form.
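As an illustration of what “reading any form” could mean in practice, here is a rough Python sketch of a lookup that ignores normalization differences. This is not Picard code; `find_entry` is a hypothetical helper, and it assumes the directory listing returns names in whatever form they are stored.

```python
import os
import unicodedata

def find_entry(directory, wanted_name):
    """Return the on-disk name in `directory` that equals `wanted_name`
    once both are normalized to NFC, or None if there is no such entry."""
    target = unicodedata.normalize("NFC", wanted_name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return entry  # the name exactly as stored, composed or not
    return None
```

The returned name can then be joined onto the directory path and opened, regardless of which form the caller originally asked for.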

Interesting indeed. It looks like an SMB issue to me, but feel free to report the issue at tickets.musicbrainz.org.

Both of my links in “Picard ignoring mp3s with special characters in filenames” are the same, by mistake.
One link has been lost, and IIRC it mentioned some issues with SMB… :disappointed:

The workaround I came up with for this is fairly easy, adding it here for completeness in case someone else has the same issue.

It looks like OSX likes to store filenames in decomposed Unicode form. If you have the files local and drag them into Picard, it’s fine. If you drag the decomposed files off to a share, things get weird.

One option is to use the --iconv option of rsync when doing the copy. As this is a known OSX issue, you can do the filename fix during the copy.

If you are like me and tired of rsyncing hundreds of gigs of MP3s, I found a utility called convmv ( https://www.j3e.de/linux/convmv/ ) that recursively fixes directory and file names.

You just run it like this: convmv -f UTF-8 -t UTF-8 --nfc --notest TARGETDIR

If you leave off “--notest” it will do a dummy run and just show you the changes it plans to make. The key here is that we are converting UTF-8 to UTF-8, but saving using NFC (the --nfc option, of course), which saves the filenames composed.

At this point I’m chalking this up to Quirk and not Bug, I think. It should probably be a FAQ entry somewhere and not a code change to Picard, since it’s not really Picard’s fault.


Can you try loading the music from a local copy and see if the same issue persists? I think OS X may be translating the Unicode format from the SMB share and that could be causing an issue.