Picard corrupting audio on move/save

Hi,

A while ago I posted a musing about Picard possibly corrupting audio files during saving.

At the time I had doubts about things like WiFi and AFP (Apple Filing Protocol), even though I couldn’t rationalise why either of those would produce bad files: TCP is reliable, and unless an unrecoverable error is encountered and the application notified, what is sent is what is received.

Anyway, I have since moved over to using SMB shares and recording the CRC32s of files before and after Picard has saved them. That’s the CRC32 of the PCM audio, so it should always be the same after a save.

I’ve saved a ton of files recently with 100% success and had assumed the issue was gone. Then I saved a ripped CD on my Mac over Gigabit Ethernet to an SMB share, and one of the tracks failed its CRC check. In fact the corrupted file was 8MB smaller and the end of the file was just silence. I ran the good source file through Picard a second time and it saved correctly.

I understand this is really difficult to troubleshoot, so I guess this post is more of a Public Service Announcement: always back up your files before giving them to Picard, and compare the PCM of the files afterwards.

Can you detail how you calculate the CRC of the PCM audio?

Picard uses Mutagen to read and write audio files, and Mutagen in turn relies on the underlying file system.
Depending on the file format and metadata, files may be completely rewritten, and this operation can currently fail (though in that case you should see errors in the logs).

There’s a draft pull request to improve reliability, but it isn’t mature enough to be merged yet, and it has a huge performance impact (it takes the common approach of writing to a temporary file and then renaming it to the final destination) (see https://github.com/metabrainz/picard/pull/1545).
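For readers unfamiliar with the pattern, here is a minimal sketch of "write to a temporary file, then rename" in Python. It is not the code from the pull request; `save_via_temp_file`, the `write_new_contents` callback and the `.picard-` prefix are hypothetical names. The cost comes from the fact that the whole file is written again on every save, even when only tags changed.

```python
import os
import shutil
import tempfile
from typing import Callable


def save_via_temp_file(path: str, write_new_contents: Callable[[str], None]) -> None:
    """Write a new version of `path` beside it, then swap it in with a rename.

    `write_new_contents` is a hypothetical callback that receives the path of
    the temporary file and must write the complete new file there.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so os.replace() is a rename on one file system, not a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".picard-", suffix=".tmp")
    os.close(fd)
    try:
        write_new_contents(tmp_path)
        # Push the new data to disk before it takes the original's place.
        with open(tmp_path, "rb+") as f:
            os.fsync(f.fileno())
        # mkstemp creates files with owner-only permissions; keep the original's mode.
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)  # the original stays untouched if anything above fails
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the callback raises (or the machine dies mid-write), the original file is still intact; only the stray temporary file needs cleaning up.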

Do you get any error (in Picard logs and/or system logs) when corruption happens?
Is Picard the only application that reads/writes the affected files?
What format are the audio files?

```sh
ffmpeg -loglevel error -nostdin -i "file.flac" -vn -acodec pcm_s16le -f s16le - | rhash -p '%C' -C -
```
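The same pipeline could be reproduced inside a Python tool (Picard’s language) without rhash, by streaming ffmpeg’s raw PCM output through `zlib.crc32`. This is a sketch assuming `ffmpeg` is on the PATH; `pcm_crc32` and `crc32_of_stream` are hypothetical helper names, and the hex is printed in upper case only to ease eyeballing against rip logs (compare case-insensitively).

```python
import subprocess
import zlib
from typing import BinaryIO


def crc32_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """CRC32 of everything readable from `stream`, as 8 uppercase hex digits."""
    crc = 0
    while chunk := stream.read(chunk_size):
        crc = zlib.crc32(chunk, crc)  # running CRC, so memory use stays constant
    return f"{crc:08X}"


def pcm_crc32(path: str) -> str:
    """Decode `path` to raw 16-bit little-endian PCM with ffmpeg and CRC32 it."""
    cmd = [
        "ffmpeg", "-loglevel", "error", "-nostdin",
        "-i", path, "-vn",
        "-acodec", "pcm_s16le", "-f", "s16le", "-",
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        crc = crc32_of_stream(proc.stdout)
    # Popen's context manager waits for ffmpeg, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg failed on {path!r} (exit {proc.returncode})")
    return crc
```

Note that `pcm_s16le` reduces 24-bit sources to 16 bits, which matches CD rip logs; for hi-res files, ffmpeg’s `pcm_s24le` codec with `-f s24le` would keep the full resolution in the checksum.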

I’ve run this over thousands of files. For CDs I’ve ripped I check each saved file’s CRC against the XLD log file (as recorded in the CRC32 field - same goes for EAC logs). For downloaded audio I append the CRC to a text file before I save with Picard, then re-calculate the CRC and check it against the text file.
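The before/after check for downloaded audio can be sketched as two small functions. The manifest line format (`CRC32  path`) and the function names are hypothetical; `crc_of` is any function returning the PCM CRC32 of a file, for example one wrapping the ffmpeg/rhash pipeline. Matching is done on the CRC alone, because Picard may have renamed or moved the file in the meantime.

```python
from typing import Callable, Iterable, List, TextIO


def append_crcs(manifest: TextIO, paths: Iterable[str], crc_of: Callable[[str], str]) -> None:
    """Before saving: append one 'CRC32  path' line per file to the manifest."""
    for path in paths:
        manifest.write(f"{crc_of(path)}  {path}\n")


def unmatched_files(manifest: TextIO, saved_paths: Iterable[str],
                    crc_of: Callable[[str], str]) -> List[str]:
    """After saving: return saved files whose PCM CRC is not in the manifest."""
    known = {line.split(maxsplit=1)[0].upper() for line in manifest if line.strip()}
    return [path for path in saved_paths if crc_of(path).upper() not in known]
```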

No error; a green tick in the right pane of Picard. I haven’t checked Picard’s logs. Unfortunately I’ve restarted Picard since the corruption a few hours ago (because of another bug related to mouse/trackpad interaction - I can’t right-click with the trackpad after I turn off my Magic Mouse).
How do I turn on debugging? Currently Help->View Error/Debug Log brings up an empty window.

I’m not sure what you mean. Is Picard the only application corrupting files? Yes, I think so.

I mostly only write FLACs (including DoP in FLAC). I might save the occasional MP3 but that’s rare. I’d say I’ve only seen corruption in FLACs so far.

One good thing is I can re-enable AFP on my NAS since it’s not the culprit :slight_smile:

As with everything in the best-effort world, “reliable” doesn’t necessarily mean what you think it means.
The TCP checksum is very weak and lets a bunch of simple errors through. That is why the application should do its own thing to catch and fix errors (for network protocols, that is their own responsibility). That doesn’t necessarily mean the errors you’re seeing are due to network protocols.

For example, the payloads 0x00F0 0x000F and 0x000F 0x00F0 produce the same checksum, 0xFF00.
That literally means two 16-bit words of your payload could be swapped and TCP would still say your message is perfectly fine.
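The example can be checked with a few lines of Python implementing the RFC 1071 Internet checksum that TCP uses. For simplicity this sketch covers only the payload words, ignoring the TCP header and pseudo-header that the real checksum also includes. Because a ones’ complement sum is order-independent, any reordering of 16-bit words goes unnoticed.

```python
from typing import Sequence


def internet_checksum(words: Sequence[int]) -> int:
    """RFC 1071 checksum over 16-bit words: ones' complement of the ones' complement sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


print(hex(internet_checksum([0x00F0, 0x000F])))  # 0xff00
print(hex(internet_checksum([0x000F, 0x00F0])))  # 0xff00, so the swap is invisible
```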

Sounds like a reasonable strategy. I originally only intended to check the header payload, assuming Mutagen would preserve the audio data itself. I’m not sure shipping an embedded FFmpeg build would be the best solution, though.

Agreed, but software bugs aside, that altered segment would also need to pass 802.3/802.11 and IP error detection. Still not a zero chance, however. My experience has ruled out layers 1/2 and layer 5, which is why you’re right in saying Picard should do its own thing.
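To illustrate why the link layer is much harder to fool: Ethernet and WiFi frames carry a CRC-32 frame check sequence, and `zlib.crc32` uses the same CRC-32 polynomial (on-the-wire bit ordering aside). A 32-bit CRC detects every error burst of 32 bits or fewer, so the word swap from the TCP example is always caught. A minimal check:

```python
import zlib

# The same two 16-bit words as in the TCP example, in network byte order,
# once in each order.
original = bytes.fromhex("00F0000F")
swapped = bytes.fromhex("000F00F0")

crc_original = zlib.crc32(original)
crc_swapped = zlib.crc32(swapped)
print(f"{crc_original:08X} {crc_swapped:08X}")
print("swap detected:", crc_original != crc_swapped)  # swap detected: True
```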

I only use CRC32 because I was validating against the CD ripper’s log files. If Picard has access to the audio both before and after the move, then a better (faster?) checksum may suit. I’m fortunate in that I run my post-move CRC directly on my NAS, which is quick. For someone using a NAS over WiFi with a poor MCS index and validating huge audio files, re-reading the moved file will be a slow process.

That said, the last mangling under Picard truncated the file, so the sizes were dramatically different.
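If Picard computed the fingerprint itself, it would not be tied to CRC32. One option, sketched below under that assumption, is to record the byte count of the decoded PCM alongside a BLAKE2b digest from Python’s `hashlib`: a truncation like the one above shows up in the length alone, and the digest catches same-length changes far more robustly than a 32-bit CRC. Whether BLAKE2b is actually faster than CRC32 depends on the platform; `pcm_fingerprint` is a hypothetical name.

```python
import hashlib
from typing import BinaryIO, Tuple


def pcm_fingerprint(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Return (byte count, BLAKE2b hex digest) of a decoded PCM stream."""
    digest = hashlib.blake2b(digest_size=16)
    length = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        length += len(chunk)  # a truncated file differs here before any hashing matters
    return length, digest.hexdigest()
```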

What would be interesting is to have some users volunteer to turn on PCM validation, send the telemetry back to a server, and then correlate the findings. It may pinpoint where the issue is.

Certainly would.

I am/was looking into memory-backed file descriptors and how to pass them to Mutagen in order to make the changes in memory. If we can keep everything in memory, we could also check for Mutagen errors before writing anything (e.g. by checksumming the PCM). If everything passes the checks, we can then write the final version.
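A rough sketch of the in-memory step, using `io.BytesIO` in place of a true memory-backed file descriptor (a swapped-in, portable stand-in). Mutagen’s documentation describes its load and save functions as accepting file-like objects, so in principle the `edit` callback could hand the buffer straight to Mutagen; here `edit` and `audio_unchanged` are hypothetical callbacks standing in for tagging and for a PCM comparison. Nothing touches disk in this step.

```python
import io
from typing import Callable, Optional


def edit_in_memory(
    original: bytes,
    edit: Callable[[io.BytesIO], None],
    audio_unchanged: Callable[[bytes, bytes], bool],
) -> Optional[bytes]:
    """Apply `edit` to an in-memory copy; return the result only if the audio check passes."""
    buffer = io.BytesIO(original)
    edit(buffer)  # e.g. tag the file-like object in place
    edited = buffer.getvalue()
    # Refuse to hand back anything whose audio no longer matches the original.
    return edited if audio_unchanged(original, edited) else None
```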

The final version would ideally be written to a temporary file, which would get checked again to make sure it matches the memory-backed file. Finally, the original file would be removed and the temporary file renamed and moved (copying relevant metadata, permissions, etc.).
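The write-and-verify step could look roughly like this: write the in-memory result to a temporary file beside the original, read it back and compare, copy the original’s file metadata with `shutil.copystat`, and let `os.replace` remove and rename in a single atomic step. This is a sketch, not Picard code; `commit_verified` is a hypothetical name. Two caveats: the read-back is likely served from the OS page cache rather than the disk, and `copystat` also copies the old modification time, which may or may not be wanted after retagging.

```python
import os
import shutil
import tempfile


def commit_verified(path: str, new_contents: bytes) -> None:
    """Write `new_contents` to a temp file beside `path`, verify it, then replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Read the temp file back and make sure it matches the in-memory version.
        with open(tmp_path, "rb") as tmp:
            if tmp.read() != new_contents:
                raise OSError(f"verification failed for {tmp_path!r}")
        shutil.copystat(path, tmp_path)  # permission bits, access/modification times, flags
        os.replace(tmp_path, path)  # removes the original and renames in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```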

There’s no way to make it much faster than that without leaving more room for errors.

Planning to continue working on that when I get more free time.

From my personal experience, Mutagen definitely caused some of the issues, but my focus is on Picard. I’m still working on fixing stuff I broke and on skipping the saving of unchanged files (which is way more complicated than I originally thought, due to design choices and weird metadata fields).

Is that possible? Pink Floyd’s “Echoes” from a vinyl rip at 640kHz/24bit weighs in at just over 4 gigabytes (FLAC), with the entire album at over 8 GB.
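A quick size estimate shows what keeping such a file in memory would mean. The figures are assumptions: the rate is read as 640 kHz, the rip is taken to be 24-bit stereo, and the running time of “Echoes” is taken as about 23 min 31 s.

```python
# Assumed: 640 kHz, 24-bit, stereo, ~23 min 31 s running time.
sample_rate_hz = 640_000
bytes_per_sample = 3      # 24-bit samples
channels = 2
duration_s = 23 * 60 + 31

raw_bytes = sample_rate_hz * bytes_per_sample * channels * duration_s
print(f"{raw_bytes:,} bytes = {raw_bytes / 1e9:.2f} GB of raw PCM")
# 5,418,240,000 bytes = 5.42 GB of raw PCM
```

Under those assumptions, a FLAC of just over 4 GB would be roughly three quarters of the raw PCM size, and an in-memory approach would need several gigabytes of RAM for this one track.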

That’s why there is an if. :slight_smile:

3 Likes