File renaming question: large number of files

wizzlepig · March 15, 2019, 9:29pm

So, I am dealing with mp3 files recovered from a mistakenly formatted hard drive, about 30,000 of them. They’re all named “RecoveredFile[number].mp3” and are all in one big folder (facepalm).

I did a test run initially with a few files to see if I could get Picard to fix them up for me, and it looked very good, so I set it loose on all 30k. It was going very well! About… 12 hours later, it was nearly done, and I went to bed. It seems that it got through all 30k and then crashed, losing all the info it had compiled on all the tracks.

Is there any kind of, say, ‘background processing’ which would just do something like:

here is a folder full of files I would like fixed-

find this file? Yes.
OK, repair and save this file.
[repeat]

Granted, I was using an old version of Picard, 1.3.2, so I will definitely update before I try again, but I wanted to make sure I was doing it the right way. I assume I could grab smaller bunches of files, but with 30,000 to process, well, I’d rather not if I don’t have to.

Thank you in advance for any help!

Llama_lover · March 15, 2019, 11:26pm

Welcome @wizzlepig!

You are not the first to try processing HUGE amounts of new files and have Picard crash.
As for why: Picard does struggle with very large numbers of files. As a rule, I suggest taking a little extra time and submitting just a couple of hundred first to see what happens. Don’t forget to save the results. Then try a thousand or so, and after that no more than 4 or 5 thousand at a time. Once the files have been processed once, they will go pretty fast if you run them again.
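The staged approach above (a couple of hundred, then about a thousand, then steady batches of 4 or 5 thousand) is easy to script if you are preparing the batches yourself. A minimal Python sketch; the function name `staged_batches` and the default sizes are illustrative assumptions based on the numbers in this post, not anything Picard provides:

```python
def staged_batches(items, sizes=(200, 1000), steady=5000):
    """Yield consecutive slices of `items` (any list/sequence).

    First one slice for each size in `sizes` (small trial runs), then
    slices of `steady` items until everything has been handed out.
    The default sizes are hypothetical, taken from the advice above.
    """
    i = 0
    for size in sizes:
        if i >= len(items):
            return
        yield items[i:i + size]
        i += size
    while i < len(items):
        yield items[i:i + steady]
        i += steady
```

For example, `staged_batches(all_paths)` would yield 200 paths, then 1,000, then 5,000 at a time until the list runs out.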

Picard does its best to correctly match up whatever you throw at it. Keep in mind that some of your files may not be in the database, and there could be a few multiple matches that you will have to choose between. Also, when processing large batches, Picard may appear to have stopped when it is just in “deep thought”.

Others in the forum can give a little more depth in answering your question but give the above a try.

wizzlepig · March 16, 2019, 1:48am

OK! That sounds like a good plan, thank you for helping, I will give it a go.

IvanDobsky · March 16, 2019, 2:04pm

It is also worth tweaking the OPTIONS \ METADATA \ PREFERRED RELEASES

Especially the “compilation” option.

Do you know much about the music that was on the hard drive? Was it well organised albums from single artists? Or lots of compilation albums?

As @Llama_lover points out, the nature of AcoustID matching means a popular track will match many different compilation discs as well as the original artist’s albums. So a match will likely name the track and artist correctly, but may put it on a differently named album than the one you expected.

Picard and MB are certainly a saviour after a hard disk failure.

And another speed tip for your first run through: turn off ALL artwork options. In OPTIONS \ COVER ART, untick the list of providers. Leave that for later runs.

Some albums have VERY high resolution artwork, and when you have thousands of tracks all trying to grab different artwork it adds a large load to your task.

As you are running Picard here to identify tracks, skipping the artwork on your initial runs will give you a MASSIVE speed boost.

Then, once everything has proper MBIDs and is well tagged, Picard will run a lot faster. That is when you come back to add the artwork in batches.

wizzlepig · March 16, 2019, 5:35pm

It’s my housemate’s backup drive, which turned out not to be a backup but the only place he was storing his stuff. I never saw the drive contents before they got poofed; I suspect they were probably in folders, though, and not a lot of compilations. Just getting a good portion of it back in a usable state is going to be a real relief.

Thank you for the cover art tip, I will do that for sure!

IvanDobsky · March 16, 2019, 10:33pm

Ah - “backup drives”. Yeah, I’ve seen far too many people do that.

Your first tasks will be pure identification.

Also start by backing it all up…

Another thing to look at is MP3TAG.de. That tagger can read the file TAGS and use them to directly change the filenames. So if he had decent tags, that may well help sort out a big heap of files before you need to go to AcoustID.

It may be a quicker starting point than going full Picard. (And yes, Picard will do the renames as well, but it is also trying to do all the lookups and database work, so that should be a later step.)

outsidecontext · March 17, 2019, 6:26am

Just pointing out that Picard can do that as well. Just skip Scan or Lookup and save directly from the left pane. Picard will use the configured naming script and the existing tags to organize the files.
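Saving directly from existing tags boils down to turning each file’s tags into a folder path. The sketch below shows that idea in plain Python; it is not Picard’s naming-script syntax or MP3TAG’s format strings, and the helper name `path_from_tags` and the tag keys (`albumartist`, `artist`, `album`, `tracknumber`, `title`) are assumptions. It also assumes the tags have already been read into a dict by whatever tagger or library you use.

```python
import re

# Characters that Windows does not allow in file or folder names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def clean(part: str) -> str:
    """Replace illegal characters, trim whitespace and trailing dots.

    Empty results fall back to the placeholder "Unknown".
    """
    return _ILLEGAL.sub("_", part).strip().rstrip(".") or "Unknown"


def path_from_tags(tags: dict, ext: str = ".mp3") -> str:
    """Build 'Artist/Album/NN Title.ext' from a dict of already-read tags.

    Hypothetical helper: the tag keys used here are assumptions, and
    missing values fall back to "Unknown".
    """
    artist = clean(tags.get("albumartist") or tags.get("artist") or "")
    album = clean(tags.get("album") or "")
    title = clean(tags.get("title") or "")
    # Track numbers are often stored as "3/12"; keep only the first part.
    track = (tags.get("tracknumber") or "").split("/")[0]
    prefix = f"{int(track):02d} " if track.isdigit() else ""
    return f"{artist}/{album}/{prefix}{title}{ext}"
```

The fallback to "Unknown" means badly tagged files still get a path, so they end up grouped together where they are easy to find later.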

But regardless of whether you do this using MP3Tag or Picard, if the existing tags are somewhat decent this might be a valuable first step. It will organize the files somewhat already, making it easier for you to work in batches. Then you can, e.g., select all files for a specific artist, ensuring that all files for an entire album are loaded at once. This will make Picard’s Cluster and Lookup much more accurate. If you just select, e.g., 1000 random songs, you risk that the selection contains incomplete albums.
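To build batches so that every album lands in a single batch, you can group files on artist and album before chunking. A rough Python sketch, assuming you already have `(path, artist, album)` tuples read from the existing tags (as strings); the name `album_batches` and the 1,000-file default are hypothetical:

```python
from collections import defaultdict


def album_batches(files, max_batch=1000):
    """Pack (path, artist, album) tuples into batches of at most max_batch
    files without splitting an album across batches. An album that alone
    exceeds max_batch gets a batch of its own.
    """
    albums = defaultdict(list)
    for path, artist, album in files:
        albums[(artist, album)].append(path)

    batches, current = [], []
    # Sorting the (artist, album) keys keeps each artist's albums adjacent.
    for key in sorted(albums):
        tracks = albums[key]
        # Start a new batch if this whole album would not fit.
        if current and len(current) + len(tracks) > max_batch:
            batches.append(current)
            current = []
        current.extend(tracks)
    if current:
        batches.append(current)
    return batches
```

Batches come out a little under the limit rather than exactly at it, which is the trade-off for never cutting an album in half.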

Also, if you have tags, definitely try the Cluster + Lookup approach before using Scan (aka AcoustID lookup or audio fingerprinting). If the tags are not too bad, Cluster + Lookup usually gives better results; keep Scan for the cases where you have trouble identifying the files.

IvanDobsky · March 17, 2019, 11:27am

Interesting… never realised I could edit without the need to lookup\scan. Sooooo many different ways through Picard

You’ve read my idea right: just use MP3TAG to rename the files from those tags, and then pass them to Picard for the more intelligent and deeper lookups against the database.

wizzlepig · March 18, 2019, 3:55pm

Ah! Thanks again for all the help, everyone!

I added MP3TAG into the mix, and it’s really speeding things up. I have broken the recovered files (410,000 files after I ran a new scan with a more recent version of the file recovery software) down into folders of 5,000 files each, and while Picard repairs one folder, I have MP3TAG renaming the files in the upcoming folders.

One thing I noticed, though: for some reason, after working fine on the first 1/4 of the files I have processed (about 130GB), ‘cluster’ just stopped working for me, even though the files have been renamed with MP3TAG. I am going to reboot and see if that helps. ‘scan’ is still working, but it’s a much slower process.

IvanDobsky · March 18, 2019, 8:34pm

Excellent news

Little tweaks: Cluster groups files by common tags, so it should group files that share those tags in the local files. Maybe it “stopped working” because those files were not as well tagged?

It is quick as it is only looking in the local tags.
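Why poorly tagged files don’t cluster is easiest to see with a toy version. This Python sketch groups files purely on local artist/album tags with exact, case-insensitive matching; it is a simplified assumption for illustration, not Picard’s actual clustering algorithm, which is more involved. Files with no album tag have nothing to group on and fall out as unclustered. The function name and the `(path, tags)` input shape are hypothetical.

```python
from collections import defaultdict


def cluster_by_tags(files):
    """Group files on (artist, album) taken from local tags only.

    files: iterable of (path, tags) pairs, where tags is a dict.
    Returns (clusters, unclustered). Nothing is looked up online.
    """
    clusters = defaultdict(list)
    unclustered = []
    for path, tags in files:
        album = (tags.get("album") or "").strip().lower()
        artist = (tags.get("albumartist") or tags.get("artist") or "").strip().lower()
        if not album:
            # No album tag means there is nothing to group this file on.
            unclustered.append(path)
            continue
        clusters[(artist, album)].append(path)
    return dict(clusters), unclustered
```

Because it only reads tags already on disk, this kind of grouping is fast, which matches why Cluster is quick compared with Scan.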

Scan is going to do a lot more work, as it calculates the AcoustID fingerprint and then chugs off to the Interwebs to look that fingerprint up in the AcoustID database, which points back to recordings in the MB database online.

Lookup is similar, but it just uses the existing tags, without calculating a fingerprint, before chugging off to the Interweb to find them in the database.

The documentation is a bit basic for such a powerful tool, but worth looking at this page again as it walks through what the buttons do: https://picard.musicbrainz.org/quick-start/

Ya man @outsidecontext will fill in any details as he has the best view of the clever tricks of Picard. He is THE main guy maintaining it and gave us lots of nice new features for v2.x. And no doubt he’ll give you much more detail on Scan\Lookup than my basic details.

wizzlepig · March 21, 2019, 7:21am

Thanks Everyone!

[whew] … well, it’s done. About 210,000 recovered MP3 files (700GB) processed down into 85k ‘good’ files (465GB), all stored in artist and album folder structures. There are also 70k zero-length files, I’m still working out the total number of dupe files, there are 24k files with bad tags that are still named “lost file xxxx…”, and various other kinks to iron out, but I figure I am in the last 5% of the work I will have to do. I’m going to hand it back to the owner shortly and let him wade through the tedious bits I can’t automate.
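For the zero-length and duplicate cleanup, a small script can do a first pass. A Python sketch, assuming “duplicate” means byte-for-byte identical files; copies of the same song with different tags or encodings will not be caught this way. Bucketing by size first means only files that share a size ever get hashed. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(folder):
    """Return (zero_length, duplicate_groups) for all files under folder.

    Files are bucketed by size first (cheap); only sizes shared by more
    than one file are hashed, so files with a unique size are never read.
    """
    zero_length = []
    by_size = defaultdict(list)
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size == 0:
                zero_length.append(path)
            else:
                by_size[size].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha1_of(path)].append(path)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(zero_length), duplicates
```

The script only reports; deciding which copy of each duplicate group to keep is left to you.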

I suspect I spent 20-25 hours on this, and if I hadn’t already had some technical skills and tools I knew how to use, I can’t even imagine how it would have gone. And drive space: there’s 5TB on my main machine, with about 1.5TB free, but spread over partitions. I had to juggle all kinds of stuff around to fit the original 700GB into one spot and still have 15GB or so free. Jeez.

Since Windows is a total weenie when it comes to large numbers of files in one folder (once the count gets above 20,000 or so, File Explorer likes to lock up when you try to do anything, like selecting 3,000 files and moving them), and the recovery app dumped something like 300,000 files (MP3s plus some other files), I wrote a script to move the files into smaller groups in their own folders. This Ruby code generates destination folders and then moves files in bunches. I’m linking it here in case anyone else needs it; you will need to tweak the file paths and the number of folders/files moved for your situation: https://github.com/wvw999/file_mover/blob/master/mover.rb
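The linked script is Ruby; for anyone who would rather use Python, here is a separate sketch of the same idea (not a port of mover.rb): list the folder once with `os.scandir`, then move the files into numbered subfolders of a fixed size. The `batch_NNNN` folder naming, the function name, and the `.mp3` filter are assumptions; adjust them for your own layout.

```python
import os
import shutil


def move_into_batches(src, dest, per_folder=5000, ext=".mp3"):
    """Move files ending in `ext` from src into dest/batch_0001,
    dest/batch_0002, ... with at most per_folder files in each.

    Returns the list of batch folders created. Other files in src,
    and any subfolders, are left where they are.
    """
    # One directory listing up front; scandir avoids a separate stat per file.
    with os.scandir(src) as it:
        names = sorted(e.name for e in it
                       if e.is_file() and e.name.lower().endswith(ext))

    folders = []
    for start in range(0, len(names), per_folder):
        folder = os.path.join(dest, f"batch_{start // per_folder + 1:04d}")
        os.makedirs(folder, exist_ok=True)
        for name in names[start:start + per_folder]:
            shutil.move(os.path.join(src, name), os.path.join(folder, name))
        folders.append(folder)
    return folders
```

For example, `move_into_batches(r"D:\recovered", r"D:\batches", per_folder=5000)` (paths hypothetical) would create `batch_0001`, `batch_0002`, and so on. The sort is plain alphabetical, so `RecoveredFile10.mp3` comes before `RecoveredFile2.mp3`; that doesn’t matter for splitting, but use a natural sort if the order matters to you.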

Ultimately, I ended up putting blocks of the files onto my SSD and writing the output to a separate SSD. That sped things up fantastically: I was processing folders of 4,500 to 5,000 files in 3-4 minutes instead of 15-20 minutes.

Picard totally saved me, and with MP3TAG, this project was a viable option instead of a fool’s errand.

Llama_lover · March 21, 2019, 9:12am

Well done @wizzlepig. Thanks for the github link. Enjoy your tunes and contribute when you can!