Extremely large music collection needs advice on what dedupe program to use

There are some earlier discussions here about using Picard to identify duplicates.

I’m not sure if Picard will be able to load your whole collection into memory, though, and I suspect that looking up all the songs may take a long time if they aren’t already tagged with MBIDs. How large is “extremely large”?

If you’re looking for acoustic similarity, I wrote a program named soundalike that uses AcoustID’s Chromaprint to scan a music collection. I described it a bit at “Soundalike: a program for finding duplicate recordings in a music collection”.

I run it periodically on a slow computer against a collection that’s currently just over 23,000 songs. The initial scan took about an hour but incremental scans are pretty fast.
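
To give a rough idea of what comparing Chromaprint fingerprints can look like, here's a minimal Python sketch. It is not soundalike's actual code, just an illustration. It assumes you already have raw fingerprints as lists of 32-bit integers (the kind `fpcalc -raw` prints), and the function names and the 0.15 threshold are made up for the example. Real tools also have to deal with tracks that start at different offsets, which this ignores.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping prefix of two fingerprints.

    Each fingerprint is a list of 32-bit integers. Hypothetical helper,
    not part of Chromaprint or soundalike.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("fingerprints must be non-empty")
    # XOR marks the bits that differ; mask to 32 bits so signed values work too.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return differing / (32 * n)


def looks_like_duplicate(fp_a, fp_b, threshold=0.15):
    """Treat two recordings as likely duplicates if few bits differ.

    The threshold is an assumed value for illustration; tune it on your
    own collection.
    """
    return bit_error_rate(fp_a, fp_b) <= threshold
```

Something like `looks_like_duplicate(fp_song1, fp_song2)` would then flag pairs worth listening to by hand before deleting anything.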

If you give it a try, let me know if it works for you (or doesn’t). 🙂