There are some earlier discussions here about using Picard to identify duplicates:
- How can I remove all of my duplicate music
- Remove duplicate files with Picard
- Is there an automated way to dedupe in picard?
I’m not sure if Picard will be able to load your whole collection into memory, though, and I suspect that looking up all the songs may take a long time if they aren’t already tagged with MBIDs. How large is “extremely large”?
If you’re looking for acoustic similarity, I wrote a program named soundalike that uses AcoustID’s Chromaprint library to scan a music collection for recordings that sound alike. I described it a bit in “Soundalike: a program for finding duplicate recordings in a music collection”.
I run it periodically on a slow computer against a collection that’s currently just over 23,000 songs. The initial scan took about an hour, but incremental scans are pretty fast.
If you give it a try, let me know if it works for you (or doesn’t).
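In case it helps to see what “acoustic similarity” means in practice, here’s a rough sketch of one common way to compare two raw Chromaprint fingerprints: treat each as a list of 32-bit integers (the kind of list `fpcalc -raw` prints) and count how many bits differ. This isn’t necessarily how soundalike does it. The function names and the 0.15 threshold are made up for illustration, and a real tool would probably also try sliding one fingerprint against the other to handle different amounts of leading silence.

```python
def bit_error_rate(fp_a, fp_b):
    """Return the fraction of differing bits between two raw fingerprints.

    Each fingerprint is a sequence of 32-bit integers. Only the overlapping
    prefix is compared; no alignment search is attempted.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("fingerprints must not be empty")
    # XOR leaves a 1 wherever the two values disagree; mask to 32 bits so
    # signed values are handled the same way as unsigned ones.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return differing / (32 * n)


def looks_like_duplicate(fp_a, fp_b, threshold=0.15):
    """Hypothetical helper: the threshold is an assumption, not a tuned value."""
    return bit_error_rate(fp_a, fp_b) <= threshold
```

Identical fingerprints give a rate of 0.0 and unrelated audio should land somewhere near 0.5, so whatever cutoff you pick would need tuning against your own collection.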