How do I deduplicate?

I have music/albums in a variety of formats. Some are flacs, some mp3s and some are wav files.

I want to get rid of the duplicates / lower quality files. By lower quality I mean

  • higher bit rate mp3s are better than lower bit rates
  • flacs are preferred over mp3
  • wav files are preferred over 16 bit flacs
  • 24 bit flacs are the top

Any ideas?
Thanks

The first thing I would do is buy a new external drive and make an archive backup: something to tuck in a drawer in case something goes wrong with the dedup.

I would suggest not relying on Recording MBID alone to do the work. Sometimes you get the “same” recording in slightly different forms. An example would be a reissued live gig: slightly different timings at the end of a track may catch you out. It can be a guide, but I would not trust it fully unless both the Recording MBID and the Release MBID are the same.
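To illustrate that rule, here is a minimal Python sketch that only treats files as duplicate candidates when both MBIDs match. The dictionary keys (`path`, `recording_mbid`, `release_mbid`) and the sample values are made up for the example; reading the real tags from your files is not shown and would need a tagging library of your choice.

```python
from collections import defaultdict


def group_by_mbids(tracks):
    """Return groups of paths that share BOTH Recording MBID and Release MBID."""
    groups = defaultdict(list)
    for track in tracks:
        rec = track.get("recording_mbid")
        rel = track.get("release_mbid")
        # Files missing either ID are skipped rather than guessed at.
        if rec and rel:
            groups[(rec, rel)].append(track["path"])
    # Only keep keys that actually have more than one file.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical tag data, already extracted from the files.
tracks = [
    {"path": "a/01.flac", "recording_mbid": "rec-1", "release_mbid": "rel-1"},
    {"path": "b/01.mp3", "recording_mbid": "rec-1", "release_mbid": "rel-1"},
    {"path": "c/07.mp3", "recording_mbid": "rec-1", "release_mbid": "rel-2"},
    {"path": "d/02.wav"},  # untagged file
]

duplicates = group_by_mbids(tracks)
for key, paths in duplicates.items():
    print(key, paths)
```

Here `c/07.mp3` shares the Recording MBID but sits on a different release, so it is left out of the group, and the untagged file is ignored entirely.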

This tool may help as it is designed for this task: https://dupeguru.voltaicideas.net

I expect dupeGuru will work even better on a well-tagged collection. Personally I have not tried it due to the way my music is sourced.

I have split my own collection by encoding type: WAV and FLAC live in a separate folder tree from the MP3 files.

Another option is czkawka (GitHub: qarmin/czkawka), a multi-functional app to find duplicates, empty folders, similar images, etc. Version 6.0 added audio comparison by content. Here is an example screenshot where I have 3 tracks in FLAC, MP3 320, and MP3 128 with metadata removed.


It wouldn’t be automatic, but you can use the option “Select all but the biggest” to tick the boxes of all the smaller files for removal.
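File size will not always follow the preference order from the original question (24 bit FLAC, then WAV, then 16 bit FLAC, then MP3 by bit rate), so once a tool has given you a group of duplicates you may want to rank them yourself. A rough Python sketch, assuming you already have the format, bit depth and bit rate for each file; the field names `format`, `bit_depth` and `bitrate_kbps` are hypothetical, and nothing here deletes anything.

```python
def quality_key(track):
    """Sort key: a higher tuple means a more preferred file."""
    fmt = track["format"].lower()
    bits = track.get("bit_depth") or 0
    kbps = track.get("bitrate_kbps") or 0
    if fmt == "flac" and bits >= 24:
        return (3, 0)      # 24 bit FLAC is the top
    if fmt == "wav":
        return (2, 0)      # WAV beats 16 bit FLAC
    if fmt == "flac":
        return (1, 0)      # any FLAC beats MP3
    if fmt == "mp3":
        return (0, kbps)   # higher bit rate MP3 wins
    return (-1, 0)         # unknown formats rank last


def pick_best(group):
    """Split one duplicate group into the file to keep and the rest."""
    best = max(group, key=quality_key)
    rest = [t for t in group if t is not best]
    return best, rest


# Hypothetical duplicate group, e.g. copied out of a tool's result list.
group = [
    {"path": "x/track_320.mp3", "format": "mp3", "bitrate_kbps": 320},
    {"path": "x/track.flac", "format": "flac", "bit_depth": 16},
    {"path": "x/track_128.mp3", "format": "mp3", "bitrate_kbps": 128},
]

best, to_remove = pick_best(group)
print("keep:", best["path"])
for t in to_remove:
    print("candidate for removal:", t["path"])
```

It only prints a list, so you can check the choices by hand before removing anything.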

Thanks…You don’t mention it explicitly, but I think your assumption is a tagged set of files…

Not really, as tags may not be reliable. Read the dupeGuru page and you will see it works on the audio of a file. Using tags is only an option.

If all you have are audio files without tags, then dupeGuru will “listen” to them to spot the dups, and it allows you to sort based on bitrate (see the FAQ). It has similar algorithms for images too.

I was just guessing that as you are on MB your files may well be pretty well tagged. :grinning:

Both of the above apps have test modes you can run on a batch of files. See how they behave.

I’ve mainly used the image mode before to help someone out with a chaotic photo collection full of dups. It worked well once I got my head around filtering the lists it makes.

Earlier discussion:

Thanks. Do you think Picard is better at the audio than dupeGuru?

I’d say it’s more of a work in progress. Some are good, some I’m revisiting!

Thanks. I will have a look

Thanks for the link!

Not really relevant when you want to deduplicate. I’d trust the tool designed for the job. Picard does tagging; dupeGuru looks for duplicates.

As with anything, grab a few folders of test data and try the tools out. Working with data where you know the expected outcome means you can see how good they are. dupeGuru has a pretty geeky GUI, but it is powerful once you get your head around it, especially as you can mess around with it, just looking at files without changing anything, until you are sure.

But Picard can show you files with the same fingerprint.

Mono and stereo, dynamic and over-compressed versions, usually have the same fingerprint (AcoustID).

Then you can choose your preference (like dynamic stereo) and remove the others.

I don’t know how handy it is though, I never needed to do this.

Add finding similar audio files by content - #970

The feature was added 7 months after that post. If you check my screenshot you’ll see I stripped all metadata to test it.

You can also see in the screenshot a dropdown set to “content”; the other option in it is “tags”.
