Very large organizing project that exceeds MusicBrainz capabilities

bsacco2 · September 21, 2022, 5:35pm

I’ve been trying to use MusicBrainz for the past 6 months to organize a large collection of MP3s I have to no avail.

I’m not sure if I’m doing it correctly.

Is there a tutorial for a standard procedure I can check my settings against?

Please advise.

rdswift · September 21, 2022, 5:54pm

You might find something in the Picard User Guide, specifically the section on Work Flow Recommendations.

Note the comment in the Introduction section that, “[Picard] is not intended to automatically organize your collection of thousands of random music files, and if this is what you are hoping for then you will likely be disappointed.”

yvanzo · September 21, 2022, 5:55pm

Advice: Do not load too many files at once into MusicBrainz Picard. I never loaded more than a few hundred at once even though it can probably handle a few thousand.

Two documentations otherwise:

bsacco2 · September 21, 2022, 6:12pm

My inability to get this program to work goes beyond loading too many files. The color-coding of files processed makes no sense to me and I can’t tell if all the files processed have been cleaned. Also, there seems to be no standard easy way to set up a simple naming convention for artist, name of song, track # etc… it appears you have to have a degree in coding in order to set that up. Also, the ORDER in which you use the software (User Interface) something as simple as Step 1, Step 2, Step 3 seems to elude the developers of this software for some reason. Why can’t they just number the sequence instead of leaving it vague and confusing…cluster, lookup, scan, etc… OMG. The simple evades software developers.

rhetticent · September 21, 2022, 6:42pm

Welcome back, @bsacco2

It looks like you’ve been here a few times in the past: Newbie, needs some advice

To rehash the previous responses, this is free software, made by passionate people, but you are not their employer. Please refrain from disparaging remarks that essentially equate to you spitting on their work.

rdswift · September 21, 2022, 6:43pm

Perhaps the Status Icons section will help clarify, although I’m not sure I understand what you mean by “have been cleaned”.

There is a Tutorial explaining step-by-step how to set up your file naming script. Picard’s scripting functionality is very powerful, accommodating users to develop complicated tagging and file naming scripts; however, it is also easy to use for simple file naming conventions. There are even some simple built-in file naming scripts that can be used as-is or form the basis of a more personalized script.

If you look at the workflows that I referenced in my earlier response, you will see that they are shown step-by-step (numbered even), to make the process easy to follow.

aerozol · September 21, 2022, 8:13pm

No need to be rude, but FYI you are not alone. I estimate a huge percent of people bounce off Picard because they don’t know where to start, or they use it wrong (scan…) and get bad results.

See related ticket: https://tickets.metabrainz.org/browse/PICARD-2315

Now that you are here though, and have put in some time and effort, reading a start here guide or watching a ten min how-to really is all it takes.

bsacco2 · September 21, 2022, 8:55pm

thank you rdswift. I appreciate it.

My problem with MB is a global one.

I have a huge collection of music files that are a mess.

The “global” questions I have are about how I even begin to tackle the problem. That is:

Do I need to de-dupe my files before I even begin? (reduce the work MB has to do)
Should I do a global search across all the drives on my PC to FIND all my individual MP3 files, then drag them ALL into ONE big folder so that MB can analyze them and build the NEW organized folder structure (Artist/LP) in ONE PLACE?
I’m searching for a dedupe application but I can’t seem to find a good one. Ones I found are Duplicate Cleaner 5, DupInOut, Easy Duplicate Finder and Cisdem. But I can’t find any decent reviews on any of these products before I pull the trigger on paying for it.
Any tips from the forum would be appreciated, as I have been attempting to clean my files now for over 6 months without any progress.
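
For the de-dupe question above, a minimal sketch of one possible first pass, assuming "duplicate" means byte-for-byte identical files. The function names `file_digest` and `find_exact_duplicates` are hypothetical and are not part of Picard or any of the tools named above. Two copies of the same song with different tags or bit rates will not be caught this way, and the sketch only reports groups; it deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, pattern: str = "*.mp3") -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    Assumption: only exact copies count as duplicates here.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("D:/Music"))` (a placeholder path) would return a list of groups to review by hand. Note that the `*.mp3` pattern may be case-sensitive on some systems, so files named `.MP3` could be missed.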

bsacco2 · September 21, 2022, 9:14pm

I know there used to be Tidy Music and Tuneup 3.0 but both of those companies went out of biz. I suspect they were bought out by the Music Industry and forced to sign non-disclosures and non-competes in order to steer the entire industry toward streaming.

bsacco2 · September 21, 2022, 9:34pm

[yvanzo] - Can you recommend a tool where I can submit bulk music files for tagging and organization?

bsacco2 · September 21, 2022, 9:39pm

Perhaps, MB is not the tool I need. It appears MB is useful only in small amounts (200 files or less)

Even if I used MB, I still can’t seem to figure out how to use it across a large amount of random semi-tagged music files and maintain a MASTER collection of organized LP /Artist folders even if I had the patience to do so.

bsacco2 · September 21, 2022, 9:48pm

[aerozol] - Thank you. I do like the suggestions seen at: related ticket: [PICARD-2315] Basic Mode/Display - MetaBrainz JIRA

I do think the problem with MB is that it fails to set expectations on what it can do from the start.

MB should have a tag line that describes the benefit of using the tool and sets the User’s expectations.

For Instance, MusicBrainz - “Organizing music collection - 200 files at a time”

This way, it’s clear what the tool does. Also, limiting the tool to only process 200 files at a time would also avoid User confusion and fulfill the promise of the software.

Until MB does this, you are going to have a UI problem and a product disconnection with consumers.

FYI, I am a CMO with over 25 years experience.

bsacco2 · September 21, 2022, 9:55pm

Does anyone know if there is a conflict between DBAmp ripped files and MB?

I noticed that MB has verified support for log files generated by EAC but not DBAmp

When the ripper log file is available

This option was added to Picard in version 2.8, and supports the use of log files produced by the popular CD file rippers Exact Audio Copy (EAC) for Windows, X Lossless Decoder (XLD) for macOS, and Whipper for Linux. Because the log files of these rippers contain sufficient information to generate the CD table of contents they can be used in place of reading the CD. As with reading the CD itself, this method provides the greatest chance of tagging your music files with the most accurate match from the MusicBrainz database. It is also one of the easier methods for looking up the release.

IvanDobsky · September 21, 2022, 9:57pm

The main problem with throwing 1000+ files into Picard in one go is how do you spot the mistakes and bad matches? If Picard was 90% good at matching, then you’ll have 100 files that were wrong.

The reason we suggest working in smaller batches is so you the human can check the results. Work in small batches at the start and you’ll soon speed up. Throw in 1000+ files and you will never be able to spot the errors. Throw in two or three albums and you will be able to check the matches.
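
A rough sketch of the small-batch idea, assuming the files sit under one root folder. The helper name `batches` and the default size of 200 are illustrative only; Picard itself imposes no such limit. Sorting by path keeps files from the same folder next to each other, although a batch boundary can still split an album.

```python
from pathlib import Path
from typing import Iterator


def batches(root: Path, size: int = 200, pattern: str = "*.mp3") -> Iterator[list[Path]]:
    """Yield the files under root in path order, at most `size` files at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    for start in range(0, len(files), size):
        # Each slice is one batch to load, check by eye, and save.
        yield files[start:start + size]
```

Each yielded list could then be dragged into Picard (or copied to a staging folder) and checked before moving on to the next one.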

Trouble is that Picard will be too fussy for a general match that you are looking for. There are 120 copies of Dark Side of the Moon listed in the database, but I am guessing you just want your files tagged as the original 1973 LP. Matching a “deluxe” album with extra tracks often needs checking to make sure it is matched to the correct album.

This is where a “different” Picard could help cases like yours where users just want to clean up a heap of files. Sometimes the database can cause confusion by the large number of choices available. This is why you need to tweak Picard to bias your country of choice, or your album choices.

bsacco2 · September 21, 2022, 11:49pm

I was reading through the FAQ at MusicBrainz.org / communities and under PRODUCTS I found AudioRanger.

Now, AudioRanger looks promising in that I can throw a bunch of files at it and it actually de-dupes at the same time which is something MB does not do. Do you have any experience with AudioRanger? Would this be a good tool for bulk cleaning + deduping? Or do you recommend another product?

yvanzo · September 22, 2022, 8:13am

It is difficult to make a relevant answer without a precise description of your needs. How many files do you have to process? How often do you have to do so? Which tags are important to you? For which usage in the end?

For example, you may not care about distinguishing between two reissues of the same album (which is mostly pertinent when matching physical media), you may not care about the composers of the works interpreted in these recordings (which is mostly pertinent when organizing classical music), and so on. There are a lot of different use cases covered by MusicBrainz Picard.

As a CMO please note that MB or MusicBrainz is referring to the whole project that features both the tagger Picard and the collaborative database. The latter can be used by other taggers such as the affiliate taggers that you did find under the Products menu.

FYI, I’m a user not a contributor of the tagger Picard; I’m a user, contributor and developer of the database.

For your particular issue about duplicate files, I confirm that Picard is not able to handle these at the moment; See the ticket PICARD-311.

aerozol · September 22, 2022, 10:33am

I’ve never seen the issue with duplicates tbh (though I’ve never had to work with them so I may be underestimating the problem, it comes up often enough).

Just run everything through Picard, including dupes, and then search your resulting music folder for ‘(1)’. Delete the results - or check which one’s better and then delete one, if quality varies.
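
The search for ‘(1)’ could be sketched like this, assuming the duplicates end up saved as `name (1).ext` next to `name.ext`, as described above. The function `numbered_copies` is a hypothetical helper, not a Picard feature. It only lists candidates, so a track whose real title happens to end in “(1)” shows up with no sibling and can be left alone.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a file stem such as "01 - Song (1)" and captures "01 - Song".
NUMBERED_COPY = re.compile(r"^(?P<base>.+) \((?P<n>\d+)\)$")


def numbered_copies(root: Path) -> list[tuple[Path, Optional[Path]]]:
    """Pair each 'name (N).ext' file with its 'name.ext' sibling, or None if absent."""
    pairs: list[tuple[Path, Optional[Path]]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        match = NUMBERED_COPY.match(path.stem)
        if match:
            original = path.with_name(match["base"] + path.suffix)
            pairs.append((path, original if original.is_file() else None))
    return pairs
```

Comparing `path.stat().st_size` for each pair is one crude way to judge which copy is the better one before deleting anything by hand.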

bsacco2 · September 22, 2022, 8:01pm

I just bought AudioRanger which I think used to be MagicTagger. I think my application is to use it as a first screen to weed out duplicates while tagging at the same time… (kill two birds with one stone). Then after, I can run it through MB for all the precise adjustments. AudioRanger tags pretty fast and gets the process started.

outsidecontext · September 23, 2022, 10:15am

There is no exact amount or rule here. It all depends on how well one knows Picard, what the expectations are for tagging and how the music collection is currently organized. E.g. if I had a collection of files that are generally already tagged and are full album rips, I might load even a few hundred albums (equating to a few thousand files) into Picard. Cluster + lookup usually works well on these types of files, and my expectation would be to tag the entire albums. I can then rather quickly look through the matched files on the right and save those where I’m confident enough they are tagged correctly. After saving, these can be removed from Picard. The left-overs I’ll need to look at closely, maybe doing some research to find the proper album.

But if I on the other hand have just a bunch of random files in a folder, with bad or even non-existent tags, I’ll probably load only a few dozen of them at once and try the other tools Picard has to offer (e.g. Scan aka audio fingerprinting, Tags from Filenames or manual searching).

In general, if you use software that can manipulate thousands of your files, it is always wise to first try it on a few files, see how it works and check if it does what you want or how it can be configured to do what you want. Using the software on all your files first and only afterwards checking if it actually did what was expected is not a good idea. That’s the core idea behind the recommendation of trying with a few files at a time. The other is that Picard is meant to be used interactively; the user is expected to check the results.

No, that’s not required. The default already works for many users (structuring by artist and album), and there are a few presets to select from. We also have a tutorial at Writing a File Naming Script — MusicBrainz Picard v2.10 documentation

Picard is kind of a toolbox, offering a set of different tools to tag and rename audio files. As I outlined above there is not one “right” way to use it, it depends.

But there is an existing ticket for a basic mode, see PICARD-2315 - Basic Mode/Display with some ideas. But it needs someone to actually do the changes.

Please keep in mind that there is not a huge development team. There are only a handful of people at best trying their best to make Picard better. And there is a lot to do. Picard is a community project, and like most open source projects could always use more contributors. So please if a software does not work the way you want it to work don’t just assume that the people providing you the software are just too stupid, ignorant or lazy. Thanks.

There is no conflict, there is just no support for dBpoweramp log files implemented, and so far nobody asked for this. I took a look, and yes, the log files generated by dBpoweramp could be used. I added a feature request for this and started implementing it:

dpr · September 23, 2022, 11:04am

Welcome back

I’ve been on a similar journey. What I’ve found is that it is helpful / important to define what you are trying to achieve:

What music player will you plan on using to browse and play your music? and what metadata will it use? There’s no point in spending time on meta data that is ignored. For example, if your music player only looks at MP3 metadata, filenames don’t have to be perfect.
What does organised mean? de-duplication of music? a nice file hierarchy that you can browse easily? Does your player use the hierarchy or is this an organisation approach?
what defines a duplicate track? Same track from the same album? What about higher bit rate or the same song from the greatest hits release?
what about artwork? - do you want just the front cover or all of it?
how will you know you are done?

At a very high level:

musicbrainz is a database of data / meta data. It is focused on the release.
Picard is a program that helps you identify music and write metadata into music files - including mp3s, but also other formats.
it has all kinds of scripts to do stuff like rename folders and files.
All useful tools, but you have to know what you are trying to achieve and what is ‘good enough’.

As others have hinted, do it incrementally.

I have two folder hierarchies:

My archive - which holds copies of the source CD rips and mp3s I have bought
The format for mp3s is: archive/artist/release year - album title/ track number - track title
If the release is a multi cd, then track number has disc number-track number.
(The reason for this folder is that I want a copy of my sources)
My music folder - which holds the music I want to listen to, tagged in a way I ‘like’/want:

Beatles CDs were released in the 2000s. etc. They are versions of the vinyl records from the 60s. Musicbrainz database will contain entries for both, but they are 60s music to me, so that is how I have tagged them for me, even though the releases in the archive folder have the release date in the 2000s etc.
The artwork is small for my iphone and car
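
The archive layout described above (artist / release year - album title / track number - track title, with disc number-track number for multi-CD releases) might be sketched as follows. This is an illustration only, not a Picard file naming script: `archive_path` and `_safe` are hypothetical helpers, and the two-digit zero padding and the replacement of slashes are assumptions that the description above does not specify.

```python
from pathlib import PurePosixPath
from typing import Optional


def _safe(name: str) -> str:
    """Replace path separators so a name such as 'AC/DC' stays one folder (assumption)."""
    return name.replace("/", "-").replace("\\", "-")


def archive_path(artist: str, year: int, album: str, track: int, title: str,
                 disc: Optional[int] = None, ext: str = ".mp3") -> PurePosixPath:
    """Build archive/artist/year - album/track - title, prefixing the disc if given."""
    # Zero-padded track numbers are an assumption, chosen so files sort correctly.
    track_part = f"{track:02d}" if disc is None else f"{disc}-{track:02d}"
    return PurePosixPath(
        "archive",
        _safe(artist),
        f"{year} - {_safe(album)}",
        f"{track_part} - {_safe(title)}{ext}",
    )
```

With placeholder values, `archive_path("Example Artist", 2009, "Example Album", 7, "Example Title", disc=2)` gives `archive/Example Artist/2009 - Example Album/2-07 - Example Title.mp3`.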

In summary, define what you are trying to achieve or you can go round in circles.