Using with Beets on Linux and Encountering Some Odd Issues

So this may be a beets issue, but since it uses MusicBrainz I figured I would ask here as well.

So my use case is that I’m trying to get my music library well sorted, root out dupes, and find tracks missing metadata. Part of this is due to data recovery from a dying drive, which beets helped me copy to a new location using just the file names.

So then I went back and tried to use it to check metadata, to get more precise data for better sorting and organization. The format I was looking for was: albums or live/first letter of album artist name/album artist (artist country)/Genre/[release year] (release country) Album name [album type (regular, bootleg, ep, etc.)]/file type/{disc #} Track # Title (track bitrate).
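
To make that layout concrete, here is a minimal Python sketch that builds one such path from a dict of tags. It is not beets' path-template syntax; the field names (`albumartist`, `artist_country`, `release_country`, `albumtype`, `format`, and so on) and the fallback values are assumptions made up for illustration.

```python
from pathlib import PurePosixPath


def safe(value) -> str:
    """Turn a tag value into a single path component (no slashes, never empty)."""
    return str(value).replace("/", "_").strip() or "Unknown"


def build_path(tags: dict) -> PurePosixPath:
    """Build a relative library path from a dict of tags (field names are hypothetical)."""
    top = "live" if tags.get("live") else "albums"
    artist = safe(tags.get("albumartist", "Unknown"))
    # File under the first letter or digit, so punctuation does not become a folder.
    first = next((c.upper() for c in artist if c.isalnum()), "#")
    artist_country = safe(tags.get("artist_country", "XX"))
    genre = safe(tags.get("genre", "Unknown"))
    year = safe(tags.get("year", "Unknown"))
    country = safe(tags.get("release_country", "XX"))
    album = safe(tags.get("album", "Unknown"))
    albumtype = safe(tags.get("albumtype", "album"))
    ext = safe(tags.get("format", "unknown")).lower()
    disc = int(tags.get("disc", 1))
    track = int(tags.get("track", 0))
    title = safe(tags.get("title", "Unknown"))
    bitrate = int(tags.get("bitrate", 0))

    return PurePosixPath(
        top,
        first,
        f"{artist} ({artist_country})",
        genre,
        f"[{year}] ({country}) {album} [{albumtype}]",
        ext,
        # Doubled braces print literal braces around the disc number.
        f"{{{disc}}} {track:02d} {title} ({bitrate}kbps).{ext}",
    )
```

Any tag that is missing falls back to a placeholder such as "Unknown", which is also a quick way to see which files still need metadata before they are moved.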

Compilations and singletons are only slightly different.

So when I did this a number of very confusing things happened.

  • Albums got split up into similarly named folders. So all the folders had the same name with a unique identifier, but it would be like most tracks in one folder with a smattering of the rest of the tracks across 1-3 other folders. Like they were not from the exact same album.

  • Sometimes there was this sort of duplicate folder structure (same name with unique ids) and there would be a single track in those, but this time it was a duplicate track, oddly usually track 2 on the albums affected by this, and there were almost always two different folders with this one track. I’m 99% certain there were no dupes for these tracks initially, mostly because I can’t see how a single track would be duplicated in the way I do things. If there were dupes I would have expected full albums. There were such cases, but I’m almost certain that those were actually dupes on my system from different locations. That is part of the reason why I needed a very thorough sort of my library.

  • Although I’m certain I configured beets and musicbrainz settings to look at file metadata primarily for the information (as 80+% of my library was already well/properly tagged), in many cases it seemed to do its own thing, putting things like Unknown for release year, genre, album, etc. when that information was very clearly in the metadata.

  • So I tried very very hard to configure things so that musicbrainz would only very minimally rely on its database, as in the past I had my library completely destroyed (disorganized and mistagged to a crazy degree), because for whatever reason musicbrainz struggled with a surprising number of albums. Side note, I listen to some niche bands/genres, but the vast majority of my library is well known bands in their genres. So I failed in setting it up properly somehow, or it didn’t work as it should have. I had it set so that it would ignore any match in the database if it wasn’t 95% similar or better. Even so, for example, the entire Romeo & Juliet OST got broken apart and individually retagged with completely wrong info. Another case was a song by a death metal band that got retagged as a Judy Garland song?!

  • Lastly, I have encountered many albums that I know I had in full, with anywhere from a single song missing to 80 or 90% of the album missing. I’m still working my way through my library trying to fix all this, so these could be buried in a stash in VA or Unknown artist folders. But I just find it odd that it would separate these tracks and mistag them when they were in a folder structure with that album and, as stated previously, were probably properly tagged already.
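
For reference, this is roughly what I mean by "95% similar or better", as a small Python sketch. It is only an illustration using the standard library's `difflib`; it is not beets' actual matching code, and the function names and compared fields are my own assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def accept_match(file_tags: dict, candidate: dict, threshold: float = 0.95) -> bool:
    """Accept a database candidate only if every compared field is close enough.

    Hypothetical rule: artist, album and title must each reach the threshold,
    so one badly wrong field is enough to reject the candidate.
    """
    fields = ("artist", "album", "title")
    scores = [similarity(file_tags.get(f, ""), candidate.get(f, "")) for f in fields]
    return min(scores) >= threshold
```

Under a rule like this, a death metal track could never be accepted as a Judy Garland song, because the artist field alone would fall far below the cutoff. That is why I suspect the threshold I set was not the one actually being applied.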

Any insight or suggestions would be greatly appreciated.

How did you ID the music? Scan or Lookup?

If Beets is using MusicBrainz data, then let’s assume it works like Picard. Here is (approximately) how Picard could have made a mess like this.

When IDing music with Picard you have two choices - Scan or Lookup.

When Picard does a “scan” it will use AcoustIDs to try and match things. Sometimes you can have multiple versions of an album in a release group. Different releases from different years, countries, manufacturers. This can lead to 10, 20, 30 versions of a Release.

When Picard does a “Lookup” it will read the tagged data and look at the folder the tracks are in.

For 1 - when Picard uses Lookup it should keep a folder of tracks together and assign them to one release. When Picard uses AcoustID it listens to each track individually and potentially picks a different release for each track.

This kinda makes sense. If Picard is just “listening” to a track, it does not know if that is from the US CD or the French CD, so it will take an educated guess, sometimes even putting the track into a compilation album match instead.

With the “Lookup” option Picard is more intelligent and tries to keep a folder of music together. It reads the tags and knows tracks in a folder together tend to be from the same release.

This is why any scan needs to have a human check a result. That way if you spot one release scattered across five different versions of a release you the human can pick the correct release and move your tracks to that release.
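
A toy Python sketch of the difference, under my own assumptions: each track comes with a ranked list of candidate releases, "scan" takes each track's top guess on its own, and "lookup" picks the one release that the most tracks in the folder could belong to. This is not Picard's real code, and the release labels are placeholders.

```python
from collections import Counter


def pick_per_track(candidates: dict) -> dict:
    """Scan-like: every track independently takes its own top-ranked release."""
    return {track: releases[0] for track, releases in candidates.items()}


def pick_per_folder(candidates: dict) -> dict:
    """Lookup-like: choose the single release shared by the most tracks in the folder."""
    counts = Counter()
    for releases in candidates.values():
        # Count each release once per track, keeping the ranked order.
        counts.update(list(dict.fromkeys(releases)))
    best, _ = counts.most_common(1)[0]
    return {track: best for track in candidates}


# Three tracks from one folder, each with its own ranked guesses.
folder = {
    "01": ["us-cd", "fr-cd"],
    "02": ["fr-cd", "us-cd"],
    "03": ["best-of", "us-cd"],
}
print(pick_per_track(folder))   # three different releases: the "split album" effect
print(pick_per_folder(folder))  # every track lands on "us-cd"
```

The per-track result is exactly the scattered-folder mess described in point 1, which is why a human still needs to check what a scan produced.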

For 2 - sometimes your dupes may be from a compilation and a release album. But as they sound the same, AcoustID would not know that they came from different albums, so it pops them into the best match.

Maybe your data recovery software pulled out some tracks more than once?
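
If you want to test the recovery theory, a small Python sketch like this lists files that are byte-for-byte identical. It is only a rough check under that assumption: the same audio with different tags will not match, and the function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large FLACs do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(root) -> list:
    """Return groups of paths under root whose contents are exactly the same."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Run it on a copy or a small subfolder first. Anything it reports is a true duplicate on disk, so those were there before any tagging software touched them.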

For 3 - hard to know what is happening here. Look at a smaller sample and use human brainz to spot the errors.

The short version - NEVER scan a whole collection and expect it to be perfect. No software will do that. This is why you previously experienced chaos.

Scan small chunks that you as a human can check and adjust the results.

When you do finally get towards a final answer, make sure you keep the MusicBrainz IDs in place. If these IDs had been in your damaged collection then there would have been no need to do any scanning as Picard could have just instantly reassembled the tracks into the correct albums without changing data. It could have just moved the files into a new set of folders and you would have been done. I guess your old files didn’t have these MBIDs? So we are back to trying to identify data as if it is new?
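
To show why the IDs matter, here is a minimal Python sketch of regrouping files purely by their release ID. It assumes the tags have already been read into a dict per file (reading them needs a tag library, which is left out here) and that the release ID sits under the key `musicbrainz_albumid`, which is the name Picard uses for that tag. The IDs in the example are placeholders, not real MBIDs.

```python
from collections import defaultdict


def group_by_release(files: dict):
    """Split files into albums keyed by release ID, plus a list with no ID.

    files maps a file path to that file's tags (a plain dict).
    """
    albums = defaultdict(list)
    untagged = []
    for path, tags in files.items():
        mbid = tags.get("musicbrainz_albumid")
        if mbid:
            albums[mbid].append(path)
        else:
            untagged.append(path)
    return dict(albums), untagged


# Scattered files come back together without any guessing.
files = {
    "recovered/0001.flac": {"musicbrainz_albumid": "release-a"},
    "recovered/0457.flac": {"musicbrainz_albumid": "release-a"},
    "recovered/0999.flac": {},
}
albums, untagged = group_by_release(files)
```

Only the files in `untagged` would need a Lookup or Scan at all; everything else is already identified.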

Yeah, as I stated, I wasn’t trying to get Beets/MusicBrainz to do anything with the musicbrainz database, but after going through the process of installing beets multiple times to get it just right (I’m running Linux, and Beets has a non-standard way of installing it and many of its packages), I missed a critical setting or forgot to change a setting (not being able to remember if I had done it “that time”).

So back to cleanup. Do you happen to know where I can find a beginner friendly thorough walkthrough of how to properly use Picard? I tried searching for one myself, but came up short for something that was beginner friendly and thorough.

I can’t point at real noobie walk throughs. Just the one in the manual: Work Flow Recommendations — MusicBrainz Picard v2.13.3 documentation

But sometimes that is a little too geeky.

Some of what I say above should help. And there are other threads around of how people do things. This is a common question and it is a pity we can’t pin some of the better guides.

My main advice:
1. Back up before you start, so you have reference data to return to if you make errors.
2. Work in small batches that you can check.
3. If you have tags, then use “Lookup”.
4. Only use “Scan” if you have no tags or identifiable data. AcoustID is good, but all it will do is name the music. It won’t know which album it is on.
5. Remember you can copy the link of a Release from MusicBrainz and paste it into Picard.
6. Work with two main “pools”: a source called “Chaos” and a destination called “Order”, pulling albums from one side to the other in batches.
7. Picard can have its matches tweaked. Tweak country bias, tweak album vs compilation bias.
See Options \ Metadata \ Preferred releases. This will reduce that “split album match” thing.
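
The "small batches" part can be sketched in Python like this. It assumes every album is its own folder directly inside the “Chaos” pool, and the batch size of 10 is an arbitrary choice. It only lists folders and moves nothing, so you still load each batch into Picard and check it yourself.

```python
from itertools import islice
from pathlib import Path


def album_batches(chaos_root, batch_size: int = 10):
    """Yield lists of album folders from the Chaos pool, batch_size at a time."""
    albums = sorted(p for p in Path(chaos_root).iterdir() if p.is_dir())
    remaining = iter(albums)
    # Keep taking the next batch_size folders until none are left.
    while batch := list(islice(remaining, batch_size)):
        yield batch
```

For example, `for batch in album_batches("Chaos"): print(batch)` prints one reviewable group at a time. Once a batch is saved into “Order”, those folders leave “Chaos”, so the next run starts with what is still unsorted.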

Work in small batches while you get the hang of things, and after a while you will speed up.

Don’t know if this clears anything up, but it looks like the folders that had a single track, usually the second track of the album, may have all been single-file FLAC albums, that is, the entire album in one track. But why would it change them from the album name to the name of the second track?!

Because Picard can’t do CUE files or albums ripped to one track. It did a best guess instead.

You need to sift these types of albums out of your Picard searches and manually handle them. Should be easy as you can do a manual search across your folders based on size.
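
A rough Python sketch of that size-based search. The 150 MB cutoff and the list of extensions are guesses on my part, so adjust them to your collection; a long hi-res track can also trip the size check. It additionally flags audio files that have a CUE sheet sitting next to them.

```python
from pathlib import Path

# Extensions that commonly hold a whole album in one file (an assumption).
AUDIO_EXTS = {".flac", ".ape", ".wv", ".wav"}


def find_single_file_albums(root, min_bytes: int = 150 * 1024 * 1024) -> list:
    """List audio files that are suspiciously large or have a matching .cue file."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTS:
            continue
        # CUE sheets are usually named album.cue or album.flac.cue.
        has_cue = (
            path.with_suffix(".cue").exists()
            or path.with_name(path.name + ".cue").exists()
        )
        if has_cue or path.stat().st_size >= min_bytes:
            hits.append(path)
    return hits
```

Move whatever it finds into a separate folder and keep it out of Picard until it has been split into tracks or handled by hand.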