Mozart 225: how to move a mountain?

classical
dataentry
boxset

#1

Wow, I just found out about this. A 200-CD box set of Mozart works, Mozart 225, from Decca Classics and Deutsche Grammophon. From a quick look at some Amazon reviews, it seems like this is a good collection. I’m pretty confident that people will buy it and want to enter it into MusicBrainz. Yay!

But… 200 CDs! How does a set of editors efficiently enter such a large collection into MusicBrainz? How does one move a mountain? How can the work be shared effectively? Has such a large collection been entered before? Are there tips on how to go about it?

One obvious question: how many of the discs are exact duplicates of previous Releases that are already in MusicBrainz? That will probably make things easier.


#2

I believe MusicBrainz can’t handle releases with so many media. They are usually split into “releases” with a number of media. So in this case you would have something like 20 of these releases with 10 discs each, all in the same release group. That also makes it easier to divide the work.
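
To make the split concrete, here is a rough sketch of how the 200 discs could be carved into 20 releases of 10 discs each and handed out to editors in turn. The editor names are made up for illustration, and the batch size of 10 is just the figure suggested above, not a MusicBrainz rule.

```python
def split_into_batches(total_discs, discs_per_release):
    """Return (first_disc, last_disc) ranges covering discs 1..total_discs."""
    batches = []
    for first in range(1, total_discs + 1, discs_per_release):
        last = min(first + discs_per_release - 1, total_discs)
        batches.append((first, last))
    return batches


# 200 discs in releases of 10 discs each, as suggested above.
batches = split_into_batches(200, 10)

# Hypothetical editor names; hand out the batches round-robin.
editors = ["editor_a", "editor_b", "editor_c", "editor_d"]
assignments = {}
for index, batch in enumerate(batches):
    editor = editors[index % len(editors)]
    assignments.setdefault(editor, []).append(batch)

for editor, ranges in assignments.items():
    print(editor, ranges)
```

With four editors, each one ends up with five 10-disc releases to enter.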

Personally I don’t see why anyone would buy box sets that are that huge. Maybe it’s just to show off? You’ll never listen to even a quarter of all that. You’re better off choosing a couple of works you would like to have and trying to find the best versions of those. Plus it’s Mozart! 200 CDs! Do these people want to bore themselves to death?


#3

I see a very simple reason: most collectors want the most complete “Complete Edition” available. It’s not about listening to the works, it’s much more “I must own/have it” :wink:


#4

I think the best approach would be to enter a few mediums in a single release at a time and then merge all the releases together at the end. It should work fine post-merge, as long as nobody tries to fetch data about the release…


#5

It can, for display at least: see The Complete Bach Edition. But trying to open Edit relationships there will try to load all mediums at once, and (I assume) fail. So for adding data, it might be necessary to do it via separate releases (which, if you’re lucky, will actually exist already, since they won’t be new recordings but reissues of previous albums).


#6

Looks like the Decca website already gives access to partial tracklists (~ 10 CDs each, e.g. http://www.deccaclassics.com/en/cat/4831217?), so I would suggest keeping the same split in MusicBrainz (easy to merge later if wanted/needed)

You can then use the “DG/Decca import” userscript to load track names/durations/artists in a new release for each of these… there will still be a lot to do of course :slight_smile: (starting with identifying the artists since you cannot save until it is done)

For Decca/DG I usually search by performer/recording date to find existing recordings, and from there check whether a release exists with exactly the same tracklist. It takes some time, although Decca often doesn’t change the tracklists and disc order from one compilation to another (so I expect “all symphonies” and “all piano sonatas” to already exist in MusicBrainz exactly as needed)


#7

This separate smaller digital release might not have exactly the same track list as the 200-CD version. It’s common for them to split box sets into smaller digital releases, and sometimes the tracklist differs by a couple of tracks. I still recommend adding these digital releases with your script, because most (if not all) of the recordings are the same as on the bigger box set.


#8

There’s a complete tracklist, kind of:

http://www.deccaclassics.com/html/misc/00028948300006-tracklisting.html
and

Nothing you could easily import, though.


#9

I tried it and, well… there are far worse things it can do than fail. Like, succeed. 16 GiB of RAM apparently isn’t enough anymore*. Made the machine unusable via thrashing for a good several minutes. (Really need to figure out how to get Firefox and Chrome in their own cgroup with the memory limiter).

*Pedants may point out that no finite amount of RAM has ever been enough for a web browser, at least not unless it’s restarted every few hours.


#10

Are you sure your browser is 64-bit? I’m not sure whether the Firefox Windows releases are 64-bit yet. I know Mozilla’s official builds have only been 32-bit for a long, long time. Which means they’ve only been able to utilise ~4GB of your RAM, regardless of how much you’ve had. I have 16 GB RAM on my 64-bit Linux laptop, and my Firefox with four windows and probably a couple hundred tabs (:weary:) is running for days on end with no issues.


#11

Yep, it’s the Debian amd64 build.

$ file /usr/lib/firefox/firefox
/usr/lib/firefox/firefox: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a9efabdb6e740b467eecbcbae02b4ba4e41cbb4e, stripped

I may well have some extension increasing memory usage…


#12

https://musicbrainz.org/release-group/f72954d9-0467-48d6-ab97-369faa02c825

Is there a 40 CD limitation on a release? Can (and should) these releases be merged? They have the same catalog number and barcode.


#13

Not really.

They can. Whether they should, I guess, depends on how feasible it is to load a 200CD release right now. It might be better to wait until the relationship editor doesn’t try to load them all in one go, since that’ll kill most browsers I think…


#14

I created these releases separately because of how the relationship editor handles relationships. I believe most browsers can’t handle a 200-CD release in it. This box set has over 3300 recordings, so the number of relationships must be over ten thousand. For now I recommend not merging huge releases like this.
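
For a sense of scale, here is a back-of-the-envelope calculation using only the figures quoted above (over 3300 recordings, over ten thousand relationships, 200 discs). These are lower bounds taken from this post, not numbers measured from the database, so treat the result as a rough illustration only.

```python
# Figures quoted in the post above; all three are lower bounds.
recordings = 3300            # "over 3300 recordings"
relationship_floor = 10_000  # "over ten thousand" relationships
discs = 200

# Average relationships per recording implied by those two figures.
avg_per_recording = relationship_floor / recordings

# Rough load per disc if the recordings were spread evenly.
recordings_per_disc = recordings / discs
relationships_per_disc = relationship_floor / discs

print(f"relationships per recording: at least {avg_per_recording:.2f}")
print(f"recordings per disc: about {recordings_per_disc:.1f}")
print(f"relationships per disc: at least {relationships_per_disc:.0f}")
```

So a 10-disc release would carry something like 500 relationships, while the merged 200-disc release would need all ten thousand or more loaded at once.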