Hi, I’ve recently built a dataset and a CLI (written in Rust) that I think is quite nice: myersm0/werkverzeichnis on GitHub, a machine-readable catalog of catalogs for classical music metadata.
When I first began designing this, a couple of weeks ago, I was under the (false) impression that classical metadata was not well modeled in MusicBrainz, and that I was doing something novel:
- providing an interface for querying classical works by catalog number
- building a data model that cleanly accounts for the changing nature of catalog numbers and composer attribution, and smooths out the difficulties surrounding those things
I knew that there were plenty of single-composer tools like this out there. The Deutsch catalog for Schubert, for example, has a web interface for searching D numbers, and there are similar tools for navigating the BWV catalog for Bach, etc. But I didn’t think anyone had yet tried to solve the catalog number mess in general, for all composers (or at least for a majority of the major canonical composers).
It was only after I had nailed down the basic data model for my project, and populated it with a starting dataset of ~500 compositions, that I started seriously investigating MusicBrainz’s capabilities in this regard. And what I found surprised me: MusicBrainz already does this quite well. It just wasn’t very discoverable to a newcomer like me. It took me a couple of hours of preprocessing a MusicBrainz JSON dump, and working with some SQL joins, to realize this. (EDIT: On second thought, I don’t think I had to do any table joins. I think the table joins I was thinking of were for enumerating the movement structures.)
Now that I realize much of my work is redundant with what already exists in MusicBrainz, I’m debating which direction to take next, and I’m hoping to get some feedback. Do people see any value in a tool like this as a complement to MusicBrainz? Or maybe as a derivative of MusicBrainz, a kind of alternative version that you can simply query from the command line?
To quickly orient you to what my project actually looks like: it’s a command-line utility. You install it (or clone and build it from GitHub), and then you can do things like:
$ wv get mozart k 331
Sonata in A major, K. 331
Or to get the full JSON record:
$ wv get mozart k 331 --json
[full json not shown here for brevity]
To get Beethoven’s works from opus 2 through 10:
$ wv get beethoven op 2..10
Sonata in f minor, op. 2 no. 1
Sonata in A major, op. 2 no. 2
Sonata in C major, op. 2 no. 3
Sonata in E♭ major, op. 7
Sonata in c minor, op. 10 no. 1
Sonata in F major, op. 10 no. 2
Sonata in D major, op. 10 no. 3
(Note: This only covers his piano sonatas because that’s all I have from him in the dataset right now. It’s just a minimal proof-of-concept dataset.)
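For illustration, here is a minimal Rust sketch of how a range query like `op 2..10` could be resolved. It is not the actual wv code: the `Work` struct and the functions `parse_opus_range`, `select`, and `sample_works` are hypothetical names invented for this example. The only behavior taken from the post is that the range is inclusive (op. 10 appears in the output above).

```rust
// Hypothetical sketch, not the real wv implementation.
use std::ops::RangeInclusive;

/// A catalog entry reduced to the fields this example needs (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct Work {
    title: &'static str,
    opus: u32,
    number: Option<u32>, // position within the opus ("no. 1"), if any
}

/// Parse "2..10" into the inclusive range 2..=10, or a bare "7" into 7..=7.
/// Returns None for malformed input or a reversed range.
fn parse_opus_range(input: &str) -> Option<RangeInclusive<u32>> {
    match input.split_once("..") {
        Some((lo, hi)) => {
            let lo: u32 = lo.trim().parse().ok()?;
            let hi: u32 = hi.trim().parse().ok()?;
            if lo <= hi { Some(lo..=hi) } else { None }
        }
        None => {
            let n: u32 = input.trim().parse().ok()?;
            Some(n..=n)
        }
    }
}

/// Return the works whose opus falls in the range, ordered by opus and number.
fn select<'a>(works: &'a [Work], range: &RangeInclusive<u32>) -> Vec<&'a Work> {
    let mut hits: Vec<&Work> = works.iter().filter(|w| range.contains(&w.opus)).collect();
    hits.sort_by_key(|w| (w.opus, w.number));
    hits
}

/// A few records, deliberately out of order, plus one outside the queried range.
fn sample_works() -> Vec<Work> {
    vec![
        Work { title: "Sonata in c minor, op. 10 no. 1", opus: 10, number: Some(1) },
        Work { title: "Sonata in c minor, op. 13", opus: 13, number: None },
        Work { title: "Sonata in f minor, op. 2 no. 1", opus: 2, number: Some(1) },
        Work { title: "Sonata in E♭ major, op. 7", opus: 7, number: None },
    ]
}

fn main() {
    let works = sample_works();
    let range = parse_opus_range("2..10").expect("valid range");
    for work in select(&works, &range) {
        println!("{}", work.title);
    }
}
```

Keeping the opus as an integer makes the range check a one-liner, but it would not cover catalog numbers with letter suffixes, which would need a text field and a custom ordering.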
All of this is offline. The data is right there on your computer and you can even edit the files like this:
$ wv get beethoven op 2..10 --edit
It also handles some edge cases like:
- Edition-aware queries (e.g. for Mozart you can look up K. 300i under the 6th Köchel edition specifically)
- Attribution changes over time are preserved as structured history rather than just current state. E.g. Bach’s BWV 141 is now in the appendix of BWV as a spurious work attributed to Telemann; this is straightforwardly represented and queryable in my dataset.
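To make these two edge cases concrete, here is a rough Rust sketch of one way such records could be modeled. Everything in it is an assumption made for illustration: the type names (`CatalogNumber`, `Work`), the field names, and the helper functions are hypothetical and are not taken from the project's actual schema. It also assumes that an attribution history can be reduced to an ordered list of composer names; a real record would presumably carry dates or sources as well.

```rust
// Hypothetical data model; names and fields are illustrative, not the project's schema.

/// One catalog number for a work, optionally tied to a specific catalog edition.
#[derive(Debug, Clone, PartialEq)]
struct CatalogNumber {
    catalog: &'static str, // e.g. "K" or "BWV"
    edition: Option<u32>,  // catalog edition, when the number is edition-specific
    number: &'static str,  // kept as text so that "300i" works
}

#[derive(Debug, Clone, PartialEq)]
struct Work {
    title: &'static str,
    numbers: Vec<CatalogNumber>,
    /// Composer attributions, oldest first; the last entry is the current one.
    attribution_history: Vec<&'static str>,
}

/// Look a work up by catalog number. With `edition: None`, any edition matches;
/// with `Some(e)`, only numbers recorded for edition `e` match.
fn find_by_number<'a>(
    works: &'a [Work],
    catalog: &str,
    number: &str,
    edition: Option<u32>,
) -> Option<&'a Work> {
    works.iter().find(|w| {
        w.numbers.iter().any(|n| {
            n.catalog == catalog
                && n.number == number
                && (edition.is_none() || n.edition == edition)
        })
    })
}

/// The composer the work is attributed to today, if any history is recorded.
fn current_composer(work: &Work) -> Option<&'static str> {
    work.attribution_history.last().copied()
}

/// True if the composer appears in the history but is not the current attribution.
fn formerly_attributed_to(work: &Work, composer: &str) -> bool {
    match work.attribution_history.split_last() {
        Some((_current, earlier)) => earlier.iter().any(|c| *c == composer),
        None => false,
    }
}

fn sample_works() -> Vec<Work> {
    vec![
        Work {
            title: "Sonata in A major, K. 331",
            numbers: vec![
                CatalogNumber { catalog: "K", edition: Some(1), number: "331" },
                CatalogNumber { catalog: "K", edition: Some(6), number: "300i" },
            ],
            attribution_history: vec!["Mozart"],
        },
        Work {
            title: "BWV 141",
            numbers: vec![CatalogNumber { catalog: "BWV", edition: None, number: "141" }],
            attribution_history: vec!["Bach", "Telemann"],
        },
    ]
}

fn main() {
    let works = sample_works();
    let sonata = find_by_number(&works, "K", "300i", Some(6)).expect("K. 300i, 6th edition");
    println!("{}", sonata.title);

    let cantata = find_by_number(&works, "BWV", "141", None).expect("BWV 141");
    println!(
        "{}: now attributed to {:?}, formerly Bach: {}",
        cantata.title,
        current_composer(cantata),
        formerly_attributed_to(cantata, "Bach")
    );
}
```

Storing the history as an append-only list means the current state is always derivable from it, so there is no separate "current composer" field that could drift out of sync.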
So you can see the design is oriented around catalog number lookups as the primary access pattern, which in MusicBrainz, as far as I can determine, requires preprocessing a data dump and working through SQL joins. (EDIT: As I clarified above, I don’t think it actually requires SQL joins.)
Each of the ~500 items in my dataset now has a cross-reference link to the corresponding MusicBrainz work ID (MBID). I could easily commit to extending this dataset to the complete works of four or five major composers, but to extend it beyond that I would need either some funding or some contributors.
Thanks in advance for any feedback.