Announcing yambs (Yet Another MusicBrainz Seeder)

I made a new tool for seeding MusicBrainz edits:

The main focus is creating standalone recordings or releases using data from text files (e.g. CSV or TSV), although it also has basic support for pulling release data from Bandcamp (edit: and Qobuz, and Tidal).

I wrote it because I wanted to add a bunch of standalone recordings for a single artist, and I couldn’t find any way to do that without spending a lot of time switching between the keyboard and mouse to reenter the same data over and over in the Add Standalone Recording page. yambs is hopefully useful to people who (like me) can enter data faster in a text editor or spreadsheet than in a web form. :slight_smile:

To give an example, if you want to add a bunch of recordings, you can specify a list of field names (e.g. “name,length”) and then provide CSV input with one recording name and length per row:

First Song,3:14
Second Song,6:56.02
Third Song,0:56
Fourth Song,2:35
...
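
As a rough illustration of how rows like these line up with a field list, here is a minimal Go sketch. It is not taken from yambs itself: the function names parseRows and parseLength are hypothetical, and the M:SS[.ss] length parsing is an assumption based only on the sample values above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math"
	"strconv"
	"strings"
	"time"
)

// parseRows maps each CSV record onto the given field names.
// Hypothetical helper; yambs may handle this differently.
func parseRows(fields []string, input string) ([]map[string]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = len(fields) // reject rows with the wrong column count
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	rows := make([]map[string]string, 0, len(records))
	for _, rec := range records {
		row := make(map[string]string, len(fields))
		for i, f := range fields {
			row[f] = rec[i]
		}
		rows = append(rows, row)
	}
	return rows, nil
}

// parseLength converts "M:SS" or "M:SS.ss" into a duration, rounded to
// the nearest millisecond. The accepted format is an assumption.
func parseLength(s string) (time.Duration, error) {
	minStr, secStr, ok := strings.Cut(s, ":")
	if !ok {
		return 0, fmt.Errorf("length %q is not in M:SS form", s)
	}
	mins, err := strconv.Atoi(minStr)
	if err != nil {
		return 0, err
	}
	secs, err := strconv.ParseFloat(secStr, 64)
	if err != nil {
		return 0, err
	}
	ms := int64(mins)*60_000 + int64(math.Round(secs*1000))
	return time.Duration(ms) * time.Millisecond, nil
}

func main() {
	input := "First Song,3:14\nSecond Song,6:56.02\nThird Song,0:56\nFourth Song,2:35\n"
	rows, err := parseRows(strings.Split("name,length", ","), input)
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		d, err := parseLength(row["length"])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %v\n", row["name"], d)
	}
}
```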

If there are fields that you want to set for all recordings, you can supply an additional list of field=value commands:

artist=7e84f845-ac16-41fe-9ff8-df12eb32af55
url0_url=https://www.example.org/
url0_type=255
edit_note=downloaded from https://www.example.org
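
To show how field=value commands like these could be combined with per-row data, here is a small Go sketch. The helpers parseSetCommands and withDefaults are hypothetical, and letting a per-row value win over a command value is my assumption rather than documented yambs behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSetCommands turns "field=value" lines into a map. Only the first
// "=" splits the line, so values such as URLs may contain "=" themselves.
func parseSetCommands(lines []string) (map[string]string, error) {
	vals := make(map[string]string)
	for _, ln := range lines {
		ln = strings.TrimSpace(ln)
		if ln == "" {
			continue // skip blank lines
		}
		k, v, ok := strings.Cut(ln, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("bad command %q (want field=value)", ln)
		}
		vals[k] = v
	}
	return vals, nil
}

// withDefaults returns a copy of row with defaults filled in for any
// field the row does not set (assumed precedence: the row wins).
func withDefaults(row, defaults map[string]string) map[string]string {
	out := make(map[string]string, len(row)+len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range row {
		out[k] = v
	}
	return out
}

func main() {
	defaults, err := parseSetCommands([]string{
		"artist=7e84f845-ac16-41fe-9ff8-df12eb32af55",
		"url0_url=https://www.example.org/",
		"url0_type=255",
		"edit_note=downloaded from https://www.example.org",
	})
	if err != nil {
		panic(err)
	}
	row := map[string]string{"name": "First Song", "length": "3:14"}
	fmt.Println(withDefaults(row, defaults))
}
```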

yambs generates a webpage with links to the seeded edit pages and buttons for opening some or all of the pages in new tabs. (You may need to tell your browser to let the page open multiple popups the first time that you try this.)

The README.md file has more details, but hopefully the example values in the web version also give some hints about how everything works.

As part of this, I generated an enums.go file that lists link type IDs and various other hardcoded values (release group types, packaging, release statuses, etc.) that are used when seeding edits. I couldn’t find these values documented anywhere (and I suspect that my lists are still incomplete compared to what’s actually in the live database).

Please file issues if you have suggestions or encounter bugs, and don’t hesitate to let me know if you find stuff that doesn’t make sense.


Perhaps against my better judgment, I’ve added support for seeding edits based on local MP3 files. This is only available in the command-line yambs program (i.e. not in the web interface).

I did this for one of my own use cases: I follow an artist who periodically releases new singles via their website, and it’s a pain to manually create a new release every time: I need to browse to the artist’s MB page and then copy the title and duration from the ID3 tag, copy the release date from the artist’s website, add the URL as a relationship, copy the annotation, disambiguation, and edit note from somewhere else, extract the cover art from the MP3 so I can upload that later, etc. Too much clicking.

Now, I have a script that runs the following command, which hardcodes various fields and automatically extracts everything else from the MP3:

yambs \
  -type release \
  -set artist0_mbid="..." \
  -set annotation="..." \
  -set disambiguation="..." \
  -set event0_country=XW \
  -set barcode=none \
  -set url0_url="..." \
  -set url0_type=75 \
  -set edit_note="..." \
  /path/to/single.mp3

(I guess that spending a lot of time writing code to automate a tedious task that takes a little time is the programmer way. :thinking:)

I created a v0.1.1 release incorporating the new functionality, along with some other small improvements.

You mentioned in the Pulsewidth a-tisket thread that you’ve added TIDAL support, which I greatly appreciate. I only wish it pulled copyright details, label information and barcodes like a-tisket does; I had to fill in the gaps using TIDAL, and even that doesn’t differentiate between regular copyright and phonographic copyright much of the time.


I’m happy to add more things if the API exposes them! Barcodes look easy to get, but I’m less sure about copyright and label info. I’ve filed https://github.com/derat/yambs/issues/23 to track this and have a few open questions there.


Any way to get barcodes from Qobuz? That’s the only thing I can’t get when it’s not included in the URL.


It looks like yambs was already trying to extract SKUs from Qobuz pages, but I think I wasn’t using them when seeding because I hadn’t verified that they’re also valid UPCs (as opposed to internal IDs used by Qobuz). I’ve filed issue #36 (“Extract barcodes from Qobuz albums”) in the yambs tracker on Codeberg to track this and will try to take a look at it later today.


I’m having a hard time figuring out the relationship between Qobuz SKUs and UPCs. By SKU, I’m referring to the 13-character component at the end of the URL (sometimes only digits, but sometimes with letters mixed in). This is also repeated a bunch of times in the page source and sometimes referred to as an ID there.

In some cases like https://www.qobuz.com/us-en/album/a-dave-brubeck-christmas-dave-brubeck/0008940834102 (which just redirects to an artist page for me now), the SKU looks like it’s the first 11 digits of a GTIN-12 code (i.e. without the check digit), left-padded with two zeros. Feeding those 11 digits to the GS1 check digit calculator, I get 089408341021, which leads me to https://www.discogs.com/release/14330931-Dave-Brubeck-A-Dave-Brubeck-Christmas.
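
For anyone who wants to reproduce the check digit without the online calculator, here is a minimal Go sketch of the standard GTIN check-digit calculation (weights 3 and 1 alternating from the rightmost payload digit). The function name gtinCheckDigit is mine, not something from yambs.

```go
package main

import "fmt"

// gtinCheckDigit computes the check digit for a GTIN payload, i.e. the
// digits of the code without its final check digit.
func gtinCheckDigit(payload string) (int, error) {
	sum := 0
	weight := 3 // the rightmost payload digit is weighted 3
	for i := len(payload) - 1; i >= 0; i-- {
		c := payload[i]
		if c < '0' || c > '9' {
			return 0, fmt.Errorf("non-digit %q in %q", c, payload)
		}
		sum += int(c-'0') * weight
		weight = 4 - weight // alternate between 3 and 1
	}
	return (10 - sum%10) % 10, nil
}

func main() {
	sku := "0008940834102" // from the Dave Brubeck URL
	payload := sku[2:]     // drop the two padding zeros, leaving 11 digits
	d, err := gtinCheckDigit(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s%d\n", payload, d) // prints 089408341021
}
```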

Then there’s https://www.qobuz.com/us-en/album/in-rainbows-radiohead/0634904032432, which only has a single zero on the left. Per the “What is a GTIN” page, I think that it can’t be the first 13 digits of a GTIN-14 since those aren’t supposed to start with a zero. It could be GTIN-13 6349040324320 without the check digit or GTIN-12 634904032432, but I don’t see any results online for either of those.

And then there are URLs like https://www.qobuz.com/us-en/album/the-dark-side-of-the-moon-pink-floyd/xggxq5w5dmljb that are mostly letters. Maybe there’s a way to map the letters to digits, or maybe something entirely different is going on.

I’m pretty stumped, so if there are any UPC experts out there with more insights or Qobuz experts who know of a way to consistently get UPCs, I’d love to hear from them.


Yeah, the SKU has nothing to do with the barcode. I was hoping you had access to the API. It’s the only way I know of to get UPCs, but there is no public API. They used to always put the UPCs in the URLs like that, but on newer releases they all have that odd code. Sometimes the URL has the full barcode at the end and sometimes as you noted, it’s missing the check digit. You just have to check the code against the barcode field by leaving off the leading zero and double checking that it’s a valid UPC code. If not, it’ll tell you what the check code is that way and I just add it. It usually matches that of a release from another site.
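
The manual procedure described here could be approximated in code roughly as follows. This is only a sketch of the heuristic as I understand it from this thread: guessBarcode is a hypothetical name, the two-zero padding case is based on a single example, and a "valid" result only means the checksum works out, not that the code is the release's real barcode.

```go
package main

import (
	"fmt"
	"strings"
)

// allDigits reports whether s is non-empty and consists only of ASCII digits.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for i := 0; i < len(s); i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	return true
}

// validGTIN reports whether s (including its final check digit) has a
// valid GTIN checksum: weights 1, 3, 1, 3, ... from the right sum to 0 mod 10.
func validGTIN(s string) bool {
	if !allDigits(s) {
		return false
	}
	sum, weight := 0, 1
	for i := len(s) - 1; i >= 0; i-- {
		sum += int(s[i]-'0') * weight
		weight = 4 - weight
	}
	return sum%10 == 0
}

// guessBarcode tries to turn a 13-character Qobuz URL suffix into a barcode.
func guessBarcode(sku string) (string, bool) {
	if len(sku) != 13 || !allDigits(sku) {
		return "", false // newer alphanumeric IDs cannot be decoded this way
	}
	if validGTIN(sku) {
		// Already checksums correctly; drop one padding zero if present.
		return strings.TrimPrefix(sku, "0"), true
	}
	if strings.HasPrefix(sku, "00") {
		// Assume the check digit is missing: find the digit that fixes it.
		base := sku[2:]
		for d := 0; d <= 9; d++ {
			if c := fmt.Sprintf("%s%d", base, d); validGTIN(c) {
				return c, true
			}
		}
	}
	return "", false
}

func main() {
	for _, sku := range []string{"0008940834102", "0634904032432", "xggxq5w5dmljb"} {
		code, ok := guessBarcode(sku)
		fmt.Println(sku, code, ok)
	}
}
```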


It might be relevant that the Dave Brubeck example you found is a CD, and that it is a CD’s barcode that the Qobuz album reused. This reminds me of how ASINs work: The ASIN of printed books, audiobooks, and some other items is the ISBN-10; if no ISBN-10 is supplied, then an algorithm generates an alphanumeric ASIN. Usually releases on streaming services don’t reuse the barcode of the CD release, but sometimes they do, and this Brubeck album is one such example. The same barcode can be found in the Spotify, Deezer, and Apple Music entries:

As you locate more examples of Qobuz releases with EANs or partial GTINs in the URL, maybe you will find that they are all reused CD barcodes. I can’t explain why Qobuz URLs don’t use the digital media barcodes. I can only say that the string is not generated by any of the most common hashing algorithms (it’s the wrong length for that). Here are two 13-digit identifiers for the same digital-only album:

EAN-13: 0198004964562
 Qobuz: byy50rxixi5ib

I’m not good enough at cryptography to say whether we can rule out the possibility that this is a hash, but even considering the zero-padding and missing check digit that you noticed in the Dave Brubeck album, it seems clear that they’re not using a character-for-character substitution scheme.


AFAICT, the SKU used to be a 13-digit UPC. If the UPC is shorter than 13 digits, zero(es) are added at the beginning. If it’s longer than 13 digits, the extra digit(s) are dropped. Releases added after December 2017 use the alphanumeric jumble, presumably to safeguard against instances of separate releases using the same UPC.

Apparently it’s possible to get the UPC from play.qobuz.com. This script says it can copy the UPC (I don’t have a script manager to test it), and there’s also a tool for, ahem, downloading music if you have an active subscription, which includes the UPC in the metadata.


@Anakunda can probably answer definitively, but from glancing through the userscript, I think that it’s using Qobuz login credentials to call the API to get the metadata blob containing the UPC (i.e. the approach described by @tigerman325).

I’m still considering adding support for seeding barcodes in the case where the SKU/ID appears to be a UPC, but I’m a bit hesitant since it sounds like Qobuz may just be reusing UPCs from earlier CD releases, and I don’t really want to promote the creation of more near-duplicate digital media releases that only exist because some platforms provide UPCs and others don’t.

One problem with the barcodes in the URL is that they sometimes leave off the check digit. I drop the leading zero and enter the rest in MB; if it says that the code from the URL isn’t valid, I see whether it suggests a check digit.

I don’t think I understand how to use it. When I click on “get metadata” and paste the result into Notepad, there isn’t much useful data, just promo reviews, etc. that are on the page after some metadata, but no barcodes or ISRCs that I could find.

If you download purchased music from Qobuz it will have the UPC in the metadata. IIRC they started doing this when the new official downloader was released.


It seems like the Qobuz barcodes are coming through in the web app, so that’s good.

Edit: Wait, never mind, it worked for one release but now another one has the barcode field left blank.

Thanks for reporting this. I think I’ve seen it myself now, but it seems to be happening inconsistently, so maybe the API is being weird. I’ve filed issue #40 (“Qobuz barcodes are sometimes missing”) in derat/yambs on Codeberg to track it.

It looks like this is probably due to geographical restrictions on where albums are available (which I now see was also mentioned by Sp00kyFox in post #9 of the “Qobuz importer script” thread). I’m not sure there’s much that I can do about this unless there’s some secret way to pass a country code to the Qobuz API. https://yambs.erat.org/ is running in GCP’s us-central region, so it might only be possible to get barcodes for releases that are available in the US.

Will it work with a VPN? I have to do that with ISRC Hunt, because Spotify now restricts API info for releases that aren’t available in the user’s region. It’s annoying; you used to be able to see all releases around the world. Anyway, I’m not surprised by this.

All of the data is fetched by the server, so connecting to it via a VPN won’t help. It’d be a huge hack, but it might be possible for me to make the user’s browser query the API and then incorporate the barcode from there into the rest of the seeded data from the server. I’d need to think about that a bit more.

Another option would be using the yambs command-line program (there are Linux, macOS, and Windows executables available from the derat/yambs Releases page on Codeberg) so that the network requests are made from your local IP (or running your own instance of yambsd from wherever you’d like).


Just to mention some new features that have been added recently:

For people who want to bulk-import data from other sources, all entity types (artists, events, labels, places, recordings, releases, release groups, series, and works) can be seeded from CSV, TSV, or text files now.

The command-line program can also use podcast RSS files to seed either releases or standalone recordings (see discussion).
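
For readers curious what pulling per-episode data out of a podcast feed involves, here is a minimal Go sketch using the standard encoding/xml package. It is not how yambs is implemented: the rss and item types and the parseFeed function are hypothetical, the sample feed is made up, and only a few plain RSS 2.0 elements are read.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// item holds the few per-episode fields this sketch reads.
type item struct {
	Title     string `xml:"title"`
	PubDate   string `xml:"pubDate"`
	Enclosure struct {
		URL string `xml:"url,attr"` // audio file URL
	} `xml:"enclosure"`
}

// rss mirrors the outer structure of an RSS 2.0 document.
type rss struct {
	Channel struct {
		Title string `xml:"title"`
		Items []item `xml:"item"`
	} `xml:"channel"`
}

// parseFeed decodes an RSS document into the rss struct.
func parseFeed(data []byte) (*rss, error) {
	var feed rss
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	return &feed, nil
}

// sampleFeed is invented data for illustration only.
const sampleFeed = `<rss version="2.0"><channel>
<title>Example Show</title>
<item>
  <title>Episode 1</title>
  <pubDate>Mon, 02 Jan 2006 15:04:05 +0000</pubDate>
  <enclosure url="https://www.example.org/ep1.mp3" type="audio/mpeg"/>
</item>
</channel></rss>`

func main() {
	feed, err := parseFeed([]byte(sampleFeed))
	if err != nil {
		panic(err)
	}
	for _, it := range feed.Channel.Items {
		// Each episode could become one recording or one release.
		fmt.Printf("%s | %s | %s\n", it.Title, it.PubDate, it.Enclosure.URL)
	}
}
```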

The web interface is still at yambs.erat.org, and precompiled binaries can be downloaded from the releases page. Let me know if you have questions or run into problems (or feel free to file an issue in the issue tracker).
