How to get a huge release from API including area and place?

InvisibleMan78 · January 20, 2022, 1:02pm

I would like to get the “recorded at” information from the API for a huge release with 96 CDs, as you can see it online with:

If I try it with this command:
https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=recordings+recording-level-rels+area-rels+place-rels
I don’t get any information at all about place or area. It doesn’t seem to make any difference if I ask for JSON or XML format or limit it to any number.

The same command for a partially available release like the first 28 CDs from the same release:

seems to work perfectly fine, returning the information about place and area.
https://musicbrainz.org/ws/2/release/5418ad56-bce0-4d69-abfa-7425d7e756cc?inc=recordings+recording-level-rels+area-rels+place-rels

Question:
If there is any limit to return such data, how could I get it in parts for huge releases?

nadl40 · January 20, 2022, 2:06pm

Try this. It gives you more than you need; I’m using it to get the release info I’m interested in.

https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=media+recordings+artist-credits+artists+labels+artist-rels+release-rels+url-rels+recording-rels+work-rels+recording-level-rels+instrument-rels+place-rels+area-rels+aliases

InvisibleMan78 · January 20, 2022, 2:31pm

Thanks for your reply @nadl40

Did you try it? I can’t find any recorded at information in your search query.

(I use the browser “Google Chrome” on Windows 10 Pro)

nadl40 · January 20, 2022, 3:20pm

I use this all the time. It’s a large XML document; you can try a simpler release to see it in the browser.
I dumped it to a file and grepped it for “recorded at”:

curl -s https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=media+recordings+artist-credits+artists+labels+artist-rels+release-rels+url-rels+recording-rels+work-rels+recording-level-rels+instrument-rels+place-rels+area-rels+aliases -O test.xml

grep "recorded at" test.xml

nadl40 · January 20, 2022, 3:32pm

and in an editor

InvisibleMan78 · January 20, 2022, 3:52pm

Thank you very much for your confirmation!

That’s weird, because in my test environment (browser) I don’t get the same result: I don’t see any “recorded at” in the output…

I used your curl command in my Windows 10 command line window and got a 5’740 KB test.xml file. But there is NO “recorded at” in this file… I opened it with Notepad++ and even WinMerge.

How can that be?

jesus2099 · January 20, 2022, 3:53pm

It seems to be one of those releases with more than 500 tracks for which recording-level-rels will not be returned by the web service: https://github.com/jesus2099/konami-command/blob/master/CONTRIBUTING.md#interesting-mb-test-pages

You will need to make two additional calls, ~~one for /ws/2/area?release=1d9867ad-721a-4abb-9a22-41e118531068 and the second for /ws/2/place?release=1d9867ad-721a-4abb-9a22-41e118531068 with pagination offset.~~

Not sure about these examples, you should rather have a look at a real example.

Well, @outsidecontext’s Picard example is even better for you, as you want two entity types at once!

InvisibleMan78 · January 20, 2022, 3:55pm

Thank you @jesus2099 for this hint!

But @nadl40 explained above that his/her test.xml contains the “recorded at”.
Are you saying that a /ws/2 call from a browser does not return the same as a curl command?
If so, why does my curl command under Windows NOT get it?

outsidecontext · January 20, 2022, 4:11pm

As @jesus2099 said the web service omits recording relationships on releases exceeding a certain no. of recordings for performance reasons. Picard is plagued by the same issue.

One solution is to do a separate query for just the recordings in this case:

https://musicbrainz.org/ws/2/recording?release=1d9867ad-721a-4abb-9a22-41e118531068&inc=area-rels+place-rels&limit=100&offset=0

This will give you the missing relationships. But note that this request is paginated and returns a max. of 100 recordings per call. You need to perform additional queries with an increased offset parameter to get all recordings.
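A minimal Python sketch of that pagination loop, using only the standard library. The function names, the User-Agent string and the one-second pause between pages are assumptions made for illustration, not taken from Picard. It also assumes the JSON response keeps the page under a `recordings` key, so check that against a real response.

```python
import json
import time
import urllib.parse
import urllib.request

WS_ROOT = "https://musicbrainz.org/ws/2"
# Hypothetical identifier; replace it with your own application name and contact.
USER_AGENT = "example-recorded-at-fetcher/0.1 ( you@example.org )"


def recording_page_url(release_mbid, offset, limit=100):
    """Build the browse URL for one page of a release's recordings."""
    query = urllib.parse.urlencode({
        "release": release_mbid,
        # urlencode turns the space into "+", giving inc=area-rels+place-rels
        "inc": "area-rels place-rels",
        "limit": limit,
        "offset": offset,
        "fmt": "json",
    })
    return f"{WS_ROOT}/recording?{query}"


def fetch_json(url):
    """Download one URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_recordings(release_mbid, fetch=fetch_json, limit=100, pause=1.0):
    """Collect every page; stop at the first page shorter than `limit`."""
    recordings = []
    offset = 0
    while True:
        page = fetch(recording_page_url(release_mbid, offset, limit)).get("recordings", [])
        recordings.extend(page)
        if len(page) < limit:
            return recordings
        offset += limit
        time.sleep(pause)  # stay polite between requests (assumed interval)
```

Calling `fetch_all_recordings("1d9867ad-721a-4abb-9a22-41e118531068")` would then walk through offsets 0, 100, 200 and so on until a short or empty page comes back.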

I just implemented exactly this for Picard a few days ago: it detects whether recording relationships are missing and, if they are, performs separate queries.

No, you get the same result. The screenshot shown seems to be for a different release, at least I cannot see “St. Jude-on-the-hill” as a recording location for 1d9867ad-721a-4abb-9a22-41e118531068

InvisibleMan78 · January 20, 2022, 4:18pm

Thank you very much @outsidecontext for this confirmation.

It’s a load off my mind. I thought I was going mad.

I will search for your Picard solution to get an idea how this pagination thing works.

jesus2099 · January 20, 2022, 4:26pm

Update after I read above post: Well, @outsidecontext’s Picard example is even better for you, as you want two entity types at once!

Oh @InvisibleMan78, in fact, when I wanted to get works from a 501+ track release, I made another request with recording batches (not the requests as I showed you in my earlier post, but maybe they work):

https://github.com/jesus2099/konami-command/blob/7b92a6d8b0e681fa21e07e5d472a45ce85a399b7/mb_COLLECTION-HIGHLIGHTER.user.js#L512

          // #                                  LOAD WORKS FROM HUNDREDS OF RECORDINGS #
          // ############################################################################
          var mbs12154 = 0; // #### REMOVE WHEN MBS-12154 FIXED // reduce batch size (results in random order, we try to get less than 101 results to keep them all on one page)
          function loadMissingRecordingWorks(recordings, action, _batchOffset, _wsResponseOffset) {
          	var batchOffset = _batchOffset || 0;
          	var wsResponseOffset = _wsResponseOffset || 0;
          	// keep the query URL short enough (100 recordings) to avoid 414 Request-URI Too Large
          	var batchSize = 100;
          	batchSize -= mbs12154; // #### REMOVE WHEN MBS-12154 FIXED
          	var batch = recordings.slice(batchOffset, batchOffset + batchSize);
          	var workQueryURL = "/ws/2/work?query=rid%3A" + batch.join("+OR+rid%3A") + "&limit=" + MBWSSpeedLimit + "&offset=" + wsResponseOffset;
          	if (wsResponseOffset === 0) {
          		modal(true, "Fetching works from " + batch.length + " recordings… ", 0);
          	}
          	var xhr = new XMLHttpRequest();
          	xhr.addEventListener("load", function(event) {
          		if (this.status == 200) {
          			modal(true, this.response.works.length.toString(), 0, {text: "recordings", current: batchOffset + batch.length, total: recordings.length});
          			for (var r = 0; r < this.response.works.length; r++) {
          				if (stuff["work"]) { addRemoveEntities("work", this.response.works[r], action); }
          			}

But I had to limit the number of recordings per batch to avoid pagination, because there is a bug with random order results (MBS-12154, as noted in the excerpt above).
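For anyone not reading userscripts, the batching idea from the excerpt above can be sketched in Python as follows. This is not a port of the script: the helper names are made up, `fmt=json` and `limit=100` are assumptions, and only the `rid:… OR rid:…` query shape and the batch size of 100 come from the excerpt.

```python
import urllib.parse


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def work_search_url(recording_mbids, limit=100):
    """Build a work search URL matching any of the given recording MBIDs."""
    lucene = " OR ".join(f"rid:{mbid}" for mbid in recording_mbids)
    # urlencode escapes ":" as %3A and spaces as "+", as in the script's URL
    query = urllib.parse.urlencode({"query": lucene, "limit": limit, "fmt": "json"})
    return f"https://musicbrainz.org/ws/2/work?{query}"


def work_search_urls(recording_mbids, batch_size=100):
    """One search URL per batch, keeping each URL short enough for the server."""
    return [work_search_url(batch) for batch in chunked(recording_mbids, batch_size)]
```

Keeping each batch at 100 recordings or fewer is what keeps the URL short and, per the script's own comment, is meant to keep the results on a single page.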

nadl40 · January 20, 2022, 4:34pm

Sorry for the mix-up: my curl command had -O instead of -o, so test.xml was left over from another release, from a run where the command was correct.
As per others, you need to paginate.

InvisibleMan78 · January 20, 2022, 4:44pm

For the record - as a side note:
If anyone else tries curl in a Windows 10 command window, you should use

curl https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=media+recordings+artist-credits+artists+labels+artist-rels+release-rels+url-rels+recording-rels+work-rels+recording-level-rels+instrument-rels+place-rels+area-rels+aliases --ssl-no-revoke -o test.xml

Without the additional option --ssl-no-revoke you get an error like

The revocation function was unable to check revocation for the certificate.

outsidecontext · January 20, 2022, 4:52pm

In case it helps, the changes are in Picard pull request #2036 (metabrainz/picard on GitHub), “PICARD-2398: Load recording relationships separately for huge releases” by phw. It also shows how Picard detects whether relationships are present.

But I just realized it may not be that easy to see what it is doing without a deeper understanding of Picard’s internals: the actual request URLs are not directly visible in the code changes, because this just uses the query functionality already present in Picard.

InvisibleMan78 · January 20, 2022, 4:54pm

Thank you for this link!

If you don’t mind, I would ask you again if I need further details.

jesus2099 · January 20, 2022, 5:00pm

To put it simply, when the first recording you browse has no .relations attribute, it means that you didn’t get any recording-level-rels for this release (today’s criterion being more than 500 tracks).

If .relations is an empty array [], you are not in this case. It’s just that recording has no relationships.
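In Python terms, that check could look like the sketch below. It assumes the release was fetched with fmt=json and with recordings included, so that each recording sits under media, then tracks, then recording. The function name is made up for illustration.

```python
def recordings_missing_relations(release):
    """Return the IDs of recordings that have no "relations" key at all.

    An empty list under "relations" means the recording simply has no
    relationships, so such recordings are not reported.
    """
    missing = []
    for medium in release.get("media", []):
        for track in medium.get("tracks", []):
            recording = track.get("recording", {})
            if "relations" not in recording:
                missing.append(recording.get("id"))
    return missing
```

If the returned list is non-empty, the release is one of the huge ones and the relationships have to be loaded separately.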

https://github.com/jesus2099/konami-command/blob/7b92a6d8b0e681fa21e07e5d472a45ce85a399b7/mb_COLLECTION-HIGHLIGHTER.user.js#L435

          		if (missingRecordingLevelRels > 0) {
          			modal(true, concat([createTag("code", {s: {whiteSpace: "pre", color: "grey"}}, "\t└"), " \u26A0\uFE0F " + missingRecordingLevelRels.toLocaleString(lang) + " recordings queued for ", createTag("b", {s: {color: highlightColour}}, "delayed work fetching")]), 1);
          		}
          	}
          }
          function browseTrack(track, action) {
          	var missingRecordingLevelRels = 0;
          	if (stuff["artist"]) { addRemoveEntities("artist", track["artist-credit"], action); }
          	if (stuff["recording"]) { addRemoveEntities("recording", track.recording, action); }
          	if (stuff["artist"]) { addRemoveEntities("artist", track.recording["artist-credit"], action); }
          	if (track.recording.relations) {
          		for (var w = 0; w < track.recording.relations.length; w++) {
          			if (track.recording.relations[w]["type-id"] === "a3005666-a872-32c3-ad06-98af558e99b0") {
          				// is a recording of
          				if (stuff["work"]) { addRemoveEntities("work", track.recording.relations[w].work, action); }
          			}
          		}
          	} else {
          		// no recording.relations: when there are more than 500 tracks, the recording-level-rels are not returned
          		if (stuff["missingRecordingWorks"].indexOf(track.recording.id) < 0) {
          			// add each recording to a list for later later work fetch

nadl40 · January 20, 2022, 5:17pm

Did you try concatenating the paginated XML responses into one document that represents all the data? I’m thinking about the easiest way to fix my API code.

outsidecontext · January 20, 2022, 5:29pm

You’d usually parse the received XML, read the data you need (e.g. a list of recordings) and append these to the already loaded ones.

Personally I’d use JSON over XML (with the fmt=json parameter), it is just easier to work with:

https://musicbrainz.org/ws/2/recording?release=1d9867ad-721a-4abb-9a22-41e118531068&inc=area-rels+place-rels&limit=100&offset=0&fmt=json

In Picard’s case we read the release, detect the missing recording relationships, then load all the recordings separately and inject just the relations key from that into the already loaded release data. That way the rest of the code of reading data from the release stays the same.
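A rough Python sketch of that injection step, working on plain dictionaries decoded from the JSON responses. This is not Picard’s actual code; the function name and the media, tracks, recording layout are assumptions based on the JSON release format.

```python
def inject_relations(release, recordings):
    """Copy "relations" from separately loaded recordings into the release.

    `release` is the decoded release JSON; `recordings` is the combined list
    from all pages of the recording browse request. Only recordings that lack
    a "relations" key are touched. Returns the number of recordings patched.
    """
    relations_by_id = {rec["id"]: rec.get("relations", []) for rec in recordings}
    patched = 0
    for medium in release.get("media", []):
        for track in medium.get("tracks", []):
            recording = track.get("recording", {})
            mbid = recording.get("id")
            if "relations" not in recording and mbid in relations_by_id:
                recording["relations"] = relations_by_id[mbid]
                patched += 1
    return patched
```

After this runs, code that reads `track["recording"]["relations"]` can stay the same whether the release was huge or not.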

Writing this reminds me that I need to check whether plugins can access the loaded release relationships. Probably not, which means I need to refactor this again a bit

InvisibleMan78 · January 20, 2022, 5:37pm

Just so that I have asked:
Would it not be much better to fix it at the source for all participants?

nadl40 · January 20, 2022, 5:47pm

I don’t think this is a bug, it’s a feature to limit runaway API calls.

Doing separate calls is OK, but I try to limit them because of the per-second request limit on the live server; slave servers do not have this limitation. That’s why I try to get as much data as possible in one API call. I need to adjust how I process a paginated release.
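If the extra calls are made against the live server, a small throttle keeps them spaced out. The sketch below is one simple way to do it in Python. The class name is made up and the default one-second interval is an assumption, so check the current MusicBrainz rate limiting rules before relying on it.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each request is enough; the first call returns immediately and later ones sleep only as long as needed.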