How to get a huge release from the API, including area and place?

I use this all the time. It returns a large XML document; you can try a smaller release if you want to view it in the browser.
I dumped it to a file and grepped it for “recorded at”:

curl -s https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=media+recordings+artist-credits+artists+labels+artist-rels+release-rels+url-rels+recording-rels+work-rels+recording-level-rels+instrument-rels+place-rels+area-rels+aliases -O test.xml

grep "recorded at" test.xml

and in an editor

Thank you very much for your confirmation!

That’s weird, because in my test environment (browser) I don’t get the same result: I don’t see any “recorded at” in the output… :thinking:

I used your curl command in my Windows 10 command line window and got a 5’740 KB test.xml file. But there is NO “recorded at” in this file… :exploding_head: I opened it with Notepad++ and even WinMerge.

How can that be?

It seems to be one of those releases with more than 500 tracks for which recording-level-rels will not be returned by the web service: https://github.com/jesus2099/konami-command/blob/master/CONTRIBUTING.md#interesting-mb-test-pages

You will need to make two additional calls, one for /ws/2/area?release=1d9867ad-721a-4abb-9a22-41e118531068 and the second for /ws/2/place?release=1d9867ad-721a-4abb-9a22-41e118531068 with pagination offset.

Not sure about these examples, you should rather have a look at a real example.

Well, @outsidecontext’s Picard example is even better for you, as you want two entity types at once!

Thank you @jesus2099 for this hint!

But @nadl40 explained above that his/her test.xml contains the “recorded at”.
Are you saying that a /ws/2 call from a browser does not return the same as a curl command?
If yes, why does my curl command under Windows NOT get it?

As @jesus2099 said, the web service omits recording relationships on releases exceeding a certain number of recordings, for performance reasons. Picard is plagued by the same issue.

One solution is to do a separate query for just the recordings in this case:

https://musicbrainz.org/ws/2/recording?release=1d9867ad-721a-4abb-9a22-41e118531068&inc=area-rels+place-rels&limit=100&offset=0

This will give you the missing relationships. But note that this request is paginated and returns a maximum of 100 recordings per call. You need to perform additional queries with an increased offset parameter to get all recordings.
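
The paging loop described above could be sketched in Python roughly as follows. This is only an illustration, not Picard's code: the function names, the User-Agent string and the injectable `fetch` parameter are made up for the example, and it assumes the JSON browse response carries a `recordings` list plus a `recording-count` total.

```python
import json
import urllib.parse
import urllib.request

WS_ROOT = "https://musicbrainz.org/ws/2"


def recording_page_url(release_mbid, offset, limit=100):
    """Build the browse URL for one page of a release's recordings."""
    query = urllib.parse.urlencode(
        {
            "release": release_mbid,
            "inc": "area-rels+place-rels",
            "limit": limit,
            "offset": offset,
            "fmt": "json",
        },
        safe="+",  # keep the "+" separators of the inc parameter unescaped
    )
    return f"{WS_ROOT}/recording?{query}"


def fetch_json(url):
    """Download one page and decode it (placeholder User-Agent, replace it)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-script/0.1 ( you@example.org )"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_recordings(release_mbid, fetch=fetch_json, limit=100):
    """Request page after page, raising the offset, until all are loaded."""
    recordings = []
    offset = 0
    while True:
        page = fetch(recording_page_url(release_mbid, offset, limit))
        batch = page.get("recordings", [])
        recordings.extend(batch)
        offset += len(batch)
        # Stop on an empty page or once the reported total is reached.
        if not batch or offset >= page.get("recording-count", 0):
            return recordings
```

Calling `fetch_all_recordings("1d9867ad-721a-4abb-9a22-41e118531068")` would then issue one request per 100 recordings. A real client should also pause between requests to respect the server's rate limit.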

I implemented exactly this for Picard a few days ago: it detects whether recording relationships are missing and, if they are, performs separate queries.

No, you get the same result. The screenshot shown seems to be for a different release, at least I cannot see “St. Jude-on-the-hill” as a recording location for 1d9867ad-721a-4abb-9a22-41e118531068

Thank you very much @outsidecontext for this confirmation.

It’s a load off my mind. I thought I was going mad. :wink: :upside_down_face:

I will search for your Picard solution to get an idea how this pagination thing works.

Update after I read above post: Well, @outsidecontext’s Picard example is even better for you, as you want two entity types at once!


Oh @InvisibleMan78, in fact, when I wanted to get works from a 501+ track release, I made another request with recording batches (not the requests as I showed you in my earlier post, but maybe they work):

But I had to limit the amount of recordings to avoid pagination:

Because there is a bug with random order results:

Sorry for the mix-up: my curl command had -O instead of -o, and my test.xml was from another release, fetched when the command was correct.
As the others said, you need to paginate.

For the record - as a side note:
If anyone else wants to try curl in a Windows 10 command window, you should use

curl https://musicbrainz.org/ws/2/release/1d9867ad-721a-4abb-9a22-41e118531068?inc=media+recordings+artist-credits+artists+labels+artist-rels+release-rels+url-rels+recording-rels+work-rels+recording-level-rels+instrument-rels+place-rels+area-rels+aliases --ssl-no-revoke -o test.xml

Without the additional option --ssl-no-revoke you get an error like

The revocation function was unable to check revocation for the certificate.

In case it helps, the changes are in “PICARD-2398: Load recording relationships separately for huge releases” by phw (Pull Request #2036, metabrainz/picard on GitHub). It also shows how Picard detects if relationships are present.

But I just realized it may not be that easy to see what it is doing without a deeper understanding of Picard’s internals: the actual request URLs are not directly visible in the code changes, because it just uses the query functionality already present in Picard.

Thank you for this link! :+1:

If you don’t mind, I would ask you again if I need further details.

To put it simply: when the first recording you browse has no .relations attribute, it means that you didn’t get any recording-level-rels for this release (today’s criterion being more than 500 tracks).

If .relations is an empty array [], you are not in this case. It’s just that the recording has no relationships.
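
As a rough Python illustration of that check, assuming the release was fetched with fmt=json and has the usual media → tracks → recording nesting (the function names are invented for this sketch):

```python
def first_recording(release):
    """Return the first recording found in the release, or None."""
    for medium in release.get("media", []):
        for track in medium.get("tracks", []):
            recording = track.get("recording")
            if recording is not None:
                return recording
    return None


def recording_rels_missing(release):
    """True if the web service left out the recording-level relationships.

    A missing "relations" key means the relationships were omitted;
    an empty list only means this recording has no relationships.
    """
    recording = first_recording(release)
    return recording is not None and "relations" not in recording
```

The test is deliberately on key presence, not truthiness, so that `"relations": []` is not mistaken for the omitted case.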

Did you try concatenating the paginated XML files into one that represents all the data? I’m thinking about the easiest way to fix my API.

You’d usually parse the received XML, read the data you need (e.g. a list of recordings) and append these to the already loaded ones.
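
For the XML route, a minimal Python sketch of "parse each page and append" might look like this. It assumes each browse page wraps its recordings in a recording-list element under the MusicBrainz mmd-2.0 namespace; the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by MusicBrainz XML responses (assumed for this sketch).
MMD_NS = {"mb": "http://musicbrainz.org/ns/mmd-2.0#"}


def recordings_from_page(xml_text):
    """Parse one browse page and return its <recording> elements."""
    root = ET.fromstring(xml_text)
    return root.findall("mb:recording-list/mb:recording", MMD_NS)


def collect_recordings(pages):
    """Append the recordings of every page to one list, in page order."""
    collected = []
    for xml_text in pages:
        collected.extend(recordings_from_page(xml_text))
    return collected
```

This keeps the data as parsed elements instead of gluing the raw XML files together, which would otherwise produce a document with several root elements.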

Personally I’d use JSON over XML (with the fmt=json parameter), it is just easier to work with:

https://musicbrainz.org/ws/2/recording?release=1d9867ad-721a-4abb-9a22-41e118531068&inc=area-rels+place-rels&limit=100&offset=0&fmt=json

In Picard’s case we read the release, detect the missing recording relationships, then load all the recordings separately and inject just the relations key from that into the already loaded release data. That way the rest of the code of reading data from the release stays the same.
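
The injection step could be sketched like this in Python. It is not Picard's actual code, only an illustration with a hypothetical `inject_relations` helper working on the JSON structures, where `recordings` is the combined list obtained from the separate recording requests.

```python
def inject_relations(release, recordings):
    """Copy the "relations" key of separately loaded recordings into the
    matching recordings of an already loaded release (modified in place)."""
    relations_by_id = {r["id"]: r.get("relations", []) for r in recordings}
    for medium in release.get("media", []):
        for track in medium.get("tracks", []):
            recording = track.get("recording")
            if recording and recording.get("id") in relations_by_id:
                recording["relations"] = relations_by_id[recording["id"]]
    return release
```

Only the relations key is touched, so code that reads titles, lengths or artist credits from the release keeps working unchanged.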

Writing this reminds me that I need to check whether plugins can access the loaded release relationships. Probably not, which means I need to refactor this again a bit :smiley:

Just so that it has been asked:
Would it not be much better to fix it at the source for all participants? :zipper_mouth_face:

I don’t think this is a bug, it’s a feature to limit runaway API calls.

Doing separate calls is OK, but I try to limit them because of the per-second rate limit on the live server (slave servers do not have this limitation). That’s why I try to get as much data as possible in one API call. I need to adjust how I process a paginated release.
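
If the extra paginated calls are a concern because of that limit, one simple option is to space the requests out. Below is a small Python sketch of such a throttle; the one-second default interval is an assumption for the example (check the current API documentation for the real limit), and the class is purely illustrative.

```python
import time


class Throttle:
    """Make sure successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed value, not an official figure
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each web service request is enough; the first call returns immediately and later ones sleep only as long as needed.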

It’s not a bug; it’s made to limit CPU and memory consumption on the server side.

Hi @outsidecontext, I think I have to do this instead of my works search, as it allows treating all recording relationships, not only works.
I just need works for the moment, but already have another feature in mind to search for certain linked recordings too.

But could you tell me whether /ws/2/recording?release=<release-mbid>&inc= does NOT have the same MBS-12154 bug as /ws/2/work?query=rid%3A<recording-mbid>+OR+rid%3A<recording-mbid>…&inc= ?

Hopefully not, as you are using it in Picard, and as mine is a search (query=…) while yours is a browse.

Yes, exactly. If you do a search you get weighted results from the search server, ordered by how the search server rates the match.

Also, as you said, getting all recordings for a release is the only way to get all the relationships as you would get them for the release request.
