How can I get the same results in the API as in the web interface?

obruchez · March 24, 2017, 2:36pm

Sorry for this long post.

Let’s take two examples: “miles davis” and “kenny garrett”.

https://musicbrainz.org/search?query=miles+davis&type=artist&method=indexed

This returns the “real” Miles Davis (score 100), then a relatively unknown bassist also called “Miles Davis” (score 96), and then all the groups including Miles Davis, with scores <90. This is good.

If I call the API:

http://musicbrainz.org/ws/2/artist/?query=miles%20davis

The order is about the same, but the score of the second result is 93 (not 96). This is not a problem.

Now let’s try with “kenny garrett”:

https://musicbrainz.org/search?query=kenny+garrett&type=artist&method=indexed

The “real” Kenny Garrett is first (score 100) and all the other results have a score <50. This is good.

Same result in the API:

http://musicbrainz.org/ws/2/artist/?query=kenny%20garrett

Good.

Now, let’s suppose I made a typo (“kenny garret”):

https://musicbrainz.org/search?query=kenny+garret&type=artist&method=indexed

I still get the right Kenny Garrett first (score 100) and the rest with scores <=52. Good.

But in the API:

http://musicbrainz.org/ws/2/artist/?query=kenny%20garret

I get Kay Garret first (score 100) and Kenny Garrett comes way down with a score of 52.

Let’s try with a fuzzy search:

http://musicbrainz.org/ws/2/artist/?query=kenny%20garret~

Kenny Garrett comes first (score 100). Good! But the second result is an artist called GARREN with a high score of 73. Weird. Let’s remove the typo and search for “Kenny Garrett” again:

http://musicbrainz.org/ws/2/artist/?query=kenny%20garrett~

Kenny Garrett is first (score 100) and the second result is an artist called JGarrett with a high score of 78.

Now back to Miles Davis:

http://musicbrainz.org/ws/2/artist/?query=miles%20davis~

The “real” Miles Davis still comes first (score 100) but the bassist also called “Miles Davis” comes third with a “low” score of 62, which is very different from the score of 96 in the web interface.

And if I prefix the search query with “artist:” or “name:” I get completely weird results:

http://musicbrainz.org/ws/2/artist/?query=artist:kenny%20garrett~
http://musicbrainz.org/ws/2/artist/?query=name:kenny%20garrett~

JGarrett comes first (score 100).
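
For reference, the variants above differ only in how the query string is put together. Here is a minimal Python sketch that rebuilds those URLs; `artist_query_url` and its parameters are names made up for illustration, not part of any MusicBrainz client. As far as I understand Lucene query syntax, a trailing `~` and a `field:` prefix each apply only to the term they touch, so `artist:kenny garrett~` would put the field on “kenny” and the fuzziness on “garrett”, which may explain some of the odd ordering.

```python
from urllib.parse import quote

# Endpoint taken from the URLs in this post.
BASE_URL = "http://musicbrainz.org/ws/2/artist/"


def artist_query_url(terms, field=None, fuzzy=False):
    """Build an artist search URL (hypothetical helper, for illustration).

    terms -- the raw search terms, e.g. "kenny garrett"
    field -- optional field prefix such as "artist" or "name"
    fuzzy -- append "~" to the end of the query string
    """
    query = terms
    if fuzzy:
        # The "~" ends up attached to the last word only.
        query += "~"
    if field:
        # The prefix ends up attached to the first word only.
        query = f"{field}:{query}"
    # Keep ":" and "~" readable; spaces become %20 as in the URLs above.
    return BASE_URL + "?query=" + quote(query, safe=":~")


if __name__ == "__main__":
    print(artist_query_url("kenny garret"))
    print(artist_query_url("kenny garrett", fuzzy=True))
    print(artist_query_url("kenny garrett", field="artist", fuzzy=True))
```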

For me, the most intuitive (“correct”) results are the ones I see on the web interface. How can I call the API to get the exact same results?

Bitmap · March 24, 2017, 5:07pm

Yeah, if you don’t enable advanced query syntax in the web interface (which allows you to specify fields), it tells the search server to perform a “dismax” query, which is a fancy internal method that searches across multiple fields using different weightings.

There’s an undocumented way to enable this through the web service, but the usual caveats apply to using undocumented features: http://musicbrainz.org/ws/2/artist/?query=kenny%20garret&dismax=true
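
A minimal sketch of building that request in Python, assuming the `dismax=true` parameter keeps behaving as described above (it is undocumented, so it may change or disappear). The helper names `dismax_artist_url` and `fetch` are hypothetical, and the User-Agent string is a placeholder; `fetch` is included only to show the shape of the call and just returns the raw response body.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the URL above.
BASE_URL = "http://musicbrainz.org/ws/2/artist/"


def dismax_artist_url(terms, dismax=True):
    """Build an artist search URL, optionally with the undocumented
    dismax=true parameter (hypothetical helper, for illustration)."""
    params = {"query": terms}
    if dismax:
        params["dismax"] = "true"
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def fetch(url, user_agent="example-app/0.1 ( someone@example.com )"):
    """Return the raw response body as text.

    The User-Agent value is a placeholder; replace it with something
    that identifies your own application.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch(url) yourself to hit the server.
    print(dismax_artist_url("kenny garret"))
```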

I’m not sure why it was decided not to enable dismax by default in the WS when no fields are specified (that was before my time), but I guess at this point it would contradict our API docs: search for “Query terms without a field specifier” on the MusicBrainz API/Search wiki page to see the promises we make there (as much as I doubt many people rely on those behaviors).

obruchez · March 27, 2017, 7:24am

Thanks for your answer. This helps a lot!

reosarevok · March 27, 2017, 7:37am

Is there a reason why this is undocumented/unofficial?

ijabz · March 27, 2017, 9:31am

Dismax search didn’t originally exist; it was added later to provide better results for the website search. Nobody really expected it to be used via the API, since it doesn’t allow any choice over which fields are searched.