Hi guys, I’ve been having a look at the v2 API (it’s awesome!), but I can’t help feeling there’s an issue with the scoring. When running the following search to find Nirvana’s Nevermind:
The API scores it joint second after “Nevermind Sessions” (100) and on par with “Nevermind, It’s an Interview” (91). Both of those are by Nirvana, but obviously not the most relevant.
At first glance it looks like a document length normalisation problem, as the longer documents are ranked higher, but the fact that Nevermind is ranked as 91 and not 100 makes it seem like there’s something else up with the algorithm. I’m not familiar with the Lucene scoring algorithm so I can’t really comment further, but is this not considered a fairly big issue?
From a programmatic point of view, it seems pretty beneficial to have this scoring as accurate as possible; otherwise the result set needs to be filtered again on the client side to determine which result is actually most relevant.
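To show what I mean by filtering on the client side, here’s a rough Python sketch. The dict shape (`title`, `score`) and the `rerank` helper are made up for illustration and are not the API’s actual response format; the titles and scores are just the ones from the search above.

```python
def rerank(results, wanted_title):
    """Put exact (case-insensitive) title matches first, then order by API score."""
    wanted = wanted_title.strip().casefold()
    return sorted(
        results,
        # False sorts before True, so exact matches come first;
        # the negated score keeps higher scores ahead within each group.
        key=lambda r: (r["title"].strip().casefold() != wanted, -r["score"]),
    )


# Hypothetical, simplified result list using the scores quoted above.
results = [
    {"title": "Nevermind Sessions", "score": 100},
    {"title": "Nevermind, It's an Interview", "score": 91},
    {"title": "Nevermind", "score": 91},
]

best = rerank(results, "Nevermind")[0]
print(best["title"])  # prints "Nevermind"
```

It works, but it’s exactly the kind of second pass I’d rather not have to write if the score already reflected how closely the title matches the query.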
Apologies if I’m ranting nonsense, I’ve only been poking at the API for a couple of nights.
Edit: It’s been brought to my attention that the following search gives a better result:
But the above question still stands, just to a lesser extent I guess: shouldn’t the query match the document more accurately?