With the recent launch of Solr to beta (read all about the improvements here: MB Search Overhaul), I have received a lot of tickets on the Solr ticket tracker asking why a particular search doesn’t return the expected results.
Since I am expecting a lot more of these, please remember that we cannot possibly cover every case. What we can do is tune our search config to perform as well as possible. To that end, I am creating this thread as a one-stop discussion/request thread for improving boosts.
To help, please post your search improvement requests in the following format:
Link to the beta site search with the query (to see Solr’s results).
Link to the main site search with the query (to see our old search server’s results).
Expected result.
An explanation of why you think your expected result should score higher (answering questions like: is your query in the entity’s alias? Its comment? Its sort-name?).
The above format, although not strictly necessary, will be really useful to me, as it lets me debug things a lot faster and improve our search results.
@reosarevok and @Freso will be helping me sort out any editing-specific doubts and bridge the gap between the search server code and how you as an editor expect it to work.
The search results for works are less relevant on the beta server with the new Solr search.
When searching for “aria Final Fantasy VI” I want this work, but instead I get a list of works that are far from what I am looking for.
Compare the results: Search results - MusicBrainz Search results - MusicBrainz
The way this is now on the beta I’m less likely to find the work I’m looking for. In this case I am counting on the search to look at the disambiguation as well.
There is no aria final fantasy vi in aliases.
There is an “aria” alias, but “final fantasy vi” is a parent work.
Would you like parent work titles and aliases to be taken into account in work search too?
I like that the results are now more specific to the search; personally, that is why I used to use the direct search often.
I just searched the artist index for fred and was a bit surprised that it returned Fred Astaire, Fred Frith, Fred de Fred and Fred Steiner before any artists just called Fred.
I know it was decided to boost popular artists over less popular ones (which doesn’t seem very MusicBrainz, which is why I never did it), but shouldn’t exact name matches come before partial matches? A search for Fred is a better match for Fred than for Fred Astaire.
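To make the request above concrete, here is a minimal Python sketch of "exact name first, popularity second" ordering. It is not how the Solr config works; the `(name, popularity)` data shape, the popularity numbers and the `rank_artists` helper are all made up for illustration.

```python
def rank_artists(query, candidates):
    """Order candidates so exact name matches precede partial ones.

    `candidates` is a list of (name, popularity) pairs. Both the data
    shape and the popularity scores are hypothetical.
    """
    q = query.casefold()

    def sort_key(item):
        name, popularity = item
        is_exact = name.casefold() == q
        # False sorts before True, so `not is_exact` puts exact matches
        # first; within each group, higher popularity comes first.
        return (not is_exact, -popularity)

    return sorted(candidates, key=sort_key)


# Illustrative data only: the popularity values are invented.
artists = [
    ("Fred Astaire", 95),
    ("Fred Frith", 80),
    ("Fred", 10),
    ("Fred Steiner", 40),
]
ranked = rank_artists("fred", artists)
```

With this ordering the artist named just "Fred" comes first despite its low (invented) popularity, and the partial matches keep their popularity order after it.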
Searching for “XXme” doesn’t find an artist entered as “XX:me”. Not really sure if that’s a major bug or something easy to fix, but I imagine that kind of behaviour will fool people into believing that the thing they’re searching for doesn’t exist in the database.
Also, this should probably be updated at some point.
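For the “XXme” versus “XX:me” case, one possible approach is to compare names after stripping punctuation. The Python below is only an illustration of that idea; in Solr this kind of folding would normally live in the field's analyzer configuration, and `normalize_name` is a hypothetical helper, not part of the search server.

```python
import unicodedata


def normalize_name(name):
    """Fold case and drop punctuation and whitespace.

    After this, "XX:me" and "XXme" reduce to the same string.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Keep only letters and digits; ":" and spaces are discarded.
    return "".join(ch for ch in folded if ch.isalnum())


same = normalize_name("XX:me") == normalize_name("XXme")
```

The trade-off is that aggressive stripping also makes genuinely different names collide, so a real config would probably index both the original and the folded form and boost the original.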
While I felt the old search was a bit odd to use, giving me dozens (sometimes hundreds) of unrelated results, including results that I can’t even figure out how or why they could be included…
The new search could maybe be expanded a little. For example:
I searched confederate railroad, but I misspelled it. My search, confederate railrold, gave me zero results.
While zero results is technically correct, a loosening of the criteria could be useful, especially when considering simple name variations (john vs jon or smith vs smithe).
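The kind of loosening described above is usually done with edit distance: the number of single-character insertions, deletions or substitutions needed to turn one string into another. Lucene's classic query syntax has a fuzzy operator (`term~`) built on the same idea; whether and how the beta search should use it is not something this thread settles. The sketch below is plain Python, and `fuzzy_match` and `max_distance` are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, names, max_distance=1):
    """Return names within `max_distance` edits of the query."""
    q = query.casefold()
    return [n for n in names if edit_distance(q, n.casefold()) <= max_distance]


names = ["Confederate Railroad", "Confederate Yankee"]
hits = fuzzy_match("confederate railrold", names)
```

All three examples from the post (railrold/railroad, john/jon, smith/smithe) are one edit apart, so a `max_distance` of 1 would already catch them.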
Something is wrong with paging. If you run the query above and switch from page 1 to page 2 and back several times, you will see that the content of each page is not stable. Or you can simply refresh the page several times: in about 50% of cases it will display different records.
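One possible cause of pages shuffling like this (an assumption, not a diagnosis of the beta server) is results with identical scores and no tie-breaker, so their relative order can differ between requests. A common remedy is to sort by score and then by a unique key. A small Python sketch, with hypothetical `id` and `score` fields:

```python
def page(results, page_number, page_size):
    """Return one page after sorting by score (descending), then by id.

    The id tie-breaker makes the order fully deterministic, so a given
    page always holds the same records.
    """
    ordered = sorted(results, key=lambda r: (-r["score"], r["id"]))
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]


# The same three records arriving in two different orders.
first_request = [
    {"id": "b", "score": 1.0},
    {"id": "a", "score": 1.0},
    {"id": "c", "score": 2.0},
]
second_request = list(reversed(first_request))

page_one_a = page(first_request, 1, 2)
page_one_b = page(second_request, 1, 2)
```

In Solr terms the equivalent would be a `sort` parameter along the lines of `score desc, id asc`, assuming the index has a unique, sortable id field.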
Have you got some tests for this new search? If not, you really need some; otherwise you’ll find that as you try to improve things for one type of search, you will inadvertently make the results worse for another.
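As a rough idea of what such tests could look like, here is a Python sketch that keeps a table of queries with the result expected near the top and reports every query that no longer meets its expectation. Everything here is hypothetical: `stub_search` stands in for a real call to the search server, and the expectations are taken from cases mentioned in this thread.

```python
# (query, name expected within the top N results, N)
EXPECTATIONS = [
    ("fred", "Fred", 1),
    ("confederate railroad", "Confederate Railroad", 1),
]


def check_expectations(search, expectations):
    """Return a list of failure descriptions; an empty list means all passed."""
    failures = []
    for query, expected, top_n in expectations:
        top = search(query)[:top_n]
        if expected not in top:
            failures.append(
                f"{query!r}: expected {expected!r} in top {top_n}, got {top}"
            )
    return failures


def stub_search(query):
    """Stand-in for the real search; returns canned result names."""
    canned = {
        "fred": ["Fred Astaire", "Fred Frith", "Fred"],
        "confederate railroad": ["Confederate Railroad"],
    }
    return canned.get(query, [])


failures = check_expectations(stub_search, EXPECTATIONS)
```

Run after every boost change, a table like this shows straight away when fixing one query pushes another one's expected result down the list. With the canned data above, the "fred" expectation fails and the other passes.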
I am not sure what’s wrong with giving correct results? The de burgh search finds more results simply because of the fact that those results match the terms. The correct result is still number 1 either way.
The de Burgh search appears to return any name which includes a portion of my search string - unless I enclose it in double inverted commas. That’s probably what it is programmed to do. I have no problem with that.
What I found strange was that it gave me just the correct name when I searched for what was in effect an incorrect name.
My local tax bill hasn’t been paid for years. They keep sending the tax bill to the wrong person. A simple misspelling of my name voids my obligation to pay them. Correct address. Phonetically acceptable version of my name. But it is spelled wrong.
Correct results are awesome.
But a simple spelling discrepancy between de burgh and deburgh shouldn’t be excluding other spelling discrepancies.
Think about it: if I search for the spelling used in a news article and get no results, my first instinct is not to try other spellings, since the article I am reading uses that spelling. I am going to add a new artist entry.
But, if I got a list of results that are not exact, but are at least close, my first instinct is to open them to see if there could be a misspelling - especially if some of the other information is filled in (disambiguation, dates, places, etc).
In my ideal world, at Big Rock Candy Mountain, I can set a sliding level switch anywhere from “exact search term” to “very very fuzzy search term” and have the results morph in front of my eyes.