Search results improvements

samj1912 · July 11, 2018, 3:50am

It is possible to control the fuzziness of searches.
For example, if you are not sure whether an artist is Jon or John, you can do something like this:

https://musicbrainz.org/search?query=artist%3AJon~1&type=artist&limit=25&method=advanced

Similarly, for cases like Rick, Rich, Ricky and Richy (just a warning: it will also match things like Dick, Nick, etc.), you want:

https://musicbrainz.org/search?query=artist%3ARick~2&type=artist&limit=25&method=advanced

Just change the value after the “~” to set how fuzzy you want your searches to be.
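As far as I know, the number after “~” in Lucene/Solr query syntax is a maximum edit distance: how many single-character insertions, deletions or substitutions a term may differ by (Lucene caps it at 2). A toy Python sketch of that idea, using plain Levenshtein distance on lowercased strings, shows why Rick~2 also catches Dick and Nick. It is only an approximation of what Solr does: Lucene's fuzzy query also counts a swap of two adjacent letters as a single edit by default, and the real index applies its own text analysis, neither of which this sketch models. The helper names and the candidate list are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions), case-insensitive."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, candidates: list[str], max_edits: int) -> list[str]:
    """Hypothetical helper: keep candidates within max_edits of term, like term~max_edits."""
    return [c for c in candidates if edit_distance(term, c) <= max_edits]


names = ["Jon", "John", "Joan", "Rick", "Rich", "Ricky", "Richy", "Dick", "Nick", "Richard"]
print(fuzzy_matches("Jon", names, 1))   # ['Jon', 'John', 'Joan']
print(fuzzy_matches("Rick", names, 2))  # ['Rick', 'Rich', 'Ricky', 'Richy', 'Dick', 'Nick']
```

With a maximum of 1 edit, Richy drops out of the Rick results, since it needs two edits (k to h, plus an added y).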

bsammon · July 11, 2018, 4:54am

[haven’t been following this closely, but quickly skimming this thread I get the impression I should post specific search problem instances here]

Searching for artist/artists named “Clay”:
https://musicbrainz.org/search?query=clay&type=artist&method=indexed
The artists whose name is exactly “Clay” are scattered across the first two pages of results (and beyond? I didn’t check). This seems a definite regression compared with the old search engine, where all the exactly-“Clay” artists would be at the top.

I’m reminded of Google search in recent years – Google seems to prioritize “we guess what you misspelled” over accurate/predictable results for power users who type their search terms correctly.

samj1912 · July 11, 2018, 5:00am

Let me address the issues you have. First of all, I failed to reply to your comment simply because I have been on vacation for the last 2 weeks and away from my workstation. I replied to all the comments I could from my phone, when it suited me. So please be patient in case you are not getting any replies.

Tests - I agree with you that we lack unit tests for search results, and yes, I will be fixing that. We have focused most of our efforts on getting a version of Solr into production that can provide delta index updates and reduce the load on our database and servers. We planned on fixing search results and relevance as routine maintenance once Solr was in production, and this included unit tests. For the other side of things, like the response writer and the index updater, we have ample unit tests, which can be found here and here.

Using “Solr defaults” -

Search is a difficult thing to get right, and people assume it’s easy since the only benchmark they have to compare with is Google, which is amazing at this. As I am sure you know, it’s not that easy. If you find something missing, it’s because we have had to make trade-offs, not because we were lazing around and stuck to “Solr defaults”. Heck, it is anything but defaults. We have an entire repo dedicated just to our own custom Solr configuration, mbsssss, with over 100 commits in the last 6 months alone. It took a lot of time and effort to reach the current standard (as is evident from the commit history).

“Not enough time to give feedback” -
I have been working solely on the search results for the last 6 months, ever since the launch of Solr on our test website. In fact, I even asked for help and feedback on our blog back in January. Community members who have been involved with the process know about it, and members like @reosarevok, @ApeKattQuest_MonkeyPython, @Freso, @Leo_Verto, @rob, @rdswift and others have been continuously helping me with testing and improvements, and the results, let me tell you, have improved a lot. (In fact, the “!!!” case was reported long before you did: https://chatlogs.metabrainz.org/brainzbot/metabrainz/2018-05-18/?msg=4179451&page=2 )

Also, to put things in perspective, I am currently the only person working on Search, and have been for the last year. It’s very unfair of you to assume that I can reply to each and every improvement and suggestion.

Which brings us to the last part “Moving things away from the ticket tracker” -

Since you cannot put search results and relevance in absolute terms, where one result is entirely right and another entirely wrong, I wanted to funnel all improvements to search results to the forum first, so that experienced users and editors (and people who have a basic understanding of how our search is expected to work, e.g. @reosarevok) can give their opinion. If there is a community consensus that it is indeed a needed improvement (e.g. https://tickets.metabrainz.org/projects/SOLR/issues/SOLR-89) and not something users have come to expect as a by-product of the ill-tuned old search (e.g. https://tickets.metabrainz.org/browse/SOLR-86 ), we can then make a related ticket on the ticket tracker (as we have done with SOLR-89, to allow partial matches). This is also the reason I originally closed your SOLR-88: it felt like a subjective “improvement”, and I wanted community opinion on it before filing it as a “bug” like you did, because a side effect of fixing it would be worse results for queries like https://musicbrainz.org/search?query=bach&type=artist&limit=25&method=advanced

Also, your reported “bug” is similar to the Clay search I responded to below, and can be handled with the following query: https://musicbrainz.org/search?query="fred"&type=artist&limit=25&method=indexed

There is another reason for doing so: Discourse accounts are linked directly to users’ MB accounts, so they can simply sign in to Discourse via OAuth. This is in contrast with Jira, where a person has to sign up for a new account, which a lot of everyday users won’t take the effort to do.

As I wanted widespread feedback and a collective place for it, I chose Discourse.

This doesn’t mean that our SOLR ticket tracker is not to be used anymore. In fact, all community-confirmed search improvements and any and all bugs related to any part of our Solr search are to be filed on Jira (which they are, as you can see at https://tickets.metabrainz.org/projects/SOLR/ ).

So please ask for information and wait for a reply before making false accusations and spreading wrong and disruptive information in the community about the dev. team. We are an organization that puts community and community feedback above all else, and their opinion is extremely valuable to us. Please do not assume otherwise. If there are delays in response from any member of the dev. team, it’s simply because we are stretched thin with other, more pressing issues.

samj1912 · July 11, 2018, 5:05am

If you want more exact results, simply search with quotation marks -

https://musicbrainz.org/search?query="clay"&type=artist&limit=25&method=indexed
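Links like the ones in this thread can also be built programmatically. A small Python sketch, assuming the `query`, `type`, `limit` and `method` parameters visible in these URLs are all that is needed (`search_url` is a hypothetical helper, not part of any MusicBrainz library): `urllib.parse.urlencode` percent-encodes the straight quotation marks as `%22` and the colon in `artist:` as `%3A`, while leaving `~` as it is.

```python
from urllib.parse import urlencode

BASE = "https://musicbrainz.org/search"


def search_url(query: str, entity: str = "artist", limit: int = 25, method: str = "indexed") -> str:
    """Hypothetical helper: build a MusicBrainz website search link from its query parameters."""
    params = {"query": query, "type": entity, "limit": limit, "method": method}
    # urlencode uses quote_plus: spaces become "+", quotes "%22", colons "%3A".
    return f"{BASE}?{urlencode(params)}"


print(search_url('"clay"'))                             # exact-name search
print(search_url("artist:Jon~1", method="advanced"))    # fuzzy search
```

The second call reproduces the Jon~1 fuzzy-search link from the start of this thread.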

samj1912 · July 11, 2018, 5:08am

Well, as @reosarevok said, it’s for both. You can mark an alias as a “search hint” so that it appears in the search results.

For example see https://musicbrainz.org/instrument/b3eac5f9-7859-4416-ac39-7154e2e8d348/aliases

It includes misspelt versions of “piano” among the aliases, filed under “search hint”.
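Conceptually, a search hint helps because the alias text is searchable alongside the entity’s main name. A toy Python model of that idea: the `Alias`/`Entity` classes and the exact-match `search` function are hypothetical (the real search is fuzzy and scored), and the misspellings below are made-up examples rather than the actual aliases on the piano page.

```python
from dataclasses import dataclass, field


@dataclass
class Alias:
    name: str
    kind: str  # alias type label, e.g. "Search hint" (hypothetical representation)


@dataclass
class Entity:
    name: str
    aliases: list[Alias] = field(default_factory=list)


def search(entities: list[Entity], query: str) -> list[Entity]:
    """Return entities whose main name or any alias equals the query (case-insensitive)."""
    q = query.casefold()
    return [
        e for e in entities
        if e.name.casefold() == q or any(a.name.casefold() == q for a in e.aliases)
    ]


instruments = [
    Entity("piano", [Alias("paino", "Search hint"), Alias("pinao", "Search hint")]),
    Entity("harpsichord"),
]

print([e.name for e in search(instruments, "Paino")])  # ['piano']
```

The catch, as is pointed out later in the thread, is that every such variant has to be added by hand for each entity.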

samj1912 · July 11, 2018, 5:12am

This is a UI bug unrelated to Solr. It was fixed recently: https://github.com/metabrainz/musicbrainz-server/pull/692

aerozol · July 11, 2018, 7:05am

If MB wants to be treated like a hobbyist project, then I can see why devs would take negative feedback personally, but if it wants to be taken seriously, then it can deal with criticism as a professional organisation, especially with a big rollout.

Asking people to wait an indefinite time to possibly, possibly not, get a reply will periodically lead to frustration. Taking it personally isn’t helping you or the contributor wanting input. A comms policy would be nice…

Freso · July 11, 2018, 8:09am

I think it’s one thing to “deal with criticism” when the criticism is warranted, and another to “deal with criticism” when the criticism is critiquing moot points or making personal jabs (e.g., implying that @samj1912 has been lazy about the development).

Do you feel @samj1912, @reosarevok and others are not responding to actual concerns/critiques in this thread?

Maybe make a new topic with more details about what you’d like to see from this for further discussion. This is definitely way off topic for this discussion here.

ijabz · July 11, 2018, 1:28pm

Thanks for replying.

Just to be clear, I have never said or implied that you are lazy; I know you are probably the most productive of all the MusicBrainz developers. My point was that you had skipped writing comprehensive unit tests, and IMO you should have written the tests before releasing to production, not after; ideally you would have written them as you went along with development.

Regarding replying to comments: on 15th June you announced that Search was going into beta. One week later, on 22nd June, you announced that you were happy with the Search beta and that you would release it a week later if there were no major problems. Then on 30th June it was released. I read the IRC chatlogs, where you said you were away for one week, so I assumed you were on holiday for one week, and I also assumed you would not still be on holiday when search was released on 30th June. So I think it was reasonable for me to assume you were no longer on holiday, as you made no announcement to the contrary. Furthermore, it isn’t very good planning to ask for comments and then do the final release while you are on holiday; that doesn’t give you a chance to properly consider the requests.

As you say, Search isn’t easy, and I know because I wrote the one being replaced. My point about defaults was that it was only the !!! issue that caused you to reuse the MusicbrainzTokenizer; until that point you were using standard tokenizers/analysers, so I stand by that point.

Okay, so you asked for help in January, but that was too early for most testers; it was too buggy at that point for meaningful testing. It was only when you released the beta a few weeks ago that Search was ready for the majority of users, so my point is that almost as soon as you got it working quite well, you released it.

Regarding the !!! issue, that was only reported on 18th May, but it doesn’t seem it was going to get fixed until I raised it as an issue. This demonstrates the fundamental weakness of mentioning issues informally in Discourse threads rather than using a bug tracker: they are much more likely to get lost. I don’t really think it is unfair to expect a reply to each suggestion; I think I managed this when I did search, and I was doing it unpaid in my own time. But I would have been more patient if I had been allowed to log them as issues, since I would not then be concerned that they would get lost in thread clutter.

You have just said you closed SOLR-88 because it was a subjective improvement. Few things are totally objective, but I don’t think this is really subjective: people have posted versions of the same issue in this thread. I still find it rude that you closed it on JIRA without giving any explanation.

I don’t see what is false about my accusations. Maybe I shouldn’t have used the word ‘bothered’, but you haven’t written any tests that check the quality of search results, and you haven’t done much customization. Also, the same people who criticize my tone on Discourse are happily criticizing me personally (and, implicitly, the old search) on IRC, so there are a lot of double standards here.

ijabz · July 11, 2018, 1:34pm

I don’t know what you mean by moot points. I was never implying that Sam was lazy, only that he had wrongly skipped an important part of development, perhaps in eagerness to release as early as possible.

Freso · July 11, 2018, 1:43pm

This has indeed been the/a stated goal for a while, as the new search performs a lot better technically – it handles more searches, in less time/using less resources. The old search has broken down several times recently and caused issues with the whole software stack (AIUI), so the whole team keeping the servers running has been eager to replace it. As @samj1912 has said, development of the search is not done, but it was (and is) in a state where it is a plausible replacement for the old server, which will save (and is saving) @zas and others handling system administration a lot of headaches and frustration.

psychoadept · July 11, 2018, 2:15pm

That’s fair, although it still requires adding search hints individually for EVERY Richard/Robert/Elizabeth/etc., whereas a built-in system for catching common name variations would cover all current and future instances without the ongoing need to decide whether a search hint is needed.

reosarevok · July 11, 2018, 2:31pm

I definitely agree that it would be ideal to at least recover the old option to search for “I Surname” to find Initial Surname. Not sure how doable it would be to go further: names can be abbreviated in many ways and I feel it might make more sense to expect the user to search for Bob Tables and Bobby Tables if Robert Tables doesn’t work (and add them as aliases if they’re common enough for Mr. Tables, or really, if the user can be bothered!).
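A rough sketch of the “initial plus surname” matching described above, in Python. `matches_initial` is a hypothetical helper, not how the old or new search actually implemented this: a one-letter query token matches any name token starting with that letter, and every other token must match exactly. It also shows the limit mentioned here: “Bob Tables” still does not find Robert Tables.

```python
def _tokens(text: str) -> list[str]:
    # Case-fold, treat full stops as separators ("J." -> "j"), split on whitespace.
    return text.casefold().replace(".", " ").split()


def matches_initial(query: str, name: str) -> bool:
    """Hypothetical helper: does e.g. "J Smith" match "John Smith"?"""
    q, n = _tokens(query), _tokens(name)
    return len(q) == len(n) and all(
        nt.startswith(qt) if len(qt) == 1 else nt == qt  # initials match a prefix
        for qt, nt in zip(q, n)
    )


print(matches_initial("J. Smith", "John Smith"))       # True
print(matches_initial("Bob Tables", "Robert Tables"))  # False
```

Catching nicknames like Bob for Robert would need a separate list of name variants, which is the harder part discussed above.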

dashv · July 11, 2018, 3:33pm

The tone of this discussion seems to have abated, so I will leave that alone. As a user and editor, I want to say thanks to the staff for their hard work. Things happen, things will get fixed, life will go on; let’s be constructive.

culinko · July 12, 2018, 12:52am

Is this really the intended approach? This is exactly the same issue I mentioned above in my post, and I would expect not to need quotation marks in this case. Many artists’ performance names consist of just one word, and I don’t think you should have to use quotation marks for every search you perform just to get the most relevant results at the top.

Edit: Even using the quotation marks, it doesn’t show the desired results. It’s a bit better, but still not ideal. AFAIK the old search would show you the exact matches first, then the rest, without the need for quotation marks at all. Could you check what the results were in the old search? This is how it looks currently:

Searching riot:

Searching “riot”:

samj1912 · July 12, 2018, 4:18am

Okay, since there seems to be a consensus that exact matches are more important than “popular” matches, I have updated the artist boosting algorithm to weigh exact matches more heavily.

Try now

However, this means that popular matches are now a bit worse, and searching for “bach” no longer puts him in first place. https://musicbrainz.org/search?query=bach&type=artist&limit=25&method=indexed

This is what I get without using quotes -

And this is what I get with quotes

However, I think this is a good enough compromise since all the relevant results are shown on the first page itself even without the use of quotes, while keeping popular artists like JSB for “Bach” searches still on the first page.

justcheckingitout · July 12, 2018, 4:29am

But, for the most part, the old search gave those results. I say “most part” because, even though I got far too many results, there were some instances where a John/Jon/Johnny/Johnnie variant did not show up.

And I feel the need to repeat this so that people don’t think that we are complaining or that we do not appreciate the changes – I absolutely love that I search John Smith and do not get 100 unrelated results.

But a few “exact results only” can be just as bad as too many unrelated results.

samj1912 · July 12, 2018, 4:33am

For those wanting to test the old search: it is now available at https://test.musicbrainz.org/ so that we can compare the two and improve the new search.

samj1912 · July 12, 2018, 4:49am

Just updated the scoring algorithm for this too. Now the search returns fuzzier results.

Compare
https://musicbrainz.org/search?query=Chris+de+burgh&type=artist&limit=25&method=indexed

vs

https://musicbrainz.org/search?query=Chris+deburgh&type=artist&limit=25&method=indexed

culinko · July 12, 2018, 4:51am

Thank you for improving the search results, the work you do is greatly appreciated.

However, I have to disagree with

I still think you should get all of the exact matches before any of the partial matches (it’s interesting to see “Riot V” still above the rest of the Riots even when using quotes). The search algorithm is also used when adding/editing entities, and the search area only shows the first 10 results until you click “show more”. Here is how it looks now:

I can usually find the artist I am looking for within the first 10 results, unless they share the same name with many other artists. The issue I have is that using quotes for almost every artist I search for seems like a lot of extra work that I wasn’t required to do before. I am still curious how the results looked in the old search for the artist Riot. Is there a way for you to check that, please?
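The ordering asked for here, all exact matches before any partial ones, can be sketched as a two-level sort in Python. The `Hit` class, the scores and the `exact_first` helper are hypothetical, and this is not how the Solr boosting described earlier works; it only illustrates the rule itself.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    score: float  # relevance score from the engine (made-up values below)


def exact_first(hits: list[Hit], query: str) -> list[Hit]:
    """Put hits whose name equals the query first; within each group, highest score first."""
    q = query.strip().strip('"').casefold()  # treat "riot" and riot the same
    # False sorts before True, so exact matches (name == q) come first.
    return sorted(hits, key=lambda h: (h.name.casefold() != q, -h.score))


hits = [Hit("Riot V", 95.0), Hit("Quiet Riot", 70.0), Hit("Riot", 90.0), Hit("Riot", 80.0)]
for h in exact_first(hits, '"riot"'):
    print(h.name, h.score)
```

The same rule has the downside described earlier in the thread: for the query bach, every artist named exactly “Bach” would be placed above Johann Sebastian Bach, whatever the relevance scores say.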