GSoC 2017: Search for Entities

musicbrainz
gsoc
gsoc-2017

#1

Personal information

Nickname: ListMyCDs.com
IRC nick: ListMyCDs
Email: musicbrainz@listmycds.com
GitHub: https://github.com/listmycds

Proposal

The current search function for edits is cleverly done: you can easily make complex searches without knowing anything about the system behind it. There’s no need for special IT skills or knowledge of search engine syntax because it’s mostly self-explanatory. You can also easily share the URL of your search. Lately I’ve been wondering whether we could have a similar search function for all our entities. I’m proposing a replacement for the current entity search (the one in the search menu).

For example searching for works could look like this:

or searching for releases:

The dynamic form ensures that only search conditions relevant to the selected entity are offered. I plan to include most (if not all) attributes of the entities in MB and make them searchable. There would also be an “All Entities” option for searches that apply to every entity (for example, URL).

The UI (using React) would follow a style similar to the existing edit search. The query would be constructed from the user’s selections in the form and then sent to the Solr search server.
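To make the idea more concrete, here is a minimal sketch of how form selections could be turned into a Lucene-style query string and a shareable URL. It is only an illustration under assumptions: the field names in `FIELDS_BY_ENTITY`, the helper names, and the URL layout are hypothetical and not taken from the actual Solr schema or from MusicBrainz Server. The real UI would be written with React; Python is used here only to keep the sketch short.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical mapping from entity type to the fields the form would offer.
# The real field names depend on the Solr schema; these are assumptions.
FIELDS_BY_ENTITY = {
    "work": {"work", "type", "lang", "iswc"},
    "release": {"release", "country", "format", "barcode"},
}


@dataclass
class Condition:
    field: str
    value: str
    negate: bool = False  # corresponds to an "is not" choice in the form


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(entity: str, conditions: list[Condition], match_all: bool = True) -> str:
    """Combine the form's conditions into one Lucene-style query string."""
    allowed = FIELDS_BY_ENTITY[entity]
    clauses = []
    for cond in conditions:
        # Only conditions that belong to the selected entity are accepted.
        if cond.field not in allowed:
            raise ValueError(f"{cond.field!r} is not a search field for {entity!r}")
        clause = f"{cond.field}:{quote_phrase(cond.value)}"
        clauses.append(f"NOT {clause}" if cond.negate else clause)
    return (" AND " if match_all else " OR ").join(clauses)


def search_url(base_url: str, entity: str, query: str) -> str:
    """Build a shareable URL; the path layout here is an assumption."""
    return f"{base_url}/{entity}/select?{urlencode({'q': query, 'wt': 'json'})}"


conditions = [Condition("work", "Finlandia"), Condition("lang", "fin", negate=True)]
print(build_query("work", conditions))
# prints: work:"Finlandia" AND NOT lang:"fin"
```

Because the whole search is encoded in the query string, the resulting URL can be shared the same way an edit search URL can be shared today.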

Even though I believe this search function is quite self-explanatory, it wouldn’t hurt to have wiki pages with some pictures explaining the functionality. What is self-explanatory for me might not be for everyone else. Creating these wiki pages is also part of this proposal.

I plan to work closely with the community and make changes based on feedback when necessary.

Timeline

May 5 – May 29
During the community bonding period I would get familiar with the codebase and solve some tickets. I would have discussions with the mentor to be sure that we both understand what I’m planning to do. I need to be ready when coding officially starts. I would bond with the community and start participating in the weekly developer meetings.

May 30 – June 11
Search for works.

June 12 – June 25
Search for releases & release groups.

June 26 – July 2
Search for artists. Phase 1 Evaluation.

July 3 – July 16
Search for labels.

July 17 – July 30
Search for places, recordings & standalone recordings. Phase 2 Evaluation deadline.

July 31 – August 6
Search for series.

August 7 – August 20
Searches for “all entities” (url, annotations, mbid…).

August 21 – August 29
Final mentor evaluation. I will keep this as a small buffer in case there are any delays. If everything is already done I will keep fixing tickets.

Detailed information about yourself

  • Tell us about the computer(s) you have available for working on your SoC project!
    3 workstations, 2 laptops, 3 tablets, Raspberry Pi 3. For SoC I would use a Phenom II X6 1090T (6 x 3.2 GHz), 16 GB RAM, 200 GB SSD, 12 TB on multiple NAS drives.

  • When did you first start programming?
    I was about 7 years old when I got my first computer (Commodore SX-64). Computer magazines were then full of BASIC games, and I typed in many of those and also wrote my first “Hello, World!”. I had some fun with Pascal (later with Delphi) during the 90s and started this century with some Java. I’ve mostly coded simple stuff for my own use, plus a couple of minor websites which aren’t online anymore. As part of my bachelor’s degree (Computer Sciences/Turku University) I mostly used Java. Now, during my master’s degree (Data Sciences/Turku University), I’m mainly writing Python.

  • What type of music do you listen to? (Please list a series of MBIDs as examples.)
    John Williams (53b106e7-0cc6-42cc-ac95-ed8d30a3a98e), Jean Sibelius (691b0e9d-9e57-41cf-932d-a3d21b068e75), Kurt Atterberg (0b607587-9e4e-4ff9-b200-af563678ae2f) to name some of my favourites. You can spy on me at ListenBrainz. I mostly listen to film scores and classical music and attend concerts frequently (I live next to a concert hall). I sometimes listen to jazz, and while drunk I accept most musical styles.

  • What aspects of MusicBrainz interest you the most?
    I’ve done almost one million edits on MB and spend time on the site daily. Still, I’m not sure I have one major interest, and I can’t name any single aspect. I like editing for sure. And I should have a T-shirt with “I love relationships” xD

  • Have you ever used MusicBrainz to tag your files?
    I have about 130 000 FLAC files tagged with MBIDs.

  • Have you contributed to other Open Source projects?
    This would be my first time with any open source project.

  • If you have not contributed to open source projects, do you have other code we can look at?
    Hopefully these snippets give some idea what I’ve been doing lately: [1] [2] [3]

  • What sorts of programming projects have you done on your own time?
    Most of the programming has been related to my studies.

  • How much time do you have available, and how would you plan to use it?
    I’m planning to use about 40 hours per week (I need to have some time for editing too!), but when necessary I can easily do more than that. If there are any delays in the planned timeline I would naturally put in some extra hours.

  • Do you plan to have a job or study during the summer in conjunction with Summer of Code?
    If selected, this would be my only activity and I wouldn’t accept any other jobs during the project.


#2

A couple of words outside the proposal:

I’ve installed MB server on my dedicated server (I love those cheap Kimsufi servers!) and as part of my studies I’ve used a private git repository, so I already have a little experience working with git. I believe I wouldn’t have any major problems with the development tools. I will fix a couple of tickets during the next week, but right now I’m just too sick to do that.

Quite often I have reported bugs which could have been fixed in a couple of minutes by someone with knowledge of the system and all the tools available. Hopefully I will soon have enough knowledge to fix minor bugs by myself. I’m also really interested in AcousticBrainz and hope to soon be skilled enough (my studies in data analysis have just started) to play with the data.

Discussion with the community isn’t limited to discussion about this proposal. I’m open to feedback and discussion during the development process and after it as well. I hate to post this proposal so late, but I’ve been so sick lately that I couldn’t have done it earlier.


#3

I like the idea of the proposal, and I’m happy to see a longtime editor apply for GSoC and maybe becoming a volunteer developer as well!

You are aware that MBS is a Perl codebase?


#4

Sure. I have almost zero experience with Perl, but before posting my proposal I did check how the current edit search has been implemented with it. I’m sure I will survive even though there’s a lot to learn. I guess I should have used the word “mainly” instead of “only”. I originally meant to say that in my studies I’m only using Python. I will change that word in the proposal.

Thanks for the feedback!


#5

Thanks for your proposal. The entity search deserves to be more accessible indeed.

You are aware that the search server is going to be switched from Lucene to Solr?

Actually, it is the outcome of a previous GSoC. I don’t know the exact implications of it, if any, for your proposal, but it should probably be discussed with @Gentlecat.


#6

Thanks for your feedback!

That’s what I understood from searching the IRC logs earlier. Because I’m not aware of the timetable for this, I proposed direct database access for the implementation (because it would work with both the current and the future system). If necessary, I’m willing to make the changes needed to make it work with Solr. I’m also willing to work on the Solr configs if some necessary entities or attributes aren’t currently indexed. I believe the current Lucene query syntax can be used with Solr, so we can keep using the same syntax for indexed search if necessary.


#7

Thanks for the proposal! I like the idea of course, but I’m worried it’s far too similar to a previous GSoC project that was completed in 2015 by Ruchiranga Wickramasinghe (but hasn’t been merged yet):

https://tickets.metabrainz.org/browse/MBS-7789

In that project’s case, the UI is different from what you’ve proposed and it queries the search server rather than the database directly. But, it already provides a UI to set multiple conditions and select type information. We wouldn’t want to build a whole 'nother one of these pages that mostly does the same thing.

It would definitely be a project of GSoC-size to complete the above project (mostly cleanups, missing functionality, and usability issues), add a direct-search backend so that it works without the search server, and change the UI to be closer to the edit search — that’s a lot of work. But I’m also not sure if it makes sense to implement a direct SQL backend at this point, when the new Solr-based search server will have nearly live index updates.

That said, I wouldn’t be opposed to a proposal that aimed to complete the previous project and changed the UI. Note that this would require some knowledge of Perl, but mostly a lot of JavaScript (including the React library). How is your JavaScript knowledge?

It would also be great to hear from you on IRC in #metabrainz if you want to learn more about the previous project and better understand the status/requirements (I’ve already done some work on it myself, which you would take over).


#8

After sleeping on it, I decided not to make a new proposal about completing the earlier search project.

Instead of a highly motivated summer with my own creation, I would most likely just be frustrated with this project. It seems that fixing someone else’s work could take as much time as writing something completely new. Without high motivation the end result would be just good and not excellent, as MB would deserve.

I updated a couple of words in my earlier proposal: the search would be made with Solr and React would be used for the UI. I understand it might have a small chance of getting accepted, but I will try my luck with it. Instead of using your developer resources for the earlier project you might decide to have another one sponsored by Google :wink:

Thank you all for giving me your valuable time!


#9

Thanks Timo. :slight_smile: I understand your feeling unmotivated to complete someone else’s work BTW, but I also communicated that poorly. 90% of the previous project was UI code, and since you’d be replacing the UI, a lot of it would be your creation.

But, there are parts of the previous project which you shouldn’t rewrite. Mostly the boring parts. For example, the React components for the actual search results listings were already written, and they’d be the same in either project. The components themselves are quite isolated and easy to reuse.

Otherwise, even the code that constructs the actual queries (whether SQL or Lucene) probably makes sense to write from scratch in your case, since the fields would work completely differently. I’m only saying you should keep the parts that you can.


#10

No worries, I also failed to ask the right questions. Within the given time frame I didn’t have time to get a full picture of the earlier project. If I’m accepted I will make a complete analysis of what has already been done and identify the reusable parts. I believe I could also learn from some earlier mistakes. I’m sure we could settle on a well-working compromise between my proposal and the earlier one.


#11

Yeah, I like what I am reading, I think this is a good agreement to reach. Onward!