Search API for AcousticBrainz

kartikgupta0909 · March 2, 2016, 11:45am

The aim of this API is to let users download data or run search queries. It should be very simple to use: the user gives values for a few properties of a song and gets the matching results, for example all songs in a specific key. The results should be returned as JSON, and the user should be free to choose how many songs are returned.
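
A minimal sketch of the kind of query described above, assuming an in-memory list of records. The function name `search_recordings`, the field names, and the record IDs are all hypothetical placeholders and are not part of any existing AcousticBrainz API.

```python
import json

# Hypothetical sample records; the field names and IDs are placeholders,
# not the real AcousticBrainz document layout.
RECORDINGS = [
    {"id": "rec-1", "key": "C", "scale": "minor", "bpm": 120},
    {"id": "rec-2", "key": "D", "scale": "minor", "bpm": 95},
    {"id": "rec-3", "key": "C", "scale": "major", "bpm": 140},
    {"id": "rec-4", "key": "C", "scale": "minor", "bpm": 88},
]


def search_recordings(records, limit=10, **criteria):
    """Return up to `limit` records matching every given property, as JSON."""
    matches = [
        rec for rec in records
        if all(rec.get(field) == value for field, value in criteria.items())
    ]
    return json.dumps(matches[:limit])


if __name__ == "__main__":
    # Ask for at most two songs in C minor.
    print(search_recordings(RECORDINGS, limit=2, key="C", scale="minor"))
```

Here the caller picks the properties to match as keyword arguments and caps the result count with `limit`; a real service would run the same filter against a search index rather than a Python list.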

alastairp · March 3, 2016, 10:36am

Search is something that I’ve really been wanting in AcousticBrainz for a long time. In fact, we already made a start on some experiments using Elasticsearch: https://github.com/MTG/acousticbrainz-labs/tree/master/search

I don’t think an API is the only output that we should provide for a search project. If we index things like high- and mid-level features (genres, bpm, key, etc.) we should be able to generate graphs from this data too. We made some graphs a long time ago with some early data (http://blog.musicbrainz.org/2014/11/21/what-do-650000-files-look-like-anyway). It’d be cool to be able to automatically generate this kind of data in real time. We had some success integrating Kibana with the Elasticsearch database to do this kind of thing.
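
As a rough illustration of the data behind such a graph, here is a sketch that builds an Elasticsearch request body counting recordings per genre and per BPM bucket. It uses the standard `terms` and `histogram` aggregations; the field names `genre` and `bpm`, the aggregation names, and the helper `bpm_histogram_by_genre` are assumptions and are not taken from the acousticbrainz-labs experiments.

```python
import json


def bpm_histogram_by_genre(genre_field="genre", bpm_field="bpm", interval=10):
    """Build a request body that counts recordings per genre and BPM bucket."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "by_genre": {
                "terms": {"field": genre_field},
                "aggs": {
                    "bpm_buckets": {
                        "histogram": {"field": bpm_field, "interval": interval}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(bpm_histogram_by_genre(), indent=2))
```

The resulting dictionary would be sent as the body of a search request against whichever index holds the features; the bucket counts in the response are what a chart would plot.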

An API would be useful in our Datasets project. We currently have to type in or copy MusicBrainz IDs into a text field to add songs to a class in the dataset. It would be cool to be able to search for a track by its name, or even search for a release or artist to add all matching recordings to a class.
This search could also match recordings which match certain tags. For example, we have tags on recordings in MusicBrainz which look like genres. Would we be able to say “find all recordings tagged with ‘rock’ and add them to the rock class in our genre dataset”?
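
The "find all recordings tagged with ‘rock’" case could look roughly like the sketch below. It assumes a dataset is a plain dictionary mapping class names to lists of recording IDs; the helper `add_tagged_to_class`, the sample recordings, and their IDs are hypothetical.

```python
# Hypothetical recordings with MusicBrainz-style tags; IDs are placeholders.
RECORDINGS = [
    {"id": "rec-1", "title": "Song A", "tags": ["rock", "live"]},
    {"id": "rec-2", "title": "Song B", "tags": ["jazz"]},
    {"id": "rec-3", "title": "Song C", "tags": ["Rock"]},
]


def add_tagged_to_class(dataset, class_name, recordings, tag):
    """Add the ID of every recording carrying `tag` to a dataset class."""
    wanted = tag.lower()
    members = dataset.setdefault(class_name, [])
    for rec in recordings:
        # Tags are compared case-insensitively, so "Rock" matches "rock".
        has_tag = any(t.lower() == wanted for t in rec["tags"])
        if has_tag and rec["id"] not in members:
            members.append(rec["id"])
    return dataset


if __name__ == "__main__":
    genre_dataset = {}
    add_tagged_to_class(genre_dataset, "rock", RECORDINGS, "rock")
    print(genre_dataset)
```

Matching tags case-insensitively is a design choice made for this sketch, since user-submitted tags tend to vary in capitalisation.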

Have you looked at the output of the lowlevel and highlevel feature extractors? What specific list of descriptors do you think would be a good idea to include in a search index? For each of these descriptors can you suggest one or two examples of why someone might want to search for these values, and what they would do with the results once they have them?

kartikgupta0909 · March 3, 2016, 11:09am

Beyond an API, I am thinking we could do something like data warehousing, which would give users much deeper insight into the AcousticBrainz data. I have looked at the low-level and high-level features (in fact, I am using similar features in my academic research), and they seem well suited to data warehousing. For example, a user might want to see a view based on (genre, bpm) values. I am thinking of materialized view selection, where we materialize only those views from which all other views can be derived. This would let a user see different graphs based on different features, with a wide variety of graphs to choose from. It could be applied to the mid-level features; applying it to the low-level features might produce so many views that we could not even store them.
This approach would help users look at the data in very different ways and gain good insight into it.
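
To make the materialized view idea concrete, here is a small sketch, assuming only one fine-grained (genre, bpm bucket) view is stored and the coarser per-genre and per-bpm views are derived from it. The sample rows and the function names `materialize_base_view` and `roll_up` are hypothetical, and the sketch does not implement the view selection step itself.

```python
from collections import Counter

# Hypothetical (genre, bpm) pairs standing in for mid-level features.
ROWS = [("rock", 124), ("rock", 128), ("jazz", 96), ("jazz", 121)]


def materialize_base_view(rows, bucket=10):
    """Count recordings per (genre, bpm bucket); the one view that is stored."""
    return Counter((genre, (bpm // bucket) * bucket) for genre, bpm in rows)


def roll_up(view, dimension):
    """Derive a coarser view by summing the base view over the other dimension."""
    coarse = Counter()
    for key, count in view.items():
        coarse[key[dimension]] += count
    return coarse


if __name__ == "__main__":
    base = materialize_base_view(ROWS)
    print(roll_up(base, 0))  # counts per genre
    print(roll_up(base, 1))  # counts per bpm bucket
```

Both coarser views come from the stored one without touching the raw rows again, which is the property that makes it a candidate for materialization.
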
On search for the datasets feature: this is one of the places where I thought search could be applied. While creating a dataset, rather than entering MBIDs, the user should be able to run a text search and choose from the results. This can be extended to all the mid-level features. For example, the user might want to create one class in a dataset containing all the songs in the key of Cm, and another class containing all the songs in Dm. This would make datasets much easier to use.

alastairp · March 11, 2016, 11:07am

Be careful about suggesting too many project ideas. You have some interesting thoughts, and they also reflect some of the ideas that we have had for the project, but you should give some concrete ideas if you want to discuss this project further.
This could include things like a particular use case of exploring data to end up at a particular value, or a description of the kind of data you want to show (e.g. you talk about graphs but don’t talk about the types of graphs or the data that you want to show).
If you want to continue with this idea, please be more detailed in the idea discussion.