AcousticBrainz: Making a hard decision to end the project

Hi everyone,
Thanks for your interesting discussion on AcousticBrainz over the last few weeks. We’ve written a blog post outlining some of our reasons for shutting down the project, the final steps that we’re taking, and a few ideas about our future plans for recommendations and other things in the MetaBrainz world.


So what will happen to AcoustIDs now that AcousticBrainz is no more? I still rely on them when deciding to merge recordings.

AcoustID is a separate project and unrelated to AcousticBrainz. It is also not a MetaBrainz project and is operated separately, only closely related. Lukas, who develops and operates AcoustID, was the main developer on Picard 1.0.


Okay, so this is rather disappointing. My first question would be: since this is a MusicBrainz open source project, isn’t the first step to discuss the issues and see whether the community could help resolve them, rather than just announcing out of the blue that the project is going to be shut down?

The most novel thing about AcousticBrainz was that a dataset of the acoustic characteristics of music was being created, and that it was accessible to users without requiring them to actually have the music. If this is removed then applications will just go back to calculating the data locally and not sharing that information, as I think you are suggesting with Picard.

The musical key data that we were generating was accurate on some styles of music, but not on the full range of music that we collected in AcousticBrainz. The BPM tools work well on a wide range of music, but there are many recordings for which the predicted value is incorrect. The data that is generated by these algorithms is unable to indicate a confidence level of the predicted value, and so we are unable to determine which data we can trust.

Just because the data is not perfect does not make it useless. It seems the project is essentially being closed down because the data is not quite good enough and Universitat Pompeu Fabra is not interested in using it anymore. But perhaps the algorithm can be adjusted to improve the situation, and if it works for most popular music then that is good enough for many scenarios. In fact, it sounds like you already know how to fix some things, based on your future work comments.

One of the problems with AcousticBrainz was that a data download has never been provided. If it had been, it would have enabled users to more easily work with the data and clean it up (as I have done with AcoustID, removing about 100,000 records from the AcoustID database). And there has always been a distinct lack of communication, with it seeming that AcousticBrainz is just a funnel for providing data to the University, with no real engagement between AcousticBrainz and MusicBrainz in general.

So why not open it up now? Of course, if you don’t want to work on it yourself that is fine, but perhaps someone else would like to. Perhaps the data could be moved to a different server if MusicBrainz no longer wants to host it. After all the hours that users have put into processing their music files, it doesn’t seem right to just shut down a project like this.

Also, personally I don’t understand the fascination with music recommendation, whether it is based on the music sounding similar or on users’ listening habits, so the ListenBrainz project is a lot less interesting to me than the AcousticBrainz project. Being able to group songs that I already like by BPM, mood, et cetera is more useful than getting recommendations for bands that I don’t know and probably won’t like that much even if I do listen to them.


This is absolutely true, and was the main motivation for starting AcousticBrainz in the first place. However, if you’re unsure that the BPM is correct, if the key is wrong, and if there are no clear models to identify other musical characteristics such as genre and mood, then what value is this information?

To be clear, this was not a decision by UPF; it was made by MetaBrainz. As we mentioned in the blog post, we don’t have the skills within MetaBrainz to make these improvements to the algorithms. Even small improvements often take months or years of work by researchers who have years of experience in a narrow field such as BPM detection.
So for example, in this case we don’t know how to improve BPM detection in general, but we do know that there has been work in the last 5 years that has improved the current state of the art, and we’re thinking about how we can take advantage of this research to get MusicBrainz users to help us build a smaller database of useful information.

Yes, you’re right here. I know that many people have asked for dumps and even though we said that we were working on it, it was something that we never managed to get working in a good way, and I’m sorry that we weren’t able to get this to people when they asked. We did spend time getting the bulk API working quickly, and it looks like many people are using this, although I know that this isn’t useful for all possible tasks.

This is definitely an option. If a third party wants to host a version of AcousticBrainz then the source code is available, and as we mentioned in the blog post we will make a dump available.


When do you think a dump would be available? It would be great if we could have it soon, as I would like to do a bit of analysis on the data we have and see what could be done with it, for example looking at how much disparity there is between datasets that are meant to be for the same recording.
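As a rough idea of the kind of disparity check meant here, a minimal sketch follows. The dump format has not been published, so the flat list of (recording ID, BPM) pairs, the sample values, and the function name `bpm_spread` are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample: (recording MBID, submitted BPM) pairs. The real dump
# layout is not known yet, so this flat structure is an assumption.
submissions = [
    ("rec-a", 120.1),
    ("rec-a", 119.8),
    ("rec-a", 60.2),   # a half-tempo estimate for the same recording
    ("rec-b", 98.0),
    ("rec-b", 98.3),
    ("rec-c", 140.0),  # only one submission, so nothing to compare
]

def bpm_spread(rows):
    """Return {mbid: max BPM - min BPM} for recordings with 2+ submissions."""
    by_recording = defaultdict(list)
    for mbid, bpm in rows:
        by_recording[mbid].append(bpm)
    return {
        mbid: max(values) - min(values)
        for mbid, values in by_recording.items()
        if len(values) > 1
    }

spread = bpm_spread(submissions)
```

Recordings with a large spread (such as `rec-a` in the made-up sample) would be the ones worth looking at by hand first.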

Is the source code for the part that calculates the BPM available? If so, can you point me towards it? I probably won’t understand it, but I would like to have a look anyway.

If there are other tools that can calculate BPM more accurately, then I still think there is value in adjusting the AcousticBrainz submitter to use the better BPM calculation and having submissions go to a central point.

Yes, that’s part of the plan. The details need to be defined, but as mentioned in the blog post, finding ways to calculate BPM reliably is one of the goals. There have already been initial discussions about the possibility of integrating the necessary libraries into Picard and using it as a GUI client for users to generate and submit this data (as was also intended with the AB client tools).

The quote below from the blog read to me as saying that Picard would provide a tool to calculate BPM, but only to allow users to tag BPM locally for their own files. It didn’t sound like the BPM would be submitted to a central location. Have I misunderstood that? And if it was to be sent to a central location, surely the obvious place would be a modified version of AcousticBrainz?

Use some improved tools to compute specific musical characteristics. We have been reviewing some of the recent work in tempo estimation and are looking to see how we can integrate it with tools such as Picard so that we can allow people to compute these features if they need them, and help us confirm that the computed data is correct.

I hate to see this being shut down. I rely on this source for my ever-growing music collection.

Might I ask which information exactly you had been using from AcousticBrainz, and how well you considered the data quality?

Here’s a sample of the information.

I am satisfied with the data quality.
I’d rather have what is supplied from AcousticBrainz than not have the metadata I have supplied now.

Note: Some information is using other plug-ins but most are pulled from AcousticBrainz.

Are you thinking of MusicBrainz in general? The only data I see there that comes from AcousticBrainz is key and bpm.

Yes. I use the metadata for key and bpm.

I’m hoping to work on it this month.

This is using Essentia’s RhythmExtractor2013 algorithm (the “degara” variation, source code). There is a tutorial on how to use this in Python.

Right - to follow up @outsidecontext’s comment here, the idea that we had discussed was to use Picard to submit this data back somehow. We haven’t yet worked out exactly what this looks like. Perhaps it’ll be integrated into MusicBrainz or it might be part of a rebooted AcousticBrainz. This implementation detail isn’t sorted out yet.


Thanks for the description of what data you are using. Do you often use this data to filter or search within your music collection? How often do bad results cause problems with your search query?

Worth noting I also use AcousticBrainz to store other data, such as identifying whether music is instrumental, but I haven’t done more than a cursory check of the results.

Been thinking about this some more, there are two elements to the AcousticBrainz project:

  1. A central database of musical characteristics.
  2. Tools to submit the data

Point 1 works okay; the only real problem is that the data needs to be cleaned. Where there are multiple submissions for the same track, they need rationalising to choose the best value. This is one reason why I would like a download, so I can properly look at this.
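One simple way this rationalisation could work is sketched below: take the median of the submitted values for each recording and flag the recordings where submissions disagree. This is only an assumption about a possible approach, not how AcousticBrainz does anything; the `consensus_bpm` name, the 2 BPM tolerance and the sample rows are all invented for illustration.

```python
from collections import defaultdict
from statistics import median

def consensus_bpm(rows, tolerance=2.0):
    """Pick one BPM per recording and report whether submissions agree.

    rows: iterable of (mbid, bpm) pairs (hypothetical layout).
    Returns {mbid: (median_bpm, agreed)}, where agreed is True only if
    every submission lies within `tolerance` BPM of the median.
    """
    by_recording = defaultdict(list)
    for mbid, bpm in rows:
        by_recording[mbid].append(bpm)

    result = {}
    for mbid, values in by_recording.items():
        mid = median(values)
        agreed = all(abs(v - mid) <= tolerance for v in values)
        result[mbid] = (mid, agreed)
    return result

# Made-up sample: "x" has one half-tempo outlier, "y" has one submission.
rows = [("x", 120.0), ("x", 121.0), ("x", 60.0), ("y", 90.0)]
chosen = consensus_bpm(rows)
```

The median is used rather than the mean so that a single half-tempo or double-tempo outlier does not drag the chosen value away from the majority.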

Point 2: it sounds like the algorithm is not reliable, although elements of it work most of the time, but it could be decoupled from point 1 and incrementally improved. E.g. another tool could be used to generate the BPM, and the BPM value for a particular mbrecordingid could then be updated without affecting any other data.
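To make the decoupling idea concrete, here is a minimal sketch in which each feature is stored separately per recording, together with the tool that produced it, so that one value can be replaced on its own. The storage layout, the `update_feature` helper and the tool names are hypothetical and are not taken from the AcousticBrainz schema.

```python
def update_feature(store, mbid, feature, value, source):
    """Overwrite a single feature for one recording, leaving the rest alone."""
    record = store.setdefault(mbid, {})
    record[feature] = {"value": value, "source": source}
    return record

# Hypothetical existing data for one recording.
store = {
    "rec-a": {
        "bpm": {"value": 60.2, "source": "old-extractor"},
        "key": {"value": "A minor", "source": "old-extractor"},
    }
}

# A different (hypothetical) tool supplies a new BPM; the key is untouched.
update_feature(store, "rec-a", "bpm", 120.0, "new-bpm-tool")
```

Recording the source next to each value would also make it possible to tell later which tool a given number came from.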

So I think the project should be continued, but it is clear the MusicBrainz team is more interested in this recommendation idea at the moment.

If the alternative is that the AcousticBrainz project just dies, what I could do is host a version of the project that allows new data to continue to be submitted and makes the data available to download. There would be no cost, but I wouldn’t provide an API for getting data because the bandwidth costs would be too high; instead, consuming applications could periodically download the database from the site and serve their users from their own servers.
