Discontinuation of AcousticBrainz (pre-announcement)

Thanks, I just canceled a reply I was writing. I read through the log and hope there will be a good summary of the project and its possible future plans. I would have no problem resubmitting data in the future from a newer/better version (if that happens). I have read here about the processing MrClon has done, and I think there is a commitment in the community to collecting and processing this kind of data. I just started on it later than others.

If there are future plans, I would like to see GPU support discussed, as well as the effect on the data of lossless vs. lossy sources (at different bit rates).

I hope the talk about a new, better AB will not remain just talk. I believe a database of track features is essential for a good recommendation system, and a good free recommendation system is needed so that big tech corporations don't monopolize music.


We’re not planning on replacing AB with a new project that tries to extract data from the audio streams. Instead, we’re going to focus on mining more data from the LB listens (e.g. via collaborative filtering), as well as collecting some data in a more traditional manner (e.g. moods in the style of tags/genres).

What kicked off this whole thing was a proof of concept I wrote for recording-to-recording similarity based on user listening history. Early testing is happening here:


This is bleeding edge, really. Working on it when not posting here. :slight_smile:

That isn’t sustainable for a recommendation system. No one has listened to a new track → no one gets this track recommended → no one listens to the track. If a user listens to unpopular music (e.g. some local, non-English scene), what can collaborative filtering offer them? What can they gain from it? Why should such a user start or keep submitting listens to LB? It’s the «1000 greatest hits of all time» trap.
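To make that feedback loop concrete, here is a minimal Python sketch of item-to-item similarity computed purely from co-listening. It is not the ListenBrainz implementation (which isn't described in this thread); the toy listen data, the recording names and the cosine-over-shared-listeners scoring are all hypothetical, chosen only for illustration. A recording nobody has played yet ends up with no neighbours at all, so a purely listen-based recommender has no way to surface it.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy, hypothetical listen history: user -> set of recordings they listened to.
listens = {
    "alice": {"hit_a", "hit_b", "local_1"},
    "bob": {"hit_a", "hit_b"},
    "carol": {"hit_a", "hit_c"},
}
# Everything in the catalogue, including a release nobody has played yet.
catalog = {"hit_a", "hit_b", "hit_c", "local_1", "new_track"}


def item_similarity(listens):
    """Cosine similarity between recordings, based only on shared listeners."""
    listeners = defaultdict(int)  # recording -> number of users who played it
    shared = defaultdict(int)     # (rec_a, rec_b) -> users who played both
    for recs in listens.values():
        for rec in recs:
            listeners[rec] += 1
        for a, b in combinations(sorted(recs), 2):
            shared[(a, b)] += 1
    return {
        (a, b): n / sqrt(listeners[a] * listeners[b])
        for (a, b), n in shared.items()
    }


def similar_to(rec, sims):
    """Recordings that share listeners with `rec`, most similar first."""
    neighbours = []
    for (a, b), score in sims.items():
        if rec == a:
            neighbours.append((b, score))
        elif rec == b:
            neighbours.append((a, score))
    return sorted(neighbours, key=lambda item: item[1], reverse=True)


sims = item_similarity(listens)
# Recordings with no co-listened neighbours can never be recommended this way.
cold = sorted(rec for rec in catalog if not similar_to(rec, sims))
print(cold)  # ['new_track']
```

Adding more users who play the popular tracks only strengthens the existing pairs; `new_track` stays cold until someone plays it alongside another recording.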

That’s a hard way to get a small quantity of low-quality data, unless you have a really smart plan for organizing people en masse for this job. I think manual feature extraction can only be productive as data for (re)training or correcting automatic extraction. I think the ideal option is to collect some kind of low-level audio data and extract high-level features from it with an algorithm that can be iteratively improved through users’ manual input. I’m not sure that’s possible without storing music files on MB servers.


Just tossing my pennies into this fountain (although I feel we’ve drifted from the original post): could this data not be useful within the ListenBrainz project?

Yes, this is a good point and is definitely one of the values that content-based recommendation can bring. Unfortunately at the moment we don’t have the resources or research tools available to produce such a system that we’re happy with distributing.
We know that the MusicBrainz community has a strong history of contributing metadata about music, including for less common stuff. We hope that we can rely on this community to help us build a good initial set of data. Perhaps in the future we can take this data and see what else we can do with it.

Part of the limitation of the current version of AcousticBrainz is that much of the data is also of low quality. For example, we have only a few models that work with “genre”, and they all have a very small number of categories. In comparison, we were able to tag over a million new recordings with hundreds of genre tags by collecting user-contributed data, so there is definitely value in this approach.

Yes, absolutely. Part of the wind-down of AB will involve seeing which parts we can reuse in other MetaBrainz projects, and ListenBrainz is definitely a candidate to receive some of these parts. ListenBrainz is also being used as a source of data for similar tasks; see, for example, the preliminary discussion about recording similarity using ListenBrainz data.

Do you plan to start using streaming sources to extract music characteristics?

It could be useful in a number of MetaBrainz contexts, as well as to other people. Once we get these debugged to our liking, we’ll release public dumps.

No, we’re not going to work with raw audio anymore; it isn’t part of our core skills. We’d hoped to harness what music researchers created and scale that up, but all the algorithms we were given produced nothing of use. Fixing this properly would be a huge investment, which we don’t have the resources for.

You guys know how to make a sad music nerd smile :grin:

Thanks for not just binning the data off.

What kind of contributions do you expect from the MB community now?

Adding data to MusicBrainz, writing reviews on CritiqueBrainz, submitting listening data to ListenBrainz. :slight_smile:

The question is: what is the competitive advantage of MB compared to other sites? Why should people use MB, and why should people become contributors who keep the MB database up to date and appealing to sponsors?

As an example, contributors use Discogs for its marketplace. Discogs is used by people who find its content in search results and can use this information through its user-friendly website; MB basically doesn’t appear in search results, and the issues with its user interface have been discussed in several threads.

Discogs is losing its grip as the market moves from physical to digital media. Streaming platforms are fed by labels and artists in real time. People can find and (most importantly) listen to music, helped by recommendation engines and news based on statistics, preferences, playlists and the behavior of similar users. Is there anything like this planned for MB?

In MB you have different platforms for managing preferences and listening statistics, with no way so far to make them interact. Is this going to change?

Will the listening data be sent using the MB player, adding complexity and losing the features provided by the original streaming platforms’ players? Or can derived data, provided by users who agree to connect those platforms to MB, allow for better performance than the original sites offer?

Working on original data with AcousticBrainz could have been a differentiator that motivated people to contribute to MB: now I don’t see a new one.

For me, one particular reason is that the data here is open and free. Unlike Discogs, AllMusic, Last.fm or any other database, the data here isn’t locked away from people. People are actively encouraged to contribute in the name of spreading knowledge (which in my opinion should always be free) and to build applications that benefit from this knowledge, without being at the mercy of some shareholder who hasn’t made enough $ that month.


This is our common goal, but in order to provide a reliable source of information, we need to attract enough contributors to keep the database up to date; otherwise, the AllMusic example you mention is what we should expect: an unreliable source of information that is not useful for any serious application.


I actually don’t think AB itself was a reason for people to contribute. It probably wasn’t even well known. For most users (even those actively contributing data to AB) it did not offer any real benefit yet.

What I think was a big driver in the past was the ability to tag one’s own music files, and early on also having a way to get metadata for CDs (after all, MB was created as a replacement for freedb). I think both are declining due to the rise of music streaming.

What I currently see as the most promising new motivator for contributing to MB is ListenBrainz. Being able to record listening activity independently of any single service, and getting music recommendations based on it, will be important, especially if this gets integrated into players and music servers.

And judging from the work being done by the LB team and how they are pushing things, I think they agree.


I think you’re both right. I think there are two major groups of non-corporate MB users that often, but not always, overlap.

The first would be self-hosters who were never going to move entirely to music streaming in the first place. Some are too picky for streaming services alone to be worth it (audiophiles, obscurophiles…), some hoard music (distrust of streaming services, a general interest in archiving…) and some are both. If you have a big collection of audio files sitting on your HDD, AB was a way to make them useful.

The other group likes free and open source data and software. I would bet that most of the first group also falls into this group, but people who stream most or all of their music can still belong to it. Some will only add what they need to improve the quality of what they listen to; some will add things to contribute to open data for its own sake.

Folksonomy-style moods and descriptors like RYM’s are a good addition, but AB (in theory) gave the first group a way to make fuller use of an asset they have. Maybe the best answer would be a less ambitious acoustic tool that focuses on things that are immediately useful for a music collector while still being of interest to a database, like BPM for DJs and dynamic range for audiophiles… I could see Steve Hoffman forum users adding a lot of releases so that they could easily compare dynamic range calculations between old CDs and the newest remasters.

Sorry for rambling, maybe there’s something insightful here.


Can you explain this statement a bit more please? I have a big collection of audio files and I don’t see how AB made them “more useful”. Also I’m not aware of any music player software that pulled information from AB.

I don’t mean that AB made the collection more useful to the person who has it, but that it let them contribute something to the public interest when the files would otherwise just have been sitting there.

Umberto Eco talks about how you should collect books you may never read, just in case, as a means of storing potential knowledge. Somebody might have terabytes of albums they will probably never listen to. They put in the minimum effort to add the releases to MB so they can be tagged and stored properly, but they may never listen to them or add more detailed metadata. Something like AB lets them immediately contribute more information to public knowledge from something that would otherwise just sit there, possibly forever.