Discontinuation of AcousticBrainz (pre-announcement)

Just so you know, the plan is to discontinue the AcousticBrainz project in the near future. It is likely not worth the effort to keep adding scans.


Okay, why is that happening? Is the data not worth it? I ask because I have my trigger finger on the “stop” button!

Why? Where was this plan announced/discussed?


I’m not sure, but I believe the data AcousticBrainz calculated from the files turned out to be less useful than hoped. So I guess the question is whether the analysis can be improved and whether that’s worth the effort.

@rob or @alastairp can explain better, but I believe that’s the case.

It was discussed in IRC, 3 or 4 days ago. See around this part of the log.


Is it possible to get a definitive answer about the project termination? I have Googled and found nothing about this. I believe in doing my part to help generate data like this, but I do not want to continue if it is going to waste. I was getting ready to put my 3-cpu backup machine online to do this processing.

Yes, the project is going to be terminated – we haven’t gotten around to making an announcement yet since we haven’t finalized our plans on how to do this. So, yes, hit that stop button. Sorry that we are doing this, but we realized the project was a dead end for us.

Alastair, the project lead, or I will post some more details later. Please stay tuned.

Thanks to everyone who helped, but it just didn’t pan out as planned.


Thanks, I just canceled a reply I was writing. I read through the log and hope for a good summary of the project and of possible future plans. I would have no problem resubmitting data in the future from a newer/better version (if it happens). I have read here about the processing MrClon has done, and I think there is a commitment in the community to collect/process this kind of data. I just started on it later than others.

If there are future plans, I would like to see GPU support discussed, as well as how lossless vs. lossy sources (at different bit rates) affect the data.


I hope the talk about a new, better AB will not be just talk. I believe a track features database is essential for a good recommendation system, and a good free recommendation system is needed so that some big tech corporation doesn’t monopolize music.


We’re not planning on replacing AB with a new project that tries to extract data from the audio-streams. Instead we’re going to focus on mining/collaborative filtering more data from the LB listens as well as collecting some data in a more traditional manner (e.g. moods in the style of tags/genres).

What kicked off this whole thing was a proof of concept I wrote that computes recording-recording similarities from user listening history. Early testing is happening here:


This is bleeding edge, really. Working on it when not posting here. :slight_smile:
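
For readers unfamiliar with the idea, here is a minimal sketch of how recording-recording similarity can be derived from listening history. It is not the actual proof of concept mentioned above. The function name `recording_similarities`, the `(user, recording)` input shape, and the choice of cosine similarity over sets of listeners are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def recording_similarities(listens):
    """Cosine similarity between recordings, based on who listened to them.

    `listens` is an iterable of (user, recording) pairs (a hypothetical
    input shape, not a ListenBrainz format). Returns {(a, b): score} for
    every pair with at least one shared listener, with a < b.
    """
    users_by_recording = defaultdict(set)
    for user, recording in listens:
        users_by_recording[recording].add(user)

    similarities = {}
    for a, b in combinations(sorted(users_by_recording), 2):
        shared = len(users_by_recording[a] & users_by_recording[b])
        if shared:
            # Cosine similarity of two binary "listened / did not listen" vectors.
            norm = sqrt(len(users_by_recording[a]) * len(users_by_recording[b]))
            similarities[(a, b)] = shared / norm
    return similarities


# Made-up listening history for illustration only.
listens = [
    ("alice", "rec-1"), ("alice", "rec-2"),
    ("bob", "rec-1"), ("bob", "rec-2"), ("bob", "rec-3"),
    ("carol", "rec-3"),
]
sims = recording_similarities(listens)
```

In this toy data, `rec-1` and `rec-2` share exactly the same listeners, so they score highest; a real system would presumably also weight by listen counts and recency.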

That’s not sustainable for a recommendation system. No one has listened to a new track → no one gets this track recommended → no one listens to the track. If a user listens to unpopular music (e.g. some local, non-English scene), what can collaborative filtering offer them? What can it gain from them? Why should such a user continue or start submitting listens to LB? It’s the trap of the «1000 greatest hits of all time».
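
A rough sketch of that loop, assuming a deliberately naive recommender that only counts co-listens (the function `recommend`, the catalogue, and the listening data are all made up for illustration, and real collaborative filtering is more sophisticated):

```python
from collections import Counter, defaultdict


def recommend(listens, user):
    """Rank recordings the user has not heard by how many users with
    overlapping taste have listened to them (naive co-listen counting)."""
    recordings_by_user = defaultdict(set)
    for listener, recording in listens:
        recordings_by_user[listener].add(recording)

    mine = recordings_by_user[user]
    scores = Counter()
    for other, theirs in recordings_by_user.items():
        if other == user or not (mine & theirs):
            continue  # skip the user themselves and users with no overlap
        for recording in theirs - mine:
            scores[recording] += 1
    return [recording for recording, _ in scores.most_common()]


# Hypothetical catalogue: one recording has no listens at all yet.
catalogue = {"hit", "deep-cut", "new-local-release"}
listens = [
    ("alice", "hit"),
    ("bob", "hit"), ("bob", "deep-cut"),
    ("carol", "hit"), ("carol", "deep-cut"),
]

recs = recommend(listens, "alice")
# Recordings nobody has listened to can never be scored, for any user.
never_recommended = catalogue - {recording for _, recording in listens}
```

Here `new-local-release` ends up in `never_recommended`: with zero listens it has no co-listeners, so this kind of recommender cannot surface it to anyone.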

That’s a hard way to get a small quantity of low-quality data, unless you have some really smart plan for organizing the masses for this job. I think manual feature extraction can be productive only as data for (re)training/correcting automatic extraction. I think the perfect option is to collect some kind of low-level audio data and extract high-level features from it with an algorithm that can be improved iteratively through users’ manual input. I’m not sure if that is possible without storing music files on MB servers.


Just tossing my pennies into this fountain (although I feel we’ve drifted from the original post): could this data not be useful within the ListenBrainz project?

Yes, this is a good point and is definitely one of the values that content-based recommendation can bring. Unfortunately at the moment we don’t have the resources or research tools available to produce such a system that we’re happy with distributing.
We know that the MusicBrainz community has a strong history of contributing metadata about music, including for less common stuff. We hope that we can rely on this community to help us build a good initial set of data. Perhaps in the future we can take this data and see what else we can do with it.

Part of the limitation of the current version of AcousticBrainz is that much of the data is also of low quality. For example, we have only a few models that work with “genre”, and they all have a very small number of categories. In comparison, we were able to tag over a million new recordings with hundreds of genre tags by collecting user-contributed data, so there is definitely value in this approach.

Yes, absolutely. Part of the wind-down of AB will involve seeing which parts we can re-use in other MetaBrainz projects, and ListenBrainz is definitely a candidate to receive some of these parts. ListenBrainz is also being used as a source of data for similar tasks; see, for example, the preliminary discussion about recording similarity using ListenBrainz data.


Do you plan to start using streaming sources to extract music characteristics?

It could be useful in a number of MetaBrainz contexts as well as to other people. Once we get these debugged to our liking, we’ll release public dumps.

No, we’re not going to work with raw audio anymore – it isn’t part of our core skills. We’d hoped to harness what music researchers created and scale that up, but all the algorithms we were given produced nothing of use. To fix this properly would take a huge investment, which we don’t have the resources for.

You guys know how to make a sad music nerd smile :grin:

Thanks for not just binning the data off

What kind of contribution do you expect from the MB community now?


Adding data to MusicBrainz, writing reviews on CritiqueBrainz, submitting listening data to ListenBrainz. :slight_smile:
