Discontinuation of AcousticBrainz (pre-announcement)

PierPiero · February 9, 2022, 11:39am

The question is: what is the competitive advantage of MB compared to other sites? Why should people use MB, and why should people become contributors to keep the MB DB up to date and appealing for the sponsors?

As an example, contributors use Discogs for its marketplace. Discogs is used by people who find Discogs content in search results and can use this information through its user-friendly website; MB basically doesn’t appear in search results, and the issues with its user interface have been discussed in several threads.

Discogs is losing its grip as the market moves from physical to digital media. Streaming platforms are fed by labels and artists in real time. People can find and (most importantly) listen to music, supported by a recommendation engine and news based on statistics, preferences, playlists, and the behavior of similar users: is there anything like this planned for MB?

In MB you have different platforms to manage preferences and listening statistics without any way to make them interact up to now: is this going to change?

Will the listening data be sent using the MB player, adding complexity and losing the features provided by the original streaming platforms players? Or can the use of derived data provided by users who agree to interconnect MB allow for better performance than the original sites?

Working on original data with AcousticBrainz could have been a differentiator that motivated people to contribute to MB: now I don’t see a new one.

sound.and.vision · February 9, 2022, 7:54pm

For myself, one particular reason is that the data here is open source and free. Unlike Discogs, AllMusic, Last.fm or any other database, the data here isn’t locked away and kept away from people. People are actively encouraged to contribute in the name of spreading knowledge (which in my opinion should always be free), to make applications that can benefit from this knowledge, and not to be at the mercy of some shareholder who hasn’t made enough $ that month.

PierPiero · February 10, 2022, 7:43am

This is our common goal, but in order to be able to provide a reliable source of information, it is necessary to attract enough contributors to keep the DB up to date; otherwise, the example of AllMusic you mention is what we should expect: an unreliable source of information not useful for any serious application.

outsidecontext · February 10, 2022, 3:00pm

I actually don’t think AB itself was a reason for people contributing. It probably wasn’t even well known. For most users (even those actively contributing data to AB) it did not offer any real benefit yet.

What I think was a big driver in the past was the ability to tag one’s own music files. Early on it was also a way to get metadata for CDs (after all, MB was created as a replacement for freedb). Both are declining, I think, due to the rise of music streaming.

What I see currently as the most promising new motivator for contributing to MB is ListenBrainz. Having the ability to record the listening activity independent of a single service and getting music recommendations based on this will be an important thing. Especially if this gets integrated into players and music servers.

And from the work being done by the LB team and how they are pushing things I think they agree with that.

selflessself · February 10, 2022, 11:34pm

I think you’re both right. I think there are two major non-corporate MB users that often but not always overlap.

The first would be self-hosters who were never going to move entirely to music streaming in the first place. Some are too picky for streaming services alone to be worth it (audiophiles, obscurophiles…), some hoard music (distrust of streaming services, a general interest in archiving…) and some are both. If you have a big collection of audio files sitting on your HDD, AB was a way to make them useful.

The other group likes free and open source data and software. I would bet that most of the first group also falls into this group, but people who stream most or all of their music can still fall under it. Some will only add what they need to improve the quality of what they listen to; some will add things to contribute to open data for its own sake.

Folksonomy-style moods and descriptors like RYM’s are a good addition, but AB (in theory) gave the first group a way to make fuller use of an asset they have. Maybe the best answer would be a less ambitious acoustic tool that focuses on things that are immediately useful for a music collector while still being of interest to a database, like BPM for DJs and dynamic range for audiophiles… I could see Steve Hoffman forum users adding a lot of releases so that they could easily compare dynamic range calculations between old CDs and the newest remastering.

Sorry for rambling, maybe there’s something insightful here.

atj · February 11, 2022, 12:47pm

Can you explain this statement a bit more please? I have a big collection of audio files and I don’t see how AB made them “more useful”. Also I’m not aware of any music player software that pulled information from AB.

selflessself · February 11, 2022, 6:55pm

I don’t mean that AB made the collection more useful to the person who has it, but that it let them contribute something to the public interest when the files would otherwise have just been sitting there.

Umberto Eco talks about how you should collect books you may never read, just in case, as a means to store potential knowledge. Somebody might have terabytes of albums they will probably never listen to. They put in the minimum effort to add the releases to MB so they can be tagged and stored properly, but they may never listen to them or add more detailed metadata. Something like AB allows them to immediately contribute more information to public knowledge with something that would otherwise just sit there, possibly forever.

elomatreb · February 11, 2022, 8:40pm

Running a service like AB takes up far more resources than some books on a shelf, though: servers cost money, energy, and valuable people time (administration).

I can definitely understand shutting down the project, especially since nothing really evolved to take advantage of the data over the years the project existed.

selflessself · February 11, 2022, 10:05pm

Yes, I agree. If AB is genuinely a dead end, then retiring it would make sense. My point is that the concept is good because it takes advantage of a resource that many MB users can provide - that is, large collections of music, much of it being unavailable on streaming services. Some of it is very rare; for example, someone could rip a rare LP, and the copy sitting on their computer is the only digitized version ever made. AB or a similar project allows some acoustic information about it to be contributed to the public record.

But I am really an “idea guy” with no meaningful talent to provide. So I might just be talking idealistic nonsense.

dpr · February 16, 2022, 10:53am

Is there any kind of overview of how all the MB ‘tools’ / ‘products’ are envisaged to be used / work together (or not, as the case may be), please?

rob · February 16, 2022, 2:40pm

Indeed!

I wanted to chime in about the future directions of our projects – mainly that they aren’t changing a whole lot. We’re still very much dedicated to building open recommendation systems and all of the prerequisite data-sets that are required for those to function correctly.

AcousticBrainz was always supposed to be an important element of this strategy. But at this point we would need to fully reboot the project, raise funds, find engineers and then wait for things to come to maturity, which would take several years at the very least. Doubling down on ListenBrainz and trying to mine that data for these missing data sets is something we already started doing and we can see tangible results right now.

That said, I would suggest that we think of shutting down AcousticBrainz more as saving time towards our goals, as opposed to prolonging that already long road.

alastairp · February 16, 2022, 2:56pm

Thanks for the comments in this thread, everyone. I’ll just reply to a few points to clarify some of the comments that have been made.

I agree that having automatically extracted features from audio is a great idea and a great differentiator for MetaBrainz. However, we recently started looking at the data in order to use it, and a whole bunch of things came up that made us question the data. Some specific points:

We know that the algorithm used to compute BPM is correct about 80% of the time. But we don’t know which 20% of recordings it got wrong. In this case it can be difficult to trust that any BPM value that you get is correct.
We know that the algorithm used to compute key works mostly fine on classical music, but not very well on other types of music. Again, we provide this data for all recordings in AB, but there’s no great way to indicate in which cases the values should be trusted or not. In fact, given that this is factual data, we already have this in MusicBrainz anyway!
We’ve known for a long time that many of the datasets that we provided in AcousticBrainz weren’t of high quality (especially the ones that worked with “genre”). We hoped that by providing the data and tools to build new datasets that other contributors would help us to build better classification models, but this never eventuated.

This is a nice goal, and I agree with it, but if you’re only 80% sure that the BPM is correct, if the key is wrong, and if it can only say the genre is one of 8 western popular music genres from the end of the 20th century, then is this data useful, or worse than useless?

MrClon · February 17, 2022, 7:26pm

I suspect that in many cases genre can’t be extracted from audio. It’s more like a social construct than a property of the music itself. It’s a false goal that can’t be achieved.
I think a potential «good new AcousticBrainz» should start with a search for sound properties that are both human-understandable (or at least perceivable) and machine-detectable.

ijabz · February 18, 2022, 8:31am

Isn’t the problem that BPM is not appropriate for all kinds of music? If there isn’t a beat, is it actually possible to generate a BPM in a meaningful way? Certainly I don’t know anyone who listens to classical music who cares what the BPM is; if it works well for most music genres that actually care about BPM, then that is good enough.

Again, for some types of music there is not a consistent key, and therefore it is not possible to accurately identify it. But it is important for classical, and although you can maybe derive the key from the title, that is difficult to do accurately and not all classical music has the key in the title. It’s also useful for regular pop/rock music; if the music is not atonal, then how well does the algorithm work?

alastairp · February 18, 2022, 10:54am

Yes, depending on the accuracy with which you want to annotate genre, this might not be possible, and is one of the reasons that we started to focus on community driven genre annotations.
Unfortunately, according to current research, it’s still not really clear what properties are both easy for a machine to identify and unambiguous to humans as well. As we mentioned earlier, within MetaBrainz we don’t have the resources to perform this kind of research ourselves, and so at the moment this is not something that we’re able to focus on. Perhaps in the next few years there will be some newer techniques from academia that we might be able to take advantage of.

No, this isn’t the underlying problem. The main problem that we saw was tracks that do have a clear BPM, but it was incorrectly identified by the algorithm, with no indication that it might be incorrect. In the case where we know there are particular styles of music which don’t have a BPM, and we had metadata available for that, it would be a good way of “masking” the computed value, or adding a notice that we’re not sure about the BPM for a particular track.
I think that classical music does benefit from clear BPM annotations, and BPM is definitely important, and so any algorithm that we provided should also work on this.
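
To illustrate the "masking" idea described above, here is a minimal Python sketch. It assumes tag metadata is available for a recording; the function name `present_bpm` and the tag names in `BEATLESS_TAGS` are hypothetical and not part of AcousticBrainz or MusicBrainz.

```python
# Hypothetical tag names; a real system would take these from
# community genre/tag annotations.
BEATLESS_TAGS = {"ambient", "drone", "spoken word"}


def present_bpm(computed_bpm, tags, beatless_tags=BEATLESS_TAGS):
    """Return the BPM to display plus an optional notice.

    If any tag marks the style as lacking a steady beat, the computed
    value is masked (None) and a notice explains why.
    """
    matched = sorted({t.lower() for t in tags} & beatless_tags)
    if matched:
        notice = "BPM hidden: tagged as " + ", ".join(matched)
        return {"bpm": None, "notice": notice}
    return {"bpm": computed_bpm, "notice": None}


print(present_bpm(120.0, ["Ambient"]))
print(present_bpm(128.0, ["techno"]))
```

Note that this only covers styles known not to have a BPM. It does nothing for the main problem, where a track with a clear beat gets a wrong value with no indication that it might be wrong.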

This is not the problem that we had in the Essentia extractor used for AcousticBrainz. It worked OK on classical music tuned to a specific tuning, but not well on other types of music. Even when we had a rock song in a clear, known key, the results from the algorithm were incorrect. As I understand it, these types of algorithms need to be independently developed for different styles, and then you need to use another piece of information (e.g. genre/tag annotations) to decide which algorithm’s results you should present to an end user. In the case of atonal music you should do something similar: first determine if it’s atonal, and if so indicate that any potential annotations may not be accurate.
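
The selection step described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: `estimate_key`, the `uncertain` flag, and the per-style estimators (simple stand-in functions) are hypothetical and do not correspond to any Essentia or AcousticBrainz API.

```python
def estimate_key(features, tags, estimators, default="generic"):
    """Pick a style-specific key estimator based on tag annotations.

    `estimators` maps a style tag to a callable that takes the features
    and returns a key. The result is flagged as uncertain when the
    recording is tagged atonal or when no style-specific estimator matches.
    """
    tagset = {t.lower() for t in tags}
    if "atonal" in tagset:
        # Still report a value, but mark it as possibly inaccurate.
        key = estimators[default](features)
        return {"key": key, "estimator": default, "uncertain": True}
    for style, estimator in estimators.items():
        if style != default and style in tagset:
            return {"key": estimator(features), "estimator": style, "uncertain": False}
    key = estimators[default](features)
    return {"key": key, "estimator": default, "uncertain": True}


# Stand-in estimators that ignore their input; real ones would analyse audio features.
example_estimators = {
    "classical": lambda features: "C major",
    "rock": lambda features: "E minor",
    "generic": lambda features: "A minor",
}

print(estimate_key({}, ["Rock"], example_estimators))
print(estimate_key({}, ["atonal", "classical"], example_estimators))
```

The point of the sketch is only the dispatch: the tag annotations decide which result is shown and whether it carries a warning.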

PierPiero · February 18, 2022, 2:27pm

Community-driven genre annotations don’t seem to be more accurate, aside from broad genre descriptors that are not useful at all for new releases or personalized recommendations.

alastairp · March 1, 2022, 4:21pm

Yes, that’s definitely true, but keep in mind that algorithms for narrow genre prediction on the data in AB aren’t much better either. At the moment we think that we’ll have better luck with human annotations, but we hope that if we can get good annotations from people then this could turn into better algorithms in the future.

epoupon · April 30, 2022, 6:02pm

Hello,
You told us the data in AB is not good enough. But AFAIU you only talk about the high level computed data?
I guess the low level data is still meaningful?
Personally I use low level data to create a SOM of tracks, so it was handy to have AB API available to fetch these low level metadata…
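
For readers unfamiliar with the term, a SOM (self-organizing map) places similar feature vectors on nearby cells of a small grid. The following is a minimal pure-Python sketch, not the poster's actual code; the two-dimensional vectors are made-up toy numbers standing in for low-level features, and the function names and parameters are hypothetical.

```python
import math
import random


def best_matching_unit(weights, v):
    """Grid cell whose weight vector is closest to v (squared Euclidean)."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)),
    )


def train_som(vectors, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid cell, initialised from random training samples.
    weights = {
        (r, c): list(rng.choice(vectors))
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull this cell towards the sample, weighted by grid distance.
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


# Toy stand-ins for per-track feature vectors (not real AB data).
tracks = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(tracks, rows=3, cols=3, epochs=100)
for t in tracks:
    print(t, "->", best_matching_unit(som, t))
```

Real feature vectors would normally be normalised per dimension first so that no single feature dominates the distance.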