My issue with the latest blog post

I will second that ListenBrainz should probably add tags and genres as similarity signals, since that's an important data point we're currently not using (and it might partially fix the "nobody listens to this artist, so we can't get people to listen to this artist" problem, at least for artists that have tags).
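Just to illustrate the idea (artist names and tags below are purely hypothetical, and this is certainly not a prescription for how LB would implement it), tag overlap could be scored with something as simple as a Jaccard index:

```python
# Toy sketch: tag-overlap similarity between artists (hypothetical data).
# Jaccard index = |shared tags| / |all tags across both artists|.

def tag_similarity(tags_a, tags_b):
    """Jaccard similarity between two artists' tag sets, from 0.0 to 1.0."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical artists with MusicBrainz-style genre tags.
artists = {
    "Artist X": ["black metal", "atmospheric", "swedish"],
    "Artist Y": ["black metal", "atmospheric", "norwegian"],
    "Artist Z": ["synthpop", "electronic"],
}

sim_xy = tag_similarity(artists["Artist X"], artists["Artist Y"])  # 2 shared / 4 total = 0.5
sim_xz = tag_similarity(artists["Artist X"], artists["Artist Z"])  # no overlap = 0.0
```

Even a crude score like this is defined for an artist with zero listeners, which is exactly what collaborative filtering can't handle.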

Another idea I've seen bouncing around that I really like is using listens-per-listener instead of (or in addition to) raw listen counts, at least for recommendations. This might help boost less popular but well-loved artists and tracks, while also reducing the flood of popular artists onto everyone's playlists. (I might rather listen to a song with 3 listeners and 100 plays than one with 300 listeners and 300 plays.)

(This is of course assuming we currently use raw listen counts for recommendations; I'm not familiar with the ListenBrainz backend, lol.)
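For what it's worth, a toy sketch of what a listens-per-listener score could look like. The smoothing constant is my own invention, added only to show why a raw division would over-reward a single superfan on repeat:

```python
# Toy sketch of listens-per-listener scoring (all numbers hypothetical).
# The pseudocount acts like `pseudo` extra listeners with 1 play each,
# damping tracks whose entire count comes from one or two superfans.

def listens_per_listener(listens, listeners, pseudo=5):
    """Average plays per listener, smoothed toward 1.0 for tiny audiences."""
    return (listens + pseudo) / (listeners + pseudo)

niche = listens_per_listener(100, 3)      # 3 listeners, 100 plays -> 13.125
popular = listens_per_listener(300, 300)  # 300 listeners, 300 plays -> 1.0
```

Under this scoring the well-loved niche track from the example above ranks far ahead of the widely-but-casually-played one, which is the behaviour being suggested.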

6 Likes

Hi, I've run into a bug. I believe it has been reported a couple of times already, and I think it also produces a critical error.

In the screenshots below you can see that I listened to the songs "The Lament" and "Lady Death", which come from the album Down Below by Tribulation. But in my statistics it shows that I listened to these songs from a sampler, "Rock Hard Sampler Jan 2018". I have never listened to that sampler (which contains songs by Tribulation and Watain), so the statistics are wrong: they are counting plays against albums I didn't listen to. And this happens with a lot of other albums too.

[Screenshot from 2024-12-03, 10:38]

You can't manually link them to the corresponding album either, because when you do it from MusicBrainz, the song that doesn't correspond always remains. I understand this happens when a song is imported from another album instead of being left as a new release.

Always grateful for the great effort you put into ListenBrainz.

Thanks

1 Like

Track listens are only counted toward the first release group that was inserted into MusicBrainz, so yeah, there's not much else you can do.

Although the counts ignore compilation releases. I'm not sure whether a sampler counts as a compilation, but if it does, you can correct your count.

2 Likes

A post was split to a new topic: Global listen counts do not add up

The decision was to simply nuke it instead of looking for a replacement or just deleting all old data and starting fresh to get homogeneous data (if that was really the problem).

That's my point: I'm not really arguing about the quality of the data collected at that point. If there are other services providing key data for tracks, and it works fine, then it could have worked fine here too, even if that required a total server reset and starting from scratch. That's better than having zero data.

We never claimed that.

About this, we may have different points of view. In reality, all the energy towards a recommendation system has been put into that, and there have also been multiple statements against the use of AI, etc. Even if I have never literally read "collaborative filtering is better than similarity by audio features/genres/tags", actions speak louder than words. Maybe that was not the intention, but here we are.

If we read this thread, for example, in the end all the discussion gravitates around the "not enough data" problem, which is essentially a collaborative-filtering problem. There is no solution for that, since no more users/devs will come to LB unless it offers something different from or better than other services, and [some of] those features don't work because there is "not enough data". I think it's clear at this point that the problem is the approach chosen.

Anyway, that's just a single point of the thread. The thing is, LB does not at all cover developers' needs now that Spotify has shut down part of its API. And the main reason is that ListenBrainz doesn't provide any audio features at all. That's undeniable.

3 Likes

I think what he was trying to say was that it took too many resources and, in the end, it didn't bring anything good to the table.
It's all open source, so anybody could restart the project and continue where they stopped, but since nobody has…

This is something that frustrates me a lot. It sounds more like an excuse than a solution. If the system doesn't work because you don't have enough data, try something different that does work. Only if you have something that works will you get more users into the project; more users = more data, and then eventually the collaborative filtering thing could work. But until then, try new things, have multiple options.

I don’t really see how nuking all the data and collecting the same data again with the same system would make any difference.

The point is that it became clear that the data quality wasn't up to the required standard. And the project lacked the resources and people to actually improve it.

At this point in time the AcousticBrainz service is even still running. What was stopped was data collection. And I don’t see how continuing to collect data would have helped with any of the issues.

Also, the data and the source code of the project are both available, so anyone could actually bring this back to life.

3 Likes

This thread has gone a few different directions at the same time, but I’ll try to address all the points I can :slight_smile:

Audio data / AcousticBrainz

LB does not replace at all dev needs […] doesn’t provide at all any audio features

Indeed we don’t offer a replacement for that Spotify feature; quoting the blog post:

While not everything that Spotify is enshittifying has a direct replacement with ListenBrainz, we can at least offer a path forward for developers.

More specifically, we are not equipped to do audio analysis, feature detection, etc. A shame though it may be, this requires a team of researchers to develop good algorithms and stay up to date with technological improvements.
The data in AcousticBrainz is not reliable, and starting from scratch with no better algorithms, then waiting years to see whether the results would be any better, is a waste of resources for our small team.
The first issue is developing accurate audio-analysis algorithms (a research domain); the second is that without the actual music files (only audio features, not audio files, are submitted to AB) we can't re-process tracks when those algorithms improve over time, leaving you with completely unreliable datasets.

Recommendations

I’ll start by saying that if anybody reading this has experience in recommender systems, your help would be very much appreciated!

Where Spotify employs —or at least used to— a whole team of researchers in the music field, our team has a grand total of two people (neither of whom is a researcher) working on all aspects of the LB back-end, which includes (but is not limited to) the recommendation systems.

The best we can do is read papers about recommender systems and implement them ourselves. We use Spark for this, if anyone is curious: Collaborative Filtering - Spark 3.5.3 Documentation.
Then comes evaluating the results: we get some feedback from you, our users, but not at a scale that would let us easily draw solid conclusions. So we slowly crawl in the dark with our hands out in front of us, improving bit by bit.
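For anyone curious what "collaborative filtering" boils down to: the core idea is factoring the user-by-item listen matrix into low-dimensional user and item vectors, so that the dot product of a user's vector with an item's vector estimates unobserved affinities. The following is only a toy, plain-Python sketch of that idea with made-up data; LB's actual system uses Spark's ALS, not this:

```python
# Toy sketch of the matrix-factorization idea behind collaborative filtering.
# NOT ListenBrainz's implementation (LB uses Spark's ALS); this is plain-Python
# SGD on a tiny, made-up user/item matrix, just to show the concept.
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.1, epochs=500, seed=0):
    """ratings: list of (user, item, score). Returns user and item factor lists."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                U[u][f], V[i][f] = (U[u][f] + lr * err * V[i][f],
                                    V[i][f] + lr * err * U[u][f])
    return U, V

def predict(U, V, u, i):
    """Predicted affinity of user u for item i (dot product of factors)."""
    return sum(uf * vf for uf, vf in zip(U[u], V[i]))

# Hypothetical listen-derived scores: users 0 and 1 both like item 0,
# user 2 likes item 1; predict() then estimates the unobserved cells.
data = [(0, 0, 1.0), (1, 0, 1.0), (1, 1, 0.0), (2, 1, 1.0), (2, 0, 0.0)]
U, V = factorize(data, n_users=3, n_items=2)
```

Spark's ALS optimizes the same objective by alternating exact least-squares solves for U and V instead of SGD, which parallelizes far better over a billion listens. It also makes the "not enough data" complaint concrete: an item that appears in no (user, item) pair simply never gets a usable vector.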

These are very complex computational problems to solve, especially without a big team, on a shoestring budget and with limited user data.
We can’t snap our fingers and “try something different that does work”. This is an unrealistic expectation.
I'm well placed to say that having opinions and ideas about how the recommendations should work does not translate into the reality of implementing them. Despite having read lots of papers on the topic myself, I can only hint at ideas for improvements; the technical implementation is beyond my capabilities as a developer.

Current state / future improvements

Our current priority is a more technical one: improving the stability and reliability of all the different systems that compose LB. With only two people working on the back-end (and other projects at the same time), that does not leave time for reading and digesting research papers, implementing new algorithms, etc.
The recommendations will continue to improve, bit by bit over time, but I request everyone’s patience —or assistance. Constructive feedback helps! And while I understand your frustration (we also want better recommendations!), it is sometimes disheartening to read harsh feedback or hand-waving suggestions.
We have a few avenues of improvement already in mind, some to help remediate the not-enough-data problems by leaning more on MusicBrainz metadata. This will take time.

Music discovery

In the meantime, on the topic of “try new things, have multiple options”, we have other music discovery tools in LB that require more active participation and don’t have the computational pitfalls I described above:

11 Likes

That is only true under the assumption that:

  • Essentia has not improved at all.
  • No part of Essentia produces quality data (otherwise, only those parts could be used).
  • There are no other reliable tools to extract audio features (false).

The point is that it became clear that the data quality wasn’t up to the required quality. And the project lacked the resources and people to actually improve it.

I may agree with that part, but obviously extracting audio features and then, at some point, feeding them into a model to extract useful high-level data is the way to go. I think that's a pretty clear path nowadays for any streaming service (Plex, Spotify, etc.). If the old data is not good enough, then it can be re-collected.

I understand there may be concerns about this point of view, and that there were reasons to stop it. But let's be honest and ask ourselves where audio analysis/recommendation is going in the next 5 years. How will ListenBrainz fit into that future? I don't see how playing a "numbers game" in a market absolutely dominated by bigger players brings anything new to the table.

Thanks for your replies. Obviously, discussing "a problem" is one thing; the reality behind the project, the team and its limited resources is another.

the second is that without actual music files (only audio features, not audio files are submitted to AB) we can’t re-process tracks if said algorithms are improved over time, leaving you with completely unreliable datasets.

This could be partially solved by enabling file analysis by default in Picard, so that every time you look up a track by fingerprint, you also send the analysis data. Streaming services would also simplify this problem a lot nowadays; only resources limit having servers analyze streamed tracks in batch.
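For context on what "low-level" analysis data even looks like, here is a toy, self-contained illustration, deliberately unrelated to Essentia's or AB's actual feature set: two classic low-level features, zero-crossing rate and RMS energy, computed over a synthetic sine wave standing in for real audio samples.

```python
# Toy illustration of low-level audio features (not Essentia, not AB's pipeline):
# zero-crossing rate and RMS energy over a raw sample buffer.
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def rms_energy(samples):
    """Root-mean-square amplitude of the buffer."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One second of a 440 Hz sine at an 8000 Hz sample rate (synthetic, not real audio).
sr, freq = 8000, 440
samples = [math.sin(2 * math.pi * freq * n / sr) for n in range(sr)]
zcr = zero_crossing_rate(samples)  # ~2 * 440 / 8000, i.e. about 0.11
rms = rms_energy(samples)          # ~0.707 for a unit-amplitude sine
```

The hard research problem mentioned earlier in the thread is not computing numbers like these; it is mapping them reliably to high-level labels (genre, mood, danceability), and that is the part that can't be redone later without the original audio.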

1 Like

I would like to apologise (again) for my tone in these posts.
I really appreciate the effort you guys put in here.

And in my opinion there is a difference between "there isn't enough data" and "we currently don't have the time/expertise to change the algorithm".

So again, my apologies if we were a bit ungrateful and demanding.

But there is still one thing I would like to mention about the following:

more specifically:

These tools still use a recommendation algorithm, and they don't work for smaller artists that only have a few listens.

It would be nice if there were some stats on that: similar to the global stats and everything, you could have stats just for your friends, instead of a long list of songs that they listened to.

Anyhow, what I'm trying to say is: keep up the good work, and we just need to be more patient, or help code (or stfu).

3 Likes

And of course, try to lure as many users as you can to ListenBrainz! I try to mention it on Reddit whenever relevant, and on last.fm whenever I see people there complaining about last.fm issues. You could also follow and like/repost things on the social accounts (https://bsky.app/profile/listenbrainz.org for example).

4 Likes

One more piece of information I forgot to mention: we are planning on processing a huge music listening dataset (MLHD — music listening history dataset) that we reworked and massaged (see Datasets - MetaBrainz Foundation for more info).

This dataset of 26 billion entries —compared to LB’s 1 billion listens— will significantly improve the quality of our similarity calculations for pre-2017 music.

However processing such a huge dataset requires some temporary infrastructure modifications and server rental. We are planning to do this in the first part of the new year.

7 Likes

(see Datasets - MetaBrainz Foundation for more info).

I can’t believe I got rickrolled once again…

3 Likes

What?! By MetaBrainz?! Never! :angel:

2 Likes

I want to say that some of the issues that have been brought up regarding similar artists are real. I won't cite specific examples, but I remember looking and thinking: oh geez, this is just a bunch of mainstream stuff I'm already familiar with that doesn't really point me toward anything more specific at all. But this is absolutely a common issue on similar websites: whether it's buying books on Amazon or looking for similar artists on Last.fm, once what you're looking at has reached a certain threshold of popularity, the "similar" stuff is often things you're completely familiar with.

Now, I want to say this…

I’m actually impressed that, with quite limited data, the Similar Artists for some of these musicians are accurate and equally underground.

Pop-punk group the Hextalls has only 1.8k listens, and I believe the similar artists are decent enough; I'm just surprised there are so many for such a small number of plays. Gimp Fist has 1.6k listens but only three similar artists; one seems slightly off point, but the other two are helpful. The Macc Lads, with 6.8k, have some questionable similar artists. Nonetheless, it seems very promising: the algorithm is working more or less, and it will be further refined with more data.

Thus, more users is ideal. And even just those folks who go "oh, I need to back up my last.fm" can be very helpful. For instance, there is a specific account that listens to a lot of obscure punk & Oi that I know will pay major dividends for the future of similar artists, since I can see he has really plumbed the depths of the genre.

I am very optimistic based on what I see.

6 Likes