How are albums and recordings aggregated in the Top Albums/Tracks charts?

Alioth · July 4, 2024, 9:34pm

I think something doesn’t work as expected - or, at least, it works contrary to my expectations.

Long story short, I prepared a minimal example that demonstrates it. I wiped all my ListenBrainz data and imported a set of 59 listens of one album. When importing, I submitted artist, album and recording MBIDs. The listens I submitted look like this:

{
 "listened_at": "1612138655",
 "track_metadata": {
  "artist_name": "Lull",
  "track_name": "[Moment 1]",
  "additional_info": {
   "artist_mbids": [
    "2970da48-7f03-4f49-897e-e633b256992a"
   ],
   "recording_mbid": "e69037d9-5c69-4bc0-b4fe-dc353cc71dfd",
   "release_mbid": "21b10815-cf4b-492d-8b14-3bc7b02fc16c"
  },
  "release_name": "Moments"
 }
}

The listens differ only in the listened_at, track_name and recording_mbid fields. Note that the recording title in my exported data differs from the one in MusicBrainz: “[Moment 1]” vs. “[untitled]”. The recordings all display as “[untitled]” in my recent listens, which I think is the way it should be.
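
A minimal Python sketch of how listens like the one above could be assembled into a ListenBrainz import payload. It assumes the submit-listens body shape (a `listen_type` of "import" plus a `payload` list); the helper `build_import_payload` and its tuple input are hypothetical, and actually sending the request (a POST authenticated with a user token) is left out. Unlike the JSON above, the sketch emits `listened_at` as an integer Unix timestamp.

```python
import json


def build_import_payload(artist, artist_mbid, release, release_mbid, tracks):
    """Build an "import" payload (hypothetical helper).

    `tracks` is a list of (listened_at, track_name, recording_mbid) tuples;
    the artist and release fields are shared by every listen of the album.
    """
    listens = []
    for listened_at, track_name, recording_mbid in tracks:
        listens.append({
            # Emit the timestamp as an integer rather than a string.
            "listened_at": int(listened_at),
            "track_metadata": {
                "artist_name": artist,
                "track_name": track_name,
                "release_name": release,
                "additional_info": {
                    "artist_mbids": [artist_mbid],
                    "recording_mbid": recording_mbid,
                    "release_mbid": release_mbid,
                },
            },
        })
    return {"listen_type": "import", "payload": listens}


payload = build_import_payload(
    "Lull",
    "2970da48-7f03-4f49-897e-e633b256992a",
    "Moments",
    "21b10815-cf4b-492d-8b14-3bc7b02fc16c",
    [("1612138655", "[Moment 1]", "e69037d9-5c69-4bc0-b4fe-dc353cc71dfd")],
)
print(json.dumps(payload, indent=1))
```

Only the listened_at, track_name and recording_mbid values change from tuple to tuple, which mirrors how the 59 submitted listens differed.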

When I hover over the titles in my “Recent listens” list I can see the expected recording MBIDs (ones that I submitted) in URLs. There is a different recording MBID in each listen, based on the data I sent.

My all time stats look like this:

I have 59 listens in total - expected.
All of them by artist Lull - expected.
I have two top albums: “Moments” and “Vacu Sessions 17” - definitely not expected. “Moments” is correct, but I never listened to the other one.
I have two top tracks: “Moment 100” and “Moments 1-3”. I never listened to either of them: “Moment 100” is featured on a recent release of “Moments” that I never listened to, and “Moments 1-3” appears to be a recording from a compilation (or maybe DJ mix) release.

I assumed that since I took care to send the correct recording MBIDs in my submission, I would have 59 separate top tracks in total, each listened to exactly once, and all of them on the same release. I don’t understand why they are aggregated (in the case of recordings) or split (in the case of the album) like that.
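
The expectation above can be stated as a simple counting rule: if stats were keyed directly on the submitted MBIDs, every distinct recording MBID would become one top track and the single release MBID one top album. A minimal sketch of that rule, assuming synthetic listens with placeholder recording MBIDs; it describes the reporter's expectation, not how ListenBrainz actually computes its stats.

```python
from collections import Counter

RELEASE_MBID = "21b10815-cf4b-492d-8b14-3bc7b02fc16c"

# 59 listens of one release. The recording MBIDs are placeholders,
# not the real MBIDs of the album's tracks.
listens = [
    {
        "listened_at": 1612138655 + i * 600,
        "track_metadata": {
            "additional_info": {
                "recording_mbid": f"placeholder-recording-{i + 1}",
                "release_mbid": RELEASE_MBID,
            }
        },
    }
    for i in range(59)
]


def top_entities(listens, key):
    """Count listens per submitted MBID stored under `key`, most frequent first."""
    counts = Counter(
        listen["track_metadata"]["additional_info"][key] for listen in listens
    )
    return counts.most_common()


top_tracks = top_entities(listens, "recording_mbid")
top_albums = top_entities(listens, "release_mbid")
print(len(top_tracks), top_albums)
```

Under this rule there would be 59 top tracks with one listen each and a single top album with 59 listens, which is what the reporter expected to see.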

The same problem also appeared with other artists and albums, but I don’t currently have an example ready. I could provide another example with a different artist and a different set of tracks if needed for investigation or testing.

rob · July 9, 2024, 10:32am

Hi!

What you are describing sounds pretty much like how things are currently designed to work, which is obviously less than ideal.

When we wrote the original mapping software that takes in simple metadata and maps that data to MB recordings, we started with artist and recording titles as the only pieces of data required to get a match. We didn’t really know how to accomplish this, and it took months of attempts and a huge amount of data processing to learn how it can (and should) work. This is what we currently have in production – the explicit goal was to learn from this and see what our next steps should be.

Despite its shortcomings, we were blown away by how well the system worked, and started collecting a list of bugs in this roll-up ticket. We’ve now collected enough data about how this system needs to be improved - namely, that we need to build a whole new parallel lookup path, one that includes the album when looking up the data. We finally understand how to do this! However, we’re now waiting for our team to free up from current projects before we dive into this one.
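
A toy Python illustration of the difference described above: a lookup keyed only on artist and recording title has no way to honour the album, so whichever release was indexed first wins, while a parallel, release-aware path can take precedence when the album is known. Everything here (`HypotheticalMapper`, the `normalize` rule, the artist, titles and placeholder MBIDs) is invented for illustration; it is not the ListenBrainz mapper.

```python
def normalize(text):
    """Hypothetical normalization: lowercase and keep only alphanumerics."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


class HypotheticalMapper:
    """Toy index: artist+recording lookup plus a release-aware lookup."""

    def __init__(self):
        # (artist, recording) -> (recording_mbid, release)
        self.by_recording = {}
        # (artist, release, recording) -> (recording_mbid, release)
        self.by_release = {}

    def add(self, artist, release, recording, recording_mbid):
        value = (recording_mbid, release)
        # Without the release in the key, the first release indexed "wins".
        self.by_recording.setdefault((normalize(artist), normalize(recording)), value)
        self.by_release[(normalize(artist), normalize(release), normalize(recording))] = value

    def lookup(self, artist, recording, release=None):
        if release is not None:
            hit = self.by_release.get(
                (normalize(artist), normalize(release), normalize(recording))
            )
            if hit is not None:
                return hit
        # Fall back to the artist + recording title path.
        return self.by_recording.get((normalize(artist), normalize(recording)))


mapper = HypotheticalMapper()
mapper.add("Example Artist", "Compilation B", "Intro", "rec-b")
mapper.add("Example Artist", "Album A", "Intro", "rec-a")

print(mapper.lookup("Example Artist", "Intro"))             # ('rec-b', 'Compilation B')
print(mapper.lookup("Example Artist", "Intro", "Album A"))  # ('rec-a', 'Album A')
```

The fallback keeps listens without usable album data mappable, which is one plausible reason to add the album path in parallel rather than replace the existing one.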

I suspect that in the autumn we will have time to address this project and make some drastic improvements going forward. In the meantime, take a look at the roll-up ticket and see if there is anything else you can contribute to it.

Thanks!

Alioth · July 9, 2024, 9:58pm

Thanks @rob, I appreciate the detailed answer. It appears I must make it a habit to search through Jira first when I see some technical issues with the service. I’ll have a look at that ticket and maybe I really can contribute something.

I have a lot of faith in the ListenBrainz project; I think in the long run it can become superior to Last.fm as far as data collection and processing are concerned. So when I see something surprising, I just want to report it to help make ListenBrainz better.

Thanks for your efforts.

rob · July 10, 2024, 10:00am

I’m pleased to hear this, and I appreciate you reporting issues. The whole team is committed to making LB the best it can be, so it is always nice to hear encouraging words.

One question for you: what things in LB are missing or not good enough, making you think that Last.fm is still better overall? What should we focus on next (year)?

Alioth · July 12, 2024, 8:46pm

That’s a tough one. I’ve given some suggestions in the “What do you want in LB” thread (and elsewhere too), but it seems you’re asking more from overall project management perspective?

First of all, I’m not saying Last.fm is necessarily “better” as it is now. I’ve been a long-time user, but I’ve also been pretty frustrated with the experience, especially in recent years. And it’s not just me. There are some issues Last.fm just won’t be able to fix without updating their data model, but ListenBrainz should not run into the same problems because the data model here is already better.

Some examples:

the possibility to distinguish entities which share the same name - artists, recordings, etc. (MBIDs help with that)
allowing users to help clean up the data if errors are found (typos, iNNovAtiVE cAPitALizAtIOn, etc. - no problem, just go and edit what you think is wrong)
handling missing data in the catalogue (just go and add a missing release to MusicBrainz)
merging scrobbles of the same artist or track split by spelling or punctuation differences (like artist name variations, or using a typographic apostrophe vs. straight ASCII one) - also possible to handle thanks to MBIDs.
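
The last point can be sketched in a few lines of Python: group by MBID whenever one is available, and only otherwise fall back to a normalized spelling of the name. The normalization rules here (NFKC, folding typographic quotes to ASCII, case-folding, collapsing whitespace) and the sample names are assumptions for illustration, not ListenBrainz’s actual matching logic.

```python
import unicodedata
from collections import Counter

# Hypothetical folding table: typographic quotes -> ASCII quotes.
QUOTE_FOLD = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def merge_key(name, mbid=None):
    """Group by MBID when one is known, else by a normalized spelling of the name."""
    if mbid:
        return ("mbid", mbid)
    folded = unicodedata.normalize("NFKC", name).translate(QUOTE_FOLD)
    # casefold() for case-insensitive matching; split/join collapses whitespace.
    return ("name", " ".join(folded.casefold().split()))


LULL = "2970da48-7f03-4f49-897e-e633b256992a"
scrobbles = [
    ("Don't Stop", None),        # straight ASCII apostrophe
    ("Don\u2019t Stop", None),   # typographic apostrophe
    ("DON'T  STOP", None),       # different case, doubled space
    ("Lull", LULL),
    ("LULL", LULL),              # spelling differs, same MBID
]
counts = Counter(merge_key(name, mbid) for name, mbid in scrobbles)
print(counts)
```

The name-based fallback only papers over spelling differences; the MBID path is what actually distinguishes two different entities that happen to share a name.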

So ListenBrainz could definitely become “Last.fm done right”, and take care of the needs of the frustrated Last.fm users who really care about the quality of their data.

If I were to point to the things that would make ListenBrainz better overall, I’d say to focus on the end user experience, starting with making it easy for people to migrate from Last.fm to ListenBrainz. When I last checked, the LB Last.fm importer didn’t work too well (and I know it’s not easy to write a robust application using the Last.fm API - in the end I hacked together my own Last.fm exporter and LB submitter, so I experienced the rough edges). I haven’t looked at it for some time, so maybe the importer has been improved in the meantime.
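
A rough sketch of the core of such a home-made exporter/submitter: converting one track from Last.fm’s `user.getRecentTracks` JSON into a ListenBrainz listen, and chunking listens into batches for submission. The Last.fm field names follow that method’s usual response shape but should be checked against the API docs; treating Last.fm’s track and album MBIDs as recording and release MBIDs is an assumption, and the per-request cap of 1000 listens is also an assumption to verify against the current ListenBrainz API documentation.

```python
def lastfm_track_to_listen(track):
    """Convert one user.getRecentTracks item to a ListenBrainz listen, or None.

    The "now playing" entry has no "date", so it is skipped.
    Last.fm MBID fields are often empty strings; only non-empty ones are kept.
    """
    if "date" not in track:
        return None
    info = {}
    if track["artist"].get("mbid"):
        info["artist_mbids"] = [track["artist"]["mbid"]]
    if track.get("mbid"):
        # Assumption: Last.fm's track MBID is used as a recording MBID.
        info["recording_mbid"] = track["mbid"]
    album = track.get("album", {})
    if album.get("mbid"):
        # Assumption: Last.fm's album MBID is used as a release MBID.
        info["release_mbid"] = album["mbid"]
    metadata = {
        "artist_name": track["artist"]["#text"],
        "track_name": track["name"],
        "additional_info": info,
    }
    if album.get("#text"):
        metadata["release_name"] = album["#text"]
    return {"listened_at": int(track["date"]["uts"]), "track_metadata": metadata}


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


sample = {
    "artist": {"mbid": "2970da48-7f03-4f49-897e-e633b256992a", "#text": "Lull"},
    "mbid": "",
    "album": {"mbid": "", "#text": "Moments"},
    "name": "[Moment 1]",
    "date": {"uts": "1612138655"},
}
listen = lastfm_track_to_listen(sample)
# Assumed per-request cap of 1000 listens.
chunks = list(batches([listen] * 2500, 1000))
```

Empty Last.fm MBIDs are dropped rather than submitted, so the ListenBrainz side can fall back to its own mapping for those listens.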

After importing their Last.fm data, a new user should be able to quickly see the stats of their freshly imported data. Even more importantly, the numbers should be easily comparable to what they know from Last.fm. Now, I realize this is tricky: the numbers could differ because of the different underlying data models in Last.fm and LB. And statistics generation in LB is, as far as I understand, a significant hardware resource problem. But it’s something to think about. One of the major reasons for using Last.fm is having one’s own personal listening history, so data accuracy is important. “I imported my data from Last.fm but now the order of my top artists and albums is different and I don’t understand why” would not be a good first experience for a new LB user.
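
One way to make the “why is my order different?” question answerable is a side-by-side report of how each of a user’s Last.fm top artists ranks on ListenBrainz. A minimal Python sketch, assuming both services’ per-artist counts are already available as dictionaries keyed by artist name; the function name and all counts below are hypothetical.

```python
def rank_changes(lastfm_counts, lb_counts, top_n=10):
    """For Last.fm's top-N artists, report (name, Last.fm rank, LB rank, Last.fm count, LB count)."""

    def ranking(counts):
        # Highest count first; ties broken alphabetically for a stable order.
        ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return {name: pos for pos, (name, _) in enumerate(ordered, start=1)}

    lfm_rank = ranking(lastfm_counts)
    lb_rank = ranking(lb_counts)
    report = []
    for name, pos in sorted(lfm_rank.items(), key=lambda kv: kv[1])[:top_n]:
        # lb_rank.get() returns None for artists missing on the LB side.
        report.append((name, pos, lb_rank.get(name), lastfm_counts[name], lb_counts.get(name, 0)))
    return report


# Hypothetical counts for illustration only.
lastfm = {"Artist A": 120, "Artist B": 100, "Artist C": 90}
lb = {"Artist A": 118, "Artist C": 101, "Artist B": 95}
for row in rank_changes(lastfm, lb):
    print(row)
```

Matching on artist name is itself a simplification: the whole point of MBIDs is that names alone can merge or split artists, which is one way the two rankings can diverge in the first place.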

Also, while I don’t use Spotify myself, I know that many Last.fm users complain about frequent disconnections between their Last.fm accounts and Spotify, resulting in lost scrobbles. I’m not sure what the root cause is, but maybe ListenBrainz could handle it better than Last.fm does? Stuff like that goes a long way.

From the time I spent in the Last.fm subreddit, I can say that people love cool visualizations of their listening data. Some of them are already here (e.g. the world map and the time charts), and others could be added in the future. There are multiple third-party tools built on the Last.fm API that could be an inspiration. For example, for some reason I especially enjoy the “artist race chart” and cumulative scrobble charts from https://lastfmstats.com. But I guess this is a question of hardware resources, again.
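
The data behind an artist race chart or a cumulative scrobble chart is essentially a running total per artist, snapshotted per time bucket. A minimal Python sketch, assuming listens are (Unix timestamp, artist name) pairs and that monthly buckets in UTC are good enough; how lastfmstats.com actually computes its charts is not known here, and the sample data is made up apart from the timestamp borrowed from the listen above.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def cumulative_by_month(listens):
    """Return {"YYYY-MM": {artist: running total}} in chronological order."""
    per_month = defaultdict(Counter)
    for ts, artist in listens:
        # Bucket each listen by calendar month in UTC (an assumption).
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        per_month[month][artist] += 1

    running = Counter()
    snapshots = {}
    for month in sorted(per_month):
        running.update(per_month[month])  # add this month's counts to the totals
        snapshots[month] = dict(running)  # copy, so later months don't mutate it
    return snapshots


listens = [
    (1612138655, "Lull"),            # 2021-02-01 UTC
    (1612142255, "Example Artist"),  # one hour later
    (1614560400, "Lull"),            # 2021-03-01 UTC
]
for month, totals in cumulative_by_month(listens).items():
    print(month, totals)
```

Each snapshot is one frame of a race chart; plotting a single artist’s totals across snapshots gives the cumulative curve. Months without listens are skipped here, so a real chart would need to fill those gaps.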

What I clearly wouldn’t want LB to focus on are the social features. For example, having anything resembling shoutboxes would open a huge can of worms (it’s just too hard to moderate).

TL;DR effortless migration from Last.fm, focus on data correctness, faster charts generation, more cool data visualizations.

rob · July 17, 2024, 11:50am

Thank you so much for your thoughts – I think most of what you suggested is more or less on our roadmap already.

I think the most important thing you’ve made clear is that the migration from last.fm should be clear, easy and have immediate tangible results. I think that needs to become a mantra for us. The immediate tangible results will be the most challenging one…

Working with Spotify is a royal pain – their systems throw lots of errors, so we need to be very diligent about catching errors and trying to recover from their frequent problems. Not a whole lot of fun, TBH.
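
A common generic pattern for recovering from a flaky upstream API is retrying transient failures with capped exponential backoff and jitter. A minimal Python sketch of that pattern; it is not ListenBrainz’s Spotify code, `TransientError` is a hypothetical stand-in for whatever failures are worth retrying (for example HTTP 429 or 5xx responses), and the default delays are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429/5xx)."""


def call_with_retries(func, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying TransientError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller handle it
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": a random wait up to the backoff, so many
            # clients don't hammer the service again in lockstep.
            sleep(random.uniform(0, delay))


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("temporary upstream failure")
    return "ok"


# A no-op sleep keeps the example instant.
result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, len(attempts))  # ok 3
```

Only errors classified as transient are retried; anything else propagates immediately, which avoids repeating requests that can never succeed.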

So it’s not just Last.fm that struggles with this one.

Thanks!