Allow getting more user data through the API

frikandel · September 4, 2023, 3:35pm

Hey, I’m the developer of .fmbot (a Last.fm Discord bot with ~650k users and ~300k servers). I’ve been looking into adding ListenBrainz support as an alternative to Last.fm, but I’m running into some limitations in the API that make it very difficult for me to support ListenBrainz as a proper alternative. I talked about this on IRC some time ago, but figured it might be better to put it all in writing. Additionally, I think other developers who have created tools for ListenBrainz might be running into the same limitations.

Most tools created for Last.fm fetch quite a bit of user data. For my app this is no different. From Last.fm I get this data for each user on signup:

- Listens/scrobbles for the last few months, up to 25k
- All-time top 4000 artists
- All-time top 5000 albums (releases)
- All-time top 6000 tracks (recordings)

We update top lists with recent listens/scrobbles whenever someone uses the app or once every ~2 days.

I need this data stored locally for various reasons, ranging from comparing friends to each other in leaderboards to showing music trends in large communities. Other music statistics bots that work with Discord work in a similar way.

However, with the ListenBrainz API this is not really possible, for a few reasons:

1. MAX_ITEMS_PER_GET is set to 100, which makes getting enough data really expensive because it takes a lot of requests. For Last.fm the limit is 1000 per request.
2. ~~The user /listens endpoint supports no pagination/offset~~
3. There is no way to get a user’s playcount for a specific artist/album/track (aka artist/release/recording).
4. The user top recordings endpoint does not go past the top 1000.
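
As a rough illustration of the first limitation (the per-request cap), here is a small sketch of how many requests one signup would need at a page size of 100 versus 1000. It assumes every list from the signup data above could be fetched in full, one page per request, which is not the case today for top recordings; the names `WANTED`, `requests_needed` and `total_requests` are made up for this example.

```python
import math

# Items fetched per user on signup, taken from the list earlier in this post.
WANTED = {
    "listens": 25_000,
    "top artists": 4_000,
    "top albums": 5_000,
    "top tracks": 6_000,
}


def requests_needed(items: int, page_size: int) -> int:
    """Number of requests needed to page through `items` entries."""
    return math.ceil(items / page_size)


def total_requests(page_size: int) -> int:
    """Total requests for one signup, assuming one page per request."""
    return sum(requests_needed(n, page_size) for n in WANTED.values())


if __name__ == "__main__":
    for page_size in (100, 1000):
        print(f"page size {page_size}: {total_requests(page_size)} requests per signup")
```

Under these assumptions a signup costs 400 requests at 100 items per request and 40 at 1000. With about 1000 new users a day, that is roughly 400,000 versus 40,000 requests per day for signups alone.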

You might think, why not use the data dumps if you need this much data? I have done a small write-up on that before here. It comes down to this: that would be a very complicated process that would always be a bit out of date anyway. It would be much better to be able to programmatically get data for users through the API. Since we get about 1000 new users every day, the signup process needs to be as flawless and automated as possible.

I hope this doesn’t come off as needy, I just figured that it might be interesting to hear feedback from a developer wanting to integrate the website. The site already works great from a user perspective imo, it’s just the third-party developer system that could be improved a bit.

outsidecontext · September 4, 2023, 4:40pm

You can paginate through all the listens with the max_ts timestamp parameter. For your next request, use the oldest listen’s listened_at timestamp as max_ts. That’s also how pagination is done on the LB listens page.

See also Core — ListenBrainz 0.1.0 documentation
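
A minimal sketch of that max_ts pagination in Python, for illustration only. It assumes the `GET /1/user/<user_name>/listens` endpoint on `https://api.listenbrainz.org` with `count` and `max_ts` query parameters and a `payload.listens` array in the response; `fetch_page` and `iter_listens` are hypothetical helper names, and rate limiting and error handling are left out.

```python
import json
import urllib.parse
import urllib.request

# Assumed API root; check the ListenBrainz docs before relying on it.
API_ROOT = "https://api.listenbrainz.org/1"


def fetch_page(user, max_ts=None, count=100):
    """Fetch one page of listens, newest first (hypothetical helper)."""
    params = {"count": count}
    if max_ts is not None:
        params["max_ts"] = max_ts
    url = (
        f"{API_ROOT}/user/{urllib.parse.quote(user)}/listens?"
        f"{urllib.parse.urlencode(params)}"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["payload"]["listens"]


def iter_listens(user, limit, fetch=fetch_page, page_size=100):
    """Yield up to `limit` listens, walking backwards in time via max_ts."""
    max_ts = None
    fetched = 0
    while fetched < limit:
        page = fetch(user, max_ts=max_ts, count=min(page_size, limit - fetched))
        if not page:
            return  # no older listens left
        for listen in page:
            yield listen
            fetched += 1
        # The oldest listened_at on this page becomes max_ts for the next request.
        max_ts = min(listen["listened_at"] for listen in page)
```

Usage would be something like `for listen in iter_listens("some_user", limit=25_000): ...`. One caveat, assuming max_ts is exclusive: listens that share the exact timestamp at a page boundary could be skipped, so an importer that needs every listen may want to handle that case separately.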

rob · September 8, 2023, 1:25pm

Hello and thanks for your interest in supporting the LB functionality! Sorry for the delay in getting back to you – loads on this week!

We agree with #1 – I’ve hit this issue myself, so I’ve opened this PR:

Increase MAX_ITEMS_PER_GET to 1000 by mayhem · Pull Request #2572 · metabrainz/listenbrainz-server · GitHub

As for 3, we would like to have this data eventually, but it is a massive body of data for which we currently don’t have the right infrastructure. We’re hoping to improve this and make this data available in the future, but right now I cannot provide a timeline for this.

For 4, what do you do with 6000 tracks? Effectively, the answer for 4 is the same as for 3 – this is a huge amount of data that must be calculated daily, but 99.999% of this data will never be used before it gets refreshed. We would need to make infrastructure investments to make this quantity of stats available for everyone. Very tricky.