Sort troi patches by stability and use

yuioen · October 6, 2023, 11:20am

Dataset hoster: area-random-recordings seems to have a problem. It times out, and no other information is displayed. I tried different queries, with the same result.

rob · October 6, 2023, 11:36am

That endpoint was fully experimental – and it’s not very usable, since the queries are really heavy on the server that serves them. Currently that server is busy with loads of other stuff, since we have nearly the whole team assembled in one place, furiously hacking on features on that server.

Did you have a use case in mind for the endpoint?

rob · October 6, 2023, 11:39am

I should also add another bit of info: The endpoints on datasets.listenbrainz.org are experimental, unstable and not generally guaranteed to be up and running all the time.

The endpoints on labs.api.listenbrainz.org are much more stable and we keep those running at all times, making sure they keep functioning. We have production systems running off this site.

yuioen · October 7, 2023, 3:47pm

So most of the patches in troi are to be considered unstable? Maybe we should create three kinds of patches in troi:

stable
experimental
backend : used only by listenbrainz servers (like topmissedrecordings)

A little bit of context: troi is integrated into the funkwhale backend, and I’m running experiments on the Funkwhale network to see which patches can be used and which cannot.

yuioen · October 7, 2023, 3:51pm

I’ve looked at labs.api.listenbrainz.org. There are three interesting tools we could use at funkwhale:

similar-recordings: Similar Recordings Viewer
similar-artists: Similar Artists Viewer
tag-similarity: ListenBrainz Tag Similarity

Can I create patches to integrate these into troi ?

UltimateRiff · October 7, 2023, 6:09pm

ooh, that’s cool~

for additional context, (and correct me if I’m wrong, @yuioen), Funkwhale is a federated (ActivityPub/Fediverse) platform for sharing music and audio. think SoundCloud, but open source and federated

rob · October 8, 2023, 9:34pm

Thanks for the feedback on troi – I’ll clean things up next time I’m working on troi. (which will be soon-ish)

And yes, the endpoints on labs.api.listenbrainz are considered semi-stable – we may still change these from time to time. In particular the similar artist endpoint is very much a proof of concept endpoint which is not very friendly to end-users.

Go ahead and develop new patches for troi, please! All these endpoints are made specifically for this. But, before you release something, we should discuss how we’re going to make the used endpoints stable before release.

I have recently learned a few things, since I added crude funkwhale playlist support to listenbrainz-content-resolver (metabrainz/listenbrainz-content-resolver on GitHub, which resolves ListenBrainz playlists from JSPF files to local playlists) and had it take some playlists generated by LB Radio and resolve them to local funkwhale content.

The results are rather disappointing – out of the 50 tracks that were in the playlist, only 4 were resolved to my local collection. My collection is a bit dated, but even if I updated it, perhaps only 50% of the playlist would be resolved, which destroys the integrity of the playlist. Meh.

What needs to happen to resolve this is to make LB Radio aware of a local collection. In this case there would need to be two API endpoints provided by funkwhale in order for troi to make local LB radio playlists:

Select N random tracks from the collection tagged with a given tag and in a given popularity range.
Select N random tracks from the collection from a given artist in a given popularity range.

If funkwhale (or any client) can provide these two endpoints, we can have troi use these endpoints for a local collection, rather than using the global MusicBrainz collection. Playlists generated by this patch would not need resolving and can be simply saved to a funkwhale playlist.
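A minimal sketch of the selection logic behind those two endpoints, assuming each local track already has its tags, artist MBID and popularity (as a percentage) stored in the local database. The `Track` fields and the function names are hypothetical and not part of funkwhale or troi:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Track:
    # Hypothetical local-collection record; the field names are assumptions.
    recording_mbid: str
    artist_mbid: str
    popularity: float  # percent, 0-100
    tags: set = field(default_factory=set)


def _pick(candidates, count, rng):
    # Return at most `count` tracks, chosen at random without repeats.
    return rng.sample(candidates, min(count, len(candidates)))


def random_tracks_by_tag(collection, tag, begin_percent, end_percent, count, rng=random):
    # Tracks carrying `tag` whose popularity lies in the requested range.
    matches = [t for t in collection
               if tag in t.tags and begin_percent <= t.popularity <= end_percent]
    return _pick(matches, count, rng)


def random_tracks_by_artist(collection, artist_mbid, begin_percent, end_percent, count, rng=random):
    # Tracks by `artist_mbid` whose popularity lies in the requested range.
    matches = [t for t in collection
               if t.artist_mbid == artist_mbid and begin_percent <= t.popularity <= end_percent]
    return _pick(matches, count, rng)
```

Because every returned track comes from the local collection, nothing in the result needs a later resolution step. A real implementation would more likely push the filtering and random ordering into the database query rather than doing it in Python.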

I’m in the process of building this and bringing the power of troi to funkwhale and other projects like it. If you’re interested, we can collaborate and make this a reality faster.

yuioen · October 9, 2023, 11:56am

We will discuss this on the troi GitHub o/

I tested on an experimental pod and got 13 results out of 50 tracks (see Troi real world review (#2228) · Issues · funkwhale / funkwhale · GitLab). I agree we should find a solution. I could create these endpoints, but where do I get the popularity?
Why give a random selection of tracks? Shouldn’t we give all the available tracks? Maybe we could create a pod-independent service that scrapes the fw network for track metadata. This service would give the mb algorithms all the available tracks. This has pros and cons, but it avoids a huge metadata exchange between mb and funkwhale pods, which doesn’t seem scalable.
Another solution is to allow fw to get music from third-party sources, but this is still under discussion.

Yes I’m still very interested o/

rob · October 9, 2023, 8:57pm

For popularity, look at this endpoint:

Dataset hoster: bulk-tag-lookup

Given a list of recording_mbids (up to 1,000!), return the tags and popularity (expressed as percent here). Each tag will result in a row, so the output is going to be very verbose; each row will contain the recording_mbid, tag, popularity and the entity the tag is attached to.

The output could be more optimal for the user – however, this table has 330,000,000 rows in it, so I’m not going to do any joins or aggregating on it. At least not yet, until I understand this better.
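A rough sketch of how a client might cope with the 1,000-MBID limit and the one-row-per-tag output: split the MBIDs into batches, then collapse the returned rows into one entry per recording. The row keys used here (`recording_mbid`, `tag`, `percent`) are assumptions about the output and should be checked against the real response; the HTTP call itself is left out.

```python
from collections import defaultdict

MAX_MBIDS_PER_CALL = 1000  # limit mentioned for bulk-tag-lookup


def chunked(mbids, size=MAX_MBIDS_PER_CALL):
    # Yield successive slices of at most `size` MBIDs.
    for start in range(0, len(mbids), size):
        yield mbids[start:start + size]


def group_rows(rows):
    # Collapse one-row-per-tag output into one entry per recording.
    # The keys "recording_mbid", "tag" and "percent" are assumed names.
    grouped = defaultdict(lambda: {"popularity": None, "tags": set()})
    for row in rows:
        entry = grouped[row["recording_mbid"]]
        entry["popularity"] = row["percent"]
        entry["tags"].add(row["tag"])
    return dict(grouped)
```

Grouping on the client side keeps the server query free of joins and aggregation, at the cost of a more verbose response.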

The overall plan is this:

1. FW will need to have recordings tagged with MBIDs. Sorry, but this is not a thing up for discussion – none of this would work without a tagged collection. If people refuse to do that, then we can’t help them.
2. FW will need to periodically call this API endpoint for all the recordings to fetch/update the tags and the popularity and store them in the local db.
3. Make a local API endpoint that mirrors this undocumented API endpoint at LB: https://api.listenbrainz.org/1/lb-radio/tags?tag=downtempo&begin_percent=0&end_percent=50&count=50 (code is here https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz/webserver/views/explore_api.py#L128 )
4. The tag and popularity data can then be returned by the endpoint to troi – we’ll have to come up with a way to tell troi to use alternate URLs for some of its calls, but that isn’t a lot of work.
5. So far, this allows troi to generate playlists – to actually save them we will need to have a troi plugin that knows how to create new funkwhale playlists.
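To make the mirrored endpoint more concrete, here is a small sketch that reads the four query parameters visible in the LB URL above (`tag`, `begin_percent`, `end_percent`, `count`). The `TagQuery` class, the function name and the validation rules are assumptions for illustration; the real endpoint may accept more parameters than this.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass
class TagQuery:
    tag: str
    begin_percent: float
    end_percent: float
    count: int


def parse_tag_query(url):
    # Read the same query parameters that the LB URL uses.
    params = parse_qs(urlsplit(url).query)
    query = TagQuery(
        tag=params["tag"][0],
        begin_percent=float(params["begin_percent"][0]),
        end_percent=float(params["end_percent"][0]),
        count=int(params["count"][0]),
    )
    # Assumed sanity checks; not taken from the LB implementation.
    if not 0 <= query.begin_percent <= query.end_percent <= 100:
        raise ValueError("percent range must satisfy 0 <= begin <= end <= 100")
    if query.count < 1:
        raise ValueError("count must be positive")
    return query
```

If the local endpoint accepts exactly the same parameters, troi only needs a different base URL to talk to it.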

Point #5 is one that you could start working on – I am traveling for the next week and won’t be online much. But once I get back, I plan to spend some serious time on these features.

yuioen · October 10, 2023, 11:27am

I don’t understand point 3, because lb-radio will query the ListenBrainz API anyway (similar artists, similar tags, etc.). So in any case we first need to:

Find a way to make listenbrainz aware of what tracks are available in the fw pod or in the fw network
Return only recordings that are in the available metadata (and not from the whole mb db).

This should be a filter that can be applied to every listenbrainz api endpoint returning recordings.

We could also keep the current implementation and let fw handle this. But to get enough track matches we would need to add an option in the patch to return many more recordings (for example, daily-jam only returns 50; if it returned 250 we would get a 50-track playlist in funkwhale). We lose the playlist integrity, but this is cheaper for the lb servers.
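A sketch of that over-fetch idea on the fw side, assuming the patch returns an ordered list of recording MBIDs and fw knows which MBIDs it holds locally. The function name is hypothetical:

```python
from itertools import islice


def resolve_locally(candidate_mbids, local_mbids, target_size=50):
    # Keep the patch's original order, drop anything that is not in the
    # local collection, and stop once the playlist is long enough.
    local = set(local_mbids)
    matches = (mbid for mbid in candidate_mbids if mbid in local)
    return list(islice(matches, target_size))
```

With 250 candidates and roughly one match in five, this yields about 50 tracks, but the result is only as good as whatever happens to match.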

rob · October 10, 2023, 2:04pm

This approach doesn’t work very well – if we make the call to LB radio on listenbrainz.org it will give us a global playlist that will match poorly to a local collection. But if we use a local endpoint in troi, we can effectively ask “Give me a selection of recordings in the local collection tagged with tag X”.

If we then build a playlist from this endpoint (and an artist endpoint) we will not need to filter them, since we know 100% that those tracks are in the local collection. Troi can then create the playlist and save it to FW without a resolution step.

Yes, this is not a good solution. In order to return a 250-track playlist, troi may need to fetch many more recordings to make a good playlist. The best way to do this is to only select recordings that are available locally and not do any filtering at all.

yuioen · October 10, 2023, 10:55pm

I think you didn’t understand my proposal: implement a filter in lb so that it only queries a pre-defined set of recordings. But I agree that popularity and tag data should be cloned into the fw db; it would be more efficient that way.

This could work, but lb-radio uses at least the similar-artists lb endpoint, and I don’t see how we could have this locally. Also, what about the other patches that need other endpoints? We should think about what can be cloned into the fw db and what cannot, and how we can handle the data that cannot be cloned.

For example, for periodic_jams I suppose we could have a copy of the user’s recommended tracks that are in the pod.

Yes, I agree it would be better to only select recordings that are available locally. But the approach of creating bigger playlists has the big advantage of being usable right now, without having to clone any lb data into fw. I’m not against this cloning at all, but it needs a good amount of planning (which patches and endpoints, and will they be stable?). Maybe increasing the size of troi-generated playlists could be a partial solution.

yuioen · October 12, 2023, 12:31pm

Also note that implementing musicbrainz tags in fw is already on our roadmap, but it’s blocked by MBS-12368: Add annotations, rels and aliases to genre WS (pull request #2518 by reosarevok in metabrainz/musicbrainz-server on GitHub).

rob · October 13, 2023, 4:02pm

This does not need to be mirrored locally. Troi can use the global one and then see which artists are available locally – so for this we would need a different endpoint that lets us send a list of artists, filters that list according to what is locally available and returns it. That’s all we need to make it work – save for troi changes as well, but those would be minor.
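A minimal sketch of what such an artist filter could do on the local side. The JSON payload shape (`{"artist_mbids": [...]}`) and the function name are assumptions, since no request format has been agreed on yet:

```python
import json


def filter_local_artists(request_body, local_artist_mbids):
    # Hypothetical payload: {"artist_mbids": ["...", ...]}
    requested = json.loads(request_body)["artist_mbids"]
    local = set(local_artist_mbids)
    # Preserve the incoming order so any similarity ranking survives.
    available = [mbid for mbid in requested if mbid in local]
    return json.dumps({"artist_mbids": available})
```

Only a list of artist MBIDs crosses the wire in each direction, so the exchange stays small.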

This is not workable from our perspective since this would create quite a load on our servers since it is grotesquely inefficient. Remember, whatever we release and make available to the world, we need to support effectively “forever”. We need to get it right before we let many users use it, otherwise we might need to remove the feature later, which leads to a really crap experience.

Scalability is always a concern/problem for us, since we’re a non-profit and have very limited resources, so we need to be conservative with our resources.

rob · October 13, 2023, 4:02pm

@reosarevok do you have a timeline for merging this?

yuioen · October 13, 2023, 4:23pm

I would need to look into the code, but from a general perspective, how do you see these implementations working? Would we have some sub-patches that are funkwhale patches, such as troi.patches.funkwhale.patch_name? Or, better, we could have a registry with all endpoints, and if a patch is launched with "funkwhale": True in the config, the patch uses the registry and the fw endpoints? This way we could implement this with any other software.

yuioen · October 13, 2023, 4:26pm

Yes, that makes a lot of sense jaja

rob · October 13, 2023, 5:46pm

The latter is the approach I was thinking of, because I want this to work for all end-user installations, be it FW, Navidrome or just a plain files-on-disk collection. For the tag lookup we can swap out the URL for the API endpoint – super easy. For similar artists we will need to add a new call to the artist filter endpoint, which gets optionally called when troi is invoked with the funkwhale=True option.
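One way the registry idea could look, as a sketch only: default URLs point at ListenBrainz, a local installation overrides the ones it mirrors, and the optional artist filter step is skipped when no filter URL is configured. The class and key names are hypothetical and this is not troi's actual API:

```python
# Default URL for the tag lookup is the one quoted earlier in this thread.
DEFAULT_ENDPOINTS = {
    "lb_radio_tags": "https://api.listenbrainz.org/1/lb-radio/tags",
}


class EndpointRegistry:
    def __init__(self, overrides=None, artist_filter_url=None):
        # Overrides replace defaults one by one; untouched names keep
        # pointing at the global ListenBrainz endpoints.
        self._urls = {**DEFAULT_ENDPOINTS, **(overrides or {})}
        # Only set for local installations; None means "skip the filter".
        self.artist_filter_url = artist_filter_url

    def url_for(self, name):
        return self._urls[name]

    def needs_artist_filter(self):
        return self.artist_filter_url is not None
```

A patch would then ask the registry for a URL by name instead of hard-coding it, which keeps the same patch usable for FW, Navidrome or any other collection that provides the endpoints.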

I can start working on these features mid next week when I return to Spain.

outsidecontext · October 13, 2023, 7:37pm

At the summit it was discussed that this is only blocked by the search server update. But updating the search server seems to be the difficult part.

yuioen · October 14, 2023, 2:56pm

I don’t get which endpoint you’re speaking about :s

I will be in Spain too, we could do a hacking session.