GSoC 2022: Proposal for Create a 'Release Radar' plugin for the Troi toolkit

Continuing the discussion from Create a 'Release Radar' plugin for the Troi toolkit - GSoC 2022:


GSoC 2022 Proposal For ListenBrainz - Create a ‘Release Radar’ plugin for the Troi toolkit

Project Logistics

Project Name

Create a ‘Release Radar’ plugin for the Troi toolkit

Project Description

Development/Summer of Code/2022/ListenBrainz

Proposed mentors: mayhem

Languages/skills: Python, Postgres possibly

Estimated Project Length: 175 hours

Expected outcomes: One or more finished, debugged, and tested plugins for Troi.

Our troi recommendation toolkit is our playground for developing recommendation algorithms. The toolkit already knows how to fetch data from ListenBrainz for stats, collaborative filtered recommended tracks, similar artists, and similar recordings. From MusicBrainz, it can fetch needed metadata such as genres and tags. This project should generate a playlist every Friday that is a collection of selected tracks that have been recently released (last 2 weeks or so) by artists that are in a given user’s top artists list. We will have an API endpoint that will list recent releases for a given user, which will be implemented by a MetaBrainz team member, and your Troi plugin should select tracks from these releases and make an exploration playlist from these tracks.

Outline

  1. Background

    1. What is Troi and what is it for

      The Troi recommendation playground is a music recommendation engine sandbox for developers based on music data in ListenBrainz and MusicBrainz. The toolkit takes advantage of the listening history records in ListenBrainz and the comprehensive music information database in MusicBrainz, allowing various ways to implement cutting-edge recommendation algorithms. This project aims to provide better music playlist services for users of ListenBrainz.

    2. Current features

      1. Datasets and available APIs

        Dataset hoster: home


        MusicBrainz API

      2. Patches

        • Region-based recommendations: Area-random-recordings and World-trip, which return playlists based on songs mostly using the “area-random-recordings” API from the wolf dataset hoster.
        • Artist-based (collaborative filtering) recommendations: Daily-jams and Weekly-flashback-jams, which return playlists based on the top or similar artists of a given user, mostly using the ListenBrainz recommendation API.
        • Collaborative filtering: First, represent the relationships between users and songs using the weighted matrix factorization (WMF) algorithm. Second, generate recommendations for each user by finding the ‘K’ song vectors closest to each user vector, using an approximate nearest neighbor algorithm. The process can be shown simply in the figure below.
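The second step (finding the K songs closest to a user vector) can be sketched as follows. This is a minimal illustration that assumes the user and song factor vectors have already been produced by matrix factorization. It scores songs by dot product with exact brute-force search rather than an approximate nearest neighbor index, and the vectors and helper names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_songs(user_vec: Sequence[float],
                song_vecs: Dict[str, Sequence[float]],
                k: int) -> List[Tuple[str, float]]:
    """Return the k (song, score) pairs with the highest dot product against user_vec.

    Exact brute-force search: every song is scored and the k best are kept.
    An approximate nearest neighbor index would replace this loop; this exact
    version is only for illustration.
    """
    scored = ((song, dot(user_vec, vec)) for song, vec in song_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 2-factor vectors, for illustration only.
user = [1.0, 0.5]
songs = {"song-a": [0.9, 0.1], "song-b": [0.1, 0.9], "song-c": [-0.5, 0.2]}
print(top_k_songs(user, songs, k=2))
```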
    3. Why release radar

      Nowadays, surrounded by music streaming services, it goes without saying how much playlists matter to us. The tutorial on playlists gives a general idea of why the methods and evaluation of automatic playlist generation are important.

      Ultimately, playlists are meant to provide a smooth and innovative listening experience. As for the release radar, my understanding of it comes from Spotify’s automatic playlist of the same name: it should be tailored to the user’s listening habits, include the latest music from the artists the user follows, and add new blood to the user’s listening experience. Such a playlist can not only save users the time of exploring new songs through trial and error but also bring hidden gems and rising stars in different music styles to the public.

    4. What is a good release radar

      1. Factors for the goodness of a playlist

        • The songs in the playlist - including the listener’s familiarity with and preference for the songs
        • The level of variety and coherence in a playlist
        • The order of the songs
        • The song transitions
        • Overall playlist structure
        • Other factors: serendipity, freshness, ‘coolness’
        • The Context
      2. Factors for user preference

        • Musical taste: a long-term, slowly evolving commitment to a genre
        • Recent listening history
        • Mood or state of mind
        • The context: listening, driving, studying, working, exercising, etc.
        • Familiarity
          • People sometimes prefer to listen to familiar songs even when they like them less than unfamiliar songs
          • Familiarity significantly predicts choice when controlling for the effects of liking, regret, and ‘coolness’
  2. Goals and Non-goals

    1. A new bug-free patch that generates a weekly release radar playlist using at least one approach
    2. Incorporate the new release API endpoint from MusicBrainz
    3. Automatically deliver the playlist each Friday
    4. Incorporate user feedback to improve the “goodness” of the playlist generation process
  3. Plugin Framework

    Patch description: The release radar is a patch in the Troi toolkit that takes a MusicBrainz/ListenBrainz user_id string and outputs a Playlist entity containing newly released songs by the user’s top artists. I use existing patches as structural references.

    1. Obtain the top artists and top recordings for a given user
    2. Obtain the top artists’ new releases from the last 2 weeks
    3. Transform the MBIDs from the previous searches into lists of recordings
    4. Sort and adjust the release playlist by predicting user preferences
    5. Remove redundant recordings from the release playlist
    6. Make sure the release radar contains between 1 and 25 recordings
    7. Schedule this patch so that the release radar is updated only on Fridays
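Steps 5 to 7 reduce to small list and date operations. A minimal sketch, assuming each recording has already been reduced to its MBID string; `shape_radar` and `is_radar_day` are hypothetical helper names, not Troi APIs.

```python
from datetime import date
from typing import List, Optional

MIN_RADAR_SIZE = 1   # step 6: lower bound
MAX_RADAR_SIZE = 25  # step 6: upper bound


def shape_radar(recording_mbids: List[str]) -> Optional[List[str]]:
    """Steps 5-6: drop repeated recordings, then cap the playlist length.

    dict.fromkeys keeps the first occurrence of each MBID in its original order.
    Returns None if fewer than MIN_RADAR_SIZE recordings remain, so the caller
    can skip this user instead of publishing an empty playlist.
    """
    unique = list(dict.fromkeys(recording_mbids))
    if len(unique) < MIN_RADAR_SIZE:
        return None
    return unique[:MAX_RADAR_SIZE]


def is_radar_day(day: date) -> bool:
    """Step 7: only regenerate on Fridays (date.weekday(): Monday == 0, Friday == 4)."""
    return day.weekday() == 4


print(shape_radar(["a", "b", "a", "c"]))  # → ['a', 'b', 'c']
print(is_radar_day(date(2022, 4, 15)))    # → True (a Friday)
```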
  4. Plugin Design and Implementation

    1. Release API usage

      1. Input: User_Name string

      2. Output: a JSON response that contains MBIDs

      3. Output structure samples:

        {
          "payload": {
            "count": 25,
            "from_ts": 1649030400,
            "last_updated": 1650160844,
            "offset": 0,
            "range": "week",
            "releases": [
              {
                "artist_mbids": [
                  "b2029169-2574-4305-820f-252a5fde3697"
                ],
                "artist_msid": null,
                "artist_name": "Meghan Trainor",
                "listen_count": 7,
                "release_mbid": "0d529701-f09a-431e-b088-97b4df737e37",
                "release_msid": null,
                "release_name": "Title"
              }
            ],
            "to_ts": 1649635200,
            "total_release_count": 124,
            "user_id": "lucifer"
          }
        }
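A minimal sketch of consuming a response shaped like the sample, assuming the field names shown there; the trimmed `SAMPLE` string and both helper names are illustrative. It also shows that the sample’s `from_ts`/`to_ts` window is 604,800 seconds, exactly one week, consistent with `"range": "week"`. The two-week window from the project description would need a 14-day range.

```python
import json
from typing import List

# The sample response, trimmed to the fields used here.
SAMPLE = """{"payload": {"count": 25, "from_ts": 1649030400, "to_ts": 1649635200,
  "range": "week",
  "releases": [{"artist_mbids": ["b2029169-2574-4305-820f-252a5fde3697"],
                "artist_name": "Meghan Trainor", "listen_count": 7,
                "release_mbid": "0d529701-f09a-431e-b088-97b4df737e37",
                "release_name": "Title"}]}}"""


def release_mbids(raw: str) -> List[str]:
    """Return the release MBIDs in the response, skipping entries without one."""
    payload = json.loads(raw)["payload"]
    return [r["release_mbid"] for r in payload["releases"] if r.get("release_mbid")]


def window_days(raw: str) -> float:
    """Length of the from_ts..to_ts window (Unix seconds) in days."""
    payload = json.loads(raw)["payload"]
    return (payload["to_ts"] - payload["from_ts"]) / 86400


print(release_mbids(SAMPLE))  # → ['0d529701-f09a-431e-b088-97b4df737e37']
print(window_days(SAMPLE))    # → 7.0
```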
        
    2. Catering to user preferences

      In the WMF algorithm, the matrix entries are derived from play counts, so if a song has never been played, its confidence value is low.
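To make the cold-start point concrete, here is a minimal sketch of the common implicit-feedback form of WMF. Each cell gets a binary preference and a confidence of 1 + alpha * play count. This is an assumption about the weighting: the exact scheme used by the ListenBrainz CF pipeline is not described here, and `alpha` is an arbitrary illustrative value.

```python
from typing import Dict, Tuple

PlayCounts = Dict[Tuple[str, str], int]  # (user, recording) -> play count


def preference_and_confidence(play_counts: PlayCounts, user: str, recording: str,
                              alpha: float = 40.0) -> Tuple[int, float]:
    """Implicit-feedback WMF inputs for one (user, recording) cell.

    preference p = 1 if the user played the recording at all, else 0;
    confidence c = 1 + alpha * play_count, so unplayed cells keep the minimum
    confidence of 1.0. alpha is illustrative, not a tuned value.
    """
    plays = play_counts.get((user, recording), 0)
    preference = 1 if plays > 0 else 0
    confidence = 1.0 + alpha * plays
    return preference, confidence


counts = {("alice", "old-hit"): 3}
print(preference_and_confidence(counts, "alice", "old-hit"))      # → (1, 121.0)
print(preference_and_confidence(counts, "alice", "new-release"))  # → (0, 1.0)
```

A brand-new release has no plays from anyone, so every cell in its column looks like the second case, which is why CF alone cannot rank it.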

      In this case, we can’t rely on collaborative filtering alone, since newly released recordings are unlikely to appear in most users’ listening histories. We need to combine the collaborative filtering algorithm with the release recordings fetched by our new API. The relationships between my proposed methods for predicting preferences can be drawn as below:

      1. User-similarity-driven method 1:

        Simply looking for the subset of new release playlist and recommendation recordings both based on top artist of the user.

      2. Recording-similarity-driven method 2&3:

        This assumes that people love to hear songs that are similar to songs they previously loved.

        Use a distance-based similarity algorithm such as Spotify’s Annoy to get scores between the recordings in the release playlist and the user’s top listened (ranked by play count) recordings, i.e., compute the distance between each recording in the release playlist and every recording in the user’s top listening playlist.

        Method 2 (Catch your favorite tastes): Sort the release playlist in ascending order of minimum distance to any recording in the top listening playlist, so that the new songs are very likely to become the user’s new favorites regardless of the user’s overall listening habits.

        Method 3 (Play in the safe zone): Sort the release playlist in ascending order of average distance to the whole top listening playlist, so that the new songs are relatively close to the user’s current listening habits.

      3. Novelty-driven method

        Method 4 (New blood): Sort the release playlist in descending order of average distance to the whole top listening playlist, so that the new songs are as far as possible from the user’s current listening habits. This may help the user discover a new continent in the music world.
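The four methods can be sketched in plain Python. This is a minimal illustration under stated assumptions: recordings are identified by MBID strings, and each recording already has a factor vector. How those vectors are obtained is left open, and the vectors in the example are made up. The distance follows the sqrt(2(1 - cos)) form that Annoy documents for its "angular" metric, computed here by brute force rather than with an Annoy index.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def intersect(new_releases: List[str], recommended: List[str]) -> List[str]:
    """Method 1: new-release recordings that also appear in the CF recommendations.

    Keeps the order of new_releases.
    """
    wanted = set(recommended)
    return [mbid for mbid in new_releases if mbid in wanted]


def angular_distance(u: Vector, v: Vector) -> float:
    """sqrt(2 * (1 - cos(u, v))), the form Annoy documents for its 'angular' metric."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    cos = sum(a * b for a, b in zip(u, v)) / norm
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def rank_new_releases(new: Dict[str, Vector], top: List[Vector], method: int) -> List[str]:
    """Methods 2-4: order new-release recordings by distance to the user's top recordings."""
    def min_dist(mbid: str) -> float:
        return min(angular_distance(new[mbid], t) for t in top)

    def mean_dist(mbid: str) -> float:
        return sum(angular_distance(new[mbid], t) for t in top) / len(top)

    if method == 2:  # Catch your favorite tastes: closest to any single top recording
        return sorted(new, key=min_dist)
    if method == 3:  # Play in the safe zone: closest on average
        return sorted(new, key=mean_dist)
    if method == 4:  # New blood: farthest on average
        return sorted(new, key=mean_dist, reverse=True)
    raise ValueError(f"unknown method {method}")


new = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
top = [[1.0, 0.0]]
print(intersect(["a", "b", "c"], ["c", "a"]))  # → ['a', 'c']
print(rank_new_releases(new, top, method=2))   # → ['x', 'z', 'y']
print(rank_new_releases(new, top, method=4))   # → ['y', 'z', 'x']
```

Each ranking would then be cut to the top 25 recordings to form the playlist.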

Expected Timeline for Deliverables

Extra Information

Working Time

I will be based in New York, NY during the summer, which is in the Eastern Daylight Time (EDT, UTC-4) time zone. I will not take any classes during the summer but will work on a research project alongside this project. I will be able to spend about 12 continuous weeks on this project, at a flexible 10-30 hours per week depending on the progress of each period. I can also start early to prepare for the project before the official start of the coding period.

References

Hi! Thanks for the proposal. @rob can provide detailed and useful feedback when around since he is the one most familiar with this.

Some comments from me mostly about clarifying the proposal:

  1. IMO, the algorithm is the crux of this project. So my general suggestion is to add as many details as possible to the algorithm you intend to use to generate the playlist.

  2. recording_lookup and mbid_mapping appear multiple times in the flowchart. I think those refer to Troi elements. Is that so or do they refer to something else?

  3. In method 1, you suggest

    Simply looking for the subset of new release playlist and recommendation recordings both based on top artist of the user.

    Can you please elaborate on this? IIUC, you mean to take the recordings from the new releases list for the artists that are in the user’s top artist list. More details and clarification on this part would be helpful.

  4. In the similarity method, I am unclear how you propose to calculate the similarities (I am not familiar with how annoy works, so it might be due to that; in that case, feel free to ignore this point). The recordings in the new releases list will likely not have been listened to by the user yet, so they won’t be in the user’s top listening list. How will you calculate similarity then?

    Note that we don’t have acoustic data for recordings. We used to have it in AB, but that is going away, so we can’t use it. We intend to add aggregated mood data to LB/MB, but that won’t be available for many months.

  5. To confirm, the proposal proposes one way to generate a list of tracks and various ways to sort those tracks. After each sort, we pick the top 25 tracks and discard the rest to generate a playlist. Is this understanding correct?

Thanks for the comments!

  1. I can add more details about general theories of collaborative filtering and distance calculation, but I’m not sure about what the implementation method wrt filtering LB recommendation API is using exactly since I can’t find details about it in the API doc.

  2. recording_lookup and mbid_mapping are modules in troi.musicbrainz, which I believe are used in common pipelines in Troi patches.

  3. In method 1, “recommendation recordings” refers to a list of recordings based on the user’s top artists from the ListenBrainz recommendation API. The “new release playlist” refers to a list of recordings based on the user’s top artists from the release API (you kindly showed me a demo of it earlier). The method returns the elements common to both playlists.

  4. As far as I know, annoy is a tool for calculating distances between high-dimensional vectors to find nearest neighbors. As long as we have a matrix of users and recordings, we can do matrix factorization and then calculate the distances between these user/recording vectors, though pipelines for building the matrix and factorization may have to be built from scratch (I am still looking for existing implementations). My idea is to build a huge matrix with users as rows and songs in the whole ListenBrainz database as columns, using a specific metric for the matrix values, and factorize it. Then, for each user, sort the release playlist by a.get_distance(i, j) in annoy, where i refers to each song in the user’s top listening list and j refers to each song in the new release list. Since the recommendation should be done each Friday, we can update the matrix once a week on Fridays, which hopefully won’t be too expensive.

    Thanks for letting me know that we can’t use audio features at this point. It means I can’t use content-based filtering anymore, but annoy still works if we stick to factors such as ‘play count’ as the matrix values.

    I agree that using ‘play count’ runs into trouble when new users or items are added to the system. For new users, we won’t have enough data to make recommendations. For new releases, I’m trying to come up with new properties to construct song vectors: we may use the genre (the tag attribute of a recording) as an indicator of user preference. In that case, I need to either obtain genre vectors from text embeddings produced by pretrained or retrained NLP models, or simply use the genres as categorical indicators. Another challenge, however, is that many recordings in MB data lack tags.

  5. Yes, you are correct, but I chose to pick at least 1 and at most 25 tracks for the returned list.
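For the “simply use them as categorical indicators” option for genres in point 4, a minimal sketch: build a tag profile from the user’s top recordings, weighted by play count, then score each new release by its tags. It assumes a mapping from recording MBID to tag names is available (how it is fetched from MB is left open). The helper names and data are hypothetical. Untagged recordings score 0.0, which mirrors the missing-tags problem noted above.

```python
from collections import Counter
from typing import Dict, List


def tag_profile(top_plays: Dict[str, int], tags: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of the user's plays that went to recordings carrying each tag.

    top_plays maps recording MBID -> play count; tags maps recording MBID -> tags.
    Recordings with no tags simply contribute nothing.
    """
    total = sum(top_plays.values())
    if total == 0:
        return {}
    weights: Counter = Counter()
    for mbid, plays in top_plays.items():
        for tag in set(tags.get(mbid, [])):
            weights[tag] += plays
    return {tag: w / total for tag, w in weights.items()}


def tag_score(profile: Dict[str, float], recording_tags: List[str]) -> float:
    """Average profile weight over a new recording's distinct tags (0.0 if untagged)."""
    distinct = set(recording_tags)
    if not distinct:
        return 0.0
    return sum(profile.get(tag, 0.0) for tag in distinct) / len(distinct)


profile = tag_profile({"r1": 3, "r2": 1}, {"r1": ["pop"], "r2": ["rock", "pop"]})
print(profile)                               # → {'pop': 1.0, 'rock': 0.25}
print(tag_score(profile, ["pop"]))           # → 1.0
print(tag_score(profile, ["rock", "jazz"]))  # → 0.125
print(tag_score(profile, []))                # → 0.0
```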

P.S. Considering the trial and error involved in the similarity method, I plan to start this project with method 1, using existing tools to complete the minimal features first.

Thanks for the clarification.

A side comment, in case you didn’t know we also have love and hate feedback on tracks in LB given by users which may be useful to you. See https://listenbrainz.readthedocs.io/en/latest/users/api/recordings/#get--1-feedback-user-(user_name)-get-feedback.

I can add more details about general theories of collaborative filtering and distance calculation,

Ah sorry, I didn’t mean to ask that. You only need to add details directly relevant to your project. We are already using CF, so there is no need to add general theory about it; but if there are specific details you wish to modify or use from it, then yes, add those.

I’m not sure about what the implementation method wrt filtering LB recommendation API is using exactly since I can’t find details about it

We are using collaborative filtering with implicit ratings for that.

As long as we have a matrix of users and recordings, we can do matrix factorization then calculate the distance between these user/recording vectors. Though pipelines for building matrix and factorization may have to be build from scratch (still looking for existent implementations)

Thanks for the details. This sounds similar to collaborative filtering to me, but I’ll need to read up on it in detail later. If it is indeed similar to collaborative filtering, then, as mentioned earlier, the LB recommended recordings are already generated using it, so I’m not sure this will be needed.

As for new releases, I’m trying to come up with new properties to construct song vectors: we may use genre of the recordings as the indicator for their preferences. In this case, I need to either obtain the genre vector from text embeddings from pretrained / retrained NLP models, or simply use them as categorical indicators.

We do have genre data in MB, but I am unsure what percentage of recordings have it; it might be worth starting with that. Obtaining the genre using NLP seems interesting. I am not well versed in this, but @rob or @alastairp might be. In any case, please do include the details of whatever methods you intend to use in the proposal.

Thanks for all the info!

  1. I can use the love/hate binary indicator to represent user preferences in the matrix, then. But, just like the genre element (tags), the sparsity of the matrix will depend on how much feedback users have given.

  2. I think annoy is an application of CF for a specific nearest neighbors algorithm. If the intermediate results of the LB recommendation pipeline (specifically the user and recording vectors) can be accessed, then I can use annoy on them directly. Thanks for the clarification; would it be fine if I just add how distances (similarities) are calculated from the vectors in annoy in this part?

  3. To elaborate on the NLP method, we can first get vectors (word embeddings) of the genres using a music-related text corpus, then fill the matrix with the vector attached to each recording’s genre. But I guess in this case, methods 2-4 need to work with the internals of the ListenBrainz API, modifying its inputs and outputs, instead of just calling the URL.
    I’ll make sure to include these new properties in my proposal.
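For point 1, a minimal sketch of turning love/hate feedback into one sparse matrix row, plus a density measure that makes the sparsity concern measurable. It assumes feedback items carry `recording_mbid` and `score` keys with 1 meaning love and -1 meaning hate, as in the ListenBrainz get-feedback endpoint linked above; verify the exact response shape against the API docs. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def feedback_row(feedback: List[dict]) -> Dict[str, int]:
    """One user's sparse matrix row: recording MBID -> +1 (love) or -1 (hate).

    Assumes items with 'recording_mbid' and 'score' keys; items without an MBID
    or with any other score are skipped. Unrated recordings are simply absent.
    """
    row: Dict[str, int] = {}
    for item in feedback:
        mbid, score = item.get("recording_mbid"), item.get("score")
        if mbid and score in (1, -1):
            row[mbid] = score
    return row


def density(row: Dict[str, int], candidates: Iterable[str]) -> float:
    """Fraction of candidate recordings (e.g. new releases) the user rated."""
    cands = list(candidates)
    if not cands:
        return 0.0
    return sum(1 for c in cands if c in row) / len(cands)


items = [{"recording_mbid": "m1", "score": 1},
         {"recording_mbid": "m2", "score": -1},
         {"recording_mbid": None, "score": 1}]
row = feedback_row(items)
print(row)                                     # → {'m1': 1, 'm2': -1}
print(density(row, ["m1", "n1", "n2", "n3"]))  # → 0.25
```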

Hi and thanks for your proposal!

Your proposal contains 70% material that is a rehash of what exists and somewhat how that stuff works – but this is not what we ask for in a proposal. What we want to hear is HOW you’re going to solve the problems at hand with a detailed run down of the steps you plan to take in this project. You’ve given us very little to go on here and I really have no idea how you propose to solve the problem.

Let me remind you about the idea description for this project:

However, some care must be taken to not select ALL the tracks from a new release, but instead to pick some tracks that we think might be interesting to the user. How would you do this? This question is hard to answer on your own – you will be required to engage with the ListenBrainz team in IRC to discuss this feature in detail before you make your proposal. Any proposal that does not engage the community to design this feature will not be considered for acceptance, due to the nature of this project.

This project, much more so than any of the others, required you to work very closely with a team member to define how this project would work. You turned up in IRC chat a couple of times and asked some questions and just as I was starting to explain the project to you, you left and didn’t come back.

And the few things that I was able to convey to you did not end up in the proposal at all. Worse yet, you didn’t understand what the Spotify release radar really does:

And as for release radar, my understanding of it comes from Spotify’s automatic playlist with the exact name: it should tailor to the user’s listening habits, including the latest music from use’s following artists, and add new blood to the user’s listening experience.

Put succinctly the release radar is a playlist of tracks that have just been released (in the last week or so) that you might like. It is clear from your proposal that you didn’t understand what we had asked.

Given that you didn’t sufficiently engage us while defining this project and that your proposal doesn’t really implement a “release radar”, I will not consider this project to be a valid submission for the release radar idea.

However, let me consider your idea more as a general “make a recommendation playlist” for a user, using whatever tools we can find. This is totally fine, since GSoC applicants can propose their own ideas.

Extrapolating from the little relevant information you provided in your application, you vaguely mention both collaborative filtering and a nearest neighbor algorithm. While we have a collaborative filtering algorithm available to us, you didn’t seem to explore how to interact with it in your project. Your proposal suggests using both of these algorithms without really having enough insight into how difficult it is to implement a collaborative filtering algorithm that works at our scale, let alone doing so in a 175 hour project. Furthermore, nearest neighbor solutions simply do not scale to the size of data that we need to process.

If you were to attempt to revamp your entire proposal in a day, I would suggest that you throw out everything but the personal information and start fresh, telling us in detail how you would implement a release radar without telling us about how our own systems work.

Sorry for the load of negative feedback, but good luck!

Hi Mayhem,

Thank you very much for the strict but absolutely meaningful comments! They help me reflect, both within and beyond the context of the GSoC project. Please bear with me as I try to clarify and explain my understanding of the problem statement, in the hope of improving this proposal.

  1. First, I would like to confirm the definition of a release radar with you. These are the three points I intended to convey in my proposal, in case they came across wrongly:

    • Every track in the release radar is released within the last 2 weeks, which should be ensured by the release API endpoint.
    • Every track in the release radar should be somehow related to the user’s preference, i.e., the user is likely to love every track in the release radar after listening to it.
    • Update every Friday.
  2. Spotify’s description is helpful for double-checking, since the definition of Release Radar is the lighthouse of this project.
    I can summarize it with 2 key phrases: “new music” and “favorite artists”. This gave me the intuition to focus on “artists” to address the second point above. Since the project description says the recently released tracks are obtained only from the “given user’s top artists list”, I think we can ignore the “sprinkled in with some new discoveries” feature of Spotify’s version for now.



    Say Hello to Release Radar

    Getting music on Release Radar

  3. I would like to restate my proposal briefly as follows. It aims to achieve the three points above as three tasks, one by one:

    • As for the first task, this should be handled by the release API. As expected, this API should take the artist’s MBID (arid) and fetch the MBIDs of the artist’s most recently released recordings, restricted to release dates within 2 weeks before “today”.

    • As for the second task, according to the project and Spotify’s descriptions, the “might be interesting to the user” requirement should be met by a selection method that returns a subset of the data fetched in the first task. To create a subset that satisfies “not select ALL the tracks from a new release, but instead to pick some tracks that we think might be interesting to the user”, I thought of 2 methods in my proposal. From here on, let A denote the list of recent releases for a given user from the first task.
      1. The first method is to create another meaningful playlist and find its intersection in A as the subset. And the choice of another playlist should show positive contributions in “might be interesting to the user”.
      2. The second method is to find a meaningful way to sort A and apply a cutoff of the top N tracks.
    • As for the third task, I haven’t yet worked out the exact workflow for automatically delivering the release radar. The plan for now is:
      • Use the methods in yim_patch_runner to run the release radar patch for all users
      • Pass the results of the previous step in JSPF format to the user’s playlists using the LB API /1/playlist/create
      • Use a cron entry to run the previous steps once a week
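A minimal sketch of the first and third tasks, assuming release dates are available as `datetime.date` values and that tracks are identified in JSPF by their MusicBrainz recording URLs. The URL convention is an assumption to check against the /1/playlist/create documentation. The helper names and the "troi" creator string are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def released_recently(release_date: date, today: date, days: int = 14) -> bool:
    """Task 1 check: was the release made within the last `days` days (inclusive)?"""
    return today - timedelta(days=days) <= release_date <= today


def to_jspf(title: str, creator: str, recording_mbids: List[str]) -> Dict:
    """Task 3: wrap recording MBIDs in a minimal JSPF playlist document.

    Using the MusicBrainz recording URL as each track's identifier is an
    assumption about what /1/playlist/create expects.
    """
    return {
        "playlist": {
            "title": title,
            "creator": creator,
            "track": [{"identifier": f"https://musicbrainz.org/recording/{mbid}"}
                      for mbid in recording_mbids],
        }
    }


friday = date(2022, 4, 15)
print(released_recently(date(2022, 4, 8), friday))   # → True
print(released_recently(date(2022, 3, 31), friday))  # → False
print(to_jspf("Release Radar", "troi", ["abc"])["playlist"]["track"])
```

For the weekly schedule, a crontab entry such as `0 6 * * 5` (06:00 every Friday; the fifth field is the day of the week, where 5 is Friday) could invoke the patch runner.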
  4. I’m truly sorry for the off-and-on conversations on IRC. I’m not used to the website’s live chat, which often timed out, and I forgot to reconnect; but I always checked the chat logs to catch up with the conversation. Anyway, let me recap the limited conversations I had there and point out my own mistakes:






    Basically, there are five things that I did, thankfully with the help of you guys:

    • I asked about test failures when running every patch in troi, to see how each line of code works and to get familiar with the pylistenbrainz package and api.listenbrainz.org.

    • I asked about issues with current patches and fixed some of the bugs.

    • I asked which APIs to use for this task and learned about the possibility of using the MB API and Bono (which is now wolf). I assume that using wolf is a way of using the MB API, and that is how I tried to complete the tasks in my proposal, though I’m not sure this understanding is correct.

    • As for your requirements of

      “the MB search API contains the release dates for everything. so it is a matter of crafting the right query to fetch the data and then fetch the top users for a user from LB. then see what intersects.”

      I guess that in my proposal, the “new music” task regarding “release dates” and “crafting the right query to fetch the data” is solved by the API provided by lucifer.
      Then comes “fetch top users for a user”. At first I read it as an application of get_user_recommendation_recordings in the LB API, since CF works this way. That is why method 1 in my proposal finds the intersection of tracks in both the LB recommendations and the MB new releases. However, I now doubt that this is the right interpretation; please correct me. If it is not correct, I have another interpretation now, please see 5.2.

    • I asked which LB API to use to fetch new release recordings and was told to use this API, with expected outputs shown for sample queries.

  5. I don’t have much experience working with data at such a large scale, so it’s a very good point that nearest neighbor search over a sparse matrix doesn’t work at MB’s scale. I now realize that the whole part about sort-based methods using “distances” should be dropped, and it seems workable to focus only on the first method mentioned in point 3:

    The first method is to create another meaningful playlist and find its intersection in A as the subset. And the choice of another playlist should show positive contributions in “might be interesting to the user”.

    There are 3 ways that I can think of now to generate another useful playlist before finding the intersection: [WIP]

    1. Generate another playlist with the LB recommendation API, GET /1/cf/recommendation/user/(user_name)/recording, with artist_type = 'top' in the params, because all the tracks returned in A are by top artists.

      The requirement of using this method is to ensure there are ‘implicit ratings’ for newly released songs as well.

    2. First, find the user most similar to the target user (the one with the largest similarity) with the LB API GET /1/user/(user_name)/similar-users. Then generate another playlist with the release API for that most similar user.

      Intersecting it with A amounts to searching for artists that suit the user’s music taste. This list of artists differs from top_artist because the preference for artists here is expressed through interests shared with a similar user instead of just listen counts.

    3. First, find the target user’s top recordings with the LB API /1/stats/user/(user_name)/recordings. Then filter A, keeping only the artists who also appear in the user’s top recordings list.

      This comes from my assumption that users might want to listen first to other songs by the artists of their recent favorite songs. Also, artists that are in the top artists list but not in the top recordings may not reflect the user’s recent interests (e.g., the user listened to every song by an artist once each, then abandoned the artist because it wasn’t to their taste). The intersection is therefore generated from a subset of the top artists.
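A minimal sketch of the third way, assuming that both A and the top-recordings statistics carry an `artist_mbids` list per entry, as the release sample earlier in the proposal does (for the stats endpoint this is an assumption). The second way can reuse the same filter with the artists from the similar user’s list instead. Helper names and data are hypothetical.

```python
from typing import Iterable, List, Set


def artists_of(entries: Iterable[dict]) -> Set[str]:
    """Collect every artist MBID from entries that carry an 'artist_mbids' list."""
    artists: Set[str] = set()
    for entry in entries:
        artists.update(entry.get("artist_mbids") or [])
    return artists


def filter_by_artists(releases: List[dict], artists: Set[str]) -> List[dict]:
    """Keep entries of A that credit at least one artist in `artists`."""
    return [r for r in releases if artists.intersection(r.get("artist_mbids") or [])]


A = [{"release_mbid": "rel-1", "artist_mbids": ["art-1"]},
     {"release_mbid": "rel-2", "artist_mbids": ["art-2"]},
     {"release_mbid": "rel-3", "artist_mbids": []}]
top_recordings = [{"recording_mbid": "rec-9", "artist_mbids": ["art-2", "art-7"]}]
kept = filter_by_artists(A, artists_of(top_recordings))
print([r["release_mbid"] for r in kept])  # → ['rel-2']
```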

    I’ll clear out all the algorithm-related proposals and focus on combining the results of different queries to make it work if this makes sense to you.

  6. I also wrote a draft patch for the release radar using method 2, in case it is relevant to this project. [WIP]

    • new API usage is based on the demo of API outputs

    • Draft code for release_radar.py in ~/troi/patches

      
      import click
      
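      from troi import Element, PipelineError, Recording, Playlist, Release
      import troi.listenbrainz.recs
      import troi.filters
      import troi.musicbrainz.recording_lookup
      import troi.patch     # needed for the troi.patch.Patch base class
      import troi.playlist  # needed for the redundancy-reducer and shuffle elements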
      
      
      @click.group()
      def cli():
          pass
      
      class RecordingsFromReleasesElement(Element):
          '''
              Take a list of releases and return a list of the recordings on them.
          '''
      
          def __init__(self, skip_not_found=True):
              Element.__init__(self)
              self.skip_not_found = skip_not_found
      
          @staticmethod
          def inputs():
              return [Release]
      
          @staticmethod
          def outputs():
              return [Recording]
      
          def read(self, inputs):
      
              releases = inputs[0]
              # API WIP: implement /1/cf/release/{}/recording
              
              recording_list = []
              # ...
              return recording_list
      
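      class PlaylistIntersection(Element):
          '''
              Find the intersection between the latest-release recordings and another list of recordings.
              Take two lists of recordings (one per source) and return the recordings of the first
              list whose MBIDs also appear in the second, in the order of the first list.
          '''
      
          def __init__(self):
              Element.__init__(self)
      
          @staticmethod
          def inputs():
              # Both sources (RecordingLookupElement) output Recording objects, not Playlists.
              return [Recording]
      
          @staticmethod
          def outputs():
              return [Recording]
      
          def read(self, inputs):
              recordings1 = inputs[0]
              recordings2 = inputs[1]
      
              # A set of MBIDs keeps the lookup linear in the size of both lists.
              mbids2 = {r.mbid for r in recordings2}
              intersection = [r for r in recordings1 if r.mbid in mbids2]
              if not intersection:
                  raise PipelineError("No overlap between the two recording lists.")
              return intersection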
      
      class ReleaseRadarPatch(troi.patch.Patch):
          """
              Take a user's top artists' recent-release recordings and filter them by a similar user's taste.
          """
      
          @staticmethod
          @cli.command(no_args_is_help=True)
          @click.argument('user_name')
          def parse_args(**kwargs):
              """
              Generate a weekly playlist from the ListenBrainz recent release recordings.
      
              \b
              USER_NAME is a MusicBrainz user name that has an account on ListenBrainz.
      
              """
      
              return kwargs
      
          @staticmethod
          def outputs():
              return [Playlist]
      
          @staticmethod
          def slug():
              return "release-radar"
      
          @staticmethod
          def description():
              return "Generate a playlist every Friday of selected tracks released within the last 2 weeks " \
                     "by the top artists of a given user."
      
          def create(self, inputs, patch_args):
              user_name = inputs['user_name']
      
              # Get complete recent release playlist of the given user
              # --- API WIP
              releases = troi.listenbrainz.recs.UserRecentReleasesElement(user_name=user_name,
                                                                                artist_type='top',
                                                                                count=-1) # spits out a list of releases with mbids
      
              recs = RecordingsFromReleasesElement(skip_not_found=True)
              recs.set_sources(releases) # turns releases into a list of recording with mbids
      
              r_lookup = troi.musicbrainz.recording_lookup.RecordingLookupElement(skip_not_found=True)
              r_lookup.set_sources(recs) # turns recordings into a list of recordings with more data in MB
      
      
              # Get the most similar user of the given user
              similar_user_name = troi.listenbrainz.recs.UserSimilarUserElement(user_name = user_name) # spits out a most similar user to the given user
      
      
              # Get complete recent release playlist of the most similar user
              # --- API WIP
              releases2 = troi.listenbrainz.recs.UserRecentReleasesElement(user_name=similar_user_name,
                                                                                    artist_type='top',
                                                                                    count=-1)  # spits out a list of releases with mbids
      
              recs2 = RecordingsFromReleasesElement(skip_not_found=True)
              recs2.set_sources(releases2)  # turns releases into a list of recordings with mbids

              r_lookup2 = troi.musicbrainz.recording_lookup.RecordingLookupElement(skip_not_found=True)
              r_lookup2.set_sources(recs2)  # enriches the recordings with metadata from MusicBrainz
      
              # Filter the given user's release radar by the intersection of the two users' playlists
              release_filter = PlaylistIntersection()
              release_filter.set_sources(r_lookup, r_lookup2)
      
              # Check for insufficient data to generate release radar
              release_filter.check()  # raises an error if no intersection is found
      
              # Limit the size of the playlist
              shaper = troi.playlist.PlaylistRedundancyReducerElement(max_num_recordings=25) # To ensure at most 25 tracks
              shaper.set_sources(release_filter)
      
              # Shuffle the playlist
              shuffle = troi.playlist.PlaylistShuffleElement()
              shuffle.set_sources(shaper)
      
              return shuffle
      
      
    • Draft code for recs.py in ~troi/listenbrainz/

      # Proposed method of using new API in ~troi/listenbrainz/recs.py
      import requests
      from troi import Element, Recording, PipelineError, User
      import pylistenbrainz
      import pylistenbrainz.errors
      
      MAX_NUM_RECORDINGS_PER_REQUEST = 100
      
      class UserRecentReleasesElement(Element):
          '''
              Fetch recent release recordings of top or similar artists for a user from ListenBrainz
              within a certain range of time
          '''
      
          MAX_RECORDINGS_TO_FETCH = 2000
      
          def __init__(self, user_name, artist_type, range='week', count=25, offset=0):
              super().__init__()
              self.client = pylistenbrainz.ListenBrainz()
              self.user_name = user_name
              self.range = range
              self.count = count
              self.offset = offset
              self.artist_type = artist_type
              self._last_updated = None
      
          def outputs(self):
              return [Recording]
      
          @property
          def last_updated(self):
              return self._last_updated
      
          def read(self, inputs = []):
              recording_list = []
      
              remaining = self.MAX_RECORDINGS_TO_FETCH if self.count < 0 else self.count
              while True:
                  try:
                      recordings = self.client.get_user_recent_release_recordings(self.user_name,
                                                                                  self.artist_type,
                                                                                  self.range,
                                                                                  count=min(MAX_NUM_RECORDINGS_PER_REQUEST, remaining),
                                                                                  offset=self.offset+len(recording_list))
                  except (requests.exceptions.HTTPError,
                          pylistenbrainz.errors.ListenBrainzAPIException,
                          requests.exceptions.ConnectionError) as err:
                      if not str(err):
                          err = "Does the user '%s' exist?" % self.user_name
                      raise PipelineError("Cannot fetch recent release tracks from ListenBrainz: " + str(err))
      
                  # Use the same payload key for the emptiness check, the loop and the count
                  if not recordings or not len(recordings['payload']['releases']):
                      break

                  for r in recordings['payload']['releases']:
                      recording_list.append(Recording(mbid=r['release_mbid'], ranking=r['listen_count']))

                  remaining -= len(recordings['payload']['releases'])
                  if remaining <= 0:
                      break
      
              if recordings:
                  self._last_updated = recordings['payload']['last_updated']
      
              return recording_list
      
      
      class UserSimilarUserElement(Element):
          '''
              Fetch the most similar user for a given user from ListenBrainz
              based on listening histories
          '''
      
      
          def __init__(self, user_name):
              super().__init__()
              self.client = pylistenbrainz.ListenBrainz()
              self.user_name = user_name
      
      
          def outputs(self):
              return [User]
      
          def read(self, inputs=[]):
              # Single request: this endpoint is not paginated, so no loop is needed
              try:
                  users = self.client.get_most_similar_users(self.user_name)
              except (requests.exceptions.HTTPError,
                      pylistenbrainz.errors.ListenBrainzAPIException,
                      requests.exceptions.ConnectionError) as err:
                  if not str(err):
                      err = "Does the user '%s' exist?" % self.user_name
                  raise PipelineError("Cannot fetch similar users from ListenBrainz: " + str(err))

              # The client returns None on HTTP 204 (no similar users computed yet).
              # Assumed response shape: {'payload': [{'user_name': ..., 'similarity': ...}, ...]}
              if not users or not users.get('payload'):
                  raise PipelineError("No similar users found for '%s'." % self.user_name)

              # Sort by similarity, most similar first
              ranked = sorted(users['payload'], key=lambda u: u['similarity'], reverse=True)
              user_list = [User(user_name=u['user_name']) for u in ranked]

              return user_list[0].user_name
      
      
      
      
    • Draft code for the ListenBrainz class in ~/pylistenbrainz/client.py:

      # Proposed method in ~/pylistenbrainz/client.py
      def get_user_recent_release_recordings(self, username, artist_type='top', range='week', count=25, offset=0):
          """ Get recent release recordings for a user.

          :param username: the username of the user whose recent releases are to be fetched.
          :type username: str

          :param artist_type: The type of filtering applied to the recent releases.
                              'top' for filtering by top artists or
                              'similar' for filtering by similar artists
          :type artist_type: str

          :param range: the time range in which to look for recent releases, defaults to 'week'.
          :type range: str, optional

          :param count: the number of releases to fetch, defaults to 25, maximum is 100.
          :type count: int, optional

          :param offset: the number of releases to skip from the beginning, for pagination, defaults to 0.
          :type offset: int, optional

          :return: the recent releases along with other data returned by the API
          :rtype: dict
          """

          if artist_type not in ('top', 'similar'):
              raise ValueError("artist_type must be either top or similar.")
          params = {
              'artist_type': artist_type,
              'count': count,
              'range': range,
              'offset': offset,
          }
          try:
              return self._get('/1/cf/releases/user/{}/recording'.format(username), params=params)
          except errors.ListenBrainzAPIException as e:
              if e.status_code == 204:
                  return None
              else:
                  raise
      
      
      def get_most_similar_users(self, username):
          """ Get a list of most similar users for a user.
      
          :param username: the username of the user whose similar user list is to be fetched.
          :type username: str
      
          :return: the similar user list along with other data returned by the API
          :rtype: dict
          """
      
          try:
              return self._get('/1/user/{}/similar-users'.format(username))
          except errors.ListenBrainzAPIException as e:
              if e.status_code == 204:
                  return None
              else:
                  raise
      
      
  7. Lastly, even if this proposal turns out not to fit the “release radar” project, I still hope to apply for a ListenBrainz project, either self-proposed or assigned, and I’m willing to contribute outside GSoC as well, so I’ll stay tuned!
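The paging logic in the draft `UserRecentReleasesElement.read()` can be checked offline. The sketch below is a hedged, stand-alone reimplementation of that loop in plain Python, not the troi code: `make_fake_fetch` and `fetch_all` are hypothetical names, and the `{'payload': {'releases': [...]}}` response shape with `release_mbid`/`listen_count` fields is an assumption taken from the draft's loop body, since the recent-releases endpoint does not exist yet.

```python
from typing import Callable, Optional

MAX_NUM_RECORDINGS_PER_REQUEST = 100
MAX_RECORDINGS_TO_FETCH = 2000


def make_fake_fetch(total: int) -> Callable[[int, int], Optional[dict]]:
    """Hypothetical stand-in for the proposed endpoint, serving an in-memory result set."""
    rows = [{"release_mbid": f"mbid-{i}", "listen_count": total - i} for i in range(total)]

    def fetch(count: int, offset: int) -> Optional[dict]:
        page = rows[offset:offset + count]
        # The draft client turns an HTTP 204 (no data) into None
        if not page:
            return None
        return {"payload": {"releases": page}}

    return fetch


def fetch_all(fetch: Callable[[int, int], Optional[dict]], count: int = 25, offset: int = 0) -> list:
    """Page through the endpoint the same way the draft read() loop does."""
    results = []
    # count < 0 means "fetch as many as the element allows"
    remaining = MAX_RECORDINGS_TO_FETCH if count < 0 else count
    while remaining > 0:
        response = fetch(min(MAX_NUM_RECORDINGS_PER_REQUEST, remaining),
                         offset + len(results))
        if not response or not response["payload"]["releases"]:
            break
        page = response["payload"]["releases"]
        results.extend(page)
        remaining -= len(page)
    return results
```

With `count=-1` the loop requests pages of at most 100 items until the fake endpoint runs dry or 2000 items have been collected. One small difference from the draft: the loop condition is `remaining > 0`, so a `count` of 0 never issues a request.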

Thanks again for the suggestions and help!

Sincerely,
Sivan


Thanks to @lucifer for the suggestions. Here is the pseudocode for the draft patch; it implements the method described in 5.2 of my previous reply.

  1. Fetch list A: recently released recordings by the given user's top artists
  2. Get the user most similar to the given user
  3. Fetch list B: recently released recordings by the most similar user's top artists
  4. Turn the two lists into playlists
  5. Filter A down to the intersection of the two playlists, i.e. keep only the recordings that also appear in B
  6. Make sure the intersection playlist has at least one recording
  7. Limit the intersection playlist to at most 25 recordings
  8. Shuffle and return the release radar playlist
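The eight steps can be sketched in plain Python, with the network calls (steps 1 and 3) replaced by ready-made lists of MBIDs and step 4 reduced to using those lists directly. This is an illustration under assumptions, not the troi implementation: `most_similar_user` and `release_radar` are hypothetical helpers, each similar-user entry is assumed to be a dict with `user_name` and `similarity` keys, and a plain slice stands in for troi's size-limiting element.

```python
import random
from typing import Dict, List, Optional

MAX_PLAYLIST_LENGTH = 25  # step 7


def most_similar_user(similar_users: List[Dict]) -> str:
    """Step 2: pick the user with the highest similarity score."""
    if not similar_users:
        raise ValueError("no similar users available")
    return max(similar_users, key=lambda u: u["similarity"])["user_name"]


def release_radar(list_a: List[str], list_b: List[str],
                  max_len: int = MAX_PLAYLIST_LENGTH,
                  seed: Optional[int] = None) -> List[str]:
    """Steps 5-8: intersect A with B, check it is non-empty, cap the length, shuffle."""
    in_b = set(list_b)
    # Step 5: keep A's order, drop duplicates, keep only MBIDs that are also in B
    seen = set()
    intersection = []
    for mbid in list_a:
        if mbid in in_b and mbid not in seen:
            seen.add(mbid)
            intersection.append(mbid)
    # Step 6: refuse to produce an empty playlist
    if not intersection:
        raise ValueError("insufficient data: the two lists have no recordings in common")
    # Step 7: limit the size
    playlist = intersection[:max_len]
    # Step 8: shuffle in place (a fixed seed makes the order reproducible)
    random.Random(seed).shuffle(playlist)
    return playlist
```

Because step 7 comes before step 8, the cap keeps the first 25 matches in A's order and only then shuffles them; shuffling first would instead draw a random 25 from the whole intersection.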

This is your spin on things, not something the idea or I suggested. New tracks are, well, new. Very little is known about them, so we have very little chance of evaluating whether a user would like them. This is another form of the cold start problem in music recommendation. Yes, it would be ideal to do this, but if there is no data known about a track, this is practically impossible with the tools we have at our disposal.

Yes, this would make a proposal that could meet the criteria of the idea as originally suggested. But, I have my doubts that you can turn this into an adequate proposal that will convince me that you are the right candidate for this project. Perhaps it would be a better idea to give up on this year and try again next year.