GSoC Pre-Proposal: Serendipitous Recommendation System

Hey guys! My name is Julia and I am a final-year comp sci student at UNC Charlotte, going to Northeastern in Fall 2025 for grad school :smiley: My major area of research during my undergraduate studies was data visualization, particularly in the context of explainability to users and of social computing: how do users interact with and share the data they receive in visualizations? That being said, I also have an interest in recommendation systems and how that information is displayed to users.

Recommendation systems typically fall into two categories: collaborative recommendation systems and content-based recommendation systems. We can observe the collaborative paradigm in applications like Spotify or Apple Music, which both allow users to exchange their listening data with friends, and both recommend music based on the traditional “shared interests between users”. My major focus this past semester has been on content-based recommendation systems, and novel approaches to these systems.

This is where I believe my involvement in development can be beneficial to ListenBrainz. I propose a novel content-based recommendation system that is informed by semantics embedded within album descriptors, album reviews, and album cover art. To further break this down I will outline how I envision such a system working, and how these three inputs will allow for serendipitous recommendations for users.

A recommendation system typically benefits from having some sense of what a user already enjoys (this solves the “cold-start” problem, where a system doesn’t know where to begin recommending). Since ListenBrainz already covers this by allowing users to import previous data, the main focus then shifts towards recommendations, particularly content-based recommendations. Commercially available albums can be viewed through the lens of “correlations” between actual musical content and what a user wants to listen to. Tags are the most common form of this correlation, and they have been beaten to death within environments such as Spotify. One of the most common issues that arises from this is a filter bubble, where a user is essentially trapped within an echo chamber. A solution to this issue could be the introduction of serendipity through various new means.

The idea here is to rely on existing information from publicly available digital libraries such as Wikipedia, and to extract hyperspecific features instead of relying on short lists of tags that may be too vague. From the language of reviews, new and insightful semantics can be extracted, which can offer richer mood-based descriptions of a particular album. We can combine these parameters with yet another piece of metadata that is often overlooked in album recommendations: album art. Album art is an extension of an album. It offers a completely new dimension of semantics that can help users understand just what exactly they’re getting into. A great example could be the 2000s digital minimalism of brat, which evokes emotions of nostalgia, envy, and general nonchalance. Perhaps we see the cover of ‘The Dark Side of the Moon’ and we’re filled with a sense of inexplicable awe. With a computer vision approach, we can attempt to quantify these semantics through basic visual design elements such as color, shape, value, and texture, but also visual design principles such as unity, hierarchy, contrast, and dominance. If we combine these features with semantics derived from text, and opt for an explainable approach to recommendations, we can offer users a sense of agency they might feel is lost in the age of artificial intelligence. This will encourage users to explore albums as art, and to break out of the filters they exist in.
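To make the combination of text and cover-art signals a little more concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not an existing ListenBrainz component: it assumes each album already has a text-derived mood vector and a list of RGB pixels from its cover, and the helper names (`color_histogram`, `cosine`, `album_similarity`) and the 50/50 weighting are hypothetical.

```python
import math


def color_histogram(pixels, bins=4):
    """Coarse, normalised RGB histogram: each channel is split into `bins` buckets."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so that bin counts which do not divide 256 evenly stay in range.
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def album_similarity(album_a, album_b, text_weight=0.5):
    """Weighted blend of text-mood similarity and cover-colour similarity."""
    text_sim = cosine(album_a["mood"], album_b["mood"])
    art_sim = cosine(
        color_histogram(album_a["pixels"]),
        color_histogram(album_b["pixels"]),
    )
    return text_weight * text_sim + (1 - text_weight) * art_sim
```

Keeping the text and art scores as separate terms is deliberate: each one can be shown to the user as part of an explanation for why an album was suggested. A colour histogram only covers the "color" element; shape, texture, and the design principles would need richer features.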

Please let me know if you have any questions, or any feedback for this idea!

5 Likes

Would love some more visualisation in LB!

I’m not quite sure what the problem is that you would like to solve here, and what the final output will be. For instance, we currently generate playlists, generate artist “webs” on artist pages, and can explore albums by cover art colour.

I recommend being very specific with your proposal - there will be a flood of vague and flowery AI-written or assisted proposals this year. “Hard” code/information is what will make you stand out. I should also note that our devs can sniff out AI-written code from a mile away, and it may be grounds for an instant project fail (if the student doesn’t understand the code that they’re delivering, and haven’t been open with their mentor).

Further reading, re. the common pitfalls of GSoC applications :smiley:

1 Like

Thanks so much for the reply!
The problem I’m trying to solve is the filter bubble problem, where users often remain trapped in the same genres/subgenres and have trouble finding new music. The final output would be a new method of recommendation that focuses on interpreting semantic meanings (from both text and images) to allow users to find music across different genres. I’ve worked with a professor at my university who does research in this area and he’s pointed me in a few really interesting directions for these sorts of recommender systems.

Where would I be able to contribute to any existing issues pertaining to data visualizations at the moment? If there’s anything specific you had in mind in this realm I’d love to take a look at it! I’m definitely flexible in my project proposal :slight_smile:

FYI, I’m not a mentor, but I guess I’m not clear where this sits/how it fits with existing recommendations, or if this is something new.

Have a look at the daily/weekly playlists we generate for users:
my playlist page
Which is generated by:
listenbrainz radio

Huesound is a piece of data we generate based on album artwork:
Huesound browser

We also have a similar artist explorer/web on artist pages:
example artist page

We don’t do that much visualisation stuff, probably the closest thing at the moment is @anshgoyal31’s ongoing work on the genre explorer/web (the artist explorer/web, but for genre):

It’s not data-driven but I’ve always thought this is a fun ‘visual’ ticket, I have mockups that are ready to be implemented:

i would love to get artist recommendations from other bands that the band members are in.

or recommendations on bands that have frequently played support for each other or something like that.

this is already data that is in MusicBrainz but there isn’t a clean way to visualise this in ListenBrainz yet
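As a rough illustration of the shared-member idea, here is a small Python sketch. The membership pairs below are made up; in practice they would presumably be derived from MusicBrainz artist-to-artist band membership relationships, and `related_by_members` is a hypothetical helper name, not an existing ListenBrainz function.

```python
from collections import defaultdict

# Hypothetical (person, band) pairs standing in for real membership data.
TOY_MEMBERSHIPS = [
    ("person-1", "Band A"),
    ("person-1", "Band B"),
    ("person-2", "Band A"),
    ("person-2", "Band B"),
    ("person-2", "Band C"),
    ("person-3", "Band D"),
]


def related_by_members(memberships, band):
    """Return other bands sharing at least one member with `band`.

    The result is a list of (band, shared_members) tuples, ordered by the
    number of shared members (descending) and then by band name.
    """
    bands_of = defaultdict(set)    # person -> bands they play in
    members_of = defaultdict(set)  # band -> its members
    for person, name in memberships:
        bands_of[person].add(name)
        members_of[name].add(person)

    related = defaultdict(set)
    for person in members_of[band]:
        for other in bands_of[person]:
            if other != band:
                related[other].add(person)
    return sorted(related.items(), key=lambda item: (-len(item[1]), item[0]))
```

Calling `related_by_members(TOY_MEMBERSHIPS, "Band A")` ranks "Band B" ahead of "Band C" because it shares two members instead of one. Returning the shared members alongside each band also gives the UI something to display as the reason for the recommendation.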

Oh wow I love the idea for the genre explorer with that sort of mapping out feature! The daily activity heat map is also a really interesting way of actually visualizing temporal data in the context of listening. I really like that dimension. I suppose that there could be data visualizations that encourage users to inch out of their zones?

The mockups for the grids and the cards do look promising, and I especially like the ability to break down different genres and subgenres. The idea to implement customization in these cards is definitely a draw though, especially if this is something that can be shared with other people. There’s this really interesting book I read a while back by ethnographer Nick Seaver called “Computing Taste” that really delved into how these large streaming companies quantify taste. At the end of the day I really appreciate how this project is aimed at giving autonomy to the user, something that I feel is really important in the age of big AI.

I can create some mockups for this recommendation system, but perhaps it can allow users to explore recommendations through interactive visualizations?

That would be cool! Thom Yorke is part of so many side projects it’s hard to keep up haha.

@aerozol Oh I see that there’s already a cold start issue created!

https://tickets.metabrainz.org/browse/LB-1671

a suggestion, if youā€™re looking to include reviews in this recommendation algorithm, you might wanna take a look at CritiqueBrainz, since itā€™s open data and also based on MusicBrainz data (being a sister project of ListenBrainz), therefore easy to match to artists, release groups, labels, and recordings for a ListenBrainz recommendation

Thanks for the suggestion! I’ve looked at CritiqueBrainz, and while it would definitely be way simpler to link entries based on their RDBMS entries (PostgreSQL I assume), I feel that the main issue comes from the lack of reviews on CritiqueBrainz :frowning: . My initial idea was to extract sentiments from the critic reviews summarized on Wikipedia using NLP methods, because of how much information that can provide.

1 Like

true, there aren’t yet many reviews on there (The Dark Side of the Moon only has 5 currently), but I think it would still be good to include. it might help to alleviate the chicken-and-egg problem of not enough people knowing about the site, and therefore there not being many reviews

1 Like

I think relying on reviews would be a problem - you’d have to find an open source resource that you can scrape reviews from.

Maybe that’s possible, but I suspect that even if you could scrape every review, it would leave some large gaps. Listeners want recommendation systems to give them things they can’t “discover” themselves (sandwiched between their favourites, afaik), and it’s usually popular/known releases that get reviewed the most.

e.g. If the source has no worthwhile k-pop or death metal or latin etc reviews to scrape, then it becomes a worthless part of the equation for those listeners.

P.S. the forum is great for community interaction but definitely ping a mentor if you want to check if this path is worth following. Find links to the chat channels where the devs/mentors are here: Development/Summer of Code/Getting started - MusicBrainz Wiki

P.P.S. even if you don’t take part in GSoC, you are always welcome to play with the data/make pull requests - if something interests you and you are looking for project experience. If you want to create a new toy without worrying too much about how it integrates with everything else, it can always live on its own in the LB “Explore” section

Thank you so much!! I stumbled here by mistake because I was looking at creating this system for a personal project, and lastFM ended up pointing me to this open source foundation.

Yeah I definitely see your point with the reviews, which is why I’m not going to limit it to one specific editorial website like Pitchfork or whatnot. My question is, doesn’t Wikipedia already aggregate from different editors, offering that diversity?

I’ll go ahead and look into the mentors :slight_smile: . Worst case, I definitely will still look at contributing because this is one of my interests in comp sci research and this community seems to be a really great place to expand that skillset.

3 Likes

hiya!

I like where you are going with your proposal, it fits nicely into what I wish to see in LB in the future. However, your idea feels ambitious and a bit unfocused, so let’s see if we can improve on that.

First some comments to set the stage:

  1. We don’t usually accept projects that have large UI components to them. Now that we have a designer on our team, we need to make sure that our work fits into the design system we’ve already established. In the past we found coordinating designer, mentor and student to be challenging and frustrating for everyone involved. That said, if your work has a UI component (clearly it does), then we should not focus on that and let our team create those bits.
  2. It is important to me that projects are realizable in the short time we have and that they continue to live and be useful in our ecosystem after you move on. Given that, we should be thoughtful about how we define the goals for this project. We can do a back-end implementation over the summer and if our team has capacity we could even work on the UI in parallel – if not we can work on it in the autumn. The most important thing is to ensure that the project gets finished.

Now onto specific next steps – I’d like to find out the following things:

  1. What algorithm are you planning on using? Are there software packages that already do what you need? Apache Spark is our backend for these types of calculations, so would Spark be a suitable candidate for this work?
  2. What inputs does the algorithm need? What are the expected outputs? I need to understand this in detail so that we can work out if we have enough data to make this work. I know where to find the data, but right now I am not quite sure what I am looking for.
  3. What is the Minimum Viable Product that we can aim for this summer? How can we do the least work and still deliver a functioning component at the end of the summer?

Overall, this is a good opportunity for both of us, I think. Given that, I’m willing to work alongside you to help deliver a functioning component at the end of the summer. We just need to be very clear as to what we’re building.

2 Likes

I think starting with one source, namely Wikipedia, is a good start. Adding more sources to improve the data long term can be done later. Just writing the scraper, scraping the data, processing it and then storing it in a DB is quite a task.

In fact, that is something I can work on and let you focus on the semantic recommender, rather than doing web scraping heavy lifting.
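To give a sense of the "process and store" half of that pipeline, here is a minimal Python sketch. Everything in it is an assumption for illustration: it uses an in-process SQLite database as a stand-in for whatever database the project would really use, and the `album_text` table, its columns, and the helper names are hypothetical. The scraping step itself is left out.

```python
import sqlite3


def init_db(conn):
    """Create a hypothetical table holding one text blob per album and source."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS album_text (
               release_group_mbid TEXT NOT NULL,
               source TEXT NOT NULL,
               body TEXT NOT NULL,
               PRIMARY KEY (release_group_mbid, source)
           )"""
    )


def clean(text):
    """Collapse runs of whitespace left over from scraped markup."""
    return " ".join(text.split())


def store_text(conn, mbid, source, raw_text):
    """Insert the cleaned text, replacing any earlier row for the same album and source."""
    conn.execute(
        "INSERT OR REPLACE INTO album_text VALUES (?, ?, ?)",
        (mbid, source, clean(raw_text)),
    )
    conn.commit()
```

Keying rows on the release group MBID plus the source name means re-scraping an album overwrites the old text rather than duplicating it, and further sources can be added later without changing the schema.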

1 Like