When I was reading the blog post about the syndication feeds, I came across @mr_monkey’s generated playlists.
And I was jealous.
Because the recommendations that he got are the recommendations that I would want.
So I dug a bit deeper into his listen history and found that he only has 27k listens.
In the other post with my rant about the blog (this one), @RustyNova suggested that I may be more mainstream than I think, because she mostly has electro dubstep stuff in her playlists. But here again, she only has 25k listens.
And I would be lying if I said I don’t listen to a mainstream artist every once in a while.
But I would still expect the playlist to be generated from the music I mostly listen to, and not from the one most popular song I’ve listened to.
So that’s why I’m wondering: are the weekly playlists based on my listens from last week, or are they based on my TOTAL listens?
If it is my weekly listens, then listening to something totally different for just 1 week would get me a TOTALLY different playlist the next week.
But if it is my total listens, then the playlist would probably be more or less the same every week.
Heh, I have yet to import my complete Spotify listening history, which would backfill about 10 years’ worth of listens that I’m missing.
To calculate recommendations we take a user’s listens from the last 180 days (~6 months) and use a collaborative filtering algorithm to find tracks you might like.
Then we filter out any tracks that you have listened to in the past 60 days.
From that pool of potential tracks we generate the recommendations playlists.
This limit of 6 months of listens is to ensure recommendations are up to date with a user’s tastes, but also because these recommendation calculations take in every user’s last 6 months of listens: we can’t feed all of the listens in LB into them with our current infrastructure, as that would be far too much data and processing time.
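The three steps described above can be sketched roughly as follows. This is only a toy illustration, not the actual ListenBrainz code: the `recommend` function, the `(user, track, timestamp)` tuple format, and the simple overlap-based scoring are all assumptions made for the example, and the overlap scoring merely stands in for whatever collaborative filtering algorithm is really used.

```python
from collections import Counter
from datetime import datetime, timedelta


def recommend(listens, user, now, history_days=180, exclude_days=60):
    """Toy pipeline. `listens` is a list of (user, track, datetime) tuples
    (a hypothetical format chosen for this sketch)."""
    history_start = now - timedelta(days=history_days)
    exclude_start = now - timedelta(days=exclude_days)

    # Step 1: keep only listens inside the 180-day window, for every user.
    recent = [(u, t, ts) for u, t, ts in listens if ts >= history_start]

    # Set of tracks each user played inside the window.
    tracks_by_user = {}
    for u, t, _ in recent:
        tracks_by_user.setdefault(u, set()).add(t)
    mine = tracks_by_user.get(user, set())

    # Step 2: stand-in for collaborative filtering. Every track played by
    # another user is scored by how many tracks that user shares with us.
    scores = Counter()
    for other, theirs in tracks_by_user.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue
        for t in theirs:
            scores[t] += overlap

    # Step 3: filter out anything we played ourselves in the last 60 days.
    played_recently = {
        t for u, t, ts in recent if u == user and ts >= exclude_start
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked if t not in played_recently]
```

Note that in this sketch the 180-day window feeds the scoring for every user, while the 60-day window is only used at the very end to remove tracks from the result.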
No, that is not how it works. The main problem is that the user base is quite small, which means there is not enough statistical power to make recommendations for all but the most popular genres. Reducing the 6 month listen window will result in even less statistical power.
So here is how I think it works:
There is a magical thing that creates suggestions. You give it an input (the listens of a user) and it generates a playlist.
That magical thing can still be based on 6 months of every user’s listen data, but the input you give it to get a personalised suggestion can be a week of the user’s data.
But then again:
If A LOT of people listen to song A & song B, and you listen to song A, the chance that you get song B suggested will be extremely high.
BUT
If you haven’t listened to song A recently, it wouldn’t suggest song B.
(I don’t know if what I’m saying is true or false, but this is how I guess it all functions. If somebody can really explain how it works, that would be nice. I’m just trying to understand why I don’t get recommendations that are useful while others do. And I find the explanation “there are not enough users” insufficient.)
It will be pretty hard to explain this without getting into the gory mathematical details, but the bottom line is: We need more data!
Because a large fraction of the users will have popular music tracks in their history (by definition), sufficient listen data can be collected in a relatively small user base. This is not the case in genres with fewer listeners, since there will be too few data points to decide whether there is a positive association between listening to song A and song B. In the current set up, the easiest remedy is getting more listeners in those rare genres. Another possibility to get more data would be increasing (not decreasing!) the listening window. But, as @mr_monkey explained, this has the drawback that the algorithm will be slow in picking up changes in listening behaviour. In addition, these pesky musicians have the nasty habit of continuously releasing new music so you don’t want the window to be too large to respond to new releases.
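To make the "too few data points" argument a bit more concrete, here is a minimal sketch. It is not how ListenBrainz measures anything; it swaps in the Wilson score interval as a simple way to show how much less certain an association between song A and song B is when only a handful of users listened to song A. The function name and the listener counts are made up for the illustration.

```python
import math


def wilson_lower_bound(both, listeners_of_a, z=1.96):
    """Lower end of the Wilson score interval (about 95% for z=1.96) for the
    share of song A's listeners who also listened to song B."""
    if listeners_of_a == 0:
        return 0.0
    n = listeners_of_a
    p = both / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Same observed ratio (75%), very different amounts of evidence.
rare_genre = wilson_lower_bound(both=3, listeners_of_a=4)
popular_genre = wilson_lower_bound(both=300, listeners_of_a=400)
print(f"rare genre:    at least {rare_genre:.2f}")
print(f"popular genre: at least {popular_genre:.2f}")
```

With only 4 listeners the lower bound is far below the observed 75%, so the link between A and B could easily be chance. With 400 listeners it stays close to 75%, which is the sense in which more listeners (or a longer window) give more statistical power.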
tl;dr the solution to better recommendations will always involve more data, either through more listeners, or by relying on other types of data (e.g. relationships or musical content).
I’m sorry if I sound harsh, but I think I prefer the gory mathematical details over just another “we need more data”. I know that by now…
My magical thing that creates suggestions is what @mr_monkey described as “we take a user’s listens from the last 180 days and use a collaborative filtering algorithm to find tracks you might like”.
Then my input to generate the playlist is what @mr_monkey describes as “we filter out any tracks that you have listened to in the past 60 days”.
So I think my explanation of how it works is correct.
This talks about the magical thing, which in my opinion shouldn’t have to change to get better recommendations.
I want to change the input (the 60 days of listens you listened to).
If this explanation is incorrect, then please point out where it is incorrect.
Currently we use 60 days of data to generate the playlists, which means that if I listened to 1 genre for the next 60 days, my recommendations would change to that genre.
Or, if I listened to the EXACT same music as @mr_monkey for 60 days, I should get similar recommendations to @mr_monkey’s (the recommendations I am jealous of).
If we change the 60 days of data used to generate the playlist to 10 days, I only need to listen to 1 genre for the next 10 days to see a drastic change. (I think the 60 days is only there so that there are enough songs to filter the suggestions, for people who only have a few listens per day.)
OR maybe, instead of filtering ALL the songs I’ve listened to in the past 60 days,
only filter my top 100 played songs of the last 60 days. That way the mainstream song that I listen to every once in a while will not be part of the filter used to get the list of possible recommendations.
Perhaps I misunderstand, but I am under the impression that you believe the playlist to be recruited from the tracks you listened to in the previous 60 days. However, my understanding is that the filter is against the tracks you listened to in the past 60 days in order to ensure that you are served recommendations you haven’t played recently.
You have a big pool of songs that are linked together… This pool is calculated from the listens of the last 180 days of all the users…
Then you filter out the songs you have played in the last 60 days. This gives you a list of the songs that are linked to the songs that you listened to in the last 60 days.
Also, there are 2 playlists generated:
Exploration: songs you haven’t listened to before
Jam: songs you have listened to before
So I guess that, from the list of linked songs, you check whether the user has already listened to each one before or not, and then you split the list in 2…
Disclaimer: I was not involved in the development of the recommendation system so I am just making educated guesses.
Check! But this is an important step, because I believe the raw recommendations are already generated here, by ranking the songs according to the listening history of similar users.
No, I read that as: songs you listened to in the past 60 days are removed from the list of raw recommendations. They are filtered against.
Can you explain what you mean by that? How do you filter something against something?
If I have a list of names, for instance [piet, jan, jaak, fons,…], and I need to filter names that start with a J, I get [jan, jaak].
If my pool of recommendations looks like this:
[
{song 1 - song 2},
{song 3 - song 4},
{song 5 - song 6},
{song 7 - song 8},
{song 9 - song 10}
]
and my listens these last 60 days were [song 1, song 6, song 7],
then I filter them and I get:
[
{song 1 - song 2},
{song 5 - song 6},
{song 7 - song 8}
]
So my recommendations would be [song 2, song 5, song 8].
That’s how I think filtering works: you have a list and you see where there are matches.
But if you remove the stuff you have listened to, then the only thing you have left is ALL the songs that are linked to stuff you haven’t listened to. So you would end up with everything you don’t want?
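The two readings of "filter" in this exchange can be put side by side in a small sketch. The names and song numbers are taken from the examples above; `raw_recommendations` is a hypothetical list, and the sketch assumes (as suggested earlier in the thread) that the raw recommendations are already personalised before any filtering happens.

```python
names = ["piet", "jan", "jaak", "fons"]

# Filtering *for* a condition keeps the items that match.
starts_with_j = [n for n in names if n.startswith("j")]

# Filtering *against* a condition (filtering out) removes the matches.
without_j = [n for n in names if not n.startswith("j")]

# Assumption: this list already comes out of the collaborative filtering
# step, so it only contains songs the algorithm thinks this user may like.
raw_recommendations = ["song 2", "song 5", "song 6", "song 8", "song 10"]
recent_listens = {"song 1", "song 6", "song 7"}

# Recent listens are filtered *against*: they are removed from the list.
playlist_candidates = [s for s in raw_recommendations if s not in recent_listens]

print(starts_with_j)
print(without_j)
print(playlist_candidates)
```

Under that assumption, removing recent listens does not leave "everything you don’t want": what remains is still a list of personalised recommendations, just without the songs that were played in the last 60 days.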
I’m also not deeply involved in the playlist generation stuff. But from what I can see in the source code that generates the weekly / daily jams and exploration playlists, it filters out listens of the last 60 days from the generated recommendations.
These recommendations get calculated, as monkey said above, by some collaborative filtering based on your listens for the last 180 days. Note that this includes the last 60 days. The recommendations can contain tracks you have listened to or that are considered similar to what you have listened to.
For the jams playlists it then uses only recordings you have listened to before. The exploration playlists are the opposite and contain only recordings you haven’t listened to.
It then also excludes the recordings you have listened to in the last 60 days. I think the intention is to not make the playlists very repetitive every day. It also excludes any recordings you have explicitly marked as “hated”. This gives users an actual way to never get a specific song recommended.
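As a rough sketch of the steps described in this post (not the actual source code): the function name `split_playlists` and its parameters are hypothetical, and the inputs are assumed to be plain collections of recording identifiers.

```python
def split_playlists(recommended, ever_listened, listened_last_60_days, hated):
    """Split ranked recommendations into (jams, exploration).

    All arguments are hypothetical inputs for this sketch:
    - recommended: ranked list of recordings from the recommendation step
    - ever_listened: set of recordings the user has listened to before
    - listened_last_60_days: set of recordings played in the last 60 days
    - hated: set of recordings the user explicitly marked as hated
    """
    # Recent listens and hated recordings never make it into a playlist.
    excluded = set(listened_last_60_days) | set(hated)
    candidates = [r for r in recommended if r not in excluded]

    # Jams: only recordings the user has listened to before.
    jams = [r for r in candidates if r in ever_listened]
    # Exploration: only recordings the user has not listened to before.
    exploration = [r for r in candidates if r not in ever_listened]
    return jams, exploration


jams, exploration = split_playlists(
    recommended=["s1", "s2", "s3", "s4", "s5"],
    ever_listened={"s1", "s2", "s3"},
    listened_last_60_days={"s1"},
    hated={"s4"},
)
print(jams, exploration)
```

In this example `s1` is dropped because it was played recently and `s4` because it is hated, so the jams playlist is left with older favourites and the exploration playlist with recordings that are new to the user.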
Maybe the main issue is that the recommendations feel like a black box.
It just magically gives you some recommendations, but it doesn’t tell you why it recommended the artist.
If you look at Last.fm, they tell you at the bottom of each track why they recommended it. It would be nice if ListenBrainz could do that. Then maybe I would understand why I get the recommendations I get.