It all started when I was talking about ListenBrainz with a friend who doesn’t have a CS background. I found it hard to explain why people would make their entire listen history public. People are afraid that their listen history could cause them trouble if it is seen by someone they would rather not share it with. E.g. if I’m listening to Eminem’s songs, which contain explicit content, my parents might see them, which is not an ideal scenario (or the other way around). On the other hand, it would be interesting if I could access the listen history of my friends and see stats like “Most listened track by your “N” friends is: ‘…’ ”.
If most of my friends are listening to a song, I’ll most probably like it too.
But for that, LB would have to keep information about my friends, which is kind of what social networking sites do.
Here are the features that I’d like to see in LB:
- Search for friends using first name, last name, or a not completely correct username. That way I can find my friends.
- Have a way to keep my listens hidden from certain users. For example, set visibility to friends only, public, or only me. Restrict certain friends. Hide certain listens from certain friends. Maybe have some way to get information from MusicBrainz about whether a recording has explicit content, and not show it in my listen history.
- Have a friends list.
- Have stats like “A lot of your friends are listening to “XYZ” song”.
- Have a way to recommend listens to friends. I want to recommend “XYZ” song to my friend “ABC”.
- Have CB ratings by my friends displayed on my LB listens, so that I can have a stat like “Your friends have given this artist 5 stars”. My friends’ ratings can be more relevant to me than the overall rating.
- Use CB ratings for stats. Maybe have an option like “Sort by CB ratings” for recommended artists, release groups and, if possible, recordings.
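To make the privacy and stats items a bit more concrete, here is a minimal sketch of how a visibility check and a “most listened track among my friends” stat could fit together. Everything in it is hypothetical: `Visibility`, `Listen`, `can_see` and `most_listened_by_friends` are names I made up for illustration, not anything that exists in LB, and the friends data is just an in-memory dict.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    # Hypothetical privacy levels, matching "public / only friends / only me".
    PUBLIC = "public"
    FRIENDS = "friends"
    ONLY_ME = "only_me"


@dataclass(frozen=True)
class Listen:
    # Hypothetical, heavily simplified listen record.
    user: str
    track: str
    visibility: Visibility = Visibility.PUBLIC


def can_see(viewer, listen, friends):
    """Return True if `viewer` may see `listen`.

    `friends` maps a username to the set of usernames on their friends list.
    """
    if viewer == listen.user or listen.visibility is Visibility.PUBLIC:
        return True
    if listen.visibility is Visibility.FRIENDS:
        return viewer in friends.get(listen.user, set())
    return False  # ONLY_ME and the viewer is not the owner


def most_listened_by_friends(viewer, listens, friends):
    """Return (track, count) for the track the viewer's friends listened to most.

    Only listens the viewer is allowed to see are counted.
    Returns None if there is nothing to count.
    """
    my_friends = friends.get(viewer, set())
    counts = Counter(
        listen.track
        for listen in listens
        if listen.user in my_friends and can_see(viewer, listen, friends)
    )
    return counts.most_common(1)[0] if counts else None
```

With two friends who both listened to “XYZ”, one of them only sharing with friends, the stat would still count both listens, while a listen marked “only me” would be left out.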
But while we build these features, we have to somehow keep the data public. If the data is not public, we fail to achieve a very important goal of MeB, i.e. keeping data open to the public.
One way to hide data from certain users, or give access to only some users, while keeping the data public is as follows:
- We give each user a token, just like the private API tokens that users already have for submitting listens.
- We keep a private mapping from usernames to these tokens. It is known only to LB.
- While creating public data dumps, we replace usernames with those private tokens. That way, people who have access to the data don’t know which token represents which user.
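The three steps above can be sketched in a few lines. This is only an illustration under my own assumptions: the `user_name` and `user_token` field names, the `get_or_create_token` and `pseudonymize_dump` helpers, and the plain dict standing in for the private mapping are all hypothetical, not how LB dumps actually work.

```python
import secrets


def get_or_create_token(username, token_by_user):
    """Return the private token for `username`, creating a random one if needed.

    `token_by_user` stands in for the private username -> token mapping
    that only LB would hold.
    """
    if username not in token_by_user:
        # 16 random bytes, hex-encoded; not derived from the username.
        token_by_user[username] = secrets.token_hex(16)
    return token_by_user[username]


def pseudonymize_dump(listens, token_by_user):
    """Build public dump rows where the username is replaced by the token."""
    dump = []
    for listen in listens:
        # Copy every field except the username (hypothetical field name).
        row = {key: value for key, value in listen.items() if key != "user_name"}
        row["user_token"] = get_or_create_token(listen["user_name"], token_by_user)
        dump.append(row)
    return dump
```

I picked random tokens rather than a hash of the username on purpose: anyone could hash a list of known usernames and compare, while a random token can only be looked up through the private mapping.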
I still don’t know how people will be able to play around with this kind of data in the recommendation engine. E.g. I won’t be able to execute a query on BQ like “select my friend ABC’s 5 most listened tracks this week”, because I don’t know my friend’s secret token, and usernames are not public there.
For that, we will either have to restrict people to executing only the queries that LB provides, or have them send queries with usernames to LB, which then uses the token-username mapping on the backend to rewrite the query and return the results. I don’t know how this can be done, given the many security problems it may cause.
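A rough sketch of the first option combined with the backend lookup: LB keeps a whitelist of query templates, the user only sends a query name and a username, and LB swaps in the token itself. I’m using an in-memory SQLite table as a stand-in for BQ, and `ALLOWED_QUERIES`, `run_friend_query`, the `listens` table layout and the friends dict are all hypothetical. The “this week” filter is left out to keep it short.

```python
import sqlite3

# Hypothetical whitelist: the only queries LB would agree to run.
# The token and the limit are bound as parameters, never pasted into the SQL.
ALLOWED_QUERIES = {
    "top_tracks": (
        "SELECT track, COUNT(*) AS n FROM listens "
        "WHERE user_token = ? GROUP BY track ORDER BY n DESC, track LIMIT ?"
    ),
}


def run_friend_query(conn, requester, friend, query_name, limit,
                     token_by_user, friends):
    """Run a whitelisted query about `friend` on behalf of `requester`.

    The requester only ever supplies usernames; the secret token is looked up
    on the backend and never returned.
    """
    if query_name not in ALLOWED_QUERIES:
        raise ValueError(f"query {query_name!r} is not allowed")
    if friend not in friends.get(requester, set()):
        raise PermissionError(f"{friend!r} is not on {requester!r}'s friends list")
    token = token_by_user[friend]
    return conn.execute(ALLOWED_QUERIES[query_name], (token, limit)).fetchall()
```

Because the user never writes SQL, injection is less of a worry here, but the price is that people can only ask the questions LB has thought of in advance.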
In case someone wants to change their secret token because they think it has been compromised, we will have to change it in a lot of places, which is somewhat expensive.
We can opt not to share the tokens with users; that way we will not have to worry about this. But playing with the data becomes less fun, as users don’t know which listens in the public data dumps belong to them.
As the number of listens and users grows, it will become harder for people to figure out the reverse mapping from tokens to usernames.
For security, we may argue that given a huge amount of resources even SHA, RSA etc. can be broken. The same applies to LB: given sufficient resources, people can find out the reverse mapping. But for a regular user it is not feasible.
In conclusion, such a social networking twist will have some benefits, like privacy for users and recommendations based on friends’ listens. That way we may end up getting more users much more quickly, as people will tell their friends to join LB to get recommendations based on their listens.
But it will not be easy to work with the data, as new security and privacy concerns will come up.
This idea requires a lot of discussion; only then can we figure out what is feasible and what is not.