How is user similarity score calculated?

That comes down to the same thing: the angle between the vectors depends on their relative directions. For example, if the angle between two vectors is zero, you know they are both pointing in the same direction.
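To make the angle idea concrete, here is a minimal sketch of cosine similarity. It assumes each user is represented as a vector of per-track listen counts; the function name and the sample vectors are made up for illustration, and this is not taken from the ListenBrainz code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length listen-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an empty history matches nobody.
        return 0.0
    return dot / (norm_a * norm_b)

# Same direction (one user simply listens twice as much): angle is zero.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
# No tracks in common: the vectors are perpendicular.
orthogonal = cosine_similarity([1, 0], [0, 1])
```

Note that only the direction matters here: doubling every listen count leaves the score unchanged.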

Alas, I haven't. It is quite hard to present the correlation coefficient in an intuitive fashion. The new similarity score you implemented is already a nice improvement, I feel. You may want to try some alternative similarity measures, like the Jaccard index. That measure is easy to calculate, but it is meant for binary data, so it doesn't accommodate multiple listens of the same track, unfortunately.
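For reference, a minimal sketch of the Jaccard index on two listening histories, assuming each history is just a list of track identifiers (the names `alice` and `bob` and the track IDs are invented). Converting to sets is what discards repeat listens:

```python
def jaccard_index(tracks_a, tracks_b):
    """Size of the intersection divided by size of the union of two track sets."""
    set_a, set_b = set(tracks_a), set(tracks_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty histories score 0.
        return 0.0
    return len(set_a & set_b) / len(union)

# alice played track1 twice, but the set only records that she played it.
alice = ["track1", "track1", "track2", "track3"]
bob = ["track2", "track3", "track4"]
score = jaccard_index(alice, bob)  # 2 shared tracks out of 4 distinct ones
```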

I wonder why it doesn't consider mutuality, that is, running some sort of stable matching algorithm so that each user sees the other at the same rank. Further, percentages could be normalized to a global percentile (i.e., this match is in the top 10% on the site).
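The percentile part could look roughly like this. It is only a sketch: it assumes a sorted list of all pairwise scores on the site is available (or a sample of them), and `global_percentile` and the sample numbers are hypothetical.

```python
from bisect import bisect_left

def global_percentile(score, all_scores_sorted):
    """Percentage of site-wide scores that are strictly below `score`."""
    if not all_scores_sorted:
        return 0.0
    below = bisect_left(all_scores_sorted, score)
    return 100.0 * below / len(all_scores_sorted)

# Made-up site-wide similarity scores, sorted ascending.
site_scores = sorted([0.05, 0.10, 0.12, 0.20, 0.31, 0.35, 0.40, 0.52, 0.70, 0.93])
pct = global_percentile(0.70, site_scores)  # 8 of the 10 scores are lower
```

A match at the 80th percentile would then be shown as "top 20% on the site" instead of a raw score.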

If running a stable matching isn't feasible, at least combine the two users' ranks locally before sorting. For example, if someone is 1st on one user's list and 10th on the other's, sort by the worse of the two (i.e., 10).
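A rough sketch of that local combination, assuming a nested dict where `similarity[u][v]` is the score user `u` sees for user `v`. That layout, the function names, and the sample users are all invented for illustration:

```python
def rank_of(scores):
    """Map each candidate to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {user: i + 1 for i, user in enumerate(ordered)}

def mutual_order(me, similarity):
    """Sort my candidates by the worse of (my rank of them, their rank of me)."""
    my_ranks = rank_of(similarity[me])
    combined = {}
    for other, my_rank in my_ranks.items():
        # If the other user has no score for me, treat their rank as infinitely bad.
        their_rank = rank_of(similarity[other]).get(me, float("inf"))
        combined[other] = max(my_rank, their_rank)
    # Ties on the combined rank fall back to my own ranking.
    return sorted(combined, key=lambda u: (combined[u], my_ranks[u]))

similarity = {
    "ann": {"bob": 0.90, "cat": 0.80, "dan": 0.30},
    "bob": {"ann": 0.90, "cat": 0.95, "dan": 0.92},
    "cat": {"ann": 0.80, "bob": 0.95, "dan": 0.10},
    "dan": {"ann": 0.30, "bob": 0.92, "cat": 0.10},
}
order = mutual_order("ann", similarity)
```

Here bob is ann's top match, but ann is only 3rd on bob's list, so cat (2nd on both lists) ends up first for ann.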

Some sort of caching would help with most of the computational concerns.

I also wonder about basic matrix factorization/completion and spectral clustering (as used for collaborative filtering and/or recommendation systems) for better handling of sparse data. The correlation would then be run on the latent user vectors rather than the raw/original ones.
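To illustrate the latent-vector idea, here is a toy sketch of matrix factorization by stochastic gradient descent on the observed listen counts only. Everything in it is an assumption for illustration: the `(user, track) -> count` layout, the hyperparameters, and the sample data. It also uses cosine similarity on the latent vectors as a stand-in for the correlation step, and a real system would more likely use an existing library.

```python
import math
import random

def factorize(counts, k=3, epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Fit counts[(user, track)] ~ dot(P[user], Q[track]) on observed entries."""
    rng = random.Random(seed)
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u, _ in counts}
    Q = {t: [rng.uniform(-0.1, 0.1) for _ in range(k)] for _, t in counts}
    for _ in range(epochs):
        for (u, t), r in counts.items():
            pu, qt = P[u], Q[t]
            err = r - sum(a * b for a, b in zip(pu, qt))
            for f in range(k):
                p_f, q_f = pu[f], qt[f]
                # Gradient step with L2 regularization on both factors.
                pu[f] += lr * (err * q_f - reg * p_f)
                qt[f] += lr * (err * p_f - reg * q_f)
    return P, Q

def squared_error(counts, P, Q):
    """Total squared reconstruction error over the observed entries."""
    return sum((r - sum(a * b for a, b in zip(P[u], Q[t]))) ** 2
               for (u, t), r in counts.items())

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

# Made-up sparse listen counts: most (user, track) pairs are simply absent.
listens = {
    ("ann", "t1"): 5, ("ann", "t2"): 3,
    ("bob", "t1"): 4, ("bob", "t3"): 1,
    ("cat", "t2"): 1, ("cat", "t3"): 5, ("cat", "t4"): 4,
    ("dan", "t3"): 4, ("dan", "t4"): 5,
}
P, Q = factorize(listens)
baseline = sum(r * r for r in listens.values())  # error of predicting all zeros
fitted = squared_error(listens, P, Q)
latent_sim = cosine(P["ann"], P["bob"])          # similarity in latent space
```

The point is that `P["ann"]` and `P["bob"]` are dense even though the raw vectors overlap on a single track, so the similarity is less at the mercy of missing entries.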


Welcome @joellark! Have you considered checking out the ListenBrainz repo and contributing improvements?
