I’m really excited about ListenBrainz. My background is in Python and AI research, and I want to help solve the “Cold-Start” problem for new artists. Right now, recommendations depend a lot on listening history, but I want to use MusicBrainz metadata (tags/genres) to create a Semantic Similarity Engine using Sentence-BERT and pgvector.
This is an interesting proposal, but I suspect that MusicBrainz tag density follows a heavy-tailed power law, i.e. the artists who most need content-based discovery are exactly the ones with the sparsest tags. If an artist’s metadata is just [“Artist Name”, “USA”, “Rock”], a 384-dimensional vector generated from that isn’t very meaningful, because it can’t tell that artist apart from every other artist with the same handful of tags. So I’m not convinced this would solve the cold-start problem.
Then again, I haven’t analyzed the metadata. If you haven’t done any sort of clustering analysis on the metadata/tags, it may be worth doing that before investing time in the rest of the project.
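Even before clustering, a quick look at how many tags each artist has would show whether the sparsity worry holds. Here is a rough sketch of that check, not a real analysis: `tag_density_report`, the `sparse_threshold` cutoff of 3, and the toy `sample` data are all made up for illustration. A real run would load artist tags from a MusicBrainz dump instead.

```python
from collections import Counter
from statistics import median


def tag_density_report(artist_tags, sparse_threshold=3):
    """Summarize how many distinct tags each artist has.

    artist_tags: dict mapping an artist id to a list of tag strings.
    sparse_threshold: hypothetical cutoff; artists with this many
    distinct tags or fewer are counted as "sparse".
    """
    if not artist_tags:
        raise ValueError("artist_tags is empty")

    # Count distinct tags per artist, so repeated tags are not double counted.
    counts = sorted(len(set(tags)) for tags in artist_tags.values())
    sparse = sum(1 for c in counts if c <= sparse_threshold)

    return {
        "artists": len(counts),
        "median_tags": median(counts),
        "max_tags": counts[-1],
        "sparse_share": sparse / len(counts),
        # tag count -> number of artists with that many tags
        "histogram": dict(sorted(Counter(counts).items())),
    }


# Toy data, invented for this example only.
sample = {
    "artist_a": ["rock", "usa"],
    "artist_b": ["rock"],
    "artist_c": [],
    "artist_d": ["shoegaze", "dream pop", "uk", "1990s", "noise pop", "ethereal"],
    "artist_e": ["rock", "rock", "usa"],
}

report = tag_density_report(sample)
print(report)
```

If `sparse_share` comes out high for artists with few or no listens, that would support the concern above. If most new artists turn out to have reasonably rich tags, the embedding approach looks more promising.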