Connect ListenBrainz Dataset with MusicBrainz Database

For my Bachelor thesis I am trying to answer the question “How does a musical change affect the customers of an artist?”. For this, with help from my professor, I have already created a skip-gram model.
For the analysis, my professor said that I need to connect the ListenBrainz data with the MusicBrainz data in order to find the right artists, release_groups, …
However, I don’t know how that works. Could someone explain to me, step by step, what to do? I have no experience with programming, using terminals, …
I would be very thankful!

3 Likes

The ListenBrainz dataset does not typically include MusicBrainz identifiers, so there is no direct match from one to the other.
Each listen is essentially a track name string and an artist name string, and you need to fuzzy match these against MusicBrainz.
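
To give an idea of what "fuzzy match" means in practice, here is a minimal sketch using only Python's standard library (`difflib`). The function names `similarity` and `best_match` and the 0.6 threshold are my own illustrative choices, not anything ListenBrainz or MusicBrainz prescribes; a real analysis would probably need a tuned threshold or a dedicated string-matching library.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 and 1.0; 1.0 means identical ignoring case."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def best_match(name, candidates, threshold=0.6):
    """Pick the candidate most similar to `name`, or None if nothing clears
    the (arbitrary, illustrative) threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(name, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    options = ["I Like to Move It", "Go On Move", "Jazz It Up"]
    print(best_match("i like to move it (original mix)", options))
```

The idea is that the listen's string is compared against each candidate name coming back from MusicBrainz, and only a sufficiently close candidate is accepted.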

Track names may carry extra labels, such as “I Like to Move It (original mix)”.
There can be one or several artists on a single track, and an artist name can be one or many words, e.g. “Reel 2 Real feat. The Mad Stuntman”, so you may need to deal with that.
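
A rough sketch of how the two cases above could be cleaned up before matching. The helper names `clean_title` and `split_artists` are hypothetical, and the patterns only cover a trailing label in brackets and the "feat." / "ft." / "featuring" joiners. Other joiners such as "&" or "and" are deliberately left alone, because they also occur inside ordinary band names.

```python
import re

# A trailing "(...)" or "[...]" label at the end of a track title.
_TRAILING_LABEL = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]\s*$")

# Joiners that introduce a featured artist (assumed list, not exhaustive).
_FEAT = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+", re.IGNORECASE)


def clean_title(title):
    """Drop one trailing bracketed label, e.g. "(original mix)"."""
    stripped = _TRAILING_LABEL.sub("", title).strip()
    # If the whole title was in brackets, keep the original instead of "".
    return stripped or title.strip()


def split_artists(credit):
    """Split an artist credit string into the individual artist names."""
    return [part.strip() for part in _FEAT.split(credit.strip()) if part.strip()]


if __name__ == "__main__":
    print(clean_title("I Like to Move It (original mix)"))
    print(split_artists(" Reel 2 Real feat. The Mad Stuntman"))
```

Keeping the original strings alongside the cleaned ones is probably wise, since the label ("original mix", "live", "remix") can matter when deciding which MusicBrainz recording is the right one.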

The basic approach would be:
Download the ListenBrainz JSON dump.
Parse the ListenBrainz JSON.
If a listen already has a set of MusicBrainz identifiers, you are done with it; move on to the next listen.
Otherwise, call the MusicBrainz API and search for the artist. This may return one or many results, so you will need to look through them.
Do a recording lookup based on the artist and look through the results.
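
The steps above could be sketched roughly like this in Python, using only the standard library. Several things here are assumptions on my part, so please check them against the actual dump:

- Each line of the dump file is taken to be one JSON listen with `track_metadata.artist_name`, `track_metadata.track_name` and, when present, `track_metadata.additional_info.recording_mbid`.
- The artist search and the recording lookup are folded into a single MusicBrainz recording search filtered by artist name, which is a simplification of the two separate steps.
- The first search result is taken as the match, which is naive; you would still want to look at the candidates.
- `USER_AGENT` and the file name `listens.json` are placeholders.

```python
import json
import time
import urllib.parse
import urllib.request

MB_SEARCH = "https://musicbrainz.org/ws/2/recording"
# Placeholder: MusicBrainz asks clients to identify themselves.
USER_AGENT = "thesis-matching-sketch/0.1 (you@example.org)"


def existing_mbid(listen):
    """Return the recording MBID already stored in the listen, if any
    (field names are an assumption about the dump format)."""
    meta = listen.get("track_metadata") or {}
    info = meta.get("additional_info") or {}
    return info.get("recording_mbid") or None


def _escape(text):
    """Escape backslashes and double quotes inside a quoted search term."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_url(artist, track, limit=5):
    """Build a recording search URL restricted to the given artist name."""
    query = 'recording:"{}" AND artist:"{}"'.format(_escape(track), _escape(artist))
    params = {"query": query, "fmt": "json", "limit": limit}
    return MB_SEARCH + "?" + urllib.parse.urlencode(params)


def search_recordings(artist, track):
    """Call the MusicBrainz search API and return the candidate recordings."""
    request = urllib.request.Request(
        build_search_url(artist, track), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("recordings", [])


def match_listens(lines, search=search_recordings, delay=1.0):
    """Yield (listen, recording_mbid_or_None, source) for each JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        listen = json.loads(line)
        mbid = existing_mbid(listen)
        if mbid:
            # Identifier already present: nothing to search for.
            yield listen, mbid, "from_dump"
            continue
        meta = listen.get("track_metadata") or {}
        candidates = search(meta.get("artist_name", ""), meta.get("track_name", ""))
        time.sleep(delay)  # stay at or below roughly one request per second
        best = candidates[0]["id"] if candidates else None
        yield listen, best, "search" if best else "unmatched"


if __name__ == "__main__":
    with open("listens.json", encoding="utf-8") as dump:  # placeholder file name
        for listen, mbid, source in match_listens(dump):
            print(source, mbid, listen.get("track_metadata", {}).get("track_name"))
```

With one request per second, a large dump will take a very long time to process this way, so for a thesis-sized dataset it may be worth caching results per (artist, track) pair so each distinct pair is only searched once.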

2 Likes

Having a simple “search for this work/recording on MusicBrainz” link next to each entry in ListenBrainz would be a huge help, especially if ListenBrainz also implements a “most listened artists/tracks/releases” statistics view, like Last.fm does.

3 Likes