Hey,
I’m working on a research project for my bachelor’s thesis on music recommendation systems. One part of my analysis compares seed songs with their recommendations on Spotify, and for that I need to retrieve a large set of songs with valid ISRCs to cross-match against the Spotify Web API. (I assumed ISRCs are the most reliable key for this match. Is that right?)
So far, I’ve worked with the public JSON recording dump, but the copy I have only contains a few thousand recordings with ISRCs, far from the millions that MusicBrainz is known to have. (See: Database statistics - Timeline graph - MusicBrainz)
Here’s what I’m trying to understand:
- Is there a way to access a more complete or full dataset of MusicBrainz recordings that include ISRCs?
- Are ISRCs only available through a full PostgreSQL DB setup?
- Is there an alternative dataset or dump that includes them?
- What’s the recommended way to extract this information efficiently (recording title + artist + ISRC)?
- Should I set up the full PostgreSQL DB and query the recording + isrc + artist_credit tables?
- Or is there a lighter-weight method that gives sufficient coverage?
- Any tips for matching MusicBrainz data to Spotify tracks via the Spotify API?
- I’m currently searching tracks by ISRC, but in many cases Spotify returns no match. Sometimes the song exists on Spotify under a different ISRC or with a slightly different artist name, and sometimes it isn’t on Spotify at all.
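
To make the lookup in the last point concrete, here is a minimal sketch of an ISRC search with a title/artist fallback. It assumes the Spotify Web API search endpoint with its `isrc:`, `track:` and `artist:` field filters; the helper names (`normalize_isrc`, `isrc_search_url`, `fallback_search_url`) are made up for illustration, and the sketch only builds the request URLs. Sending them would still need an OAuth bearer token and rate-limit handling.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint: the Spotify Web API search resource.
SEARCH_ENDPOINT = "https://api.spotify.com/v1/search"

# ISRC layout: 2-letter country code, 3 alphanumeric registrant characters,
# 2-digit year, 5-digit designation code (12 characters in total).
ISRC_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$")


def normalize_isrc(raw):
    """Strip hyphens/whitespace and upper-case; return None if malformed."""
    candidate = re.sub(r"[\s-]", "", raw).upper()
    return candidate if ISRC_PATTERN.match(candidate) else None


def isrc_search_url(isrc):
    """Build a track search URL that filters on a single ISRC."""
    params = {"q": f"isrc:{isrc}", "type": "track", "limit": 1}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def fallback_search_url(title, artist):
    """Build a looser title + artist search for when the ISRC has no hit."""
    params = {"q": f"track:{title} artist:{artist}", "type": "track", "limit": 5}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    isrc = normalize_isrc("us-rc1-76-07839")
    if isrc is not None:
        print(isrc_search_url(isrc))
    print(fallback_search_url("Hey Jude", "The Beatles"))
```

Normalizing first matters because ISRCs are often written with hyphens or in lower case, and a malformed code would otherwise look like a missing track. The fallback still needs a similarity check on the returned title and artist before a hit is accepted as the same recording.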
I’d appreciate any insights, best practices, or recommended tools you’ve used for similar large-scale MusicBrainz extraction and ISRC-Spotify mapping workflows.
Thanks a lot in advance!