Problem working with JSON data

philipnye · September 27, 2024, 9:20am

I’m attempting to work with the recordings JSON data dump but am running into a problem.

I’ve downloaded the 20240925 data file (recording.tar.xz) from this page and extracted it using 7-Zip. That results in a ~1.1 GB data file along with the README and other accompanying files.

Using the pandas Python package to read the data file, however, I’m only getting 125,693 lines (i.e. recordings), whereas this page leads me to expect around 33 million.

Is anyone able to provide suggestions on what I might be doing wrong - or confirm that the JSON data dump does contain all the records it’s supposed to?

Happy to post more details including the Python code I’m using if that’s helpful.
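
For reference, a rough way to check the record count independently of pandas is sketched below. It assumes the extracted file holds one JSON document per line, which the line count above suggests but which is not confirmed here; the function name and the file path in the usage comment are hypothetical.

```python
import json


def count_records(lines):
    """Count the non-empty lines in `lines` that parse as JSON.

    `lines` can be any iterable of strings, e.g. an open file handle.
    A malformed line raises json.JSONDecodeError rather than being skipped.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        json.loads(line)  # parse only to validate; the result is discarded
        count += 1
    return count


# Usage (path is an assumption about where the archive was extracted):
# with open("mbdump/recording", encoding="utf-8") as fh:
#     print(count_records(fh))
```

Reading line by line keeps memory use flat, so this should work on a file of about 1.1 GB without loading it all at once. If the count matches what pandas reports, the shortfall is in the file's contents rather than in the reading code.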

philipnye · September 27, 2024, 11:00am

@reosarevok kindly answered this for me on Discord: the recordings JSON file only includes standalone recordings, with the rest included in the JSON for the release they appear on. This is mentioned on Development / JSON Data Dumps - MusicBrainz (under mbdump/entity).
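
In case it helps anyone else, here is a minimal sketch of collecting recording IDs from a release document. It assumes each release is a JSON object with a "media" list, each medium has a "tracks" list, and each track has a "recording" object with an "id". I have not verified that layout against the dump, so check it against the documentation page mentioned above. The function name and the path in the usage comment are hypothetical.

```python
def recording_ids_from_release(release):
    """Return the set of recording IDs found in one parsed release document.

    Assumed layout (not confirmed against the dump):
    release["media"][i]["tracks"][j]["recording"]["id"]
    Missing or null keys are treated as empty.
    """
    ids = set()
    for medium in release.get("media") or []:
        for track in medium.get("tracks") or []:
            recording = track.get("recording") or {}
            if "id" in recording:
                ids.add(recording["id"])
    return ids


# Usage, assuming one JSON document per line in the release dump file:
# import json
# seen = set()
# with open("mbdump/release", encoding="utf-8") as fh:
#     for line in fh:
#         if line.strip():
#             seen |= recording_ids_from_release(json.loads(line))
# print(len(seen))
```

A set is used because the same recording can appear on more than one release, so counting tracks directly would overcount.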