AcousticBrainz datadumps

acousticbrainz

#1

Hi, what is the current situation regarding AcousticBrainz data dumps?

The only full dump is still the original one from January 2015, and there hasn’t been an incremental dump since this one, which is already some time old:

http://acousticbrainz.org/static/download/acousticbrainz-lowlevel-json-incr-18.tar.bz2

It would surely be better if the data could be downloaded, rather than having applications use the API and potentially overload the server.


#3

Could we have an update on whether there is ever going to be an up-to-date data dump, please?


#4

At the moment my schedule isn’t giving me time to get these dumps working. It’s something that we do want to provide, but I have other commitments that prevent me from getting into them.
All of the AcousticBrainz data is available through the API, so for now this is the recommended way of accessing the data.


#5

Okay, fingers crossed that you will have some time.

But let me restate the issue from a tagger's perspective: for a single release, two AB lookups (high-level and low-level) have to be done for each track, so a 10-track album requires twenty lookups. This means looking up data from AB takes a significant amount of time compared to matching a release. I assume this is also a problem for Picard.
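To make the arithmetic concrete, here is a minimal sketch of the per-track lookups described above. The base URL and the `/<mbid>/low-level` and `/<mbid>/high-level` path shapes are assumptions based on the public API, the MBIDs are placeholders, and no network request is made.

```python
# Assumed API root and path layout; adjust if the real API differs.
API_ROOT = "https://acousticbrainz.org/api/v1"
LEVELS = ("low-level", "high-level")


def lookup_urls(recording_mbids):
    """Return one low-level and one high-level URL per recording."""
    return [
        f"{API_ROOT}/{mbid}/{level}"
        for mbid in recording_mbids
        for level in LEVELS
    ]


# Placeholder MBIDs standing in for a hypothetical 10-track album.
album = [f"00000000-0000-0000-0000-{i:012d}" for i in range(1, 11)]
urls = lookup_urls(album)
print(len(urls))  # 20 requests for 10 tracks
```

The request count grows as two per track, which is why a tagger spends far longer on AB lookups than on matching the release itself.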


#6

Currently not for Picard itself, as it does not (yet) use AcousticBrainz. But there are two plugins available that make use of AcousticBrainz data, and yes, the same issue applies. If you have both plugins enabled, two requests per track are made.

But for Picard the data dump would not be a solution. You probably plan to set up your own service with it, but the better solution for everyone would likely be additional endpoints that allow you to query the data for multiple tracks in a single request.
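As a rough illustration of what a multi-track endpoint would save, the sketch below compares request counts. It assumes a purely hypothetical bulk endpoint that accepts up to `batch_size` recordings per request; the helper names and the batch size of 25 are made up for the example and do not describe an existing API.

```python
import math


def batched(mbids, batch_size):
    """Yield consecutive slices of at most batch_size MBIDs."""
    for start in range(0, len(mbids), batch_size):
        yield mbids[start:start + batch_size]


def request_count(track_count, batch_size, levels=2):
    """Requests needed when each request covers up to batch_size tracks
    and low-level and high-level data are fetched separately."""
    return levels * math.ceil(track_count / batch_size)


print(request_count(10, batch_size=1))   # per-track lookups: 20
print(request_count(10, batch_size=25))  # hypothetical bulk lookups: 2
```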


#7

Yes, years ago I integrated the data from the last data dump into my AlbunackDb. Not only can I process requests more quickly, it also means I use less of AcousticBrainz's bandwidth. But over time I have to do more and more lookups from AcousticBrainz, since the data dump is now three years old.

Being able to look up multiple recordings (or simply being able to look up data for all recordings in a release in one go) would certainly be a big improvement to AB that I would like to see, but this wouldn’t reduce their bandwidth usage, and I assume creating a new data dump would be much simpler than changing the API.


#8

Probably also more welcome in your use case :slight_smile: I just wanted to point out that this is not the solution for Picard. Since both AcousticBrainz and Picard are MetaBrainz projects, there is nothing to gain from setting up a separate server for Picard access.


#9

True, but it would be helpful for any project outside of MusicBrainz.
Actually, for Picard it would be useful if the data were part of the MusicBrainz database, as I guess you could then use Picard with the MusicBrainz VM. But I can see it wouldn’t be very flexible if the databases were merged, so it is not a good idea.


#10

I see @iliekcomputers has managed to create a new data dump. Would it be possible to download a copy of that? It would be very helpful, thanks.


#11

Yes, we’ve finally got a partial dump working, but there are a handful of edges that we want to clean up before we publish it as an official dump. We have some hardware issues with the current AcousticBrainz server so our current priority is to get it migrated to a new server before we publish the dumps.


#12

OK, FWIW I am referring to a full database dump rather than a full JSON dump. A full JSON dump would be less useful, since I would have to parse it and then import it into a database anyway, which would take a lot longer than importing a straight DB dump in the first place.
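For comparison, the extra parse-and-import step for a JSON dump might look roughly like the sketch below. It assumes the `.tar.bz2` archive holds one JSON document per file, which is a guess about the layout, and it uses SQLite with a made-up `lowlevel` table purely for illustration.

```python
import json
import sqlite3
import tarfile


def import_json_dump(archive_path, db_path):
    """Stream JSON files out of a .tar.bz2 archive into a SQLite table.

    Returns the number of documents imported.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lowlevel ("
        "member TEXT PRIMARY KEY, doc TEXT NOT NULL)"
    )
    imported = 0
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive:
            # Skip directories and anything that is not a JSON file.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            raw = archive.extractfile(member).read()
            doc = json.loads(raw)  # the parse step a DB dump would avoid
            conn.execute(
                "INSERT OR REPLACE INTO lowlevel VALUES (?, ?)",
                (member.name, json.dumps(doc)),
            )
            imported += 1
    conn.commit()
    conn.close()
    return imported
```

Every document is decompressed, parsed, and re-serialised before it reaches the database, which is the overhead a native database dump would skip.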


#13

Any news on this? For your own sake, it does look pretty bad on https://acousticbrainz.org/download that the last download is dated January 30, 2015. If I were just to look at this page, I would assume that the project had been discontinued.


#14

I asked the question at the summit, and the answer was “soon”.