Bandcamp as a source of AcousticBrainz data

I ran a new version of the tag submitter; it now submits all tags.

Lossless AB submission is also in progress and will take 2 or 3 weeks. For now I only work with Bandcamp album pages, but the script is ready to handle single-track pages too.


Submitted 206629 tags and found a bug with non-Latin characters. Now I need to resubmit them all.


I agree with adding all the tags since, as @Freso described, that’s the nature of these systems and they are designed to handle them. The only issue may be aggressive terms that need filtering, but I guess Bandcamp already has a system for that (TBC).

We had:
“rock & roll”
rock roll
rock & roll
rock& roll
rock and roll
rock ‘n roll
rock ‘n roll.
rock ‘n’ roll
rock ’n’ roll
rock n roll
rock n’roll
rock n’ roll
rock’n roll
rock’n’ roll
and a number of things like “rock; hard rock; heavy metal; guitar; rock and roll” (it’s one tag).
I think we need some tag cleanup.
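One way to make the merge semi-automatic, sketched in Python under stated assumptions: reduce every tag to a crude key (case-folded word tokens with punctuation and the connector words “and” and “n” removed), group tags that share a key, and let a human pick the canonical spelling for each group. The `CONNECTORS` set and the function names are hypothetical, not part of any MusicBrainz tool.

```python
import re
from collections import defaultdict

# Hypothetical connector words that only glue two genre words together.
# Assumption: dropping them never changes which genre is meant.
CONNECTORS = {"and", "n"}


def variant_key(tag):
    """Crude grouping key: case-folded word tokens, minus punctuation and connectors.

    re's \\w is Unicode-aware for str patterns, so non-Latin tags keep their letters.
    """
    words = re.findall(r"\w+", tag.casefold())
    return " ".join(w for w in words if w not in CONNECTORS)


def group_variants(tags):
    """Group raw tags by key; each multi-spelling group is a merge candidate for review."""
    groups = defaultdict(set)
    for tag in tags:
        groups[variant_key(tag)].add(tag)
    # Only keys reached by two or more distinct spellings need a human decision.
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

With the variants listed above, all fourteen spellings land under the single key `rock roll`. Dropping “and” can over-merge in other cases, so the groups are proposals for a person to confirm, not automatic merges.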


… and also a hierarchy, to allow queries on root genres without having to list the whole “family”.
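A minimal Python sketch of how such a hierarchy could serve root-genre queries, assuming it is stored as a simple child-to-parent map: expanding a root into its whole family is a graph walk, so the query never has to list the family by hand. The genre edges below are made-up examples, not a proposed taxonomy, and `PARENT`, `family` and `matches_root` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical child -> parent edges; a real taxonomy would come from curated data.
PARENT = {
    "hard rock": "rock",
    "heavy metal": "hard rock",
    "rock and roll": "rock",
    "bebop": "jazz",
}


def build_children(parent):
    """Invert the child -> parent map into parent -> set of children."""
    children = defaultdict(set)
    for child, par in parent.items():
        children[par].add(child)
    return children


def family(root, parent=PARENT):
    """Return the root plus every genre below it (depth-first walk)."""
    children = build_children(parent)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def matches_root(release_tags, root, parent=PARENT):
    """True if any of the release's tags falls anywhere under the root genre."""
    return bool(set(release_tags) & family(root, parent))
```

For example, a release tagged only “heavy metal” matches a query for “rock” because the walk passes through “hard rock”.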

Relevant tickets:

I thought there was a ticket for this, but the closest I could find was this one:

Anyway, give some of those a vote :relaxed:


A hierarchy is complicated. Merging all the rock-n-rolls into one, finding other tags like this and splitting taglines is easy and can be done semi-automatically.
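The tagline splitting could look roughly like this Python sketch. It assumes “;” and “,” never occur inside a single genre name, so they are safe split points; other candidates such as “/” are deliberately left alone. The function names are hypothetical, and the “semi-automatic” part is that the code only proposes splits for a person to approve.

```python
import re

# Assumption: ";" and "," are separators, never part of one genre name.
SEPARATORS = re.compile(r"\s*[;,]\s*")


def split_tagline(tag):
    """Split one run-together tag into its parts, dropping blanks and repeats in order."""
    parts = []
    for part in SEPARATORS.split(tag):
        part = part.strip()
        if part and part not in parts:
            parts.append(part)
    return parts


def review_queue(tags):
    """Propose splits only for tags that actually contain several parts."""
    proposals = {}
    for tag in tags:
        parts = split_tagline(tag)
        if len(parts) > 1:
            proposals[tag] = parts
    return proposals
```

Applied to “rock; hard rock; heavy metal; guitar; rock and roll”, this proposes the five separate tags, while an ordinary tag such as “drum and bass” is left untouched.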

This is why it seems time to decide on moving from folksonomy tags and genres to a structured taxonomy.

700106 tags submitted!


If someone begins to do this for Spotify, let us know. Maybe I will try.


I thought about it. I think Spotify is not as tolerant of downloading as Bandcamp, so it could be harder and slower. But maybe the subset of releases without AB data, without a BC link and with a Spotify link is not so big, in which case speed is not a big problem.
It would be awesome if you tried it!

Great project! I also thought about how useful the 128 kbit/s streams could be for AB!

I did not see it brought up, but especially if you start downloading the lossless files constantly, Bandcamp might notice a steady increase in traffic. Maybe you want to contact them beforehand. Or maybe I am too cautious.

Spotify should use the Tagtraum genre annotations, which should already be in the AcousticBrainz Genre Dataset; I don’t know who is working on it now.

If you don’t ask permission, you can’t be refused :wink:

Anyway, the average speed I can get from BC is 50 Mbit/s; I don’t think that is a notable «increase in traffic» for BC. And I have been constantly downloading FLACs for the last week or so (and MP3s for two months) using only two (static) IP addresses. If BC doesn’t like my activity, they can easily block me.

One million tags submitted. 150k Bandcamp pages downloaded and parsed; about 200k pages are waiting to be downloaded, which will take at least 30 days.


Since my last post I finished the big Bandcamp download; I now have 240k Bandcamp pages (10 GB) on disk. Tags from the downloaded pages (+700k tags) have been submitted to MB. There is other useful information in these pages (see above), but I really don’t want to write a bot to submit it to MB. If someone wants to use this information, I can share the pages, the metadata and some parsed information from them.

I also submitted 5k releases to AcousticBrainz and AcoustID; another 10k releases are in progress.

I hope that by the end of the year the number of unique recordings on AB will exceed 7 million.


My CPU is busy now, so I started submitting to AcoustID the releases that have AB data but no AcoustID fingerprints. This is much lighter on the CPU, so I can do it much faster than combined AcoustID+AB submission. There are 142k such releases.


138k releases were submitted.


Done. There are 7,000,025 unique recordings in AcousticBrainz.


So, I already asked elsewhere, but I figured I’d ask here too…

Is there any chance of re-running the Bandcamp tag import portion of this project on newly added Bandcamp releases?