I have read the API documentation (API - MetaBrainz Foundation) and I’ve got a couple of questions about the incremental data dumps.
It seems like the incremental dumps are only for the last hour. Am I understanding that correctly? If I downloaded a full dump on Saturday, and use the number in the REPLICATION_SEQUENCE file 3 days later, will the incremental dump only contain an hour’s worth of updates? Or is it all data updated since that REPLICATION_SEQUENCE number?
If it is the last hour only, is there a way to get all of the updated entities since the full dump was completed? Or would I have to be pretty consistently downloading the incremental updates every hour after downloading the full dump?
Yes, each incremental dump contains an hour’s worth of changes. You’ll have to download them hourly to get all of the changes since the last full dump. The sequence number just increments by 1 each hour.
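To make the catch-up arithmetic concrete, here is a minimal sketch. It assumes the first packet published after a full dump is numbered one higher than the value in REPLICATION_SEQUENCE, which is worth checking against the actual dumps. The helper name `packets_to_fetch` and the sample numbers are made up for illustration.

```python
def packets_to_fetch(dump_sequence: int, latest_sequence: int) -> list[int]:
    """Return the sequence numbers of every hourly packet published after the full dump.

    Assumption: the packet following a full dump is dump_sequence + 1.
    """
    if latest_sequence < dump_sequence:
        raise ValueError("latest sequence is older than the full dump")
    return list(range(dump_sequence + 1, latest_sequence + 1))


# Hypothetical numbers: full dump at sequence 1000, checked 3 days (72 hours) later.
missing = packets_to_fetch(1000, 1000 + 72)
print(len(missing), missing[0], missing[-1])  # prints: 72 1001 1072
```

So a script that runs once a week would have roughly 168 packets to fetch and apply in order.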
Would I risk hitting the rate limiter? I won’t do hourly updates, but might do them once a week or even less frequently, which means there could easily be hundreds of dumps to fetch. Also, while developing I’ll probably be fetching the same packets over and over again until the script is ready.
There’s no rate limit on fetching replication packets or data dumps (AFAIK). If you’re thinking of the MusicBrainz web service rate limit, that’s 1 req/s. It would make no sense to check for new incremental JSON dumps at that rate.
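Even without a rate limit, it may be kinder to the server to keep a local copy of each packet while developing, so repeated test runs don't download the same files again. A rough sketch of that idea follows. The `download` callable is a placeholder you would supply yourself (no real URL or endpoint is assumed here), and the `packet-<seq>.bin` file name is hypothetical.

```python
from pathlib import Path
from typing import Callable


def cached_packet(seq: int, cache_dir: Path, download: Callable[[int], bytes]) -> bytes:
    """Return packet `seq`, calling `download` only if it is not cached on disk yet."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"packet-{seq}.bin"  # hypothetical local file name
    if path.exists():
        # Already fetched on an earlier run: reuse the local copy.
        return path.read_bytes()
    data = download(seq)  # caller-supplied function that does the actual fetch
    path.write_bytes(data)
    return data
```

Passing the download function in keeps the caching logic separate from however the packets are actually retrieved, and makes it easy to test with a fake downloader.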
P.S. Are you (mherger) and derrickp the same person? (I was confused by both accounts asking the same question at the same time in the other thread, too.)
Thanks for the info on how the numbers in the replication packets work. I was thinking that might be the case when I looked at the differences between the Wednesday and Saturday dumps but wasn’t sure. This will make it much easier than constantly downloading and parsing the full dump of data…
mherger and I are not the same person. I didn’t see the other question when I posted this thread. Or rather, I saw one question but decided to split this question into its own topic, and did that before I saw the other messages in the other thread. Sorry!
Ah, a separate thread is fine, I was just curious because we don’t get questions about the JSON dumps that often, so it was odd to see two people asking similar questions about them at the same time. It’s good to see more people using them though.
Hehe… I got confused myself, not seeing some of my previous answers. I didn’t realise there were now these two threads. Now I have to figure out what thread to post in. I’ll be back in a minute.