Something wrong with effective rate limit?

mulamelamup · May 29, 2016, 11:39pm

Hi!

So…:

$ while true; do sleep 1; curl -s -A "just testing rate limit" "http://musicbrainz.org/ws/2/work/b1df2cf3-69a9-3bc0-be44-f71e79b27a22" | grep -q "UTF" && is="OK" || is="NOPE"; echo time=$(date +%s) ping$(ping -n -c1 musicbrainz.org | grep -o "=[0-9.]* ms") $is; done;
time=1464563850 ping=201 ms OK
time=1464563852 ping=202 ms NOPE
time=1464563854 ping=202 ms OK
time=1464563856 ping=200 ms NOPE
time=1464563857 ping=202 ms NOPE
time=1464563859 ping=201 ms NOPE
time=1464563861 ping=201 ms OK
time=1464563863 ping=202 ms OK
time=1464563865 ping=202 ms NOPE
time=1464563867 ping=202 ms OK
time=1464563869 ping=203 ms OK
time=1464563871 ping=201 ms NOPE
time=1464563873 ping=202 ms OK
time=1464563875 ping=200 ms OK
time=1464563878 ping=201 ms OK
time=1464563880 ping=201 ms OK
time=1464563882 ping=201 ms OK
time=1464563884 ping=204 ms NOPE
time=1464563886 ping=202 ms OK
time=1464563888 ping=203 ms OK
time=1464563890 ping=203 ms OK
time=1464563893 ping=201 ms OK
time=1464563894 ping=201 ms NOPE
time=1464563896 ping=202 ms OK
time=1464563898 ping=202 ms NOPE
time=1464563900 ping=202 ms OK
time=1464563902 ping=202 ms OK
time=1464563905 ping=201 ms OK
time=1464563906 ping=201 ms NOPE
time=1464563908 ping=200 ms NOPE
time=1464563910 ping=202 ms OK
time=1464563912 ping=202 ms OK
time=1464563914 ping=201 ms NOPE
time=1464563916 ping=201 ms OK
time=1464563918 ping=201 ms NOPE

And again with a 10-second sleep, this time pinging Google instead, just to make sure the pings aren’t at fault:

$ while true; do sleep 10; curl -s -A "just testing rate limit" "http://musicbrainz.org/ws/2/work/b1df2cf3-69a9-3bc0-be44-f71e79b27a22" | grep -q "rate" && is="NOPE" || is="OK"; echo time=$(date +%s) ping$(ping -n -c1 google.com | grep -o "=[0-9.]* ms") $is; done;
time=1464564812 ping=47.7 ms NOPE
time=1464564823 ping=47.1 ms OK
time=1464564834 ping=47.2 ms OK
time=1464564844 ping=48.5 ms NOPE
time=1464564855 ping=46.7 ms OK
time=1464564866 ping=48.5 ms OK
time=1464564877 ping=47.5 ms OK
time=1464564888 ping=47.7 ms OK
time=1464564899 ping=47.0 ms OK
time=1464564911 ping=47.7 ms OK
time=1464564922 ping=46.8 ms OK
time=1464564933 ping=47.2 ms OK
time=1464564943 ping=47.3 ms NOPE
time=1464564954 ping=46.9 ms OK
time=1464564965 ping=47.2 ms OK
time=1464564976 ping=47.5 ms OK
time=1464564987 ping=52.1 ms NOPE
time=1464564998 ping=47.5 ms OK
time=1464565008 ping=47.3 ms NOPE
time=1464565019 ping=46.8 ms NOPE
time=1464565030 ping=47.0 ms OK
time=1464565041 ping=48.7 ms OK
time=1464565052 ping=47.4 ms OK
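For anyone wanting to put a number on runs like these, here is a minimal sketch that tallies the results. It assumes the loop output was saved to a file and that every line ends in OK or NOPE, as above; the function name `tally` and the file name in the usage note are made up for illustration.

```shell
# Tally OK/NOPE results from loop output read on stdin.
# Assumes each line ends with the status word, as in the runs above.
tally() {
  awk '
    $NF == "OK"   { ok++ }
    $NF == "NOPE" { nope++ }
    END {
      total = ok + nope
      if (total == 0) { print "no samples"; exit }
      printf "ok=%d nope=%d total=%d nope_pct=%.1f\n", ok, nope, total, 100 * nope / total
    }'
}
```

Usage would be something like `tally < ratelimit.log` (hypothetical file name). By hand, the 10-second run above comes to 6 NOPE out of 23 requests.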

I’m guessing either

the rate limit was changed and no one told “us” or
the reason for this is a HUGE latency jitter behind/in the musicbrainz server(s)
?

Would it be a terrible idea to just get rid of that “rate limit message”? Limit the simultaneous connections per IP and put a “sleep 1” in front of whatever generates the json?

andreivolgin · May 30, 2016, 1:05am

If I understand correctly, there are two different limits: per IP and global. You can see the performance chart at:

http://stats.metabrainz.org/dashboard/db/musicbrainz-web-servers-status

Switch to “This month” view and hit the refresh icon in the top right corner. You will see that performance has dramatically improved compared to April and early May. I assume (I am not part of the dev team) that the upgrade to the new Postgres and/or changes to the schema and/or some other patches have helped to improve performance.

And while I am very happy to see this improvement and grateful for all the efforts that led to it, I must say that 1 request per second reminds me of my youth, which, sadly, finished before this century. This rate is a bit slow even for an individual consumer, and, obviously, it is a non-starter as a real-time data provider for any decent-sized project.

It would be very helpful if someone from the development team could explain the bottlenecks. If it is possible to create a JSON API mirror, I will consider hosting it.

mulamelamup · May 30, 2016, 8:55am

This might be (partially) outdated: MusicBrainz API / Rate Limiting - MusicBrainz
… but I was expecting a 501 if global limits were reached, not a:

?

… yes, the performance has been great since the update as far as I can tell. That’s why I was trying this now…

Aren’t there already many private mirrors? Didn’t look into that much, but I thought you could just download & run pretty much the whole thing?

Anyways - at the moment, it seems to work perfectly:

while true; do curl -s -A "just testing rate limit" "http://musicbrainz.org/ws/2/work/b1df2cf3-69a9-3bc0-be44-f71e79b27a22" | grep -q "UTF" && is="OK" || is="NOPE"; echo time=$(date +%s) ping$(ping -n -c1 musicbrainz.org | grep -o "=[0-9.]* ms") $is; done;
time=1464597417 ping=206 ms OK
time=1464597418 ping=207 ms OK
time=1464597419 ping=207 ms OK
time=1464597420 ping=205 ms OK
time=1464597421 ping=206 ms OK
time=1464597422 ping=205 ms OK
time=1464597423 ping=205 ms OK
time=1464597424 ping=214 ms OK
time=1464597425 ping=206 ms OK
time=1464597426 ping=204 ms OK
time=1464597428 ping=206 ms OK
time=1464597428 ping=206 ms OK
time=1464597429 ping=205 ms OK
time=1464597430 ping=205 ms OK
time=1464597431 ping=207 ms OK
time=1464597432 ping=205 ms OK
time=1464597433 ping=205 ms OK
time=1464597435 ping=386 ms OK
time=1464597436 ping=389 ms OK
time=1464597437 ping=385 ms OK
time=1464597438 ping=389 ms OK
time=1464597439 ping=205 ms OK
time=1464597440 ping=207 ms OK
time=1464597441 ping=208 ms OK
time=1464597442 ping=206 ms OK
time=1464597443 ping=206 ms OK
time=1464597444 ping=206 ms OK
time=1464597445 ping=207 ms OK
time=1464597446 ping=206 ms OK

So, unless they fixed it already since I tried yesterday, the “rate limit error message” really seems to be misplaced / misleading and to have more to do with global stress than with what’s explained behind the link.

I didn’t really look into this, but from the nature of the project and the names of some of the source files I kind of just assumed that that’s all part of the project and everyone can just set it up and many people probably already did?

https://github.com/metabrainz/musicbrainz-server/blob/master/INSTALL.md

Anyways, that doesn’t really change that I can’t explain the “rate limit errors” I’ve been getting while I was clearly throttling myself to 1/10th the allowed speed already. Just saying, someone should probably check if that’s right - shouldn’t I be getting a 503 or a 302 with “try after” instead?
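One way to check what the server actually sends, rather than grepping the body, is to log the HTTP status code. This is only a sketch: it assumes throttled requests are answered with a 503, which is what the linked rate limiting page appears to describe, and the helper names `classify_status` and `probe` are made up for illustration.

```shell
# Map an HTTP status code to a label.
# Assumption: throttled requests come back as 503; anything else is reported as-is.
classify_status() {
  case "$1" in
    200) echo "OK" ;;
    503) echo "THROTTLED" ;;
    *)   echo "OTHER($1)" ;;
  esac
}

# Fetch a URL ($1) with a given User-Agent ($2) and print only the status code.
# -o /dev/null discards the body; -w '%{http_code}' is a standard curl write-out variable.
probe() {
  curl -s -o /dev/null -A "$2" -w '%{http_code}' "$1"
}

# Usage (needs network, so left commented out):
# url="http://musicbrainz.org/ws/2/work/b1df2cf3-69a9-3bc0-be44-f71e79b27a22"
# while true; do sleep 1; echo "time=$(date +%s) $(classify_status "$(probe "$url" "just testing rate limit")")"; done
```

Running `curl -s -D - -o /dev/null` against the same URL would additionally dump the response headers, which should show whether a `Retry-After` header is present at all.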

andreivolgin · May 30, 2016, 9:26am

I moved all my projects to Google App Engine 7 years ago, and never looked back. No patches, upgrades, replication issues, capacity or bandwidth constraints - scaling from one to a million users with exactly zero effort.

I will look into installing my own copy of the server, but this is a big step backwards as far as I am concerned.

outsidecontext · May 30, 2016, 10:37am

Can’t say much on the rate limiting issue as I don’t know how this is currently done or what changed recently. But I can assure you that you can’t scale a database like the one MB has from 1 to a million users with “zero effort”.

andreivolgin · May 30, 2016, 6:46pm

Of course you can’t. PostgreSQL has significant limitations when it comes to replication. This is why I use Google Cloud Datastore for my projects, which has an infinite capacity, requires zero maintenance, and offers seamless data replication.

It will be difficult to move the entire MB DB to a non-relational datastore. I hope, however, that the MB team has plans to move away from physical servers at some point.

Freso · May 30, 2016, 8:20pm

Are you setting a User-Agent in your curl config? Generic User-Agent strings get throttled more heavily than specific ones:
https://musicbrainz.org/doc/XML_Web_Service/Rate_Limiting#User-Agent
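For reference, a rough sketch of passing a more specific User-Agent to curl, following the "application name/version ( contact )" pattern as I understand that page to recommend. The application name, version and e-mail address here are placeholders, and `make_user_agent` is just an illustrative helper.

```shell
# Build a User-Agent of the form "Name/Version ( contact )".
# All three values passed in below are placeholders, not a real application or address.
make_user_agent() {
  printf '%s/%s ( %s )' "$1" "$2" "$3"
}

ua=$(make_user_agent "RateLimitTest" "0.1" "someone@example.com")

# Usage (needs network, so left commented out):
# curl -s -A "$ua" "http://musicbrainz.org/ws/2/work/b1df2cf3-69a9-3bc0-be44-f71e79b27a22"
```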

Freso · May 30, 2016, 8:23pm

But I hear @Rob would be happy to make the move if someone donated the additional amount of $ that would be needed to make it happen.

mulamelamup · May 30, 2016, 8:54pm

That would be here:

The results are very different depending on when I try. A few hours ago, there were no “rate limit” messages at all even after I removed the “sleep 1”. Now it’s as bad as the examples above again.

andreivolgin · May 30, 2016, 11:49pm

@mulamelamup - Thank you for the pointer. I somehow missed that the MB server supports a JSON API out of the “virtual box”.

Now I have installed my own copy, and all my jobs run at least 10 times faster.