I’ve been getting quite a few 502 Bad Gateway errors while loading this forum this week. Is anyone else having the same issue or am I the only one? The rest of MusicBrainz seems to work fine.
I’m aware of it: it happened twice today and four times on Jan 30. I’ll have a look, as I’m not sure what the cause is.
The issue appeared when we deployed the current version. Something changed in the Discourse backup process: once a week the backups seem to take much longer, starting at 3:30 UTC and continuing all day until the evening, when there’s more traffic. Resources then temporarily run short, causing the 502s.
As a mitigation, I moved the backup to a slightly earlier hour to reduce the risk of 502s, but I’m not yet sure about the exact issue.
I had this with my last post (about reporting editors for bad behaviour).
I had to reload the page then paste my post again.
I couldn’t load the forum all day today until now.
I could not load the forums until 6:45 am Eastern Standard Time (USA). It’s fine now.
It just happened again…
Not happened here in the UK yet… or for the random VPN countries I also connect from… but I’m rarely on here after 00:00 UTC
…
I think I found the issue: it was related to Discourse’s in-app backup system, which was running permanently (it kept restarting after backup failures).
After a cleanup, I just ran a test and it seems to work as it should again.
I’ll wait for the next automatic backup in a few hours before saying it is fixed, though.
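For anyone curious what "keeps restarting after failures" looks like when it is bounded instead, here is a minimal sketch in Python. It is not how Discourse schedules its backups (I have not checked that code); `run_with_retries`, `max_attempts` and `base_delay` are made-up names for illustration. The idea is simply to give up after a fixed number of attempts and to wait longer between each one, so a failing job cannot keep the machine busy all day.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=60.0, sleep=time.sleep):
    """Run task() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. If every attempt fails,
    the last exception is re-raised so the failure stays visible.
    All names here are hypothetical, for illustration only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return attempt
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: stop instead of looping forever.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is only there so the waiting can be replaced in a test; in normal use the default `time.sleep` applies.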
Got a 502 again just now. Overall it seems to be more performant again, but some issues seem to remain.
Yes, there’s still an issue with automatic backups: for some reason they fail “partially”, which has 2 bad results:
- files are left over on each attempt, eating disk space
- it retries again and again, so backup processes (tar on a lot of data) run permanently, eating CPU resources
I’m still investigating this issue; the exact cause has yet to be determined.
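In case it helps anyone watching for the same symptom on their own install, the leftover-files part can be checked with a small script. This is only a sketch under assumptions: the directory you pass in is whatever temporary backup location your setup uses (I am not naming a real Discourse path here), and `find_stale_files` and `total_size` are hypothetical helper names. It lists files not modified for longer than a given age and adds up how much disk space they take.

```python
import os
import time


def find_stale_files(directory, max_age_seconds, now=None):
    """Return a sorted list of (path, size_in_bytes) for files under
    `directory` whose last modification is older than max_age_seconds.

    `directory` is an assumption: point it at your own backup temp dir.
    """
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if now - info.st_mtime > max_age_seconds:
                stale.append((path, info.st_size))
    return sorted(stale)


def total_size(entries):
    """Sum the sizes (in bytes) of the entries returned above."""
    return sum(size for _path, size in entries)
```

The script only reports; it deletes nothing, so it is safe to run while a backup is in progress.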