We were first emailed by a user on November 5 informing us that they had changed their email on MB to an address that was not used anywhere else, and it had started receiving spam. So on the same day as receiving that mail, we:
- Did an audit of our servers and found no signs of intrusion
- Did an audit of our data dumps and found no email addresses in them
- Set up a honey-pot email address on a test user to see if it started getting spam
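For anyone curious what a data-dump check like the one above could look like, here is a minimal sketch. It is not the script we ran: the function name `find_email_like`, the sample rows, and the deliberately loose regular expression are all assumptions made for illustration. It flags anything that looks like an email address so a human can review it.

```python
import re
from typing import Iterable, Iterator, Tuple

# Deliberately loose pattern: good enough to flag candidates for manual
# review, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email_like(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, match) for every email-like string found."""
    for lineno, line in enumerate(lines, start=1):
        for match in EMAIL_RE.finditer(line):
            yield lineno, match.group(0)


# Hypothetical tab-separated dump rows; the second one contains a
# honey-pot style address that should never appear in a public dump.
sample_dump = [
    "1\teditor_one\t2019-01-01",
    "2\teditor_two\tcanary-7f3a@example.org",
    "3\teditor_three\tno address here",
]

hits = list(find_email_like(sample_dump))
for lineno, address in hits:
    print(f"line {lineno}: {address}")
```

In practice the same function could be fed an open file handle for each dump file, since it only needs an iterable of lines.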
However, we noted that their updated email address was almost identical to the old one, and posited that the previous email had already been leaked by other means and that spammers might be using an automated guesser to reach the new address. On November 6 we asked them to try changing the email to something much more random and to watch whether spam appeared again.
At this point I didn’t start a full audit of the musicbrainz-server codebase, though besides auditing the data dumps I did of course review the obvious places, like user profile pages, that might somehow contain emails. The reality was that almost every page on the site deals with editor objects in some way, so it was hard to guess where the leak might be without doing a full audit (and the issue indeed turned out to be in a place I didn’t expect).
In hindsight I regret not pursuing a full audit sooner, but at the time I was satisfied with the request to try a more random address first. We had also assumed that if there was a widespread leak, more than one user would’ve contacted us. It came to my attention only today that someone had reported something similar back in August. If I’d seen that thread and started investigating it then, the timeline would likely look a lot different. I regret to say that I don’t follow the community forums very frequently, but I’m going to try to do that more often. If anyone finds a similar issue in the future, emailing us directly in addition to posting on the forums would be helpful just in case.
On November 7 they mailed us back saying they had updated the email to a much more random address. Then on November 22 they mailed us again to say that, unfortunately, the new address had also started receiving spam. This was the first email I saw when I woke up, and I immediately started a full audit of musicbrainz-server.
Once I found the issue with annotations, it was hotfixed immediately, but I continued reviewing all places we send editor data to the templates. This took about 8 hours, but by the end I was highly confident there were no other cases of exposure (and could finally eat something).
Yes, the blog post on November 23 was the first public disclosure of the leak.
There is nothing higher priority for me right now than ensuring this doesn’t happen again. I’m still actively working on reducing the amount of editor data we handle in our templates overall and improving automatic detection of leaks during testing and development. Pull request #1809 in metabrainz/musicbrainz-server (“Improvements to editor JSON handling”, by mwiencek) is a start, but I don’t plan to work on anything else this week and possibly next.
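To give a rough idea of what automatic leak detection during testing could mean, here is a small sketch in Python. It is not the code from the pull request, and every name in it (`find_leaks`, `SENSITIVE_KEYS`, `CANARY_EMAIL`, the sample payload) is hypothetical. The idea is to walk whatever data is about to be handed to a template and report any path where a private field, or a known canary address belonging to a test editor, shows up.

```python
from typing import Any, Iterator

# Hypothetical list of keys that should never reach a public page.
SENSITIVE_KEYS = {"email", "password"}
# Hypothetical address assigned to a test editor in fixtures.
CANARY_EMAIL = "canary-editor@example.org"


def find_leaks(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every sensitive key or canary value in payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in SENSITIVE_KEYS:
                # Report the key itself; no need to descend further.
                yield child
                continue
            yield from find_leaks(value, child)
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            yield from find_leaks(item, f"{path}[{index}]")
    elif isinstance(payload, str) and CANARY_EMAIL in payload:
        # Catches the address even when it hides under an innocuous key.
        yield path


# Hypothetical data passed to a page that renders an annotation.
page_props = {
    "annotation": {
        "text": "note",
        "editor": {"name": "tester", "email": CANARY_EMAIL},
    },
    "comments": ["contact canary-editor@example.org"],
}

leaks = list(find_leaks(page_props))
for leak in leaks:
    print("possible leak at", leak)
```

A test suite could run a check like this against every page's data and fail whenever the returned list is non-empty. Checking for the canary value as well as the key name matters because a leak may sit under a key nobody thought to blocklist.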