Most of these probably never made any edits and just add spammy links and stuff on their profile, but both of these lists have at least one active spammer.
There’s no way to batch report spammers. There might be something in SpamBrainz once that rolls around, but please, please, please do not report spammers until we have SpamBrainz, unless there’s something that really needs urgent attention. (E.g., a spammer that continues to add spam edits after its first few edits, or a spammer that links to something truly heinous. Pharmacy, real estate, etc. spammers are just annoying, but we’ll deal with them when we have a better framework for it (SpamBrainz).)
Because combing through ~2000 editor reports will 1) take hours and hours and hours to handle, hours that will not be going to e.g., getting ready for GCI or handling reports of actual human editors, 2) burn me out (I don’t want to look at 2000+ spam accounts, repeating the same steps over and over to delete them), 3) be a drop in the ocean—we probably get at least about 400,000 new spam accounts/month, or around 13,000 accounts per day. I wouldn’t be able to go through 2000 in one day, and even if I could, there’d already be 11,000 new ones waiting for me when I was done.
The spam issue is too severe to be handled manually (which is the only way we can currently deal with it) → we need better tooling to deal with it, and for now we’re putting our eggs in the SpamBrainz basket.
I didn’t know the situation was that bad. Yes, sounds like this can only be handled by some automation. After all, the spammers automate themselves, so fighting this with human labor is going to be a losing battle.
That’s insane! If you’re ever looking for a blog post (or conference…) topic I would be fascinated to hear more about the ins and outs of why and how and what.
Especially if you look at the actual Editor numbers:
Of 1’977’737 valid Editors, 772’009 are inactive. Another 992’776 have “validated email only”. Of the remaining 212’952 ever-active Editors, only 1’309 “edited and/or voted in the last 7 days”.
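For anyone who wants to double-check that arithmetic, here is a quick sketch in Python. The variable names are made up for illustration (they are not field names from the MB stats page), and the figures are simply the ones quoted above.

```python
# Figures as quoted from the MusicBrainz editor statistics in this thread.
# Variable names are hypothetical, chosen only for this illustration.
total_valid = 1_977_737          # valid Editors
inactive = 772_009               # inactive Editors
validated_email_only = 992_776   # "validated email only"
active_last_7_days = 1_309       # "edited and/or voted in the last 7 days"

# Editors who have ever been active: whatever is left after removing
# the inactive and the validated-email-only accounts.
ever_active = total_valid - inactive - validated_email_only

# Shares, to put the raw counts into perspective.
share_ever_active = ever_active / total_valid
share_recent = active_last_7_days / ever_active

print(f"ever active: {ever_active}")
print(f"ever active / valid: {share_ever_active:.1%}")
print(f"active last 7 days / ever active: {share_recent:.2%}")
```

The subtraction gives the 212’952 figure quoted above. The two ratios are just one way of reading the numbers, not an official metric.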
Hm. Looking at the MB stats, the numbers I have may be slightly (or drastically) off. If we got 400,000 users in 31 days, we’d be way past 1,977,737 editors by now. The 400k was a figure I got from @Zas, and I didn’t actually verify it myself. I’ll try and investigate more tomorrow.
Okay, so, @Zas and I both misread the numbers since the report we were looking at apparently didn’t actually take the dates given to it into account.
We’re looking at some other stats, and the forum is getting around 90–140 new “users” per day. Most of these are likely bot-created, which most likely means spam accounts (passive, active, or “sleepers”). The forum also gets fewer than MusicBrainz.org, since 1) I have blacklisted a number of known spammer-only e-mail domains from getting accounts on the forum at all, so forum stats are not fully representative, and 2) accounts only get created here once the e-mail used for MB signup has been verified. So there are probably a fair number more “bad” accounts on MB, but probably not in the 1000s/day range… yet/luckily.
Still, the numbers are too damn high, but we need better tooling before we can do anything effective about it, so the plea to not report spammers still holds.
I’d encourage someone to look into this: basically, there are many East German spam accounts. You’d think I’m kidding, but if you go to area > East Germany > Users, there are like 10 spam accounts. I also found one from the Soviet Union (and reported it)… I think they’re selecting unusual locations to go unnoticed yet still appear legit.
However, there are a number of spam-related tickets in the ticket tracker, if you or anybody else would like to help!
You have reminded me to try to nudge along a long-overdue removal of old user accounts with 0 activity (of any kind) though - I think that will sort out a massive amount of spam, including the ones you found under East Germany.
Oh, I did not know the spam situation was that bad (should have read the thread)… Maybe there should be a captcha you have to fill in when editing the bio? To me it appears as if it’s too easy to create an account – you can even use one of those temporary emails to verify it.
Thanks for informing me about the ticket tracker, I’ll look into that.
There hasn’t been a captcha for quite some time. If I’m not mistaken, @yvanzo “temporarily” disabled reCAPTCHA on 2023-07-09T22:00:00Z and then it just stayed that way.
Not that I care; there doesn’t seem to be much of a spam increase, and I hate captchas with a passion. They are being abused more and more in places where they should not be used.
I’m sure we could still come up with some deterrents (no such thing as a 100% solution ofc). But I’m guessing it’s the implementation/dev time that would still be lacking.
One thing seems sure though - spammers are a much bigger problem for MB than for similar databases?
Other similar databases have problems with spammers, too.
This thread for reporting spammers at Discogs, https://www.discogs.com/forum/thread/409381?page=183, currently has 183 pages with 18246 reports. I have seen spammers there who added hundreds of fake releases with spam track titles.
I have no idea what percentage of the reported Discogs spammers are active - it seems Discogs lists somewhat over 700k users (https://www.discogs.com/stats/contributors?page=100), but I’m not sure how to even see their profiles, so there might be hundreds of thousands of profile spammers which nobody sees. I cannot even find how many users RYM has. It’s possible they don’t display these numbers because they know they’re mostly spammers too.