Detecting bad actors/AcoustID submission groups

I don’t know if it’s intentional, but someone appears to have submitted a chunk of their music library as AcoustIDs for https://musicbrainz.org/recording/2f2e8246-1fe5-4e57-8ff3-8e082daa1847

Just in case you don’t already know, you can also Ctrl+click or middle+click to open them quicker.

LOL indeed! :scream:

And our friend with Big Balls has added the AcoustIDs twice, meaning they don’t appear in these reports.

I clicked on about a dozen at random in that list, and all were a pair of submissions.

That is an example of somewhere a targeted bot could work.

Sure, depending on the browser even Shift+click or Shift+Ctrl+click to change the focus to the new window or tab right away :slight_smile:

11 posts were split to a new topic: Report that shows AcoustIDs linked to many different songs

I double-checked and, unsurprisingly, there is no user information in the data files available from AcoustID, so there is no way to say these submissions are from one user. Only AcoustID itself could do any cleanup based on users.

Thanks for checking!

I was going to ask if you could check, actually. I would have been curious whether your lists of ‘Possibly wrong acoustIDs’ had any overwhelmingly over-represented submission IDs.

Without that data being available, this thread becomes nothing more than a nice thought experiment, however. (Music plays and the thread floats up into the clouds.)

Personally these recent threads have made me reconsider if I would spend any time tidying AcoustIDs. Very easy and quick to mess with (accidentally or on purpose), very time consuming to fix :thinking:

Hmm, I would think the opposite, actually. Most of the potential bad matches are so obvious that not a lot of brainz effort has to be engaged in removing them; it is just the physical process of removing them. In the last couple of weeks I have removed about a thousand of them.

You start to see patterns. For example, a song by Stereo MCs often comes up as a single match, and when I see that I know it’s wrong. I am currently working on importing fingerprint lengths into my database. Once this is done (a lot of data to download and process), I can compare the track length of the MusicBrainz recording to the fingerprint length range to further confirm that it is a bad match, and produce a report to show this.
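
For anyone wanting to try the length comparison themselves, here is a minimal sketch of the idea. It is not the actual report code: the `Pairing` fields, the function names and the 10 second tolerance are all assumptions made up for illustration, and the real database layout will differ.

```python
from dataclasses import dataclass


@dataclass
class Pairing:
    """One AcoustID/recording pairing (hypothetical field names)."""
    acoustid: str
    recording_mbid: str
    recording_length_s: float  # track length of the MusicBrainz recording, in seconds
    fp_min_length_s: float     # shortest fingerprint length seen for this AcoustID
    fp_max_length_s: float     # longest fingerprint length seen for this AcoustID


def is_length_mismatch(p: Pairing, tolerance_s: float = 10.0) -> bool:
    """True if the recording length falls outside the fingerprint length
    range, widened by an assumed tolerance on both sides."""
    too_short = p.recording_length_s < p.fp_min_length_s - tolerance_s
    too_long = p.recording_length_s > p.fp_max_length_s + tolerance_s
    return too_short or too_long


def suspicious_pairings(pairings, tolerance_s: float = 10.0):
    """Return the pairings whose lengths disagree, in input order."""
    return [p for p in pairings if is_length_mismatch(p, tolerance_s)]
```

The tolerance is there because different releases of the same recording can legitimately differ by a few seconds, so a strict range check would flag too much. What value works in practice would need tuning against known good and bad pairings.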

The largest report has 21,000 rows, but this is out of 15M AcoustID/MusicBrainz pairings. I would guess there are no more than 100,000 bad pairings in total, which would give an error rate of under 1% (roughly 0.7%), so really AcoustID is pretty accurate and it would just take a short-term concerted effort to get on top of this.
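
The arithmetic behind that estimate, as a quick sketch. The three numbers are the ones quoted in this post (the 100,000 is a guess, not a measurement); the function name is just for illustration.

```python
TOTAL_PAIRINGS = 15_000_000    # AcoustID/MusicBrainz pairings
ESTIMATED_BAD = 100_000        # guessed upper bound on bad pairings
LARGEST_REPORT_ROWS = 21_000   # rows in the largest report


def rate_percent(bad: int, total: int) -> float:
    """Share of bad pairings, as a percentage of all pairings."""
    return 100.0 * bad / total


print(f"{rate_percent(ESTIMATED_BAD, TOTAL_PAIRINGS):.2f}%")        # 0.67%
print(f"{rate_percent(LARGEST_REPORT_ROWS, TOTAL_PAIRINGS):.2f}%")  # 0.14%
```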

Wow that’s really not bad!

I could add another 100k bad pairings in an afternoon though? :grimacing:

Well, you possibly could, but I don’t think anyone is attempting sabotage (certainly not on that scale).

My own approach is to be positive and try and solve the issue. Of course it will not be 100% correct, but then neither is MusicBrainz (or probably any other database).

It is a shame the data dumps don’t provide an anonymous user id. That way we could look for groupings by a particular user without having or needing any knowledge of who that user is. I have raised "Provide anonymised user id for acoustid/mbid pair submissions" (Issue #77 on acoustid/acoustid-server on GitHub) for consideration.
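
If such an id were ever added, the grouping could be as simple as the sketch below. To be clear, the `anon_user_id` field is hypothetical (the current dumps have no user information at all), and the row layout, function name and 20% threshold are assumptions for illustration only.

```python
from collections import Counter


def overrepresented_submitters(flagged_rows, min_share=0.2):
    """Given rows for pairings already flagged as possibly wrong, return
    (anon_user_id, count, share) for each submitter responsible for at
    least `min_share` of them, most frequent first.

    Each row is a dict with a hypothetical 'anon_user_id' key.
    """
    counts = Counter(row["anon_user_id"] for row in flagged_rows)
    total = sum(counts.values())
    return [
        (user_id, n, n / total)
        for user_id, n in counts.most_common()
        if n / total >= min_share
    ]
```

A submitter who accounts for a large share of the flagged pairings would be a candidate for a closer look, without anyone needing to know who they are.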