Detecting bad actors/AcoustID submission groups

But that’s what I’ve done with this report; we can fix it now.

Is there a way to add data to AcoustID that bypasses the AcoustID API? The API requires user and client API keys, which are linked to an AcoustID user.

According to the Acoustid blog page, it seems like there is/was some kind of:

There are also two new methods /v2/user/lookup?user=X and /v2/user/create_anonymous?client=X&clientversion=X , to support anonymous user accounts. They can be used from applications that can’t ask the users to log in on the Acoustid website, but there are a number of rules that such applications should follow:

You can find the rules here.
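For anyone curious what calling those two methods might look like, here is a minimal sketch that only builds the request URLs from the paths and parameters quoted above. The base URL is my assumption (the usual AcoustID web service host), the function names are made up for illustration, and nothing here is checked against the current API or the rules mentioned in the blog post.

```python
from urllib.parse import urlencode

# Assumption: the public AcoustID web service host. Not confirmed in this thread.
API_BASE = "https://api.acoustid.org"


def create_anonymous_url(client_key: str, client_version: str) -> str:
    """Build the URL for the quoted /v2/user/create_anonymous method."""
    query = urlencode({"client": client_key, "clientversion": client_version})
    return f"{API_BASE}/v2/user/create_anonymous?{query}"


def user_lookup_url(user_key: str) -> str:
    """Build the URL for the quoted /v2/user/lookup method."""
    query = urlencode({"user": user_key})
    return f"{API_BASE}/v2/user/lookup?{query}"


if __name__ == "__main__":
    # Placeholder keys, purely illustrative.
    print(create_anonymous_url("MY_CLIENT_KEY", "1.0"))
    print(user_lookup_url("MY_USER_KEY"))
```

Either way a client key is still required, so submissions made like this would still be tied to an application, if not to a logged-in person.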

Who changed the title to something attacking people? It totally misses the point of the original discussion. :frowning_face:

/unsubscribed.

It was @freso, I think. I don’t really understand the need to split the topic.

The title wasn’t changed, Freso moved this part of the discussion out into a completely new thread :+1:

I just realized I can edit the title! So have done so.

Why not? That’s the aim. What variables would capture this set of submissions?

The simplest scenario: ID X submitted 100 Stephen King AcoustIDs 10 years ago. ID X has never been active again and submitted nothing else. What percentage of the AcoustIDs from that session would you want to unlink manually before the system decides to auto-unlink the rest?

Complex scenario: ID Y submitted their whole library of 10k files in one go 3 months ago, without checking anything. 10% were bad, including their hastily tagged Stephen King collection. Since then they have continued adding acoustIDs for new additions, which are tagged correctly. Is there a threshold where we remove the 10k hurried submissions (anything submitted by that ID on that date for instance), to save you x hours having to clean Stephen King entries? And accept that there are good ones that will be removed*?

Your answer may well be “no”, which is totally understandable. Personally I think bad submissions outweigh a lot of good ones. I am not talking about permanently banning or besmirching key/ID 255267’s good name btw. Just questioning how we might be able to use what info we have to make the DB more reliable overall.

Note: This more nuanced session-based approach relies on AcoustID storing submission timestamps as well as IDs. Otherwise it would be a cruder, purely percentage-based approach, which might not be acceptable.

*even the laziest Picard clicker is going to struggle to mistag an entire collection completely! I would be very impressed :stuck_out_tongue: 10% is huge tbh.
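To make the session idea a bit more concrete, here is a rough sketch of the thresholding logic. Everything in it is hypothetical: the `Submission` record, its field names, and the default threshold and minimum session size are invented for illustration, and it assumes AcoustID stores a submitter ID and a submission date for each link, which has not been confirmed.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Submission:
    """Hypothetical record: one AcoustID link with submitter ID and date."""
    submitter_id: int
    submitted_on: date
    acoustid: str
    manually_unlinked: bool = False


def sessions_to_auto_unlink(submissions, threshold=0.5, min_size=20):
    """Return {(submitter_id, date): [acoustids still linked]} for every
    session where the share of manually unlinked AcoustIDs has reached
    `threshold`. Sessions smaller than `min_size` are ignored."""
    sessions = defaultdict(list)
    for sub in submissions:
        sessions[(sub.submitter_id, sub.submitted_on)].append(sub)

    flagged = {}
    for key, items in sessions.items():
        if len(items) < min_size:
            continue
        bad = sum(1 for sub in items if sub.manually_unlinked)
        if bad / len(items) >= threshold:
            flagged[key] = [s.acoustid for s in items if not s.manually_unlinked]
    return flagged
```

With these made-up defaults, ID X's one-off session of 100 would be cleared once 50 of them had been unlinked by hand, while ID Y's later, correctly tagged sessions would be left alone because they fall on different dates.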

I would be concerned if it was too easy to pull out the user IDs from submissions. That kinda data gets personal, but I don’t want to go on a tangent or someone will split the thread again.

Timestamps would be far more useful: “IDs from a specific date that match a named artist”, or “IDs added to a specified release on a specific date”. That would allow a net to be spun around bad data without picking on a specific person.

There are way too many duff AcoustIDs from audiobooks for it to be only one person doing it. That person would have to have a large collection. Something is screwy with audiobooks. Or maybe they just stand out more.
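As a sketch of what such a timestamp-only filter could look like (no submitter ID involved at all): the `Link` record and its fields are hypothetical, since I don't know what AcoustID actually stores per link.

```python
from datetime import date
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """Hypothetical AcoustID-to-recording link, with no submitter info."""
    acoustid: str
    recording_artist: str
    release_mbid: str
    added_on: date


def links_added_on(links, day: date,
                   artist: Optional[str] = None,
                   release_mbid: Optional[str] = None):
    """Select links added on `day`, optionally narrowed to an artist
    and/or to a specific release."""
    return [
        link for link in links
        if link.added_on == day
        and (artist is None or link.recording_artist == artist)
        and (release_mbid is None or link.release_mbid == release_mbid)
    ]
```

The result is a candidate list for review or bulk unlinking that never exposes who submitted what.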

Just to clarify, I don’t imagine these unlinks would ever be linked to a MusicBrainz editor. It didn’t even occur to me tbh.

It would just be behind the scenes cleanup when a certain threshold is hit within certain parameters.

Side note, has there ever been an example of someone purposefully mass adding/vandalising with incorrect AcoustIDs? Would we have any way of knowing/finding out, as it stands?

How about this for a classic audiobook mess? Release “The Fortune of War” by Patrick O’Brian - MusicBrainz

NOTHING on there seems to be linked to the book. You can tell as there are no consistent runs of numbers. This is an example of a Release that just needs “all AcoustIDs unlinked”. But at 9 CDs that is too much to do by hand.

Actually, now I look closer, the AcoustIDs only get to CD3 and then stop.

I have seen AcoustID in the last medium.
It seems you are using INLINE STUFF, but it’s broken at the moment, sorry.
Maybe I should think about a status page or system inside scripts themselves, when I know they are broken.

I see that as the site being borken and not letting your INLINE STUFF run properly. :grin: No need to apologise, as your scripts add so much that is missing from the site GUI. I wish I could just turn off that annoying default collapse thing, as it also makes a mess of the browser’s “Find in Page” and so many other things.

I fix many of these AcoustID dups thanks to your pink highlighted AcoustIDs in INLINE STUFF.

Same here, super useful for spotting and fixing obvious errors! All glory to @jesus2099 :clap: :clap: :clap:!
BTW, as a workaround for big multi-disc releases: open just a single medium spread at a time like https://beta.musicbrainz.org/release/16caa039-aba9-45e8-9ff1-8cca8635ff52/disc/12#disc12
(most often I just right-click on the medium link and open it in a new tab or window)

I don’t know if it’s intentional, but someone appears to have submitted a chunk of their music library as AcoustIDs for https://musicbrainz.org/recording/2f2e8246-1fe5-4e57-8ff3-8e082daa1847

Just in case you don’t already know, you can also Ctrl+click or middle+click to open them quicker.

LOL indeed! :scream:

And our friend with Big Balls has added the AcoustIDs twice, meaning they don’t appear in these reports.

I clicked on about a dozen at random in that list, and all were a pair of submissions.

That is an example of somewhere a targeted bot could work.

Sure, depending on the browser even Shift+click or Shift+Ctrl+click to change the focus to the new window or tab right away :slight_smile:

11 posts were split to a new topic: Report that shows AcoustIDs linked to many different songs

I double-checked: unsurprisingly, there is no user information in the data files available from AcoustID, so there is no way to say these submissions are from one user. Only AcoustID itself could do any cleanup based on users.

Thanks for checking!

I was actually going to ask if you could check. I would have been curious whether your lists of ‘Possibly wrong acoustIDs’ had any overwhelmingly over-represented submission IDs.

Without that data being available, this thread becomes nothing more than a nice thought experiment, however. (Music plays and thread floats up into the clouds.)

Personally these recent threads have made me reconsider if I would spend any time tidying AcoustIDs. Very easy and quick to mess with (accidentally or on purpose), very time consuming to fix :thinking:

Hmm, I would think the opposite actually. Most of the potential bad matches are so obvious that not a lot of brainz effort has to be engaged in removing them; it is just the physical process of removing them. In the last couple of weeks I have removed about a thousand of them.

You start to see patterns. For example, a song by Stereo MC’s often comes up as a single match; when I see that, I know it’s wrong. I am currently working on importing fingerprint lengths into my database. Once this is done (a lot of data to download and process), I can compare the track length of the MusicBrainz recording to the fingerprint length range to further confirm that it is a bad match, and produce a report to show this.
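The length comparison could be sketched roughly like this. This is not the actual report code: the function name, the parameters, and the 10-second tolerance are all assumptions for illustration, with the recording length in milliseconds (as MusicBrainz stores it) and the fingerprint lengths in seconds.

```python
from typing import Optional


def is_length_mismatch(recording_ms: Optional[int],
                       fp_min_s: float,
                       fp_max_s: float,
                       tolerance_s: float = 10.0) -> bool:
    """Return True when the recording length falls outside the range of
    fingerprint lengths seen for the AcoustID, allowing some tolerance.

    Recordings with no known length cannot be judged, so they are
    never flagged.
    """
    if recording_ms is None:
        return False
    recording_s = recording_ms / 1000.0
    too_short = recording_s < fp_min_s - tolerance_s
    too_long = recording_s > fp_max_s + tolerance_s
    return too_short or too_long


if __name__ == "__main__":
    # A 3:35 recording against fingerprints of 210-218 seconds: plausible.
    print(is_length_mismatch(215_000, 210, 218))    # False
    # A one-hour audiobook chapter against the same fingerprints: flagged.
    print(is_length_mismatch(3_600_000, 210, 218))  # True
```

The tolerance matters because different rips and edits of the same recording legitimately vary by a few seconds, so a very tight window would flag good matches too.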

The largest report has 21,000 rows, but this is out of 15M AcoustID/MusicBrainz pairings. I would guess there are no more than 100,000 bad pairings in total, which would give an error rate of under 1% (100,000 / 15M ≈ 0.67%), so really AcoustID is pretty accurate and it would just take a short-term concerted effort to get on top of this.
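Putting the numbers from that estimate side by side (these are the guesses quoted in the post, not measured values, and the variable names are only for illustration):

```python
total_pairings = 15_000_000      # AcoustID/MusicBrainz pairings (approx.)
guessed_bad_pairings = 100_000   # upper-bound guess from the post
largest_report_rows = 21_000     # rows in the largest report

error_rate = guessed_bad_pairings / total_pairings
report_share = largest_report_rows / total_pairings

print(f"Guessed error rate:   {error_rate:.2%}")    # 0.67%
print(f"Largest report share: {report_share:.2%}")  # 0.14%
```

So even the upper-bound guess stays a little below the 1% mark, and the largest report on its own covers roughly a seventh of one percent of all pairings.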
