I actually think this is a good example of where a bot/the system can fix these.
If it’s one user submitting all this junk, as you’ve surmised, then we can detect them as a bad actor (after a certain percentage of their submissions are unlinked by editors) and remove all their submissions. This assumes we’re happy to drop a whole bunch of valid IDs, i.e. that one incorrect link is more trouble than 100 correct ones are worth (I think so, tbh). Nothing too complicated, I think.
Are submissions always linked to a user? There seem to be so many different routes to adding data to AcoustID.
You have a simple example in that one. Identify who uploaded those Depeche Mode tracks and I’ll bet you have the same name six times.
I am also curious - will that resolve to a username? Or an anonymous ID number?
But surely to spot this one “bad user”, you still need a human to say “that ain’t right, put that user on a naughty list”?
My assumption was not that one user was uploading all bad data. Just they got that one release wrong a number of times. Maybe tried six times before they realised where their error was, and after fixing it went on to add hundreds of good quality releases.
After seeing two bad tracks, it would have been good to have said “clear that user’s additions for this Release”, but that assumption cannot spread beyond one release without checking.
Just some thoughts re. where automation could theoretically help…
You’ve done that, by unlinking their submission. My proposal is that if a user has had X amount or % unlinked (as you say, this requires a good human eye), the system unlinks all of their submissions.
Assuming: We can link users/IDs with submissions. And we are happy to throw away X amount of good submissions.
The reason I think this is a good idea is that it’s often mentioned that a real issue with AcoustID is someone with a crazy number of files coming in and just hitting save and submit on everything… imo, once identified, we could just take them out of the equation, if their submissions suck.
I mean, I’m not proposing we just scrub anyone who has bad additions… I definitely don’t want to throw out the baby with the bathwater/shitty AcoustIDs.
More productively, what threshold would you think is throwing out too much? What percentage of good submissions? And how long of a grace period for a user ID (if there is even such a thing attached to submissions) before it should be checked?
e.g. a user ID’s submissions are all 5 years old, they have 1,000 submissions, and 50% of them have, incredibly, been unlinked by very attentive and busy users such as yourself. Would this hit the threshold for you to assume their submissions suuuuuuuck? Note that any AcoustID with even a single submission from another user ID would still stay.
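To make the threshold idea concrete, here’s a minimal sketch of the check being proposed. Everything here is hypothetical: the field names, the thresholds, and even whether AcoustID stores per-user totals like this at all.

```python
# Hypothetical sketch of the auto-unlink threshold described above.
# The thresholds and the (total, unlinked, last_active) inputs are
# invented; AcoustID's real data model may look nothing like this.
from datetime import datetime, timedelta

UNLINK_RATIO = 0.50      # fraction of a user's submissions unlinked by editors
MIN_SUBMISSIONS = 100    # don't judge users on a handful of submissions
GRACE_PERIOD = timedelta(days=365)  # leave recently active accounts alone

def should_auto_unlink(total, unlinked, last_active, now=None):
    """Return True if this user's remaining submissions should be auto-unlinked."""
    now = now or datetime.now()
    if total < MIN_SUBMISSIONS:
        return False
    if now - last_active < GRACE_PERIOD:
        return False
    return unlinked / total >= UNLINK_RATIO

# The scenario above: 1,000 submissions, last active 5+ years ago, 50% unlinked.
print(should_auto_unlink(1000, 500, datetime(2017, 1, 1)))  # True
```

The “single submission from another user ID” safeguard would sit outside this function, as a per-AcoustID check before anything is actually unlinked.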
Please don’t forget: AcoustID accepts contributions from various sources, not only from MusicBrainz/Picard. Picard, for example, “validates” submissions to AcoustID, whereas other 3rd-party applications don’t care at all.
Would it not be much easier if Picard refined its search and use of AcoustIDs? I can’t tell you exactly which numbers or dependencies would help most, but this way MB could act on its own. (I’m not sure how fast AcoustID would introduce tests to increase data quality, or to detect and reject users as in your example.)
You can do checks at the application end, but the more checks you do, the more calls have to be made to the rate-limited database, and there comes a point where things get too slow. Many of these cases would also be too difficult to pick up.
Much better to fix it once at the source than to handle the issue every time that track gets processed by Picard. And of course fixing the data is a solution for all MusicBrainz/AcoustID users, not just Picard users.
According to the AcoustID blog, it seems like there is/was some kind of anonymous user account support:
There are also two new methods /v2/user/lookup?user=X and /v2/user/create_anonymous?client=X&clientversion=X , to support anonymous user accounts. They can be used from applications that can’t ask the users to log in on the Acoustid website, but there are a number of rules that such applications should follow:
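As a rough illustration of the endpoint quoted above, here is how a client might build the create_anonymous request URL. The host follows AcoustID’s usual API base, but the client name is a placeholder, and no assumptions are made about the response format (so no network call is shown).

```python
# Sketch of building the create_anonymous request URL from the blog quote.
# "mytagger" is a placeholder client name; the response shape is unknown
# here, so this only constructs the URL and makes no request.
from urllib.parse import urlencode

BASE = "https://api.acoustid.org/v2"

def create_anonymous_url(client, clientversion):
    """Build the /v2/user/create_anonymous request URL."""
    query = urlencode({"client": client, "clientversion": clientversion})
    return f"{BASE}/user/create_anonymous?{query}"

print(create_anonymous_url("mytagger", "1.0"))
# https://api.acoustid.org/v2/user/create_anonymous?client=mytagger&clientversion=1.0
```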
The title wasn’t changed, Freso moved this part of the discussion out into a completely new thread
I just realized I can edit the title! So have done so.
Why not? That’s the aim. What variables would capture this set of submissions?
The simplest scenario: ID X submitted 100 Stephen King AcoustIDs 10 years ago. ID X has never been active again; they submitted nothing else. How large a % of the AcoustIDs from that session would you manually want to unlink before the system decides to auto-unlink the rest?
Complex scenario: ID Y submitted their whole library of 10k files in one go 3 months ago, without checking anything. 10% were bad, including their hastily tagged Stephen King collection. Since then they have continued adding AcoustIDs for new additions, which are tagged correctly. Is there a threshold where we remove the 10k hurried submissions (anything submitted by that ID on that date, for instance) to save you x hours cleaning Stephen King entries? And accept that there are good ones that will be removed*?
Your answer may well be “no”, which is totally understandable. Personally I think bad submissions outweigh a lot of good ones. I am not talking about permanently banning or besmirching key/ID 255267’s good name btw. Just questioning how we might be able to use what info we have to make the DB more reliable overall.
Note: this more nuanced, session-based approach relies on AcoustID storing submission timestamps as well as IDs. Otherwise it would be a cruder, purely %-based approach, which might not be acceptable.
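The session-based idea could be sketched as grouping a user’s submissions by submission date and flagging each batch by its own unlink rate. Again, every field name and the 10% figure are hypothetical, assumed only for the sketch; whether AcoustID keeps timestamps at this granularity is exactly the open question above.

```python
# Sketch: group submissions into (user, date) "sessions" and flag any
# session whose unlink rate crosses a threshold. All names invented.
from collections import defaultdict

SESSION_RATIO = 0.10  # e.g. 10% of one day's batch unlinked -> flag the batch

def flag_sessions(submissions):
    """submissions: iterable of (user_id, date, was_unlinked) tuples.
    Returns the set of (user_id, date) sessions worth reviewing."""
    totals = defaultdict(int)
    unlinked = defaultdict(int)
    for user_id, date, was_unlinked in submissions:
        totals[(user_id, date)] += 1
        if was_unlinked:
            unlinked[(user_id, date)] += 1
    return {key for key in totals
            if unlinked[key] / totals[key] >= SESSION_RATIO}

# ID Y's hurried batch gets flagged; their later careful session does not.
subs = [("Y", "2024-01-05", True), ("Y", "2024-01-05", False),
        ("Y", "2024-04-01", False)]
print(flag_sessions(subs))  # {('Y', '2024-01-05')}
```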
*Even the laziest Picard clicker is going to struggle to mistag an entire collection completely! I would be very impressed; 10% is huge, tbh.
I would be concerned if it was too easy to pull out the user IDs from submissions. That kinda data gets personal, but I don’t want to go on a tangent or someone will split the thread again.
Timestamps would be far more useful. “IDs from a specific date that matches a named artist”. or “IDs added to a specified release on a specific date”. That would allow a net to be spun around bad data without picking on a specific person.
There are way too many duff AcoustIDs from audiobooks for it to be only one person doing it. That person would have to have a huge collection. Something is screwy with audiobooks. Or maybe they just stand out more.
NOTHING on there seems to be linked to the book. You can tell, as there are no consistent runs of numbers. This is an example of a Release that just needs “all AcoustIDs unlinked”. But at 9 CDs that is too much to do by hand.
Actually, now I look closer, AcoustIDs only get to CD3 and then stop.
I have seen AcoustIDs on the last medium.
It seems you are using INLINE STUFF, but it’s broken at the moment, sorry.
Maybe I should think about a status page, or a system inside the scripts themselves, for when I know they are broken.
I see that as the site being broken and not letting your INLINE STUFF run properly. No need to apologise, as your scripts add so much that is missing from the site GUI. Wish I could just turn off that annoying default collapse thing, as it also makes a mess of the browser’s “Find in Page” and so many other things.
I fix many of these AcoustID dups thanks to your pink-highlighted AcoustIDs in INLINE STUFF.