There are 6 sources linking to the Depeche Mode track, and only one for Silver Line. However, the fingerprint lengths and the user-submitted metadata all point towards the Silver Line track only.
So my conclusion would be that, despite there being 6 sources, the Depeche Mode link is incorrect. But I wanted to check: if the fingerprint length is 33 seconds and the track linked to is 4m 23s, is it technically possible for this to be a good match or not? I.e. is the track length part of the fingerprint algorithm?
The audio length is part of the AcoustID comparison, yes. Fingerprint submissions also include the audio length, and when the AcoustID server decides whether to group the submitted fingerprint with an existing AcoustID, the length is considered. If the length difference is beyond a certain threshold (I think 30 seconds, but you’d need to check the source code for this), it will result in a separate AcoustID.
But this is orthogonal to the question of whether an AcoustID can be linked to a specific recording, and whether such a link is valid. Technically you can link the AcoustID to any recording by including this recording ID as metadata of the fingerprint submission.
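To make the length check concrete, here is a rough sketch of the idea, not the actual AcoustID server code. The 30 second threshold is only the guess from the paragraph above, and the function name `lengths_compatible` is made up for illustration.

```python
# Hypothetical threshold: the "I think 30 seconds" guess from the post.
# The real value would have to be checked in the AcoustID server source.
MAX_LENGTH_DIFF_SECONDS = 30


def lengths_compatible(submitted_seconds: int, existing_seconds: int,
                       max_diff: int = MAX_LENGTH_DIFF_SECONDS) -> bool:
    """Return True if two audio lengths are close enough to be grouped.

    This only models the length part of the decision; the acoustic
    comparison of the fingerprints themselves is not covered here.
    """
    return abs(submitted_seconds - existing_seconds) <= max_diff


# The case from this thread: a 33 s fingerprint vs. a 4m 23s track.
print(lengths_compatible(33, 4 * 60 + 23))
```

Under that assumed threshold, a 33 second submission and a 4m 23s track are 230 seconds apart, so they would not pass the length check.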
Theoretically such a length difference could be valid, due to differences in what AcoustID and MusicBrainz consider the same recording. AcoustID bases this purely on acoustic similarity of the first 120 seconds and overall length. But certain length differences would not necessarily mean a separate recording, e.g. if you just have silence added to the end.
For your example, this is pretty surely wrongly linked. I cannot imagine any way a German informative speech on a CD drive cleaner disc could even remotely sound like Depeche Mode’s “A Question of Lust”. Comparing the first fingerprint of the CD cleaner track with another one for “A Question of Lust” clearly shows no similarity: Compare fingerprints #31226008 and #36308259 | AcoustID
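For anyone who wants to do this kind of comparison offline, one common way to compare two raw Chromaprint fingerprints (lists of 32-bit integers, e.g. from `fpcalc -raw`) is to count the differing bits. This is only a sketch and an assumption on my side: it is not necessarily what the AcoustID compare page does, it does not try different alignments, and `bit_error_rate` is a made-up helper name.

```python
def bit_error_rate(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of differing bits between two raw fingerprints.

    Only the overlapping prefix is compared, with no offset search.
    Values near 0.0 mean very similar audio; unrelated audio tends
    to land around 0.5, like random bits would.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("both fingerprints need at least one item")
    # XOR marks the differing bits; the mask keeps the value in
    # 32 bits even if the items were read as signed integers.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1")
        for a, b in zip(fp_a[:n], fp_b[:n])
    )
    return differing / (32 * n)


# Toy values, not real fingerprints from this thread.
print(bit_error_rate([0b1111, 0], [0, 0]))
```

A real comparison would also slide one fingerprint against the other to find the best offset, which this sketch leaves out.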
I wasn’t suggesting that, but as I’m sure you know, completely different songs can match to the same fingerprint if whatever AcoustID uses as data points to calculate the fingerprint are the same.
However, you confirm that the difference in track length alone, for a completely different track, almost immediately excludes it as a valid match. Thanks.
I also seem to have an Ike & Tina Turner album incorrectly matched to a Depeche Mode album, although at least in that case it makes some sense, as the track lengths for each bad match are approximately the same.
What is interesting in this example is how the counts are “back to front”. The Depeche Mode tracks have six samples to the one CD cleaner. Yet it is the Depeche Mode that needs deleting. Why do I assume that? Because it is ALWAYS six. I bet the same person hit the upload six times on one album.
All the other data - lengths and track details in the “additional user data” - all fit the CD cleaner.
Notice that tracks 1-6 on the CD Lens Cleaner are in the same order on the Depeche Mode album. (Odd that track 7 is not linked.)
Actually - it gets more interesting. Look at this track: Track "80fbdc2a-a975-4cd3-bcdb-a025c7cf3469" | AcoustID That is neither track 10 nor 11 of the Depeche Mode album. Lengths are off again. But look at the list of “additional user metadata” and it is all from the CD Cleaner, but we have no AcoustID listed from Picard. SIX samples again though.
A good example of why a Bot can’t fix the AcoustIDs. Only ONE good sample to SIX bad samples. Needs a human brain to read all the other clues on the page.
Although if we compared the length of the fingerprints against the length of the matched MusicBrainz release, we could possibly have filtered out the Silver Line case as a bad match. In other words, I think bots could probably create a smaller sublist of this list where all potential bad matches are bad, but since they then have to be manually submitted anyway and the AcoustID team is very unresponsive currently, it doesn’t make much difference.
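Something like the following is what I have in mind for building that sublist. It is just a sketch under assumptions: the `Link` record, its field names and the 30 second cut-off are all made up for illustration, and the lengths would have to come from wherever you already have the AcoustID and MusicBrainz data.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One AcoustID-to-recording link (hypothetical record layout)."""
    acoustid: str
    fingerprint_seconds: int
    recording_id: str
    recording_seconds: int


def suspicious_links(links: list[Link], max_diff: int = 30) -> list[Link]:
    """Keep only links whose two lengths differ by more than max_diff.

    The result is a list of candidates for a human to review,
    not a list of confirmed bad matches.
    """
    return [
        link for link in links
        if abs(link.fingerprint_seconds - link.recording_seconds) > max_diff
    ]


# Placeholder IDs; the lengths mirror the 33 s vs. 4m 23s case above.
links = [
    Link("acoustid-a", 33, "recording-cd-cleaner", 33),
    Link("acoustid-a", 33, "recording-depeche-mode", 263),
]
for link in suspicious_links(links):
    print(link.recording_id)
```

It would still miss cases like the Ike & Tina Turner one, where the wrongly matched tracks happen to have about the same length.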
An auto-clean up is very messy to do, but an interesting challenge. I had a post half written for your other thread where I found all kinds of oddities that would trip up a bot. I didn’t drop it as I thought it may have been boring for other people. I’ll go drop it in there now (as I’d be told off for being OT if I dropped it here)