Report showing acoustids likely to be bad link to musicbrainz recordings

When there are only a few submissions I tend to agree. From looking at this in the RealWorld™, it is common to see a few rogue ones like this and no other data. I tend to just nuke them now.

On rare occasions I have found that the wrong lengths are actual data errors on the Release: manually typed values that are wrong. In the majority of cases I agree, these are nearly always wrong versions / different mixes wrongly linked.

When manually checking the latter examples it is common to find the recording linked to multiple AcoustIDs. Nuking the ones 30%+ out is nearly always a positive.

This stands out in that New Order example. The 6:49 and 3:10 Recordings have multiple AcoustIDs attached to them, and all are better matches than this 4:20 AcoustID.

This is all happening from “Lookup by Name” with no checking of the results. You only have to watch the forums to see people who do “close enough” type lookups on 20000 tracks and then hit the pretty looking < Submit AcoustID > button because they don’t realise the damage it is about to do… that button needs hiding. (Especially if more than 100 tracks are being matched in one session)

Your work with the Napalm is all good. (And I don’t see anyone saying stop :wink:)

Will award you the Napalm medal for work on AcoustIDs
:fire: :fire: :fire: :fire: :medal_military: :medal_military: :fire: :fire: :fire: :fire:

2 Likes

Certainly the case sometimes, but this can really only be fixed by preventing AcoustIDs from being linked to a MusicBrainz ID when the fingerprint length is outside a reasonable tolerance of the track length (20% should more than suffice). If the MusicBrainz track really is the right one but has the wrong track length recorded, then the submitter needs to fix the issue on MusicBrainz and then submit the pairing.
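To make the 20% idea concrete, here is a rough sketch of such a check. The function name and the choice to measure the difference relative to the MusicBrainz length are my own assumptions, not anything AcoustID or Picard actually implements.

```python
def length_within_tolerance(fingerprint_secs: float,
                            recording_secs: float,
                            tolerance: float = 0.20) -> bool:
    """Return True if the fingerprinted audio length is within `tolerance`
    (as a fraction of the MusicBrainz length) of the MusicBrainz length.

    Hypothetical helper for illustration only.
    """
    if recording_secs <= 0:
        # No usable length on MusicBrainz, so the pairing cannot be verified.
        return False
    relative_diff = abs(fingerprint_secs - recording_secs) / recording_secs
    return relative_diff <= tolerance


# The New Order case mentioned earlier: a 4:20 fingerprint against
# recordings of 6:49 and 3:10.
print(length_within_tolerance(260, 409))  # 4:20 vs 6:49
print(length_within_tolerance(260, 190))  # 4:20 vs 3:10
```

Both of those pairings are more than 30% out, so a 20% gate would have refused them.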

In the early days of AcoustID the priority would have been increasing the size of the database as quickly as possible, but now that it is a more mature product it should be more careful about letting data in. Since Luks doesn’t seem to be around much these days, the easiest way to get this done would be to implement changes in Picard. What do you think about this @outsidecontext and this

4 Likes

The bad recording length issue will be independent of the person who submitted the AcoustIDs. The more I have looked at these, the more I realise that detaching the mismatched AcoustID helps, as it flags up the discrepancy for anyone looking closer.

Yes, I agree. A clear mismatch in track length should be a clear indication to skip submission. That should be rather easy to handle and nearly always should indicate a bad link.

3 Likes

Updated the reports at http://www.albunack.net/reports.jsp

For the first reports there are not that many new entries, so they can be manually checked.
Will probably napalm the 20-30% and > 30% reports again.
The 10-20% report probably contains too many valid pairs to napalm, so I have left it alone. Maybe I can break it down further?
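For anyone wondering how pairs could be sorted into these bands, here is a small sketch. It is not the code behind the albunack reports; the function name, the band edges as a parameter, and treating each lower edge as inclusive are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def report_band(acoustid_secs: float,
                recording_secs: float,
                edges: Sequence[float] = (0.10, 0.20, 0.30)) -> Optional[str]:
    """Name the report band for a pair, or None if it is below the first edge.

    `edges` are fractional length differences in ascending order.
    Hypothetical helper, not the actual report implementation.
    """
    if recording_secs <= 0:
        return None  # no recording length to compare against
    diff = abs(acoustid_secs - recording_secs) / recording_secs
    i = bisect_right(edges, diff)  # number of edges at or below diff
    if i == 0:
        return None
    lower = edges[i - 1]
    if i == len(edges):
        return f"> {lower:.0%}"
    return f"{lower:.0%}-{edges[i]:.0%}"


print(report_band(225, 200))                                   # 12.5% out
print(report_band(225, 200, edges=(0.10, 0.15, 0.20, 0.30)))   # finer bands
```

Passing a finer `edges` tuple is one way the 10-20% report could be broken down further without touching the rest of the logic.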

1 Like

@ijabz I’ve re-enabled Track "8a4b6996-2d58-4d3f-97b7-3991518933c1" | AcoustID, the time difference is due to 6 minutes of silence at the end of the track on some releases. Track "700bfed9-ee86-4ee0-af7a-cf12b3879b32" | AcoustID is a similar case, if it’s been picked up in any of the reports.

I’ve got a few tracks with excessive silence at the beginning/end and can post the AcoustIDs here in the future, if necessary (after I’ve merged the recordings first though).

2 Likes

Thanks for the examples. Okay, so it’s the same recording but with silence at the end of some releases for the first track, and at the start for the second. In the first case the AcoustID contains the silence but it is not included in the recording length; in the second case the AcoustID doesn’t contain the silence but it is included in the recording length.

So first, I can add both of these to my internal checked table so they will not come up in future reports.

Secondly, I see that the track length (the length of the recording on the release) varies depending on the release, so we have a mixture of lengths with/without the silence. So if I amend my reports to check for a match on track lengths as well as recording lengths, I can probably filter these out.
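Roughly, the amended check could look like the sketch below. The function name, the 10% tolerance, and the example lengths (a 240 second song plus the 6 minutes of silence) are made up for illustration; only the idea of comparing against every track length comes from the discussion above.

```python
from typing import Iterable


def matches_any_length(acoustid_secs: float,
                       recording_secs: float,
                       track_secs: Iterable[float],
                       tolerance: float = 0.10) -> bool:
    """Return True if the AcoustID length is within `tolerance` of the
    recording length or of any track length the recording appears with.

    Hypothetical helper: a pair that matches here would be kept out of
    the mismatch report.
    """
    candidates = [recording_secs, *track_secs]
    return any(
        length > 0 and abs(acoustid_secs - length) / length <= tolerance
        for length in candidates
    )


# Made-up numbers: the fingerprint includes the silence (600 s), the
# recording length does not (240 s), and releases carry both variants.
print(matches_any_length(600, 240, []))          # recording length only
print(matches_any_length(600, 240, [240, 600]))  # track lengths considered
```

With only the recording length the pair looks badly out, but once the track lengths are considered it matches one of the releases and drops out of the report.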

What I am not clear about is whether the recording lengths in your examples are correct, because the first one doesn’t include the silence and the second one does. Is that correct, or are they both meant to include (or not include) the silence?

1 Like

This is so rare I had to post it. First genuine clash. Track "9af6ccfa-6393-444d-9ed4-0b6b2a41455a" | AcoustID

I have a copy of Chillout Tribute to Pink Floyd, but this track AcoustID matched first to The Wall 2000. Slightly different lengths, but both tracks only have this single AcoustID. I can’t check both albums. But from what I can tell this is a different recording. Cover recording so I guess they mimicked really close.

I posted this example as it is really rare to find one. And there is a small length difference.

1 Like

I added https://tickets.metabrainz.org/browse/PICARD-2396 for tracking the length mismatch issue in Picard. I guess we can tackle this for Picard 2.8

2 Likes

Thanks, I have another question about Picard.

Can the same user resubmit the same MBID/AcoustID pairs? If so, it would be good if that could be stopped, because it gives the impression that the same pair has been submitted multiple times by different people and is likely to be correct, when really it is the same person submitting the same pairs multiple times, and they could just be resubmitting the same bad pairing.
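To illustrate the concern, here is a sketch of counting distinct submitters per pair instead of raw submissions. The `(user, acoustid, mbid)` tuple layout and the placeholder IDs are invented for the example; I am not assuming anything about how AcoustID actually stores submissions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Submission = Tuple[str, str, str]  # (user, acoustid, mbid), hypothetical layout


def distinct_submitters(submissions: Iterable[Submission]) -> Dict[Tuple[str, str], int]:
    """Count how many different users submitted each (acoustid, mbid) pair."""
    users_per_pair = defaultdict(set)
    for user, acoustid, mbid in submissions:
        users_per_pair[(acoustid, mbid)].add(user)
    return {pair: len(users) for pair, users in users_per_pair.items()}


# Placeholder data: one user resubmitting the same pair three times,
# and another pair confirmed by two different users.
log = [
    ("alice", "acoustid-1", "mbid-A"),
    ("alice", "acoustid-1", "mbid-A"),
    ("alice", "acoustid-1", "mbid-A"),
    ("bob", "acoustid-2", "mbid-B"),
    ("carol", "acoustid-2", "mbid-B"),
]
print(distinct_submitters(log))
```

Counted this way, three submissions from one person weigh less than two submissions from two people.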

1 Like

The album has 6 minutes of silence between the 2 songs. On the original release, it was at the beginning of track 12. On some subsequent re-releases, the silence was moved to the end of track 11 instead. Only one track per release has the silence, it just varies which track it is. I’ll edit the recording annotations to make it a bit clearer.

1 Like

Thanks, that is useful information but not quite what I meant.
My question is: should the recording length (rather than the track length) include the silence or not, or does it depend on whether the silence is at the start or end of the track?

I submitted changes for Picard to no longer include the MBID in the submission if the length difference between file (as reported by fpcalc) and the MB recording is more than 30 seconds. For details see the merge request at https://github.com/metabrainz/picard/pull/2045
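In rough terms the rule works like the sketch below. This is not the code from the pull request; the function name and the dictionary keys are placeholders, and keeping the MBID when the recording length is unknown is an assumption of this sketch.

```python
from typing import Optional

MAX_LENGTH_DIFF_SECS = 30  # the threshold described above


def build_submission(fingerprint: str,
                     file_secs: float,
                     mbid: Optional[str],
                     recording_secs: Optional[float]) -> dict:
    """Build a submission, leaving out the MBID on a clear length mismatch.

    Illustrative only. The fingerprint itself is still submitted.
    """
    payload = {"fingerprint": fingerprint, "duration": int(file_secs)}
    if mbid is None:
        return payload
    # Assumption: with no known recording length there is nothing to
    # compare against, so the MBID is kept.
    if (recording_secs is not None
            and abs(file_secs - recording_secs) > MAX_LENGTH_DIFF_SECS):
        return payload
    payload["mbid"] = mbid
    return payload


# Placeholder values: a 260 s file matched to a 409 s recording.
print(build_submission("FINGERPRINT", 260, "some-mbid", 409))
```

Note that this is an absolute difference in seconds, not the percentage discussed earlier in the thread, so short tracks get proportionally more leeway than long ones.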

The same fingerprint MBID pair can only be submitted once per currently loaded files. If the MBID match was the result of a scan Picard does not submit this fingerprint / MBID pair. Only if the user intentionally matches the file to a different recording than what was originally automatically selected. Also if the user intentionally chooses to re-calculate the fingerprint they can submit it.

I don’t think we should restrict this further. It becomes confusing for the user if they cannot submit. And in case of import errors like we had some time ago on AcoustID the user would not be able to submit again. Also Picard would need to maintain a database of submissions.

3 Likes