Report showing AcoustIDs likely to be bad links to MusicBrainz recordings

No feedback, so I’m going to start deleting these.

1 Like

Just not had any free time this week. But no one else seems to care anyway, so load up the napalm.

So I have been napalming that report, but it is not quite as simple as pressing one button. Instead I have a private version of the reports that splits the pages into 5000 records each (larger pages seem to be too large/slow), then I can press the button once for each page. However, sometimes it only disables the first record then stops (and I couldn’t get it to work at all for a week, any ideas?), but if it does work it will continue for the whole page. So it’s been a bit of a battle, but I am now at the last 10,000 for the first >30% report.
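For anyone wanting to do something similar, the paging itself is simple. This is only a rough sketch of the idea and not the actual report code; the function name `pages` and the in-memory list are assumptions for illustration.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def pages(records: Sequence[T], page_size: int = 5000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most page_size records each."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


# Hypothetical usage: 12,000 report rows end up on three pages.
page_sizes = [len(page) for page in pages(list(range(12000)))]
```

Each page can then be processed with one button press, and a failure only affects that page.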

Also, for the earlier reports the only records left had already been checked: they were either clearly valid, or it was not possible to know whether they were valid or not. So I have now moved these records into an internal acoustid_mbid_checked table so they are no longer shown in reports.

Next week I will update my database with the latest Acoustid data and the latest MusicBrainz data. It will be interesting to see how many potential invalid matches come up in these first reports.

2 Likes

Okay, fixed my problem by using a GreaseMonkey script instead. Now just napalming the “Acoustid links to songs that vary by at least 30% from fingerprint length, multiple submissions, and not covered in earlier report” report.

I checked some records in this report, and when there are multiple submissions but no other mbids linked to the same acoustid there is rarely any user-submitted metadata, so I think the same user is just resubmitting the same wrong link in Picard.

e.g. Track "3c53454e-8612-4e57-863d-1ce56d43c3e7" | AcoustID

When the acoustid is linked to other mbids, it is usually the case that it has been matched to the right song but the wrong version of the song (wrong length), even though there is one with a matching length available.

e.g. Track "e023a522-1e31-4e9c-b3de-32d71b510263" | AcoustID

2 Likes

When there are only a few submissions I tend to agree. From looking at this in the RealWorld™, it is common to see a few rogue ones like this and no other data. I tend to just nuke them now.

On rare occasions I have found that the wrong lengths are actual data errors on the Release: manually typed values that are wrong. In the majority of cases, I agree these are nearly always wrong versions / different mixes wrongly linked.

When manually checking the latter examples, it is common to find the recording linked to multiple AcoustIDs. Nuking the ones 30%+ out is nearly always a positive.

This stands out in that New Order example. The 6:49 and 3:10 Recordings have multiple AcoustIDs attached to them, and all are better matches than this 4:20 AcoustID.

This is all happening from “Lookup by Name” with no checking going on as to the results. You only have to watch the forums to see people who do “close enough” type lookups on 20000 tracks and then hit the pretty looking < Submit AcoustID > button because they don’t realise the damage it is about to do… that button needs hiding. (Especially if more than 100 tracks are being matched in one session.)

Your work with the Napalm is all good. (And I don’t see anyone saying stop :wink:)

Will award you the Napalm medal for work on AcoustIDs
:fire: :fire: :fire: :fire: :medal_military: :medal_military: :fire: :fire: :fire: :fire:

2 Likes

Certainly the case sometimes, but this can only really be fixed by preventing acoustids being linked to a MusicBrainz ID if the file is outside a reasonable track length (20% should more than suffice). If the MusicBrainz track really is the right one but has the wrong track length recorded, then the submitter needs to fix the issue on MusicBrainz and then submit the pairing.

In the early days of Acoustid the priority would have been increasing the size of the database as quickly as possible, but now that it is a more mature product we should be more careful about letting data in. Since Luks doesn’t seem to be around much these days, the easiest way to get this done would be to implement changes in Picard. What do you think about this @outsidecontext and this

4 Likes

The bad recording length issue will be independent of the person who submitted the AcoustIDs. The more I have looked at these, the more I realise that detaching the mis-matched AcoustID helps, as it flags up the discrepancy for anyone looking closer.

Yes, I agree. A clear mismatch in track length should be a clear indication to skip submission. That should be rather easy to handle and nearly always should indicate a bad link.

3 Likes

Updated the reports at http://www.albunack.net/reports.jsp

For the first reports there are not that many new entries, so they can be manually checked.
I will probably napalm the 20-30% and >30% reports again.
The 10-20% reports probably contain too many valid pairs to napalm, so I have left them alone. Maybe I can break them down further?
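To make the bands concrete, here is a minimal sketch of how a pairing could be assigned to one of them. It assumes the percentage is measured relative to the fingerprint length and that the band boundaries are inclusive at the lower end; both are my assumptions for illustration, and the function names are hypothetical rather than taken from the report code.

```python
from typing import Optional


def length_difference_pct(fingerprint_secs: float, recording_secs: float) -> float:
    """Difference of the recording length from the fingerprint length, in percent."""
    if fingerprint_secs <= 0:
        raise ValueError("fingerprint length must be positive")
    return abs(recording_secs - fingerprint_secs) / fingerprint_secs * 100.0


def report_band(pct: float) -> Optional[str]:
    """Map a percentage difference to a report band (boundaries are assumed)."""
    if pct >= 30:
        return ">30%"
    if pct >= 20:
        return "20-30%"
    if pct >= 10:
        return "10-20%"
    return None  # close enough that it appears in no length report


# A 4:20 fingerprint linked to a 6:49 recording is far outside any tolerance.
band = report_band(length_difference_pct(260, 409))
```

Splitting the 10-20% band further would just mean adding more thresholds to `report_band`.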

1 Like

@ijabz I’ve re-enabled Track "8a4b6996-2d58-4d3f-97b7-3991518933c1" | AcoustID, the time difference is due to 6 minutes of silence at the end of the track on some releases. Track "700bfed9-ee86-4ee0-af7a-cf12b3879b32" | AcoustID is a similar case, if it’s been picked up in any of the reports.

I’ve got a few tracks with excessive silence at the beginning/end and can post the AcoustIDs here in the future, if necessary (after I’ve merged the recordings first though).

2 Likes

Thanks for the examples. Okay, so it’s the same recording but with silence at the end on some releases for the first track, and at the start for the second. In the first case the Acoustid contains the silence but it is not included in the recording length; in the second case the Acoustid doesn’t contain the silence but it is included in the recording length.

So the first thing is that I can add both of these to my internal checked table, so they will not come up in future reports.

Secondly, I see that the track length (the length of the recording on the release) varies depending on the release, so we have a mixture of lengths with/without the silence. So if I amend my reports to check for a match on track lengths as well as recording lengths, I can probably filter these out.
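Roughly what I have in mind, as a sketch only: a pairing is treated as fine if the fingerprint length is close to the recording length or to any of the track lengths for that recording. The 10% tolerance and the function names are assumptions for illustration, and the fingerprint length is assumed to be positive.

```python
from typing import Iterable, Optional


def within_tolerance(fingerprint_secs: float, length_secs: float,
                     tolerance_pct: float) -> bool:
    """True if length_secs differs from the fingerprint by less than tolerance_pct."""
    diff_pct = abs(length_secs - fingerprint_secs) / fingerprint_secs * 100.0
    return diff_pct < tolerance_pct


def has_matching_length(fingerprint_secs: float,
                        recording_secs: Optional[float],
                        track_secs: Iterable[float],
                        tolerance_pct: float = 10.0) -> bool:
    """Check the fingerprint against the recording length and every track length."""
    candidates = list(track_secs)
    if recording_secs is not None:
        candidates.append(recording_secs)
    return any(within_tolerance(fingerprint_secs, c, tolerance_pct)
               for c in candidates)


# Fingerprint includes 6 minutes of silence; one release's track length does too.
keep = has_matching_length(600, 240, [240, 598])
```

Only pairings where `has_matching_length` is false would then appear in the report.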

What I am not clear about is whether the recording lengths in your examples are correct, because the first one doesn’t include the silence and the second one does. Is that correct, or are they both meant to include (or not include) the silence?

1 Like

This is so rare I had to post it. First genuine clash. Track "9af6ccfa-6393-444d-9ed4-0b6b2a41455a" | AcoustID

I have a copy of Chillout Tribute to Pink Floyd, but this track AcoustID matched first to The Wall 2000. Slightly different lengths, but both tracks only have this single AcoustID. I can’t check both albums. But from what I can tell this is a different recording. Cover recording so I guess they mimicked really close.

I posted this example as it is really rare to find one. And there is a small length difference.

1 Like

I added https://tickets.metabrainz.org/browse/PICARD-2396 for tracking the length mismatch issue in Picard. I guess we can tackle this for Picard 2.8

2 Likes

Thanks, I have another question about Picard.

Can the same user resubmit the same mbid/acoustid pairs? If so, it would be good if that could be stopped, because it gives the impression that the same pair has been submitted multiple times by different people and is therefore likely to be correct, when really it is the same person submitting the same pair multiple times, and they could just be resubmitting the same bad pairing.
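To show the difference between raw submission counts and distinct submitters, here is a small sketch. It assumes each submission can be tied to a user id, which is a hypothetical data layout and not something I know the AcoustID data exposes in this form.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Submission = Tuple[str, str, str]  # (user_id, acoustid, mbid), hypothetical layout


def distinct_submitter_counts(
        submissions: Iterable[Submission]) -> Dict[Tuple[str, str], int]:
    """Count how many different users submitted each (acoustid, mbid) pair."""
    users_per_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for user_id, acoustid, mbid in submissions:
        users_per_pair[(acoustid, mbid)].add(user_id)
    return {pair: len(users) for pair, users in users_per_pair.items()}


# Three submissions of the same pair by one user count as a single vote.
counts = distinct_submitter_counts([
    ("u1", "a", "m"), ("u1", "a", "m"), ("u1", "a", "m"),
    ("u2", "b", "n"), ("u3", "b", "n"),
])
```

A pair with three submissions from one user would then look no more trustworthy than a pair submitted once.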

1 Like

The album has 6 minutes of silence between the 2 songs. On the original release, it was at the beginning of track 12. On some subsequent re-releases, the silence was moved to the end of track 11 instead. Only one track per release has the silence, it just varies which track it is. I’ll edit the recording annotations to make it a bit clearer.

1 Like

Thanks, that is useful information but not quite what I meant.
My question is: should the recording length (rather than the track length) include the silence or not, or does it depend on whether the silence is at the start or end of the track?

I submitted changes for Picard to no longer include the MBID in the submission if the length difference between file (as reported by fpcalc) and the MB recording is more than 30 seconds. For details see the merge request at PICARD-2396: Do not link AcoustIDs to recordings if there is a large difference in length by phw · Pull Request #2045 · metabrainz/picard · GitHub
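The rule boils down to something like the sketch below. This is not the code from the pull request; the function name is hypothetical, and treating an unknown recording length as "no mismatch" is an assumption made for illustration. MusicBrainz lengths are in milliseconds, while fpcalc reports the duration in seconds.

```python
from typing import Optional

MAX_LENGTH_DIFF_SECS = 30  # threshold described above


def should_include_mbid(file_secs: float, recording_ms: Optional[int]) -> bool:
    """Decide whether the MBID should accompany the fingerprint submission."""
    if recording_ms is None:
        # Assumption: without a known recording length nothing can be compared.
        return True
    diff_secs = abs(file_secs - recording_ms / 1000.0)
    return diff_secs <= MAX_LENGTH_DIFF_SECS


# A 4:20 file matched to a 6:49 recording would be submitted without the MBID.
include = should_include_mbid(260, 409000)
```

The fingerprint itself would still be submitted in that case; only the link to the recording is left out.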

The same fingerprint MBID pair can only be submitted once per currently loaded files. If the MBID match was the result of a scan Picard does not submit this fingerprint / MBID pair. Only if the user intentionally matches the file to a different recording than what was originally automatically selected. Also if the user intentionally chooses to re-calculate the fingerprint they can submit it.

I don’t think we should restrict this further. It becomes confusing for the user if they cannot submit. And in case of import errors like we had some time ago on AcoustID the user would not be able to submit again. Also Picard would need to maintain a database of submissions.

6 Likes

Reports have been updated, first time since May 2022.

The good news is that for most reports the number of bad acoustids is relatively low, helped in part by the Picard fix I expect.

The “Acoustid links to songs that vary between 10% and 20% from fingerprint length, only one submission, and not covered in earlier report” and “Acoustid links to songs that vary between 10% and 20% from fingerprint length, multiple submissions, and not covered in earlier report” reports are high, but that is only because we didn’t process the results last time, since the track length is too close to the fingerprint length to delete automatically. I plan to break down these reports further.

5 Likes

And they have just been updated again, and the increase in bad acoustids continues to be relatively low.

3 Likes

Made a few changes to these reports: the reports that list acoustids where the fingerprint length does not match the MusicBrainz length are now split between cases where the MusicBrainz recording is the only link to the Acoustid and those where there are other links.

e.g.

Gone from

  • Acoustid links to MusicBrainz Recording 30% difference, only one submission
  • Acoustid links to MusicBrainz Recording 30% difference, multiple submission

to

  • Acoustid links to MusicBrainz Recording 30% difference, only one submission, no other MB linked to Acoustid
  • Acoustid links to MusicBrainz Recording 30% difference, only one submission, multiple other MB linked to Acoustid
  • Acoustid links to MusicBrainz Recording 30% difference, multiple submission, no other MB linked to Acoustid
  • Acoustid links to MusicBrainz Recording 30% difference, multiple submission, multiple other MB linked to Acoustid

This is useful because analysis indicates that when there are other MusicBrainz recordings linked to the fingerprint, the bad MusicBrainz Recording is indeed incorrect, and usually the wrong version of the song.

Whereas when there are no other MusicBrainz recordings linked to the fingerprint, it is more likely that this bad MusicBrainz Recording is correct, but maybe the track length has been incorrectly entered into MusicBrainz, or the fingerprint has unnecessary trailing silence at the end.
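The split can be pictured with the sketch below, which picks the report title for a pairing from two counts. It is illustrative only: the function name is hypothetical, and putting a pairing with exactly one other linked recording into the “multiple other MB” report is my assumption.

```python
def report_title(submission_count: int, other_mbid_count: int) -> str:
    """Pick one of the four 30% difference reports for an acoustid/mbid pairing."""
    if submission_count <= 1:
        submissions = "only one submission"
    else:
        submissions = "multiple submission"
    if other_mbid_count == 0:
        others = "no other MB linked to Acoustid"
    else:
        # Assumption: any other linked recording counts as "multiple other".
        others = "multiple other MB linked to Acoustid"
    return ("Acoustid links to MusicBrainz Recording 30% difference, "
            f"{submissions}, {others}")


# A pairing submitted five times with two other recordings on the same Acoustid.
title = report_title(5, 2)
```

Pairings that land in the “no other MB” reports are the ones worth checking by hand before deleting.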

4 Likes