Report showing AcoustIDs likely to be bad links to MusicBrainz recordings

There’s at least one controversial MusicBrainz guideline that can explain why this may happen:

The original recording Don’t Push appears on tracks which vary in length from 3:45 to 3:55 because they have been mastered at different speeds. Therefore, the pitch of the audio is different on these tracks, but because there is no difference in mixing they are considered the same recording.
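
To put a number on how far apart those masterings sound, here is a small Python sketch. It assumes the whole 10-second spread comes from playback speed (no edits, no extra silence), in which case the speed ratio is simply the ratio of the two durations, and the pitch shift in semitones is 12 times the base-2 logarithm of that ratio. The helper names are illustrative only.

```python
import math


def mmss_to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pitch_shift_semitones(shorter: str, longer: str) -> float:
    """Pitch difference in semitones between two transfers of the same tape,
    assuming the entire length difference is due to playback speed."""
    speed_ratio = mmss_to_seconds(longer) / mmss_to_seconds(shorter)
    # Pitch scales with playback speed; one octave (12 semitones) per doubling.
    return 12 * math.log2(speed_ratio)


print(f"{pitch_shift_semitones('3:45', '3:55'):.2f} semitones")
```

For 3:45 versus 3:55 this is a speed difference of about 4.4%, or roughly three quarters of a semitone, yet under that guideline both are still the same Recording.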

The more reasonable MusicBrainz guideline of merging recordings with differing amounts of silence at either end of the track can also explain this.

There is no AcoustID API endpoint for it, but you can use a link such as

https://acoustid.org/edit/toggle-track-mbid?track_gid=5c3e6fbc-0bea-4e51-8d73-c562f5617b96&mbid=27a9018a-5132-4ed4-9979-f2779885bb15&state=1

where track_gid is the AcoustID track ID and mbid is the MB recording ID. I think you could add such links to your report. Maybe also link the recording names in the Good Match / Suspect Match columns to the corresponding recording IDs.
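
A minimal Python sketch of generating such a link for each report row. The base URL and the three query parameters are taken from the example link above; reading `state=1` as "disable this pairing" is an assumption based on how the link is used in this thread, and `toggle_link` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base URL copied from the example link; the meaning of state=1 is assumed.
TOGGLE_URL = "https://acoustid.org/edit/toggle-track-mbid"


def toggle_link(track_gid: str, mbid: str, state: int = 1) -> str:
    """Build a toggle-track-mbid link for one AcoustID track / MB recording pairing."""
    query = urlencode({"track_gid": track_gid, "mbid": mbid, "state": state})
    return f"{TOGGLE_URL}?{query}"


print(toggle_link("5c3e6fbc-0bea-4e51-8d73-c562f5617b96",
                  "27a9018a-5132-4ed4-9979-f2779885bb15"))
```

`urlencode` keeps the parameters in insertion order, so this reproduces the example link exactly; UUIDs contain only safe characters and pass through unescaped.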

Ah yes, I meant to do this. I will add it; it will speed up manual editing a bit.

The Good Match / Suspect Match columns group together similar names (which can belong to different MB recordings), so it is not so easy to provide a single link; the link can be grabbed from the corresponding entry on the AcoustID page.

I would have assumed that if something has been mastered at two different speeds, it would produce different AcoustIDs. I work with bootlegs, have seen cases like this, and am pretty sure the AcoustIDs change.

In a similar vein, it is common in the bootleg world to chop 30 seconds off the end of a recording due to crowd noise.

Just found an example of that: https://acoustid.org/track/03432e70-1e0b-4803-835e-ad737039cb42
It is obviously the same track from the same concert, but I guess with less crowd noise and/or chat at the end. It still produces the same AcoustID.

@ijabz I see you are still working with very out-of-date data. I just dipped in quickly, and a sample of six all came up as “already fixed”. I would not want to run any bot until the data has at least caught up. (And you know I am no fan of bots anyway: too many errors.)

Oh, and 30 seconds is very short in the Pink Floyd world. The ones I quickly checked were tracks 12 to 20 minutes long. I tried another dozen, and all were already fixed.

Sorry, I’m just fixing this for the new report now. It is due to an error in the data feed provided by AcoustID: it doesn’t show all the pairings that have been disabled, so I have to use the API to check each value in the report table that feeds the HTML report before generating it. This might take a few hours.

The data for all the other reports should only be 1 day out of date.

I realise it may take a few hours. The trouble is, a report in that state would waste too many hours checking data that had already been fixed. Thanks for redoing it.

I still poke into these on bored evenings, but I focus on artists I know.

Classical music is a big puzzle in there; track names vary so much. I don’t know how that could be fixed with a bot.

It should always be run automatically before publishing the report; I just forgot about it.

To be clear, when I say bot I just mean a tool that, given a list of pairings, could log in to AcoustID and then disable each pairing. For now I would only run it on Report 1, because those pairings seem to always be wrong; I don’t see a track name problem with them.

So the bot does not have to work out what to disable; that work has already been done. It just removes the manual effort of clicking buttons in the report.
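
The division of labour described here (the report decides, the tool only executes) can be sketched as a loop over a pre-computed list. This is a hypothetical Python outline, not the actual tool: `disable_pairing` is a placeholder for whatever logged-in request performs the unlink (login is not shown and no AcoustID API call is implied), and the 1-second pause mirrors the delay mentioned later in the thread for the report's Unlink buttons.

```python
import time
from typing import Callable, Iterable, List, Tuple

# (AcoustID track_gid, MusicBrainz recording MBID)
Pairing = Tuple[str, str]


def disable_all(pairings: Iterable[Pairing],
                disable_pairing: Callable[[str, str], bool],
                delay_seconds: float = 1.0) -> List[Pairing]:
    """Work through a pre-selected list of pairings, one request at a time.

    The tool makes no judgement calls: it only disables what it is given.
    disable_pairing is a hypothetical callback returning True on success.
    Returns the pairings that failed, so they can be retried or checked by hand.
    """
    failed: List[Pairing] = []
    for i, (track_gid, mbid) in enumerate(pairings):
        if i > 0:
            time.sleep(delay_seconds)  # space out requests between rows
        if not disable_pairing(track_gid, mbid):
            failed.append((track_gid, mbid))
    return failed
```

Passing a stub callback that only records its arguments gives a dry run that shows exactly which pairings would be touched before anything is changed.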

Here are three random selections that would break a bot.

Bots don’t know about alternate track names. They don’t know about typos. They don’t know about extra spaces. AcoustID tracks f6079a31-af81-4be5-85ea-6077e793d02b, dee6c78e-c869-454e-bf79-085694c405e0 and 976d22cd-59c1-42ac-bd53-0f4c5353c595.

All are examples of the same track with different names, but not similar enough to be merged.

Geeky Pink Floyd knowledge is needed here to know they are the same. This gets far worse in the Classical arena.

If the tool is manual, then it speeds things up, but too often it can be confusing. What is “correct” here? AcoustID track e54b8c32-e1a6-4862-8b7e-8ca2ccd27c14. Without listening, we don’t know: the duration matches the first recording, but the title matches the second.

Though, on the flip side, I also see that removing too much can be better than keeping bad matches. Good data should come back to replace it.

In many cases I have been using these reports to fix up releases. I’ll follow a reported recording and look at the release as a whole. It then often stands out when a bad CD has been attached, and a full set of duff links can be removed. But this relies on fan knowledge.

None of those examples are listed in Report 1. For starters, they are all within range of the fingerprint length, so they are not in the reports I am suggesting fixing by bot.

Most reports don’t actually seem to have many classical tracks listed.

I have now updated the new report so it doesn’t show potential bad pairings that have already been disabled; it now has 123,418 records.

They are all in Report 1, page 5. I followed your link.

I haven’t had time to go and find ones with big time differences, because I have already corrected most of the bands I focus on.

The Prince example also occurs with wider time differences elsewhere. It is a real question: how do you know which is the correct one without having that music to hand? Is it the one that matches the length, or the one that matches the title?

The AC/DC example is also in your report (AcoustID track 03432e70-1e0b-4803-835e-ad737039cb42). The difference is almost 30 seconds, but it is just going to be a live track with less crowd noise.

Maybe I am looking at the data too closely. It is a thing I do. I look at patterns. No one else seems to be commenting. But I would suggest you get a response from the owner of the database before you delete thousands of links by an automated process.

Your redone report is still working on old data. On page 86 you will find Pink Floyd/Astronomy Domine tracks that were removed in January. It also shows that a 30-second difference on a 25-minute track is a bit misleading (Pink Floyd/Atom Heart Mother or Pink Floyd/Atom Heart Mother).

(Sorry if this is all badly worded babble… complex day here, bad head. I probably shouldn’t be looking at this stuff, as I don’t think I am making myself clear, or I am just confused and talking rubbish.)

In most cases, yes. Then they both get attached to the same Recording. They’re both correct, even though you have a huge duration difference. Therefore, you can’t make an automatic judgment based on duration.

Totally agree. Speed of playback is just mastering, and MB treats mastering as unimportant and groups all masterings as the same Recording. It is the same tape played at different speeds. (The same goes for crowd noise chopped off by an early fade.)

I don’t think that applies to the discussion above, though. This thread is more about spotting real errors in the data.

Sorry, okay, the first two are, but the third is not.
For the first one I would say both are invalid because they are 20 seconds too short. I don’t really buy the idea that it is valid to match a song 20 seconds shorter because it has crowd noise at the end. MusicBrainz should record the length of the audio; if the audio it links to is 20 seconds longer, then it is different audio.

The Prince example is an awkward one, but I would say the correct one is the one that matches on length.

The Atom Heart Mother example is not using old data; they are both listed because neither matches the length of the fingerprint.

I’m not sure the post above is consistent with your claim.

But this is very much a different discussion for a different thread. All I am doing is parroting the MB guidelines. I see the logic behind the way they do things, but I don’t necessarily agree with it for my own collection. All I am doing is working within the rules of this website. Personally, I keep my recordings in separate folders.

MB is not implying that they are identical, just that they come from the same recording session and can therefore be grouped together so they can share related data. Any version with a different edit is kept separate.

MB has their own definition of “Recording” that is unique to this website. It has Pros and Cons that would be better debated in a new thread, but I doubt it would change much as it underlies so much of how the site is structured.

These can often be messy if they are based on typed-in data: for example, if someone has entered what is printed on the LP and then submitted AcoustIDs from their ripped files. Also, 30 seconds is less than 2% of a 30-minute track.
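
The arithmetic behind this point: the same absolute gap is a very different fraction of a long track than of a short one, which is why a fixed 30-second threshold reads differently for a 25-minute Pink Floyd piece than for a pop single. A quick Python check, with illustrative track lengths:

```python
def relative_difference_pct(diff_seconds: float, track_seconds: float) -> float:
    """Express a duration difference as a percentage of the track length."""
    return 100.0 * diff_seconds / track_seconds


# The same 30-second gap on a 30-minute track and on a 3:50 single.
for label, length in [("30:00 track", 30 * 60), ("3:50 track", 3 * 60 + 50)]:
    print(f"{label}: {relative_difference_pct(30, length):.1f}%")
```

This prints about 1.7% for the 30-minute track and about 13% for the 3:50 one.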

Sorry, I’ll go away. I have seen the damage bots do, and we get left to clear up the mess. I work in IT and am not a fan of automation, as it does not always fix the issues. I’ll leave the discussion to others.

Hi, thanks for your input. It’s good to have your expert input on this, but I do think the issues you raise are so rare that they would affect a minuscule number of records. Maybe because you are focusing on Pink Floyd, which you have already fixed, the obscure issues seem more common than they actually are. If you were to take some random artist in the report, I think it would be pretty obvious in almost all cases that the pairings are wrong, without needing any knowledge of the artist.

I take your point about the MB guidelines, but if a particular live recording is released on two different albums, and on one of them the original recording is edited to be shorter, surely they would be stored in MB as two different recordings rather than merged into one?

It seems to me that if the recording time doesn’t match, it is more likely because the track was actually matched to the wrong release, i.e. there is a version of the bootleg out there that matches the submitter’s track lengths, but MusicBrainz doesn’t have that version, so the AcoustID is linked to a wrong (but very similar) version of the recording.

If you look at the AC/DC example, the shorter version that doesn’t match the fingerprint length links to the video recording “Dirty Deeds Done Dirt Cheap” by AC/DC, and the four releases it appears on are all video formats. So in this case the AcoustID is wrong: it has been matched to the shorter video of the Donington concert when it should only be matched to the correct-length, audio-only album.

The Prince example is a dilemma. I had already gone through the Prince matches myself and wasn’t sure what to do about this one, which is why it is the only one left in the report (plus one other that I have now fixed).

The situation is that I have spent some time creating these reports, but there is a lack of interest in fixing the issues in them, and I certainly do not have the time to do it all manually myself. If a very small number of valid matches are disabled by an automated tool, is that such a problem?

BTW, I have emailed Lukas, raised issues, and discussed AcoustID on this forum, but have had no response from him on anything AcoustID-related.

I agree with removing bad links with a bot. Some good links will be deleted, but that is much better than all the bad links staying in the DB.
Good data should come back to replace them.

If they’re the same master and the only difference is where a fadeout at the end is placed (and/or the amount of silence at the beginning or end), then it’s the same Recording in MusicBrainz terms. While not common, it’s also not unheard of for track lengths within a Recording to vary by more than 30 seconds and still be perfectly valid.

Would I be right in thinking this mostly applies to live recordings? If so, I could filter by releases with the Live secondary type.

It must be pretty rare, because to merge two recordings someone would need to have both albums to compare, be interested enough to merge the recordings, and be confident enough to do it. As I understand it, merging recordings is discouraged unless you are really sure, because once merged, two recordings can’t be unmerged.

I have improved the report headers so you can see the good artist matches at the top of each page.

I have tightened up the reports with a good fingerprint match/bad fingerprint match to enforce that the good match must actually be a good match with respect to fingerprint length. Then, after going through some of these modified reports manually and very carefully, I found very few pairings that should remain linked.
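
A sketch of what "the good match must actually be a good match with respect to fingerprint length" could look like as a filter. The `Candidate` fields, the 10-second tolerance and the rule that the suspect link must fall outside that tolerance are all assumptions for illustration; the actual report criteria are not given in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One report row: an AcoustID with its 'good' and 'suspect' recording lengths.
    Field names are illustrative, not the report's actual schema."""
    fingerprint_seconds: int
    good_match_seconds: Optional[int]
    suspect_match_seconds: Optional[int]


def fits_fingerprint(recording_seconds: Optional[int], fingerprint_seconds: int,
                     tolerance_seconds: int = 10) -> bool:
    """True if the recording length is within tolerance of the fingerprint length.
    The 10-second default is a hypothetical value."""
    if recording_seconds is None:
        return False
    return abs(recording_seconds - fingerprint_seconds) <= tolerance_seconds


def keep_in_tightened_report(row: Candidate, tolerance_seconds: int = 10) -> bool:
    """List the row only if the 'good' link really fits the fingerprint length
    and the suspect link does not."""
    good_ok = fits_fingerprint(row.good_match_seconds, row.fingerprint_seconds,
                               tolerance_seconds)
    suspect_ok = fits_fingerprint(row.suspect_match_seconds, row.fingerprint_seconds,
                                  tolerance_seconds)
    return good_ok and not suspect_ok
```

Rows where the "good" link is itself off by more than the tolerance drop out, which is the kind of case that needs a human rather than a batch unlink.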

Since the Albunack report usually has enough information to let us disable the pairing, I found a way to add an Unlink button that disables the link without having to go to the intermediate AcoustID page. There is no need to add a “disabled” comment, since it is not shown anyway and is not read by anyone. I have also added an Unlink by Artist button that unlinks all the links on the page that have the same value for Good Artist. This speeds up deletion, but please note that I had to enforce a 1-second delay between rows, and this will tie up the browser tab showing the report page until it completes. It does not affect other tabs (tested on Firefox).
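
The Unlink by Artist behaviour boils down to selecting every row with the same Good Artist value and then working through those rows one at a time with a fixed gap. The real buttons run in the browser; this Python sketch only illustrates the selection and the cost of the 1-second spacing, and the row keys are hypothetical.

```python
from typing import Dict, List

# Each row is a dict with illustrative keys; the real report's markup differs.
Row = Dict[str, str]


def rows_for_artist(rows: List[Row], good_artist: str) -> List[Row]:
    """Select the rows an 'Unlink by Artist' click would act on: every row
    on the page whose Good Artist value equals the clicked one."""
    return [row for row in rows if row["good_artist"] == good_artist]


def total_wait_seconds(row_count: int, delay_seconds: float = 1.0) -> float:
    """Minimum time spent waiting when requests are spaced delay_seconds apart
    (no pause is needed before the first row)."""
    return max(row_count - 1, 0) * delay_seconds
```

With 60 matching rows, for example, the tab would be busy for at least 59 seconds plus the time the requests themselves take.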

I have now gone through the various reports that have a good match with multiple submissions and a bad match with only one submission, and removed most of those pairings. So it should now be relatively easy to keep on top of this as new bad data is added.

I have also gone through the “MB Recording IDs linked to at least 50 AcoustIDs” report and removed the links that have at least a one-minute difference between the fingerprint and the MusicBrainz recording.
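
That clean-up combines two conditions: the recording has at least 50 distinct AcoustIDs linked to it, and an individual link is at least one minute away from the fingerprint length. A Python sketch of the selection; the `Link` structure is hypothetical and durations are assumed to be in seconds.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Link(NamedTuple):
    """An AcoustID-to-recording link with both durations in seconds (illustrative)."""
    acoustid: str
    mbid: str
    fingerprint_seconds: int
    recording_seconds: int


def links_to_remove(links: List[Link], min_acoustids: int = 50,
                    min_diff_seconds: int = 60) -> List[Link]:
    """Links on heavily linked recordings whose length is at least a minute off."""
    # Count distinct AcoustIDs per recording, not raw link rows.
    acoustids_by_recording: Dict[str, Set[str]] = defaultdict(set)
    for link in links:
        acoustids_by_recording[link.mbid].add(link.acoustid)
    return [
        link for link in links
        if len(acoustids_by_recording[link.mbid]) >= min_acoustids
        and abs(link.fingerprint_seconds - link.recording_seconds) >= min_diff_seconds
    ]
```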

I then split the reports that only have a bad match (and no good match), with a MusicBrainz link more than 30 seconds adrift of the AcoustID fingerprint, into multiple reports:

Acoustid links to songs that vary by at least 4 minutes from fingerprint length
Acoustid links to songs that vary by at least 3 minutes from fingerprint length
Acoustid links to songs that vary by at least 2 minutes from fingerprint length
Acoustid links to songs that vary by at least 1 minute from fingerprint length
Acoustid links to songs that vary by at least 30 seconds from fingerprint length
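
One way to split links into those reports is to place each one in the highest threshold its duration difference reaches, so a link 5 minutes adrift appears only in the 4-minute report. That one-report-per-link rule is an assumption (the reports could equally be cumulative), and `report_for` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Thresholds in seconds, largest first, matching the report titles above.
BUCKETS: List[Tuple[int, str]] = [
    (240, "at least 4 minutes"),
    (180, "at least 3 minutes"),
    (120, "at least 2 minutes"),
    (60, "at least 1 minute"),
    (30, "at least 30 seconds"),
]


def report_for(fingerprint_seconds: int, recording_seconds: int) -> Optional[str]:
    """Pick the single report a link belongs in: the largest threshold its
    duration difference meets. Returns None when the gap is under 30 seconds."""
    diff = abs(fingerprint_seconds - recording_seconds)
    for threshold, label in BUCKETS:
        if diff >= threshold:
            return label
    return None
```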

I have cleared out most of the links where the difference is over 4 minutes.

It seems unlikely to me that a link can be valid if there is a difference of more than 3 minutes between the AcoustID and the recording, but I have left these alone for now in case anyone has any input on this.

All reports are listed here
