Report showing AcoustIDs likely to be bad links to MusicBrainz recordings

I didn’t expect each to get a reply. :smiley: It was a list putting some of the oddities together to help refine the report (or reports). I am not knocking your excellent work, just showing examples that break the pattern to help improve the report script. As you noticed, many of those have their own patterns.

Maybe multiple types of smaller reports would give better focus. Smaller subsets would also get more eyes looking, I expect.

I am no fan of automatic bots making decisions, due to the error rate that gets introduced. I don’t like @aerozol’s idea of deleting all of one user’s data because of a few errors. That could only work if it could focus on a date range that can be checked. Picking on people is the wrong focus.

Unless it is an auto bot that removes all those random Stephen King audio book additions. :rofl:

No problem, I realized that.

I may well do that when I have improved the existing report; if you have suggestions for specific reports I can consider them.

AcoustID.org adds more than 10,000 AcoustIDs daily to its database. Do you expect that @IvanDobsky and others will process your reports from now on for all eternity?
What I meant was: Acoustid.org should fix it at the source right after submission, or even reject obviously wrong submissions outright.

1 Like

By creating and refining these reports it would give AcoustID something to look at to help that out. The trouble is that AcoustID has problems knowing what is good or bad data. For example, there are many lone AcoustIDs with no data attached - these are especially confusing to spot. If it is a first fingerprint you have nothing to compare to. If the length doesn’t match the MB track, then how do you know the error isn’t on MB?

Messy

1 Like

No, I agree that AcoustID could do a lot more to improve their data quality, and I would prefer not to have to spend this time creating reports and then using them. I’m just saying that at least I have come up with a partial solution that helps everyone, whereas your solution only helps Picard users and wouldn’t work in many cases either; if it was easy to do I would have already done it in SongKong.

As for processing reports to eternity, the fact is MusicBrainz editors are already correcting errors in the MusicBrainz database for eternity; this is just another tool to help.

1 Like

That is NOT my suggestion :rage:

That discussion has been moved here if anyone’s wondering what’s being referenced.

2 Likes

Done some more work on this report

I have incorporated work name (where it exists) into the report creation, so if multiple MB recordings for the same AcoustID resolve to the same work name then they won’t be considered a potential bad match if their simple names vary.

And I have split the results up; the first report now shows only those cases where the artist name is the same but the song name is different. The chances of the same artist having two actually different songs that resolve to the same fingerprint are virtually zero, so these are almost definitely bad pairings. The only cases where they are not will be when they are actually just different names for the same song that the report has not managed to filter out.

Then the other reports are where both the artist name and the song name are different; these may be valid, but since they have only one source it is more likely to be bad data. I have split these into 4 different reports based upon how many data sources the good link has.

Lastly, we have the original report; this contains more records than the others put together, and in my testing most seem to be bad matches, so I have kept it.

Note: any links to AcoustID you visited in the old report should still show as visited in the new reports. All these reports are listed at http://www.albunack.net/reports.jsp

4 Likes

New report added that shows AcoustID links to at least 5 different song names; these are not just different recordings but recordings with different names.

http://reports.albunack.net/new_acoustid_report6_1.html

Interesting… but hard to fix ones like this: Track "e026986d-943c-4108-af54-c8b1e60da12a" | AcoustID

And goes to show that The Beatles are a random mess of randomness. (I had not looked at them before)

Also doesn’t surprise me to see plenty of Classical in there, as the names vary a lot.

Good news: every record in all the reports is now checked against the live AcoustID database, and if the offending pairing has already been disabled then the record is removed from the report.

So as of this moment the report should only show rows that still have the potential problem.
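For anyone curious how such a live check might look, here is a minimal Python sketch. This is not ijabz’s actual code: the lookup-by-trackid call and the JSON response shape are my assumptions based on the public AcoustID web service, and `client_key` is a placeholder you would replace with a real API key. The network fetch itself is left out; the sketch only builds the URL and inspects an already-parsed response.

```python
import urllib.parse

# Assumed public lookup endpoint of the AcoustID web service.
LOOKUP_URL = "https://api.acoustid.org/v2/lookup"

def build_lookup_url(client_key, track_gid):
    """Build a lookup URL for one AcoustID track (client_key is hypothetical)."""
    query = urllib.parse.urlencode({
        "client": client_key,
        "trackid": track_gid,
        "meta": "recordingids",
        "format": "json",
    })
    return f"{LOOKUP_URL}?{query}"

def pairing_still_active(response, mbid):
    """Return True if mbid still appears among the track's linked recordings.

    `response` is the parsed JSON, assumed to look like
    {"results": [{"id": ..., "recordings": [{"id": <mbid>}, ...]}]}.
    """
    for result in response.get("results", []):
        for recording in result.get("recordings", []):
            if recording.get("id") == mbid:
                return True
    return False
```

A report generator could call `pairing_still_active` for each row before emitting the HTML, dropping rows whose suspect pairing has already been disabled.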

4 Likes

Just updated the reports so they have latest Acoustid/MusicBrainz data

Good to see we are making steady progress; for example, the Multiple recordings for same artist with different name report has gone down from over 21,000 to just under 14,000 entries, so thanks to everyone who has helped with this.

3 Likes

Added the average fingerprint length (which can span 7 seconds) and the MB recording lengths to the AcoustID reports.

I have been going through the first report, which shows AcoustIDs linked to multiple songs with different names by the same artist; this is clearly wrong in almost all cases. However, when there is little difference in track length and number of sources between the good match and the suspect match, it is not always so clear which is the bad one.

So I have now split the report into:

Multiple songs by same artist, song length doesn't match fingerprint length

Multiple songs by same artist, song length matches fingerprint length

The first report has the easiest cases, and I think there is a strong case for fixing these automatically (I don't know if it is technically possible to do this); the second report has the more difficult cases.
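The split described above can be sketched in a few lines of Python. The 7-second tolerance comes from the earlier note that the average fingerprint length can span 7 seconds; the function and field names are hypothetical, not taken from the actual report code.

```python
# Tolerance in seconds, per the note that fingerprint lengths can span 7 seconds.
FINGERPRINT_TOLERANCE_SECS = 7

def length_matches(recording_length_secs, avg_fingerprint_secs,
                   tolerance=FINGERPRINT_TOLERANCE_SECS):
    """True if the MB recording length is within tolerance of the fingerprint length."""
    return abs(recording_length_secs - avg_fingerprint_secs) <= tolerance

def split_by_length(pairings):
    """Split (recording_len, fingerprint_len) pairs into the two sub-reports:
    returns (mismatched, matched)."""
    mismatched, matched = [], []
    for pairing in pairings:
        bucket = matched if length_matches(*pairing) else mismatched
        bucket.append(pairing)
    return mismatched, matched
```

The mismatched bucket corresponds to the easier first report, where the length difference alone points at the bad pairing.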

1 Like

Added one more report - http://reports.albunack.net/new_acoustid_report8_1.html

Shows cases where the linked MusicBrainz recording is at least 30 seconds different to the fingerprint length (and not covered by earlier reports), so it cannot really be a valid match.

This gives us 132,000 matches.

This is not going to be cleared manually, so my question is whether it is possible to write a bot to remove the pairs from AcoustID. I think the entries in http://reports.albunack.net/new_acoustid_report1_1.html would be safe to delete.

It is probably possible to write a bot, but it is not something I’m familiar with; my first issue would be how to log in to AcoustID programmatically. If anyone can help with this, that would be great.

1 Like

There’s at least one controversial MusicBrainz guideline that can explain why this may be:

The original recording Don’t Push appears on tracks which vary in length from 3:45 to 3:55 because they have been mastered at different speeds. Therefore, the pitch of the audio is different on these tracks, but because there is no difference in mixing they are considered the same recording.

The more reasonable MusicBrainz guideline of merging recordings with differing amounts of silence at either end of the track can also explain this.

1 Like

There is no AcoustID API endpoint for it, but you can link to a URL such as

https://acoustid.org/edit/toggle-track-mbid?track_gid=5c3e6fbc-0bea-4e51-8d73-c562f5617b96&mbid=27a9018a-5132-4ed4-9979-f2779885bb15&state=1

where track_gid is the AcoustID track ID and mbid is the MB recording ID. I think you could add such links to your report. Also, maybe link the recording names in the Good Match / Suspect Match columns to the corresponding recording ID.
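Generating those links for every report row could look something like this Python sketch. The endpoint and the track_gid/mbid/state parameters are taken from the example link above; I am assuming state=1 carries the same meaning as in that example, and the function name is my own.

```python
import urllib.parse

# Toggle endpoint from the example link in the post above.
TOGGLE_URL = "https://acoustid.org/edit/toggle-track-mbid"

def toggle_link(track_gid, mbid, state=1):
    """Build the web link that toggles one AcoustID/MBID pairing.

    track_gid: the AcoustID track ID; mbid: the MB recording ID;
    state: assumed to mean the same as in the example link (state=1).
    """
    query = urllib.parse.urlencode({
        "track_gid": track_gid,
        "mbid": mbid,
        "state": state,
    })
    return f"{TOGGLE_URL}?{query}"
```

Embedding `toggle_link(row.track_gid, row.mbid)` as the href of each suspect-match cell would make each row a one-click fix, though the user still has to be logged in to AcoustID in their browser.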

1 Like

Ah yes, I meant to do this; I will add it. This will speed up manual editing a bit.

The good match/suspect match columns group together similar names (but different MB recordings), so it is not so easy to provide a link; the link can be grabbed from the corresponding entry on the AcoustID page.

I would have assumed that if something has been mastered at two different speeds, then it will create different AcoustIDs. I work with bootlegs and have seen stuff like this, and am pretty sure the AcoustIDs change.

In a similar vein, it is common in the bootleg world to chop 30 seconds off the end of a recording due to crowd noise.

Just found an example of that: https://acoustid.org/track/03432e70-1e0b-4803-835e-ad737039cb42 Obviously the same track from the same concert, but I guess with less crowd and/or chat on the end.
Still makes the same AcoustID.

@ijabz I see you are still working with very out-of-date data. I just did a quick dip in, and a sample of just six all came up as “already fixed”. Would not want to be running any bot until the data has at least caught up. (And you know I am no fan of bots anyway - too many errors.)

Oh - and 30 seconds is very short in the Pink Floyd world. The ones I quick-checked were 12 to 20 minute long tracks. And I tried another dozen - and all are fixed.

Sorry, I’m just fixing this for the new report now. It is due to an error in the data feed provided by AcoustID: it doesn’t show the disabling of all pairings, so I have to use the API to check each value in the report table that feeds the HTML report before creating it. This might take a few hours.

The data for all the other reports should only be 1 day out of date.

2 Likes

I realise it may take a few hours. Trouble is, a report in that state would waste too many hours checking data that had already been fixed. Thanks for re-doing it.

I still poke in to these on bored evenings, but focus on artists I know.

Classical Music is a big puzzle in there. Track names change so much. Don’t know how that would be fixed with a bot.

It should always be run automatically before publishing the report; I just forgot about it.

So when I say bot, to be clear I just mean a tool that, if provided with a list of pairings, could log in to AcoustID and then disable each pairing. So for now I would only run it on report 1, because these seem to be always wrong; I don’t see a trackname problem with these.

So the bot is not having to work out what to disable; that work has already been done. It just takes away the manual effort of clicking on buttons in the report.

1 Like