IDs that are already there


#1

I tried looking for this but did not find a match. I was wondering if it is possible to do something similar to what AccurateRip does, where although the ID might already exist, you can still submit it and add +1 to the confirmation count. I can understand the logic in not allowing submission of IDs that are already there, but I can see a great use for a confirmation count as well. Rather than explaining the logic in full, it is basically the same as AccurateRip’s logic on this topic. The need for this shows when there are multiple acoustIDs for a single recording. I would be happy to go into more depth on this should anyone want or need it.

For those who do not know, or do not want to look, here is a summary of that:
AccurateRip
Over time AccurateRip can become like a wise friend, someone you can rely on and trust. It works by storing people’s ripping results and comparing your result with theirs. For example, 100 people rip Madonna’s latest CD; of those 100, twenty have errors and the other 80 all have identical rips. If you were to rip your Madonna CD there are 2 possibilities: AccurateRip would report that 80 other people agree with your rip (confidence of 80), or that 80 disagree if yours had errors. What are the odds of 80 people agreeing with your rip when they really had a bad rip (i.e. those 80 people had bad rips which happened to give the same check code)? The odds are 4 billion x 4 billion (repeated 80 times), an astronomical number. If more than 3 people agree with your rip, it is a 100% certainty that it is accurate.
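
To put a rough number on the quoted odds, here is a back-of-the-envelope sketch. It assumes the check code is a 32-bit value (which is where "4 billion" would come from) and that bad rips produce independent, uniformly random codes. Both are simplifying assumptions for illustration, not a description of how AccurateRip actually computes its confidence.

```python
import math

CODE_BITS = 32  # assumption: a 32-bit check code, i.e. about 4 billion values


def log10_odds_all_match(n_rips: int, bits: int = CODE_BITS) -> float:
    """Return log10 of (2**bits)**n_rips, the '4 billion x 4 billion ...' figure."""
    return n_rips * bits * math.log10(2)


for n in (1, 3, 80):
    # Printed as a power of ten because the raw number is far too large to read.
    print(f"{n} matching rips: about 1 in 10^{log10_odds_all_match(n):.1f}")
```

Working in logarithms keeps the result readable; the exact exponent matters much less than the fact that it grows linearly with the number of agreeing rips.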


#2

AIUI: A downside is that a release can have multiple Disc IDs, all of which can be correct.
If one of those Disc IDs is far more common than the others and so gets to, say, +80, then new Picard users with rare Disc IDs that only have a score of +1 will worry unnecessarily, and may be motivated to make duplicate releases in the MB database to “correct” the “problem”.

This, AIUI, is different to AccurateRip where every rip of a release (even if they have different Disc IDs?) is meant to be identical.


#3

I think we may have crossed here. I am talking about AcoustIDs, not DiscIDs. The acoustID should be the same for all disc IDs from what I understand, correct?


#4

ahhh - Now I understand.
What follows is not coming from a place of deep understanding:
The same Disc ID should give the same AcoustID.
And, I think, different Disc IDs can give the same AcoustID.

(I just had a quick look through the documentation and it seems that a Release is defined by its Cover Art. A consequence of this approach seems to be that a Release might consist of multiple versions with different track lengths but the same Cover Art.
Is this the case?)


#5

Well, yes and no. The release is determined by a combination of barcode, catalog number, release country and date, etc. So there should not be a case of a Release containing multiple versions of a track, as that would make a different Release, but the same Release Group.

I just worked on a release like this, where the original release had the original recording and a year later there was a re-issue with a replacement version of one recording on it. So the releases are the same except for that one recording: one is the original album version and the other is a radio edit, and they differ by about 20 seconds.

Now, the issue, working from what you wrote, is that a Recording can be used on different Releases, giving that Recording multiple potential Disc IDs to associate with. Also, this could be a digital release where there is no Disc ID at all. On digital, we can also have FLAC, MP3, AAC, etc., and I do not know whether they will produce different acoustIDs or not.

But where the issue I discuss here shows is that a Recording is often seen in the database with more than one acoustID. Why they are different I cannot say, only speculate. Maybe it is a different encoder for digital, maybe someone is submitting wrong/bad data, a different disc ID, etc. So, let’s say that I look up a digital release. I see my calculated acoustIDs and I see one that differs. There are 100 with ID A and 3 with ID B. This can raise a flag for me to check the recording and make sure it is in fact correct. With iTunes, for example, there are many cases of a recording being titled “My Sample Song” when in reality it is “My Sample Song (My Remix)”. So when I look, I think I am looking at the non-remix, but it is the remix. The acoustID can tell me there is a difference here, but if it is just a list of 2 IDs, it does not show me that. Does this make sense?

So I could then look and see that acoustID 12345 is used here _____ 100 times and acoustID 54321 is used 3 times here _____. I may then be able to see what I really have and avoid a crossing of recordings that are actually different. So like the ripping I mentioned above, there I can see if my CD rips a little off compared to others, which is actually quite common and possible. The difference there might be nothing as I could be correct, it is just a visual confirmation / warning of my results. If I rip with 100 confirmations, there is good chance all is good. If I rip with 2, that is a lower confidence factor.


#6

I think you’ve made a good case for MB to record/report the number of times an acoustID has been submitted for a given Release.


#7

Just to be sure to understand your suggestion:
If you look at Madonna’s album Rebel Heart:


and the fingerprints for the first song “Living for Love”:

and then click on the different fingerprints for this song, you can see on the acoustid.org webpage that there are many more “Sources” for the first fingerprint 0380022c-4fe7-47e4-b6a1-7b1ff90e95f0 than for all the others.

Are you thinking of something like this number of Sources?


#8

Partially, yes. But specifically what I refer to is the number of submissions that match the fingerprint. Example: I just used Picard to enter a release. When I went to submit the fingerprints, I could not do so. From my understanding, this is because the fingerprints are already there, which I was easily able to confirm. So as a minimal example, because they exist, that is a count of 1. I am proposing that I be able to submit mine, making the count 2. This would apply to each fingerprint, so I could compare my release to the other submissions in more detail. I may see that tracks 1-8 have the same fingerprints as almost all the other users. But when I get to track 9, the fingerprint with the majority count is not what I have. Maybe mine is not even there, or it is there but with a count of only 5 compared to the majority one, which could be 50. This tells me that my release, which I believe to be correct, matches only 5 of 55 other users.

I hope that makes sense. What this tells me is not that my release is wrong, but that for some reason it does not match 50 of the 55 total, although 5 others had it. So maybe I actually have some kind of special release (like Echosmith, where a year after the first release there was a re-issue changing one recording from album version to radio edit, with the same release title and recording name), I might have an invalid release or a bootleg of sorts, I might have mixed up a recording and it is just totally wrong, etc.
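
As a rough sketch of the count idea, assuming per-fingerprint submission counts were available (they are not exposed this way today, and the names `agreement` and `track9` are made up for illustration), the track 9 case could be summarised like this:

```python
def agreement(counts: dict[str, int], mine: str) -> tuple[int, int]:
    """Return (submissions matching my fingerprint, total submissions)."""
    total = sum(counts.values())
    # A fingerprint nobody has submitted yet simply counts as zero matches.
    return counts.get(mine, 0), total


# Hypothetical counts for track 9: one majority ID and one rare ID.
track9 = {"majority-id": 50, "my-id": 5}

matching, total = agreement(track9, "my-id")
print(f"{matching} of {total} submissions match my fingerprint")
```

The result is only a distribution to look at. A low share is a prompt to double-check, not a verdict that the rip or the release is wrong.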

I like the Echosmith example because at iTunes the recording “Cool Kids” retains the same title on both the original and the re-issue. So by looking at the track listing, you will not see a difference. But if you look at the recording durations, you will see a 20 second difference. Since it is MB policy to list recordings as they are on the release, the real name of the recording, “Cool Kids (Radio Edit)”, is not used, just “Cool Kids”, like the original, because that is how it is listed on the release.


#9

This portion specifically, yes I do see value in that, but that is a bit off focus for this and not what I am suggesting exactly.


#10

What is wrong with AccurateRip?
I can see the whipper CD ripper does support looking up information, which would suggest they have an API that third-party software can use.

AcoustID is not a database for testing the accuracy of your recordings. It has been designed to match similar recordings, so a bad rip would match a good rip.

What you really want to do is get a hash of the recording part of the file and be able to query whether the file matches.
Good thing we have this database with over 1M recordings with an md5sum of this data: acousticbrainz.org
As there are already plugins that query the acousticbrainz web service, it should be easy enough to check the md5sum and see if it matches the md5sum on record.
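
A minimal sketch of that kind of check, assuming you already have the audio payload as bytes (with tags stripped) and a reference MD5 to compare against. How the payload is extracted and how the reference value is fetched from a web service are deliberately left out; `audio_md5` and `matches_reference` are illustrative names, not part of any existing plugin.

```python
import hashlib


def audio_md5(audio_bytes: bytes) -> str:
    """Hex MD5 digest of the audio payload only (no tags or cover art)."""
    return hashlib.md5(audio_bytes).hexdigest()


def matches_reference(audio_bytes: bytes, reference_md5: str) -> bool:
    """True if the payload hashes to the reference value (case-insensitive)."""
    return audio_md5(audio_bytes) == reference_md5.lower()


# Stand-in data: real use would pass the actual audio stream of a file.
good = b"\x00\x01\x02\x03" * 1024
reference = audio_md5(good)
damaged = good[:-1] + b"\xff"  # same length, last byte altered

print(matches_reference(good, reference))     # True
print(matches_reference(damaged, reference))  # False
```

Unlike a fingerprint, a hash like this changes if even one byte of the audio differs, which is what makes it suitable for exact-match checks rather than similarity matching.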


#11

I think I might be explaining poorly…

There is nothing wrong with AccurateRip, aside from the fact that it is useless with digital releases. And you are correct: Whipper, what it replaced, and others do utilize that system. But for this, it does not even apply. The only reason I mentioned it is to show an example of a system that counts submissions, where AcoustID does not.

“AcoustID is not a database for testing the accuracy of your recordings. It has been designed to match similar recordings, so a bad rip would match a good rip.”

That is perfect and fits into my point. By having a count tally of the IDs submitted, it can further illustrate this. In MB, when I view a recording, I can see the acoustIDs in a list. Sometimes there are none, sometimes one and sometimes more than one. It is my opinion that having a count next to those IDs would be useful. Currently if I use Picard, I can only submit IDs if they are not already there it seems. What I am suggesting is that I be able to submit and it be recorded as a +1 to the count, or a +1 to the user verification, which in turn can create a confidence rating. I have seen a few releases for example where tracks have been swapped, etc. This can cause an invalid ID submission.

I am hoping this makes sense. I fear not, as responses keep going back to attributes of physical media, when this is not related to the medium but to the recording itself, whether on a CD, digital, vinyl or 8-track.


#12

Here is an example of a recording I just came upon with 4 acoustIDs.


#13

Looking at the Discogs page for the originating RG of Casablanca (Jay in the Mix), I’d think that those 4 AcoustIDs probably came from different Releases.

Does the MB definition of a Recording mean that every instance of a specific Recording should have the same AcoustID though?
I think not, when I read Style/Recording: “Following on from this, separate recordings should not be created for remastered tracks, since remastered tracks generally feature the original recording with different mastering applied.”

To use a count as a confidence rating we’d need to know the number of possible re-masters (or other legitimate causes for differing AcoustIDs?).
Once there were more than that number of AcoustIDs, confirmatory counts would be useful, I think.
So if there was only one form of a Recording, counts for different AcoustIDs would seem very useful, as AcoustIDs that didn’t match would indicate either a misattribution or some other error.

Have I got the area that you are considering correct?
Does what I write address directly what you have written?


#14

Hmm, I think even in that case it would be useful:

Recording A: 526 AcoustID submissions
Recording B: 349 AcoustID submissions
Recording C: 3 submissions

This example still gives the user useful information, even with multiple correct AcoustIDs and without knowing the number of possible remasters. We can assume that A and B are quite likely to be correct (e.g. one might be a remaster), whereas C is suspect. I would not necessarily use it as a good reason to unlink AcoustID C, which might still be correct (perhaps just an uncommon version), but it is still useful.
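
To illustrate with the numbers above, here is a small sketch that marks the IDs whose share of submissions falls below a cut-off. The 1% threshold and the function name `flag_rare` are arbitrary choices for the example, and a flagged ID is only a hint to look closer, not proof that it is wrong.

```python
def flag_rare(counts: dict[str, int], min_share: float = 0.01) -> list[str]:
    """Return the IDs whose share of all submissions is below min_share."""
    total = sum(counts.values())
    if total == 0:
        return []  # nothing submitted yet, so nothing to flag
    return [key for key, n in counts.items() if n / total < min_share]


# Submission counts from the example above.
submissions = {"A": 526, "B": 349, "C": 3}

print(flag_rare(submissions))  # ['C']
```

Note that B is not flagged even though it is clearly in the minority: two well-supported IDs can both be legitimate, which is exactly the remaster case.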

I guess my question is, why not allow multiple submissions?


#15

Somewhat, yes. I do understand what you are thinking, but reading aerozol’s reply is my point here exactly. I hate to assume what others are thinking, but it seems like many are looking at this as a black/white thing, where I am seeing this as grey. The counts provide a distribution to look at, nothing more.

A crazy example: there is a 1 in 100,000 chance of getting a coin with “Z” on it. So if I get one, I might want to check it to make sure. The reason is that the “counts” would show a low total (for acoustIDs), so I may want to make sure I am correct. It is not an eliminator, but just tells me, the user, that what I have, if correct, is not in the majority.


#16

This is a perfect interpretation of what I am saying.


#17

Hi thwaller, sorry if I give the impression that I’m against the idea.

I think it has many benefits.
And unless some heavy costs can be identified, I think it would be good to institute your idea.

I’ve just been trying to point out a minor cost that could be largely managed by user education.
That minor cost: naive users might mislead themselves by misinterpreting the count data. This would happen easily when the figures in the count data are low and when the user doesn’t understand that multiple AcoustIDs for the same recording can all be entirely correct. The obvious undesirable outcome is that a naive user who misunderstands might then create a separate Recording based on their unique AcoustID. I doubt this would happen frequently, and I suspect that compared to the amount of other erroneous edits this source of errors would be minuscule.

I find merging Recordings difficult in terms of getting enough evidence to have sufficient confidence.
I think your idea would help a lot with that.

Once your proposal is instituted and AcoustID counts are collected, an appropriate search/match of Recordings/AcoustIDs could throw up results that would greatly assist human editors in merging many, many of MB’s Recordings. This would greatly increase the power of the Recordings database. :chart_with_upwards_trend:


#18

I see more of what you mean now. I think there is a minor cost from the unknowing user for most things, and I think you pointed it out here. Another example is labels: so many users add labels that are not actually valid as release labels, simply because they appear as an option in the “labels” list. As a user who has made my share and more of mistakes, I think overall MB could use some user education, but that is a whole different topic.

Regarding this specifically, I think if a mistake had to be made, the better mistake would be to create a new recording, vs using an existing recording that is wrong. It is fairly easy to merge a recording, and a duplicate recording has little impact on editors adding new releases. A recording that is used incorrectly though, where the recording is tied to recordings that are actually different, can cause issues even for those adding new releases and just amplify the problem. Regardless, this is still talking about the best option of a bad thing.

I think too that more explanation and documentation could be helpful with acoustIDs. One example is user accountability. There is no tracking (that is visible to me, the user) of who added what. Submitters all need (I assume) an API key, and I honestly thought that as a user I would be able to see which IDs I had submitted using my key. Ideally, I would love to see the addition of acoustIDs via Picard recorded as an edit like anything else.

I really like the idea of acoustIDs. It took me a while, though, especially since I had issues submitting them with Picard in the past. With those resolved, I am considering submitting acoustIDs as part of any release I add: just like the track titles and durations, if I have it, it should be added. Part of my point is that, due to a lack of understanding and ease of submission, there are many releases I added without the IDs, which is another cost of my having been an unknowing user.


#19

I wanted to share this as a good example of a problem case.

In this album, there are a ton of IDs on each recording, and some recordings even have duplicate IDs, meaning that for example, recording A and B have the same acoustID. There are 7 instances of this on this one release alone… where an acoustID is listed on more than one recording. This is one example where the counts referred to above would come in handy. Additionally, one of the recordings has like 36 IDs. I guess that is possible. Could that actually be correct? Logic would tell me no, but I am still short on a full understanding of these IDs and exactly how they are generated and possible error points.
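
Spotting that kind of overlap by hand is tedious. As a sketch only, assuming you have a mapping from each recording on the release to its listed acoustIDs (the data below is invented, and `shared_ids` is a hypothetical helper), the IDs attached to more than one recording can be found like this:

```python
from collections import defaultdict


def shared_ids(recording_ids: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each acoustID listed on 2+ recordings to those recordings."""
    owners = defaultdict(list)
    for recording, ids in recording_ids.items():
        for acoustid in ids:
            owners[acoustid].append(recording)
    # Keep only the IDs that appear on more than one recording.
    return {a: sorted(recs) for a, recs in owners.items() if len(recs) > 1}


# Invented data: "id-2" is listed on both recording A and recording B.
release = {
    "recording A": {"id-1", "id-2"},
    "recording B": {"id-2", "id-3"},
    "recording C": {"id-4"},
}

print(shared_ids(release))  # {'id-2': ['recording A', 'recording B']}
```

With submission counts added on top, it would become easier to tell on which of the two recordings a shared ID really belongs.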


#20

Please see: Differences on fingerprint softwares/packages

I am trying to provide a solid reasoning for asking the questions that no one is answering. There are real editing scenarios where this information could be useful.