IDs that are already there

Partially, yes. But specifically what I refer to is the number of submissions that match the fingerprint. Example… I just used Picard to enter a release. When I went to submit the fingerprints, I could not do so. From my understanding, this is because the fingerprints are already there, which I was easily able to confirm. So as a minimal example, because they exist, that is a count of 1. I am proposing that I be able to submit mine, making the count 2. This would apply to each fingerprint, so I could look at my release in comparison to the other releases in more detail. I may see that tracks 1-8 have the same fingerprints as most of the other users. But when I get to track 9, the fingerprint with the majority count is not what I have. Maybe it is not even there, or it is there but with only a count of 5 compared to the majority one, which could be 50. This tells me that my release, which I believe to be correct, is only a match to 5 of 55 other users.

I hope that makes sense. What this tells me is not that my release is wrong, but that for some reason it does not match 50 of the 55 total, while 5 did have it. So maybe I actually have some kind of special release (like Echosmith, where a year after the first release there was a re-issue changing one recording from the album version to the radio edit, with the same release title and recording name), I might have an invalid release or a bootleg of sorts, I might have mixed up a recording and it is just totally wrong, etc.

I like the Echosmith example because on iTunes, the recording “Cool Kids” retains the same title on both the original and the re-issue. So by looking at the track listing of names, you will not see a difference. But if you look at the recording durations, you will see a 20-second difference. Since it is MB policy to list recordings as they are on the release, the real name of the recording, “Cool Kids (Radio Edit)”, is not used, but just “Cool Kids”, like the original, because that is how it is listed on the release.

1 Like

For this portion specifically, yes, I do see value in that, but it is a bit off focus here and not exactly what I am suggesting.

What is wrong with AccurateRip?
I can see the Whipper CD ripper does support looking up information, so that would suggest they have an API that third-party software can use.

AcoustID is not a database for testing the accuracy of your recordings. It has been designed to match similar recordings, so a bad rip would match a good rip.

What you really want to do is get a hash of the recording part of the file and be able to query whether the file matches.
Good thing we have this database with over 1M recordings with an MD5 sum of this data: acousticbrainz.org
As there are already plugins that query the AcousticBrainz web service, it should be easy enough to check the MD5 sum and see if it matches the one on record.
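
As a rough sketch of that check, assuming the public low-level endpoint and the `md5_encoded` field are as I remember them (not a tested integration):

```python
# Sketch: compare a local audio hash against AcousticBrainz.
# Assumes the public low-level endpoint and that the returned JSON
# contains metadata.audio_properties.md5_encoded (an MD5 of the
# decoded audio stream, as produced by the Essentia extractor).
import requests

def acousticbrainz_md5(recording_mbid: str) -> str:
    url = f"https://acousticbrainz.org/api/v1/{recording_mbid}/low-level"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()["metadata"]["audio_properties"]["md5_encoded"]

def file_matches(local_md5: str, recording_mbid: str) -> bool:
    # local_md5 must be computed the same way the extractor does it
    # (over the decoded audio, not the raw file bytes), so a simple
    # hashlib.md5 over the file would NOT be comparable.
    return local_md5 == acousticbrainz_md5(recording_mbid)
```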

I think I might be explaining poorly…

There is nothing wrong with AccurateRip, aside from the fact that it is useless with digital releases. And you are correct, Whipper, the tool it replaced, and others do utilize that system. But for this, it does not even apply. The only reason I mentioned it is to show an example of a system that counts submissions, where AcoustID does not.

AcoustID is not a database for testing the accuracy of your recordings. It has been designed to match similar recordings, so a bad rip would match a good rip.

That is perfect and fits into my point. Having a count tally of the IDs submitted can further illustrate this. In MB, when I view a recording, I can see the AcoustIDs in a list. Sometimes there are none, sometimes one, and sometimes more than one. It is my opinion that having a count next to those IDs would be useful. Currently, if I use Picard, it seems I can only submit IDs if they are not already there. What I am suggesting is that I be able to submit anyway, and have it recorded as a +1 to the count, or a +1 to the user verification, which in turn can create a confidence rating. I have seen a few releases, for example, where tracks have been swapped, etc. This can cause an invalid ID submission.

I am hoping this makes sense. I fear not, as responses keep going back to physical media attributes, when this is not related to the medium, but to the recording itself, whether on a CD, digital, vinyl, or 8-track.

2 Likes

Here is an example of a recording I just came upon with 4 AcoustIDs.

1 Like

Looking at the Discogs page for the originating RG of Casablanca (Jay in the Mix), I’d think that those 4 AcoustIDs probably came from different releases.

Does the MB definition of a Recording mean that every instance of a specific Recording should have the same AcoustID, though?
I think not when I read Style/Recording: Following on from this, separate recordings should not be created for remastered tracks, since remastered tracks generally feature the original recording with different mastering applied.

To use a count as a confidence rating, we’d need to know the number of possible remasters (or other legitimate causes for differing AcoustIDs?).
Once there were more than that number of AcoustIDs, then confirmatory counts would be useful, I think.
So if there was only one form of a Recording, then counts for different AcoustIDs would seem very useful, as AcoustIDs that didn’t match would indicate either a misattribution or some other error.

Have I got the area that you are considering correct?
Does what I write directly address what you have written?

Hmm, I think even in that case it would be useful:

AcoustID A: 526 submissions
AcoustID B: 349 submissions
AcoustID C: 3 submissions

This example still gives the user useful information, with multiple correct AcoustIDs, and without knowing the number of possible remasters. We can assume that AcoustIDs A and B are quite likely to be correct (e.g. one might be a remaster), whereas C is suspect. I would not necessarily use it as a good reason to unlink AcoustID C, which might still be correct (perhaps just an uncommon version), but it is still useful.
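
To put rough numbers on “quite likely”, here is a minimal sketch turning those counts into a share of the total; the 1% cutoff is just an arbitrary illustration, not a proposed rule:

```python
# Sketch: turn per-AcoustID submission counts into a rough confidence
# share. Counts are the example figures above; the 1% flag is an
# arbitrary cutoff, not a proposed rule.
counts = {"AcoustID A": 526, "AcoustID B": 349, "AcoustID C": 3}
total = sum(counts.values())  # 878

for acoustid, n in counts.items():
    share = n / total
    verdict = "suspect, worth a manual look" if share < 0.01 else "plausible"
    print(f"{acoustid}: {n}/{total} ({share:.1%}) -> {verdict}")

# AcoustID A: 526/878 (59.9%) -> plausible
# AcoustID B: 349/878 (39.7%) -> plausible
# AcoustID C: 3/878 (0.3%) -> suspect, worth a manual look
```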

I guess my question is, why not allow multiple submissions?

3 Likes

Somewhat, yes. I do understand what you are thinking, but aerozol’s reply is my point here exactly. I hate to assume what others are thinking, but it seems like many are looking at this as a black/white thing, where I am seeing this as grey. The counts provide a distribution to look at, nothing more.

A crazy example… there is a 1 in 100,000 chance of getting a coin with “Z” on it. So if I get one, I might want to check it to make sure. The reason is that the “counts” would show a low total (for AcoustIDs), so I may want to make sure I am correct. It is not an eliminator; it just tells me, the user, that what I have, if correct, is not in the majority.

1 Like

This is a perfect interpretation of what I am saying.

Hi thwaller, sorry if I give the impression that I’m against the idea.

I think it has many benefits.
And unless some heavy costs can be identified, I think it would be good to institute your idea.

I’ve just been trying to point out a minor cost that could be largely managed by user education.
That minor cost: naive users might mislead themselves by misinterpreting the count data. This would happen easily when the figures in the count data are low and when the user doesn’t understand that multiple AcoustIDs for the same recording can all be entirely correct. The obvious undesirable outcome is that a naive user who misunderstands might then create a separate Recording based on their unique AcoustID. I doubt this would happen frequently, and suspect that compared to the amount of other erroneous edits, this source of errors would be minuscule.

I find merging Recordings difficult in terms of getting enough evidence to have sufficient confidence.
I think your idea would help a lot with that.

Once your proposal is instituted and AcoustID counts collected, an appropriate search/match of Recordings/AcoustIDs could throw up results that would greatly assist human editors in merging many, many of MB’s Recordings. This would greatly increase the power of the Recordings database. :chart_with_upwards_trend:

1 Like

I see more of what you mean now. I think there is a minor cost from the unknowing user for most things, and I think you pointed it out here. Another example is labels: many add labels that are not actually valid as release labels, but do so because they are an option in the “labels” list. As a user that has made my share of mistakes and more, I think overall MB could use some user education, but that is a whole different topic.

Regarding this specifically, I think if a mistake had to be made, the better mistake would be to create a new recording, vs. using an existing recording that is wrong. It is fairly easy to merge a recording, and a duplicate recording has little impact on editors adding new releases. A recording that is used incorrectly, though, where one recording entity is tied to tracks that are actually different recordings, can cause issues even for those adding new releases and just amplify the problem. Regardless, this is still talking about the better option in a bad situation.

I think, too, that more explanation and documentation could be helpful with AcoustIDs. One example is user accountability. There is no tracking (that is visible to me, the user) as to who added what. Submitters all need (I assume) an API key, and I honestly thought that as a user I would be able to see what IDs I had submitted using my key. Ideally, I would love to see the addition of AcoustIDs via Picard be recorded as an edit like anything else.

I really like the idea of the AcoustIDs, though it took me a while, especially since there were issues for me with Picard and submitting them in the past. With those resolved, I am considering the submission of AcoustIDs a part of any release I add, just like the track titles and durations: if I have it, it should be added. Part of the point of this statement is that due to a lack of understanding and ease of submission, there are many releases I have added that were done without the IDs, making that another cost of me being an unknowing user.

1 Like

I wanted to share this as a good example of a problem case.

In this album, there are a ton of IDs on each recording, and some recordings even have duplicate IDs, meaning that, for example, recordings A and B have the same AcoustID. There are 7 instances of this on this one release alone… where an AcoustID is listed on more than one recording. This is one example where the counts referred to above would come in handy. Additionally, one of the recordings has something like 36 IDs. I guess that is possible. Could that actually be correct? Logic would tell me no, but I am still short of a full understanding of these IDs, exactly how they are generated, and their possible error points.
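
As a sketch of how such duplicates could be flagged automatically, assuming AcoustID’s documented `list_by_mbid` endpoint behaves as I expect, and that the recording MBIDs come from the release’s tracklist:

```python
# Sketch: flag AcoustIDs linked to more than one recording on the
# same release. Assumes the documented AcoustID endpoint
# /v2/track/list_by_mbid, which returns track (AcoustID) IDs for a
# given recording MBID.
from collections import defaultdict
import requests

def acoustids_for_recording(mbid: str) -> list[str]:
    resp = requests.get(
        "https://api.acoustid.org/v2/track/list_by_mbid",
        params={"mbid": mbid},
        timeout=10,
    )
    resp.raise_for_status()
    return [t["id"] for t in resp.json().get("tracks", [])]

def duplicate_acoustids(recording_mbids: list[str]) -> dict[str, list[str]]:
    seen = defaultdict(list)  # AcoustID -> recordings that carry it
    for mbid in recording_mbids:
        for acoustid in acoustids_for_recording(mbid):
            seen[acoustid].append(mbid)
    return {aid: mbids for aid, mbids in seen.items() if len(mbids) > 1}
```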

1 Like

Please see: Differences on fingerprint softwares/packages

I am trying to provide solid reasoning for asking the questions that no one is answering. There are real editing scenarios where this information could be useful.

AcoustID itself already keeps track of the number of submissions for a specific fingerprint. See the source counts on e.g. https://acoustid.org/track/82afc57c-a1a1-48f6-a1fb-5684e1a30e22
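
Those counts are also reachable programmatically; a minimal sketch using the lookup endpoint, assuming “sources” is still a supported meta option (the client key is a placeholder, and the fingerprint/duration would come from fpcalc):

```python
# Sketch: query source counts for a fingerprint via the AcoustID
# lookup API. "sources" is one of the documented meta options; the
# client key here is a placeholder for a registered application key.
import requests

def lookup_with_sources(fingerprint: str, duration: int,
                        client: str = "YOUR_APP_KEY") -> None:
    resp = requests.get(
        "https://api.acoustid.org/v2/lookup",
        params={
            "client": client,
            "fingerprint": fingerprint,
            "duration": duration,
            "meta": "recordings sources",
        },
        timeout=10,
    )
    resp.raise_for_status()
    for result in resp.json().get("results", []):
        # With meta=sources, the per-recording source count should be
        # attached to each recording entry.
        for rec in result.get("recordings", []):
            print(result["id"], rec.get("id"), "sources:", rec.get("sources"))
```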

The thing is that Picard does not allow you to resubmit already existing AcoustIDs. I actually would like to have this in Picard too, but it is probably not quite clear how this can be done.

I don’t remember the details, but we had a short discussion about this on IRC some years ago. The main fear was that if you were just able to submit fingerprints for existing matches, that would quickly turn into a self-fulfilling prophecy: users use AcoustID to match songs in Picard, then submit the matched IDs to AcoustID and increase the count. That way Picard would introduce a bias into the data. I still think this could be done somehow, maybe by keeping track of the files that were not matched by AcoustID.
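
Something like this, as a very rough sketch (hypothetical logic, not actual Picard code):

```python
# Very rough sketch of "keep track of the files that were not matched
# by AcoustID": only files whose recording match came from somewhere
# else (manual match, disc ID, metadata search) would be eligible for
# (re)submission, so a lookup result cannot confirm itself.
from dataclasses import dataclass

@dataclass
class MatchedFile:
    fingerprint: str
    recording_mbid: str
    matched_via: str  # hypothetical: "acoustid", "manual", "disc_id", "search"

def eligible_for_submission(f: MatchedFile) -> bool:
    return f.matched_via != "acoustid"
```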

1 Like

It took me a bit, but I think I see what you are saying. The issue is a user, let’s say me, submitting my release with AcoustIDs, and then submitting my release again. This is really one submission and not 2, so the intent is/was to prevent duplicate user submissions. Am I understanding this correctly?

If so, I had mentioned having submissions logged/tracked by API key and/or ID, just like MB edits for example. If this were possible, you would have the ability to restrict submissions to one per recording per user, since you need your API key to make a submission.
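
As a rough sketch of what I mean, assuming a hypothetical server-side tally (not AcoustID’s actual schema):

```python
# Hypothetical server-side tally: each (API key, AcoustID, recording)
# triple counts only once, so resubmitting the same match from the
# same account does not inflate the count.
from collections import Counter

seen: set[tuple[str, str, str]] = set()
counts: Counter = Counter()

def record_submission(api_key: str, acoustid: str, recording_mbid: str) -> None:
    key = (api_key, acoustid, recording_mbid)
    if key in seen:
        return  # same user, same match: not counted again
    seen.add(key)
    counts[(acoustid, recording_mbid)] += 1
```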

No, not exactly. I mean actually different users. Currently, if you use the AcoustID scan in Picard, two things can happen:

  1. You don’t get a match. In this case you can use the other options Picard provides to match a release and then submit the AcoustID. This is the first submission for this AcoustID ↔ MusicBrainz recording match.
  2. You get a match. In this case Picard will load the matching release and will not allow you to submit this match again, because it is already in the database.

Let’s say we allow resubmission in case 2. Then let’s say I do case 1 for a certain track, but I get it wrong and match to the wrong version of that song, then submit the AcoustID. Now you have the exact same track. You are in case 2: Picard will see the match and load the wrong track. As it looks OK at a quick glance, you trust it and resubmit the AcoustID, thus confirming and reinforcing the wrong data.

5 Likes

Ok, I see. It is the opposite of how I was looking at it. The concern is letting bad data spread vs. tallying proper data. Assuming I am with you this time, that makes total sense.

Thank you again for the info. I have a few threads on similar topics in the forums, and the info I am getting is great and very useful to me.

1 Like

I wanted to add an example of where this might be useful. I just entered artist information and added data to an existing release:


Now, on this release, track 3 had an existing AcoustID; none of the others did. I added my AcoustIDs via Picard, and mine is not a match to the existing one on track 3 of the MB release.

So… I can only speak to what I know, and that is that my release is a copy of the original release back in 2012. Given that the release came from her and all the recordings match in content to what they should be (e.g. matching the YouTube video, matching the lyrics, etc.), I can be certain that my AcoustID is correct.

Now, this does not mean the other is incorrect. I don’t know where it came from or anything, but I can say that it was tied to only one recording on the release, whereas my IDs are all from the release as a whole, as it was released. Personally, I am highly suspicious of the existing ID assigned to that recording, but I cannot delete it, as the duration is a match (within 1 second) and there are no other recording names tied to it, so there is nothing obvious telling me that it is in fact wrong.

So my point in stating all the facts above is that this information would be useful for others to see: the IDs I submitted were all submitted together as a whole release, while the other ID was submitted to one recording and only that one recording. So, removing a right-or-wrong mentality, there is knowledge in the facts of ID acquisition and submission, even at a submission count of only one each. Additional submissions, should the feature be there to tally duplicate submissions, would further highlight this data.
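
To illustrate what I mean by knowledge in the facts of acquisition, here is a purely hypothetical sketch of the kind of context a submission record could carry:

```python
# Hypothetical submission metadata capturing the distinction above:
# whether an ID arrived as part of a whole-release batch or as a
# single, isolated submission. Field names are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Submission:
    acoustid: str
    recording_mbid: str
    release_mbid: Optional[str]  # set when submitted as part of a full release
    batch_size: int              # how many tracks were submitted together

def looks_isolated(s: Submission) -> bool:
    # A lone, release-less submission is not wrong, but it carries
    # less context than a full-release batch.
    return s.release_mbid is None and s.batch_size == 1
```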

Sorry for the long post, but I wanted to express in detail a realistic example of how more data can be useful vs just saying it would be useful.

1 Like

If you have this album, and as the recording only appears on this release (it was not a NAT / SAR beforehand, for instance), maybe you can unlink the other AcoustID, as they really don’t match, IMO. Compare fingerprints #32321958 and #63069758 | AcoustID

Yeah, I saw they do not really match. The only match is the duration and the assigned recording. I hesitate to remove data when I do not know for sure, or at least have a solid basis for assuming, that it is wrong. The problem is I have no idea where it actually came from, and there is no way to “reverse engineer” the fingerprint.

One of the main reasons I hesitate to remove such data is that I also enter a lot of data that is difficult, and in some cases I would say impossible, to reasonably verify. The artist in question here is also not a top mainstream artist, so there exist a lot of dual releases (both paid, like Amazon and iTunes, as well as free promo and DJ types) of the same release. This also means there is not a large amount of data on the artist either. For example, I do not think anything online (officially) actually references her real name and birthdate. If I needed to show proof of what I entered, I would have to use something like an article from the university she attended that would tie it all together… as an example. So I would hate for someone to delete such things simply because there is no evidence of it.

1 Like