ID generation and calculation

thwaller · April 27, 2022, 12:07pm

Might I ask those here to have a look at this thread?

If you look through all of the stuff, you will see that there is an issue with a WAV and an MP3 of the same recording getting different AcoustIDs. I would appreciate any and all help in explaining this.

outsidecontext · April 27, 2022, 3:03pm

This article might help in understanding how fingerprints get generated and then compared:

Unfortunately the images have gone missing

On the server side, fingerprints that are similar enough and have a similar overall track duration are considered the same recording. They are grouped together and assigned an ID (commonly known as the AcoustID).
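As a rough illustration of that grouping rule, here is a minimal Python sketch. It assumes a raw fingerprint is available as a list of 32-bit integers and measures similarity as the fraction of matching bits. The function names, the 0.8 similarity threshold and the 7 second duration tolerance are made up for the example; they are not the values the AcoustID server actually uses.

```python
from typing import Sequence


def bit_similarity(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Fraction of matching bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR marks the differing bits; mask to 32 bits so signed values work too.
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b))
    return 1.0 - differing / (32 * n)


def same_recording(
    fp_a: Sequence[int],
    duration_a: float,
    fp_b: Sequence[int],
    duration_b: float,
    min_similarity: float = 0.8,      # hypothetical threshold
    max_duration_diff: float = 7.0,   # hypothetical tolerance, in seconds
) -> bool:
    """Group two fingerprints only if both duration and similarity agree."""
    if abs(duration_a - duration_b) > max_duration_diff:
        return False
    return bit_similarity(fp_a, fp_b) >= min_similarity
```

Two fingerprints whose similarity sits right at `min_similarity` can land on either side of the line, which is the kind of borderline case discussed in this thread.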

Looking up matches for a fingerprint is described here:

thwaller · April 28, 2022, 2:26am

This is quite interesting. Reason - I am seeing conflicting facts/statements. No ill intent there, but my findings are eye opening.

As stated prior in a few areas, I am happy to share the files I am working with, privately of course.

I have also experienced the Shazam algorithm identify different recordings (versions, remixes, edits, etc), while MB does not. I left out the AcoustID intentionally, I am trying to digest this.

outsidecontext · April 28, 2022, 4:03am

I sent a PM to you (via MB).

thwaller · April 28, 2022, 4:52am

Responded to. Thanks.

outsidecontext · April 28, 2022, 6:28am

Ok, interesting results. So first off analyzing your files did indeed give some different AcoustIDs. This is the actual result of the fingerprint comparison between the WAVE and the 320kbps MP3:

https://acoustid.org/fingerprint/89447221/compare/89447222

So you see there is some notable difference, but not too bad. It’s reasonable to assume that this is somewhere near the threshold to be considered the same.

I then went on and converted the WAVE to different bitrates, starting at 128 kbps and going in 8 kbps steps up to 320 kbps. I also tried 64 kbps. The conversion was done with sox like this:

sox -S --multi-threaded Dj\ Brokdorff\ -\ \(I\ Am\)\ OldSchool.wav -C 320 out-320.mp3
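For anyone wanting to repeat the sweep, a small Python sketch like the following could generate the same sox call for every bitrate. The helper names are invented for this example, the sox flags are simply the ones from the command above, and sox needs to be installed with MP3 support for `run_sweep` to work.

```python
import subprocess
from typing import List


def sweep_bitrates() -> List[int]:
    """64 kbps plus 128 to 320 kbps in 8 kbps steps, as described above."""
    return [64] + list(range(128, 321, 8))


def sox_command(source: str, bitrate: int) -> List[str]:
    """Build the sox call for one bitrate, mirroring the command above."""
    return ["sox", "-S", "--multi-threaded", source, "-C", str(bitrate), f"out-{bitrate}.mp3"]


def run_sweep(source: str) -> None:
    """Encode the source file once per bitrate (requires sox on the PATH)."""
    for bitrate in sweep_bitrates():
        subprocess.run(sox_command(source, bitrate), check=True)
```

Calling `run_sweep("Dj Brokdorff - (I Am) OldSchool.wav")` would write `out-64.mp3`, `out-128.mp3` and so on into the current directory. Passing the arguments as a list avoids the shell escaping needed in the one-liner.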

The result is interesting: the encodings at all bitrates up to 288 kbps produced fingerprints more similar to the WAVE one, while those at 296 kbps and above produced fingerprints more similar to the original MP3 one. Unfortunately I have not saved the lookup results from the AcoustID API response, but at least some that I saw contained both fingerprints as potential matches, just with different similarity.

So basically we are at a case where the fingerprint similarities are moving just along the threshold line.

Now I did something that actually changed the situation: I submitted some of the fingerprints of the other encodings. If AcoustID finds multiple potentially matching candidates for a fingerprint, that is, if a submitted fingerprint sits somewhere in between two AcoustIDs in regard to similarity, AcoustID checks whether it is reasonable to merge the two AcoustIDs. And that’s what it did in this case: Track "f7f3ee20-7c6b-462a-9188-73a063605970" | AcoustID

thwaller · April 28, 2022, 9:05am

Thank you for taking the time to look at this. It is greatly appreciated. It is interesting you used SoX, but it is great to see. SoX seems to be a forgotten tool as of late.

In my testing, I used what was given to me. It is not something I generated. As I tested further and converted, I used the tools specific to the format. Meaning I used LAME for MP3, vs ffmpeg or SoX, or other.

My question still remains… what caused this difference? We have a WAV, arguably the most lossless there is, and an MP3 of reasonable quality. What is it that creates the difference?

outsidecontext · April 28, 2022, 9:34am

I can’t answer this in detail. That would require a detailed analysis of the audio file and of the fingerprint generation at each step, and I don’t know the algorithm well enough to do this.

But you basically gave the answers yourself in DJ promo releases - #112 by thwaller. Lossy audio codecs are lossy; the decoded audio is different from the source. You even said you could hear a difference (I can’t, but that doesn’t say much).

That also means the fingerprints, while reasonably stable, differ ever so slightly. The important characteristic of the Chromaprint fingerprints is that they are comparable, so you can tell how similar two fingerprints are.

And that’s where the AcoustID server comes into play: it compares fingerprints and groups them together into tracks, aka AcoustIDs. Actually there are two levels of this similarity comparison. Very similar fingerprints are just treated as the same, so when you submit them they only increase the submission count of the existing fingerprint. And then there is the grouping of similar fingerprints above a certain threshold into AcoustIDs.

You can actually see this now at https://acoustid.org/track/f7f3ee20-7c6b-462a-9188-73a063605970:

Originally there were only two fingerprints, 89447221 and 89447222. These were for the two files from your ZIP. Now I generated different encodings from the WAVE file and submitted the fingerprints for 9 of them. Actually each fingerprint for 288 kbps and below is a tiny bit different. But the import caused only two new fingerprints in the AcoustID database, 89474492 (which grouped 8 of the submitted fingerprints, meaning they were really, really close to one another) and 89474493.
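The two levels of comparison described above can be sketched in Python roughly as follows. This is only a toy model of the idea, not the server's code: the class and function names are invented, a fingerprint is assumed to be a list of 32-bit integers, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StoredFingerprint:
    items: Sequence[int]
    submissions: int = 1


def similarity(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Fraction of matching bits over the overlapping part."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b))
    return 1.0 - differing / (32 * n)


def submit(
    track: List[StoredFingerprint],
    items: Sequence[int],
    duplicate_threshold: float = 0.98,  # hypothetical: "treated the same"
    track_threshold: float = 0.80,      # hypothetical: "same AcoustID"
) -> str:
    """Submit a fingerprint to one track and report what happened to it."""
    best = max(track, key=lambda s: similarity(s.items, items), default=None)
    score = similarity(best.items, items) if best is not None else 0.0
    if best is not None and score >= duplicate_threshold:
        best.submissions += 1          # level 1: only bump the counter
        return "counted"
    if score >= track_threshold:
        track.append(StoredFingerprint(items))  # level 2: new fingerprint, same track
        return "added"
    return "other AcoustID"            # too different to belong to this track
```

In this model, eight near-identical submissions would all come back as "counted" against one stored fingerprint, while a slightly more distant one would be "added" as a second fingerprint on the same track.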

Also interesting might be that all the encodings from 296 kbps up to 320 kbps produced a bit-for-bit identical fingerprint. So encoding quality surely has some impact. In this example there is clearly some minor but notable difference to the WAVE, but all the files encoded at 296 kbps and higher give consistent decoding results. This might be something very specific to this audio in regard to the encoding algorithm.

Also it is important to stress that you cannot generalize this. You can’t say “Fingerprints for MP3 encoded with 296 kbps will be more different to the source than fingerprints for 288 kbps”. As I said in the other thread, the MP3s in my collection, most of which are encoded with 320 kbps, very consistently give the same AcoustID as the FLAC file they are derived from.

Overall, while this is a very interesting case, it is neither unexpected nor surprising. As I said in the other thread, this is all about comparing slightly different data for similarity. And inevitably there will be cases where things are just beyond the threshold used.

lukz · April 28, 2022, 10:51am

The process of assigning individual audio fingerprints to “AcoustIDs” is a little flawed, unfortunately.

There are two main factors:

1. It takes the reported duration into consideration, and that is often reported incorrectly.
2. It uses a simple average of the difference scores over the entire duration of the fingerprint. This is not OK if we are dealing with small song variations, like exactly the same background track with different vocals, or slight mixing changes. The algorithm is tuned to not consider those the same AcoustID, and one unfortunate side effect is that sometimes the same file in different codecs will get a different fingerprint.

If my life goes well, I’m hoping to fix these issues in the near future. Everything is ready for it, I just need to put the things together.

There is also one extra issue, which is not easily fixable in the current version of AcoustID. The fingerprinting process uses fairly large processing windows to keep the final fingerprints small. However, that means they are rather sensitive to tiny alignment changes, which often happen with MP3. Any MP3 encoded file always has some encoder delay in it.

thwaller · April 29, 2022, 12:09am

Thanks for this confirmation. This relates to what I, and it seems many others here, assumed… that the AcoustID “ignores” differences in codec. It is interesting to me… as in this case the MP3 is of fair quality, where I would expect it not to be different, vs a low quality one where there is far more degradation. Regardless though, since it is known that this can happen, that is explanation enough.

This also answers AcoustID and its assignment to clean and explicit versions. Your explanation implies that such differences might not be enough to generate a different ID.

lukz · April 29, 2022, 4:02am

There is a new fingerprint matching algorithm that I developed a few years ago, which can safely detect the differences between clean and explicit versions, but at the same time always keep multiple encodings of the same track on the same AcoustID. Basically, if the difference is more or less uniform, it considers them the same track up to some threshold. But if there are periods in the song when the fingerprint difference is too high, even for a second or so, they will always be flagged as different.
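To make the contrast with a plain average concrete, here is a small Python sketch of the general idea. It is not the actual new algorithm, which is not spelled out in this thread. It assumes fingerprints are lists of 32-bit integers; the function names, both thresholds and the window of 8 items (meant to stand for "a second or so") are assumptions made for the example.

```python
from typing import List, Sequence


def per_item_error(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[float]:
    """Fraction of differing bits for each aligned pair of fingerprint items."""
    return [bin((a ^ b) & 0xFFFFFFFF).count("1") / 32 for a, b in zip(fp_a, fp_b)]


def same_by_average(errors: Sequence[float], max_mean: float = 0.2) -> bool:
    """Simple rule: only the average difference over the whole track counts."""
    return bool(errors) and sum(errors) / len(errors) <= max_mean


def same_by_windows(
    errors: Sequence[float],
    max_mean: float = 0.2,     # hypothetical limit for the overall difference
    max_window: float = 0.4,   # hypothetical limit for any short stretch
    window: int = 8,           # hypothetical window length, in items
) -> bool:
    """Stricter rule: also reject if any short stretch differs too much."""
    if not same_by_average(errors, max_mean):
        return False
    for start in range(max(1, len(errors) - window + 1)):
        chunk = errors[start:start + window]
        if sum(chunk) / len(chunk) > max_window:
            return False
    return True
```

A codec change that adds a small, even amount of difference everywhere passes both checks. A short burst of strong difference, such as a bleeped word in a clean version, barely moves the average but fails the windowed check.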

The main reason why it’s not used on AcoustID yet is that changing the algorithm would ideally come with fixing of some of these incorrect AcoustIDs, which would mean either merging some (which is easy), but also splitting some (like the clean/explicit case) and the splitting part is a much more difficult one, given the size of the fingerprint database.

thwaller · April 29, 2022, 4:29am

Is there a way for users to use this, not as the AcoustID in MB, but for their own uses?

thwaller · April 30, 2022, 5:02am

I wanted to clarify this further. Usually under general / casual listening, I cannot easily tell a difference. It is only on very dynamic types of recordings that I might. When using earphones however (or relaxing with music playing on my system), I am able to notice the difference. This also mostly depends on me hearing a better copy, then recognizing one not as good. The reason for this is that music today is produced so horribly that even uncompressed sounds compressed (basically because it is intentionally heavily dynamic range compressed).

I am quite picky about the sound that I listen to, but I cannot say that when listening casually… while cleaning, driving, cooking, etc… a difference will be noticed, as long as you are playing a modern compressed format. Some of those old 128 MP3s were so bad…