AcoustID data shown on MusicBrainz

Is there documentation somewhere that explains the meaning/source of different parts of the acoustid data shown on MusicBrainz, in detail?

For instance, I’m trying to figure out what to make of the fact that the two active acoustids shown here are a perfect match, but one reports a length that’s over a minute longer than the actual recording: https://beta.musicbrainz.org/recording/a554fc8d-0f1a-40a2-933e-820b9e0dcdae/fingerprints

I may be wrong, but I vaguely seem to recall that the AcoustID is based on the first X (30?) seconds of the recording, so would not show them as different if one had applause or other additional audio at the end.

2 Likes

That’s what I thought, too, but this seems to say otherwise: https://acoustid.org/faq

"Can the service identify short audio snippets?

No, it can’t. The service has been designed for identifying full audio files."

I saw that, but I took it to mean that it couldn’t be used to identify a file based on a snippet from the middle of the file. Also, there appears to be something in the code to ignore periods of silence. Perhaps that’s it?

2 Likes

Let’s dissect this a bit:

Both of those two AcoustIds have exactly one fingerprint attached. We can compare those fingerprints here:

https://acoustid.org/fingerprint/20567522/compare/36820223

They are nearly identical, with only small differences. So why are they not assigned to a single AcoustId? Likely because of the length difference. As mentioned above, AcoustId fingerprints are based on the first 2 minutes* of audio, but that is not all the information the AcoustId server uses. In addition to the fingerprint, the total length of the recording is considered.
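For anyone who wants to try a "nearly identical" comparison locally, here is a rough sketch. It assumes you already have two raw Chromaprint fingerprints as lists of 32-bit integers (for example from `fpcalc -raw`). The function name `bit_similarity` and the sample values are made up for illustration, and this is not the AcoustID server's actual matching code, which does more (such as aligning the two fingerprints first).

```python
def bit_similarity(fp_a, fp_b):
    """Fraction of matching bits over the overlapping part of two raw fingerprints.

    Hypothetical helper: each fingerprint is a list of 32-bit integers.
    No alignment/offset search is done, so this only works for fingerprints
    that start at the same point in the audio.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR leaves a 1 wherever the two values differ; count those bits.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return 1.0 - differing / (32 * n)


# Made-up sample values, not taken from the fingerprints linked above.
fp_one = [0x12345678, 0x0F0F0F0F, 0xFFFFFFFF]
fp_two = [0x12345679, 0x0F0F0F0F, 0xFFFFFFFF]
print(bit_similarity(fp_one, fp_two))
```

A value close to 1.0 means the fingerprints agree on almost every bit. Where to draw the line for "the same audio" is a separate question that this sketch does not answer.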

Now why are both AcoustIds linked to the same recording? Basically because someone decided to do so. The connection between recordings and AcoustIds is essentially a manual process (done by submitting the fingerprint with a recording ID with a tool like Picard).

Whether it is correct that both AcoustIds are linked to the same recording depends on whether we consider the shorter and the longer version to be the same recording on MB. If they are considered the same, both AcoustIds should obviously be linked to it. If the longer version is considered a separate recording, the corresponding AcoustId should probably be removed from the shorter recording.


* EDIT: The fingerprint is based on up to 2 minutes of audio, not 30 seconds as I originally wrote. I got confused by the previous discussion.

4 Likes

So what you’re saying is the acoustid with the longer time means someone attached it to this shorter recording, but their source was a longer recording? Because this is the only length in the database for this recording. (And therefore we can infer that there is a longer recording with the same first ~30 seconds, somewhere out in the wild?)

If the difference is after 2 minutes, AcoustID will be the same (it used to be even shorter than 2 minutes):


The 30 second mark is something else: an AcoustID will not be generated at all for sounds shorter than 30 seconds.

3 Likes

I knew I saw something about 30 seconds. Thanks for clarifying.

2 Likes

This is true for audio with the first 2 minutes identical and roughly identical length. If there is a notable length difference there should be a separate AcoustId, see my comment above. I’m not entirely sure where the threshold is, though.
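To make the rules discussed in this thread concrete, here is a toy sketch. The 30 second and 2 minute figures come from the posts above. The length tolerance `MAX_LENGTH_DIFF_S` is purely a placeholder, since nobody here knows the real threshold, and the function names are invented for illustration rather than taken from the AcoustID code.

```python
MIN_LENGTH_S = 30       # from this thread: nothing is generated for shorter audio
FINGERPRINTED_S = 120   # from this thread: fingerprint covers up to the first 2 minutes
MAX_LENGTH_DIFF_S = 10  # ASSUMPTION: placeholder value, the real threshold is unknown


def fingerprinted_seconds(total_length_s):
    """How much of the audio the fingerprint would cover, or None if too short."""
    if total_length_s < MIN_LENGTH_S:
        return None
    return min(total_length_s, FINGERPRINTED_S)


def likely_same_acoustid(fingerprints_match, length_a_s, length_b_s,
                         max_diff_s=MAX_LENGTH_DIFF_S):
    """Toy rule: matching fingerprints AND roughly the same total length."""
    return fingerprints_match and abs(length_a_s - length_b_s) <= max_diff_s


# Two files with matching first 2 minutes, one over a minute longer.
print(likely_same_acoustid(True, 200, 265))
# Same fingerprint, lengths only a few seconds apart.
print(likely_same_acoustid(True, 200, 203))
```

Under these assumptions the first pair ends up with separate AcoustIds even though the fingerprints match, which is the situation in the recording linked at the top.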

It’s 2 minutes, sorry for adding to the confusion. But basically yes, at least the fingerprint got submitted with a different length. This does not necessarily mean this recording exists somewhere officially. It could also just be a file with some radio host chatter at the end, or the beginning of the next song, or silence. It could also be a submission error where the software doing the submission reported the wrong length.

3 Likes

This right here is some of the info I was looking for that I hadn’t found yet. Thank you.

2 Likes