DJ promo releases

I believe the intent is to catalog music releases. I believe the use to be helping users identify releases and recordings.

Yes, I would agree, the AcoustID performs quite well. It is not the ID itself where I see the problem.

Thinking through this more, I think the issue I am having is more a result of bad data than of how the current features are used. While I agree that some recordings (now) can have numerous IDs, there are far too many where an AcoustID matches numerous variations of a recording… but that is the result of bad input.

I am finding this issue as I look at these promo releases. They often contain a large number of remixes of the same recording, and sometimes a lossless and a lossy copy of the recordings.

The AcoustID, for me, performs perfectly. It can even tell the difference between a WAV and an MP3 of the same recording. I must admit I did not know this before. Learning this gave me two opposite feelings. One was great: it is able to pick up on differences that one may not even be able to hear. The other was not great, since MB takes this accuracy and discards it. That is further amplified by bad data, but that is not part of the topic.

Consider the scenario: if I have such a release, cataloging it in MB provides no benefit, because my directory listing is clearer than the result in MB. When I look at the directory, I might see this:

  • some-recording.wav
  • some-recording.mp3
  • some-recording.opus

I can visually see here that although the name of the recording is the same, I have 3 very different versions. If I look in MB at this same release, I would see this:

  • some-recording
  • some-recording
  • some-recording

From here I would need to look at other areas to see if I can figure out what is what, and why it looks weird (as it would to many, since the same recording is there multiple times). I then also see that each of them has 3 AcoustIDs, where the reality is that each has only 1.

In the case outlined above, it is not the AcoustID that is problematic, it is MB and its structure. As noted prior, it brings to light that MB does not, at times, consider mastering. I have seen releases get new recordings from CDs because you can “hear the difference”… which is generally attributed to mastering, likely some sort of remastering of old recordings. While I agree with that, it is an anomaly of sorts, as the same is not applied to a digital release.

A digital release does not have the same identifiers as a CD. If I have a CD, there is a lot to look at; we all know the visual attributes in play. With a digital release a lot of that is just not there. So with “release in hand”, my options are far more limited. The AcoustID is a great tool, as it is capable of getting exact results in most cases. I believe it would be fairly accurate (if the database supported it) to even take a file and match it to a specific release, like that was this MP3 version or that was this iTunes version, etc. That would help elevate the digital releases closer to CDs, where I can see two releases differentiated only by a single line in a booklet stating made in A vs B.

This is relevant to the topic as these DJ promos are no longer CD driven, although CDs are still available for some. When I take a CD, I can easily derive that this is track 1, this is track 2, etc. So when I look it up in MB, track 1 = track 1. So even if all look the same, I can still match the recordings. On a digital release, say one with no track numbers, this is not possible. I have a folder of files and an MB listing, and I may not be able to match file to recording. I could, however, if the recordings were not all the same in MB. Not just with AcoustID, but with additional detail.

It is worth noting that you cannot rely on AcoustID to be different between WAVE and MP3 or any other encoding, because it is specifically meant to produce the same AcoustID for lossy audio as well. Otherwise it wouldn’t work for what it is supposed to do. If every separate encoding of files ripped from a CD identified as something completely different, nobody would be able to identify their songs. Of course this is all just mathematics about similarity of fingerprints, and there is always that point where the audio is just different enough for the AcoustID server to consider them separate.
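
To make the “similarity of fingerprints” idea a bit more concrete, here is a minimal sketch. It is not the AcoustID server's actual matching code. It assumes raw Chromaprint-style fingerprints given as lists of 32-bit integers, compares them position by position without any alignment, and uses a made-up cut-off of 0.90 purely for illustration. The function names are hypothetical.

```python
def bit_similarity(fp_a, fp_b):
    """Fraction of matching bits between two raw fingerprints.

    Each fingerprint is a list of 32-bit integers. Only the overlapping
    part (the length of the shorter list) is compared.
    """
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR leaves a 1 wherever the two values disagree; count those bits.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return 1.0 - differing / (32 * n)


# Hypothetical cut-off, chosen only for illustration.
SAME_TRACK_THRESHOLD = 0.90


def same_track(fp_a, fp_b, threshold=SAME_TRACK_THRESHOLD):
    """True if the two fingerprints are similar enough under the cut-off."""
    return bit_similarity(fp_a, fp_b) >= threshold
```

The point of the sketch is only that the decision is a threshold on a similarity score: two encodings of the same audio usually land well above it, but nothing stops a particular pair from landing just below.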

But in general the definition of saying that the MP3 encoding of a WAVE is the same recording makes a lot of sense to me. Otherwise for every CD there would need to be an endless number of recordings of all possible ways you can encode this audio. For streaming services this often is even totally fluid as the audio encoding and parameters will be chosen based on the playback device.

I thought this as well, but if you look above, this is proven untrue. If it is specifically meant to “produce the same AcoustID for lossy audio”, it does not do this. The release noted in that portion is proof of this.

Honestly, I like how AcoustID is working, now that I am testing it myself. It is quite nice, and I actually would like a tool to generate this ID, in GUI, locally, to generate these myself. I know there is a tool, a fingerprint GUI, but it only generates them to submit, not save local. I might need to create this myself.
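
For anyone wanting to save fingerprints locally, a rough sketch follows. It assumes Chromaprint's command-line tool `fpcalc` is installed and on the PATH, and it relies on the tool's `-json` output having `duration` and `fingerprint` fields. Note that this yields the fingerprint only; the AcoustID itself is assigned by the AcoustID server when a fingerprint is looked up or submitted. The function names and the output file layout are my own, not part of any tool.

```python
import json
import subprocess


def parse_fpcalc_json(text):
    """Extract (duration, fingerprint) from fpcalc's JSON output."""
    data = json.loads(text)
    return data["duration"], data["fingerprint"]


def fingerprint_file(path, fpcalc="fpcalc"):
    """Run fpcalc on one audio file and return (duration, fingerprint)."""
    result = subprocess.run(
        [fpcalc, "-json", path],
        capture_output=True,
        text=True,
        check=True,  # raise if fpcalc exits with an error
    )
    return parse_fpcalc_json(result.stdout)


def save_fingerprint(path, out_path):
    """Write the fingerprint of `path` to a small JSON file at `out_path`."""
    duration, fingerprint = fingerprint_file(path)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(
            {"file": path, "duration": duration, "fingerprint": fingerprint},
            fh,
            indent=2,
        )
```

Calling `save_fingerprint("some-recording.wav", "some-recording.wav.json")` and the same for the MP3 would give two files that can be compared side by side.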

I mean not to argue, I am simply discussing what I am finding in the process of trying to add these releases. What I have discovered is not at all what I was expecting and understood to be true.

I am still trying to think and wrap my head around all this, so any input is welcome. Just because I am generating results does not mean that these results are “scientific”. I am trying now to determine what exactly causes the fingerprinting process to see a WAV different from MP3. The files in the initial post are both DJ supplied, WAV used to make MP3. The MP3 is of reasonable quality, only using the soft 16kHz filter. If it used the hard filter, I would be more confident as to why the difference. I know that @IvanDobsky commented above that in his looking, the two fingerprints looked nearly identical, only small variation. If I recall correctly, the artist who produced this release uses Serato, so quality of product should be there.

Again, I mean no disrespect here. I am simply doing what I said I was going to do, try to add some of these releases. Given that, I can only share what I see in the process.

I will be using the following parameters for MP3 creation from WAV, using LAME encoder, next:

-m j -V 4 -q 3 --lowpass 20.5

That should produce a good sample for testing.

It does in many cases. There is always that threshold where things start to become just different enough to be considered different. But that’s exactly why you can attach multiple AcoustIDs to a single MB recording.

And you can’t really say each different AcoustID gets a different MB recording if it is only the format that’s different. That does not work with all the music people have ripped and try to identify. If I have a CD and I rip it as 128kbps MP3, this might produce a different AcoustID than a higher quality rip. But it still needs to be associated with the recording of that CD, of course.

I 100% agree with your statement. This is why I am discussing what I am seeing and finding, and thinking it through. I also want to look at using a CD, 16 bit FLAC and a 24 bit FLAC as sources for the tests. This obviously is after confirming that the three sources are proper and not upsampled.

I am seeing something that defies what I was told and used as my understanding of the workings of this. It seems that you also have the same as I did at the start. Something in that original understanding is not correct though. Comparing a 320 MP3 to a WAV should not differentiate, if the assumption is correct. However, I have proof that it does and can. Now, my issue is how and why.

@outsidecontext If you do not mind, please have a look at my post on this above. I am also happy to share the files of the release. I only request that the sharing be private, it is not up to me whether to share public or not, so I err on safe. This comparison was not like the 128 MP3 to lossless as you describe, I also would expect a difference there.

For what it is worth, these are the versions I am using for test:

  1. LAME 3.100
  2. FLAC 1.3.4
  3. M4A - hard to say exactly, using the latest QAAC with CoreAudio from approx 1 year ago.

The others are sort of irrelevant, OPUS is not common for non web streaming, so that is out of scope.

EDIT: I wanted to add some other thoughts…

  • The encoder settings are mastering steps. They have great impact on the result. The settings are never a “one size fits all”, if you want the best you can get.
  • There is a (or can be a) tonal difference between the MP3 and M4A compression. I wonder if there is impact on AcoustID due to this. I am unsure.
  • Using CoreAudio to create M4A files will produce a different result than using say libfdk_aac. They are different M4A files, no different than MP3 vs M4A.
  • FLAC, while lossless, is not all the same. It depends on the source, and this applies to WAV as well. I can make a lossless at any point in the process, and lossless means only lossless to the source. Thus, the source is a critical component, which often causes a difference in a 16 and 24 bit FLAC.
  • I have seen CDs that are not true 16 bit audio, so they actually differ from a true 16 bit FLAC.
  • Editing iterations also have impact. If proper head room is not there, lossless starts to create loss in its result. This is a result of poor mastering, or a result of a limited source.
  • For most of us, the human ear only hears up to approx 16kHz, which is the logic behind the MP3 cutoff. However, what is not considered is the result of not having the higher cutoff. Simply chopping the audio at a frequency is nasty and can be heard, thus the need for head room. I would expect this to also impact the AcoustID.

For those who may not follow along with my words, I wanted to find a good picture of this issue, here it is:

You can see how head room (dB) relates to the source, i.e. 16 vs 24 bit.

Noting… every iteration removes head room. It should also make clear that a sound difference can in fact be heard over a speaker between 16 and 24 bit audio.
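
As a rough way to put numbers on the 16 vs 24 bit comparison, here is a small sketch. It assumes plain linear PCM and uses the textbook ratio between full scale and a single quantization step, which is only a theoretical ceiling and not a measurement of any particular file. “Head room” above is used more loosely than this figure. The function name is hypothetical.

```python
import math


def theoretical_dynamic_range_db(bits):
    """Full scale relative to one quantization step for linear PCM, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits} bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```

That works out to roughly 96 dB for 16 bit and roughly 144 dB for 24 bit, so each extra bit adds about 6 dB.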

It’s definitely the exception, not the rule. But as I wrote above it’s all mathematics, and given enough difference you can of course reach the point where the difference is just beyond the threshold for two fingerprints to be considered different enough to get different AcoustIDs.

It can very likely also depend on the kind of audio. The lossy compression might introduce more artifacts in some audio than in others.

I took a couple of albums and tested. The majority of my music collection is sourced from CD and I have it then both as a FLAC file and a lossy file created from the FLAC. Lossy files are mostly MP3, some OGG. There are occasionally some lossy files I encoded ages ago with lower bitrate that I never bothered to re-encode in higher quality.

The majority of examples I tried produced the same AcoustID for the lossy and lossless file (both for OGG and MP3). I had two albums that were both encoded as OGG files at only 160 kbps. There, for 5 of 24 tracks, I got different AcoustIDs.

One example is https://musicbrainz.org/recording/251aeb66-069f-4181-be78-9656ac9c716e . The AcoustIDs are:

FLAC: Track "08ec24a1-5259-491b-a302-592b14b14851" | AcoustID
OGG: Track "b9aa51ef-6435-4a4c-8ccb-57ba12bc2af4" | AcoustID

Comparing the two most used fingerprints for each of those AcoustIDs shows they are still really similar: Compare fingerprints #37860991 and #79994161 | AcoustID

Another one https://musicbrainz.org/recording/60e94685-0481-4d3d-bd84-11c389d9b2a5 :

FLAC: Track "abc2c313-0230-47a1-9d1e-e1d75ac7de95" | AcoustID
OGG: Track "9cf386ac-8977-4829-ae50-70f52639bab0" | AcoustID
Comparison: Compare fingerprints #43461792 and #10544809 | AcoustID

But fingerprint lookup still worked for both the OGG and FLAC file because both AcoustIDs are linked to the relevant recording. That means there is the automatic, algorithmic way of deciding whether two audio files can be considered the same recording, but given enough difference due to mastering / encoding this can fail. The linking to MusicBrainz recordings as a manual step can take care of that.

I think my point just is that if you want to use AcoustIDs to tell apart different encodings of the same source audio you are looking at the wrong tool, because it was specifically designed to not do this. If it were overly sensitive to small encoder changes and considered each differently encoded file a different recording, it would stop being useful for its intended use (identifying a user’s music collection).

Absolutely, this is not the thread for that discussion though. My example used MP3 and WAV, which are both basically standard. Might I share the files I have with you? I respect and appreciate the opinions others can offer, as the intent is not to be right or wrong, but to reach an acceptable conclusion.

Yes, I agree. However, if I have 10,000 numerical examples, changing one makes a different result, as mathematics is not forgiving.

I need to look at your examples, which I will do. I am not sure who uses OGG anymore, but I can generate and test them all the same.

Thank you kindly for providing such a detailed response. I would appreciate if you, and others, might help me identify the core “issue”.

I am still unclear on what your issue is, sorry!

Multiple AcoustIDs being created for the same track don’t harm the goal of matching your songs to a recording, as long as they are all attached to that recording in MB.

If AcoustIDs are matched to the wrong recording, then they should be un-matched.

It is all good friend. I am in a private chat now trying to sort this out. Again, I mean no ill will, I am trying to make sense out of what is in front of me. The issue is that what I (and it also seems you) have understood is not exactly correct. The AcoustID system is quite nice, so there must be some variable in here that I or others are missing.

What I understand is: a recording can have multiple AcoustIDs, and anything that changes a waveform (if it goes outside certain thresholds) can cause a new AcoustID, sometimes including transcoding a file.

This seems correct to me?

Yes, seems correct. I was not under the impression that a lossy file derived from a lossless file could cause a change. Especially if that process is done properly, meaning that the lossy file was made with top quality encoder settings.

It was a shock to me to see a WAV and a MP3, provided to me directly, generated different IDs. My question here is why? What is it that caused this? What I can say at this time is the point can be proven with real files.

I have a few ideas in my mind, things such as the type of audio. It is well possible that the compression of certain types of audio might change the waveform enough to cause this. For me personally, I see a major difference. The MP3, even with top settings, does not produce the same result (as measured) as the source lossless. The M4A however, can produce this under specific criteria, mainly the use of QAAC and a recent CoreAudio back. Opus is also capable, again as it relates to the waveform itself, the truncation, loss from compression, etc.

Please understand that I have no interest in a hypothetical debate. I have now provided the files in question for another set of eyes to see and review. It is very possible I am missing something, or have made some sort of error. All I care about here is to understand what I have in front of me, and why the end result differs from what I expected.

It is also worth noting… there is major compression here. The WAV file is 80MB, the MP3 is 9.6MB. That is a lot of loss of data. The duration is 3 minutes and 46 seconds for reference.
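
Those sizes can be turned into rough average bitrates with a little arithmetic. The sketch below assumes 1 MB = 1,000,000 bytes and ignores headers, tags and embedded artwork, so the results are upper bounds on the audio bitrate rather than exact values. The function and variable names are only for illustration.

```python
def average_kbps(size_bytes, duration_seconds):
    """Average bitrate in kbit/s implied by a file size and a duration."""
    return size_bytes * 8 / duration_seconds / 1000


DURATION_S = 3 * 60 + 46   # 3 minutes 46 seconds = 226 s
WAV_BYTES = 80_000_000     # assumes 1 MB = 1,000,000 bytes
MP3_BYTES = 9_600_000

wav_kbps = average_kbps(WAV_BYTES, DURATION_S)
mp3_kbps = average_kbps(MP3_BYTES, DURATION_S)

# Reference point: 16 bit, 44.1 kHz, stereo PCM (CD audio).
cd_pcm_kbps = 44_100 * 16 * 2 / 1000

print(f"WAV: about {wav_kbps:.0f} kbps")
print(f"MP3: about {mp3_kbps:.0f} kbps")
print(f"CD-style PCM: {cd_pcm_kbps:.1f} kbps")
print(f"Size ratio WAV/MP3: {WAV_BYTES / MP3_BYTES:.1f}x")
```

The MP3 figure comes out a little above 320 kbps, which fits a 320 kbps file plus tags. The WAV figure is roughly double what CD-style PCM would need, which would be consistent with a higher bit depth or sample rate, though the size alone cannot say which.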

I also want to add that yes, I can hear the difference. I am not using anything special, but a cheap set of Numark HF125 headphones. Great headphones, but not at all top line.

I am starting to think this is an uphill battle with no end…

I am looking at another collection, DMC Commercial Collection 422. I can share more detail if anyone (or all) have interest, but I will summarize only here.

I am finding that trying to determine a name and artist for a recording is difficult to define many times. Even when it comes to an artist… Example, URBANHEADZ… they list themselves online as:

  • URBANHEADS
  • Urban Headz
  • Urbanheadz
  • UrbanHeadz
  • TheUrbanHeadz
  • etc

No need to list all, you get the idea. I have no idea how to determine what is “MB proper” here. For this example, on the release, the DJ artists are all in all caps (URBANHEADZ) and the artists of the material being mixed are in “standard” caps (Destiny’s Child, Cassie). However, although this is sort of in violation of MB policy, I would not rely on the artwork on these releases for true accuracy. For example, this release has typos even on things like “Vs” being printed as “Js”. While MB would normally use that typo, as that is how it was released, my point is that the printing, while generally accurate, cannot really be used for style.

Another example is the title vs artist fields. On this release, I have a recording titled “Camila Cabello Fe. Young Thug & Friends”. The artist being DJ Martin Pieters. Now, the actual recordings here are “Set Fire to the Rain”, “Havana”, “What Lovers Do” and others. This differs from another recording on the release titled “Waiting On Beautiful (Walking In Vain Vs Perfect)”, by “Bob Marley & The Wailers Vs Ed Sheeran”, mixed by “BERGWALL”.

If there is any interest in trying to make this MB friendly, that would be great. For my purposes, this is all just fine as these recordings are special purpose. They are also however great for casual listening as it brings a nice variation to what you normally hear on the radio or other.

for capitalization and the correct artist name, my general rules of thumb (in order):

  1. use whatever they use on social media (Twitter, Facebook, the artist’s Bandcamp, SoundCloud, etc.), if applicable. for example, a few recent artist name edits I’ve done (1, 2, and 3). obviously, if they’ve got a silly name on Twitter, like deadmau5 has right now (all hail the goat lord), don’t use that… :wink:
  2. if they have a website, you might be able to get a spelling from a bio or something. either an artist-created page or a label’s page could work.
  3. if they’re consistent on official releases by the artist, I might use that (probably very rare). I know @aerozol did this for PIG//CONTROL just recently (also probably from their Bandcamp)
  4. if you can find them in another database (i.e. Discogs, RateYourMusic, Wikipedia/Wikidata, etc.), you could use the same name they use there.
  5. otherwise, you’re probably left with trying to figure the most common spelling with whatever you’ve got.

a couple helpful pages from the Style Guideline: Titles and Artist

of course whatever artist name you choose, be sure to add the others as aliases, that way others can find them with other spellings. I don’t think they’re case-sensitive, so you wouldn’t need to add “UrbanHeadz”, “URBANHEADZ”, and “Urbanheadz”, and I also think the search ignores punctuation. I believe all the examples you gave above would be “artist name” aliases, as “search hint” aliases are for stuff like misspellings, alternate character encodings, and the like. see also the docs on Aliases.
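
to get a feel for which spellings would actually need their own alias, here's a tiny sketch. it is not how MusicBrainz search works internally; it just applies the assumptions above (ignore case, ignore punctuation) plus one extra assumption of mine, ignoring spaces, to the spellings listed earlier. `search_key` is a made-up helper name.

```python
import unicodedata


def search_key(name):
    """Collapse a name to lowercase letters and digits only."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Drop anything that is not a letter or digit (spaces, punctuation).
    return "".join(ch for ch in folded if ch.isalnum())


variants = ["URBANHEADS", "Urban Headz", "Urbanheadz", "UrbanHeadz", "TheUrbanHeadz"]

groups = {}
for variant in variants:
    groups.setdefault(search_key(variant), []).append(variant)

for key, names in groups.items():
    print(key, "<-", names)
```

under those assumptions the five spellings fall into three groups, so “URBANHEADS” (with an S) and “TheUrbanHeadz” would still want their own aliases.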


I’m not sure if I understand the second part of your question, about the title vs artist fields… are you talking about a mashup release where the mashups are credited differently? (some have the original titles in the name and some don’t?)

a pic or two would be helpful in this case, if you’re able~
