Deleting "not so useful" entries in AcoustID database?

Currently (mid-February 2025) the situation with the AcoustID server is really great:
It takes only a few seconds until newly submitted fingerprints show up on MB!
Thanks again for your effort @lukz ! :+1:


Perhaps you will find some time to look at these “not so useful” (yellow highlighted) entries at some point? I currently see no reason why such entries should not be deleted.

We had some conversation starting here.

That “not so useful” entry isn’t that bad, as “Code of True Hustlaz” is track 3 on that Lil Milt album. So it still works as good confirmation.

How do you know that this “Track03” is the track “Code of True Hustlaz” from “Lil Milt” on the album “The Prophecy”?
I assume you derive this info from the other “really useful” entries?

Imagine if the other 3 entries (1 above and 2 below “Track03”) did not exist, how useful would this Unknown line still be?

AFAIK recordings can only be linked from AcoustID to MB if the title, artist and/or track length are known, or am I wrong?

No, recordings can also be linked to AcoustID entries that don’t have any of these metadata entries. The best source for metadata are the linked recordings.

It’s also possible to submit AcoustIDs without a link to MB but with user-provided metadata (it was also possible to submit them without any metadata at all, but luks has announced that this will stop). AFAIK the intention behind user-provided metadata is to allow submission independent of MB mapping and still allow the AcoustID service to return some metadata.

Picard does not make use of this metadata, it neither uses it for tagging nor does it submit it. I think someone suggested that Picard should be able to tag files with AcoustID metadata without linked MB recordings (so just applying the data from AcoustID without matching to MB), but I can’t find a ticket for it.
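
To make the distinction concrete, here is a minimal sketch that splits the results of an AcoustID lookup into those with linked MB recordings (the only ones Picard can use) and those without. It assumes the response shape of the v2 lookup API called with `meta=recordings`; the helper name `split_by_mb_link` and the sample IDs are made up for illustration, and user-provided metadata is not covered.

```python
def split_by_mb_link(response):
    """Split AcoustID lookup results into (linked, unlinked) lists of AcoustID IDs.

    Assumed response shape (v2 lookup with meta=recordings):
    {"status": "ok", "results": [{"id": ..., "score": ..., "recordings": [...]}]}
    """
    linked, unlinked = [], []
    for result in response.get("results", []):
        # A result without a non-empty "recordings" list has no MB link.
        if result.get("recordings"):
            linked.append(result["id"])
        else:
            unlinked.append(result["id"])
    return linked, unlinked


# Hypothetical sample data, not taken from a real lookup.
sample = {
    "status": "ok",
    "results": [
        {"id": "acoustid-1", "score": 0.98,
         "recordings": [{"id": "mbid-1", "title": "Code of True Hustlaz"}]},
        {"id": "acoustid-2", "score": 0.91},
    ],
}
linked, unlinked = split_by_mb_link(sample)
print(linked, unlinked)
```

Only the IDs in `linked` would be of any use to Picard as it works today.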

I don’t think that data is what is slowing down AcoustID; it’s all of the junk fingerprints that have no metadata that were, and I believe Lukz nuked them a little while ago, as we all agreed that those have little value.

I based it on the fact that other AcoustIDs for this are labelled as “Code of True Hustlaz” and this is the third track on the album “The Prophecy”.

In the same way that we literally see four DIFFERENT spellings of the track name in your screen shot. With and without brackets, with a “feat:” or a “Feat.” It seems logical they are all the same track.

Also remember that the list you are looking at down there is not submitted from Picard.

There are other outstanding issues that would be more useful to fix than worrying about some tracks being called “Track 3”. But then we had this discussion before in the other thread you link.

Another “not so useful” example:

I’m not sure if “that have no metadata” is the same as having “- - -” as metadata.

No, he meant entries in the AcoustID database that have NOTHING. That fingerprint obviously has something, going by the 6 rows above it.

I really don’t think removing this data will make any difference to the performance of the system.

Yeah, the database is currently overgrown with fingerprints that really serve no purpose besides having an “ID”; there is no information whatsoever. The only reason these fingerprints were allowed was to enable de-duplication workflows even when there is no metadata, but it turns out this was not worth it.
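
A de-duplication workflow of that kind needs nothing but the ID. A minimal sketch, assuming you already have a mapping from file path to AcoustID (how it was obtained is out of scope here; the function name, paths and IDs are made up):

```python
from collections import defaultdict


def find_duplicates(file_to_acoustid):
    """Group files by AcoustID and return only the groups with more than one file."""
    groups = defaultdict(list)
    for path, acoustid in file_to_acoustid.items():
        groups[acoustid].append(path)
    # Keep only IDs shared by at least two files; sort paths for stable output.
    return {aid: sorted(paths) for aid, paths in groups.items() if len(paths) > 1}


# Hypothetical library: two files share one AcoustID, one is unique.
library = {
    "rips/album/Track03.flac": "acoustid-1",
    "downloads/code_of_true_hustlaz.mp3": "acoustid-1",
    "rips/album/Track04.flac": "acoustid-2",
}
duplicates = find_duplicates(library)
print(duplicates)
```

No title, artist or MB link is involved at any point, which is the whole argument for having allowed empty fingerprints in the first place.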

Things like the Track03 example are going to stay, they are useful, even though their value is very small.

AcoustID as a whole is at the same time a miracle and a disaster. :slight_smile:

It’s a miracle, because for the last 6 or so years, I barely even touched it and it keeps running, still handling things just fine, even though it was never designed for the scale at which it is right now (mostly because of those useless empty fingerprints).

It’s also a disaster, because I couldn’t really focus much time on refactoring all the changes, so it’s currently a mix of very old code and small pieces of new code; there is no coherent solution. The majority is 15-year-old code, untouched for all those years.

Thank you for the reminder. Getting MBID merge resolution working again should be a priority, as should figuring out why the submission status endpoint doesn’t work, so I’ll try to tackle that soon.

One more thing: in the past, I had a bot that was actually unlinking obviously wrong recordings from AcoustIDs. Most of the top songs have many recordings linked to them by user error; they are completely wrong and it’s obvious. If there is going to be any work on removing useless metadata, it’s going to be reviving this bot and making it smarter, maybe even using some of the LLM APIs, as they are fairly good at evaluating whether something fits in there or not.
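
As a rough illustration of what such a bot could start from (this is not the actual bot, whose logic isn’t described in this thread), the sketch below flags recordings whose normalized title disagrees with a clear majority of the titles linked to the same AcoustID. The normalization rules, the `min_share` threshold, the function names and the sample data are all assumptions.

```python
import re
from collections import Counter


def normalize(title):
    """Lowercase, drop bracketed parts and trailing 'feat' credits, strip punctuation."""
    title = title.lower()
    title = re.sub(r"[\(\[].*?[\)\]]", " ", title)   # remove (...) and [...]
    title = re.sub(r"\bfeat[.:]?\s.*$", " ", title)  # remove "feat. X" / "feat: X"
    title = re.sub(r"[\W_]+", " ", title)            # collapse punctuation and spaces
    return title.strip()


def flag_outliers(recordings, min_share=0.5):
    """Return MBIDs whose normalized title differs from a clear majority title.

    recordings: list of (mbid, title) pairs linked to one AcoustID.
    Nothing is flagged unless one title covers more than min_share of the links.
    """
    counts = Counter(normalize(title) for _, title in recordings)
    if not counts:
        return []
    top_title, top_count = counts.most_common(1)[0]
    if top_count / len(recordings) <= min_share:
        return []  # no clear majority, leave everything for a human
    return [mbid for mbid, title in recordings if normalize(title) != top_title]


# Hypothetical links for a single AcoustID.
linked = [
    ("mbid-a", "Code of True Hustlaz"),
    ("mbid-b", "Code of True Hustlaz (feat. Somebody)"),
    ("mbid-c", "Code Of True Hustlaz Feat: Somebody"),
    ("mbid-d", "A Completely Different Song"),
]
print(flag_outliers(linked))
```

A title vote like this is deliberately conservative and only produces candidates for review; anything subtler (remixes, live versions, translated titles) is where an LLM or a human would have to take over.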

The most recent version of EAC (or maybe of its AccurateRip plugin?) started submitting AcoustIDs of the tracks it ripped.

So I think we may expect more entries without metadata. I think I’m guilty of several at least. Not everyone is going to fetch metadata in EAC before ripping.

Why should this be useful?

Found a possible answer in their forum:

It doesn’t do anything directly for you, but it helps the AcoustID project (in combination with MusicBrainz): https://acoustid.org/

For that, the music fingerprint is taken and, together with the MusicBrainz DiscID (CD hash code) and track number, sent to the AcoustID server (which provides music fingerprint recognition for free, the data is public domain).

No personal information is collected nor transferred - only the hash code of the musical information of a track and the DiscID which is also used for MusicBrainz metadata retrieval.

That’s actually interesting, because it could be used by Picard for matching. So far Picard only uses an AcoustID if it is linked to an MB recording.

But if EAC submits the disc ID, Picard could probably use this information to look up by disc ID and then find a match. That EAC also submits the track number can be an additional matching help.

I’d need to see how the submitted data shows up in AcoustID API responses. Does anyone have an example submitted by EAC?

I think I first noticed it when ripping this CD: Release “Międzyczas” by Odium Humani Generis - MusicBrainz

Example AcoustID entry for one of the tracks: Track "b2a2e23f-e49a-44a8-a705-d43954e9ee30" | AcoustID

I could provide you with the full EAC log file, but it doesn’t say anything interesting, just:

---- AcoustID Plugin V1.2.0

Submitting 8 results to AcoustID

AcoustID successfully submitted

Is this helpful information?

Edit: maybe it’s also useful to say that the CD was ripped on 2024-11-30, around 23:24 CET.

AcoustID can’t receive discids, so I guess what EAC is doing is:

  1. discid lookup on MB
  2. get mb metadata based on the discid lookup
  3. submit fingerprints with mb recordings to AcoustID
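
If EAC really works this way (which is only a guess in this thread), the three steps could be sketched roughly as follows. The MusicBrainz `discid` endpoint and the AcoustID `submit` parameters (`duration.N`, `fingerprint.N`, `mbid.N`) are the documented web service interfaces as far as I know; the helper names, the assumed JSON shape of the release, the disc ID, the fingerprints and the keys are placeholders, and nothing is actually sent over the network.

```python
from urllib.parse import quote


def discid_lookup_url(discid):
    """Step 1: URL for a MusicBrainz disc ID lookup that includes recordings."""
    return f"https://musicbrainz.org/ws/2/discid/{quote(discid)}?inc=recordings&fmt=json"


def recordings_by_position(release, discid):
    """Step 2: map track position -> recording MBID for the medium holding discid.

    Assumes the release JSON has media[].discs[].id and
    media[].tracks[].position / .recording.id.
    """
    for medium in release.get("media", []):
        if any(d.get("id") == discid for d in medium.get("discs", [])):
            return {t["position"]: t["recording"]["id"] for t in medium.get("tracks", [])}
    return {}


def build_submission(fingerprints, mbids, client_key, user_key):
    """Step 3: form parameters for a POST to https://api.acoustid.org/v2/submit.

    fingerprints: {position: (duration_seconds, fingerprint_string)}
    Tracks without a matching MBID are skipped rather than sent without a link.
    """
    params = {"client": client_key, "user": user_key}
    index = 0
    for position in sorted(fingerprints):
        if position not in mbids:
            continue
        duration, fingerprint = fingerprints[position]
        params[f"duration.{index}"] = str(duration)
        params[f"fingerprint.{index}"] = fingerprint
        params[f"mbid.{index}"] = mbids[position]
        index += 1
    return params


# Hypothetical data standing in for a real lookup response and real fingerprints.
release = {"media": [{"discs": [{"id": "DISCID-PLACEHOLDER"}],
                      "tracks": [{"position": 1, "recording": {"id": "mbid-1"}},
                                 {"position": 2, "recording": {"id": "mbid-2"}}]}]}
fingerprints = {1: (215, "FP-PLACEHOLDER-1"), 2: (187, "FP-PLACEHOLDER-2"),
                3: (200, "FP-PLACEHOLDER-3")}

mbids = recordings_by_position(release, "DISCID-PLACEHOLDER")
params = build_submission(fingerprints, mbids, "CLIENT-KEY", "USER-KEY")
print(discid_lookup_url("DISCID-PLACEHOLDER"))
print(params)
```

Note that the disc ID itself never reaches AcoustID in this sketch; it is only used on the MB side to find the recordings.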

If it works like that, then it’s very useful.

I suspect this only works if you are ripping to individual tracks instead of “image & cue sheet”. In any case I need to bring down the new version and have a look at it.

I’m still not convinced that such yellow highlighted entries are useful:

@IvanDobsky Would you say this track is #22 or #10?

@InvisibleMan78 I think they are undoubtedly not very useful (though a filename of Track04 can give some hint about the track number). But I think the reason that not many show much enthusiasm to remove them is that they are not a big problem either.

The entries don’t seem to be a problem for the database. They also don’t affect matching with Picard. They can be easily told apart by humans looking at the page.

So the entries are just mostly useless, but not really harmful. Given that the AcoustID server is a one-man project and there are definitely other, much more important things to do, it’s understandable that this doesn’t get that much priority.

Thanks for your explanation.