Embed the associated AcoustID along with the fingerprint into my rip

dashv · February 17, 2025, 2:48pm

While slightly off topic, I would like to embed the associated AcoustID along with the fingerprint into my rip, but I have not figured out how to get the AcoustID for my fingerprint. The reason is that I am starting to do post-processing on my large collection: I want to look for duplicate recordings, check for not-so-obvious live recordings, and do whatever else I think up. As of now only the fingerprint is embedded.

outsidecontext · February 17, 2025, 3:02pm

That’s definitely a different topic, so I have moved it.

If you want to get the AcoustID for a fingerprint you need to call the AcoustID API, see Web Service | AcoustID

Specifically, you use the https://api.acoustid.org/v2/lookup endpoint with the fingerprint and duration parameters. It will give you a list of matching AcoustIDs (most often one), with the closest match first. You can also get additional (MB) metadata.
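A minimal Python sketch of that lookup, assuming the `client`, `fingerprint`, `duration` and `meta` parameters and the `status`/`results`/`id`/`score` response fields from the AcoustID web service documentation. The helper names are made up for illustration, and the client key is an application key you would have to register yourself. The HTTP request itself is left to the caller (curl, wget, `urllib`), so the URL building and response parsing can be checked offline.

```python
import json
from urllib.parse import urlencode

LOOKUP_URL = "https://api.acoustid.org/v2/lookup"


def build_lookup_url(client_key, fingerprint, duration, meta=None):
    """Build a GET URL for the lookup endpoint (hypothetical helper)."""
    params = {
        "client": client_key,              # your own application API key
        "duration": int(round(duration)),  # track length in whole seconds
        "fingerprint": fingerprint,        # Chromaprint fingerprint string
    }
    if meta:
        params["meta"] = meta              # e.g. "recordingids" for MB data
    return LOOKUP_URL + "?" + urlencode(params)


def best_acoustid(response_text):
    """Return (acoustid, score) of the highest-scoring result, or None."""
    data = json.loads(response_text)
    if data.get("status") != "ok":
        return None
    results = data.get("results", [])
    if not results:
        return None
    best = max(results, key=lambda r: r.get("score", 0.0))
    return best["id"], best.get("score")
```

Fingerprints are long, so sending the same parameters as a form-encoded POST may be the safer choice if a URL gets rejected for length.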

Of course you can also use Picard to do the above and add the AcoustIDs to your files. If you want to know which tags Picard uses to store the AcoustID, see Appendix B: Tag Mapping — MusicBrainz Picard v2.13.2 documentation

dashv · February 17, 2025, 3:07pm

Thanks, looks like I will be busy today.

IvanDobsky · February 17, 2025, 3:28pm

I also like having the AcoustID in the recording tags. I do this with Picard by dragging tracks to left, cluster, do a scan, then reassemble back on the release in question.

Due to my matching settings, a scan does not always put recordings back onto the exact tracks they came from. So just throwing bulk albums in and hitting Scan doesn’t tend to work.

Maybe there is a more efficient way to do this, but this at least works for me in a clunky way to add AcoustID and fingerprint to the files.

dashv · February 17, 2025, 4:17pm

I have a pretty specific workflow which only uses picard as the last step to tag the files. I have a process that embeds the MB releaseID tag into the initial ID3 tag for picard to pick up so I am always guaranteed of a correct match. The last step of the workflow is to tag the MP3’s with picard so that the album/recording data is embedded into the MP3. Since I have such a large collection I learned that I never want to do the audio acquisition (rip, audio capture, convert LP/cassette, etc) a second time, I always keep the original capture along with basic metadata (in multiple places) so that I can always recover. Long ago I started with MythTV(music) then gave it up and started to look around for something else but quit that search years ago. I have now gotten to the point that my collection is mature (as am I) and want that “Pie in the Sky” system to play and display my music.

All that gets us back to the original post of embedding information into the tags so that post processing or a media player has the required info it needs. The fantastic thing about MusicBrainz is the wealth of information within it that keeps growing, and it just becomes a simple matter of using picard to “refresh” that information. That is how I use picard.

IvanDobsky · February 17, 2025, 4:37pm

My workflow is also pretty stubborn and fixed.

(Optional first step uses Picard to add a DiscID to my freshly added MB release, which I have just manually typed out due to it not being in the database)
EAC rips CD, using the MusicBrainz plugin to tag the initial track names.
While the CD is in the machine ripping I fire up Picard and do a CD lookup. Selecting the exact release version.
Once ripped, drop files into Picard, do a SCAN. This may or may not find the correct release match. Especially unlikely if a compilation. But it is done to get the AcoustID.
Depending on accuracy I may have to only move a file or two back to that correct release.
Even if it was a messy compilation it is still a quick action to drag all tracks back to the left, fresh cluster, remove the mess on the right, and relookup that CD.
Manually drag and drop the files from left to right. Now these files have AcoustIDs ready to go with the rest of the Picard lookup data. Hit save.
While the above is happening the scanner is also kicked into action for the artwork and booklet.

The bonus of doing this is the DiscID is also embedded in the tracks. So I know which CD something came from.

As to “Pie in the Sky” systems - currently I have KODI tied to a Home Automation system with VPN access from my phone to give me my own home grown version of Spotify when I am out.

The main use of the Home Automation system is to control where audio is sent around the house.

The key to flexibility is the quality tagging from Picard.

candude43 · March 20, 2025, 9:22pm

I have the same question for the same reason. AcoustIDs are handy for de-duplicating.

I found that if I scan unclustered files, and then drag the results from the right panel back to the left panel unclustered files, I can save them there retaining the AcoustID and fingerprint without altering the other tags.

I was hoping to find a method that does this directly without pulling down other data that I’m not going to use. Jaikoz could do this, but I don’t want to buy another version just for that.

InvisibleMan78 · March 21, 2025, 10:10am

Could you please explain this idea in a little more detail?
Do you say that tracks with the same AcoustID are “duplicates” in any case?

dashv · March 21, 2025, 1:19pm

I cannot reply for @candude43, but “deduping” is the process of identifying duplicate data (in this case recordings) for removal, to cut down on wasted space. I have seen other references in this forum and other forums from those who want only one copy of a recording/song. That could be one per artist regardless of uniqueness (live, alternate take, mix, or other reason), or just removing duplicates from compilations or other re-releases. We all have our own reasons for wanting to know and maybe store this information.
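As a rough illustration of that kind of post-processing, here is a small Python sketch that groups files by AcoustID and keeps only the groups with more than one file. The `(path, acoustid)` input shape and the function name are assumptions for the example; how the IDs get read out of the tags is left open. A shared AcoustID only marks candidates for review, not certain duplicates.

```python
from collections import defaultdict


def group_by_acoustid(tracks):
    """Map AcoustID -> list of paths, for IDs carried by more than one file.

    `tracks` is an iterable of (path, acoustid) pairs; files without an
    AcoustID (None or empty string) are skipped.
    """
    groups = defaultdict(list)
    for path, acoustid in tracks:
        if acoustid:
            groups[acoustid].append(path)
    # Keep only IDs that appear on two or more files.
    return {aid: paths for aid, paths in groups.items() if len(paths) > 1}
```

For example, `group_by_acoustid([("a.mp3", "id1"), ("b.mp3", "id1"), ("c.mp3", "id2")])` returns `{"id1": ["a.mp3", "b.mp3"]}`.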

ernstlx · March 21, 2025, 4:49pm

I think you don’t have to use “unclustered files”. Clustering does not change the tags.

But I would rather look for a method to automatically update the files with MB information. You can “keep” certain tags and modify others with scripts to suit your needs. Even complex scripts with many very specific rules have no negative impact on performance.

eloise_freya · March 23, 2025, 12:23am

Am I misunderstanding here? Surely you don’t need to drag tracks to the left panel in Picard; you can select a track, a release or more, right-click and select to calculate the AcoustID… then save to write to file tags.

Forgive me if I’m missing something!

dashv · March 23, 2025, 1:39am

The original post was about getting the AcoustID stored along with the fingerprint, which happens if you use “scan” to match the files. Scan calculates the fingerprint, then looks up that fingerprint to get the matching AcoustID. I think what @candude43 was talking about is the case where your files are already tagged outside of Picard and you want to use Picard to get both the AcoustID and the fingerprint but change nothing else (kind of a brute-force solution).

For me, the OP, I do not use “scan”. I have a procedure that adds the ReleaseID to the file for Picard to pick up and use. It is a very fast, accurate operation, and I can literally tag thousands of files really fast since there is no scan. I then create, save and upload the fingerprint, but Picard’s create-fingerprint operation in the right panel does not do a lookup, so there is no AcoustID to store. I need to create a plugin (or modify an existing one) that uses the fingerprint to look up the AcoustID.
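Outside of Picard, the two inputs the lookup needs (fingerprint and track length) can be taken from the audio file with Chromaprint’s `fpcalc` tool. A rough Python sketch, assuming `fpcalc` is on the PATH and is a build that supports `-json` output; the function names are illustrative only.

```python
import json
import subprocess


def parse_fpcalc_json(text):
    """Extract (fingerprint, duration_in_seconds) from `fpcalc -json` output."""
    data = json.loads(text)
    return data["fingerprint"], float(data["duration"])


def fingerprint_file(path):
    """Run fpcalc on one file; raises CalledProcessError if fpcalc fails."""
    completed = subprocess.run(
        ["fpcalc", "-json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fpcalc_json(completed.stdout)
```

The returned pair can then be passed to whatever makes the lookup request.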

dashv · March 23, 2025, 6:36pm

I just found this @IvanDobsky reply that explains more. I need to think this over; I have over 105,000 files in over 8,000 albums.

sound.and.vision · March 23, 2025, 8:51pm

But is it not just easier to have Picard do its normal thing, store the Recording MBID on each track, and then, if you’re that concerned, find duplicate MBIDs instead?

dashv · March 23, 2025, 10:11pm

Actually no. Picard is great at doing a lot of things, but I found long ago that I had to spend too much time messing with it to get the correct album match. Maybe it’s better now, but long ago I developed my routine, which guarantees me the correct album match 100% of the time without having to do a scan (and it scales to big numbers). It’s only in the last few months that I saw I did not have the AcoustID and started to explore how to get it. Now I know how things work. I only wanted it in the tag so I could extract it and put it in a DB. I just finished playing with the AcoustID API today and see how easy it is to get it, and the things I need to worry about when there are multiple tags. (Funny thing: I fixed some wrongly entered IDs in the process.) I do almost everything with scripting and keep all my logs. All I need to get the AcoustID is the fingerprint and track length, plus a simple curl script for the query. I did find it is important to use the actual track length of the file you submitted the fingerprint from to get “your” ID; a difference of a few seconds can give you a close match instead. In any case I am not going through the scan process for 8000+ already processed albums and I am not changing my process. It was a good learning process and I learned a lot. Thanks to everyone who replied, and even more to @IvanDobsky for the post of his that I finally stumbled onto.

IvanDobsky · March 24, 2025, 2:09pm

Not really the same. A Recording MBID can have multiple different versions of a recording. A good example is an original CD release, and then the remastered one. Especially when remastered louder. You can end up with different AcoustIDs on two files, but the same RecordingID.
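Once both IDs are extracted from the tags, the remaster case can be spotted with a few lines of Python. This is only a sketch: the `(recording_mbid, acoustid)` input shape and the function name are assumptions for illustration.

```python
from collections import defaultdict


def recordings_with_multiple_acoustids(tracks):
    """Map Recording MBID -> sorted AcoustIDs, where more than one was seen.

    `tracks` is an iterable of (recording_mbid, acoustid) pairs; pairs with
    a missing value on either side are ignored.
    """
    seen = defaultdict(set)
    for recording_mbid, acoustid in tracks:
        if recording_mbid and acoustid:
            seen[recording_mbid].add(acoustid)
    return {rec: sorted(ids) for rec, ids in seen.items() if len(ids) > 1}
```

Any Recording MBID that comes back from this is one where the files sound different enough to fingerprint differently, such as an original CD and a louder remaster.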

ernstlx · March 24, 2025, 2:45pm

I would not recommend this either. I’m in a similar situation, but I’ve decided to only determine AcoustIDs for newly added releases and otherwise only on a case-by-case basis. Usually I don’t need them, and if I need them for distinguishing recordings, I determine the AcoustIDs for the release in question.

And I submit fingerprints for all new, correctly assigned releases to increase the count on the good ones.

dashv · March 24, 2025, 3:19pm

I am not going to do it right away and I enjoy writing new stuff to add “dimension” to my collection. If load on the AcoustID server is the concern I always take this into account by doing my own rate limiting and would probably do a request once a minute which would only take 7 days to complete. As to storage, I am well over 20TB in capacity and am working on cutting down on the insane number of backups I have.
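A small Python sketch of self-imposed rate limiting, together with the run-time arithmetic. The thread gives two counts (over 8,000 albums and over 105,000 files) and does not say which one a “request” corresponds to, so the usage note shows both. The function names and the injectable `clock`/`sleep` parameters are assumptions for illustration.

```python
import time


def estimated_days(request_count, seconds_between_requests):
    """Total run time in days for evenly spaced requests."""
    return request_count * seconds_between_requests / 86400


def throttled(items, min_interval, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per `min_interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

At one request per minute, `estimated_days(8000, 60)` comes to about 5.6 days and `estimated_days(105000, 60)` to about 72.9 days. Wrapping the work list as `for path in throttled(paths, 60): ...` keeps the spacing without any further bookkeeping.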

I have lots of thoughts on what to do with my collection other than just listening to it. I was intrigued by the project out of Spain that was canceled, I had set up routines to process and submit the results and was half way through when it got canceled. I will always help out on those kinds of projects.

IvanDobsky · March 24, 2025, 3:37pm

If you do a plugin (or other script) that selectively adds AcoustIDs, I’d be interested in having a look at it. I’ve often wondered about a tool of some form that can insert just one tag without disrupting everything else.

Data I add now is very different to what I added when I first started to use Picard, but I don’t fancy “fixing” everything. Some things AcoustIDs would be good in, especially to spot how the “same” recording changes.

I’d also likely hack the same code around a bit and see if I can make it into a tool to update just Genres. (but that would be a little OT for this thread)

dashv · March 24, 2025, 4:11pm

I looked on git at a plugin that exposed the AcoustID scan function, but there was too much other crap around it and it did not look like it was supported any longer. After playing (manually) with the AcoustID API yesterday for a couple of hours, I decided the best way is to just use the API to grab the ID. Last month I tested out a quick one-liner to grab the track length out of the MP3 file and may do the same for the fingerprint. Since I always upload the fingerprint, my testing tells me that I can almost always get “my” ID if I use the file that I uploaded the fingerprint from to source the query data for the API call (duration/length & fingerprint). Linux, or in my case cygwin, is your friend; I install cygwin on all PCs I touch. I will most likely use wget in a bash or, more probably, perl script to make the API call, since it is a one-time deal and a full REST script using curl is overkill for the project.