Loading big releases when scanning files

More of an observation for discussion than a request or complaint. I just did a scan on just over 300 tracks by Louis Armstrong. I didn’t look at a clock, but it took maybe an hour to load all the results. The main reason was that so many of the tracks were found on some really big box sets, including “The Ultimate Jazz Archive” (3,201 tracks) and a couple of volumes of “The Encyclopedia of Jazz” (over 2,000 tracks each), along with a multitude of smaller collections of a hundred or more tracks.

I know there is the slider to limit the number of compilations, but I am loath to turn it down much more than I have, since doing so appears to affect results when scanning legitimate compilations.

It seems wasteful to pull all that useless data from the server, and it’s very frustrating to wait through. I wonder if I’m the only one who thinks this way. I’m not sure how best to address this in software, but I did have a thought or two about it.

This may be related to Picard looking for advanced track/release relationships.

Do you have those activated in Picard’s settings?
Are you using plugins that make use of relationships, such as genre plugins?

What you could do is disable these settings and plugins when looking up a new release that was not matched before.
There’s a good chance it will be matched much faster.
After MB’s ID tags have been written, you could run it through Picard again with the relationships and plugins activated.

For this purpose I am using separate portable Picard installations myself.
One that is ‘clean’ and is used for ‘first contact’.
Another one with all bells and whistles activated.

(well, I am using a few more, but that’s off-topic :wink: )
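
If maintaining multiple installations is a hassle, the same split can be done with one install and separate config files. A minimal sketch, assuming your Picard version supports the -c/--config-file command-line option; the .ini file names below are placeholders, not real files:

import subprocess
import sys

# Placeholder config files: one with relationships/plugins disabled for
# fast "first contact" matching, one with all bells and whistles activated.
CONFIGS = {
    'clean': 'picard-clean.ini',
    'full': 'picard-full.ini',
}

def launch(profile):
    # Start Picard reading its settings from the chosen config file
    # instead of the default location.
    subprocess.Popen(['picard', '--config-file', CONFIGS[profile]])

if __name__ == '__main__':
    launch(sys.argv[1] if len(sys.argv) > 1 else 'clean')

Run it once with 'clean' for the first pass, then again with 'full' after the MB IDs have been written.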

Unless I am misunderstanding how things work, the matching was quick even with the advanced relationships plugins. Picard populated the right pane (“loading album information”) within a minute or two. The 300 or so tracks generated over 20,000 requests, as indicated in Picard’s status bar, and that was what was taking the time. (I had exactly one match on “The Ultimate Jazz Archive.”)

I work in Picard episodically, and sometimes there are several months between episodes. I had started down the road you are describing, and after the third page of process documentation I abandoned it. I much prefer a single configuration of Picard that meets all my needs, with additional tweaks that don’t affect my core requirements. My singular tagging script with its abundance of IF nests is testament to that. Of course, I haven’t really gotten into classical music yet, so this may change.

I had been using a batch-file menu to copy .ini files around based on usage, but separate portable installations sound like a really good method.


I’ve managed to reproduce the issue. 3 songs resulted in 7k requests, and the screen completely froze while the requests were being queued. Taking a look at it.

Update: There seem to be two different issues. The first is related to the wikidata plugin, which freezes the screen while it apparently processes all tracks on the album, even those with no matching local file (a problem for 3k-track albums). The second is related to the AcousticBrainz Tonal-Rhythm plugin, which generates a ton of requests.

Update 2: Worst workaround ever, but it should work in most cases, even though it breaks the listing of the album’s missing files. Two added lines in Album()._finalize_loading() skip processing of entries whose titles don’t match any of the loaded files. The right way to mitigate this would be to skip or delay plugin processing for entries that have no matching file.

file_names = [file.metadata['title'] for file in self.unmatched_files.files]  # Here: titles of the loaded files

track_count = medium_node['track-count']
if track_count:
    tracklist_node = medium_node['tracks']
    for track_node in tracklist_node:
        absolutetracknumber += 1
        if track_node['title'] in file_names:  # And here: skip tracks with no matching file
            track = self._finalize_loading_track(track_node, mm, artists, va, absolutetracknumber, discpregap)

Thanks for the confirmation. This wasn’t necessarily a problem for me, as I just started the scan and monitored the status while I did other work, although it did slow down my music workflow. I’m more concerned about the resource utilization of a free service that I would like to see continue. My 20k+ requests, most of which I ignored, were 20k requests that could have been used more efficiently.

I cannot confirm the screen freezing, as I am not using the wikidata plugin. I had issues with it in the past, and am waiting for it to be fixed. It seems it is still causing problems. (I don’t remember details off the top of my head, but I’m sure the thread is in here somewhere.)

I am using the Tonal-Rhythm plugin, and just going from memory, the request count seems to be about triple the total number of tracks in all found releases. Normally this isn’t a big deal, as most of my music is not found in large compilations.

I was crafting my reply while you were posting the code. Where does this code go? It doesn’t look like a tagging script.

My bad. I’ve taken a look at the plugins’ code, but they process files individually, so the source of both issues is on Picard’s side. The source file in the Git repository is /picard/album.py.

Update: Forget the previous code. This one, for Album()._finalize_loading_track(), doesn’t break the listing of missing files, but it is way slower (due to UI updates; that is a different issue). I’m still working on calling the plugins as more files are added to the album, because Picard currently assumes plugins are always called during album setup.

file_names = [file.new_metadata['title'] for file in self.unmatched_files.files]  # Here: titles of the loaded files
if track_node['title'] in file_names:  # And here: only run processors for matched tracks
    # Run track metadata plugins
    try:
        run_track_metadata_processors(self, tm, track_node, self._release_node)
    except BaseException:
        self.error_append(traceback.format_exc())
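
As for deferring the plugin calls, here is a rough sketch of the idea only, not working Picard code: _deferred_track_nodes is an assumed dict of {track id: track node} that _finalize_loading_track() would populate for the entries skipped above, and the method would have to be called from wherever files get matched to tracks after load.

def _run_deferred_track_processors(self, track):
    # Hypothetical helper on Album(): run the track metadata processors
    # for a track that was skipped during album setup, the first time a
    # file is actually matched to it.
    track_node = self._deferred_track_nodes.pop(track.id, None)
    if track_node is None:
        return  # processors already ran, or were never deferred
    try:
        # Same call the normal loading path makes, just delayed.
        run_track_metadata_processors(self, track.metadata, track_node,
                                      self._release_node)
    except BaseException:
        self.error_append(traceback.format_exc())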