Script to create an album genre from its various recording genres

I’m trying to create a script to coalesce recording genres into a more general, short list of album genres. For genre sources, I’m using both the MusicBrainz and Wikidata servers. My setup options roughly follow @hiccup here.

Using @rdswift’s Persistent Variables plugin, I have successfully created a persistent genre variable that has combined all the genres, but it’s not what I expected. Here’s my code snippet:

$noop(___coalesce genres across tracks___)
$if($lte(%tracknumber%,%totaltracks%),$set_a(_common_genre,$get_a(_common_genre); %genre%))
$set(genre,$if2($get_a(_common_genre),None))

Let’s say five tracks in an album each have unique genres: A, B, C, D, E.
After running that code, the genre for each track changes like this:

  1. A to “; A; B; C; D; E; A”
  2. B to “; A; B; C; D; E; A; B”
  3. C to “; A; B; C; D; E; A; B; C”
  4. D to “; A; B; C; D; E; A; B; C; D”
  5. E to “; A; B; C; D; E; A; B; C; D; E”

The new string starts with a semicolon because the _common_genre variable is empty for track 1. Can someone explain what’s happening beyond that? I expected to see “; A; B; C; D; E” for all tracks. It’s as if _common_genre is not static in that final line of code. When I look at the persistent variable via the context menu function, all tracks show _common_genre to equal the lengthy genre value found in the last track.

I can clean this up by making track 1’s ‘genre’ common and achieve my goal, but it’s messy as it stands. Any help would be most appreciated.

A couple of comments…

I think the $if() in the first line may not be necessary because %tracknumber% should always be less than or equal to %totaltracks%.

It appears that the script is being executed twice, which explains the growing list. To ensure that each item is only added once you can create a normal variable when each track is processed and only append the genre if the variable is not set. Something like:

$if(%_processed%,,$set_a(_common_genre,$get_a(_common_genre); %genre%))
$set(_processed,1)
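
To see why a guard like this would stop the list from growing, here is a minimal Python sketch that models the behaviour outside Picard. It is not Picard code: `run_script`, `aggregate` and the plain dictionaries are hypothetical stand-ins for the tagging script, the album-level persistent variables and each track's own variables. It assumes the per-track `_processed` value survives between runs, and it strips the leading separator for readability.

```python
def run_script(tracks, album_vars, use_guard):
    """Model one pass of the tagging script over every track of an album."""
    for track in tracks:
        if use_guard and track.get("_processed"):
            continue  # this track has already contributed its genre
        previous = album_vars.get("_common_genre", "")
        # Append the track genre, then strip stray "; " from both ends.
        album_vars["_common_genre"] = f"{previous}; {track['genre']}".strip("; ")
        track["_processed"] = "1"


def aggregate(genres, passes, use_guard):
    """Run the modelled script `passes` times and return the aggregated string."""
    tracks = [{"genre": g} for g in genres]
    album_vars = {}  # stand-in for the plugin's album-level storage
    for _ in range(passes):
        run_script(tracks, album_vars, use_guard)
    return album_vars["_common_genre"]


genres = ["A", "B", "C", "D", "E"]
print(aggregate(genres, passes=2, use_guard=False))
print(aggregate(genres, passes=2, use_guard=True))
```

Without the guard, the second pass appends every genre again. With it, each track contributes only once, as long as the per-track flag really is still set on the second pass.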

Another, perhaps better, option would be to use the $unique() function to remove any duplicate entries from the list. Something like:

$if($and(%genre%,$not(%_processed%)),
  $setmulti(_temp,$trim($get_a(_common_genre); %genre%, ;))
  $set_a(_common_genre,$unique(%_temp%))
  $set(_temp,)
)
$set(_processed,1)

Note that I left in the %_processed% check to save a few processing cycles if the script was run more than once, although it could be removed and should yield the same result. I also used the $trim() function to get rid of any extra semicolons and spaces at the ends of the list.

Finally, considering how tagging scripts are processed (see the processing order in the Scripts documentation), I would actually separate the aggregating of the genres and updating the %genre% tag into two separate scripts. The first script would be:

$if($and(%genre%,$not(%_processed%)),
  $set_a(_common_genre,$unique($trim($get_a(_common_genre); %genre%, ;),,; ))
)
$set(_processed,1)

I shortened it a bit from the version above by removing the temporary variable %_temp% and feeding the semicolon/space separated list directly to the $unique() function.

The second script (located after the first script in the list of tagging scripts) would be:

$set(genre,$if2($get_a(_common_genre),None))

Hopefully this helps explain what you’re seeing, and how you might be able to get around the duplication.

Thanks for the great pointers!

I don’t want to use $unique() early because I would like to count the occurrences for each unique genre entry. That way I’ll be able to trim the genres that have the fewest counts so that the final tag has only the most relevant genres.

For example (from B-Tribe’s “Suave Suave”) here are the genres followed by their counts of occurrence across all tracks:
Techno - 12
New Age - 12
Electronic - 12
Ambient - 3
Chillout - 2
Club - 2
Dance - 2
Latin - 2
Downtempo - 1

So the final album genre here should be “Electronic; Techno; New Age”, and I’d like the script to sort that out.
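
As a rough illustration of that selection step, here is a Python sketch that picks the most frequent genres from the counts listed above. It only models the idea and is not Picard script. `top_genres` and the limit of 3 are assumptions made for the example, and the order among genres with equal counts is left to a separate prioritization step.

```python
from collections import Counter

# Counts copied from the list above for "Suave Suave".
genre_counts = Counter({
    "Techno": 12, "New Age": 12, "Electronic": 12, "Ambient": 3,
    "Chillout": 2, "Club": 2, "Dance": 2, "Latin": 2, "Downtempo": 1,
})


def top_genres(counts, limit=3):
    """Return the `limit` genres with the highest counts."""
    return [genre for genre, _ in counts.most_common(limit)]


album_genre = "; ".join(top_genres(genre_counts))
print(album_genre)
```

Note that a fixed limit cuts arbitrarily when more genres are tied at the cut-off than there are slots left, so a threshold on the count may turn out to be a better fit.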

Thanks for the link to the processing order, as I forget the particulars after long lapses between new scripts.

I took your second batch of code, but removed $unique() per my comments above. I pasted the code into a new first script (Script A):

$if($and(%genre%,$not(%_processed%)),
  $set_a(_common_genre,$trim($get_a(_common_genre); %genre%, ;))
)
$set(_processed,1)

In a new subsequent script (Script B), I set “temp” to _common_genre to take a look. It came out the same as it was in my OP - a growing string from track to track all starting with “A; B; C; D; E;”. The _common_genre variable becomes again: “A; B; C; D; E; A; B; C; D; E”
:thinking:

Edit: completed code paste

That’s because you didn’t include the final line of the script that sets the %_processed% variable:

$set(_processed,1)

Sorry, it is there. I just neglected to copy/paste it here.

It must be something in the way that you’re running the scripts after the persistent variable has been set. It doesn’t occur here in any of my testing. Are you running the script manually or multiple times, or are you refreshing the album and re-running the script? Are you running the script in the clustering pane or the album pane (or both)?

Try adding (yet another) new tagging script ahead of everything else like:

$unset_a(_common_genre)

That should ensure that the persistent %_common_genre% variable is cleared before each new processing run.

EDIT: You can also get an indication of how many times the aggregating script has run by changing the line:

$set(_processed,1)

to:

$set(_processed,$add($if2(%_processed%,0),1))

“Suave Suave” is my test album. It has previously been tagged by Picard so it opens into the album pane automatically. Most of my albums have been so tagged and this script is for re-tagging the genre tag exclusively. I’ve created an alternate “Option Profile” for this purpose.

The new profile has the following options enabled: Preserved tags list; File Naming; and, Scripting. Genre is not one of the preserved tags. File naming has been disabled for this profile. Scripting is very basic - just the code we’ve been discussing.

My previous tests were run by pressing Ctrl-R with the album already loaded. Today I loaded the album into a fresh instance of Picard, and the result was the same as I reported in my last post.

I entered the 2 new lines of code you just suggested (BTW, I had tried the unset_a command on _genre_common prior to my OP without success). I placed the unset_a command at the start of Script A. The new _processed command replaced the old one.

The result is that there is no aggregation of the genres now - the tracks’ genres are unchanged. The _common_genre is now genre “E” per my example above. And _processed is “1”.

I really appreciate your help with this, but as I said before, I can work with the messy result from before if I have to. I don’t want to take up too much of your time.

Edit: Removing the $unset_a command restores the previous result and _common_genre is again “A; B; C; D; E; A; B; C; D; E”.

Forget what I said about the track genres as there’s no attempt to change them with this latest code.

That’s why it didn’t work. It needs to be a new separate script ahead of Script A. I’ll try some more testing here to see if I can replicate your problem.

The unset_a command is now the only command in a new script at the top of the script list. The result returns to the previous result: no aggregation and the _common_genre now concludes as “E”. Alas.

I’m looking forward to your outcome as I’m beginning to wonder if there’s a bug with the option profile implementation.

Okay, here is what I have set up for testing:

Script A:

$noop( Script A - Aggregate genres from all tracks )
$if($and(%genre%,$not(%_processed%)),
  $set_a(_common_genre,$trim($get_a(_common_genre); %genre%, ;))
)
$set(_processed,$add($if2(%_processed%,0),1))

Script B:

$noop( Script B - Set tags showing aggregated genres and aggregate count )
$set(aggregate_genre,$get_a(_common_genre))
$set(aggregate_genre_length,$lenmulti($get_a(_common_genre),; ))

Script C:

$noop( Script C - Set tags displaying count of each genre in aggregate )
$if(%_genre_count_processed%,,
  $foreach($get_a(_common_genre),
    $set(_genre_key,Genre '%_loop_value%' count)
    $set(%_genre_key%,$add($if2($get(%_genre_key%),0),1))
    $unset(_genre_key)
  ,; )
)
$set(_genre_count_processed,1)

I don’t have any genre plugins installed so the genres are the ones coming from MB, and these are the only three scripts enabled.

When I load the Suave Suave release (I don’t have the release so I’m not matching any files) I get the following (new) tags that are the same for all tracks:

  • aggregate_genre: “Electronic; Electronic; Electronic; Electronic; Electronic; New Age; Electronic; New Age; Electronic; Electronic; New Age; Club”
  • aggregate_genre_length: “12”
  • Genre ‘Club’ count: “1”
  • Genre ‘Electronic’ count: “8”
  • Genre ‘New Age’ count: “3”

If I refresh the release in the album pane or manually run any of the scripts, the tags remain the same.

I suggest that you try this first (without the genre plugins enabled or loading from the files) to see if your results are the same. Then enable the genre plugins and see if the results are as expected (the genre tag for the tracks will not be overwritten). Then try your normal process of loading the release from the files and see what the results look like (but don’t save the files with any changes). Of course, you should remove the release from the album pane after each test to ensure that you are starting fresh each time.

Hopefully this will help identify where things are going wrong.

I disabled the Wikidata genre plugin, and it appears my other plugins won’t affect scripting. I deleted all the old scripts, and copied yours exactly. I then loaded the same release from your link into a fresh instance of Picard. Results:

  • aggregate_genre: "Ambient;Electronic;New Age;Techno; Electronic;New Age;Techno; Ambient;Chillout;Electronic;New Age;Techno; Ambient;Chillout;Downtempo;Electronic;New Age;Techno; Electronic;New Age;Techno; Electronic;New Age;Techno; Electronic;New Age;Techno; Electronic;Latin;New Age;Techno; Electronic;New Age;Techno; Electronic;New Age;Techno; Club;Dance;Electronic;Latin;New Age;Techno; Club;Dance;Electronic;New Age;Techno"
  • aggregate_genre_length: “12”
  • Genre ‘Ambient;Chillout;Downtempo;Electronic;New Age;Techno’ count: “1”
  • Genre ‘Ambient;Chillout;Electronic;New Age;Techno’ count: “1”
  • Genre ‘Ambient;Electronic;New Age;Techno’ count: “1”
  • Genre ‘Club;Dance;Electronic;Latin;New Age;Techno’ count: “1”
  • Genre ‘Club;Dance;Electronic;New Age;Techno’ count: “1”
  • Genre ‘Electronic;Latin;New Age;Techno’ count: “1”
  • Genre ‘Electronic;New Age;Techno’ count: “6”

There are some differences here, so I haven’t tried to load files. For grins, I’ll disable all my other plugins and repeat.

Edit: Disabled all plugins except Persistent Variables and get the same result.

Edit2: I removed file naming and tag options from this option profile so they’re back to default. No change to the result. I also reviewed all my other preferences and I don’t see anything that might be relevant.

Edit3: Regarding the MusicBrainz server genres, I had the ‘Minimal genre usage’ option set at 1%. I bumped that back up to 90% (the default, I think). My ‘Maximum number of genres’ is set to 8. The results changed, but aren’t identical to yours. The genre_count variables are still multi-genre versions like before. What are your settings for these?

The problem is that whatever is providing your genre values for each track is separating them with a semicolon only and not the standard semicolon/space combination used for multi-values. Now your differing results are starting to make sense.

Try replacing Script A with the following and see if that helps:

$noop( Script A - Aggregate genres from all tracks )
$if($and(%genre%,$not(%_processed%)),
  $set(_temp,$replace($replace(%genre%,; ,;),;,; ))
  $set_a(_common_genre,$trim($get_a(_common_genre); %_temp%, ;))
  $unset(_temp)
)
$set(_processed,$add($if2(%_processed%,0),1))
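
To illustrate why the separator matters, here is a small Python sketch of the same two-step replace. It models the idea only and is not Picard script. `normalize_separators` is a hypothetical helper name, and the sample string is the first two track entries from the aggregate reported above.

```python
def normalize_separators(genre):
    """Collapse every "; " to ";", then expand every ";" to "; "."""
    return genre.replace("; ", ";").replace(";", "; ")


raw = "Ambient;Electronic;New Age;Techno; Electronic;New Age;Techno"

# Splitting on the standard "; " separator sees only two long items here.
print(raw.split("; "))

# After normalizing, every genre becomes its own item.
print(normalize_separators(raw).split("; "))
```

Collapsing first and expanding second means a value that already uses “; ” comes out unchanged, so the replace is safe to apply to every track.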

I noticed this phenom early on, but assumed it was because a user had entered them into the dbase that way. I fixed them with some clumsy lines of code, but still had other issues.

Done. Data much neater, but not identical to yours, even after playing with the MB genre tag settings. The following results are with my MB genre retrieval settings at 99% & max 3.

  • aggregate_genre: “Electronic; New Age; Electronic; New Age; Electronic; New Age; Electronic; New Age; Electronic; New Age; New Age; Electronic; New Age; Techno; New Age; Electronic; New Age; Techno; Electronic; New Age; Techno; New Age; Club; Dance; Electronic”
  • aggregate_genre_length: “25”
  • Genre ‘Club’ count: “1”
  • Genre ‘Dance’ count: “1”
  • Genre ‘Electronic’ count: “9”
  • Genre ‘New Age’ count: “11”
  • Genre ‘Techno’ count: “3”

Is there some reason why we may be receiving different data from the MB server for the same release?

Probably something different in my settings regarding genres. I don’t use genres at all, so I never really pay any attention to the settings. In any event, it looks like things are finally starting to work for your setup. Sorry it took so long for me to spot the difference.

Now I’m curious as to how you plan to select which of the genres in the list to include in your new ‘genre’ setting for the tracks. Is it just the top ‘x’ genres, or does the count need to meet a specific threshold (number or percentage of tracks containing that genre)?

EDIT: It’s because I have the maximum number of genres set to “1”.

We’re not out of the woods yet, I’m afraid. I just loaded the files from before into a new instance of Picard, and the results are all screwed up.

The result from our data table is now unique to each track. The aggregate_genre_length varies from “27” at track 1 up to “50” at track 10. There are 5 solo genre counts, but they bounce around for each track. Arg!

To answer your question, I hadn’t decided on a specific selection routine yet. Perhaps accept the genres with the top 3 counts, then fall back to 2, then 1. I thought once the script was working, I’d play around with it using a variety of albums and optimize.

Edit: I made a copy of the album files and moved them out of my music folder. I loaded them into foobar2000 and deleted all the tags except for artist, album_artist, title, and track name. Loading into a fresh Picard gave almost the same result. The only difference was that the aggregate_genre_lengths changed order - Track 10 was “50” above, and it’s now track 12. I confirmed the tracks were in the same order.

I hope you’re taking a break, as I think I’ll come back to it tomorrow. Thanks a ton for your efforts!

That’s not good. Off hand, I’m not sure what could be causing that. It really sounds like some of the tracks are being processed multiple times which is affecting the counts. I’ll have to ponder it a bit.

I’ve analyzed the results for the last code on my tagged files for “Suave Suave” and on the same files that have been de-tagged except for titles & track numbers. There are 4 unique genre groups across all tracks and here’s how they fall for both file groups:

Actual Genre per track
01. Electronic;New Age = A
02. Electronic;New Age = A
03. Electronic;New Age = A
04. Electronic;New Age = A
05. Electronic;New Age = A
06. New Age = B
07. Electronic;New Age;Techno = C
08. New Age = B
09. Electronic;New Age;Techno = C
10. Electronic;New Age;Techno = C
11. New Age = B
12. Club;Dance;Electronic = D

I’ve given each genre group the letter shown to simplify viewing the aggregates. For the files with only basic tags, we get:

Files with basic tags
01. A; A; A; A; A; B; C; B; C; C; B; D; A
02. A; A; A; A; A; B; C; B; C; C; B; D; A; A
03. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A
04. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A
05. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A
06. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B
07. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C
08. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C; B
09. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C; B; C
10. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C; B; C; C
11. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C; B; C; C; B
12. A; A; A; A; A; B; C; B; C; C; B; D; A; A; A; A; A; B; C; B; C; C; B; D

For the files that were tagged months ago by Picard, we get:

Files with full tags
01. A; A; A; A; A; B; C; B; C; C; B; D; A
02. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A
03. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A
04. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A
05. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A; C; B; D; A
06. A; A; A; A; A; B; C; B; C; C; B; D; A; B
07. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C
08. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A; C; B
09. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A; C
10. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C
11. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A; C; B; D; A; B
12. A; A; A; A; A; B; C; B; C; C; B; D; A; B; A; A; C; C; A; C; B; D

The first group for the files with basic tags is like what I was seeing in my OP.

There are two common features for the two groups: the same 13 starting elements, and the same ending element. It’s wild that tags caused the scrambling in-between those common features. FYI, the genre in the tagged files is set to “Ambient”. Also, I restored my ‘preserve tags’ list in options, and that seems to have changed the grouping as well since the track with the longest genre count is now track 11 - not 10.

Edit: The files correspond to this MB release: Suave Suave. It’s different from your link above, but the code still works fine when the files are absent.

A third common feature between the two groups above is that the aggregated genres in track 1 are the same. Working on the idea that that’s always correct, I’ve modified the scripts as follows.

Script A is unchanged.

In Script B, I set “aggregate_genre1” to be a new persistent variable from “_common_genre” from only track 1. Because the prior results have always shown the aggregate genre has the track 1 genre(s) added again at the end, I slice them off. The aggregate_genre is now correct with the proper length.

$noop( Script B - Set tags showing aggregated genres and aggregate count )
$if($eq(%tracknumber%,1),
  $set_a(aggregate_genre1,$get_a(_common_genre))
  $setmulti(_temp,$replace($replace(%genre%,; ,;),;,; ))
  $set_a(t1_genre_length,$lenmulti(%_temp%))
  $unset(_temp)
)
$setmulti(aggregate_genre,$get_a(aggregate_genre1))
$setmulti(aggregate_genre,$slice(%aggregate_genre%,,-$get_a(t1_genre_length)))
$set(aggregate_genre_length,$lenmulti(%aggregate_genre%))
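
The slicing idea can be checked against the per-track genres listed earlier with a short Python sketch. It models the logic only and is not Picard script. `drop_trailing` is a hypothetical helper, and the starting list assumes, as observed above, that track 1 sees the full aggregate plus its own genres appended once more.

```python
from collections import Counter

# Per-track genres from the "Actual Genre per track" list above.
tracks = (
    [["Electronic", "New Age"]] * 5                 # tracks 1-5
    + [["New Age"]]                                 # track 6
    + [["Electronic", "New Age", "Techno"]]         # track 7
    + [["New Age"]]                                 # track 8
    + [["Electronic", "New Age", "Techno"]] * 2     # tracks 9-10
    + [["New Age"]]                                 # track 11
    + [["Club", "Dance", "Electronic"]]             # track 12
)


def drop_trailing(values, count):
    """Remove the last `count` items; a count of 0 leaves the list unchanged."""
    return values[:-count] if count else values


# What track 1 sees: every genre once, plus its own genres repeated at the end.
seen_at_track_1 = [genre for track in tracks for genre in track] + tracks[0]
aggregate = drop_trailing(seen_at_track_1, len(tracks[0]))

print(len(seen_at_track_1), len(aggregate))
print(Counter(aggregate))
```

The explicit check for a count of 0 matters in Python because `values[:-0]` would return an empty list. Whether a track 1 with no genre needs similar care in the script itself is something I have not verified.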

Script C is disabled, but the new Script D is the same except that it uses “aggregate_genre” instead of “_common_genre”.

$if(%_genre_count_processed%,,
  $foreach($get(aggregate_genre),
    $set(_genre_key,Genre '%_loop_value%' count)
    $set(%_genre_key%,$add($if2($get(%_genre_key%),0),1))
    $unset(_genre_key)
  ,; )
)
$set(_genre_count_processed,1)

Results:
The 25 genres are properly aggregated and the solo genre counts match:
Genre ‘Club’ count: “1”
Genre ‘Dance’ count: “1”
Genre ‘Electronic’ count: “9”
Genre ‘New Age’ count: “11”
Genre ‘Techno’ count: “3”

I get the same result with both file sets. I hope it’s robust.

I’m going to start a new script to derive an album genre list from the solo genre counts…

Around holiday tasks, I’ve created an ‘album genre’ script. The idea is that if a recording genre appears in half or more of the tracks, it gets added to the ‘album genre’. It builds on the clever routine that @rdswift provided.

$noop(___Create album genre from genre counts___)
$if(%_album_genre_processed%,,
  $foreach($get(aggregate_genre),
    $if($gte($mul($get(Genre '%_loop_value%' count),2),%totaltracks%),
      $if($not(%album_genre%),$setmulti(album_genre,%_loop_value%),
      $copymerge(album_genre,_loop_value))
    )
  )
)
$set(_album_genre_processed,1)

After this script, the album_genre result for “Suave Suave” is “Electronic; New Age”.
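
The half-or-more rule can be sanity-checked against the counts reported earlier with a few lines of Python. This is only a model of the rule and is not Picard script. `album_genres` is a hypothetical helper name, and the total of 12 tracks is taken from the per-track list above.

```python
def album_genres(genre_counts, total_tracks):
    """Keep genres whose count, doubled, reaches the number of tracks."""
    return [
        genre
        for genre, count in genre_counts.items()
        if count * 2 >= total_tracks  # i.e. the genre is on at least half the tracks
    ]


# Solo genre counts reported above for "Suave Suave".
counts = {"Club": 1, "Dance": 1, "Electronic": 9, "New Age": 11, "Techno": 3}

print("; ".join(album_genres(counts, total_tracks=12)))
```

Doubling the count instead of halving the track total keeps the comparison in whole numbers, which is the same trick the script uses with $mul and $gte.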

Next, I will be creating a genre prioritization routine to set the order of genres within the multivariable from high to low (like Rock before Folk-Rock, Electronic before Techno, Jazz before Hard Bop, etc.). It will likely be similar to @hiccup’s approach.

Once they’re prioritized & sorted, I’ll truncate a potentially long list to 5 or so genres maximum.
