Some sort_name statistics for Releases and Release groups

I am interested in proposing some Style guidelines for sort names of Releases and Release Groups. It seemed helpful to understand what existing Release and Relase Group sort-names are already in the database. I ran some queries on my replica of the MusicBrainz database in November 2025. Here are the results.

For background, see earlier threads on sort names for Releases and Release Groups:

  • Sort names style for Releases? (Sept-Oct 2025)
  • Release Title sort name style? (July 2022)
  • Note that neither the Release nor the Release Group entities have a sort_name field. This is different from the Artist entity, which does have a sort_name field. So, the way to apply a sort name to a Release or Release Group is to create an Alias for the Release (Group). The Alias does have a sort_name field. However there are no Style guidelines for this technique.

The first observation is that almost no Releases or Release Groups have Aliases. Any Style Guideline for sort names of Releases and Release Groups will only apply to a tiny subset of entries.

Of 3,949,111 Release Groups[1], only 47,426 (1.20%) have any Aliases. 3,901,685 (98.80%) have no Aliases, and therefore no Release Group sortnames. Similarly, of 5,061,745 Releases[2], only 35,717 (0.71%) have any Aliases. 5,026,028 (99.29%) have no Aliases.

The second observation is that 93-97% of Release Group and Release Aliases have the Sort Name set to the Name. The MusicBrainz web site displays this by omitting the Sort Name of an Alias if it is identical to the Name, and displaying the Sort Name only if it differs. Also, there are zero Release Group and Release Aliases which have either empty Name or Sort Name fields.

Of 59,730 Release Group Aliases[3], 58,171 (97.39%) have the Sort_name set to the same value as the Name. That indicates that the Sort_name for the Release group would not differ from the Name, and so a Sort Name style would not be relevant. Of the remaining Aliases, 779 (1.30%) have the fields filled out which I contemplate defining in a Release Group Sort Name Style: Sort_name, Alias Type, and Locale. Another 780 (1.31%) are missing either the Alias Type or Locale.

Similarly, of 46,243 Release Aliases[4], 43,318 (93.67%) have the Sort_name set to the same value as the Name. Of the remaining Aliases, 1863 (4.03%) have the fields filled out which I contemplate defining in a Release Group Sort Name Style: Sort_name, Alias Type, and Locale. Another 1062 (2.30%) are missing either the Alias Type or Locale.

One question in earlier threads was whether sort names should be applied to Release Groups, or to Releases, or to both, with some kind of merging of the results. What I see from these numbers is that Release Groups have more aliases overall, but Releases have more of the kind of aliases that might indicate a benefit from a Sort Name style. However, the numbers of Aliases are both of the same order of magnitude, about 1000-2000. This says to me that Sort Name Style guidelines should be written for both Release and Release Group entities.

If Release and Release Group Sort Name styles were defined that made sense to me, my attention would be drawn to the approximately 1900 “complete” Release Aliases, and the 779 “complete” Release Group Aliases. I believe that a database query could help focus on the entries which actually refer to sort names.

Queries

To help future readers who may want to reproduce my results, or improve on my queries, I am including the text of the SQL query and the raw results. I ran these queries on a replica of the MusicBrainz database in November 2025.

Query 1

This SQL query counts the number of Release Group entries with and without Aliases.

... 'WITH rga_ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY release_group ORDER by id) AS rnk FROM release_group_alias
) SELECT  COUNT(*) AS n_rg, 
SUM(CASE WHEN rga.id IS NULL THEN 1 ELSE 0 END) as n_without_alias, to_char(100.0*SUM(CASE WHEN rga.id IS NULL THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_without_alias,
SUM(CASE WHEN rga.id IS NOT NULL THEN 1 ELSE 0 END) as n_with_alias, to_char(100.0*SUM(CASE WHEN rga.id IS NOT NULL THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_with_alias
FROM release_group AS rg LEFT JOIN rga_ranked AS rga ON rg.id = rga.release_group AND rga.rnk = 1'
  n_rg   | n_without_alias | pct_without_alias | n_with_alias | pct_with_alias 
---------+-----------------+-------------------+--------------+----------------
 3949111 |         3901685 |   98.80           |        47426 |    1.20
(1 row)

Query 2

This SQL query counts the number of Release entries with and without Aliases.

... 'WITH ra_ranked AS ( 
SELECT *, ROW_NUMBER() OVER (PARTITION BY release ORDER by id) AS rnk FROM release_alias            
) SELECT  COUNT(*) AS n_r,
SUM(CASE WHEN ra.id IS NULL THEN 1 ELSE 0 END) as n_without_alias, to_char(100.0*SUM(CASE WHEN ra.id IS NULL THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_without_alias, 
SUM(CASE WHEN ra.id IS NOT NULL THEN 1 ELSE 0 END) as n_with_alias, to_char(100.0*SUM(CASE WHEN ra.id IS NOT NULL THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_with_alias 
FROM release AS r LEFT JOIN ra_ranked AS ra ON r.id = ra.release AND ra.rnk = 1'                  
   n_r   | n_without_alias | pct_without_alias | n_with_alias | pct_with_alias 
---------+-----------------+-------------------+--------------+----------------
 5061745 |         5026028 |   99.29           |        35717 |    0.71
(1 row)

Query 3

This SQL query counts the number of Release Aliases which have Sort_name identical to Name (and not both empty), the number which are similar to a “complete” Sort_name (Sort_name field differs from Name field, Alias Type is Release Name, and Locale is filled out), and the number which are “incomplete” (Alias Type or Locale field are not correct).

... 'SELECT count(*) as N_rga, 
SUM(CASE WHEN COALESCE(rga.name, '"''"') = COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' THEN 1 ELSE 0 END) as N_sort_name_same,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(rga.name, '"''"') = COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_same_nonempty_sort_name,
SUM(CASE WHEN COALESCE(rga.name, '"''"') <> COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' AND COALESCE(rga.sort_name, '"''"') <> '"''"' AND COALESCE(rga.locale, '"''"') <> '"''"' AND COALESCE(rga.type, -99) = 1 THEN 1 ELSE 0 END) AS N_complete_sort_name,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(rga.name, '"''"') <> COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' AND COALESCE(rga.sort_name, '"''"') <> '"''"' AND COALESCE(rga.locale, '"''"') <> '"''"' AND COALESCE(rga.type, -99) = 1 THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_complete_sort_name,
SUM(CASE WHEN COALESCE(rga.name, '"''"') <> COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' AND COALESCE(rga.sort_name, '"''"') <> '"''"' AND (COALESCE(rga.locale, '"''"') = '"''"' OR COALESCE(rga.type, -99) <> 1) THEN 1 ELSE 0 END) as N_incomplete_sort_name,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(rga.name, '"''"') <> COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') <> '"''"' AND COALESCE(rga.sort_name, '"''"') <> '"''"' AND (COALESCE(rga.locale, '"''"') = '"''"' OR COALESCE(rga.type, -99) <> 1) THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_incomplete_sort_name,
SUM(CASE WHEN COALESCE(rga.name, '"''"') = COALESCE(rga.sort_name, '"''"') AND COALESCE(rga.name, '"''"') = '"''"' THEN 1 ELSE 0 END) as N_both_empty,
SUM(CASE WHEN COALESCE(rga.name, '"''"') = '"''"' AND COALESCE(rga.sort_name, '"''"') <> '"''"' THEN 1 ELSE 0 END) AS N_name_empty,
SUM(CASE WHEN COALESCE(rga.name, '"''"') <> '"''"' AND COALESCE(rga.sort_name, '"''"') = '"''"' THEN 1 ELSE 0 END) AS N_sort_name_empty
FROM release_group_alias AS rga'
 n_rga | n_sort_name_same | pct_same_nonempty_sort_name | n_complete_sort_name | pct_complete_sort_name | n_incomplete_sort_name | pct_incomplete_sort_name | n_both_empty | n_name_empty | n_sort_name_empty 
-------+------------------+-----------------------------+----------------------+------------------------+------------------------+--------------------------+--------------+--------------+-------------------
 59730 |            58171 |   97.39                     |                  779 |    1.30                |                    780 |    1.31                  |            0 |            0 |                 0
(1 row)

Query 4

This SQL query counts the number of Release Aliases which have Sort_name identical to Name (and not both empty), the number which are similar to a “complete” Sort_name (Sort_name field differs from Name field, Alias Type is Release Name, and Locale is filled out), and the number which are “incomplete” (Alias Type or Locale field are not correct).

... 'SELECT count(*) as N_ra, 
SUM(CASE WHEN COALESCE(ra.name, '"''"') = COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' THEN 1 ELSE 0 END) as N_sort_name_same,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(ra.name, '"''"') = COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_same_nonempty_sort_name,
SUM(CASE WHEN COALESCE(ra.name, '"''"') <> COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' AND COALESCE(ra.sort_name, '"''"') <> '"''"' AND COALESCE(ra.locale, '"''"') <> '"''"' AND COALESCE(ra.type, -99) = 1 THEN 1 ELSE 0 END) AS N_complete_sort_name,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(ra.name, '"''"') <> COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' AND COALESCE(ra.sort_name, '"''"') <> '"''"' AND COALESCE(ra.locale, '"''"') <> '"''"' AND COALESCE(ra.type, -99) = 1 THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_complete_sort_name,
SUM(CASE WHEN COALESCE(ra.name, '"''"') <> COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' AND COALESCE(ra.sort_name, '"''"') <> '"''"' AND (COALESCE(ra.locale, '"''"') = '"''"' OR COALESCE(ra.type, -99) <> 1) THEN 1 ELSE 0 END) as N_incomplete_sort_name,
TO_CHAR( 100.0*SUM(CASE WHEN COALESCE(ra.name, '"''"') <> COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') <> '"''"' AND COALESCE(ra.sort_name, '"''"') <> '"''"' AND (COALESCE(ra.locale, '"''"') = '"''"' OR COALESCE(ra.type, -99) <> 1) THEN 1 ELSE 0 END)/COUNT(*), '"'990D00'"') AS pct_incomplete_sort_name,
SUM(CASE WHEN COALESCE(ra.name, '"''"') = COALESCE(ra.sort_name, '"''"') AND COALESCE(ra.name, '"''"') = '"''"' THEN 1 ELSE 0 END) as N_both_empty,          
SUM(CASE WHEN COALESCE(ra.name, '"''"') = '"''"' AND COALESCE(ra.sort_name, '"''"') <> '"''"' THEN 1 ELSE 0 END) AS N_name_empty,     
SUM(CASE WHEN COALESCE(ra.name, '"''"') <> '"''"' AND COALESCE(ra.sort_name, '"''"') = '"''"' THEN 1 ELSE 0 END) AS N_sort_name_empty
FROM release_alias AS ra'
 n_ra  | n_sort_name_same | pct_same_nonempty_sort_name | n_complete_sort_name | pct_complete_sort_name | n_incomplete_sort_name | pct_incomplete_sort_name | n_both_empty | n_name_empty | n_sort_name_empty 
-------+------------------+-----------------------------+----------------------+------------------------+------------------------+--------------------------+--------------+--------------+-------------------
 46243 |            43318 |   93.67                     |                 1863 |    4.03                |                   1062 |    2.30                  |            0 |            0 |                 0
(1 row)

  1. See Query 1 below ↩︎

  2. See Query 2 below ↩︎

  3. See Query 3 below ↩︎

  4. See Query 4 below ↩︎

A question I had before in the previous thread - how many Release Groups \ Releases actually start with “The, A, etc”. How many Release Group \ Releases actually need a sort order applied to them?

This changes your results in the above searches. I expect the majority of albums don’t need a sort order applied. So the actual number of missing sort orders is going to be much lower.

What gets a sort order? “The Best of the Beatles”? “A Song for You by Carpenters”? “The Dark Side of the Moon by Pink Floyd”… i.e. actual sentences.

What about names? I assume “Mr Brightside by The Killers” and “Dr Feelgood by Mötley Crüe” would be left on the shelf unsorted as these are titles and not a person’s name.

Thanks for sharing these stats, reosarevok super insightful dive into alias/sort name quirks! The low alias adoption (under 1.5%) really underscores how we could nudge more editors toward consistent sort naming via targeted queries like these. I’ve bookmarked your SQL snippets for my next cleanup sprint; they look like a game-changer for spotting edge cases. Cheers to making MB even more sortable!

You are welcome! But the queries are by me, @Jim_DeLaHunt , not by @reosarevok . FYI.

1 Like

Yes, you asked this before. Two comments:

  1. It is not easy to gather those statistics, because the queries have to be language specific. It is not enough to look for prefix strings like 'The ', 'A ', 'An ', etc. An English-language title “The Best of Jim” would have a different sort name. But a French-language title “The et café” would use the Release Name as the sort name.
  2. Right now I’m interested in proposing the Style Guidelines. The statistics of how many Release (Groups) actually need a different Sort_Name from their Name is a question of implementation, of improving existing entries to meet the new guidelines. How will that information affect what Style Guidelines we set?

Maybe. It depends on your personal preference. As I mentioned in “I have been thinking deeply about Release sort names, and I have come to some conclusions”, “There is no one true sort order.” You might sort a title like “Mr Brightside” one way, and I might sort it a different way. It is the user’s client software (or tagging software) which will decide what rules to apply to names like these.

In this case, it would be “Thé et café” (not The). But it’s just a typo, it doesn’t change what you meant.