Sort names style for Releases?

I have been thinking deeply about Release sort names, and I have come to some conclusions.

Release Sortname style should not aim to achieve some One True Sort Order. There is no one true sort order. Each library has its own rules for sort order. For instance, there is a common English language convention that articles (“the”, “an”) are disregarded when sorting, but NISO TR03-1999 says:

“An initial article in a heading should be treated as any other initial word. When it is deemed appropriate or desirable to arrange headings with initial articles by the word following the article … the headings may be structured to achieve the desired arrangement.”

Some sort orders operate word-by-word. Some operate letter-by-letter, disregarding word boundaries. Typical computer sort functions operate character-by-character, sorting punctuation as well as letters.
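
To make the three approaches concrete, here is a minimal Python sketch. The titles are made-up strings chosen only to show the difference, and the two key functions (`word_by_word_key`, `letter_by_letter_key`) are hypothetical helpers, not taken from any standard or from MusicBrainz code.

```python
import re

# Made-up titles, chosen only because they sort differently under each rule.
titles = ["New York", "Newark", "New-found Glory", "Newcastle"]

def word_by_word_key(title: str) -> list[str]:
    # Compare one whole word at a time; spaces and punctuation act as word breaks.
    return re.findall(r"\w+", title.casefold())

def letter_by_letter_key(title: str) -> str:
    # Drop spaces and punctuation, then compare the remaining characters as one run.
    return "".join(re.findall(r"\w+", title.casefold()))

word_order = sorted(titles, key=word_by_word_key)
letter_order = sorted(titles, key=letter_by_letter_key)
char_order = sorted(titles)  # plain code-point comparison, punctuation included

print(word_order)    # ['New-found Glory', 'New York', 'Newark', 'Newcastle']
print(letter_order)  # ['Newark', 'Newcastle', 'New-found Glory', 'New York']
print(char_order)    # ['New York', 'New-found Glory', 'Newark', 'Newcastle']
```

The same four strings come out in three different orders, which is the point: the stored data cannot dictate the order, it can only supply what each sorter needs.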

Release Sortname style should be language-specific. Different languages have different sort order rules for their characters. Alphabetic order in French sorts accented letters according to the last accented character in a word, while in German, accented letters are sorted together with unaccented letters. A sorting order must apply the correct language-specific conventions. If the sort order disregards leading articles, it must use language-specific rules to find them (e.g. “the” in English, “der” in German, “le” in French).
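
As a rough illustration of why the article rule has to know the language, here is a small Python sketch. The `LEADING_ARTICLES` table and the `sort_name` function are hypothetical and deliberately incomplete (three languages, a few articles each, no handling of elided forms such as the French “l’”); this is not how MusicBrainz does it.

```python
# Hypothetical, deliberately tiny table of leading articles per language code.
LEADING_ARTICLES = {
    "en": ("the", "a", "an"),
    "de": ("der", "die", "das"),
    "fr": ("le", "la", "les"),
}

def sort_name(title: str, language: str) -> str:
    """Move a leading article to the end, using the rules of one language only."""
    first, _, rest = title.partition(" ")
    if rest and first.casefold() in LEADING_ARTICLES.get(language, ()):
        return f"{rest}, {first}"
    return title

print(sort_name("The Dark Side of the Moon", "en"))  # Dark Side of the Moon, The
print(sort_name("Die Zauberflöte", "de"))            # Zauberflöte, Die
print(sort_name("Die Hard", "en"))                   # Die Hard
```

The last two calls show the problem: “Die” is an article in a German title and an ordinary word in an English one, so the characters of the title alone are not enough.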

Release Sortname style is related to transcription, because both use Alias entries. MusicBrainz has Release names in many different scripts: “狂気” (Japanese), “Тьма и Свет” (Cyrillic), “The Dark Side of the Moon” (Latin). A MusicBrainz user may not want to read all those scripts, and so will want names transcribed into a script which they can read. MusicBrainz Aliases provide both a place to store sortnames and a place to store transcribed names. The Release Sortname style should describe how to enter data so that it serves both purposes.

Release Sortname style should not primarily aim to tailor the sort order in the MusicBrainz UI, rather it should primarily aim to supply sufficient information to sort operations external to MusicBrainz. MusicBrainz always provides ways to search for a Release name and get back a short list, in which it is easy to find the desired name. But if you want to find a directory of music files on a drive, the computer’s OS, not MusicBrainz, will do the sorting, and the filename, not MusicBrainz data, is what gets sorted. The crucial link is the client software — which reads MusicBrainz data, and adapts it to work well as a filename, or an ID3 tag, or whatever.

Using ReleaseGroup metadata for Releases is difficult and probably not worth the effort. If a ReleaseGroup has a set of Alias entries and a Release has a different set of Alias entries, it is difficult to combine them. It is probably necessary to correlate the Type and Locale settings of the Alias entries between sets. The Name fields would probably have to match the Title of the Release or ReleaseGroup. Something has to be done with all the Alias entries with no type. And the payoff will be limited to a small fraction of Release Groups. As reported above, 95% of Release Groups have only 1 or 2 Releases in them. It is not hard to apply Aliases to 1 or 2 Releases in a Release Group, compared to the difficulty of merging sets of Alias entries. Thus, I think Release Sortname style should guide editors to first apply Alias entries to Releases rather than Release Groups.

MusicBrainz code presently has little support for Release Sortnames. As far as I can tell, the code which runs when you press the Guess Case button when entering a Release Sortname is in moveArticleToEnd() in scripts/guess-case/MB/GuessCase/Handler/Base.js#L927-L934. The only leading words it detects for moving to the end of the sortname are “The” (English) and “Los” (Spanish). Other leading articles in English, such as “A” or “An”, or in other languages, are not handled. That means the database is less likely to already hold useful Sortname data predating a new Release Sortname style.

So.
I think my next step is to propose style guidelines for Release entry Aliases which are language-specific, and serve to provide both transcriptions of the Release Title and linguistic information about the Title which client software could not readily derive from the characters of the Release Title string.

I have created MusicBrainz ticket STYLE-2760, “Sort name style for Release/ReleaseGroup” to officially propose this as a Style Guideline change.

The post, here, and the ticket text are very long.

I don’t understand what is added to the already existing Titles, Aliases and Language Style Guidelines.

Briefly:

  1. Add guidelines about Release Sort Names to Style/Release, and to Style/Titles. Right now, neither place mentions sort names for Releases.
  2. Since the sort names for Releases have to be stored as Aliases, add to Style/Aliases instructions on how to fill out an Alias for use with a Release to hold a Release sort name.
  3. Since there are often locale-specific rules on how to make sort names from titles, add sections about sort name construction to the respective Language Style Guidelines.

Yes, you are right. That is a weakness of mine. I explain with too many words.

I prefer not to modify these.

Sort name is not an attribute of Release and is already described in the Aliases doc (which applies to aliases of any entity kind, including Releases).

Same for Titles. They already apply to all entity titles, of all kinds.
No need for redundant text about releases, specifically. And why not release group aliases, which are more important than release aliases?

The Aliases doc is already valid for any kind of alias.
You can add a release group example, but no need for more texts, IMO.

This is about language guidelines; they can indeed be improved, language by language (one smaller ticket each, IMO).

It’s not easy to apply a ticket with so many changes.
And once you understand them, not all of the changes are necessary, IMO.

I think it’s good that the per-language Title and Aliases guidelines are kept entity-agnostic.

I think the release group is generally the best place to store aliases, both native-language and translated/transliterated (and therefore their sort names). it keeps most aliases in one place (making editors’ jobs easier) and works quite nicely with the current display, with aliases showing on the main artist page if the alias in your language differs from the release title, like here.

the only case when release aliases might make sense to me is if the title is distinctly different than the original or localized release group title, something like The Dark Side of the Moon (deluxe edition), but even then, those could theoretically be added as non-primary aliases on the release group if we’d like. all a data consumer would have to do would be to match the release title to one of the release group aliases (optionally consulting the release language), and you’d get a sort name
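
The matching step described above could look roughly like this in Python. Everything here is a hypothetical sketch: `GroupAlias` and `sort_name_from_group` are illustrative names, and it assumes the release language and the alias locale are expressed with the same codes, which a real client would have to arrange.

```python
from typing import NamedTuple, Optional, Sequence

class GroupAlias(NamedTuple):
    # Hypothetical, simplified view of one release group alias.
    name: str
    sort_name: str
    locale: Optional[str]

def sort_name_from_group(
    release_title: str,
    release_locale: Optional[str],
    group_aliases: Sequence[GroupAlias],
) -> Optional[str]:
    """Find a sort name by matching the release title against the group's aliases."""
    matches = [a for a in group_aliases if a.name == release_title]
    # Optionally consult the release language: prefer an alias in that locale.
    for alias in matches:
        if alias.locale == release_locale:
            return alias.sort_name
    return matches[0].sort_name if matches else None

group = [
    GroupAlias("The Dark Side of the Moon", "Dark Side of the Moon, The", "en"),
    GroupAlias("Dark Side of the Moon", "Dark Side of the Moon", "en"),
]
print(sort_name_from_group("The Dark Side of the Moon", "en", group))
print(sort_name_from_group("The Dark Side of the Moon (deluxe edition)", "en", group))
```

The second call returns `None` because no alias has exactly that name, which is the "distinctly different title" case where a non-primary alias would have to be added.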

Can I point out that the difference between the Release titles of The Dark Side of the Moon is the inclusion or not of “The” at the front. Which means the sort order for each variation would be the same. “Dark Side of the Moon” and “Dark Side of the Moon, The”. :grin:

A sort order at Release Group level makes more sense. And is less work.

The spec seems to be very overwritten for something that is simple. And it would be made simpler by looking at the actual examples where this is actually needed. I expect well over 95% of the Release Groups don’t need a sort order.

true, the more common variant title is Dark Side of the Moon, lol

a quick search of my personal library shows that out of my 3,096 releases, 234 start with The, An, or A. this of course isn’t all cases (and doesn’t include titles that start with punctuation either), but that’s already about 7.6% of my releases that’d need a sort name

This is a circular argument. You are saying that because the MusicBrainz data model has a defect (no field for Sort name), we should not document the workaround for the defect. The reality is, Releases have titles, a sort name is important for making lists of Release titles correctly, so a good music metadata database has to store a sort name somewhere. I am arguing that we should describe how to store sort names with current MusicBrainz, rather than be silent about it.

If we want to use this opportunity to add a Sort name field to the Release (and Release Group) entity, that is marvellous. But my understanding is that a change to the database structure is harder than a change to the style guideline.

The present text does not mention Releases. What I am advocating is that the Aliases doc be clear about the special role of Aliases as carrier for Sort name of Release and Release Group titles.

The current Style / Title text does not mention sort names at all. I think sort name is maybe out of scope for that guideline.

I was trying to use fewer words, so I did not mention Release Groups. My proposal does include Release Groups. However, I think that storing Release sort names in Release Groups will cause more complexity than people expect.

The proposal includes documenting how to store sort names for Release Groups as well as Releases.

It may seem to be less work for the person entering Sort Names for an unusual case like Dark Side of the Moon. I suspect it will be more work for the person writing Picard code to figure out which alias to use as the albumsort string for a Release.

There is not really a conflict. Let’s document how to store Sort Names for both Releases and Release Groups. Then editors will work at whichever level they prefer.

The sort names for RGs should be for the main RG name, and the sort names for releases should be for the specific release. Those are different things, and we should not consider them equivalent. That said, Picard could prefer a RG name and sort name of the right locale if desired over the release ones, in the same way we provide a release group cover art that can be used instead of the cover art for a specific release.

Calling it a defect is very strong words :slight_smile: It’s an intentional choice, and in fact we dropped sort names from other entities (such as labels) that used to have them. That is because of two reasons:

  1. for most entities, “the sort name” is actually relatively meaningless, and sort names only make sense when associated with a specific locale. Admittedly, releases are generally (although not always) titled in just one language and not translated, and we could instead make the users specify a locale for the title and a sort name, but:
  2. most users don’t care at all about sort names and much less sort names for releases, so having them be optional as aliases makes a lot more sense than forcing them into the main release editing flow and confusing most users even more.

All this proposed text seems entirely too complicated, to be honest. Basically, AFAICT, it can be summarized as “we have an alias feature, you can and should use it if you want to store this info”. It seems that until we have more specific per-language guidelines, most of your proposed guidelines are just “apply the sort name alias guidelines and the localised alias guidelines”. Even if we add per-language guidelines, the alias ones already say “The sort name should be in the appropriate form for the alias locale” so the only change needed would be to add something to the guidelines for specific languages/locales.

I kind of agree with @jesus2099 here that minimal modifications would do the trick elsewhere too. Under Style/Release#Title, which already mentions several versions of a title, we could add a third sentence that says something like “Any other versions of the title can be stored using aliases, which can also be used to indicate the expected sort name for the title(s).” It’s kind of unnecessary since that’s what aliases are for anyway, but it might work as a bit of a reminder for users that they can actually use aliases on releases/RGs too which I think is often forgotten entirely.

Indeed, that’s clever, as the sort name is most useful for artists.
But even there, the alias sort names could be enough.

Usually people will order their records like in libraries and shops:

  1. By artist sort name or even by artist alias sort name, in one or other language
  2. By original release date (aka release group earliest release date), which doesn’t need sort name

Thank you for reviewing this proposal, @reosarevok .

I agree with you, but from earlier in this thread I believe there are other editors who will disagree. Or at least, they will say, it is convenient to store Release sort names as Release Group aliases, so they want a way to do that.

Rather than “over” the release [sort names], I would suggest “as a fallback if no Release sort names are found”. That would please the people who want to put Release sort names into Release Group aliases. However, the people who have to write the Picard code to find Release sort names at the Release Group level may be disappointed with the complexity of their task.

I respect your opinion, but I stand by my words. However, debating the data design is off topic for a Style proposal discussion.

In part, yes. Right now, Style/Release has zero guidance about how to handle sort names. I think that is too little. It should at least say, “the correct place to store a Release sort name is in an Alias”.

But also, I think there need to be guidelines for how to fill out a sort name Alias so that Picard code can pick it out from other search hint Aliases, legal name Aliases, title translation Aliases, etc. The reason for specifying that a sort name Alias should have a Type, and a Locale, and a Name equal to the Release Name, is to allow Picard code to find the sort names among the search hints.
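
Here is a rough Python sketch of the lookup I have in mind, including the "fallback if no Release sort names are found" idea from earlier. It is not Picard code. The `Alias` class, both functions, and the type strings `"Release name"` and `"Release group name"` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Alias:
    # Hypothetical, simplified alias record; field names are assumptions.
    name: str
    sort_name: str
    type: Optional[str] = None    # None stands for an alias with no type set
    locale: Optional[str] = None

def find_sort_name(title: str, locale: str, aliases: Sequence[Alias],
                   wanted_type: str) -> Optional[str]:
    """Return the sort name of the alias whose Type, Locale and Name all match."""
    for alias in aliases:
        if (alias.type == wanted_type and alias.locale == locale
                and alias.name == title):
            return alias.sort_name
    return None

def album_sort(title: str, locale: str, release_aliases: Sequence[Alias],
               group_aliases: Sequence[Alias]) -> str:
    """Release aliases first, release group aliases as fallback, else the title."""
    return (find_sort_name(title, locale, release_aliases, "Release name")
            or find_sort_name(title, locale, group_aliases, "Release group name")
            or title)

release_aliases = [
    Alias("DSOTM", "DSOTM", type="Search hint"),
    Alias("The Dark Side of the Moon", "Dark Side of the Moon, The",
          type="Release name", locale="en"),
]
print(album_sort("The Dark Side of the Moon", "en", release_aliases, []))
```

With all three fields filled in, the search hint is skipped without any guessing. If the Type or Locale were left empty, the code would have no reliable way to tell the sort name alias from the rest.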

That said, I would be glad to go with minimal modifications under Style/Release, and Style/Alias.

I think a mention of sort name in Style/Release is necessary precisely because users doing data entry on a Release might well not also know Aliases backwards and forwards. It is good to provide users the information they need to complete a task (enter metadata for a Release) in the place where they are probably looking when working on the task (Style/Release).

Since we are talking about a modification to Style/Release, and to Style/Alias, and to per-language guidelines as needed, I think “only change” is being stretched pretty far :-). However, yes, part of helping editors enter good sort names for Releases (and Artists, for that matter) is adding per-language guidelines. Each could be a separate Style proposal.

There is raw material for over 40 languages’ worth of guidelines in the US Library of Congress MARC appendix on “Definite and Indefinite Articles”. Each should be evaluated by editors who know that language.

@jesus2099 , I can provide a counter-example: for opera and musicals collectors, ordering by Artist sort name and release date may well not be enough. When the Artist is a composer like Mozart or Verdi, there are many releases under each Artist. For these genres, opera name is likely more significant than release date, and more useful as a second level of sorting. We should be building the database to support many different uses, not a single “usual” way.
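
To show the difference between the two orderings, here is a small Python sketch. The `Entry` type is hypothetical and the years are invented placeholders, not real release dates.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    artist_sort: str
    title_sort: str
    year: int  # invented placeholder values, for illustration only

shelf = [
    Entry("Verdi, Giuseppe", "Traviata, La", 1992),
    Entry("Mozart, Wolfgang Amadeus", "Zauberflöte, Die", 1964),
    Entry("Verdi, Giuseppe", "Aida", 2001),
    Entry("Mozart, Wolfgang Amadeus", "Don Giovanni", 1985),
    Entry("Verdi, Giuseppe", "Aida", 1979),
]

# "Usual" order: artist sort name, then date. No title sort name needed.
by_date = sorted(shelf, key=lambda e: (e.artist_sort, e.year))

# Opera collector's order: artist sort name, then work title, then date.
by_work = sorted(shelf, key=lambda e: (e.artist_sort, e.title_sort, e.year))

for entry in by_work:
    print(entry.artist_sort, "|", entry.title_sort, "|", entry.year)
```

In `by_work` the two Aida recordings sit next to each other, while in `by_date` La Traviata lands between them. The second ordering is only possible if a title sort name is stored somewhere.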

Ah, ok, classical, maybe good point.

But I still don’t think we are lacking any guidelines.
Everything is already there: language title styles and alias style.

But still no mention of sort names in Style/Release (or Style/Release Group, for those who favour that). I still maintain that zero is too little documentation of sort names in Style/Release.