I have been thinking deeply about Release sort names, and I have come to some conclusions.
Release Sortname style should not aim to achieve some One True Sort Order. There is no one true sort order. Each library has its own rules for sort order. For instance, there is a common English language convention that articles (“the”, “an”) are disregarded when sorting, but NISO TR03-1999 says,
“An initial article in a heading should be treated as any other initial word. When it is deemed appropriate or desirable to arrange headings with initial articles by the word following the article … the headings may be structured to achieve the desired arrangement.”
Some sort orders operate word-by-word. Some operate letter-by-letter, disregarding word boundaries. Typical computer sort functions operate character-by-character, sorting punctuation as well as letters.
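To make the difference concrete, here is a small Python sketch of the three approaches. The key functions and the sample titles are my own illustration, not anything taken from NISO or MusicBrainz, and real library filing rules have many more cases than this.

```python
import re

def word_by_word_key(title: str) -> list[str]:
    # Compare one whole word at a time, ignoring case and punctuation.
    # A short word sorts before a longer word that it is a prefix of.
    return re.findall(r"\w+", title.casefold())

def letter_by_letter_key(title: str) -> str:
    # Drop spaces and punctuation, so word boundaries play no part.
    return "".join(ch for ch in title.casefold() if ch.isalnum())

titles = ["Newark", "New York", "new age"]

by_character = sorted(titles)                      # raw code point order
by_word = sorted(titles, key=word_by_word_key)
by_letter = sorted(titles, key=letter_by_letter_key)

print(by_character)  # ['New York', 'Newark', 'new age']
print(by_word)       # ['new age', 'New York', 'Newark']
print(by_letter)     # ['new age', 'Newark', 'New York']
```

The same three titles come out in three different orders, and none of them is wrong.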
Release Sortname style should be language-specific. Different languages have different sort order rules for their characters. Alphabetic order in French resolves ties between words that differ only in their accents by looking at the last accented character in the word, while in German, accented letters are sorted together with unaccented letters. A sorting order must apply the correct language-specific conventions. If the sort order disregards leading articles, it must use language-specific rules to find them (e.g. “the” in English, “der” in German, “le” in French).
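As a rough illustration of why the language has to be known, here is a Python sketch that moves a leading article to the end of a title. The `LEADING_ARTICLES` table and the `move_leading_article` function are hypothetical names of mine, and the article lists are deliberately incomplete (for instance, the French “l’” is not handled).

```python
# Hypothetical, incomplete table of leading articles per language code.
LEADING_ARTICLES = {
    "en": {"the", "a", "an"},
    "de": {"der", "die", "das"},
    "fr": {"le", "la", "les"},
}

def move_leading_article(title: str, lang: str) -> str:
    # Split off the first word; only move it if it is an article in `lang`.
    first, sep, rest = title.partition(" ")
    if sep and rest and first.casefold() in LEADING_ARTICLES.get(lang, set()):
        return f"{rest}, {first}"
    return title

print(move_leading_article("The Dark Side of the Moon", "en"))
print(move_leading_article("Die Zauberflöte", "de"))
print(move_leading_article("Die Hard", "en"))
```

The last two calls are the point: “Die” is an article in a German title but an ordinary word in an English one, so the title string alone is not enough to decide.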
Release Sortname style is related to transcription, because both use Alias entries. MusicBrainz has Release names in many different scripts: “狂気” (Japanese), “Тьма и Свет” (Cyrillic), “The Dark Side of the Moon” (Latin). A MusicBrainz user may not want to read all those scripts, and so will want names transcribed into a script which they can read. MusicBrainz Aliases provide both a place to store sortnames and a place to store transcribed names. The Release Sortname style should describe how to enter data so that it serves both purposes.
Release Sortname style should not primarily aim to tailor the sort order in the MusicBrainz UI; rather, it should primarily aim to supply sufficient information to sort operations external to MusicBrainz. MusicBrainz always provides ways to search for a Release name and get back a short list, in which it is easy to find the desired name. But if you want to find a directory of music files on a drive, the computer’s OS, not MusicBrainz, will do the sorting, and the filename, not MusicBrainz data, is what gets sorted. The crucial link is the client software, which reads MusicBrainz data and adapts it to work well as a filename, or an ID3 tag, or whatever.
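Here is a minimal Python sketch of the kind of adaptation I mean on the client side. The `to_filename` function is hypothetical, and the set of characters it replaces is my assumption (roughly the ones Windows refuses in filenames). Real taggers have their own, more elaborate rules.

```python
# Assumed set of characters that are unsafe in filenames on common systems.
UNSAFE_CHARS = '/\\:*?"<>|'

def to_filename(name: str) -> str:
    # Replace each unsafe character with an underscore, then trim
    # leading and trailing spaces and dots.
    cleaned = "".join("_" if ch in UNSAFE_CHARS else ch for ch in name)
    return cleaned.strip(" .")

# The client would feed in a sortname taken from MusicBrainz data;
# the OS then sorts the resulting directory names by its own rules.
names = ["Dark Side of the Moon, The", "AC/DC: Live?"]
print(sorted(to_filename(n) for n in names))
```

Whatever MusicBrainz stores, it is the output of a function like this that the OS ends up sorting.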
Using ReleaseGroup metadata for Releases is difficult and probably not worth the effort. If a ReleaseGroup has a set of Alias entries and a Release has a different set of Alias entries, it is difficult to combine them. It is probably necessary to correlate the Type and Locale settings of the Alias entries between sets. The Name fields would probably have to match the Title of the Release or ReleaseGroup. Something has to be done with all the Alias entries with no type. And the payoff will be limited to a small fraction of Release Groups. As reported above, 95% of Release Groups have only 1 or 2 Releases in them. It is not hard to apply Aliases to 1 or 2 Releases in a Release Group, compared to the difficulty of merging sets of Alias entries. Thus, I think Release Sortname style should guide editors to first apply Alias entries to Releases rather than Release Groups.
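To show where the difficulty comes from, here is a Python sketch of one possible way a client might combine the two sets. The `Alias` class, its field names, the type label `"name"`, and the `combine_aliases` policy are all assumptions of mine for illustration, not the MusicBrainz schema or any existing client's behaviour.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Alias:
    # Hypothetical shape of an Alias entry, for illustration only.
    name: str
    sort_name: str
    type: Optional[str]
    locale: Optional[str]

def combine_aliases(release_aliases, group_aliases):
    # Correlate entries by (type, locale). Release Group entries go in
    # first, so a Release entry with the same key overrides them.
    merged = {}
    untyped = []  # entries with no type cannot be correlated; set them aside
    for alias in list(group_aliases) + list(release_aliases):
        if alias.type is None:
            untyped.append(alias)
        else:
            merged[(alias.type, alias.locale)] = alias
    return merged, untyped

group = [
    Alias("The Dark Side of the Moon", "Dark Side of the Moon, The", "name", "en"),
    Alias("狂気", "狂気", None, "ja"),
]
release = [
    Alias("Dark Side of the Moon", "Dark Side of the Moon", "name", "en"),
]
merged, untyped = combine_aliases(release, group)
print(merged[("name", "en")].sort_name)
print([a.name for a in untyped])
```

Even this toy version has to pick a precedence rule, and it has no good answer for the untyped entries or for a Name that does not match the Title.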
MusicBrainz code presently has little support for Release Sortnames. As far as I can tell, the code which runs when you press the Guess Case button when entering a Release Sortname is in moveArticleToEnd() in scripts/guess-case/MB/GuessCase/Handler/Base.js#L927-L934. The only leading words it detects for moving to the end of the sortname are “The” (English) and “Los” (Spanish). Other leading articles in English, such as “A” or “An”, and leading articles in other languages are not handled. That means useful Sortname data, predating a new Release Sortname style, is less likely to be in the database already.
So.
I think my next step is to propose style guidelines for Release entry Aliases which are language-specific, and serve to provide both transcriptions of the Release Title and linguistic information about the Title which client software could not readily derive from the characters of the Release Title string.
