Comma in sort names for Surname Name artist names

I would like to point out that Pinyin transcriptions aren’t English, which is how the bot was adding them; they’re Mandarin Chinese in Latin script. Some Chinese artists do have English names which are not direct transcriptions. I wrote this in a note to an edit, and got a quick reply accepting that the transcriptions shouldn’t be added as English, but it’s probably a good idea to leave it here as well for discussion.


The first is not a person’s name; there’s no surname (黑鸭子 Hei Yazi means “Black Ducks”).

The second one is actually a pet peeve of mine. I think it’s totally wrong to add the comma there because it implies the name is in the wrong order, “sort order”, but there is no sort order in Chinese.

Western names have the surname at the end; but to sort them in a list, you generally move the surname to front and add a comma to indicate the surname has been moved. E.g. “Dylan, Bob” means the name is actually “Bob Dylan” with surname Dylan moved to the front, “Swift, Taylor” means name is actually “Taylor Swift” with surname Swift moved to the front. “Ma, Siwei” implies the name is actually “Siwei Ma” — but it isn’t. The sort name for “Ma Siwei” is “Ma Siwei”, because the surname is already at the beginning, there is no moving, there should be no comma.
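The rule described above can be sketched in a few lines of Python. This is only an illustration of the convention argued for in this post, not the MusicBrainz guideline; the function name `sort_name` and its `surname_first` flag are made up for the example.

```python
def sort_name(given: str, surname: str, surname_first: bool) -> str:
    """Build a sort name where the comma marks a surname that was moved.

    Hypothetical helper: `surname_first` says whether the name is normally
    written surname first (as in Chinese) or surname last (as in English).
    """
    if surname_first:
        # Surname is already in front: nothing moves, so no comma.
        return f"{surname} {given}"
    # Surname moves to the front: the comma records the move.
    return f"{surname}, {given}"


print(sort_name("Bob", "Dylan", surname_first=False))  # Dylan, Bob
print(sort_name("Siwei", "Ma", surname_first=True))    # Ma Siwei
```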

I don’t disagree at all. But it’s not written anywhere, yet, in sortname guidelines.

This is becoming fairly OT (we can split this into a separate topic if you want to keep discussing it) but I think you are reading too much into this - the sort name field is just meant to be “Surname, Name” for consistency. It doesn’t necessarily imply that the name should be Name Surname, it’s just for better basic sorting. That way, a Ma Siwei from China and a Siwei Ma who was born in the US and as such uses Name Surname in their daily life would both sort consistently together (which no longer happens if you don’t add the comma to the Chinese one).

Better, more precise sorting can be added at the alias level, where the sort name for the Chinese name would probably be something else entirely that does not need to conform to our basic sorting rules nor be in the Latin script.

It really is a separate issue, but the comma isn’t decoration, it has a purpose. If someone uses the name “Siwei Ma”, then the sort order would be “Ma, Siwei”. He wouldn’t even have to be born in the US; many Chinese people choose to invert the order of their name in the West, and it could even be added as an “English name” if it appears on English releases. But for the name 马思唯 the direct transcription is “Ma Siwei”: the surname is the first character, nothing was moved. And indeed both people would be sorted alphabetically in the same place, as the comma doesn’t affect alphabetic order; its whole purpose is to mark that the surname was moved to the front.
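Whether the two spellings really land in the same place depends on how the list is compared. A small Python sketch, assuming plain string comparison on one side and a made-up key that ignores commas on the other (`ignore_commas` and the entry “Ma Zhe” are invented for the example):

```python
names = ["Ma, Siwei", "Ma Zhe", "Ma Siwei"]  # "Ma Zhe" is a made-up entry

# Plain string comparison: space (U+0020) sorts before comma (U+002C),
# so the comma does change the position here.
plain = sorted(names)


def ignore_commas(name: str) -> str:
    """Hypothetical sort key that drops commas and ignores case."""
    return name.replace(",", "").casefold()


# With the commas dropped, both spellings compare equal and stay adjacent.
collated = sorted(names, key=ignore_commas)

print(plain)     # ['Ma Siwei', 'Ma Zhe', 'Ma, Siwei']
print(collated)  # ['Ma, Siwei', 'Ma Siwei', 'Ma Zhe']
```

So the comma is harmless for ordering only when the comparison skips punctuation; a naive comparison can push the comma form further down the list.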

Your point about being different on the alias section vs. the main name, I just didn’t understand. A Chinese name is a Chinese name anywhere.

To be clear, I know it has basically become standard to add a comma after Chinese surnames. I think I’ve spoken against this in the forum, and I definitely have multiple times in edit notes. I don’t change the names after people add a comma because I don’t want to start an edit war. I just think it should be mentioned even if I end up talking to myself, because it just doesn’t make sense.

My point here was that the alias section and the main sort name are meant to be different things that work differently.

The alias section is supposed to contain “the name in Chinese” and “the sort name in Chinese”. It likely should not even sort in Latin script at all for the Chinese name.

The artist sort name is just a Latin construct (it’s not Chinese nor English nor anything of the sort) to allow us to sort all the artists in the same way. Even if a language specifically used the “Name Surname” order to sort their names for some reason, we would still use Surname, Name for the main artist sort name here. See the guidelines that explicitly say:

All disparate parts of a sort name should be separated by “, ” (comma and space).

This is not only about the surname but all disparate parts :slight_smile:

adding my two cents from Japanese editing, I think it makes sense to add a comma even when the names aren’t necessarily flipped around

like Chinese, Japanese names are surname first, given name second. usually the name will be flipped to the English standard when it’s translated to English, but it is not uncommon to keep the Japanese name order too. for example, Taku Iwasaki here, “Iwasaki” being the surname vs. Tsukumo Sana, “Tsukumo” being the surname

I think standardizing English sort names as Surname, Given Names makes sense, as it clarifies to editors that the sort name is in the proper order and would properly sort artist names in more cases. this also follows the same format as the artist sort name, which actually gives a Hungarian example that’s also Surname, Given

edit: this only applies to English aliases, of course. individual languages have their own sorting rules

Is there no non-Latin Chinese sort name system?

For example, in Japanese, they have a specific sort name system based on the syllabaries (consonants アカサタナハマヤラワン, then vowels アイウエオ inside each):

Order

アイウエオ
カキクケコ
サシスセソ
タチツテト
ナニヌネノ
ハヒフヘホ
マミムメモ
ヤ ユ ヨ
ラリルレロ
ワ   ヲ
ン

And also, modified consonants come after pure consonants: for example ヒビピ

Expanded sort order

ア  イ  ウ  エ  オ
カガ キギ クグ ケゲ コゴ
サザ シジ スズ セゼ ソゾ
タダ チヂ ツヅ テデ トド
ナ  ニ  ヌ  ネ  ノ
ハバパ ヒビピ フブプ ヘベペ ホボポ
マ  ミ  ム  メ  モ
ヤ     ユ     ヨ
ラ  リ  ル  レ  ロ
ワ           ヲ
ン
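As a rough sketch, the two tables above could be turned into a sort key like this in Python. It assumes single katakana characters or words made only of the kana listed above; small kana, the long-vowel mark and hiragana are not handled, and the name `kana_key` is invented for the example. It compares the base kana first and uses the voicing marks only as a tie-breaker, which is one possible choice.

```python
import unicodedata

# Base kana in the order of the first table above.
GOJUON = (
    "アイウエオ" "カキクケコ" "サシスセソ" "タチツテト" "ナニヌネノ"
    "ハヒフヘホ" "マミムメモ" "ヤユヨ" "ラリルレロ" "ワヲ" "ン"
)
RANK = {kana: i for i, kana in enumerate(GOJUON)}
# Combining dakuten and handakuten, as produced by NFD normalization.
MARKS = {"\u3099": 1, "\u309a": 2}


def kana_key(word: str):
    """Hypothetical sort key: base kana first, voicing marks as tie-breaker."""
    bases, marks = [], []
    for ch in unicodedata.normalize("NFD", word):
        if ch in MARKS and marks:
            marks[-1] = MARKS[ch]  # e.g. ビ decomposes to ヒ + dakuten
        else:
            bases.append(RANK.get(ch, len(GOJUON)))  # unknown characters last
            marks.append(0)
    return bases, marks


print(sorted(["ピ", "サ", "ヒ", "ア", "ビ", "カ"], key=kana_key))
# ['ア', 'カ', 'サ', 'ヒ', 'ビ', 'ピ']
```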

Examples (in order)

(from street shops and libraries)

I didn’t mean to start a huge discussion, but to make my point clear.

This is what I don’t understand. The name in pinyin is Chinese. Chinese characters are not sortable; if you don’t sort the name in the Latin script, how are you proposing to sort them?

I am aware of this. I am saying this is a mistake based on assuming all names are similar to Western names, and clearly doesn’t make sense for names that are naturally in the reverse order.

“Sort name” is not a MusicBrainz invention; it’s the standard term for the form of a name used to list names alphabetically, especially in libraries.

I think you are speaking against yourself… If the name may or may not be flipped, that makes it much more important to mark the difference.

The sort name of 岩崎琢 is “Iwasaki Taku”
The sort name of “Iwasaki Taku” is “Iwasaki, Taku”
The sort name of “Taku Iwasaki” is “Iwasaki, Taku”

Otherwise it’s impossible to say what the natural order is. If you write “Iwasaki, Taku” in both cases, it’s impossible to say if the name you are trying to order is “Iwasaki Taku” or “Taku Iwasaki”.

This isn’t really relevant here, but the traditional way is by counting strokes, or by Kangxi radical plus the remaining number of strokes. So 一 (yī) would come first because it’s only one stroke, then 二 (èr), then all the way to 𰻝 (biáng) probably.

The concept of sorting names by a different order isn’t natural in Chinese, because the surname always comes first anyway.

If you mean phonetic but not Latin, you can transcribe Chinese in different “topolects” and writing systems, but Pinyin is the official romanization system both in the PRC and Taiwan.

In that case, I expect for every Chinese alias, the Chinese alias sort name should just be exactly the same as the alias name :slight_smile:

It’s not necessarily about a different order - we just want the Chinese alias to have a sort name that would be a good fit for how Chinese sorts. If it doesn’t sort in any specific way, then that’s fine too :slight_smile: (I don’t know if standard sorting in computer software takes into account the number of strokes as you mentioned earlier or not).

All names that have a family name and a first name (or more) are sorted as “family name, first name” for consistency - we do not care whether they are actually similar to western names, or how the language sorts them. It’s just a standard - we could have picked a different one, but that’s the one we have.

Most Icelandic names for example only have a first name and a patronymic, no surname. So we sort “Anna Sigríður Þorvaldsdóttir” as “Anna Sigríður Þorvaldsdóttir” since there’s only “one section” to this name. Chinese names are closer to the usual Western combination of family name and first name, so we sort according to that.

Again, the whole point of the artist sort name is to be a lowest common denominator “standard across all the database” thing. Yes, it’s somewhat western-centric and it will fit some languages and cultures worse than others - that’s why we have specific per-locale sort options at the alias level that allow each locale to sort according to their own rules, ideally - but it’s consistent and relatively easy to follow, and given how confusing sort names are for most people, that’s the least bad option.

And do you know the day-to-day ordering method, used in record shops, book shops, or in libraries?
Are they still sorting by stroke counts (which still needs some other instructions for all characters with same stroke count, like stroke kind/style/direction/orientation) or are they really sorting by Pinyin Latin transcription?

Or do they just sort by theme, maybe?

@blackteadarkmatter
Without knowing what method is used in common shops and libraries, maybe we should just let the Unicode sorting (which is reportedly Kangxi radical order based) do its good enough job?

I seem to have started exactly what I did not want to start: a huge discussion about a tiny thing: a comma.

@jesus2099

On the character sorting: I was trying to give a simplified explanation because I didn’t think it was very relevant; the Wiki article @yindesu linked above gives a more detailed explanation of different ways to sort characters. The simplest way is simply counting strokes, but in dictionaries it’s usually by Kangxi radical + remaining number of strokes (e.g. for 草 it’s Kangxi radical 140 艸 plus 6 strokes).
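To make the dictionary method concrete, here is a tiny Python sketch. The (radical, remaining strokes) pairs are entered by hand for four characters only, and `radical_key` is a name invented for the example; a real index would need data for every character.

```python
# Hand-entered (Kangxi radical number, remaining strokes), for illustration
# only; real data would come from a dictionary index.
RADICAL_STROKES = {
    "草": (140, 6),  # radical 140 艸 plus 6 strokes, as in the example above
    "花": (140, 4),
    "一": (1, 0),
    "二": (7, 0),
}


def radical_key(char: str) -> tuple[int, int]:
    """Hypothetical sort key: radical number first, then remaining strokes."""
    return RADICAL_STROKES[char]


print(sorted(["草", "二", "花", "一"], key=radical_key))
# ['一', '二', '花', '草']
```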

I’m old enough that I studied Chinese with paper dictionaries, and you still need to understand this system to look up a character in those. The words are actually now usually listed alphabetically by pinyin, but if you don’t know a character, you don’t know how it’s pronounced, so you need to find it in an index by guessing the “main” radical and then the number of remaining strokes, then find the character in the index, which gives the pronunciation in pinyin, so you can then actually find the word you’re looking for. Except you will often miscount the strokes, or pick the wrong radical and have to start from the beginning again. I love books, but I don’t miss paper dictionaries.

None of these methods is flawless or unambiguous either: there are regional differences in how people count strokes, and it’s not always clear which “main” radical to pick, since the same character can have multiple Kangxi radicals. Different dictionaries can put the same character under different radicals.

Which is what I meant by “not sortable”. There are many methods used historically to sort characters, but there is no inherent order to the characters like alphabetical order, where every speaker knows which letter comes after which letter. — An exception is bopomofo, which is similar to the Japanese kana (bo-po-mo-fo are actually the first syllables, like ABCD), but it was only used in Taiwan (so over 99% of Chinese speakers are not familiar with it), and it’s being deprecated even in Taiwan.

“Unicode sorting” can actually mean different things, and the Wiki article is quite misleading. The way computer systems without advanced Chinese support sort Chinese characters is simply by their Unicode code point, so U+8BCD (词 cí) comes before U+8BCE (诎 qū). It’s possible the first characters added were approximately in Kangxi order (I don’t actually know), but new characters are added to the Unicode standard every year, and they don’t follow any specific order. Ordering by Unicode code point looks basically arbitrary. I don’t know if it’s still the case, but Windows used this order for characters unless you installed East Asian support, which allowed you to sort by a human-understandable order (pinyin or stroke count), and which anyone writing and reading Chinese had to install.
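A quick way to see the code point behaviour is Python’s built-in `sorted`, which compares strings by code point. The stroke counts below are typed in by hand just for these three characters.

```python
chars = ["二", "三", "一"]  # èr (two), sān (three), yī (one)

# Python compares strings by code point, like software without a
# language-aware collation.
by_code_point = sorted(chars)
print([f"{c} U+{ord(c):04X}" for c in by_code_point])
# ['一 U+4E00', '三 U+4E09', '二 U+4E8C']

# Stroke counts entered by hand for this example.
STROKES = {"一": 1, "二": 2, "三": 3}
by_strokes = sorted(chars, key=STROKES.get)
print(by_strokes)
# ['一', '二', '三']
```

Even for the simplest numerals, code point order puts 三 before 二, which is not what a reader counting strokes would expect.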

Something else is the Unihan database, which is produced by the Unicode Consortium, but isn’t part of the Unicode standard itself. It’s basically a free database of information about Chinese characters, including pronunciation in different topolects and languages (pinyin, Japanese, Korean…), relations between characters (simplified version of, rare variant of, etc.) and the place of the character in the Kangxi dictionary, or the place it would have if it doesn’t appear in the dictionary. It’s basically HanziBrainz. This is a very useful DB if you’re a coder working with the Chinese language, but it’s not “Unicode sorting”: it’s not part of the Unicode standard at all and computer systems can’t use it by default.

In my experience, at least in Mainland China where I lived, studied, and worked for a decade, the most common order, and the only one most people understand, is alphabetical order of pronunciation in pinyin. Stroke count is still used officially, but nobody knows by heart how many strokes each character has, especially if it’s complex, or its place in a list of thousands and thousands of characters. There are only 26 letters in the Latin alphabet, pinyin is taught in schools, it’s how most people input characters, and they actually know it.
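Sorting by pinyin can be sketched like this in Python. The readings are entered by hand for the characters mentioned earlier in the thread (a real tool would look them up), and `pinyin_key` is a name invented for the example.

```python
import unicodedata

# Readings entered by hand for this example; a real tool would look them up.
PINYIN = {"一": "yī", "二": "èr", "词": "cí", "诎": "qū"}


def pinyin_key(char: str) -> str:
    """Hypothetical key: the pinyin reading with tone marks stripped."""
    decomposed = unicodedata.normalize("NFD", PINYIN[char])
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(sorted(PINYIN, key=pinyin_key))
# ['词', '二', '诎', '一']
```

This simple key ignores tones entirely and would also strip the dots from ü, so syllables that differ only in those respects would need an extra tie-breaker.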

Here’s a very recent example of the government mandating stroke count order. Here the goal is basically to randomise the names, so people can’t get obsessed about the order in which they appear.


Which brings me to my actual point, because I wasn’t saying that we should stop using pinyin in the sort name field. I just think that the comma means something!

This is what annoys me. It’s not just an arbitrary thing. The comma isn’t decoration, it has been used for centuries to indicate that something is out of place for the order it appears in a list. In “Dylan, Bob” the comma means the name is “Bob Dylan” and the surname was moved to the front. In “Beatles, The” the comma means the name is “The Beatles”, but the first noun (actually the first non-article) was moved to the front. In “Anna Sigríður Þorvaldsdóttir” there is no comma because the names are in the natural order, nothing was moved. In “Ma, Siwei” the comma means the name 马思唯 is “Siwei Ma”, WHICH IS WRONG. That is my only point and I stand by it.
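Read that way, a sort name can be turned back into the natural name mechanically. A minimal Python sketch of that reading (the function `natural_name` is made up for the example, and it only handles a single moved part):

```python
def natural_name(sort_name: str) -> str:
    """Undo the move that a comma in a sort name is taken to signal."""
    if ", " not in sort_name:
        return sort_name  # no comma: nothing was moved
    moved, rest = sort_name.split(", ", 1)
    return f"{rest} {moved}"  # put the moved part back at the end


for s in ["Dylan, Bob", "Beatles, The", "Anna Sigríður Þorvaldsdóttir", "Ma, Siwei"]:
    print(f"{s!r} -> {natural_name(s)!r}")
# 'Dylan, Bob' -> 'Bob Dylan'
# 'Beatles, The' -> 'The Beatles'
# 'Anna Sigríður Þorvaldsdóttir' -> 'Anna Sigríður Þorvaldsdóttir'
# 'Ma, Siwei' -> 'Siwei Ma'
```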

About the comma, it does indeed have a meaning (in MB).
It says that the left side is the family name and that the right side is the given name.

So, whether or not there is a swap, I think it’s still useful for readers who otherwise cannot tell family names from given names.