I seem to have started exactly what I did not want to start: a huge discussion about a tiny thing: a comma.
@jesus2099
On the character sorting: I was trying to give a simplified explanation because I didn’t think it was very relevant; the Wiki article @yindesu linked above gives a more detailed explanation of different ways to sort characters. The simplest way is simply counting strokes, but in dictionaries it’s usually by Kangxi radical + remaining number of strokes (e.g. for 草 it’s Kangxi radical 140 艸 plus 6 strokes).
I’m old enough that I studied Chinese with paper dictionaries, and you still need to understand this system to look up a character in those. The words are actually now usually listed alphabetically by pinyin, but if you don’t know a character, you don’t know how it’s pronounced, so you need to find it in an index by guessing the “main” radical and then the number of remaining strokes, then find the character in the index, which gives the pronunciation in pinyin, so you can then actually find the word you’re looking for. Except you will often miscount the strokes, or pick the wrong radical and have to start from the beginning again. I love books, but I don’t miss paper dictionaries.
None of these methods is flawless or unambiguous either: there are regional differences in how people count strokes, and it’s not always clear which “main” radical to pick, because the same character can contain several Kangxi radicals. Different dictionaries can put the same character under different radicals.
Which is what I meant by “not sortable”. There are many methods used historically to sort characters, but there is no inherent order to the characters like alphabetical order, where every speaker knows which letter comes after which letter. — An exception is bopomofo, which is similar to the Japanese kana (bo-po-mo-fo are actually the first syllables, like ABCD), but it was only used in Taiwan (so over 99% of Chinese speakers are not familiar with it), and it’s being deprecated even in Taiwan.
“Unicode sorting” can actually mean different things, and the Wiki article is quite misleading. The way computer systems without advanced Chinese support sort Chinese characters is simply by their Unicode code point, so U+8BCD (词 cí) sorts before U+8BCE (诎 qū). It’s possible the first characters added were approximately in Kangxi order (I don’t actually know), but new characters are added to the Unicode standard every year and don’t follow any specific order. Ordering by Unicode code point looks basically arbitrary. I don’t know if it’s still the case, but Windows used this order for characters unless you installed East Asian support, which allowed you to sort by a human-understandable order (pinyin or stroke count), and which anyone writing and reading Chinese had to install.
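To make the difference concrete, here is a small Python sketch that sorts four characters first by raw code point and then by pinyin. The `PINYIN` table is a hand-written, toneless mapping for just these four characters (an assumption for illustration, not something a system gives you for free):

```python
# Hand-written toneless pinyin for four characters only (illustrative assumption).
PINYIN = {"草": "cao", "词": "ci", "诎": "qu", "马": "ma"}

chars = ["马", "诎", "草", "词"]

# What a system without Chinese collation does: compare code points.
by_code_point = sorted(chars, key=ord)

# What most readers expect: alphabetical order of the pinyin.
by_pinyin = sorted(chars, key=lambda c: PINYIN[c])

for c in by_code_point:
    print(f"U+{ord(c):04X} {c} {PINYIN[c]}")
print("pinyin order:", "".join(by_pinyin))
```

With these four, code point order puts 诎 (qu) before 马 (ma), which pinyin order would not.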
Something else is the Unihan database, which is produced by the Unicode Consortium, but isn’t part of the Unicode standard itself. It’s basically a free database of information about Chinese characters, including pronunciation in different topolects and languages (pinyin, Japanese, Korean…), relations between characters (simplified version of, rare variant of, etc.), and the place of the character in the Kangxi dictionary, or the place it would have if it doesn’t appear in the dictionary. It’s basically HanziBrainz. This is a very useful DB if you’re a coder working with the Chinese language, but it’s not “Unicode sorting”: it’s not part of the Unicode standard at all and computer systems can’t use it by default.
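For the coders: a rough sketch of reading the radical and stroke information out of Unihan-style data. The sample is a single line I wrote in the tab-separated “code point, field, value” shape the Unihan text files use, with the 140 + 6 value for 草 mentioned earlier; `parse_unihan` and `radical_stroke_key` are made-up helper names, and real data should be checked against the actual files.

```python
# One sample line in the tab-separated shape of the Unihan text files.
# Only 草 is included; this is not a copy of the real database.
SAMPLE = """\
# comment lines start with '#'
U+8349\tkRSUnicode\t140.6
"""

def parse_unihan(text):
    """Return {character: {field: value}} from Unihan-style lines."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        char = chr(int(code[2:], 16))  # "U+8349" -> "草"
        table.setdefault(char, {})[field] = value
    return table

def radical_stroke_key(rs_value):
    """Turn "140.6" into (140, 6) so it can be used as a sort key."""
    first = rs_value.split()[0]  # a field may list several values
    radical, strokes = first.split(".")
    # Some entries carry an apostrophe marker after the radical number.
    return int(radical.rstrip("'")), int(strokes)

table = parse_unihan(SAMPLE)
print(radical_stroke_key(table["草"]["kRSUnicode"]))
```

Sorting a list of characters with that tuple as the key would give a dictionary-like radical + strokes order, but only for software that ships and loads the data itself.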
In my experience, at least in Mainland China where I lived, studied, and worked for a decade, the most common order, and the only one most people understand, is alphabetical order of pronunciation in pinyin. Stroke count is still used officially, but nobody knows by heart how many strokes each character has, especially if it’s complex, or its place in a list of thousands and thousands of characters. There are only 26 letters in the Latin alphabet, pinyin is taught in schools, it’s how most people input characters, and they actually know it.
Here’s a very recent example of stroke count order being mandated by the government. There the goal is that it basically randomises the names, so people can’t get obsessed about the order they appear in.
Which brings me to my actual point, because I wasn’t saying that we should stop using pinyin in the sort name field. I just think that the comma means something!
This is what annoys me. It’s not just an arbitrary thing. The comma isn’t decoration, it has been used for centuries to indicate that something has been moved out of its natural order so it can be filed in a list. In “Dylan, Bob” the comma means the name is “Bob Dylan” and the surname was moved to the front. In “Beatles, The” the comma means the name is “The Beatles”, but the first noun (actually the first non-article) was moved to the front. In “Anna Sigríður Þorvaldsdóttir” there is no comma because the names are in the natural order, nothing was moved. In “Ma, Siwei” the comma means the name 马思唯 is “Siwei Ma”, WHICH IS WRONG. That is my only point and I stand by it.
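If it helps, this is the rule a reader (or a script) applies when they see the comma, as a tiny Python sketch. `natural_order` is a hypothetical helper I made up for this post; it only handles a single comma and is not how any particular database processes sort names.

```python
def natural_order(sort_name):
    """Undo the comma convention: "Dylan, Bob" -> "Bob Dylan"."""
    if "," not in sort_name:
        return sort_name  # no comma: nothing was moved
    moved, rest = sort_name.split(",", 1)
    # Put the part that was moved to the front back at the end.
    return f"{rest.strip()} {moved.strip()}"

examples = [
    "Dylan, Bob",
    "Beatles, The",
    "Anna Sigríður Þorvaldsdóttir",
    "Ma, Siwei",
    "Ma Siwei",
]
for s in examples:
    print(f"{s!r} -> {natural_order(s)!r}")
```

Applying the rule to “Ma, Siwei” gives “Siwei Ma”, while “Ma Siwei” without the comma comes back unchanged.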