English Spelling Vs Japanese Spelling

I mostly agree. You know that the DB already allows for what you want, since aliases can have sort names as well?

I don’t think we can give a useful result for this (esoteric, IMHO) request, until transliterations are universal. You’d want to sort 相川嘉男 before Пётр Ильич Чайковский because A comes before T, but if you’re missing Aikawa-san’s transliteration, you’re bound to sort him after all the artists where you do have them.

I am proposing that MusicBrainz include code to transliterate names on demand, so that there is no need to add aliases to each entity for each target language.

And, I probably wasn’t being clear when I wrote:

What I mean is the English conventions for sorting a list of strings, some of which are non-Latin. The English convention I expect is: Latin-script strings first, in English-language sorting order; then all the non-Latin strings, grouped by script and in some reasonable order within script; e.g. the Cyrillic strings in Russian sort order, then all the Greek strings in Greek sort order, then the Japanese strings in kanji character code order, etc.
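To make the convention above concrete, here is a minimal Python sketch that puts Latin-script strings first and then groups the rest by script. The script order after Latin and the helper names (`SCRIPT_ORDER`, `script_rank`, `english_listing_key`) are assumptions for illustration, and the within-script order is plain code point order, which only approximates real Russian or Greek collation.

```python
import unicodedata

# Hypothetical script order for an English-language listing. The post only
# fixes "Latin first"; the rest of this order is an assumption.
SCRIPT_ORDER = ["LATIN", "CYRILLIC", "GREEK", "HIRAGANA", "KATAKANA", "CJK"]

def script_rank(text):
    """Rank a string by the script of its first letter."""
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            for rank, script in enumerate(SCRIPT_ORDER):
                if name.startswith(script):
                    return rank
            break
    return len(SCRIPT_ORDER)  # unknown scripts go last

def english_listing_key(text):
    # Within a script, fall back to case-insensitive code point order.
    return (script_rank(text), text.casefold())

names = ["Пётр Ильич Чайковский", "相川嘉男", "Beethoven", "Καφετζόπουλος", "Asai"]
print(sorted(names, key=english_listing_key))
```

A production version would presumably swap the second element of the key for a proper per-language collation key.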

The English convention I expect for sorting a list of non-Latin strings does not call for sorting by the English Latin-script transliteration while displaying the original non-Latin string.

That would be absolutely wonderful. But I’m dubious about it being tractable. I would love to be proven wrong, however. That would require extensive language and script expertise to write and could degenerate into an N x N mapping (where N is the number of languages supported). Perhaps some of the automatic translation services out there could be employed. Once transliteration is available, I don’t think we need to then have special collation rules for the transliterated names. Just let them be listed in their proper order. As an English speaker, I certainly wouldn’t want to see Tchaikovsky at the end or anywhere else than in the T’s.

After sending this, I did some poking around for translation APIs. There may be something there. But the sense I had, and still have, is that automatic translation is a non-trivial undertaking and I wouldn’t advise doing it ourselves. If we want to do it, I think the best bet would be to leverage the hard work of others through an API/Service.

Point 1: I think the idea is for MB to eventually move to a BB-style “there are no names, only aliases, some of which are marked as primary” model.

Point 2: Automatic transliteration does not remove the need for a sort name field (as in the posts above: with Kanji-based Japanese names, you’d want a katakana or hiragana version as sort name).

Point 3: Even with a sort name field present and transliteration, that does not necessarily produce a usable value for the target locale’s name sorting rules. An example might be the Dutch name sorting rules, where initial particles in last names (typically “van”, “de” and “van der”, but others exist) are not considered for the sort order (they are placed after the first name).
For example, the sort name in Dutch for “Ludwig van Beethoven” would be “Beethoven, Ludwig, van”. To make this extra fun, this only applies to Dutch in the Netherlands. In the Flanders region of Belgium, the “van” is considered a fixed part of the name. So there the sortname should be “vanBeethoven, Ludwig” (note: no space, since for Dutch, it is recommended to ignore spaces and apostrophes in last names).
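The two Dutch conventions described above could be sketched roughly as follows. This is only an illustration: the function name `dutch_sort_name`, the `region` codes, and the short particle list are assumptions, and real names would need a longer particle list and more care.

```python
# Particles named in the post; real Dutch names use more than these.
PARTICLES = ("van der", "van", "de")  # longest first

def dutch_sort_name(given, family, region="NL"):
    """Build a sort name for the Netherlands ("NL") or Flanders ("BE")."""
    lowered = family.lower()
    if region == "NL":
        for particle in PARTICLES:
            if lowered.startswith(particle + " "):
                # Netherlands: the particle moves behind the given name.
                head = family[:len(particle)]
                rest = family[len(particle) + 1:]
                return f"{rest}, {given}, {head}"
    # Flanders, or no particle: the family name stays whole, with
    # spaces and apostrophes ignored.
    compact = family.replace(" ", "").replace("'", "").replace("’", "")
    return f"{compact}, {given}"

print(dutch_sort_name("Ludwig", "van Beethoven", "NL"))
print(dutch_sort_name("Ludwig", "van Beethoven", "BE"))
```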

Good news, then. The Internationalization Classes for Unicode (ICU) project applied the extensive language and script expertise necessary, and released their library as free software. ICU transliterates your choice of input strings according to instructions given by “Transliterator Identifiers”. You can chain transliterations together.

Take a look at the ICU Transform Demonstration. Under “Insert Sample”, select “Names”.

In “Compound 1”, type Latin; Title . You get results like:
정, 병호 → Jeong, Byeongho
たけだ, まさゆき → Takeda, Masayuki
Догилева, Татьяна → Dogileva, Tatʹâna
Καφετζόπουλος, Θεόφιλος → Kaphetzópoulos, Theóphilos

Next, in “Compound 1”, type Latin; Cyrillic . You get results like:
정, 병호 → йеонг, быеонгхо
たけだ, まさゆき → такеда, масаыуки
Догилева, Татьяна → Догилева, Татьяна
Καφετζόπουλος, Θεόφιλος → Капхетзо́поулос, Тхео́пхилос

Not perfect for MB needs immediately, but a promising sign that the transliteration problem may be tractable.

Agreed, correct name sorting needs to allow choice among many culturally-appropriate options. The good news is, we are not the only project that wants to sort lists in culturally-appropriate ways.

The ICU project also has a collation service. It lets you program in sorting rules that you want. I don’t see that they already have tailorings for Dutch name sorting, and Belgium-Flanders name sorting, but the architecture exists. And maybe someone else has come up with tailorings which we could re-use.

Hi & thanks for the replies. Here are a few examples of the artists I am on about

http://musicbrainz.org/artist/5cde54ba-59d2-4c4f-adb8-1a093c2ba0af http://vgmdb.net/artist/121

http://musicbrainz.org/artist/0fcb6831-ff9a-47ff-a509-45f267aa8c21 http://vgmdb.net/artist/161

Cheers Rob

Very good stuff !
How did you test it ?


EDIT: Oh, OK I see your test link now ! :smile:


Note that MB in Japanese should not sort names the way a plain transliteration would.

This is how names are sorted in Japan (in music stores, for instance). You would expect these shelves:

  • あ where artists are ordered as あいうえお
  • か where artists are ordered as かきくけこ
  • さ where artists are ordered as さしすせそ
  • た where artists are ordered as たちつてと
  • な where artists are ordered as なにぬねの
  • は where artists are ordered as はひふへほ
  • ま where artists are ordered as まみむめも
  • や where artists are ordered as やゆよ
  • ら where artists are ordered as らりるれろ
  • わ where artists are ordered as わ maybe some を but I doubt
  • ん but as を above, I don’t remember such a shelf BTW

They won’t trim “The” from band names and they won’t change the order of the given and family names for the sake of ordering (kind of no sort name):

  • 浅井健一 (Kenichi Asai) will be stored in the あ shelf (あさい けんいち)
  • AJICO will be stored in the あ shelf (アジコ)
  • 陰陽座 (Onmyōza) will be stored in the あ shelf (おんみょうざ)
  • クリスタルキング (Crystal King) will be stored in the か shelf
  • The Beatles will be stored in the さ shelf under The (ザ・ビートルズ), :warning: not in the は shelf under Beatles
  • 鄧麗君 (Teresa Teng) will be stored in the た shelf (テレサ・テン is her Japanese artist name)
  • David Bowie will be stored in the た shelf under David (デビッド・ボウイ), :warning: not in the は shelf under Bowie

The order above is realistic. Notice that, for instance, David Bowie would come after tata, because we compare David’s で (:warning: not Bowie’s ぼ) with tata’s た, both in the same た shelf.
So it’s neither Bowie nor David before tata; it’s tata (た) then David (で): ただちつづてでとど
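As a rough illustration of the shelf rule described above, this Python sketch maps a kana reading to its shelf by looking only at the first kana. The names `ROWS` and `shelf_for` are made up for the example, and it assumes a kana reading is already available (for kanji names it would have to come from a reading or sort name field).

```python
import unicodedata

# Shelf label -> kana of that row, following the shelf list above.
ROWS = {
    "あ": "あいうえお", "か": "かきくけこ", "さ": "さしすせそ",
    "た": "たちつてと", "な": "なにぬねの", "は": "はひふへほ",
    "ま": "まみむめも", "や": "やゆよ", "ら": "らりるれろ",
    "わ": "わを", "ん": "ん",
}

def shelf_for(reading):
    """Return the shelf for a kana reading (hiragana or katakana)."""
    first = reading[0]
    # Katakana to hiragana: the two blocks are offset by 0x60.
    if "ァ" <= first <= "ヶ":
        first = chr(ord(first) - 0x60)
    # Drop voicing marks: NFD splits で into て plus a combining mark.
    first = unicodedata.normalize("NFD", first)[0]
    for shelf, row in ROWS.items():
        if first in row:
            return shelf
    return None  # not a kana this sketch knows about

for reading in ["あさい けんいち", "アジコ", "ザ・ビートルズ", "デビッド・ボウイ"]:
    print(reading, shelf_for(reading))
```

Small kana and the ordering inside a shelf are not handled here; a real implementation would more likely rely on a Japanese collation from a library such as ICU.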

Foreign artists usually have given name first and Japanese artists usually have family name first. And they don’t change it for the sort order.

I tried a few strings on the page you linked, and was underwhelmed. First, transliterations depend on the target language, not the target script – I could not set a destination language, anywhere.

In my tests the result was recognisable, but never identical to the popular transliteration in any language I know.

My opinion is: a 20% solution is easy, an 80% solution very, very hard.

The ICU solution is not the only one out there. The best solutions in my opinion aren’t free, though there do seem to be some free ones. In all cases, even the very best ones are nowhere close to a 100% solution. Regardless of what automatic transliteration/automatic translation approach is used, it should be treated as the solution of last resort, and we should rely on transliterations/translations provided by people. Therefore I think the solution will be something like the following:

  1. Provide the means to have language/culture specific translations of the base name.
  2. Provide the means to have language/culture specific versions of the collation name supplemented with language/culture collation rules.
  3. In the absence of a manually provided translation/transliteration, employ the automatic facility.
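The three steps above amount to a fallback chain, which might look something like this minimal Python sketch. The `Alias` class, the `display_name` function and the `transliterate` callback are hypothetical names for illustration, not the MusicBrainz schema or any real API.

```python
from dataclasses import dataclass

@dataclass
class Alias:
    name: str        # name as written for this locale
    sort_name: str   # collation name for this locale
    locale: str      # e.g. "en", "ja"
    primary: bool = False

def display_name(base_name, aliases, locale, transliterate=None):
    """Prefer a human-entered alias; transliterate only as a last resort."""
    candidates = [a for a in aliases if a.locale == locale]
    # Primary aliases for the locale win over non-primary ones.
    candidates.sort(key=lambda a: not a.primary)
    if candidates:
        return candidates[0].name, "manual"
    if transliterate is not None:
        return transliterate(base_name, locale), "automatic"
    return base_name, "untranslated"

aliases = [Alias("Kenichi Asai", "Asai, Kenichi", "en", primary=True)]
print(display_name("浅井健一", aliases, "en"))
print(display_name("浅井健一", aliases, "ru"))
```

Returning the source of the name alongside the name itself lets a caller flag automatic results as less trustworthy than human-entered ones.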

I firmly believe that a 100% automated solution is not possible. I think we’d be lucky to get to a 50% solution. The professional solutions I have examined rely on a set of rules backed up by a large and evolving database of exceptions.

I agree. If you want a good sort in Japanese, then start with the name in Japanese (foreign names transliterated into Japanese), then sort with Japanese rules. Don’t start by transliterating Japanese names into Latin script, then sorting the Latin script with English language rules. Some Japanese names just can’t be sorted right without a “reading” or “sortname” field in Japanese kana. Some foreign names (e.g. “The Beatles”), many in fact, will need a Japanese-language alias to specify the expected Japanese for the name (e.g. ザ・ビートルズ).

Bear in mind that the ICU Transform Demonstration demonstrates transliteration, not sorting. A different part of ICU provides sorting.

Oh, no-one is promising a 100% automated solution. But my overall point is that what we have in MusicBrainz now is something like a 1% automated, 5% manual solution. The original poster was an English-language user of MusicBrainz wondering why MB would provide names of Japanese artists only in Japanese, not in Latin transliteration. My point is that this is a reasonable expectation, and that MusicBrainz should aspire to become multilingual enough to satisfy it.

Of course, current MusicBrainz isn’t there yet. It is fantastic in many ways, but hasn’t yet become fantastic in multi-lingual service.

Yes, I agree. I expect that MusicBrainz will eventually be fully multi-lingual, and it will probably behave as you have described.

But I argue that current MusicBrainz does a tiny fraction of #1, and approximately none of #2 and #3.

There exist components, like ICU, that can provide big parts of the solution. Multilingual MusicBrainz is a reasonable aspiration.

My motivation in participating in threads like this is to ensure VideoBrainz starts off with issues like this handled from the start. This thread and threads like it have been very instructive. VideoBrainz has the benefit of not having an existing database and schema to migrate. I very much aspire to have a multilingual/multicultural VideoBrainz from the start.

Outstanding!

I haven’t been involved in VideoBrainz at all so far. However, my day job is software engineering consulting on multilingual websites and internationalization of software products. If it would be helpful for me to get involved, please let me know how.

I have an aspiration to propose a design and plan to make MusicBrainz fully multilingual. Maybe a multilingual VideoBrainz first would be a good role model for MusicBrainz.

Hardly anyone has been involved in VideoBrainz thus far. It’s only good ideas on paper at the moment. Given the current level of involvement it probably won’t be alpha for a while yet. I’m taking my time to make sure the schema is right and to learn all the lessons from MB that I can. At this point I think it’s a bit premature to ask for your particular expertise; but keep your eye out for VideoBrainz threads. They’ll be coming.

You should keep BookBrainz in mind as well. It’s much further along than VideoBrainz. It’s currently in the alpha stage and I believe they intend to make it multilingual early on as well. There is a general desire to collaborate and share our efforts between the two projects.

Yes, particularly given the details @Zastai described.

I only skimmed this topic, but there seem to be some misconceptions about the “Sort Name” in the database. This is a useless column that ought to be treated as deprecated, and it’s actually already been dropped from some top-level MusicBrainz objects (like the MB Label).

What editors should actually be caring about is the locale-primary alias’s sort name. This is where locale sorting should be defined. Here’s an example of where you’d define the Japanese sort name (as well as the English name and English sort name): http://musicbrainz.org/artist/b539e453-c4fe-47e3-8a07-8517eac74429/aliases

(Your version of Picard may be broken, but that’s a different discussion)

Maybe there should be some way to enter aliases from the “Add Artist” screen. I doubt many people are even aware of the aliases tab. And when you are entering an artist, you often have different locale names at the ready anyway.

I agree;

I’ve added my vote. :slight_smile: