Dutch Ĳ vs IJ?

Which is preferred in MB for the Dutch I-J digraph? Wikipedia says the single-character “Ĳ” is deprecated, so I imagine it should be entered as two characters, “IJ”.

IJ (digraph) - Wikipedia

Example usage: this Tangerine Dream work:

Song “IJ” - MusicBrainz

Interesting question!

This looks to me like a question which is partly about Dutch language usage, and partly about Unicode text encoding standards. I know very little about Dutch, so I will let MusicBrainz editors who know the language speak to that. I do, however, know a fair bit about Unicode text encoding.

MusicBrainz text fields, such as a Track Title or Recording Title, are strings of plain text. “Plain” means that no font formatting or language identification is applied to the text. The Unicode standard is quite clear that the proper way to represent the Dutch IJ digraph is as two separate characters: I followed by J (uppercase), or i followed by j (lowercase).

But we should not use the “Ĳ” or “ĳ” ligatures, U+0132 LATIN CAPITAL LIGATURE IJ or U+0133 LATIN SMALL LIGATURE IJ. As the Wikipedia article states, “they are included for compatibility and round-trip convertibility with legacy encodings, but their use is discouraged.”
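For anyone who wants to check this for themselves, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is my own, chosen for illustration. It shows that both ligature code points carry a compatibility decomposition to the two plain letters.

```python
import unicodedata
from typing import Tuple


def describe(ch: str) -> Tuple[str, str, str]:
    """Return (code point, Unicode name, decomposition mapping) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


# Inspect the two ligature characters discussed above.
for ch in ("\u0132", "\u0133"):
    print(*describe(ch), sep=" | ")

# Compatibility normalization (NFKC or NFKD) expands each ligature
# into the two separate letters.
print(unicodedata.normalize("NFKC", "\u0132"))  # two characters: I, J
print(unicodedata.normalize("NFKC", "\u0133"))  # two characters: i, j
```

The `<compat>` tag in the decomposition mapping is how the Unicode character database marks a character as a compatibility equivalent of another sequence.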

If we want the IJ digraph in Dutch-language titles to be displayed with proper Dutch ligatures, the text should use the separate “I” and “J” characters. The MusicBrainz web app should then mark these titles as being in the Dutch language, and request Dutch-compatible fonts which include the appropriate ligatures.
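To make the idea concrete, here is a rough sketch of what marking a title's language could look like when generating HTML. This is not MusicBrainz's actual code: `render_title` is a hypothetical helper, and I am assuming the language is available as a tag such as `nl`. Whether a ligature actually appears still depends on the browser and on the font supporting language-specific forms.

```python
from html import escape
from typing import Optional


def render_title(title: str, lang: Optional[str] = None) -> str:
    """Wrap a plain-text title in a span, tagging its language when known.

    Hypothetical helper for illustration only. The stored title keeps the
    two separate letters; the lang attribute is what lets the renderer
    pick language-appropriate glyphs.
    """
    text = escape(title)
    if lang is None:
        return f"<span>{text}</span>"
    return f'<span lang="{escape(lang)}">{text}</span>'


print(render_title("IJ", lang="nl"))
print(render_title("IJ"))
```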

I will say that I know one expert on language, fonts, and Unicode, a native speaker of Dutch, who is resolute that IJ should be represented as a single character rather than as an I, J digraph. His opinion is well-informed, but he has not yet won the argument with the rest of the Unicode community.

I’m Dutch and I always type the IJ as two separate characters. There are only a couple of fonts where that combination looks really bad (including the font used on MusicBrainz, unfortunately 😛).

If Ĳ were easy to type as a single character, would that be preferable?

The places where it might make a difference are capitalization (words beginning with IJ often get incorrectly rendered as “Ij”, whereas a single-character Ĳ would be capitalized properly) and alphabetizing (a separate I and J would get sorted in the middle of the I’s, whereas, as I understand it, it really should be down by Y, maybe between Y and Z; I’m not Dutch, so I don’t really know).
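Here is a small Python sketch of the capitalization issue. The function `capitalize_dutch` is a made-up helper, not anything MusicBrainz uses, and it assumes the caller already knows the word is Dutch.

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a word, treating a leading 'ij' as one unit.

    Illustrative helper only. It assumes the word is known to be Dutch;
    applied to other languages it would give wrong results.
    """
    if word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    return word.capitalize()


# Generic capitalization only touches the first code point.
print("ijsselmeer".capitalize())        # Ijsselmeer
# A language-aware rule keeps both letters uppercase.
print(capitalize_dutch("ijsselmeer"))   # IJsselmeer
# The single ligature character has its own uppercase form, U+0132.
print("\u0133sselmeer".capitalize() == "\u0132sselmeer")  # True
```

So the ligature does get capitalized correctly by a generic algorithm, but the same result is available with two plain letters once the code knows which language it is handling.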

As an American with a decidedly boring character set, I love funky characters like Ĳ, ß, Þ, etc.

That is a question for each editor to answer for themselves. We all have our different keyboards, OSes, input methods, preferences, etc.

But data input happens once. Data usage is forever. We should concentrate on getting the right data into the field. A convenient method for entering the data is a bad reason for accepting the wrong data. In this case, I believe that separate I, J or i, j characters are the correct data to arrive at, rather than the single digraph ligature characters.
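If an editor's input method does produce the ligature characters, the text can be cleaned up before it is saved. A minimal sketch, assuming we only want to touch the two IJ ligatures and leave every other character alone (full NFKC normalization would also rewrite many unrelated characters). The name `expand_ij_ligatures` is hypothetical.

```python
# Map only the two IJ ligature code points to their two-letter equivalents.
_IJ_LIGATURES = str.maketrans({"\u0132": "IJ", "\u0133": "ij"})


def expand_ij_ligatures(text: str) -> str:
    """Replace U+0132 and U+0133 with separate letters; leave all else as-is."""
    return text.translate(_IJ_LIGATURES)


print(expand_ij_ligatures("\u0132sselmeer"))   # IJsselmeer
print(expand_ij_ligatures("v\u0133f"))         # vijf
```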

Both capitalisation and alphabetising (collation) are language-dependent rules. Long experience in text encoding tells us that you cannot get these right for all languages using a single algorithm based only on character codes. You have to use capitalisation and collation algorithms which are tailored according to human language, and often application rules (e.g. postal addresses and phone books may sort differently than corporate databases). The ICU Collation overview is a peek into this rich world.
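As a toy illustration of tailoring, here is a Python sketch in which the same plain-text data sorts differently depending on a rule chosen by the application. The `ij_as_y` option is purely an assumption taken from the question above; I am not claiming it is the correct Dutch rule, and a real application would use a proper collation library such as ICU rather than string replacement.

```python
from typing import List


def sort_key(title: str, ij_as_y: bool = False) -> str:
    """Build a simplified sort key for a title.

    Toy tailoring for illustration only: when ij_as_y is set, every 'ij'
    is keyed as 'y'. This is not a real Dutch collation.
    """
    key = title.casefold()
    if ij_as_y:
        key = key.replace("ij", "y")
    return key


def sort_titles(titles: List[str], ij_as_y: bool = False) -> List[str]:
    """Sort titles using the untailored or the tailored key."""
    return sorted(titles, key=lambda t: sort_key(t, ij_as_y=ij_as_y))


words = ["ijs", "idee", "jas", "yoghurt", "zee", "iets"]
print(sort_titles(words))                 # 'ijs' lands among the i-words
print(sort_titles(words, ij_as_y=True))   # 'ijs' lands among the y-words
```

The stored strings are identical in both runs. Only the rule applied at sort time changes, which is the point: the data stays plain, and the language knowledge lives in the algorithm.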

MusicBrainz is not nearly as far down this road as, say, LibreOffice. I would love us to get there. But for now, the best way a plain-text database field can support these language-dependent algorithms is by getting the underlying data representation right.

I too was born in the USA, I am a native speaker of North American English, and I love the characters beyond those used in plain text of my mother tongue. But in my opinion, the best way to appreciate those characters is to let them be used correctly for the benefit of the languages to which they contribute, not to include them simply as amusing baubles.

Also, we English native speakers should not sell our own orthography short. Proper, publication-quality English text layout and typography is complex and beautiful, and decidedly not boring. See for example the W3C draft Requirements for Latin Text Layout and Pagination.
