Torc
November 4, 2016, 5:06am
Is there a list of everything affected by the "Convert Unicode punctuation characters to ASCII" option in Picard? The options documentation gives a couple of examples, but not the full list.
I think this is it (not sure whether "here, have the code" counts as an answer, but it seems reasonably clearly laid out there):
```python
_additional_compatibility = {
    # ...
    u"\u2033": u"”",  # DOUBLE PRIME
}
_re_additional_compatibility = _re_any(_additional_compatibility.keys())


def unicode_simplify_compatibility(string):
    interim = _re_additional_compatibility.sub(lambda m: _additional_compatibility[m.group(0)], string)
    return unicodedata.normalize("NFKC", interim)


_simplify_punctuation = {
    u"\u013F": u"L",   # LATIN CAPITAL LETTER L WITH MIDDLE DOT (compat)
    u"\u0140": u"l",   # LATIN SMALL LETTER L WITH MIDDLE DOT (compat)
    u"\u2018": u"'",   # LEFT SINGLE QUOTATION MARK (from ‹character-fallback›)
    u"\u2019": u"'",   # RIGHT SINGLE QUOTATION MARK (from ‹character-fallback›)
    u"\u201A": u"'",   # SINGLE LOW-9 QUOTATION MARK (from ‹character-fallback›)
    u"\u201B": u"'",   # SINGLE HIGH-REVERSED-9 QUOTATION MARK (from ‹character-fallback›)
    u"\u201C": u"\"",  # LEFT DOUBLE QUOTATION MARK (from ‹character-fallback›)
    u"\u201D": u"\"",  # RIGHT DOUBLE QUOTATION MARK (from ‹character-fallback›)
    u"\u201E": u"\"",  # DOUBLE LOW-9 QUOTATION MARK (from ‹character-fallback›)
    u"\u201F": u"\"",  # DOUBLE HIGH-REVERSED-9 QUOTATION MARK (from ‹character-fallback›)
    # ...
```
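If it helps to see what those tables actually do to a string, here is a minimal standalone sketch. It assumes the option works by plain character-for-character substitution (which is what the dictionary shape suggests) and copies only the entries visible in the excerpt, so Picard's real table may cover more characters. `simplify_punctuation` and `simplify_compatibility` are hypothetical helper names, not Picard's API, and the regex built here only imitates the role of the `_re_any()` helper, whose implementation isn't shown.

```python
import re
import unicodedata

# Hypothetical subset: only the punctuation entries visible in the excerpt.
SIMPLIFY_PUNCTUATION = {
    "\u013F": "L",   # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "\u0140": "l",   # LATIN SMALL LETTER L WITH MIDDLE DOT
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201A": "'",   # SINGLE LOW-9 QUOTATION MARK
    "\u201B": "'",   # SINGLE HIGH-REVERSED-9 QUOTATION MARK
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u201E": '"',   # DOUBLE LOW-9 QUOTATION MARK
    "\u201F": '"',   # DOUBLE HIGH-REVERSED-9 QUOTATION MARK
}

# One alternation regex over all table keys (an assumed stand-in for _re_any()).
_RE_PUNCT = re.compile("|".join(map(re.escape, SIMPLIFY_PUNCTUATION)))


def simplify_punctuation(text: str) -> str:
    """Replace every mapped character with its ASCII fallback."""
    return _RE_PUNCT.sub(lambda m: SIMPLIFY_PUNCTUATION[m.group(0)], text)


def simplify_compatibility(text: str) -> str:
    """NFKC-normalize, like the last step of unicode_simplify_compatibility."""
    return unicodedata.normalize("NFKC", text)


print(simplify_punctuation("\u201CDon\u2019t Stop\u201D"))  # → "Don't Stop"
print(simplify_compatibility("\uFB01ne"))  # → fine (U+FB01 is the "fi" ligature)
```

Because every key in the visible table is a single character, `str.translate(str.maketrans(SIMPLIFY_PUNCTUATION))` would give the same result; the regex form is shown only because it mirrors the `.sub(lambda m: ...)` pattern in the excerpt.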