Extending the existing "Non-ASCII Equivalents" plugin

I added 40 new characters to the existing “Non-ASCII Equivalents” plugin without the knowledge of the author Anderson Mesquita.

My additions fall into two groups: "Polish" and "My others".

Here is the code:

# -*- coding: utf-8 -*-

# Copyright (C) 2016 Anderson Mesquita <andersonvom@gmail.com>
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation, either version 3 of the License, or (at your option) any later
# version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# this program. If not, see <http://www.gnu.org/licenses/>.

from picard import metadata

PLUGIN_NAME = "Expanded Non-ASCII Equivalents"
PLUGIN_AUTHOR = "Anderson Mesquita <andersonvom@trysometinghere>, Peter69"
PLUGIN_VERSION = "0.5"
PLUGIN_API_VERSIONS = ["0.9", "0.10", "0.11", "0.15", "2.0"]
PLUGIN_LICENSE = "GPL-3.0-or-later"
PLUGIN_LICENSE_URL = "https://gnu.org/licenses/gpl.html"
PLUGIN_DESCRIPTION = '''Replaces accented and otherwise non-ASCII characters
with a somewhat equivalent version of their ASCII counterparts. This allows old
devices to be able to display song artists and titles somewhat correctly,
instead of displaying weird or blank symbols. It's an attempt to do a little
better than Musicbrainz's native "Replace non-ASCII characters" option.

Currently replaces characters on "album", "artist", and "title" tags.'''

CHAR_TABLE = {
    # Acute     # Grave     # Umlaut    # Circumflex
    "Á": "A",  "À": "A",  "Ä": "A",  "Â": "A",
    "É": "E",  "È": "E",  "Ë": "E",  "Ê": "E",
    "Í": "I",  "Ì": "I",  "Ï": "I",  "Î": "I",
    "Ó": "O",  "Ò": "O",  "Ö": "O",  "Ô": "O",
    "Ú": "U",  "Ù": "U",  "Ü": "U",  "Û": "U",
    "Ý": "Y",  "Ỳ": "Y",  "Ÿ": "Y",  "Ŷ": "Y",
    "á": "a",  "à": "a",  "ä": "a",  "â": "a",
    "é": "e",  "è": "e",  "ë": "e",  "ê": "e",
    "í": "i",  "ì": "i",  "ï": "i",  "î": "i",
    "ó": "o",  "ò": "o",  "ö": "o",  "ô": "o",
    "ú": "u",  "ù": "u",  "ü": "u",  "û": "u",
    "ý": "y",  "ỳ": "y",  "ÿ": "y",  "ŷ": "y",

    # Misc Letters
    "Å": "AA",
    "å": "aa",
    "Æ": "AE",
    "æ": "ae",
    "Œ": "OE",
    "œ": "oe",
    "ẞ": "ss",
    "ß": "ss",
    "Ç": "C",
    "ç": "c",
    "Ñ": "N",
    "ñ": "n",
    "Ø": "O",
    "ø": "o",

    # Punctuation
    "¡": "!",
    "¿": "?",
    "–": "--",
    "—": "--",
    "―": "--",
    "«": "<<",
    "»": ">>",
    "‘": "'",
    "’": "'",
    "‚": ",",
    "‛": "'",
    "“": '"',
    "”": '"',
    "„": ",,",
    "‟": '"',
    "‹": "<",
    "›": ">",
    "⹂": ",,",
    "「": "|-",
    "」": "-|",
    "『": "|-",
    "』": "-|",
    "〝": '"',
    "〞": '"',
    "〟": ",,",
    "﹁": "-|",
    "﹂": "|-",
    "﹃": "-|",
    "﹄": "|-",
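    "\uFF02": '"',   # fullwidth quotation mark
    "\uFF07": "'",   # fullwidth apostrophe
    "\uFF62": "|-",  # halfwidth left corner bracket
    "\uFF63": "-|",  # halfwidth right corner bracket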

    # Mathematics
    "≠": "!=",
    "≤": "<=",
    "≥": ">=",
    "±": "+-",
    "∓": "-+",
    "×": "x",
    "·": ".",
    "÷": "/",
    "√": "\\/",
    "∑": "E",
    "≪": "<<", # these are different
    "≫": ">>", # from the quotation marks

    # Misc
    "ª": "a",
    "º": "o",
    "°": "o",
    "µ": "u",
    "ı": "i",
    "†": "t",
    "©": "(c)",
    "®": "(R)",
    "℠": "(SM)",
    "™": "(TM)",
    
    # Polish
    "Ą": "A",
    "ą": "a",
    "Ć": "C",
    "ć": "c",
    "Ę": "E",
    "ę": "e",
    "Ł": "L",
    "ł": "l",
    "Ń": "N",
    "ń": "n",
    "Ó": "O",
    "ó": "o",
    "Ś": "S",
    "ś": "s",
    "Ź": "Z",
    "ź": "z",
    "Ż": "Z",
    "ż": "z",
    
    # My others
    "μ": "u",
    "õ": "o",
    "ọ": "o",
    "ő": "o",
    "Ž": "Z",
    "þ": "p",
    "Þ": "P",
    "ð": "d",
    "č": "c",
    "š": "s",
    "ș": "s",
    "♥": "-",
    "ã": "a",
    "ŵ": "w",
    "→": "-",
    "・": "-",
    "☆": "-",
    "★": "-",
    "/": ",",
    "*": ".",
    ":": "-",
    ">": "(",
    "<": ")",
}

FILTER_TAGS = [
    "album",
    "artist",
    "title",
]


def sanitize(char):
    if char in CHAR_TABLE:
        return CHAR_TABLE[char]
    return char


def ascii(word):
    return "".join(sanitize(char) for char in word)


def main(tagger, metadata, *args):
    for name, value in metadata.rawitems():
        if name in FILTER_TAGS:
            metadata[name] = [ascii(x) for x in value]


metadata.register_track_metadata_processor(main)
metadata.register_album_metadata_processor(main)

I stick with a shorter list in my plugin. I also found a few more dashes, but copy and paste didn’t always work for me, so I used Unicode escapes instead.

CHAR_TABLE = {
    # Hyphens and Dashes
    "\u2010": "-",
    "\u2011": "-",
    "\u2012": "-",
    "\u2013": "-",
    "\u2014": "-",
    "\u2015": "-",
    "\u2212": "-",
    # Apostrophes - \u2018 \u2019
    "‘": "'",
    "’": "'",
    "\u2023": "'",
    "\u2032": "'",
    # Speech marks - \u201c \u201d
    "“": "\"",
    "”": "\"",
    "\u2033": "\"",
    # Ellipsis - \u2026
    "…": "...",
}

I think you missed ellipsis out on your list?

I also do the swaps in a few more tags:

FILTER_TAGS = [
    "title",
    "artist",
    "artists",
    "artistsort",
    "album",
    "albumsort",
    "albumartist",
    "albumartists",
    "albumartistsort",
]

My focus is on swapping out stuff I can’t visually tell a difference on. So just hit punctuation and keep the letters.
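A minimal sketch of that punctuation-only approach, assuming Python 3 and using `str.translate`. The names `PUNCTUATION_TABLE` and `normalize_punctuation` are made up for this illustration and are not taken from either plugin.

```python
# Hypothetical punctuation-only table: letters such as "ö" are deliberately absent,
# so they pass through unchanged.
PUNCTUATION_TABLE = str.maketrans({
    "\u2010": "-",    # hyphen
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_punctuation(text):
    """Swap look-alike punctuation for ASCII and leave every letter untouched."""
    return text.translate(PUNCTUATION_TABLE)


print(normalize_punctuation("Mot\u00f6rhead \u2013 Don\u2019t Stop\u2026"))
```

The umlaut survives because only the characters listed in the table are replaced.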

This version is unpublished and just for personal use. I realise I now need to update it for the comments.

Ivan, do it. It’s simple. :wink:

I have no need to publish it. Too much faff with setting up accounts. And not massively different to other plugins.

I just thought I’d share a few details with you as you are working on the same plugin and we solved things in a different way. :smiley:

Or if you mean adding comments: already done, as it is a quick fix. It’s just that when I wrote this there wasn’t an editor going round changing the disambigs.

All together now. :wink:

The timing of this thread is funny: the next post in the forum is about an editor who is trying to remove Unicode apostrophes from the database through edits.

I started on this plugin mission when I noticed the confusion in my file system: I had two folders side by side which looked identical, and it took me an age to realise that the difference was the hyphens.
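One way to tell two such look-alike names apart is to print the Unicode name of each differing character. This is only a sketch: the helper `describe_differences` and the folder names are invented for the example, and it assumes both strings have the same length.

```python
import unicodedata


def describe_differences(first, second):
    """Return (index, name_in_first, name_in_second) for each differing position.

    Assumes both strings have the same length, which is the confusing case.
    """
    return [
        (i, unicodedata.name(a, "UNKNOWN"), unicodedata.name(b, "UNKNOWN"))
        for i, (a, b) in enumerate(zip(first, second))
        if a != b
    ]


# Hypothetical folder names that render almost identically.
folder_a = "Artist - Album"
folder_b = "Artist \u2010 Album"

for index, name_a, name_b in describe_differences(folder_a, folder_b):
    print(index, name_a, "vs", name_b)
```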

I know.

Hyphen or minus.

I added the 20 Polish diacritics as a matter of course, since Polish is my native language.

The 20 special characters “My others” come from my collection of 14,000 MP3 files after using Picard.

My file-renaming program, written in C++, detects these special characters, which distort file names under Windows.
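The detection step could look roughly like the sketch below. It is written in Python rather than C++, it is not the program described above, and the function name `find_non_ascii` and the sample file names are invented for illustration.

```python
def find_non_ascii(name):
    """Return the distinct non-ASCII characters in name, in order of first appearance."""
    seen = []
    for char in name:
        # ASCII covers code points 0..127; anything above is flagged.
        if ord(char) > 127 and char not in seen:
            seen.append(char)
    return seen


# Hypothetical file names; the first contains Polish diacritics.
sample_names = ["01 - Za\u017c\u00f3\u0142\u0107.mp3", "02 - Plain.mp3"]

for name in sample_names:
    print(name, find_non_ascii(name))
```

Each flagged character is a candidate for a new entry in the replacement table.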

I had to eliminate them. :wink:

But all these special characters are in the MB database.

Someone, for example, entered this character in song titles: :heart:

I follow what you are doing as you are trying to strip it all back to just A-Z and literally just ASCII. My aim is very different as I want to see the different characters in my titles.

If a title is in Japanese, I want to see that. Motörhead needs its umlaut. My Icelandic music needs its correct characters. If odd hearts are in their title - then so be it.

Windows handles them all fine in file names. And my players also all now handle these kinds of characters fine on their displays. In the past I had old MP3 players that were way more limited. All long gone. I don’t like forced simplification.

What I do want to change are the different punctuation marks that I can barely tell apart by eye. I feel it is as if Unicode got bored and categorised a few too many different things for my needs. It’s music, not a literary lesson. :laughing:

The music is only on real instruments, not with 0 and 1. :wink:

The music is only on real instruments, not with – and ’. :upside_down_face:

Had to change your quote as some of my music is created digitally so full of 0 and 1…

@Peter69 If the changes you have written are in line with, and extend, the original aim of this plugin i.e. convert non-ASCII characters to ASCII counterparts, then I am sure that a PR to the original plugin will be favourably considered.

You do not need to worry about the original author’s knowledge or approval because this plugin is open source and because the plugins repo is managed by other people who will decide on a PR on its merits. And of course the original author will be able to review your changes when you submit a PR and comment on them.

@IvanDobsky Ivan, of course, the same applies to you, but you are experienced enough to know this and to have made a conscious decision not to submit a PR (though you can of course change your mind if you ever wish to do so).

@Sophist my personal plugin code was based on Alan Swanson’s “Hyphen unicode” and serves a different use to “Non-ASCII Equivalents”. As I explained above.

It is solely for personal use, and the code snippet was posted to show a few more dashes and apostrophes that are missing from the other plugin.

I was not expecting anyone to change anything on my behalf. I am just not interested in creating and maintaining plugins. I just thought I’d share to the conversation. Sorry for any misunderstandings.

@Ivan - there is nothing to apologise about. Making private changes is (obviously) OK and I had spotted that your use case was different too.

@Sophist

One warning from Codacy. I don’t know how to fix this.

Check warning on line 184 in plugins/non_ascii_equivalents_enhanced/non_ascii_equivalents_enhanced.py

Codacy Static Code Analysis (codacy-production)

plugins/non_ascii_equivalents_enhanced/non_ascii_equivalents_enhanced.py#L184

Redefining built-in 'ascii'
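This warning is raised because the plugin defines a function called `ascii`, which hides Python's built-in `ascii()`. Renaming the function is one likely fix. The sketch below uses the hypothetical name `to_ascii` and a tiny stand-in table so that it runs without Picard; in the plugin itself, the call inside `main` would need the same rename.

```python
# Small stand-in for the plugin's CHAR_TABLE so the sketch runs on its own.
CHAR_TABLE = {"\u0141": "L", "\u00f3": "o", "\u017a": "z"}


def sanitize(char):
    # dict.get returns the character unchanged when it has no replacement.
    return CHAR_TABLE.get(char, char)


def to_ascii(word):  # hypothetical new name; no longer shadows the built-in ascii()
    return "".join(sanitize(char) for char in word)


print(to_ascii("\u0141\u00f3d\u017a"))
```

After the rename, the built-in `ascii()` is still available under its own name in the module.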

GitHub link:

https://github.com/metabrainz/picard-plugins/pull/387

No. You’re wrong.

“My other characters” are just garbage in my MP3 tags.

@Peter69 Yet again you have:

  1. Decided to utterly ignore my advice; and
  2. Still want me to help you with your coding.

22 hours ago I recommended that you put this enhancement forward as an update to the existing plugin, but you decided that I didn’t know what I was talking about and went ahead and instead 11 hours ago submitted a PR to create an entirely new “Extended” plugin that is a fairly minor tweak to an existing plugin.

This is like me saying I want to tweak something in Picard, and I do so by publishing Picard Enhanced which does exactly what Picard does but with my additional tweak.

This is a route to madness - an exponentially expanding ecosystem of almost identical functionality, but each slightly different from the other. If I wanted to enhance this plugin further, am I supposed to publish a “Non-ASCII Equivalents Enhanced Extended” version - and what happens when (sooner) we run out of synonyms for “Enhanced” OR (later) it literally takes more characters for the name than there are atoms in the Universe?

But then, despite ignoring my advice, and despite the abuse I received from you last time I tried to help you, and despite me making it clear that I am not willing to help you any further as a consequence of this, you have the darned cheek to ask me for my help to fix a Codacy issue.

I am always willing to help budding plugin authors except when they act immaturely, ungratefully, unintelligently, disrespectfully, abusively, and unapologetically so.

@Sophist

OK. So should I remove the "Enhanced" PR and overwrite the existing plugin instead?

What if you don’t like my version of the code?

Do you know this website? https://symbl.cc/en/2018/

If the Picard repo maintainers (and I am not one of them) don’t like your version of the code, the PR will not be merged regardless of whether it is a new plugin or an enhancement of an existing one.

Unnecessarily duplicating an existing plugin just to make a slight enhancement will just be an extra reason to reject a PR.