It has been a while since the last update, but I finally found the time to realize my most wanted feature: Support for localized quotes based on the selected language in the release editor.
This means the result of the “Guess punctuation” button for inputs of the form "..." and '...' now depends on the release’s (tracklist) language.
So far I have only integrated the rules for German and French quotes in addition to the English quotes, which will still be used as the fallback.
Other languages with which I am not familiar enough (e.g. are there any pitfalls?) have been skipped so far, but it should only be a matter of adding an additional line of code (at least for most of them).
So I am happy to receive PRs on GitHub or comments in this forum to add more languages which you want to have and are familiar with.
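To illustrate the "one additional line per language" idea, here is a minimal sketch of a language-to-quotes table with English as the fallback. All names (`QuoteStyle`, `quoteStyles`, `guessQuotes`) are hypothetical and not taken from the userscript, and the single-quote pattern is a deliberately simple assumption that only treats `'...'` as a quotation when it is delimited by whitespace or sentence punctuation.

```typescript
// Hypothetical sketch, not the userscript's actual code.
interface QuoteStyle {
  double: [string, string]; // opening and closing double quotes
  single: [string, string]; // opening and closing single quotes
}

const english: QuoteStyle = {
  double: ["\u201C", "\u201D"], // “…”
  single: ["\u2018", "\u2019"], // ‘…’
};

// Adding a language means adding one entry here.
const quoteStyles: { [language: string]: QuoteStyle | undefined } = {
  en: english,
  de: { double: ["\u201E", "\u201C"], single: ["\u201A", "\u2018"] }, // „…“ ‚…‘
  fr: { double: ["\u00AB", "\u00BB"], single: ["\u2039", "\u203A"] }, // «…» ‹…›
};

function guessQuotes(text: string, language: string): string {
  // Unknown languages fall back to the English quotes.
  const style = quoteStyles[language] ?? english;
  return text
    // "..." becomes the language's double quotes.
    .replace(/"([^"]*)"/g, (_match: string, inner: string) =>
      style.double[0] + inner + style.double[1])
    // '...' becomes single quotes, but only when it does not look like an
    // apostrophe inside a word (assumption: preceded by start/space/paren).
    .replace(/(^|[\s(])'([^']*)'(?=$|[\s).,;:!?])/g,
      (_match: string, lead: string, inner: string) =>
        lead + style.single[0] + inner + style.single[1]);
}
```

With this table, `guessQuotes('"Hallo"', "de")` yields „Hallo“, while an unlisted language code gets the English curly quotes.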
I always find these kinds of images funny. As a Brit I was always taught to use “double quotes” for a quote, with the less common option of ‘single quotes’ (mainly if you are quoting inside a quote).
For the sake of your Guess Case I would have thought double quotes would make more sense (i.e. England/Scotland/Wales in your image would be the same as Ireland).
That was exactly my thought, I also rather associate double quotes with English text. But I have just checked a random selection of British books from my collection and was surprised that all of them use the single quote variant, the only ones with double quotes I have found were North-American editions.
Luckily this does not matter for the userscript since it only converts ASCII single quotes to the specific Unicode single quotes and double quotes to double quotes. I mainly included the map to illustrate that there are many more combinations of different types of quotation marks in other countries/languages.
Yes, I’ve rarely seen these single angle quotes but if they aren’t there in the ASCII version there will be no issue with them. And if you want to use the curly quotes instead of the guillemets you can achieve this by temporarily changing the language to English before you press the button.
I’ve also thought about using (narrow) non-breaking spaces (instead of regular ASCII spaces) to pad the guillemets, but this seems to be pointless at the moment:
I was working with someone on formatting their English university thesis a couple of years back. That is something that gets pretty fussy about formatting, and it was still using normal “double quotes” when quoting text.
Perfect. Artist intent should still rule. I know there are Pixies out there who are determined to ignore artists. If a plugin like this got too controversial it would not get used.
Please don’t go overboard with these substitutions. This is a music database, not a perfect language class. As it is I’m now planning to finish my first plugin as I need to strip this punctuation from my tags as it makes searching with my media player tricky. (Dozens of different hyphens being the biggest headaches) I get why people want to see it on screen, but it causes havoc on my files. And the current Unicode to ASCII plugin is too brutal as I still want to keep stuff like ™. It’s the punctuation I can’t see or type that is trouble in my files.
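The selective clean-up described above (fold the hard-to-type dashes and quotes back to ASCII, but leave symbols such as ™ alone) could look roughly like this. It is only a sketch in TypeScript to match the userscript; a real Picard plugin would be written in Python, and the `asciiPunctuation` table and `foldPunctuation` name are made up for illustration.

```typescript
// Hypothetical mapping, not taken from any existing plugin.
const asciiPunctuation: { [char: string]: string | undefined } = {
  "\u2010": "-", // hyphen
  "\u2011": "-", // non-breaking hyphen
  "\u2012": "-", // figure dash
  "\u2013": "-", // en dash
  "\u2014": "-", // em dash
  "\u2212": "-", // minus sign
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark / apostrophe
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
};

function foldPunctuation(text: string): string {
  // Replace only characters listed in the table; everything else
  // (including symbols like the trademark sign) passes through unchanged.
  return text
    .split("")
    .map((ch) => asciiPunctuation[ch] ?? ch)
    .join("");
}
```

Because the table is an allow-list of punctuation, anything not listed survives, which is the difference from a blanket Unicode-to-ASCII conversion.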
And every time you say guillemets I just see this:
Nice job on the plugin. I can never keep up with what needs to be used where, so I only ever do the apostrophes. Will this also highlight the changes (like the Search and Replace does)? I sometimes miss errors that Guess Case adds due to the lack of highlighting.
I fully agree, but the thing with the non-breaking spaces is more of a display issue, i.e. they will prevent ugly line breaks directly after opening quotes or directly before closing quotes if you have a text like « This quote where the line break occurs right after the opening quote ». But since MBS converts them into normal spaces, there is nothing I can do or need to do.
Maybe one day I will also need your Picard plugin for some of my files, but so far I haven’t found a case where the Picard standard option has failed me and where I would be digging myself a hole.
Yes, it does (and it’s indeed inspired by @jesus2099’s userscript). This is even one of the features which is listed in the description because I can’t live without it:
Highlights all updated input fields in order to allow the user to review the changes.
There are some obscure titles for which I simply don’t know the applicable rule and where I need to see which hyphens have been replaced in order to immediately revert the changes.
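As a rough illustration of reviewing and reverting individual replacements, the sketch below compares a field's value before and after the conversion and lists the changed positions. It assumes every replacement swaps exactly one character for another (true for hyphens and quotes, not for something like `...` to `…`); the names `ChangedChar`, `changedCharacters` and `revertAt` are hypothetical and unrelated to the userscript's own highlighting code.

```typescript
interface ChangedChar {
  index: number;  // position in the string
  before: string; // original character
  after: string;  // replacement character
}

// Returns the changed positions, or null when the two strings differ in
// length and therefore cannot be compared character by character.
function changedCharacters(before: string, after: string): ChangedChar[] | null {
  if (before.length !== after.length) return null;
  const changes: ChangedChar[] = [];
  for (let i = 0; i < before.length; i++) {
    if (before.charAt(i) !== after.charAt(i)) {
      changes.push({ index: i, before: before.charAt(i), after: after.charAt(i) });
    }
  }
  return changes;
}

// Undo a single replacement while keeping all the others.
function revertAt(text: string, change: ChangedChar): string {
  return text.slice(0, change.index) + change.before + text.slice(change.index + 1);
}
```

A field would be highlighted whenever the returned list is non-empty, and each entry carries enough information to restore that one character.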
@aerozol has even created a ticket to integrate this feature into MBS and @Zas has brought in the idea to have highlighting at character level and the possibility to revert changes (also for Guess Case):
It took me a while to notice it, because I was experimenting with the release editor first, but the solution is simple: The two releases are using two different recordings, one of them still uses hyphen-minus, the other one and the track titles already use the correct Unicode hyphen. I have not corrected it, so you are still able to see it yourself.
I tried the following conversion in an annotation field with the “Guess Unicode Punctuation” script:
Before conversion (all three lines use hyphen-minus (U002D) from the standard keyboard):
Figure Dash (U2012) Used as a dash within numbers (e.g. 555-1212).
En Dash (U2013) Indicates a range of numbers (e.g. 1989-90).
Hyphen (U2010) Joins words and syllables of a word (e.g. co-operate) and used within dates (e.g. 2022-01-01 or 2021-31)
After conversion:
Figure Dash (U2012) Used as a dash within numbers (e.g. 555–1212).
Actual result: U2013, expected result: U2012
En Dash (U2013) Indicates a range of numbers (e.g. 1989‐90).
Actual result: U2010, expected result: U2013
Hyphen (U2010) Joins words and syllables of a word (e.g. co‐operate) and used within dates (e.g. 2022‐01‐01 or 2021‐31)
Actual result: U2010
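For comparison, here is a sketch of an ordered ruleset that would produce the expected results from this test for the clear-cut cases. It is purely illustrative: the rule names and patterns are assumptions of mine, not the rules the script uses, and they are heuristics based only on the shape of the text.

```typescript
interface DashRule {
  name: string;
  pattern: RegExp; // matches a whole token containing hyphen-minus
  dash: string;    // character that replaces each hyphen-minus in the match
}

// Rules run in order; once a hyphen-minus has been replaced,
// later rules can no longer match it.
const dashRules: DashRule[] = [
  { name: "date", pattern: /\b\d{4}-\d{2}-\d{2}\b/g, dash: "\u2010" },
  { name: "phone number", pattern: /\b\d{3}-\d{4}\b/g, dash: "\u2012" },
  { name: "year range", pattern: /\b\d{4}-(?:\d{4}|\d{2})\b/g, dash: "\u2013" },
  { name: "compound word", pattern: /[A-Za-z]-(?=[A-Za-z])/g, dash: "\u2010" },
];

function guessDashes(text: string): string {
  return dashRules.reduce(
    (current, rule) =>
      current.replace(rule.pattern, (match) => match.replace(/-/g, rule.dash)),
    text
  );
}
```

Note that a pattern cannot tell "2021-31" apart from "1989-90", since both are four digits followed by two, so this sketch treats both as ranges. Cases like that are why such rules stay guesses that need to be reviewed.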
I’d vote to keep dates in the ISO standard format, as that is the layout ISO uses, it is what the website uses, and it is what external apps will expect. I understand wanting to change apostrophes, but dates have their own standards.
But for me it is fairly irrelevant as I strip these all out with my own custom plugin.
It is very easy to add new transformation rules to the script, but I’ve decided to keep the scope limited to the most important ASCII to Unicode replacements.
Although it would be nice to also replace inappropriately used Unicode characters with the correct ones, that would potentially lead to many rarely used rules which have to run for every single input field.
I’ve also declined a similar previous request for the same reason and suggested a few possible alternatives there. Admittedly I still haven’t written the proposed “customizable search and replace rulesets” userscript and don’t have the time to do that soon.