Acute (and grave) accents as apostrophes

Apparently, it’s relatively common for acute and grave accents to be (mis)used as apostrophes. Some examples I just saw: “Devil ` S Tango”, “Never Trust ` Em”, “Zoomed Out We ´ Re Ants”, “It ´ s alright (extended mix)”. Guess case doesn’t understand this (to be honest neither do I!) so it ends up as shown here.

There’s a ticket asking for these to be treated as apostrophes and converted as such with guess case. That said, I’ve seen other cases where these are meant as quotes (“Extract From `the Great Hunger`”), and plenty of cases where they are meant as just funny symbols in a nonsensical title. So I’d like some opinions on what to do here.

I can see a bunch of options:

  1. Do nothing since they are not that common
  2. Make guess case treat them as the equivalent of an apostrophe (lowercase after, don’t add a space) but leave them untouched otherwise
  3. Replace them with a proper Unicode apostrophe character (’)
  4. Replace them with an ASCII apostrophe (').

Open to any other ideas if someone has good ones. I’m afraid of messing up all the titles such as “~ ` ~ ` ~ ` ~” or “メチャキュン♡サマー( ´ ▽ ` )ノ” so I’m not super excited about changing the symbol automatically, but maybe I’m overthinking it.

It’s not exactly that.

In fact these weird examples are here because of Guess Case, and the ticket is for removing that Guess Case bug.

For example, Guess Case does convert “Zoomed Out We´re Ants” (using acute) to “Zoomed Out We ´ Re Ants”.

But of course, using grave and acute accents instead of either a typewriter quote or left/right curly quotes is a user mistake.

No, it’s a good concern.
My fix search does filter out Japanese texts, for this reason.

We could use a regex to test whether it is preceded and followed only by a letter, a space, or the start or end of the string.
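A rough sketch of what such a test could look like, in Python rather than whatever Guess Case actually uses. The name `find_candidate_accents` is made up for illustration, and one extra assumption is added on top of the suggestion: at least one neighbour must be a letter, so that an accent floating between two spaces is not flagged.

```python
import re

# Any Unicode letter: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

# An accent counts as a candidate apostrophe when either
#   - a letter precedes it and a letter, whitespace or the end follows, or
#   - the start or whitespace precedes it and a letter follows.
CANDIDATE = re.compile(
    rf"(?<={LETTER})[`´](?={LETTER}|\s|$)"
    rf"|(?<!\S)[`´](?={LETTER})"
)

def find_candidate_accents(title):
    """Return the positions of accents that look like misused apostrophes."""
    return [m.start() for m in CANDIDATE.finditer(title)]

print(find_candidate_accents("Zoomed Out We´re Ants"))
print(find_candidate_accents("~ ` ~ ` ~ ` ~"))
```

With this pattern the decorative titles are skipped because their accents have no letter on either side. Quote-like uses such as “`the Great Hunger`” are still flagged, so this only detects candidates and says nothing about what to replace them with.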

I think, currently, Guess Case is not replacing typewriter quotes with curly apostrophes, so maybe not.

Maybe that, yes… :thinking:

There’s a guess unicode punctuation userscript that handles various cases of apostrophes already, that might be a better point to handle this, especially if it was integrated into the site itself. (Though the unicode apostrophes themselves are a contentious topic for some, e.g. https://www.last.fm/music/Old+Man’s+Child/+wiki .)

I’d vote a 3 (or 4). At least guess case getting something closer to what it is supposed to be is better than leaving those weird options.

@PacCeggowk9oc your link has been knackered by the dumb forum. This forum software forces all ASCII ' to be Unicode ’ - which is clearly wrong for your link. Funny how this forum is a good example of substitution gone dumb.

Actually double dumb as the “.” has also got added to the link. :joy:

The link is supposed to have the unicode ’ and someone there is complaining about it being wrong. I didn’t pay enough attention to the “.”, my browser showed it as separate from the link, but clicking it got added anyway.

As jesus2099 said - sometimes there is a “special” apostrophe hiding in a track list, and guess case will add the spaces. Hopefully I’ve noticed every time it’s happened :person_shrugging:

That said sometimes it’s useful that guess case derps it up, because I wouldn’t notice the errant symbol otherwise!

I know this would be new behavior, but it could be nice to simply highlight a symbol that is very likely to be in there by accident? For instance, a yellow background. Maybe always, maybe after pressing guess case. Then the editor could decide, and they are visible.

Unfortunately, I think that this is impossible to do without gnarly hacks like drawing colored boxes behind the input or switching to contenteditable, which may cause lots of problems. (I think I remember you suggesting it on a PR a while back too. It’s still a nice idea but still probably not feasible. :slight_smile: )

Hmm, I think I remember - the issue is with highlighting individual characters, as opposed to the whole box, right?

I think it would also be okay to subtly highlight the whole box and add a tooltip or message at the bottom.

But I don’t want to get too into the weeds with it, it was just an additional thought :slight_smile:

Yeah, that’s correct. Highlighting the whole box would be possible, and it might also be possible to highlight individual characters in the preview on the edit note screen (although it might be confusing there since the differences between old and new text are already highlighted).

It looks like all of the cases you listed would be handled properly if the regex considered the context of the character.

  1. If it’s between two alphabetic characters (including any diacritical variations), replace it with a Unicode apostrophe
  2. If a single occurrence is preceded by whitespace and followed by an alphabetic character (or vice versa), replace it with a Unicode apostrophe
  3. If it occurs in a pair, and the first occurrence is preceded by whitespace and followed by an alphabetic character, and the second occurrence is preceded by an alphabetic character and followed by whitespace or the end of the string, replace it with single curved quotation marks.
  4. In all other contexts, do nothing.
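The four rules above could be sketched roughly like this in Python. The function name `convert_accents` is hypothetical, and two assumptions go beyond the list: the start and end of the string are treated like whitespace, and a title with more than one opening or closing candidate is left alone because it is ambiguous.

```python
import re

LETTER = r"[^\W\d_]"   # any Unicode letter, diacritics included
ACCENT = "[`´]"        # grave or acute accent

BETWEEN = re.compile(rf"(?<={LETTER}){ACCENT}(?={LETTER})")
OPENING = re.compile(rf"(?<!\S){ACCENT}(?={LETTER})")   # start/space, then letter
CLOSING = re.compile(rf"(?<={LETTER}){ACCENT}(?!\S)")   # letter, then space/end

def convert_accents(title):
    # Rule 1: an accent between two letters becomes an apostrophe.
    title = BETWEEN.sub("\u2019", title)

    opening = [m.start() for m in OPENING.finditer(title)]
    closing = [m.start() for m in CLOSING.finditer(title)]
    chars = list(title)

    if len(opening) == 1 and len(closing) == 1 and opening[0] < closing[0]:
        # Rule 3: a matched pair becomes single curved quotation marks.
        chars[opening[0]] = "\u2018"
        chars[closing[0]] = "\u2019"
    elif len(opening) + len(closing) == 1:
        # Rule 2: a single accent attached to a word becomes an apostrophe.
        chars[(opening + closing)[0]] = "\u2019"
    # Rule 4: anything else is left untouched.
    return "".join(chars)

print(convert_accents("Extract From `the Great Hunger`"))
print(convert_accents("Never Trust `Em"))
```

Running rule 1 first means that only accents at word edges are left when the pair check happens, which keeps the remaining patterns short.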

Even though the match pattern is a lot simpler if the replacement string is an ASCII apostrophe, I don’t want to advocate that because of cases where a word with an ASCII apostrophe occurs inside a quoted title that also uses ASCII apostrophes as single quotation marks. Plus, microtypography is just nicer.

First occurrence is preceded by whitespace or start of the string.

But the problem is when there is combination of all these.

There is a good userscript for that.
It would be enough if Guess Case replaced grave and acute accents with the ASCII typewriter (straight) apostrophe.
Then the userscript can be run afterwards, like I often do.

Yes indeed! Well-spotted.

But it would be possible (and practical) to use Unicode when that can be done with certainty, and to use ASCII when we can’t guess whether a grave accent is used as a quotation mark or as an apostrophe.

Here’s a hypothetical example:

`Tain`t Nobody`s Biz-ness If I Do / Ain`t Misbehavin`

By using look-ahead and look-behind to find alphabetic characters, a first pass can correctly replace all the grave accents between letters with ’, giving us

`Tain’t Nobody’s Biz-ness If I Do / Ain’t Misbehavin`

But since a second pass can’t determine whether the pair that remain at the beginning and end of the string are quotation marks or if they’re apostrophes of elision, we can either leave them alone or replace them with the ASCII '.
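As a minimal sketch of that two-pass idea, again in Python and with made-up names (`two_pass`, `ascii_fallback`): the first pass uses look-behind and look-ahead for letters, and the second pass, which is optional, only touches accents that are attached to a word at its start or end. Restricting the second pass to word edges is my assumption, so that free-standing accents stay as they are.

```python
import re

LETTER = r"[^\W\d_]"  # any Unicode letter

# Pass 1: an accent with a letter on both sides is certainly an apostrophe.
BETWEEN = re.compile(rf"(?<={LETTER})[`´](?={LETTER})")

# Pass 2: an accent at a word edge could be a quote or an elision apostrophe.
EDGE = re.compile(
    rf"(?<!\S)[`´](?={LETTER})"    # start or space before, letter after
    rf"|(?<={LETTER})[`´](?!\S)"   # letter before, space or end after
)

def two_pass(title, ascii_fallback=True):
    title = BETWEEN.sub("\u2019", title)   # certain cases get U+2019
    if ascii_fallback:
        title = EDGE.sub("'", title)       # uncertain cases get ASCII '
    return title

example = "`Tain`t Nobody`s Biz-ness If I Do / Ain`t Misbehavin`"
print(two_pass(example, ascii_fallback=False))
print(two_pass(example))
```

Calling it with `ascii_fallback=False` corresponds to leaving the uncertain pair alone, and the default corresponds to replacing it with the ASCII character.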

All typewriter, rather than a mix of typewriter and curly, looks more “professional.” :wink: