Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS

This is about single dates (2021-05-24), not date ranges. Most of them use hyphen-minus. I’m noticing that some scripts, i.e. the script that helps with proper hyphens, apostrophes, etc. and the one that derives dates from titles etc. for comments, are correcting those hyphens in the dates to true hyphens. Is it proper to use true hyphens in dates formatted like this?

2 Likes

Yes, it is proper to use true hyphens.

From ISO-8601: In representations the following characters are used as separators: [-] (hyphen): to separate the time elements “year” and “month”, “year” and “week”, “year” and “day”, “month” and “day”, and “week” and “day”;

In an environment where use is made of a character repertoire based on ISO/IEC 646, “hyphen” and “minus” are both mapped onto “hyphen-minus”

5 Likes

Thanks! I couldn’t find that.

Is this text taken from the ISO 8601 standard document that you have to buy?
I found the ISO.org website, but all the detailed documents seem to be for sale in a store.

1 Like

It is from the draft version linked on Wikipedia (via the Wayback Machine).

I think while it probably isn’t a perfect match for the final standard it is likely close enough for this detail.

2 Likes

But even the MB website uses HYPHEN-MINUS for its date representation.

1 Like

I submitted a ticket for this.

ISO themselves don’t even use the true hyphen.

3.2 Symbols

3.2.1 General

Representations and expressions specified in this document make use of the symbols listed in 3.2.2 through 3.2.6.

Representations (also referred to as “format representations”) give rise to expressions for dates, times, intervals and recurring intervals.

EXAMPLE 1

[YYYY] is a format representation for a calendar year, where each Y is to be replaced by a single digit creating an expression, for example ‘1985’.

EXAMPLE 2

The date and time representation [YYYY][“-”][MM][“-”][DD] gives rise to the expression ‘2003-02-10’ which identifies 10 February 2003.

To clearly separate date and time representations from the text, punctuation marks and associated symbols used to describe them, the following symbols are used to demarcate boundaries of expressions and representations in this document:

  • single quotation marks enclose expressions (for example ‘1985’); in some cases they are omitted to reflect the actualities of the examples; they are omitted in Clause 5;

  • all individual tokens that are part of a representation are contained between the open and close bracket symbols (“[” and “]”);

EXAMPLE 3

For the date and time representation [YYYY][“-”][MM][“-”][DD], [YYYY], [“-”], [MM], [“-”], and [DD] are individual tokens enclosed by brackets.

  • when double quotations marks enclose a string within a representation, that string is literal and becomes part of any expression of that representation.

EXAMPLE 4

The representation [i][“Y”] represents a positive integer followed by the symbol “Y”. ‘12Y’ meaning “12 years” is an expression of that representation.

Quotation marks and brackets are not part of the expression or representation itself and shall be omitted in implementation.

All characters used in date and time expressions and representations are part of the ISO/IEC 646 repertoire, except for “hyphen”, “minus” and “plus-minus”. In an environment where use is made of a character repertoire based on ISO/IEC 646, “hyphen” and “minus” should be both mapped onto “hyphen-minus”.

In this excerpt, you can see the use of “fancy” punctuation like “ (U+201C), ” (U+201D), ‘ (U+2018), ’ (U+2019), and — (U+2014)… but never ‐ (U+2010). They actually use hyphen-minus themselves.
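For anyone who wants to repeat that check on another excerpt, here is a minimal Python sketch using the standard `unicodedata` module. The function name `non_ascii_punctuation` and the sample string (copied from EXAMPLE 2 above) are my own choices for illustration, not part of any ISO or MB tooling.

```python
import unicodedata


def non_ascii_punctuation(text):
    """Map each non-ASCII punctuation character in text to its Unicode name."""
    found = {}
    for ch in text:
        # General categories starting with "P" cover dashes, quotes, etc.
        if ord(ch) > 0x7F and unicodedata.category(ch).startswith("P"):
            found[ch] = unicodedata.name(ch)
    return found


# Sample taken from EXAMPLE 2 of the excerpt; escapes keep the code points explicit.
sample = (
    "The date and time representation "
    "[YYYY][\u201c-\u201d][MM][\u201c-\u201d][DD] "
    "gives rise to the expression \u20182003-02-10\u2019"
)

for ch, name in non_ascii_punctuation(sample).items():
    print(f"U+{ord(ch):04X} {name}")

# The separator inside the date itself is plain ASCII:
print(unicodedata.name("-"))  # HYPHEN-MINUS
```

The curly quotes show up in the result, while U+2010 HYPHEN does not, because the date separators in the sample are U+002D.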

3 Likes

Whichever is used, I’d like for them to be non-breaking hyphens. It bugs me when I see a table and the dates look like:

2022-
01-
04
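One possible way to get that for display purposes is U+2011 NON-BREAKING HYPHEN. Below is a rough Python sketch, assuming dates appear as YYYY-MM or YYYY-MM-DD with either U+002D or U+2010 as separator. The function name `make_dates_unbreakable` is hypothetical, and this is meant for rendered text only, not for stored data.

```python
import re

NON_BREAKING_HYPHEN = "\u2011"

# Matches YYYY-MM or YYYY-MM-DD written with HYPHEN-MINUS or U+2010 HYPHEN.
DATE_RE = re.compile(r"\b(\d{4})[-\u2010](\d{2})(?:[-\u2010](\d{2}))?\b")


def make_dates_unbreakable(text):
    """Rejoin the parts of each date with U+2011 so it cannot wrap mid-date."""
    def rejoin(match):
        parts = [part for part in match.groups() if part is not None]
        return NON_BREAKING_HYPHEN.join(parts)

    return DATE_RE.sub(rejoin, text)


print(make_dates_unbreakable("Released 2022-01-04, tracks 2-21"))
```

Only the date is rewritten; something like "2-21" is left alone because the pattern requires a four-digit year. On a web page, a no-wrap style on the table cell would be another option that leaves the characters untouched.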

2 Likes

A thought I had in a recent edit note:

Languages I know have usually used slash or dot to separate date elements.

ISO 8601 introduced hyphens for computer reasons (filenames).

So for me ISO 8601 is more tech than language, and that is why I think the computer hyphen-minus is the most suitable character, in the spirit of data, filenames, coding, programmability and computer interoperability.

1 Like

From note above: Correct hyphen: Unicode HYPHEN or HYPHEN-MINUS - #62 by Hawke

Update: I don’t care about which one to use, but it’d be nice if an agreement was reached on this. I thought it had been settled to use true hyphen, but I didn’t see the follow-ups after the post above. The punctuation script changes them to true hyphens, so that’s what I’ve been using. I always thought it looked wrong though because I’ve always seen hyphen-minus everywhere.

2 Likes

I changed everything related to Bruce Springsteen to the Unicode hyphen…

1 Like

I think they just mean hyphen in the general sense of the word (as opposed to slash or period); they are not advocating any exact character code point, like U+2010 HYPHEN.
If you copy this sentence from the given ISO 8601 source itself, it’s the U+002D HYPHEN-MINUS that they use between the square brackets:

[-] (hyphen): to separate the time elements “year” and “month”, “year” and “week”, “year” and

As @yindesu said, they always use the HYPHEN-MINUS character to mean hyphen.

And MBS, for the moment, has also chosen to use HYPHEN-MINUS for dates (2013-07-14).
MBS does, however, use the typographic EN DASH for spans (tracks 2–21).

ISO chose “-” instead of the natural “/” or even “.” for computer file-name reasons, so I think it’s best to stick to the plain HYPHEN-MINUS and keep date parsing as easy as possible for programs such as userscripts. It is also the character new editors will always type, so its usage can never drop to 0%.
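To illustrate the parsing concern, here is a small Python sketch. It assumes a strict parser (the standard library's `date.fromisoformat`) that expects HYPHEN-MINUS, so other hyphen-like characters are mapped onto U+002D first. The helper name `parse_iso_date` and the set of characters treated as hyphen-like are my assumptions, not something taken from an existing userscript.

```python
from datetime import date

# Characters treated as "hyphen-like" for this sketch (an assumption):
# U+2010 HYPHEN and U+2011 NON-BREAKING HYPHEN.
HYPHEN_LIKE = "\u2010\u2011"
_TO_HYPHEN_MINUS = str.maketrans({ch: "-" for ch in HYPHEN_LIKE})


def parse_iso_date(text):
    """Parse YYYY-MM-DD after mapping hyphen-like characters to HYPHEN-MINUS."""
    normalized = text.strip().translate(_TO_HYPHEN_MINUS)
    return date.fromisoformat(normalized)


print(parse_iso_date("2013-07-14"))            # typed with HYPHEN-MINUS
print(parse_iso_date("2013\u201007\u201014"))  # typed with U+2010 HYPHEN
```

Every program that reads dates would need a normalization step like this if both characters were in circulation, which is the extra work that sticking to HYPHEN-MINUS avoids.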

3 Likes

I agree with the logic of what @jesus2099 says - the ISO standard for dates is about standardised dates and not prettified display.

On my own system I want my searches to be consistent, so I learnt how to write Picard plugins specifically to get consistent tag naming, especially for dates.

1 Like

I think it’s worth pointing out that in that document the hyphens in words are also encoded by U+002D though (e.g. “non-governmental”, page 5). This is very normal: it is even the same in the Unicode Standard where they say to prefer U+2010. (This could be a limitation of the software used to produce the document: it looks like TeX to me, and that will output U+002D even if you feed it U+2010 as input.) If we’re trying to imitate usage in other sources, I think it would be necessary to find one that uses U+2010 in words but still U+002D in dates (unless we are also re-opening the debate about which hyphen to use in words on MB).
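If someone wants to test a candidate source for that, a rough Python sketch like the following could tally which hyphen character appears inside words and which inside dates. The regular expressions are simplifying assumptions (a word hyphen is one sitting between two letters, a date is YYYY-MM or YYYY-MM-DD), and the name `hyphen_usage` is made up for this example.

```python
import re
import unicodedata
from collections import Counter

HYPHENS = "-\u2010"  # HYPHEN-MINUS and HYPHEN

# A hyphen character with a letter on each side, e.g. "non-governmental".
WORD_HYPHEN_RE = re.compile(rf"(?<=[^\W\d_])([{HYPHENS}])(?=[^\W\d_])")
# A date-like token: YYYY-MM or YYYY-MM-DD with either hyphen character.
DATE_RE = re.compile(rf"\b\d{{4}}[{HYPHENS}]\d{{2}}(?:[{HYPHENS}]\d{{2}})?\b")


def hyphen_usage(text):
    """Count hyphen characters by Unicode name, separately for words and dates."""
    words = Counter(unicodedata.name(h) for h in WORD_HYPHEN_RE.findall(text))
    dates = Counter(
        unicodedata.name(ch)
        for token in DATE_RE.findall(text)
        for ch in token
        if ch in HYPHENS
    )
    return {"words": words, "dates": dates}


usage = hyphen_usage("A non\u2010governmental body, founded 2003-02-10.")
print(usage["words"])
print(usage["dates"])
```

A source that fits the description above would show only HYPHEN under "words" and only HYPHEN-MINUS under "dates".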

2 Likes