.. -=- Halfwidth and Fullwidth blame -=-

= Whence Unicode's Halfwidth and Fullwidth forms? =

Unicode's Halfwidth and Fullwidth Forms block encodes equivalents to other Unicode characters,
differing only in being intended to display wider or narrower than their compatibility-normalised
equivalents.&ensp;This usually harks back to East Asian multiple-byte character sets (MBCSs), which
combined a single byte character set (SBCS) such as ASCII or another ISO 646 variant with a
(possibly reärranged) double byte character set (DBCS), often of separate origin.&ensp;This resulted
in duplicate encoding of some characters, where the single byte versions often rendered narrower
than the double byte ones (duospace typesetting).
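
The compatibility relationship described above can be inspected with Python's standard `unicodedata` module.&ensp;The following is only a minimal sketch: the `describe` helper and the choice of sample characters are my own, picked for illustration.

```python
import unicodedata

def describe(ch):
    """Return (East Asian Width class, NFKC-normalised form) for one character."""
    return unicodedata.east_asian_width(ch), unicodedata.normalize("NFKC", ch)

# One sample per kind of character; "F" means fullwidth and "H" means halfwidth.
samples = {
    "\uff21": ("F", "A"),        # FULLWIDTH LATIN CAPITAL LETTER A
    "\uff5f": ("F", "\u2985"),   # FULLWIDTH LEFT WHITE PARENTHESIS
    "\uff71": ("H", "\u30a2"),   # HALFWIDTH KATAKANA LETTER A
    "\uffe6": ("F", "\u20a9"),   # FULLWIDTH WON SIGN
    "\uffe8": ("H", "\u2502"),   # HALFWIDTH FORMS LIGHT VERTICAL
}

for ch in samples:
    width, normalised = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: width={width}, NFKC=U+{ord(normalised):04X}")
```

The width class is the only thing distinguishing each sample from its normalised form, which is the point of the block.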

That being said, the origin of the individual characters is not always obvious.&ensp;Since not
everyone is motivated to read code pages unless they have to, I have decided to write this up.

So… blame for Halfwidth and Fullwidth forms, by subheading?

(Most characters in this block have been present since Unicode 1.0.0; exceptions are noted below.)

== Fullwidth ASCII variants (U+FF01&ndash;FF5E) ==
Pretty much every East Asian MBCS is responsible for at least some of it.&ensp;For example, the
GB 2312 character set underlying EUC-CN (GBK, its superset, didn't exist at the time) encodes the
entirety of an ISO 646 variant in row 3 (with Yuan/Yen not Dollar, Overline/Macron not Tilde) and
the remaining ASCII (Tilde and Dollar) in row 1.&ensp;The Unicode layout here copies ASCII, though.
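
Because the Unicode layout copies ASCII, converting between the two is a fixed offset.&ensp;A minimal sketch follows; `FULLWIDTH_OFFSET` and `to_fullwidth` are names I made up for the example.

```python
import unicodedata

# Distance between a printable ASCII character and its fullwidth form.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # == 0xFEE0

def to_fullwidth(text):
    """Map printable ASCII (0x21 to 0x7E) onto U+FF01 to U+FF5E; leave the rest alone."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )

wide = to_fullwidth("Abc~$")
print(wide)
# NFKC normalisation undoes the widening.
print(unicodedata.normalize("NFKC", wide))
```

The space is deliberately left alone: its wide counterpart is U+3000 IDEOGRAPHIC SPACE, which lives outside this block.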

JIS X 0208 row 3, by contrast, only includes the letters and numbers, though most of the ASCII
punctuation is included in row 1.&ensp;It is, however, missing straight quotes (added in some
vendor extensions), has a "wave dash" not a wide tilde for all that they differ (Microsoft still
treat it as a wide tilde), and has separate hyphen and minus sign assignments (Microsoft treat the
minus sign as a wide hyphen-minus).&ensp;JIS X 0213 didn't exist at the time.
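
The difference in treatment can be observed using Python's bundled codecs, on the assumption that its `shift_jis` codec stands in for the JIS-style mapping and its `cp932` codec for Microsoft's.&ensp;This is a sketch of the comparison, not a statement about any other implementation; `decode_both` is a helper invented here.

```python
import unicodedata

def decode_both(sjis_bytes):
    """Decode with Python's JIS-style and Microsoft-style Shift_JIS codecs."""
    return sjis_bytes.decode("shift_jis"), sjis_bytes.decode("cp932")

# Shift_JIS byte pairs for JIS X 0208 01-33 (wave dash) and 01-61 (minus sign).
for label, raw in [("01-33", b"\x81\x60"), ("01-61", b"\x81\x7c")]:
    jis, ms = decode_both(raw)
    print(f"{label}: JIS-style {unicodedata.name(jis)}, Microsoft-style {unicodedata.name(ms)}")
```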

&ensp;:kome:&ensp;The fullwidth tilde is the fullwidth form of the ASCII tilde, which is in itself 
an ambiguously defined character from the pre-Unicode ASCII era.&ensp;So in principle, the 
fullwidth tilde could be a mathematical tilde operator centred within an em-square, a tilde accent 
in the upper centre of an em square, or (yes) a wave/swung dash.&ensp;In practice, it is a wave 
dash, matching the intent of most of the East Asian character sets it retains compatibility with,
with the notable exception of South Korean Wansung.

&ensp;:kome:&ensp;A separate wave dash codepoint was added to Unicode for use for the JIS X 0208
character (which is officially not considered a tilde).&ensp;Its reference glyph was incorrectly
mirrored, however, and Microsoft initially took that as gospel, both (a) not using it as the mapping of
the SJIS character and (b) using the mirrored glyph in fonts for Windows XP.&ensp;This glyph error
was fixed in fonts introduced in later versions of Windows (although the existing fonts such as
MS PMincho remained the same) and, subsequently, in later versions of the Unicode charts.

&ensp;:kome:&ensp;JIS X 0213 displays 01-02-18 as a tilde accent, and maps it onto the ASCII tilde 
in Shift_JIS and the fullwidth tilde in EUC-JP.&ensp;In the latter case, mapping onto the Unicode
small tilde (which is found in Windows-1252 and is necessarily a spacing accent) might be more
practical in reality, in order to ensure a distinct glyph from the wave dash (and to avoid
colliding with Microsoft's mappings).&ensp;To the best of my knowledge, noöne does this in
JIS, though.&ensp;South Korean Wansung is a different story, where 02-06 variously gets mapped to 
the small tilde or the fullwidth tilde depending on vendor.

== Fullwidth brackets (U+FF5F&ndash;FF60) ==
"White" (hollow or doubled) parentheses.&ensp;Not needed for round trip reasons, but included due
to differing formatting / rendering requirements for East Asian and mathematical versions of the
graphemes in question.&ensp;Added in Unicode 3.2, making them the most recent characters to be
added to this block.

== Halfwidth CJK punctuation, Halfwidth Katakana variants (U+FF61&ndash;FF9F) ==
Shift_JIS is ultimately responsible.&ensp;Also supported in most EUC-JP and some extended
ISO-2022-JP, probably for Shift_JIS round trip reasons, given that they are not present in standard
ISO-2022-JP (though the extensions are still just taken from ISO 2022, and are present in some
standardised supersets of ISO-2022-JP), and that they are still two bytes (like the fullwidth ones)
in EUC-JP.
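
The one-byte versus two-byte difference is easy to check with Python's bundled `shift_jis` and `euc_jp` codecs.&ensp;This is a sketch assuming those codecs are representative; `KANA_OFFSET` and `halfwidth_forms` are names made up for the example.

```python
# Distance between a Shift_JIS single byte (0xA1 to 0xDF) and its Unicode code point.
KANA_OFFSET = 0xFF61 - 0xA1  # == 0xFEC0

def halfwidth_forms(byte):
    """Return (Unicode character, EUC-JP encoding) for one Shift_JIS single byte."""
    ch = bytes([byte]).decode("shift_jis")
    return ch, ch.encode("euc_jp")

for byte in (0xA1, 0xB1, 0xDF):
    ch, euc = halfwidth_forms(byte)
    print(f"Shift_JIS 0x{byte:02X} -> U+{ord(ch):04X} -> EUC-JP {euc.hex()}")
```

In EUC-JP the same trail byte is kept, but it is prefixed with the single shift byte 0x8E.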

== Halfwidth Hangul variants (U+FFA0&ndash;FFDC) ==
Used in IBM-1364 (and its subset IBM-933), an IBM EBCDIC code for Korean which includes compatibility
jamo but also allows locking shifts to a double byte host code (actually Johab, only with a
non-syllable (non-hanja and hanja) area with what would have been ASCII lead bytes, a private use
area in place of what would have been the non-Hangul area in the ASCII-based Johab, and with
IBM-933 not including all possible syllable clusters, while IBM-1364 does).&ensp;Also found
in IBM-944, an old predecessor to IBM-949 which used a proprietary 94×94 (or 123×94 with
extensions) plane rather than the KS X 1001 one, and used the trail byte range and
two-rows-per-lead-byte format from Shift_JIS (differing only in not skipping the 0xA0–DF lead
bytes, since the single-byte codes are in 0xC0–FC instead).&ensp;Note that the layout of the jamo,
including reserved codepoints, matches the layout in IBM-944, similarly to how the layout of the
kana matches Shift_JIS.
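
The reserved codepoints can be listed from the Unicode side with the standard `unicodedata` module.&ensp;As far as I know Python ships no IBM-944 codec, so this sketch only shows where the gaps fall in Unicode, not the correspondence with the code page; `unassigned` is a helper invented here.

```python
import unicodedata

def unassigned(first, last):
    """List code points in [first, last] whose General Category is Cn (unassigned)."""
    return [cp for cp in range(first, last + 1) if unicodedata.category(chr(cp)) == "Cn"]

gaps = unassigned(0xFFA0, 0xFFDC)
print(" ".join(f"U+{cp:04X}" for cp in gaps))

# By contrast, the halfwidth punctuation and katakana range has no holes.
print(unassigned(0xFF61, 0xFF9F))
```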

The layout in IBM-944 appears to be basically a transposition of the Hangul consonant and Hangul
vowel polygons from IBM-933 onto an 8-by-32 extended ASCII grid, resulting in otherwise inexplicable
positioning of the empty space.&ensp;That being said, the original KS C 5601-1974 (before Wansung)
is basically the letter subset of the second half of IBM-891, which is a subset of IBM-1040, the
single-byte set of IBM-944.&ensp;I'm not entirely sure which came first.

&ensp;:kome:&ensp;EBCDIC being EBCDIC, things tend to be laid out in polygons drawn on a 16-by-16 
grid rather than in ranges.

&ensp;:kome:&ensp;Noting here because I don't have a better place to put it: the Hangul Filler
serves two purposes.&ensp;Firstly, it marks the start of a jamo composition sequence in KS X 1001
(whereas the KS X 1001 jamo will otherwise appear as standalone characters, like the corresponding
Unicode compatibility jamo but unlike the regular Unicode jamo); secondly, it stands in for an
unused position in such a sequence (e.g. if there is no final consonant, the filler will be
inserted in its place).&ensp;The compatibility jamo in Unicode itself (including the filler) stand
for isolated characters; the sequences may be processed by the decoder (but often aren't, e.g. the
UHC code (Windows-949, WHATWG EUC-KR) does not, as all the supported Hangul syllables are
provided precomposed anyway due to extensions).
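
To make the composition sequence concrete, here is a rough sketch of what a decoder could do with one after the bytes have been turned into compatibility jamo.&ensp;It assumes the sequence is always a filler followed by an initial, a vowel and either a final or a filler; the `compose` function and its tables are my own illustration, not taken from any particular decoder, and sequences with a missing initial or vowel are simply rejected.

```python
FILLER = 0x3164           # HANGUL FILLER (compatibility jamo block)
FIRST_CONSONANT = 0x3131  # HANGUL LETTER KIYEOK
FIRST_VOWEL = 0x314F      # HANGUL LETTER A

# Offsets from U+3131 of the 19 consonants usable as an initial.
INITIAL_OFFSETS = [0, 1, 3, 6, 7, 8, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
# Every consonant or cluster except SSANGTIKEUT, SSANGPIEUP and SSANGCIEUC can be a final.
FINAL_OFFSETS = [o for o in range(30) if o not in (7, 18, 24)]

def compose(sequence):
    """Compose filler + initial + vowel + (final or filler) into one syllable."""
    if len(sequence) != 4 or ord(sequence[0]) != FILLER:
        raise ValueError("expected a filler followed by three jamo or fillers")
    initial, vowel, final = (ord(c) for c in sequence[1:])
    # list.index raises ValueError for anything that is not a valid initial or final.
    lead = INITIAL_OFFSETS.index(initial - FIRST_CONSONANT)
    mid = vowel - FIRST_VOWEL
    if not 0 <= mid < 21:
        raise ValueError("not a vowel")
    tail = 0 if final == FILLER else FINAL_OFFSETS.index(final - FIRST_CONSONANT) + 1
    return chr(0xAC00 + (lead * 21 + mid) * 28 + tail)

# Python's cp949 codec leaves the sequence as four isolated characters.
jamo = b"\xa4\xd4\xa4\xbe\xa4\xbf\xa4\xa4".decode("cp949")
print([f"U+{ord(c):04X}" for c in jamo])
print(f"U+{ord(compose(jamo)):04X}")
```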

== Fullwidth symbol variants (U+FFE0&ndash;FFE6) ==
These are present for a variety of reasons.

- Yen sign and overline: have both single byte and double byte representations in Shift_JIS and in
  some variants of EUC-JP.&ensp;Why the compatibility mapping is to Macron, when it's used as the
  fullwidth form of Overline in these contexts, I do not know; I guess it doesn't matter all that
  much in practice (the Unicode code chart notes "sometimes treated as fullwidth overline").

- Won sign: has both single byte and double byte representations in, for example, at least some
  variants of EUC-KR.
 
- Pound, cent and not signs and broken vertical bar: used for the standard representations of those
  characters in encodings where the normalised mappings are used for IBM's single byte extension
  characters (e.g. IBM-942
  variant of Shift_JIS).&ensp;Also used in encodings trying to retain compatibility with them (e.g.
  Windows-932 variant of Shift_JIS, modified from IBM-932 which is a subset of IBM-942; further,
  MS-932 / Windows-932 is in turn copied by OSF's eucJP-ms and by WHATWG's Shift_JIS and
  EUC-JP).&ensp;The cent sign, not sign and broken vertical bar are very common inclusions in EBCDIC
  (both single-byte and double-byte), hence IBM encodings (including non-EBCDIC ones trying to
  support the repertoire of the EBCDIC ones) often include them, sometimes with both
  single-byte and double-byte forms.

Mappings often disagree on the question of whether a character with a fullwidth variant form,
and a halfwidth compatibility normalised form, should be mapped to the normalised or the fullwidth
codepoint if it's the only representation in the encoding, but is double byte.
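
One such disagreement shows up between Python's bundled `shift_jis` and `cp932` codecs for the double byte cent, pound and not signs.&ensp;This is a sketch under the assumption that those two codecs are fair stand-ins for the two camps; `SIGNS` and `mappings` are names invented for the example.

```python
import unicodedata

# Shift_JIS byte pairs for the double byte cent, pound and not signs.
SIGNS = {"cent": b"\x81\x91", "pound": b"\x81\x92", "not": b"\x81\xca"}

def mappings(raw):
    """Return (shift_jis result, cp932 result) for one double byte code."""
    return raw.decode("shift_jis"), raw.decode("cp932")

for label, raw in SIGNS.items():
    plain, wide = mappings(raw)
    # The two results differ only by compatibility normalisation.
    same_after_nfkc = unicodedata.normalize("NFKC", wide) == plain
    print(f"{label}: U+{ord(plain):04X} vs U+{ord(wide):04X}, equal after NFKC: {same_after_nfkc}")
```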

== Halfwidth symbol variants (U+FFE8&ndash;FFEE) ==
Used as C0 replacement graphics in several IBM East Asian MBCSs, including their variants of
Shift_JIS (IBM-932, IBM-942, IBM-943), IBM-936 (a GB variant) and IBM-944 (see above).&ensp;(As
these are ambiguous control/graphic characters, as is par for the course on DOS, and the ICU mapping is
to the control meanings, this is only obvious upon reading the code pages.)&ensp;Accompanied by
some other C0 replacements, but those ones don't also have double byte forms (e.g. the double byte
box drawing is single lined, the C0 box drawing is double lined (excepting the lone light
vertical), so they have different mapping anyway).&ensp;Added in Unicode 1.0.1.