Encoding hints in Apple’s Private Use Area
While Apple’s CORPCHAR.TXT lists all the codepoints used as “hints”, it does not explain what all of them do. Several are simply numbered, even though many serve consistent purposes in the individual encodings; for example, the ones used in MacJapanese are described according to their function in the blurb of JAPANESE.TXT. I have attempted to compile a list of all their functions from three kinds of source: descriptions in CORPCHAR.TXT, descriptions of their usage in the comments of the individual mapping files, and records of which newer Unicode characters were previously mapped using them, taken both from CORPCHAR.TXT (in cases where they got their own PUA points even earlier) and from the individual mappings’ comments.
Ranges:
Range | Usage |
---|---|
U+F850 to U+F85F | CJK language hint. |
U+F860 to U+F86F | Composition hint. |
U+F870 to U+F87F | Variant hint. |
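
Since each range spans exactly sixteen codepoints, the kind of hint can be read off by clearing the low four bits of the codepoint. The following Python sketch illustrates this; the function name `hint_kind` is hypothetical and does not come from Apple’s files.

```python
from typing import Optional


def hint_kind(codepoint: int) -> Optional[str]:
    """Classify a codepoint using the three ranges in the table above."""
    kinds = {
        0xF850: "CJK language hint",
        0xF860: "composition hint",
        0xF870: "variant hint",
    }
    # Clearing the low nibble maps e.g. U+F87E onto 0xF870.
    return kinds.get(codepoint & 0xFFF0)
```

Anything outside U+F850 to U+F87F yields `None`, so the function can also be used as a simple "is this a hint at all" test.
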
CJK language hints:
Hint | Meaning |
---|---|
U+F850 | Default |
… | Not used. |
U+F85C | Simplified Chinese |
U+F85D | Traditional Chinese |
U+F85E | Japanese |
U+F85F | Korean |
Composition hints:
Hint | Meaning |
---|---|
U+F860 | Composition of 2 characters |
U+F861 | Composition of 3 characters |
U+F862 | Composition of 4 characters |
U+F863 | Composition of 4 characters, negative, vertical, bold-serif or other alternate form. |
U+F864 | Composition of 4 characters, shadowed sans-serif form. |
U+F865 | Composition of 4 characters, negative sans-serif form. |
U+F866 | Composition of 4 characters, negative light form. |
U+F867 | Composition of 2 characters, large form. |
U+F868 | Composition of 2 characters, small form. |
U+F869 | Composition of 2 characters, small bold form. |
U+F86A | Composition of 2 characters in right-to-left direction. |
U+F86B | Composition of 4 characters in right-to-left direction. |
U+F86C | Not used. |
U+F86D | Not used. |
U+F86E | Not used. |
U+F86F | Not used. |
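
The composition table can be transcribed into a lookup keyed by codepoint, which may be convenient when working out how many characters a given hint groups. This is only a sketch of the table above as data; the names `COMPOSITION_HINTS` and `composition_length` are hypothetical.

```python
# (number of characters composed, form or None) per the table above.
COMPOSITION_HINTS = {
    0xF860: (2, None),
    0xF861: (3, None),
    0xF862: (4, None),
    0xF863: (4, "negative, vertical, bold-serif or other alternate form"),
    0xF864: (4, "shadowed sans-serif form"),
    0xF865: (4, "negative sans-serif form"),
    0xF866: (4, "negative light form"),
    0xF867: (2, "large form"),
    0xF868: (2, "small form"),
    0xF869: (2, "small bold form"),
    0xF86A: (2, "right-to-left direction"),
    0xF86B: (4, "right-to-left direction"),
}


def composition_length(codepoint):
    """Return how many characters the hint composes.

    Returns None for anything that is not a composition hint in use
    (U+F86C to U+F86F are listed as not used).
    """
    entry = COMPOSITION_HINTS.get(codepoint)
    return entry[0] if entry else None
```
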
Variant hints (note that there are exceptions to every rule here, and that many variant hints for HangulTalk seem to have been chosen by elimination or on an ad hoc basis):
Hint | Meaning |
---|---|
U+F870 | Dependent on character type, see below. |
U+F871 | Dependent on character type, see below. |
U+F872 | Dependent on character type, see below. |
U+F873 | Dependent on character type, see below. |
U+F874 | Left position. |
U+F875 | Low left position or alternative negative form. |
U+F876 | Rotated form. |
U+F877 | Superscript form. |
U+F878 | Small form. |
U+F879 | Large form. |
U+F87A | Negative form. |
U+F87B | Medium-bold weight. |
U+F87C | Bold weight. |
U+F87D | Horizontal presentation form. |
U+F87E | Vertical presentation form. |
U+F87F | Other alternate form, e.g. hankaku, duplicate or sans-serif. |
Hints for brackets, braces and integral signs:
Hint | Meaning |
---|---|
U+F870 | Upper part. |
U+F871 | Middle part. |
U+F872 | Lower part. |
U+F873 | Small, bold form. |
Hints for arrows:
Hint | Meaning |
---|---|
U+F870 | Negative (outlined if otherwise filled or vice versa), heavy weight. |
U+F871 | Form with “umbrella” / “drafting point” arrowhead (➛). |
U+F872 | Form with teardrop shaped strokes. |
U+F873 | Bold barbed form (➔). |
Hints for asterisks and asterisms:
Hint | Meaning |
---|---|
U+F870 | Low position and large form. |
U+F871 | Horizontally off-centre position and large form. |
U+F872 | Not attested. |
U+F873 | Centered position. |
Hints for marks, diacritics, modifier letters and primes:
Hint | Meaning |
---|---|
U+F870 | Not attested. |
U+F871 | Not attested. |
U+F872 | Not attested. |
U+F873 | Low position of something which would usually be in a high position. |
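
Because U+F870 to U+F873 mean different things depending on the character they apply to, resolving one of them needs the character type as well as the hint. A minimal Python sketch of that two-level lookup follows. It covers only the character types tabulated above, and the type labels (`"bracket"`, `"arrow"`, `"asterisk"`, `"mark"`) and the function name `contextual_variant` are hypothetical. Given the caveat about exceptions, a real mapping would also need per-character overrides.

```python
# Meanings of U+F870..U+F873 per character type, transcribed from the
# tables above. Entries listed as "Not attested" are simply omitted.
CONTEXTUAL_VARIANTS = {
    "bracket": {  # brackets, braces and similar multi-part symbols
        0xF870: "upper part",
        0xF871: "middle part",
        0xF872: "lower part",
        0xF873: "small, bold form",
    },
    "arrow": {
        0xF870: "negative, heavy weight",
        0xF871: "umbrella / drafting point arrowhead",
        0xF872: "teardrop shaped strokes",
        0xF873: "bold barbed form",
    },
    "asterisk": {  # asterisks and asterisms
        0xF870: "low position and large form",
        0xF871: "horizontally off-centre position and large form",
        0xF873: "centered position",
    },
    "mark": {  # marks, diacritics, modifier letters and primes
        0xF873: "low position of something usually in a high position",
    },
}


def contextual_variant(char_type, codepoint):
    """Look up a type-dependent variant hint; None if not attested."""
    return CONTEXTUAL_VARIANTS.get(char_type, {}).get(codepoint)
```
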