Encoding hints in Apple’s Private Use Area

While Apple’s CORPCHAR.TXT lists all the codepoints used as “hints”, it does not explain what all of them do: several are simply numbered, even though many serve consistent purposes in the individual encodings. For example, the ones used in MacJapanese are described according to their function in the blurb of JAPANESE.TXT. I have attempted to compile a list of all their functions from three kinds of source: descriptions in CORPCHAR.TXT; descriptions of their usage in the comments of the individual mapping files; and records of which newer Unicode characters were formerly mapped using them, taken both from CORPCHAR.TXT (in cases where they had their own PUA points even earlier) and from the individual mappings’ comments.


Range  Usage
0xF85x CJK language hint.
0xF86x Composition hint.
0xF87x Variant hint.
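
The range table above amounts to a lookup on everything but the low four bits of the codepoint. A minimal Python sketch follows; the function name `hint_kind` and the returned labels are illustrative choices, not anything defined by Apple.

```python
from typing import Optional

def hint_kind(code_point: int) -> Optional[str]:
    """Return which kind of hint a codepoint is, or None if it is not one."""
    kinds = {
        0xF85: "CJK language hint",
        0xF86: "composition hint",
        0xF87: "variant hint",
    }
    # Dropping the low four bits turns 0xF85x into 0xF85, and so on.
    return kinds.get(code_point >> 4)
```

For instance, `hint_kind(0xF860)` gives `"composition hint"`, while an ordinary letter such as `hint_kind(0x41)` gives `None`.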

CJK language hints:

Hint  Meaning
U+F850 Default
U+F851–U+F85B Not used.
U+F85C Simplified Chinese
U+F85D Traditional Chinese
U+F85E Japanese
U+F85F Korean

Composition hints:

Hint  Meaning
U+F860 Composition of 2 characters
U+F861 Composition of 3 characters
U+F862 Composition of 4 characters
U+F863 Composition of 4 characters, negative, vertical, bold-serif or other alternate form.
U+F864 Composition of 4 characters, shadowed sans-serif form.
U+F865 Composition of 4 characters, negative sans-serif form.
U+F866 Composition of 4 characters, negative light form.
U+F867 Composition of 2 characters, large form.
U+F868 Composition of 2 characters, small form.
U+F869 Composition of 2 characters, small bold form.
U+F86A Composition of 2 characters in right-to-left direction.
U+F86B Composition of 4 characters in right-to-left direction.
U+F86C Not used.
U+F86D Not used.
U+F86E Not used.
U+F86F Not used.
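
As a rough illustration of how the composition hints in this table could be consumed, the Python sketch below splits a decoded string into grouped units. It assumes that each hint comes immediately before the characters it groups, which the table itself does not state; the names `GROUP_SIZE` and `split_compositions` are hypothetical, and the stylistic distinctions (negative, large, right-to-left and so on) are ignored.

```python
# Number of characters grouped by each composition hint, per the table above.
# U+F86C to U+F86F are unused and deliberately absent.
GROUP_SIZE = {
    0xF860: 2, 0xF861: 3, 0xF862: 4, 0xF863: 4,
    0xF864: 4, 0xF865: 4, 0xF866: 4, 0xF867: 2,
    0xF868: 2, 0xF869: 2, 0xF86A: 2, 0xF86B: 4,
}

def split_compositions(text: str) -> list:
    """Split text into units.

    A hint followed by enough characters becomes a (hint, characters) tuple;
    every other character is kept as a one-character string.
    Assumption: the hint precedes the characters it groups.
    """
    units = []
    i = 0
    while i < len(text):
        size = GROUP_SIZE.get(ord(text[i]))
        if size is not None and i + size < len(text):
            units.append((ord(text[i]), text[i + 1 : i + 1 + size]))
            i += 1 + size
        else:
            # Not a hint, or a hint without enough characters after it.
            units.append(text[i])
            i += 1
    return units
```

With this sketch, `split_compositions("\uF860TBx")` yields `[(0xF860, "TB"), "x"]`. A hint that is cut short by the end of the string is passed through unchanged rather than raising an error, which is one possible design choice among several.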

Variant hints (note that there are exceptions to every rule here, and that many variant hints for HangulTalk seem to have been assigned by elimination or on an ad hoc basis):

Hint  Meaning
U+F870 Dependent on character type, see below.
U+F871 Dependent on character type, see below.
U+F872 Dependent on character type, see below.
U+F873 Dependent on character type, see below.
U+F874 Left position.
U+F875 Low left position or alternative negative form.
U+F876 Rotated form.
U+F877 Superscript form.
U+F878 Small form.
U+F879 Large form.
U+F87A Negative form.
U+F87B Medium-bold weight.
U+F87C Bold weight.
U+F87D Horizontal presentation form.
U+F87E Vertical presentation form.
U+F87F Other alternate form, e.g. hankaku, duplicate or sans-serif.

Hints for brackets, braces and integrals:

Hint  Meaning
U+F870 Upper part.
U+F871 Middle part.
U+F872 Lower part.
U+F873 Small, bold form.

Hints for arrows:

Hint  Meaning
U+F870 Negative (outlined if otherwise filled or vice versa), heavy weight.
U+F871 Form with “umbrella” / “drafting point” arrowhead (➛).
U+F872 Form with teardrop shaped strokes.
U+F873 Bold barbed form (➔).

Hints for asterisks and asterisms:

Hint  Meaning
U+F870 Low position and large form.
U+F871 Horizontally off-centre position and large form.
U+F872 Not attested.
U+F873 Centred position.

Hints for marks, diacritics, modifier letters and primes:

Hint  Meaning
U+F870 Not attested.
U+F871 Not attested.
U+F872 Not attested.
U+F873 Low position of something which would usually be in a high position.
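
The variant-hint tables can be combined into a two-level lookup: U+F874 to U+F87F have one meaning each, while U+F870 to U+F873 need to know what sort of character they are attached to. The Python sketch below is one way to express that. The type labels (`"bracket"`, `"arrow"`, `"asterisk"`, `"mark"`) and the function name `describe_variant` are hypothetical, deciding which type a given base character has is left to the caller, and the exceptions mentioned earlier are not modelled.

```python
from typing import Optional

# Meanings that do not depend on the base character (U+F874 to U+F87F).
GENERIC = {
    0xF874: "left position",
    0xF875: "low left position or alternative negative form",
    0xF876: "rotated form",
    0xF877: "superscript form",
    0xF878: "small form",
    0xF879: "large form",
    0xF87A: "negative form",
    0xF87B: "medium-bold weight",
    0xF87C: "bold weight",
    0xF87D: "horizontal presentation form",
    0xF87E: "vertical presentation form",
    0xF87F: "other alternate form",
}

# Meanings of U+F870 to U+F873, keyed by a caller-supplied character type.
# Combinations listed as "not attested" are simply left out.
BY_TYPE = {
    "bracket": {
        0xF870: "upper part",
        0xF871: "middle part",
        0xF872: "lower part",
        0xF873: "small, bold form",
    },
    "arrow": {
        0xF870: "negative, heavy weight",
        0xF871: "umbrella / drafting point arrowhead",
        0xF872: "teardrop shaped strokes",
        0xF873: "bold barbed form",
    },
    "asterisk": {
        0xF870: "low position and large form",
        0xF871: "horizontally off-centre position and large form",
        0xF873: "centred position",
    },
    "mark": {
        0xF873: "low position of something usually in a high position",
    },
}

def describe_variant(hint: int, char_type: str) -> Optional[str]:
    """Describe a variant hint, or return None if nothing is recorded."""
    if hint in GENERIC:
        return GENERIC[hint]
    return BY_TYPE.get(char_type, {}).get(hint)
```

Here `describe_variant(0xF871, "bracket")` gives `"middle part"`, and an unattested pairing such as `describe_variant(0xF872, "asterisk")` gives `None`.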