.. -=- Musings on salvage of SGML -=-

# Musings on the salvage of SGML

The ''status quo'' is as follows:

- Most applications of SGML, such as DocBook, adopted XML in later versions. Most SGML-derived formats from after the turn of the millennium, such as XAML, used XML from the beginning. Outside of isolated legacy systems, the main hold-outs were HTML and BBCode.
- Although software which implemented BBCode in its heyday may still support it, it has been nearly fully displaced by various dialects of Markdown. Unlike BBCode, Markdown makes no attempt to be definable in terms of SGML.
    - Yes, BBCode counts as SGML about as much as HTML ever did (its heavy use of `SHORTREF`s and syntax-token redefinition notwithstanding).
        - I don't know of any pre-existing `SYNTAX` FPI for the BBCode syntax though.
- HTML never fully adopted XML.
    - HTML files may still need to be written to be parsable with an XML parser in certain specific contexts (iXBRL-format financial reports, for example, are expected to be polyglot documents, readable as XML by XBRL software and as HTML by a browser; e-book formats also tend to target minimum specs with an XML parser but not necessarily an HTML5 parser).
    - Many years of stigmatising tag omission die hard. Omitting `` and `` opening and closing tags is still, in practice, incorrectly perceived as something merely condoned by browsers, rather than "correct" HTML as it actually is.
- HTML5 is not defined in terms of SGML, since its ''de facto'' parsing requirements for compatibility with the web as it exists cannot be expressed in terms of SGML as it currently exists.
    - The major clinchers are:
        - The conditional handling of self-closing syntax (which is ignored on HTML-namespace elements, and honoured on SVG-namespace or MathML-namespace elements, where the namespaces are inferred by recognising the respective root-element names).
        - The different "misnested" handling of certain omitted tags (e.g. if an omitted closing `` tag is inferred before a closing `` tag, the parser will also infer an omitted *opening* `` tag straight after the closing `` tag).
            - As is particularly evident for misnested `` tags, attributes behave as `#CURRENT` in this case even if they don't normally.
                - What does this mean for `id=`?
                - How would HTML's handling for duplicated `id=`s interact with an extended SGML layer (or XLink or HyTime, for that matter)?
        - HTML5's inference of omitted tags doesn't derive implicitly from DTD content models (note that SGML tag-omission specifiers state which elements are *allowed* to omit tags without flagging up a validation error; they do not affect the logic for determining *whether* an omitted tag will be inferred).
        - Different end conditions for `CDATA` elements.
            - Even `CDATA` elements without special handling, such as `