I think that at this point, folks have realized this is not in fact true. If any...

wvenable · on Nov 26, 2020

What's the argument for that? If HTML had been strictly parsed, the first page with an image tag would have been broken in every other browser.

morelisp · on Nov 26, 2020

Ignoring for one second the specifics of <img> in relation to SGML's `O` (omissible end tag) option, which was rectified in XML, this didn't really need to be the case. HTML could easily have said "if you encounter an unknown tag, render its contents as PCDATA", and sites would have degraded at least as gracefully as they do today. (If less gracefully than they did in 2000.)
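To make the rule concrete, here is a rough sketch in Python, assuming a strict XML parser (the standard library's `xml.etree.ElementTree`) stands in for a strictly parsed HTML. The document must be well-formed, but an element the renderer doesn't recognize is treated as a transparent wrapper and only its contents are shown. The `render` function and the `*bold*` / newline output are hypothetical, just for illustration.

```python
import xml.etree.ElementTree as ET

def render(elem: ET.Element) -> str:
    """Render a parsed tree to plain text; unknown elements fall back to their contents."""
    inner = elem.text or ""
    for child in elem:
        # Text after a child's closing tag (its "tail") belongs to the parent.
        inner += render(child) + (child.tail or "")
    if elem.tag == "b":
        return f"*{inner}*"  # known element: bold shown as *...*
    if elem.tag == "p":
        return inner + "\n"  # known element: paragraph ends with a newline
    # Unknown element: no error, just render its contents as ordinary text.
    return inner

# A well-formed document using a tag this renderer has never heard of.
doc = ET.fromstring("<p>Hello <blink>new <b>world</b></blink>!</p>")
print(render(doc), end="")
```

The point of the sketch is that strictness and graceful degradation are separate decisions: parsing still fails on malformed input, while the fallback for unknown names lives entirely in the renderer.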

(Heck, it could've been a generic SGML feature! "Unknown elements' contents are CDATA, unless they have this attribute, in which case they're PCDATA, or that attribute, in which case they're ignored," as a rule the DTD...)

wvenable · on Nov 26, 2020

> HTML could easily have said...

But it didn't! The problem with being strict here is that every possible usage has to be anticipated and perfectly implemented up front. You're suggesting the original developers should have made affordances for everything that would be added over the next 30 years. That's easy to say now. The first web browser was essentially just a hugely successful prototype.

And can you imagine having to type all your tags in upper-case? Yuck. :)

erik_seaberg · on Nov 27, 2020

We never needed to parse tag soup. We only needed to say which DTD defines the new elements a document uses, and what a browser should do with valid but unknown elements. The latter could be expressed with #FIXED or default attribute values in the DTD, because some new elements have human-readable content and others don’t.
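A rough Python sketch of that idea, under stated assumptions: the `DECLARED` dict stands in for the DTD's attribute-list declarations, and the `fallback` attribute, its `text`/`ignore` values, and the `sidebar`/`player` elements are all hypothetical. A declared default applies when the document omits the attribute; a `#FIXED` value always applies.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD attribute declarations. Each entry plays the role of
#   <!ATTLIST name fallback (text|ignore) "default">       (overridable default), or
#   <!ATTLIST name fallback (text|ignore) #FIXED "value">  (fixed value)
# "text" = show the element's contents, "ignore" = drop the element entirely.
DECLARED = {
    "sidebar": ("text", False),  # hypothetical new element with human-readable content
    "player": ("ignore", True),  # hypothetical new element with no readable content
}
KNOWN = {"p"}  # elements this older browser actually implements

def fallback_for(elem: ET.Element) -> str:
    default, fixed = DECLARED.get(elem.tag, ("text", False))
    if fixed:
        # A #FIXED value always applies. A validating parser would reject a
        # document that tried to set a different one; here we just ignore it.
        return default
    return elem.get("fallback", default)

def render(elem: ET.Element) -> str:
    if elem.tag not in KNOWN and fallback_for(elem) == "ignore":
        return ""  # drop the element; the caller still keeps its tail text
    inner = elem.text or ""
    for child in elem:
        inner += render(child) + (child.tail or "")
    return inner

doc = ET.fromstring('<p>A <sidebar>note</sidebar> <player fallback="text">Play</player>B</p>')
print(render(doc))
```

Here the old browser needs no knowledge of `sidebar` or `player` beyond what the declarations tell it, which is the distinction drawn above between elements with and without human-readable content.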

bmm6o · on Nov 26, 2020

The looseness of HTML is usually in regard to automatically closed tags or unquoted attribute values. You can keep strict syntax enforcement and still recognize and skip unknown tags or attributes.
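A minimal Python sketch of that split, assuming Python's strict XML parser as the "strict syntax" layer: malformed markup such as an unquoted attribute value or a missing end tag is rejected, while unknown elements parse fine and unknown attributes are simply filtered out afterwards. `KNOWN_ATTRS`, `parse_strict`, and `known_attrs` are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical: attributes this renderer understands, per element.
KNOWN_ATTRS = {"a": {"href"}}

def parse_strict(markup: str):
    """Return the parsed tree, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None

def known_attrs(elem: ET.Element) -> dict:
    """Keep attributes the renderer recognises and silently skip the rest."""
    allowed = KNOWN_ATTRS.get(elem.tag, set())
    return {k: v for k, v in elem.attrib.items() if k in allowed}

# Syntax errors are rejected outright...
print(parse_strict("<a href=x>link</a>"))  # unquoted attribute value
print(parse_strict("<p>unclosed"))         # missing end tag
# ...but unknown elements parse fine, and unknown attributes are just dropped.
print(parse_strict("<p><newtag>x</newtag></p>") is not None)
link = parse_strict('<a href="x" data-new="1">link</a>')
print(known_attrs(link))
```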