Darxus

[web] I want to write a schema language to validate HTML5.

2009-09-29
An XML schema is a machine-readable document specification. There are several schema languages. Every version of the HTML standard except the first has been translated into the DTD (Document Type Definition) schema language, and the URL of that DTD is specified in the DOCTYPE that must appear at the top of every file that conforms to the corresponding HTML standard.

The best HTML validators read the DTD that an HTML file is associated with and verify that the document conforms to it.

DTD has been described as "woefully inadequate" to specify the requirements to conform to HTML5 (and previous HTML versions). There is no HTML5 DTD. There is and will be no official schema. The authors of HTML5 believe multiple competing schemas based on the HTML5 standard will improve the overall quality of web content.

It turns out that there is no schema language capable of fully specifying the conformance requirements of HTML5.

Validators of web content are very important. Conforming to the HTML specifications significantly improves a document's chances of rendering usefully in the widest range of web browsers, past, present, and future.

Only one HTML5 validator has been written so far, as a master's thesis. The author used two existing schema languages, an HTML parser he had previously written, someone else's partially written HTML5 schema, and some extra custom programming to validate the bits that couldn't be specified in either schema language. Even with all of that to build on, the remaining work was sufficient for a master's thesis. He has also stated that some of the validator's restrictions are not as tight as they should be.
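
To make the "schema plus custom programming" split concrete, here is a rough Python sketch. It is not the thesis validator's code or design. TOY_SCHEMA, LABELABLE, and the label rule are all made up for illustration: a small declarative table handles what a table can say, and a hand-written pass handles a cross-reference rule that the table cannot.

```python
from html.parser import HTMLParser

# Hypothetical toy "schema": element name -> permitted attribute names.
# A real HTML5 schema would be vastly larger; this is an assumption
# made only to show the layering.
TOY_SCHEMA = {
    "html": set(),
    "body": set(),
    "p": {"id"},
    "label": {"for"},
    "input": {"id", "type"},
}

# Elements a label may point at in this toy example (assumption).
LABELABLE = {"input"}


class Collector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, {attribute: value})

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def validate(source):
    """Return a list of error strings; an empty list means no complaints."""
    parser = Collector()
    parser.feed(source)
    parser.close()
    errors = []

    # Layer 1: checks that the declarative table can express.
    for tag, attrs in parser.elements:
        if tag not in TOY_SCHEMA:
            errors.append(f"unknown element <{tag}>")
            continue
        for name in attrs:
            if name not in TOY_SCHEMA[tag]:
                errors.append(f"attribute {name!r} not allowed on <{tag}>")

    # Layer 2: custom code for a rule the table has no way to state:
    # a label's "for" must name the id of a labelable element.
    ids = {attrs["id"]: tag for tag, attrs in parser.elements if "id" in attrs}
    for tag, attrs in parser.elements:
        if tag == "label" and "for" in attrs:
            if ids.get(attrs["for"]) not in LABELABLE:
                errors.append(
                    f"label for={attrs['for']!r} does not point at a labelable element"
                )
    return errors
```

Calling validate() on a document whose label points at a p rather than an input passes the first layer and fails only the second. A schema language for HTML5 would need some way to absorb rules like the second layer.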

I want to write a schema language in which HTML5's conformance requirements can be sufficiently defined. This does not seem too hard to me. But the master's thesis thing is interesting.

I emailed the creator of the validator asking why he didn't write a new schema language for it.

Another interesting possibility is to create a schema for HTML5, and separate schemas each corresponding to a browser, requiring a document to only use features which that browser is known to support correctly (which unfortunately maps poorly to HTML versions).
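
A minimal Python sketch of the per-browser idea, under loud assumptions: the browser names and the element lists in BROWSER_PROFILES are invented placeholders, not real support data, and a real profile would also need to cover attributes, values, and nesting rather than just element names.

```python
from html.parser import HTMLParser

# Hypothetical support tables: browser name -> elements it is assumed
# to handle correctly. These are placeholders, not real browser data.
BROWSER_PROFILES = {
    "browser_a": {"html", "body", "p", "video"},
    "browser_b": {"html", "body", "p"},
}


class TagCollector(HTMLParser):
    """Collect the set of element names used in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def unsupported_elements(source, browser):
    """Return, sorted, the elements in source missing from the browser's profile."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return sorted(collector.tags - BROWSER_PROFILES[browser])
```

With these placeholder tables, a document containing a video element comes back clean against browser_a and is reported as using video against browser_b.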


"This is only a 'SHOULD' and not a 'MUST' requirement because it has been proven to be impossible." - HTML5 conformance requirements of an HTML5 conformance validator.

04:49PM < annevk2> anything Turing-complete is sufficient to write a HTML5 validator
04:50PM < gsnedders> annevk2: Have we proven this?

05:19PM < Darxus> Is there any reason a schema language could not be written that could specify all the rules required for validation?
05:40PM < Philip`> You could define a schema language which permits you to write one schema, which is the empty string
05:41PM < Philip`> and its semantics are that it checks all the HTML5 conformance requirements