3. About this document

This program, xhtml2to1, is a literate program; that is, its “source” is an essay containing its computer “source code” annotated with English explanations for human readers.

3.1. Why literate programming?

We discuss why xhtml2to1 uses literate programming [1].

3.1.1. Correctness of programs

Firstly, literate programming is a way of producing documentation for programs, something sorely lacking in open source software in general.

The author was also inclined to employ literate programming because of his personal tastes: he is a student of mathematics and is used to reading and writing mathematical expositions. He is constantly dismayed at the bugginess of many computer programs. In contrast, almost all widely accepted mathematics is free of errors, and mathematicians, unlike most computer programmers, take errors discovered in published work very seriously.

Why is it so hard to achieve correctness in computer programs? One reason is that computer programs are, by their nature, too formal and low level. A computer follows precisely a set of instructions written in a formal syntax. A single small typo or omission in a C program can be enough to make it dump core.

We want the computer to do what we mean it to do, not exactly what we say. Obviously we cannot forgo programming languages with formal syntax altogether, so the next best thing is to complement “code” [2] written in a computer programming language with exposition in a natural language. This is not unlike mathematical writing, where even proofs, which are expected to be formal, are written in English (or another natural language) with formulae interspersed throughout.

The author believes literate programming leads to more correct programs (after all, TeX, the first literate program, is mostly bug-free), and this alone should justify the method.

3.1.2. Design of programs

Literate programming also helps with the elegant design of programs. A logically correct program that has a baroque design, or is difficult to use, could be as bad as a buggy program. A person who wants to write a good program must have some confidence that the design of the program is good. And what could be more convincing than writing the program as a persuasive essay?

3.1.3. Is literate programming too hard?

There is no doubt that literate programming is harder, at least at first, than “normal programming”. The author recognizes that most literate programming systems out there are just too clumsy to use. That is why the author has written his own literate programming system to try to fix these problems.

In the case of xhtml2to1, since its primary implementation language is XSLT, which has an XML-based syntax, fragments of the program can simply be embedded inside an XHTML 2.0 document; no extra wrappers are necessary. When developing xhtml2to1, the author tends to think through the design of the program in his head while experimenting with the implementation in XSLT (that is, going through write-compile-test cycles). Once a part of the program works and only needs polishing, he takes the opportunity to add the English explanations of what he has done, and to put down concretely the vague notions in his mind.

So literate programming need not be much harder than developing the code directly!
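To illustrate the kind of embedding described above, here is a minimal sketch of an XSLT template sitting directly inside an XHTML 2.0 section. It is not taken from the xhtml2to1 source: the section title, the explanatory paragraph, the `h1` and `h2` prefixes, the template itself, and the XHTML 2.0 namespace URI shown are all assumptions made for the example.

```xml
<!-- Hypothetical fragment; names and namespace URIs are illustrative assumptions. -->
<section xmlns="http://www.w3.org/2002/06/xhtml2/"
         xmlns:h2="http://www.w3.org/2002/06/xhtml2/"
         xmlns:h1="http://www.w3.org/1999/xhtml"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <h>Paragraphs</h>

  <!-- English exposition for the human reader. -->
  <p>An XHTML 2.0 paragraph is copied to an XHTML 1.0 paragraph,
     and its children are processed recursively.</p>

  <!-- The program fragment itself: ordinary XSLT, with no wrapper element. -->
  <xsl:template match="h2:p">
    <h1:p>
      <xsl:apply-templates/>
    </h1:p>
  </xsl:template>
</section>
```

Because both the prose and the template are well-formed XML in distinct namespaces, a tool can tell them apart by namespace alone, which is presumably what makes the absence of wrappers workable.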

3.2. xhtml2to1 is formatted with itself

The points made above are not just theoretical ruminations. It is precisely because xhtml2to1 is written as a literate program that we have the opportunity to use the xhtml2to1 stylesheets to format the xhtml2to1 document itself. So we automatically get to test both the design and the implementation of xhtml2to1 on a non-trivial application.

For the reader interested in the technical workings of this program, almost every portion of the source code is presented nicely in this document with copious English explanations. The author hopes the reader will enjoy reading this document (the literate program) as much as the author has enjoyed writing it.


  1. Pioneered by the famous Donald E. Knuth in his TeX typesetting system.
  2. The term “code” (or “coding”) itself is illustrative of the point expounded here. The term evokes images of cryptic, undecipherable gobbledygook only used by eccentric computer geeks. If the author had his way in changing the existing language, he would suggest the terms “formula” or “recipe” instead of “code”, to refer to “instructions for a computer written in a formal language”.

Formatted using xhtml2to1 by Steve Cheng.