% This is tree_doc.tex, the documentation for the treetex macro package
% as it will appear in the conference proceedings of the third European
% TeX meeting in Exeter, England, 1988.
\documentstyle[12pt,DIN-A4]{article}
\advance\voffset by -2cm
\clubpenalty=10000
\widowpenalty=10000
\def\addcontentsline#1#2#3{\relax}% Some captions are too long for some
% TeX installations (buffer size too small)
\newenvironment{lemma}{\begingroup\samepage\begin{lemmma}\ }{\end{lemmma}%
\endgroup}
\newtheorem{lemmma}{Lemma}[section]
\newenvironment{proof}{\begin{prooof}\rm\ \nopagebreak}{\end{prooof}}
\newcommand{\proofend}{\qquad\ifmmode\Box\else$\Box$\fi}
\newtheorem{prooof}{Proof}
\renewcommand{\theprooof}{} % makes sure that prooof doesn't get numbers
\newenvironment{Figure}{\begin{figure}\vspace{1\baselineskip}}%
{\vspace{1\baselineskip}\end{figure}}
\newlength{\figspace} % space between figures in a single
\setlength{\figspace}{30pt} % Figure environment
\newcommand{\var}[1]{{\it #1\/}} % use it for names of variables
\newcommand{\emph}[1]{{\em #1\/}} % use it for emphasized text
% (This notion sticks to the
% applicative style of markup.)
\renewcommand{\O}{{\rm O}} % O-notation, also for math mode
\newcommand{\T}{{\cal T}} % the set T in math mode
\newcommand{\TreeTeX}{Tree\TeX}
\newcommand{\fig}[1]{Figure~\ref{#1}}
\let\p\par
\input TreeTeX
\Treestyle{\vdist{20pt}\minsep{16pt}}
\dummyhalfcenterdim@n=2pt
\def\Node(#1,#2){\put(#1,#2){\circle*{4}}}
\def\Edge(#1,#2,#3,#4,#5){\put(#1,#2){\line(#3,#4){#5}}}
\def\enode{\node{\external\type{dot}}}
\def\inode{\node{\type{dot}}}
\def\e{\node{\external\type{dot}}}
\def\i{\node{\type{dot}}}
\def\il{\node{\type{dot}\leftonly}}
\def\ir{\node{\type{dot}\rightonly}}
\newcommand{\stack}[3]{%
\vtop{\settowidth{\hsize}{#1}%
\setlength{\leftskip}{0pt plus 1fill}%
\setlength{\baselineskip}{#2}#3}}
\let\multic\multicolumn
\newlength{\hd} % hidden digit
\setbox0\hbox{1}
\settowidth{\hd}{\usebox{0}}
\newcommand{\ds}{\hspace{\hd}} % digit space
\newcommand{\ccol}[1]{\multicolumn{1}{c}{#1}}
\hyphenation{post-or-der sym-bol Karls-ruhe bool-ean}
\begin{document}
\bibliographystyle{plain}
\title{Drawing Trees Nicely with \TeX\thanks{This work was supported by
a Natural Sciences and Engineering Research Council of Canada
Grant~A-5692 and a Deutsche Forschungsgemeinschaft Grant~Sto167/1-1.
It was started during the first author's stay with
the Data Structuring Group in Waterloo.}}
\author{Anne Br\"uggemann-Klein\thanks{Institut f\"ur Informatik,
Universit\"at Freiburg, Rheinstr.~10--12, 7800~Freiburg,
West~Germany}\ \and Derick Wood\thanks{Data
Structuring Group, Department of Computer Science, University of
Waterloo, Waterloo, Ontario, N2L~3G1, Canada}}
\maketitle
\begin{abstract}
Various algorithms have been proposed for the difficult problem of
producing aesthetically pleasing drawings of trees
(see~\cite{TidierTrees,TidyTrees}), but
implementations exist only as ``special purpose software''
designed for special environments. Therefore,
many users resort to the
drawing facilities available on most personal computers, but the
figures obtained in this way still look ``hand-drawn''; their quality is
inferior to the quality of the surrounding text that can be realized by
today's high quality text processing systems.
In this paper we present an entirely new solution that
integrates a tree drawing algorithm into one of the best text
processing systems available. More precisely, we present a \TeX{} macro package
\TreeTeX{} that produces a drawing of a tree from a purely logical
description. Our approach has three advantages. First, labels
for nodes can be handled in a reasonable way. On the one hand, the tree
drawing algorithm can compute the widths of the labels and take
them into account for the positioning of the nodes; on the other hand,
all the textual parts of the document can be treated uniformly. Second,
\TreeTeX{} can be trivially ported to any site running \TeX{}. Finally,
modularity in the description of a tree and \TeX{}'s macro capabilities
allow for libraries of subtrees and tree classes.
In addition, we have implemented an option that produces
drawings which make the
\emph{structure} of the trees more obvious to the human eye,
even though they may not be as aesthetically pleasing.
\end{abstract}
\section{Aesthetical criteria for drawing trees}
One of the most commonly used data structures in computer science is the tree.
Because many people use trees in their research or simply as illustration
tools, they often struggle with the problem of
\emph{drawing} trees. We are concerned primarily with ordered
trees in the sense of~\cite{ACP}, especially binary and unary-binary
trees. A binary tree is a finite set of nodes which either
is empty, or consists of a root and two disjoint binary trees called
the left and right subtrees of the root. A unary-binary tree is
a finite set of nodes which either is empty, or consists of a root and
two disjoint unary-binary trees, or consists of a root and one
nonempty unary-binary tree. An extended binary tree is a binary tree
in which each node has either two nonempty subtrees or two
empty subtrees.
For these trees there
are some basic agreements on how they should be drawn, reflecting
the top-down and left-right ordering of nodes in a tree;
see \cite{TidierTrees} and \cite{TidyTrees}.
\begin{enumerate}
\item[1.] Trees impose a distance on the nodes; no node
should be closer to the root than any of its
ancestors.
\item[2.] Nodes of a tree at the same level should lie on a straight
line, and the straight lines defining the levels should be
parallel.
\item[3.] The relative order of nodes on any level should be the same
as in the level order traversal of the tree.
\end{enumerate}
These axioms guarantee that trees are drawn as planar graphs: edges do
not intersect except at nodes. Two further axioms improve the aesthetical
appearance of trees:
\begin{enumerate}
\item[4.] In a unary-binary tree, each left child should be positioned
to the left of its parent, each
right child to the right of its parent, and each unary child
should be positioned below its parent.
\item[5.] A parent should be centered over its children.
\end{enumerate}
An additional axiom deals with the problem of tree drawings becoming too wide
and therefore exceeding the physical limit of the output medium:
\begin{enumerate}
\item[6.] Tree drawings should occupy as little width as possible without
violating the other axioms.
\end{enumerate}
In \cite{TidyTrees}, Wetherell and Shannon introduce two algorithms for
tree drawings, the first of which fulfills axioms~1--5, and the second
1--6. However, as Reingold and Tilford in \cite{TidierTrees}
point out, there is a lack of symmetry in the algorithms of
Wetherell and Shannon which may lead to unpleasant results.
Therefore, Reingold and Tilford introduce a new structural
axiom:
\begin{enumerate}
\item[7.] A subtree of a given tree should be
drawn the same way regardless of where it occurs in the given tree.
\end{enumerate}
Note that axiom~7 constrains only the occurrences within one given tree;
it still allows the same tree to be drawn differently when it occurs as
a subtree in different trees.
Reingold and Tilford give an algorithm which fulfills axioms~1--5
and~7. Although
this algorithm doesn't fulfill axiom~6,
the aesthetical improvements are well worth the additional space.
\fig{algorithms} illustrates the benefits of axiom~7, and \fig{narrowtrees}
shows that the algorithm of Reingold and Tilford violates axiom~6.
\begin{Figure}
\centering
\leavevmode\noindent
\begin{Tree}
\enode
\enode\enode\inode\enode\enode\inode\inode\inode
\node{\external\type{dot}\rght{\unskip\hskip2\mins@p\hskip2\dotw@dth}}
\enode\enode\inode\enode\enode\inode\inode\inode
\inode
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\enode
\enode\enode\inode\enode\enode\inode\inode\inode
\enode
\enode\enode\inode\enode\enode\inode\inode\inode
\inode
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\
\caption{The left tree is drawn by the algorithm of Wetherell and Shannon,
and the tidier right one is drawn by the algorithm of Reingold and Tilford.}
\label{algorithms}
\vspace{\figspace}
\centering
\leavevmode\noindent
\begin{Tree}
\enode\enode\enode\enode\enode\enode\enode\enode\enode
\enode\inode\inode\inode
\enode\inode\inode\inode
\enode\inode\inode\inode
\enode\inode\inode\inode
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\enode\enode\enode\enode\enode\enode\enode\enode
\node{\external\type{dot}\rght{\unskip\hskip\mins@p\hskip\dotw@dth}}
\enode\inode\inode\node{\type{dot}\rght{\unskip\hskip\mins@p\hskip\dotw@dth}}
\enode\inode\inode\node{\type{dot}\rght{\unskip\hskip\mins@p\hskip\dotw@dth}}
\enode\inode\inode\node{\type{dot}\rght{\unskip\hskip\mins@p\hskip\dotw@dth}}
\enode\inode\inode\inode
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\
\caption{The left tree is drawn by the algorithm of Reingold and Tilford, but
the right tree shows that narrower drawings fulfilling all aesthetic axioms
are possible.}
\label{narrowtrees}
\end{Figure}
\section{The algorithm of Reingold and Tilford}
The algorithm of Reingold and Tilford (hereafter called ``the RT~algorithm'')
takes a modular approach to the
positioning of nodes: The relative positions of the nodes in a subtree
are calculated independently from the rest of the tree. After the
relative positions of two subtrees have been calculated, they can be
joined as siblings in a larger tree by placing them as close
together as possible and centering the parent node above them.
Incidentally, the modularity principle is the reason that the
algorithm fails to fulfill axiom~6; see~\cite{Complexity}.
Two sibling subtrees are placed as close together as possible,
during a postorder traversal, as follows. At each node \var{T},
imagine that its two subtrees have been drawn and cut out of paper along
their contours. Then, starting with the two subtrees superimposed at their
roots, move them apart until a minimal agreed-upon distance
between the trees is obtained at each level. This can be done gradually:
Initially, their roots are separated by some agreed-upon minimum
distance. Then, at the next lower level,
they are pushed
apart until the minimum separation is established there.
This process is continued at successively lower levels until the
bottom of the shorter subtree is reached. At some levels no movement may be
necessary; but at no level are the two subtrees moved closer
together. When the process is complete, the position of the
subtrees is fixed relative to their parent, which is centered over them.
Assured that the subtrees will never be placed closer together,
the postorder traversal is continued.
A nontrivial implementation of
this algorithm has been obtained by Reingold and Tilford that runs
in time $\O(N)$, where $N$ is the number of
nodes of the tree to be drawn.
Their crucial idea is to keep track of the contour of the subtrees
by special pointers, called threads, such that whenever
two subtrees are joined, only the
top part of the trees, down to the lowest level of the
smaller tree, needs to be taken into account.
The RT algorithm is given in \cite{TidierTrees}.
The nodes are positioned on a fixed grid and are
considered to have zero width. No labelling is provided. The algorithm
draws only binary trees, but it can easily be extended to multiway trees.
\section{Improving human perception of trees}
It is commonly understood in book design that aesthetics and readability
do not necessarily coincide, and, as Lamport~\cite{LaTeX} puts it,
books are meant to be read, not to be hung on walls. Therefore, readability is
more important than aesthetics.
When it comes to tree drawings, readability means that the structure of
a tree must be easily recognizable. This criterion is not always met
by the RT~algorithm. For example, there are trees whose structures are very
different and that have nothing in common except the number
of nodes at each level. The RT~algorithm might assign identical positions to
these nodes, making it very hard to perceive the different structures.
Hence, we have modified the RT~algorithm such that additional white space
is inserted between subtrees of
\emph{significant} nodes. Here a binary node
is called significant if the minimum distance
between its two subtrees is taken \emph{below} their root level.
Setting the amount of additional white space to zero retains the original RT~%
placement. The effect of having nonzero additional white space between
the subtrees of significant
nodes is illustrated in \fig{addspace}.
Another feature we have added to the RT~algorithm is the option of drawing
an unextended binary tree with the same placement of nodes as its
associated extended version. We define the \emph{associated extended version}
of a binary tree to be the binary tree obtained by replacing each empty subtree
having a nonempty sibling with a subtree consisting of one node. This feature
also makes the structure of a tree more prominent; see \fig{extended}.
\begin{Figure}
\centering
\leavevmode\noindent
\begin{Tree}
\e\il\e\e\i\i\il % the left subtree
\e\ir\il % the right subtree
\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\e\il\il\il % the left subtree
\e\e\i\e\i\il % the right subtree
\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\adds@p10pt
\begin{Tree}
\e\il\e\e\i\node{\type{dot}\lft{$\longrightarrow$}}\il % the left subtree
\e\ir\il % the right subtree
\node{\type{dot}\lft{$\longrightarrow$}}
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\e\il\il\il % the left subtree
\e\e\i\e\i\il % the right subtree
\node{\type{dot}\lft{$\longrightarrow$}}
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\
\adds@p0pt
\caption{The first two trees get the same placement of their nodes
by the RT~algorithm, although the structure of the two trees is very different.
The alternative drawings highlight the structure of the trees by inserting
additional white space between the subtrees of
significant nodes (marked by $\longrightarrow$).}
\label{addspace}
\end{Figure}
\begin{Figure}
\centering
\leavevmode\noindent
\begin{Tree}
\e\e\i\il\e\e\i\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\e\e\i\e\i\e\ir\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\extended
\begin{Tree}
\e\e\i\il\e\e\i\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\qquad
\begin{Tree}
\e\e\i\e\i\e\ir\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\\
\noextended
\begin{Tree}
\e\e\i\e\i\e\e\i\i
\end{Tree}
\hskip\leftdist\box\TeXTree\hskip\rightdist\
\caption{In the first two drawings, the RT~algorithm assigns the same placement
to the nodes of two trees although their structure is very different. The modified
RT~algorithm highlights the structure of the trees by optionally
drawing them like their extended
counterpart, which is given in the second row.}
\label{extended}
\end{Figure}
\section{Trees in a document preparation environment}
Drawings of trees usually don't come alone, but are included in some text
which is itself typeset by a text processing system. Therefore, a typical
scenario is a pipe of three stages. First comes the tree drawing
program which calculates the positioning of the nodes of the tree to
be drawn and outputs a description of the tree drawing in
some graphics language; next comes a graphics system which transforms this
description into an intermediate language which can be interpreted by the output
device; and finally comes the
text processing system which integrates the output of the
graphics system into the text.
This scenario loses its linear structure once nodes have to be labelled, since
the labelling influences the positioning of the nodes. Labels usually occur
inside, to the left of, to the right of, or beneath nodes (the latter only for
external nodes), and their extensions certainly should be taken into account
by the tree drawing algorithm. But the labels have to be typeset first
in order to determine their extensions,
preferably by the typesetting program that
is used for the regular text, because this ensures uniformity across the textual
parts of the document and provides the author with the full power of the
text processing system for composing the labels. Hence, a more complex
communication scheme than a simple pipe is required.
Although a system of two processes running simultaneously might be the most
elegant solution, we wanted a system that is easily portable to
a large range of hardware at our sites
including personal computers with single process
operating systems.
Therefore, we thought of using a text processing system
having programming facilities powerful enough to program a tree drawing algorithm
and graphics facilities powerful enough
to draw a tree. One text processing system
offering outstanding typographic quality and sufficiently powerful programming
facilities is \TeX, developed by Knuth at Stanford University;
see~\cite{TeXbook}.
The \TeX{} system includes the following programming facilities:
\begin{enumerate}
\item[1.] datatypes:\\
integers~(256), dimensions\footnote{The term \emph{dimension} is used
in \TeX\ to describe physical measurements of typographical objects,
like the length of a word.}~(512), boxes~(256), token lists~(256), boolean
variables~(unrestricted)
\item[2.] elementary statements:\\
$a:=\rm const$, $a:=b$ (all types);\\
$a:=a+b$, $a:=a*b$, $a:=a/b$ (integers and dimensions);\\
horizontal and vertical nesting of boxes
\item[3.] control constructs:\\
if-then-else statements testing relations between integers,
dimensions, boxes, or boolean variables
\item[4.] modularization constructs:\\
macros with up to 9~parameters (can be viewed as procedures without
the concept of local variables).
\end{enumerate}
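As a rough illustration of these facilities, the following plain \TeX{}
fragment declares a counter and a dimension, performs arithmetic on them,
and branches on a comparison inside a macro. It is only a sketch: the names
\verb|\mylevel|, \verb|\mysep|, and \verb|\atleast| are hypothetical and are
not part of \TreeTeX{}.
\begin{verbatim}
\newcount\mylevel          % integer register (hypothetical name)
\newdimen\mysep            % dimension register (hypothetical name)
\mylevel=0                 % a := const
\mysep=16pt
\advance\mylevel by 1      % a := a + b
\multiply\mysep by 2       % a := a * b   (\mysep is now 32pt)
\divide\mysep by 4         % a := a / b   (\mysep is now 8pt)
% macro with two parameters: raise register #1 to at least #2
\def\atleast#1#2{\ifdim#1<#2 #1=#2 \fi}
\atleast\mysep{16pt}       % \mysep is now 16pt
\end{verbatim}
Note that \verb|\multiply| and \verb|\divide| take an integer as their
second operand, and that the assignments made by the macro are not local to it.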
Although the programming
facilities of \TeX{} hardly exceed the abilities of a Turing machine,
they are sufficient to
handle relatively small programs. How about the graphics facilities?
Although \TeX{} has no built-in graphics facilities, it
allows the placement of characters in arbitrary positions on
the page. Therefore, complex pictures can be synthesized from elementary
picture elements treated as characters. Lamport has included such
a picture drawing environment in his macro package \LaTeX, using
quarter circles of different sizes and line segments (with and without
arrow heads) of different slopes as basic elements; see~\cite{LaTeX}.
These elements are sufficient for drawing trees.
This survey of \TeX's capabilities implies that \TeX{} may be a suitable
text processing system to implement a tree drawing algorithm directly.
We are basing our algorithm on the RT~algorithm, because this algorithm
gives the aesthetically most pleasing results. In the first version
presented here, we
restrict ourselves to unary-binary trees, although our method is
applicable to arbitrary multiway trees. But in order to take advantage
of the text processing environment, we expand the algorithm to allow
labelled nodes.
In contrast to previous tree drawing programs, we feel no necessity to
position the nodes of a tree on a fixed grid. While this may be
reasonable for a plotter with a coarse resolution, it is certainly not
necessary for \TeX, a system that is capable of handling
arbitrary dimensions
and produces device \emph{independent} output.
\section{A representation method for \TeX{}trees}
The first problem to be solved in implementing our tree drawing algorithm
is how to choose a good internal representation
for trees. A straightforward adaptation
of the implementation by Reingold and Tilford requires, for each node,
at least the following fields:
\begin{enumerate}
\item two pointers to the children of the node
\item two dimensions for the offset to the left and the right child (these
may be different once there are labels of different widths to the
left and right of the nodes)
\item two dimensions for the $x$- and $y$-coordinates of the final
position of the nodes
\item three or four labels
\item one token to store the geometric shape (circle, square, framed text etc.)
of the node.
\end{enumerate}
Because these data are used very frequently in calculations, they should be
stored in registers (that's what variables are called in \TeX),
rather than being recomputed, in order to obtain
reasonably fast performance. This gives a total of at least $10N$ registers for
a tree with $N$ nodes, which would exceed
\TeX's limited supply of registers. Therefore, we present a
modified algorithm hand-tailored to the abilities of \TeX{}.
We start with the following observation.
Suppose a unary-binary tree is constructed bottom-up, in a postorder
traversal. This is done by iterating the following three steps in
an order determined by the tree to be constructed.
\begin{enumerate}
\item Create a new subtree consisting of one external node.
\item Create a new subtree by appending the two subtrees created last
to a new binary node; see \fig{Construct}.
\item Create a new subtree by appending the subtree created last as a left,
right, or unary subtree of a new node; see \fig{Construct}.
\end{enumerate}
(A pointer to) each subtree that has been
created in steps 1--3 is pushed onto a stack, and
steps 2 and 3 remove two trees or one, respectively,
from the stack before the push
operation is carried out. Finally, the tree to be constructed will
be the remaining tree on the
stack.
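The construction can be illustrated on a small example. The following
description, written in the node notation used for the figures of this
paper, builds a tree whose root has a node with only a left child as its
left subtree and an external node as its right subtree. The comments use
an informal notation, introduced only for this sketch, to show the stack
after each step.
\begin{verbatim}
\begin{Tree}
\node{\external\type{dot}}  % step 1: stack = e
\node{\type{dot}\leftonly}  % step 3: stack = l(e)
\node{\external\type{dot}}  % step 1: stack = l(e), e
\node{\type{dot}}           % step 2: stack = b(l(e), e)
\end{Tree}
\end{verbatim}
After the last step the stack holds exactly one entry, the complete tree.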
\begin{Figure}
\centering
\begin{Tree}
\treesymbol{\lvls{2}}%
\hspace{-\l@stlmoff}\usebox{\l@sttreebox}\hspace{\l@strmoff}
$+$
\treesymbol{\lvls{2}}%
\hspace{-\l@stlmoff}\usebox{\l@sttreebox}\hspace{\l@strmoff}\quad
$\Longrightarrow$\quad
\treesymbol{\lvls{2}}%
\treesymbol{\lvls{2}}%
\node{\type{dot}}%
\hspace{-\l@stlmoff}\raisebox{\vd@st}{\usebox\l@sttreebox}\hspace{\l@strmoff}%
\end{Tree}
\vskip\baselineskip
\begin{Tree}
\treesymbol{\lvls{2}}%
\hspace{-\l@stlmoff}\usebox{\l@sttreebox}\hspace{\l@strmoff}\quad
$\Longrightarrow$\quad
\treesymbol{\lvls{2}}%
\node{\leftonly\type{dot}}%
\hspace{-\l@stlmoff}\raisebox{\vd@st}{\usebox\l@sttreebox}\hspace{\l@strmoff}%
\quad or\quad
\treesymbol{\lvls{2}}%
\node{\unary\type{dot}}%
\hspace{-\l@stlmoff}\raisebox{\vd@st}{\usebox\l@sttreebox}\hspace{\l@strmoff}%
\quad or\quad
\treesymbol{\lvls{2}}%
\node{\rightonly\type{dot}}%
\hspace{-\l@stlmoff}\raisebox{\vd@st}{\usebox\l@sttreebox}\hspace{\l@strmoff}%
\end{Tree}
\caption{Construction steps 2 and 3}
\label{Construct}
\end{Figure}
This tree traversal is performed twice in the RT~algorithm.
During the first pass,
at each execution of step 2 or step 3, the relative positions of the
subtree(s) and of the new node are computed.
A closer examination of the RT~algorithm reveals that information about the
subtree's coordinates is not needed during this pass; the contour information
alone would be sufficient. Complete information is only needed in the second
traversal, when the tree is actually drawn. Here a special feature of
\TeX{} comes in that allows us to save registers.
Unlike Pascal, \TeX{} provides the capability of
storing a drawing in a single box register that can be positioned freely in
later drawings. This means that in our implementation the two passes
of the original RT~algorithm can be intertwined into a single pass,
storing for each subtree on the stack its contour and its drawing.
Although the latter is a complex object, it takes only one of
\TeX's precious registers.
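As a hedged illustration of this feature (plain \TeX{} with a hypothetical
register name, not code taken from \TreeTeX{}), a box register can be
filled once and then placed several times at different offsets:
\begin{verbatim}
\newbox\subtreebox                % hypothetical name
\setbox\subtreebox=\hbox{$\bullet$}% stand-in for a finished drawing
\leavevmode
\hbox{\lower20pt\copy\subtreebox  % place it 20pt below the baseline
      \kern16pt                   % move 16pt to the right
      \lower20pt\box\subtreebox}  % place it again
\end{verbatim}
Here \verb|\copy| leaves the contents of the register intact, whereas
\verb|\box| empties the register when it is used.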
\section{The internal representation}
Given a tree, the corresponding \TeX{}tree is a box containing
the ``drawing'' of the tree, together with some additional
information about the contour of the tree.
The reference point of a \TeX{}tree-box is always in the root of the
tree. The height, depth, and width of the box of a \TeX{}tree are
of no importance in this context.
The additional information about the contour of the tree is stored in some
registers for numbers and dimensions and
is needed in order to put subtrees together to form a larger tree.
\var{loff} is an array of dimensions which contains for each
level of the tree the horizontal offset between the
left end of the
leftmost node at the current level and the
left end of the leftmost node at
the next level.
\var{lmoff} holds the horizontal offset between the root
and the leftmost node of the whole tree. \var{lboff} holds the
horizontal offset between the root and the leftmost node at
the bottom level of the tree.
In addition, \var{ltop} holds the distance between the reference point
of the tree and the leftmost end of the root.
The variables
\var{roff}, \var{rmoff}, \var{rboff}, and \var{rtop} are defined analogously; just replace
``left'' by ``right''. Finally,
\var{height} holds the height of the tree, and \var{type} holds the
geometric shape of the root of the tree. \fig{TeXtree} shows an example \TeX{}tree,
i.e., a tree drawing and the corresponding additional information.
\begin{Figure}
\centering
\begin{Tree}
\e\ir\ir\e
\node{\type{dot}\rightonly\rght{\unskip\vrule height.8pt width5pt depth0pt}}%
\i % A
\end{Tree}
\leavevmode
\stack{-10pt}{\vd@st}{%
-10pt\\10pt\\10pt\\\var{loff}}%
\hspace{1em}%
\hspace{\leftdist}\usebox{\TeXTree}\hspace{\rightdist}%
\hspace{1em}%
\stack{-10pt}{\vd@st}{%
15pt\\5pt\\-10pt\\\var{roff}}%
\vskip\baselineskip\raggedright
height:~3, type:~dot, ltop:~2pt, rtop:~2pt, lmoff:~-10pt, rmoff:~20pt, lboff:~10pt,
rboff:~10pt.
\caption{A \TeX{}tree consists of the drawing of the tree and the
additional information. The width of the dots is 4pt, the minimal separation between
adjacent nodes is 16pt, making for a distance of 20pt center to center.
The length of the small rule labelling one of the nodes is 5pt. The column left (right)
of the tree drawing is the array \var{loff} (\var{roff}),
describing the left (right) contour of the tree. At each level,
the dimension given is the horizontal
offset between the border at the current and at the next level. The offset between
the left border of the root node and the leftmost node at level~1 is -10pt,
the offset between the right border of the root node and the rightmost node at
level~1 is 15pt, etc.}
\label{TeXtree}
\end{Figure}
Given two \TeX{}trees \var{A} and \var{B},
how can a new \TeX{}tree \var{C} be built that
consists of a new root and has \var{A} and \var{B} as subtrees?
An example is given in \fig{AddInfo}.
\begin{Figure}
\centering
\begin{Tree}
\e\ir\ir\e
\node{\type{dot}\rightonly\rght{\unskip\vrule height.8pt width5pt depth0pt}}%
\i % A
\end{Tree}
\leavevmode
A: \stack{-10pt}{\vd@st}{%
-10pt\\10pt\\10pt\\\ \\\var{loff}(\var{A})}%
\hspace{1em}%
\hspace{\leftdist}\usebox{\TeXTree}\hspace{\rightdist}%
\hspace{1em}%
\stack{-10pt}{\vd@st}{%
15pt\\5pt\\-10pt\\\ \\\var{roff}(\var{A})}%
\qquad
\begin{Tree}
\e\il\e\i\il\il\ir % B
\end{Tree}
\leavevmode
B: \stack{-10pt}{\vd@st}{%
10pt\\-10pt\\-10pt\\-10pt\\-10pt\\\ \\\var{loff}(\var{B})}%
\hspace{1em}%
\hspace{\leftdist}\usebox{\TeXTree}\hspace{\rightdist}%
\hspace{1em}%
\stack{-10pt}{\vd@st}{%
10pt\\-10pt\\-10pt\\10pt\\-30pt\\\ \\\var{roff}(\var{B})}%
\\[\figspace]
\begin{Tree}
\e\ir\ir\e
\node{\type{dot}\rightonly\rght{\unskip\vrule height.8pt width5pt depth0pt}}%
\i % A
\e\il\e\i\il\il\ir % B
\i % C
\end{Tree}
\leavevmode
C: \stack{-10pt}{\vd@st}{%
-20pt\\-10pt\\%
\makebox[0pt][r]{\var{loff}(\var{A})$\smash{\left\{\vrule height\vd@st
depth\vd@st width0pt\right.}$ }%
10pt\\10pt\\%
\makebox[0pt][r]{$\longrightarrow$ }%
10pt\\%
\makebox[0pt][r]{\raisebox{-.5\vd@st}{\var{loff}(\var{B})$\smash
{\left\{\vrule height.5\vd@st
depth.5\vd@st width0pt\right.}$ }}%
\makebox[0pt][r]{-}10pt\\\ \\\var{loff}(\var{C})}%
\hspace{1em}%
\hspace{\leftdist}\usebox{\TeXTree}\hspace{\rightdist}%
\hspace{1em}%
\stack{-10pt}{\vd@st}{%
20pt\\10pt\\-10pt\\-10pt%
\makebox[0pt][l]{\raisebox{-.5\vd@st}{
$\smash{\left\}\vrule height2.5\vd@st
depth2.5\vd@st width0pt\right.}$\var{roff}(\var{B})}}%
\\10pt\\-30pt\\\ \\\var{roff}(\var{C})}%
\vspace{\figspace}
\centering
\begin{tabular}{|l|r|r|r|}
\hline
&\multic{1}{c|}{\var{A}}&\multic{1}{c|}{\var{B}}&\multic{1}{c|}{\var{C}}\\
\hline
height&\multic{1}{c|}{3}& \multic{1}{c|}{5}& \multic{1}{c|}{6}\\
type& \multic{1}{c|}{dot}&\multic{1}{c|}{dot}&\multic{1}{c|}{dot}\\
ltop& 2pt& 2pt& 2pt\\
rtop& 2pt& 2pt& 2pt\\
lmoff& -10pt& -30pt& -30pt\\
rmoff& 20pt& 10pt& 30pt\\
lboff& 10pt& -30pt& -10pt\\
rboff& 10pt& -30pt& -10pt\\
\hline
\end{tabular}\qquad
\begin{tabular}{|c|r|r|}
\hline
\multic{1}{|c|}{level}&\multic{1}{c|}{\var{totsep}}&
\multic{1}{c|}{\var{currsep}}\\
\hline
0&20pt&0/16pt\\
1&25pt&11/16pt\\
2&40pt&1/16pt\\
3&40pt&16pt\\
\hline
\end{tabular}
\caption{The \TeX{}trees \var{A} and~\var{B} are combined to form the
larger \TeX{}\-tree~\var{C}. The small table gives the
history of computation for \var{totsep} and \var{currsep}.}
\label{AddInfo}
\end{Figure}
First we determine which tree is higher; this is
\var{B} in the example.
Then we have to compute the minimal distance
between the roots of \var{A} and \var{B}, such that at all levels
of the trees there is free space of at least \var{minsep} between
the trees when they are drawn side by side.
For this purpose we keep track of two values, \var{totsep} and
\var{currsep}. The variable \var{totsep} holds the total distance
between the roots, and \var{currsep} holds the distance
between the rightmost node of \var{A} and the leftmost node
of \var{B} at the current level. In order to calculate
\var{totsep} and \var{currsep}, we start at level 0 and
visit each level of the trees until we reach the bottom level
of the smaller tree; this is \var{A} in our example.
At level 0, the distance between the roots of \var{A} and \var{B}
should be at least \var{minsep}. Therefore, we set
$\var{totsep}:=\var{minsep} + \var{rtop}(\var{A})
+ \var{ltop}(\var{B})$ and $\var{currsep}:=\var{minsep}$.
Using $\var{roff}(\var{A})$ and $\var{loff}(\var{B})$, we can
proceed to calculate \var{currsep} for the next level.
If $\var{currsep} < \var{minsep}$, we have to increase \var{totsep} by
the difference and update \var{currsep}. This process is
iterated until we reach the lowest level of \var{A}.
Then \var{totsep} holds the final distance between the
roots of \var{A} and \var{B}, as calculated by the RT~algorithm.
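For the trees \var{A} and \var{B} of \fig{AddInfo}, where
$\var{minsep}=16$pt, this computation can be traced by hand.
The update rule used here, namely subtracting the entry of
$\var{roff}(\var{A})$ and adding the entry of $\var{loff}(\var{B})$
for the current level, is our reading of the description above;
it reproduces the table in the figure. All values are in points.
\begin{eqnarray*}
\mbox{level 0:}&&\var{totsep}=16+2+2=20,\quad\var{currsep}=16\\
\mbox{level 1:}&&\var{currsep}=16-15+10=11<16,\quad\var{totsep}=20+5=25\\
\mbox{level 2:}&&\var{currsep}=16-5-10=1<16,\quad\var{totsep}=25+15=40\\
\mbox{level 3:}&&\var{currsep}=16+10-10=16,\quad\var{totsep}=40
\end{eqnarray*}
After each increase of \var{totsep}, \var{currsep} is reset to
\var{minsep}, so every level starts from a separation of at least 16pt.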
If the root of \var{C} is a significant node, then the additional space,
which is 0pt by default, is added to \var{totsep}.
However, the approach of synthesizing
drawings from simple graphics characters allows only a finite
number of orientations for the tree edges; therefore, \var{totsep}
must be increased slightly to fit the next orientation
available.
Now we are ready to construct the box of \TeX{}tree~\var{C}.
Simply put \var{A} and~\var{B} side by side, with the reference
points \var{totsep}~units apart, insert a new node
above them, and connect the parent and children by edges.
Next, we update the additional information
for \var{C}. This can be done by using the additional information
for \var{A} and~\var{B}.
Note that most components of $\var{loff}(\var{C})$ and
$\var{roff}(\var{C})$ are the same as in the higher tree, which
is \var{B} in our case.
So, if we can avoid moving this information around, we only have
to access $\var{height}(\var{A}) + \var{const}$ many counters in
order to update the additional information for \var{C}.
This implies that we can apply the same argument as
in~\cite{TidierTrees}, which gives
us a running time of $\O(N)$ for drawing a tree with $N$~nodes.
Therefore, we must carefully design the storage allocation for
the additional information of \TeX{}trees in order to fulfill the
following requirements:
If a new tree is built from
two subtrees, the additional information of the new tree should
share storage with its larger subtree.
Organizational overhead, that is,
pointers which keep track of the locations of different parts of additional
information, must be avoided.
This means that all the additional information
for one \TeX{}tree should be stored in a row of consecutive dimension registers
such that only one pointer granting access to the first element
in this row is needed.
On the other hand, each parent
tree is higher and therefore needs more storage than its subtrees.
So we must ensure that there is always enough space in the row
for more information.
The obvious way to fulfill these requirements is to use a stack and to
allow only the topmost \TeX{}trees of this stack to be
combined into a larger tree at any time.
This leads to the following register allocation: A sequence of consecutive
box registers contains the treeboxes of the subtrees in the stack. A
sequence of consecutive token registers contains the type information for the
nodes of the subtrees in the stack. For each subtree in the stack,
a sequence of consecutive dimension registers contains the contour
information of the subtree. The ordering of these groups of dimension
registers reflects the ordering of the subtrees in the
stack. Finally, a sequence of consecutive counter registers contains
the height and the address of the first dimension register for
each subtree in the stack. Four address counters store the addresses
of the last treebox, type information, height, and address of contour
information. A sketch of the register organization for a stack of \TeX{}trees
is provided in \fig{Registers}.
\begin{Figure}
Dimension registers\\
\var{lmoff}(1) \var{rmoff}(1) \var{lboff}(1) \var{rboff}(1) \var{ltop}(1)
\var{rtop}(1)\\
\var{loff}($h_1$) \var{roff}($h_1$) \dots\ \var{loff}(1) \var{roff}(1)\\
\dots\\
\var{lmoff}($n$) \var{rmoff}($n$) \var{lboff}($n$) \var{rboff}($n$)
\var{ltop}($n$) \var{rtop}($n$)\\
\var{loff}($h_n$) \var{roff}($h_n$) \dots\ \var{loff}(1) \var{roff}(1)\\
\ \\
Counter registers\\
\var{lasttreebox} \var{lasttreeheight} \var{lasttreeinfo} \var{lasttreetype}\\
\var{treeheight}(1) \var{diminfo}(1) \dots\ \var{treeheight}($n$)
\var{diminfo}($n$)\\
\ \\
Box registers\\
\var{treebox}(1) \dots\ \var{treebox}($n$)\\
\ \\
Token registers\\
\var{type}(1) \dots\ \var{type}($n$)
\caption{\var{lasttreebox}, \var{lasttreeheight}, \var{lasttreeinfo}, and
\var{lasttreetype} contain pointers to \var{treebox}($n$),
\var{treeheight}($n$), \var{lmoff}($n$), and \var{type}($n$), respectively;
\var{diminfo}($i$) contains a pointer to
\var{lmoff}($i$). Unused dimension registers are
allowed between the dimension registers of consecutive trees. The counter
registers \var{lasttreebox},\ldots,\var{diminfo}($n$) serve as a directory
mechanism to access the \TeX{}trees on the stack.}
\label{Registers}
\end{Figure}
When a new node is pushed onto the stack, the treebox, type information,
height, address of contour information, and contour information are
stored in the next free registers of the appropriate type, and the
four address counters are updated accordingly.
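The indirect addressing that such address counters rely on can be sketched in
a few lines of \TeX. This is a minimal illustration, not the code of the
macro package: the names \verb.\topinfo. and \verb.\scratch. are hypothetical,
and dimension registers 100 to 105 are assumed to be unused.
\begin{verbatim}
\newcount\topinfo  % hypothetical pointer to the first dimension
                   % register of the topmost tree
\newcount\scratch  % hypothetical scratch counter
\topinfo=100       % assumption: \dimen100..\dimen105 are free
% Store six sample values in consecutive registers.
\scratch=\topinfo
\loop
  \dimen\scratch=0pt       % writes \dimen100, \dimen101, ...
  \advance\scratch by 1
\ifnum\scratch<106 \repeat
% Read the third entry (offset 2) through the pointer.
\scratch=\topinfo \advance\scratch by 2
\message{\the\dimen\scratch}
\end{verbatim}
Because a counter may stand wherever \TeX{} expects a register number,
one counter per tree suffices to reach a whole row of consecutive registers.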
When a new tree is formed from the topmost subtrees on the stack,
the treebox, type information, height, and address of contour information
of the new tree are stored in the registers formerly used by the bottommost
subtree involved in the construction step, and the four address counters are
updated accordingly. This means that this information is no longer
accessible for the subtrees. The contour information of the new subtree
is stored in the same registers as the contour information of the larger
subtree used in the construction, apart from the left and right offset
of the root to the left and right child, which are stored in the
following dimension registers. That means that gaps can occur
between the contour information of consecutive subtrees in the
stack, namely when the right subtree, which is at a higher position on the
stack, is higher than the left one. In order to avoid these
gaps, the user can specify an option \verb.\lefttop. when entering a
binary node, which makes the topmost tree in the stack the
left subtree of the node.
This stack concept also has consequences for the design of the user interface,
which is discussed in Section~\ref{Interface}.
\section{Space cost analysis}
Suppose we want to draw a unary-binary tree $T$ of height $h$ having
$N$ nodes\footnote{The height $h$ and the number of nodes $N$ refer to the
drawing of the tree. $N$ is the number of circles, squares etc.~actually
drawn, and $h$ is the number of levels in the drawing minus 1.}.
According to our internal representation,
for each subtree in the stack we need
\begin{enumerate}
\item one box register to store the box of the \TeX{}tree.
\item one token register to store the type of the root of the subtree.
\item $2h^\prime+6$ dimension registers to store the additional
information, where $h^\prime$ is the height of the
subtree.
\item three counter registers to store the register numbers of the
box register, the token register, and the first dimension register above.
\end{enumerate}
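Adding up the four items gives the number of registers occupied by a single
subtree of height $h^\prime$ on the stack:
$$1+1+(2h^\prime+6)+3 \;=\; 2h^\prime+11 \;=\; 2(h^\prime+1)+9.$$
The count thus splits into a part proportional to $h^\prime+1$ and a constant
part of nine registers per subtree.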
The following lemma bounds, in terms of $h$ and $N$, the number
of subtrees of $T$ that are on the
stack simultaneously and the sum of their heights.
\begin{lemma}
\begin{enumerate}
\item At any time, there are at most $h+1$ subtrees of $T$ on the
stack.
\item For each set $\T$ of subtrees of $T$ which are on the stack
simultaneously we have
$$\sum_{T^\prime\in \T}({\rm ht}(T^\prime)+1)
\le\min(N,{(h+1)(h+2)\over2}).$$
\end{enumerate}
\end{lemma}
\begin{proof}
\begin{enumerate}
\item By induction on $h$.\label{stackdepth}
\item The trees in $\T$ are pairwise disjoint, and each tree of
height $h^\prime$ has at least $h^\prime+1$ nodes. This implies
$$\sum_{T^\prime\in \T}({\rm ht}(T^\prime)+1)
\le N.$$
The second part is shown by induction on $h$.
The basis $h=0$ is clear.
Assume the claim holds for all trees of height less than
$h$. If $\T$
contains only subtrees of either the left or the right subtree
of $T$, we have
$$\sum_{T^\prime\in \T}({\rm ht}(T^\prime)+1)\le
{h(h+1)\over2}\le{(h+1)(h+2)\over2}.$$
Otherwise, $\T$ contains the left or the right subtree $T_s$ of
$T$. Then all elements of $\T-\{T_s\}$ belong to the other
subtree. This implies
\begin{eqnarray*}
\sum_{T^\prime\in \T}({\rm ht}(T^\prime)+1)&\le&
{\rm ht}(T_s)+1
+\sum_{T^\prime\in \T-\{T_s\}}({\rm ht}(T^\prime)+1)\\
&\le& h+{h(h+1)\over2}\le{(h+1)(h+2)\over2}.\proofend
\end{eqnarray*}
\end{enumerate}
\end{proof}
Therefore, our implementation uses at most $9h+2\min(N,(h+1)(h+2)/2)$
registers. In order to compare this with the
$10N$ registers used in the straightforward implementation,
an estimate of the average height of a tree with $N$ nodes is
needed. Several results, depending on the type of trees and of the
randomization model, are cited in \fig{Stat}, which
compares the number of registers used in a straightforward
implementation with the average number of registers used in our
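implementation. The table shows that the advantage of our
implementation grows with the number of nodes; among the tabulated cases,
only for unary-binary trees with fewer than 12 nodes does the straightforward
implementation need fewer registers.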
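Two entries of \fig{Stat} can be recomputed from the bound as a check; the
arithmetic below is an illustration, rounded to two decimals.
For extended binary trees with $N=8$ we have $h=\sqrt{8\pi}\approx5.01$ and
$(h+1)(h+2)/2\approx21.09>N$, so
$$9h+2N\approx45.12+16=61.12.$$
For binary search trees with $N=50$, the tabulated value is reproduced when
the logarithm is taken to base~10: $h=4.311\log_{10}50\approx7.32$ and
$(h+1)(h+2)/2\approx38.81<N$, so
$$9h+(h+1)(h+2)\approx65.92+77.62=143.54.$$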
\begin{Figure}
\centering
\begin{tabular}{|c|c|c|c|c|}
\hline
&registers&\multicolumn{3}{c|}{average registers}\\
\cline{3-5}
nodes&(straight-&extended&unary-binary&binary\\
&forward)&binary trees&trees&
search trees\\
&&($\sqrt{\pi n}$) \cite{AverageHeight}&
($\sqrt{3\pi n}$) \cite{BinaryTrees}&
($4.311\log n$) \cite{BinarySearchTrees}\\
\hline
\ds8& \ds80& \ds61.12& \ds94.15& \ds51.04\\
\ds9& \ds90& \ds65.86& 100.89& \ds55.02\\
10& 100& \ds70.44& 107.37& \ds58.80\\
11& 110& \ds74.91& 113.64& \ds62.41\\
12& 120& \ds79.26& 119.71& \ds65.87\\
20& 200& 111.34& 163.56& \ds90.48\\
30& 300& 147.37& 211.33& 117.31\\
40& 400& 180.89& 254.75& 132.58\\
50& 500& 212.80& 295.37& 143.54\\
\hline
\end{tabular}
\caption{The numbers of registers used by a straightforward implementation
(second column) and by our modified implementation (third to fifth columns)
of the RT~algorithm are
given for different types of trees and randomization models.
The formula in parentheses indicates the average height of the respective class
of trees as a function of the number of nodes.}
\label{Stat}
\end{Figure}
\section{The user interface}\label{Interface}
\subsection{General design considerations}
The user interface of \TreeTeX{} has been designed in the spirit of
the thorough separation of the logical description of document components
and their layout; see~\cite{DocumentFormatting,GML}. This concept
ensures both uniformity and flexibility of document layout and frees
authors from layout problems which have nothing to do with the
substance of their work. For some powerful implementations and projects
see \cite{Tables,Karlsruhe,LaTeX,Grif,Scribe}.
In this context, the description of a tree is given in a purely
logical form, and layout variations are defined by a separate style
command which is valid for all trees of a document.
A second design principle is to provide defaults for all specifications,
thereby allowing the user to omit many definitions
if the defaults match what he or she wants.
The node descriptions of a tree must be entered in postorder.
This fits the internal representation
of \TeX{}trees best. Although this is a natural method of describing a
tree, a user might prefer more flexible description methods.
However, note that instances of well-defined tree classes can be described
easily by \TeX{} macros. In Section~\ref{ExampleClasses} we give examples of macros
for complete binary trees and Fibonacci trees.
\TreeTeX{} uses the picture-making macros of \LaTeX. If \TreeTeX{} is used with
any other macro package or format, the picture macros of
\LaTeX{} are included automatically.
\subsection{The description of a tree}
The description of a tree is started by the command \verb.\beginTree.
and closed by \verb.\endTree. (or \verb.\begin{Tree}. and
\verb.\end{Tree}. in \LaTeX). The description can be
started in any mode; it defines a box and two dimensions. The
box is stored in the box register \verb.\TeXTree. and contains the
drawing of the tree. The box has zero height and width, and its depth
is the height of the drawing. The reference point is in the
center of the root node of the tree. The dimensions are stored in the
registers \verb.\leftdist. and \verb.\rightdist. and describe
the distance between the reference point and the left and
right margin of the drawing. These data can be used to position the
drawing of the tree.
Note that the \TreeTeX{} macros do not contribute anything to the current
page but only store their results in the registers
\verb.\TeXTree., \verb.\leftdist., and \verb.\rightdist.. It is the
user's job to put the drawing onto the page, using the
commands \verb.\copy. or \verb.\box. (or \verb.\usebox. in \LaTeX).
Each matching pair of \verb.\beginTree. and \verb.\endTree. must
contain the description for only \emph{one} tree.
Descriptions of trees cannot be nested and
new registers cannot be allocated inside
a matching pair of \verb.\beginTree. and \verb.\endTree..
As already stated, each tree description defines the nodes of the tree in
postorder, that is, a tree description is a particular sequence of node
descriptions.
A node description, in turn, consists of the macro \verb.\node.,
followed by a list of node options, enclosed in braces. The list
of node options may be empty. The node options describe the labels,
the geometric shape (type), and the outdegree of the node. Default values are
provided for all options which are not explicitly specified.
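As a minimal sketch, the following input describes a root with two leaves and
places the drawing in a box of the proper width. The option names
\verb.\external. and \verb.\type{dot}. are taken from the macro definitions in
the preamble of this document; that a node without an outdegree option is
binary is an assumption based on those definitions.
\begin{verbatim}
\beginTree
  \node{\external\type{dot}}% left leaf, entered first
  \node{\external\type{dot}}% right leaf
  \node{\type{dot}}% root; assumed binary by default
\endTree
% \TeXTree has zero width, so the margins are added by hand.
\hbox{\hskip\leftdist \copy\TeXTree \hskip\rightdist}
\end{verbatim}
The leaves precede the root because the nodes are given in postorder.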
The following node options are available:
\begin{enumerate}
\item[1.] \verb.\lft{