\input blue.tex \loadindexmacros \report
\bluepictures\indmodelpic

\bluechapter Creating an Index

\beginsummary
The creation of a modest index within a one-pass \TeX{} job has been treated.
In general a proof run and a final run are needed.
\endsummary

Making an index is an art. The fundamental problem is
\bluedisplaycenterline What to include in an index?

Computer-assisted indexing is not simple either. Issues are
\bitem the markup of keywords or phrases
\bitem to associate page numbers
\bitem to sort and compress raw Index Reminders (^{IR}s), and
\bitem to typeset the result.
\smallbreak

My approach is to create proof indexes\Dash also called mini-indexes\Dash
for each chapter and learn from those what should be included in the total index.
I found this very pleasant in practice.
Even if you prefer \cs{makeindex} for the real index, this processing
on the fly of a chapter index can be of great help.\ftn{It is said that
the automatic generation of an index is a feature of the Literate Programming
tools. For LP with \TeX{} as such, as for example Gurari's Pro\TeX,
this on-the-fly indexing within \TeX{} can be used.}

\bluehead Use

I'll show how to mark up Knuth's four types of IRs,
how to mark up accents, how to mark up font switching,
and how to mark up spaces as part of the IR.

\blueexample Markup, commands and resulting index

The right column has been obtained via
\bitem ^|\loadindexmacros|, at the beginning of the script
\bitem ^|\sortindex|, at the place of indexing, and
\bitem ^|\pasteupindex|, for the pasteup of the index.
\smallbreak
\thisverbatim{\catcode`\|=12 \catcode`!=12 \unmc \catcode`\*=0 }
\begindemo
Types of IR
0 ^{return}
1 ^|verbatim|
2 ^|\controlsequence|
3 ^\<syntactic\ quantity>
Accents ^{\'el\`eve!},
font changing ^{\bf bold} and
spaces ^{control\ symbol}
Control sequences
^{\TeX, and \AmSTeX}
^{Lamport and \LaTeX}
brackets ^{\tt< \rm and \tt>}
\newpage ^{return}
\newpage ^{return}%on purpose
\sortindex\pasteupindex\bye
*yields\obeylines
\quad {\tt {}< \rm {}and \tt {}>}{} {\oldstyle1}
\quad {\bf {}bold}{} {\oldstyle1}
\quad {control\ symbol}{} {\oldstyle1}
\quad {\tt \char 92\hbox {controlsequence}}{} {\oldstyle1}
\quad {\'el\`eve!}{} {\oldstyle1}
\quad {Lamport and \LaTeX{}} {\oldstyle1}
\quad {\TeX, and \AmSTeX{}} {\oldstyle1}
\quad {return}{} {\oldstyle1}--{\oldstyle3}
\quad $\langle \hbox {syntactic\ quantity}\rangle ${} {\oldstyle1}
\quad {\tt verbatim}{} {\oldstyle1}
\enddemo
The representation of page numbers as a range comes out automatically.

\exercise What makes a good index?
Of course this is a million-dollar question.
Let us concentrate on the number of entries and on the number of
page numbers per entry.
Which of the two extremes sketched below is the better one in your opinion?
One with many entries pointing to issues spread throughout the book\Dash
like \TB{} ;-))), and pushing the limits just for the imagination,
an index with pointers to related work on the internet, accessible by
just clicking the mouse\Dash or
one with few page numbers per entry\ftn{Courtesy Erik Frambach.}?
\answer As usual it all depends on your application. End of answer.
But\Dash there is always a but\Dash the complaint I heard most about \TB{}
was that the information is spread all over, and that it is
hard to find what you are looking for.
Therefore I consider a few page numbers per entry beneficial.
(Let us forget about the intrinsic complexity of the subject,
certainly at the time.)
BLUe's format supports scrutinizing parts of an index, because it is so easy
to generate an index per chapter on the fly. It is hardly more difficult than
generating a table of contents. An index per chapter can be scrutinized more
easily, and redundancies removed.
That the index provides a mechanism to link things over chapters is a
good thing, however. Don't misunderstand me.
But don't overuse it, IMHO, with all respect.
Remember DeVinne's adage `The last thing to learn is simplicity.'
%end answer

\bluehead Markup of Index Reminders

IRs are at the heart of the process. ^^{IR,\ markup}
Knuth distinguished {\oldstyle4} types to facilitate the outside processing.
I'll adopt his IR syntax and types.

\bluesubhead Syntax

Knuth's IRs obey the following syntax. ^^{IR,\ syntax}
\begincenterverbatim
<word(s)>!]!!<digit>!]<page no>.
!endcenterverbatim
The digits {\oldstyle0}, {\oldstyle1}, {\oldstyle2}, or {\oldstyle3} denote
the types: words, verbatim words, control sequences, and syntactic quantities.
A user does not have to bother about the digits nor about the page numbers.
Knuth has adopted the accompanying conventions for the word(s) of IRs.\ftn{See
\TB{} {\oldstyle424}, for the IR types, and what is typeset in the result.
In \cs{vref} the markup is inserted as replacement text of \cs{next}.
What is set in the index is governed by the macros which are included after
\cs{begindoublecolumns} in the \TeX book script.}
$$\vbox{\offinterlineskip\def\tstrut{\vrule height2.5ex depth.5ex width0pt}
\halign{\tstrut#\hfill\quad\vrule\quad&#\hfill\quad\vrule\quad&#\hfill\cr
Mark up&Typeset in copy$^*$&IR \cr
\noalign{\hrule}
|^{...}| &\dots &|... !!0 |$\langle page\, no\rangle$.\cr
|^!vrt...!vrt| &|!vrt...!vrt|& |... !!1 |$\langle page\, no\rangle$.\cr
|^!vrt\...!vrt| &|!vrt\...!vrt| & |... !!2 |$\langle page\, no\rangle$.\cr
|^\<...>|&$\langle\dots\rangle^{**}$& |... !!3 |$\langle page\, no\rangle$.\cr
\noalign{\vskip.5ex\hrule width1cm\relax\vskip1ex}
\multispan3{\quad$^*\,$\vrt\dots\vrt\ denotes manmac's, TUGboat's,\dots verbatim \hfil}\cr
\multispan3{\quad$^{**}\,$in \cs{rm} \hfil}\cr
}}$$
For the user the word(s) is (are) important.
The markup allowed for the IRs and the result in the copy
are given in the accompanying table.

\bluesubhead Markup

The markup for IRs is close to natural.
Precede the entry by a circumflex, or a double one in case of a
silent\ftn{A silent IR appears only in the index, not on the page.}
index entry.

\blueexample IR markup

\thisverbatim{\catcode`\|=12 \catcode`\^=12 \catcode`!=12 \catcode`*=0 \unmc}
\beginverbatim
^{\'el\`eve!}^|verbatim text|^|\controlsequence|^\<...>
^^\<...> %for silent ones, double the ^
{\sl^{ligatures}}
|'$|^|\,||$''|%from the TeX book script
^^{markup commands, see control sequences}
^{Lamport and \LaTeX}
%text and control sequences with sort keys
*endverbatim

\thissubsubhead{\runintrue}
\bluesubsubhead Spaces \par are difficult as always. ^^{IR\ and\ spaces}
In the IR they separate parts of the IR and are used in the word part.
\bitem A space that is just typed is neglected during sorting
\bitem The markup `\cs{\char32}', a control space, yields a space
that is subject to sorting, according to the ordering table
\bitem \cs{space} as markup is neglected during sorting.
This token is by default a member of the set of control sequences to be
ignored. It is set in the index as \cs{\char32}.
\smallbreak

\exercise What to do when part of a title should reappear in the index?
\answer The naive approach is to enclose that part by braces and precede it
by a circumflex. However, that goes wrong because a title is stored
and reused in many places. So copy the words and mark them as a silent IR.
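
A minimal sketch of this advice, modelled on how the script of this chapter
itself treats its head `Ordering.' That a blank line ends the title is an
assumption about the head macros; the commented-out line shows the naive
markup to be avoided.
\thisverbatim{\catcode`\|=12 \catcode`\^=12 \catcode`!=12 \catcode`*=0 \unmc}
\beginverbatim
%naive, goes wrong: the IR is stored with the title
%\bluehead ^{Ordering}

\bluehead Ordering

A fundamental issue with indexes is the ordering.
^^{ordering}%silent IR: set in the index only
*endverbatim
The title stays free of IR markup, so it can be stored and reused safely,
while the silent IR in the copy still yields the index entry.
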
\blueexample Spaces

\thisverbatim{\catcode`\|=12 }
\begindemo
^{\space}%an ignored cs
^{a\ a} %control space
^{aa}
^{a\ b}
^{a \TeX}
^{a\ \bf a}
^{\TeX book}
^{xyz beta}%space neglected in
%sorting
^{xyza}
^|\space|
\sortindex\pasteupindex\bye
!yields
\noindent Sorted result in file index.srt
\thisverbatim{\catcode`\!=12 \catcode`\;=0 }
\beginverbatim
\space {} !0 1.
a\ \bf a{} !0 1.
a\ a{} !0 1.
a\ b{} !0 1.
aa{} !0 1.
a \TeX {} !0 1.
space{} !2 1.
\TeX book{} !0 1.
xyza{} !0 1.
xyz beta{} !0 1.
;endverbatim
\enddemo
Explanation. \cs{space} belongs to the set of control sequences
to be ignored, ^^{ignored\ control\ sequences} ICSs for short.
This means that it is skipped with respect to sorting, except when it
occurs as the last token of the word part.
In that case it is ordered as a space, i.e., according to the lowest value.
This explains the position of `\cs{space}.'
`\cs{TeX},' and `\cs{TeX} book,' are subject to the default sorting keys.
`xyza' precedes `xyz beta,' because the space is silent.
When word ordering is preferred a \cs{\char32}, a control space,
must be included.

\bluehead Special tokens

Tokens are either neglected or replaced by another sequence while sorting.
\bluetex{} provides two sets of tokens to be ignored while sorting:
^|\conseqs| and ^|\consyms|.\ftn{There are two sets because of the handling
of the space after the token in the result.}
Replacing a control sequence by another sequence is called associating
a sorting key to the control sequence.
Active symbols can't be part of the IR, for the moment.

\bluesubhead Tokens to be ignored

In practice I needed things like \cs{tt} as part of the IR, which must be
neglected while sorting.\ftn{The reason is that {\tt <, and >} are used,
and printed wrongly.}
I decided to ignore those tokens while sorting and to include the tokens in
the final index.elm as such.
By default \bluetex{} knows about the following sets of tokens to be ignored.
\begincenterverbatim
\conseqs{\c\space\bf\it\rm\tt\sub\relax}
\consyms{\`\'\"\^\~}
!endcenterverbatim

\bluesubhead Sorting keys

In order to extend a set, use the macro \cs{add}.

\blueexample Use of sorting keys

By default \bluetex{} provides the following sorting keys.
\begincenterverbatim
\srtkeypairs{\AmSTeX{amstex} \LAMSTeX{lamstex}
\LaTeX{latex} \TeX{tex} \PS{PostScript}}
!endcenterverbatim
Suppose that we have \cs{fourtex} and that we want this to be
sorted as `4tex.' This can be done by extending the set of
^|\srtkeypairs|, ^^|\add| as follows.
\thisverbatim{\unmc}
\begindemo
\add\fourtex{4tex}to\srtkeypairs
Copy with ^{IR \fourtex}
^{IR 1} ^{IR 5} ^{IR a}
%
\sortindex %with 4tex for \fourtex
\pasteupindex%Set `IR \fourtex{}
%'
\bye
!yields
then the file index.srt will contain the IRs
\thisverbatim{\catcode`\!=12 \catcode`\;=0 }
\beginverbatim
IR 1 !0 .
IR \fourtex{} !0 .
IR 5 !0 .
IR a !0 .
;endverbatim
\enddemo
with \cs{fourtex} sorted on 4tex.

\exercise What to do when `to' is part of the sorting key?
\answer Add an extra level of braces.

\bluehead Ordering

A fundamental issue with indexes is the ordering. ^^{ordering}
The ^{ASCII} table is not suited because lowercase and uppercase letters
differ by 32. I decided to rank these as equal, more precisely to assign the
lowercase ASCII values to both.
In the accompanying table I prefer the {\oldstyle1}$^{st}$ column to the
{\oldstyle2}$^{nd}$ one.
Moreover, accented letters are not part of ASCII.
How should we order for example e, \'e, \`e, \^e, \"e?
I decided to rank accented letters equal to those without an accent,
because in the accompanying table I prefer the {\oldstyle3}$^{rd}$ column
to the {\oldstyle4}$^{th}$ one.
I know that non-letters precede letters, but what about their relative
ordering? I decided to stay as close as possible to the ASCII ordering.
Then there is the problem of digits.
In IRs they come as part of the word(s) and as page numbers.
For the latter I used the numerical ordering.
For the former I used the alphabetical ordering.\ftn{I could have applied
a look ahead mechanism and use numerical ordering throughout.
Maybe another time.}
Furthermore, a user can select the so-called ^{word\ ordering},\ftn{This
means that a space precedes all letters.
A space as such is neglected in the ordering.}
by \cs{\char32}, \TeX nically a control space, as markup for a space.
Personally, in the accompanying table I like the {\oldstyle5}$^{th}$ column
better than the {\oldstyle6}$^{th}$.
\def\btablecaption{} \def\footer{}
\nonframed
\def\rowstblst{}
$$\def\header{lower vs.\ upper case\cs
accents vs.\ unaccented\cs
word ordering}
\vruled\btable{\vtop{\halign{&\tstrut\quad#\hfil\cr
el & el \cr
El\`eve & em \cr
em & El\`eve \cr}}\cs
\vtop{\halign{&\tstrut\quad#\hfil\cr
el & el \cr
\hbox{\'el\`eve} & em \cr
em & \hbox{\'el\`eve} \cr}}
\cs\vtop{\halign{&\tstrut\quad#\hfil\cr
sea lion & seal \cr
seal & sea lion\cr}}
}%end \btable
$$

\bluehead Typesetting the index

The specifications for typesetting a \bluetex{} index are
\bitem represent the four IR types the same as in the \TeX book
\bitem set in two columns, balanced, possibly preceded by one-column copy
\bitem set subsidiary entries analogous to the \TeX book
\bitem indent continuation lines by {\oldstyle2}em
\bitem indent subsidiary entries by {\oldstyle1}em
%\bitem underline page numbers which represent the definition or
%       the main source of information
%\bitem represent a page number in italics when that page contains
%       an instructive example of the concept in question.
\smallbreak

Users can edit index.elm\Dash read: add markup\Dash and provide the
necessary macros in for example \cs{preindex}. In short, follow Knuth.
To please Frans Goddijn I introduced the tag ^|\numberstyle|,
by default equal to \cs{oldstyle}.

\exercise And what about subentries?
\answer My approach is to treat subentries as a typesetting problem:
the full entries are specified, and only after sorting, when the index is
typeset, can redundant first parts be suppressed, if one considers this better.
This is similar to how BLUe's format system typesets references,
inspired by the \AMS.
For the moment I have not implemented subentry handling,
because as of {\oldstyle1995} I consider it of low priority.

\bluehead Customization

A user might wish to intervene in the following places
\bitem to include other tokens to be ignored while sorting
\bitem to supply an ordering of his/her own
\bitem to enrich the sorted and compressed file index.elm.
\smallbreak

\bluesubhead Adding tokens

What are reasonable requirements to impose upon the handling of markup
control sequences (cs for short)? In my opinion
\bitem the cs must be defined
\bitem ^|\makexref| writes the cs unexpanded
\bitem the ordering is unknown, and therefore must be supplied
\bitem ^|\setupnxtokens| ensures that the cs-s are written
%, unexpanded,
to ^{index.srt} and ^{index.elm}.
\smallbreak

As a consequence I decided to neglect the `in between' control sequences
while sorting.
For those who favour a one-pass job, I have provided the following,
though.\ftn{It is simpler to add those control sequences to index.elm.}
The extension of a set of tokens can be done via ^^|\add|
\beginverbatim
\add\hfil to\conseqs or \add\`to\consyms or
\add\hfil{hfil}to\srtkeypairs
%with auxiliary
\def\add#1to#2{...}
!endverbatim
Each element from \cs{conseqs} is redefined in such a way that the
control sequence token is written to the file with a space
appended.\ftn{\cs{noexpand} is used instead of \cs{string}.}

\bluesubhead Modifying ordering

A general way is to `copy' the ordering table and to modify it.\ftn{My
\cs{fifo} is just a shortcut, which also prevents typos in assigning
the ASCII values. For \cs{fifo}, see my `FIFO and LIFO sing the BLUes.'}
And what about a macro to add to the table?
This can be done easily, and superficially looks convenient for an
innocent user.
At the moment I don't trust the macros to be worthwhile for an innocent user,
unless a very modest index has to be made.
And this completes the circle: different ordering is not wanted, I guess.

\bluesubhead The process and files involved

As in manmac, \bluetex{} stores the raw IRs in the file index.
^^{IR,\ processes\ and\ files}
The file index^^{index,\ file}\ftn{By default, index is the value of the
toks variable ^|\irfile|, which is used in \cs{sortindex}.}
is read and stored in an array for internal sorting.
After sorting, the number of entries is reduced,\ftn{Those which differ
only by page number are collected in one entry.}
and the result is written to the file ^{index.srt}.
Then, index.srt is transformed into the file index.elm.\ftn{By default,
^{index.elm} is the value of the toks variable ^|\indexfile|,
which is used in |\pasteupindex|.
The transformation abandons the IR syntax.
The part which specifies the kind of IR is deleted and the word part
marked up accordingly.}
The result is typeset via ^|\pasteupindex|.
Schematically it comes down to the following.
\bigskip
$$\vbox{\hsize.5\hsize%
\indmodelpic
}$$
^|\loadindexmacros| loads the index and sorting macros, and performs
initializations. It is safeguarded against double loading.\ftn{I introduced
this because I start each chapter with \cs{loadindexmacros}, independent
from whether it is run on its own or as part of the total.}

\bluesubhead Enriching the index

This is necessary when, for example, ^^{index,\ enrich}
\bitem control sequences have to be typeset
\bitem special symbols are needed, or
\bitem cross-references within the index are required.
\smallbreak

The best way is to start from the ^{index.elm} file.

\bluesubhead Typesetting the enriched file

When the default name is used\Dash index.elm\Dash just say \cs{pasteupindex}.
For another file name assign this name to the toks variable \cs{indexfile},
prior to the invocation of \cs{pasteupindex}.

\bluehead Extras

Undoubtedly people favour their own subset of \TeX, or more likely \LaTeX.
There is good news. You don't have to use BLUe's format system.
I gathered the sorting and indexing stuff as an independent self-contained
set in the file plainindex.tpl.
The bad news is that so far I have not done much to prevent name clashes.

\bluesubhead \TeX nical details

The details with respect to indexing have been treated in `BLUe's Indexes,'
and the sorting aspects have been treated in `Sorting in BLUe,'
both available from CTAN.
\endinput
\bye