\section*{Resolving trailing dots\ldots}
Gregory Tucker-Kellogg\\
gtk@walsh.med.harvard.edu

\subsection*{Introduction}
Unlike \verb|WEB|, \verb|noweb| does not allow the use of trailing dots in chunk (section) names. \verb|Dots| corrects for this. It is similar but not identical to \verb|disambiguate|, an \verb|Icon| program that accomplishes the same task. \verb|Dots| is written in \verb|perl|.

Before it does much else, \verb|noweb| creates a markup description of a source file. That markup description is passed along (in both \verb|noweave| and \verb|notangle|) to other programs in the pipeline (\verb|totex| for \verb|noweave| and \verb|nt| for \verb|notangle|). \verb|Dots| intervenes after the markup stage as a filter. The chunk name references are passed in the form described in Ramsey's paper, i.e.,
\begin{quote}
\leavevmode\rlap{\begin{tabular}{ll}
\tt @defn {\rm\it name}&The code chunk named {\rm\it name} is being defined\\
\tt @use {\rm\it name}&A reference to code chunk named {\rm\it name}\\
\end{tabular}}
\end{quote}
If trailing dots are used in a chunk name, they are passed along verbatim at the markup stage, without any attempt at resolution. That's where \verb|dots| comes in.

We require two passes over the noweb code as passed through \verb|markup|. The first pass records every chunk name in an associative array, noting which names are complete and which end in dots. In between the passes, we expand the abbreviated names and do some simple error checking. The second pass does a simple replace on incomplete names and writes output to the next stage of the pipeline.

The choices for handling the input stream seem to be between reading the whole markup into memory at once (as \verb|disambiguate| does) or storing the markup in a temporary file between the passes. The second is slower but will not break as the file gets bigger. We'll choose the first for now.
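To make the two passes concrete before the real program begins, here is a minimal, self-contained sketch. It is not the program developed below: it resolves each abbreviation by a plain prefix search over the complete names rather than by the sorted table used later, and the sub name \verb|resolve_dots| and the sample chunk name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: takes markup lines and returns
# them with every "name..." on an @defn/@use line replaced by its full name.
sub resolve_dots {
    my @lines = @_;
    my (%complete, %pending);

    # Pass 1: record complete names; remember which lines are abbreviated.
    for my $n (0 .. $#lines) {
        next unless $lines[$n] =~ /^\@(defn|use) (.*)$/;
        my ($kind, $name) = ($1, $2);
        if ($name =~ s/\.\.\.$//) {
            $pending{$n} = [$kind, $name];
        } else {
            $complete{$name} = 1;
        }
    }

    # Pass 2: rewrite only the remembered lines.
    for my $n (keys %pending) {
        my ($kind, $prefix) = @{ $pending{$n} };
        my @hits = grep { index($_, $prefix) == 0 } sort keys %complete;
        die "can't resolve $prefix...\n" unless @hits;
        warn "ambiguous: $prefix..., using $hits[0]\n" if @hits > 1;
        $lines[$n] = "\@$kind $hits[0]\n";
    }
    return @lines;
}

# Sample markup: one definition, one unrelated line, one abbreviated use.
my @out = resolve_dots(
    "\@defn parse the command line\n",
    "\@nl\n",
    "\@use parse the...\n",
);
print @out;
```

The abbreviated \verb|@use| line comes out carrying the full chunk name, and every other line passes through untouched. The sorted-table approach developed below reaches the same result without searching all complete names for each abbreviation.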
\subsection*{Program outline}
<<*>>=
#!/usr/local/bin/perl
while (<>) { # the first pass takes the input from STDIN
    <>
}
<>
<>
@
\subsection*{Representation}
What's the best structure for the list of chunk names? It could just be a normal array, except we would have to check whether a given name is already defined before adding it to the list. We could make an associative array, except we really don't have a key to associate. On the other hand, we could make a single associative array of names with associations ``complete'' and ``incomplete'' depending on the presence of dots. This would require no checking on predefinitions, and a key-sorted list brings up each full chunk name as the {\em next} member of the list for which [[$completion{$identifier} eq "complete"]].
<>=
if (/^@(defn|use)\s(.*)$/) { # we've found a name of some sort
    ($type, $name) = ($1, $2); # save the captures: a successful s/// resets $1
    if (($truncated = $name) =~ s/\.\.\.$//) { # this one ends in dots.
        $completion{$truncated} = "incomplete";
        $truncations{$.-1} = $truncated;
        $usage_type{$.-1} = $type;
    } else {
        $completion{$name} = "complete";
    }
}
push(@lines,$_);
@
\subsection*{Chunkname resolution}
The associative array [[%completion]] contains all of the names. The associative array [[%truncations]] maps the line number of each name with trailing dots to the truncated name. We can change the values of [[%completion]] from ``complete'' and ``incomplete'' to the full name of the appropriate completion. If there is more than one candidate, we can print out a warning but still resolve on the closest name.
<>=
@namelist = sort(keys(%completion));
$j = $i = 0;
while ($i < $#namelist) {
    # collect all the ambiguities in a row
    while ($completion{$namelist[$j]} eq "incomplete") {
        $ambiguity_found = 1;
        $j++; # step past each abbreviation in the run
    }
    <>
    foreach $name (@namelist[$i..$j]) {
        $completion{$name} = $namelist[$j];
    }
    $j = $i = $j + 1;
    undef($ambiguity_found);
}
@
After we've gotten the expansions of abbreviated chunk names, we still might run into one of two problems.
First, if no correct expansion was established, we might just misassign the abbreviation. Second, the expansion might still be ambiguous if more than one complete name begins with the same abbreviation. The first case is a fatal error. The second is detected by checking whether the complete chunk name immediately following the first completion also matches the abbreviation. If so, we take the first completion anyway but print a warning for the user.
<>=
if (defined $ambiguity_found) {
    $suggested = $namelist[$j];
    $nextchance = $namelist[$j+1];
    foreach $name (@namelist[$i..$j-1]) {
        if (substr($suggested,0,length($name)) ne $name) {
            die "FATAL ERROR: can't resolve @<<$name...>>\n"
        }
    }
    if ($completion{$nextchance} eq "complete") {
        foreach $name (@namelist[$i..$j-1]) {
            if (substr($nextchance,0,length($name)) eq $name) {
                print STDERR "WARNING--Ambiguous chunkname:\n";
                print STDERR "\t<<${name}...@>> could be either\n";
                print STDERR "\t<<$suggested@>> or\n\t<<$nextchance@>>\n";
                print STDERR "I will use <<$suggested@>>\n"
            }
        }
    }
}
@
\subsection*{Printout}
Finally, the [[%truncations]] and [[%usage_type]] arrays are put to work. We use the line numbers (as [[keys()]]) to pull up the truncations, and then associate truncations with completed names. Since we found everything on the first pass, we don't have to scan each line for a [[@defn]] or [[@use]] statement.

Note: this part of the program, analogous to {\em pass2} in \verb|disambiguate|, differs from \verb|disambiguate|, which searches every line again on its second pass. If we decided to store the markup in a temporary file after the first pass to save memory, we would change this section for blockwise printout. We still would not be forced to scan each line.
<>=
foreach $trunc_line (sort(keys(%truncations))) {
    $lines[$trunc_line] = "\@$usage_type{$trunc_line} $completion{$truncations{$trunc_line}}\n";
}
print @lines;
@
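The temporary-file alternative mentioned above might look roughly like the following. This is only a sketch under stated assumptions, not part of \verb|dots|: the sub names \verb|spool_first_pass| and \verb|print_second_pass| are hypothetical, the scratch file comes from the standard \verb|File::Temp| module, and the caller is assumed to have replaced the values of the completion table with full names between the two calls.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pass 1: copy the markup to a scratch file and record, by
# zero-based line index, which lines carry a truncated chunk name.
sub spool_first_pass {
    my ($in) = @_;
    my ($tmp) = tempfile(UNLINK => 1);   # scratch file, removed at exit
    my (%completion, %truncations, %usage_type);
    my $n = 0;
    while (my $line = <$in>) {
        if ($line =~ /^\@(defn|use)\s(.*)$/) {
            my ($kind, $name) = ($1, $2);
            if ($name =~ s/\.\.\.$//) {
                $completion{$name} = "incomplete";
                $truncations{$n}   = $name;
                $usage_type{$n}    = $kind;
            } else {
                $completion{$name} = "complete";
            }
        }
        print {$tmp} $line;
        $n++;
    }
    seek($tmp, 0, 0) or die "can't rewind scratch file: $!";
    return ($tmp, \%completion, \%truncations, \%usage_type);
}

# Hypothetical pass 2: stream the scratch file back out, replacing only the
# recorded lines. Assumes $completion now maps truncated names to full names.
sub print_second_pass {
    my ($tmp, $out, $completion, $truncations, $usage_type) = @_;
    my $n = 0;
    while (my $line = <$tmp>) {
        if (exists $truncations->{$n}) {
            $line = "\@" . $usage_type->{$n} . " "
                  . $completion->{ $truncations->{$n} } . "\n";
        }
        print {$out} $line;
        $n++;
    }
}
```

Only one line of markup is held in memory at a time, and the second pass still decides what to rewrite from the recorded line index alone, without matching each line against [[@defn]] or [[@use]] again. The cost is the extra write and re-read of the scratch file.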