150 TUGboat, Volume 25 (2004), No. 2

Software & Tools

PerlTeX: Defining LaTeX macros using Perl

Scott Pakin

Abstract

Although writing documents with LaTeX is straightforward, programming LaTeX to automate repetitive tasks, especially those involving complex string manipulation, can be quite challenging. Many operations that a novice programmer can express easily in a general-purpose programming language cannot be expressed in LaTeX by any but the most experienced LaTeX users. PerlTeX attempts to bridge the worlds of document preparation (LaTeX) and general-purpose programming (Perl) by enabling an author to define LaTeX macros in terms of ordinary Perl code.

1 Introduction

Although TeX is a Turing machine and can therefore express arbitrary computation, the language is not conducive to programming anything sophisticated. As in an assembly language, arithmetic expressions are written in terms of register modifications (e.g., “\advance\myvar by 3”) and relational expressions involving conjunction and disjunction are constructed from nested comparison operations (e.g., “\ifnum\myvar>10 \ifnum\myvar<15”). Loops are expressed in terms of tail-recursive macro evaluation. The only forms of string manipulation are single-token lookahead (\futurelet) and macro argument templates that either match a pattern or abort TeX. Finally, there are scalars but no aggregate data types (although these can sometimes be faked with clever use of macro expansion). While the LaTeX kernel and various packages slightly raise the level of programming abstraction, the typical programmer is rapidly frustrated when attempting to code anything nontrivial.

Perl, in contrast, offers a rich programming environment with most of the features one expects from a modern high-level language. However, Perl has no inherent support for document typesetting. For short or highly repetitive documents, it is reasonable to write a Perl script that outputs a .tex file and runs it through latex. However, it is generally inconvenient to include a full-length article in its entirety within a Perl script just so it can invoke some simple function which is easier to express in Perl than in LaTeX. Furthermore, a LaTeX-generating Perl script supports only one-way communication: Perl can pass information to LaTeX but not the other way around.

In this article, we present PerlTeX, a package that consists of a Perl script (perltex.pl) and a LaTeX2ε style file (perltex.sty). The user simply installs perltex.pl in an executable directory and perltex.sty in a LaTeX2ε style-file directory, incorporates “\usepackage{perltex}” into any documents which need PerlTeX's features, and compiles such documents using perltex.pl instead of the ordinary latex command. Together, perltex.pl and perltex.sty give the user the ability to define LaTeX macros in terms of Perl code. Once defined, a PerlTeX macro becomes indistinguishable from any other LaTeX macro. PerlTeX thereby combines LaTeX's typesetting power with Perl's programmability.

    \newcount\n
    \newcommand{\astsslow}[1]{%
      \n=#1
      \xdef\asts{}%
      \loop\ifnum\n>0 \xdef\asts{\asts*}\advance\n-1
      \repeat}

    (a) Slow version from The TeXbook

    \newcount\n
    \newcommand{\astsfast}[1]{%
      \n=#1
      \begingroup
      \aftergroup\edef\aftergroup\asts\aftergroup{%
      \loop \ifnum\n>0 \aftergroup*\advance\n-1
      \repeat
      \aftergroup}\endgroup}

    (b) Fast but non-scalable version from The TeXbook

    \newcommand{\asts}{}
    \perlnewcommand{\astsperl}[1]
      {'\renewcommand{\asts}{' . '*' x $_[0] . '}'}

    (c) Fast PerlTeX version

Figure 1: Macro to define \asts as a sequence of N asterisks

1.1 A simple example

A PerlTeX macro definition can be as simple as

    \perlnewcommand{\hello}{"Hello, world!"}

which is essentially equivalent to:

    \newcommand{\hello}{Hello, world!}
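As a minimal sketch of how the pieces described above fit together, the following complete document loads perltex.sty and defines \hello in Perl. The file name hello.tex is hypothetical, and the exact command line is an assumption: the text only states that such documents are compiled with perltex.pl in place of the ordinary latex command.

```latex
% hello.tex (hypothetical file name)
% Assumed invocation, in place of "latex hello.tex":
%   perltex.pl hello.tex
\documentclass{article}
\usepackage{perltex}

% The macro body is Perl code; the string it evaluates to is handed
% back to LaTeX and typeset as if it had been written in the document.
\perlnewcommand{\hello}{"Hello, world!"}

\begin{document}
\hello
\end{document}
```

Compiling the same file with plain latex would presumably not work, since the Perl side of the package would not be running to evaluate the macro body.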
(The extra " characters in the \perlnewcommand definition delimit a string constant in Perl.)

To better motivate the use of PerlTeX, consider the first programming challenge in the “Dirty Tricks” appendix of The TeXbook [3]: construct a macro that accepts an integer N and defines another macro, \asts, to be a sequence of N asterisks. Figure 1(a) presents a LaTeX wrapper, \astsslow, for the initial TeXbook solution. Besides relying on a set of TeX primitives which are unlikely to be familiar to a LaTeX user, the code is slow; \astsslow{10000} takes over 6 seconds to run on the author's 2.8 GHz Xeon-based workstation.

Figure 1(b) presents a LaTeX version of the “fast” solution from The TeXbook. \astsfast is highly unintuitive; it exploits artifacts of macro expansion and execution that occur when used in the context of the TeX \aftergroup primitive. Furthermore, it squanders space on TeX's input and save stacks, limiting the number of asterisks to fewer than 300 when run using the default latex program that ships with teTeX v1.02.

In contrast to The TeXbook's solutions, the PerlTeX solution is fast, scalable, and should be comparatively easy to understand by anyone with basic Perl-programming and LaTeX macro-writing skills. Figure 1(c) presents an \astsperl macro that takes an argument and returns a \renewcommand string which LaTeX subsequently evaluates. \astsperl{10000} takes less than a second to run on the same 2.8 GHz Xeon system as did the previous macros and uses no TeX primitives, only ordinary LaTeX and Perl commands.

1.2 A more complex example

One of PerlTeX's capabilities which is not available with a Perl script that outputs a .tex file is the ability to pass data bidirectionally between LaTeX and Perl. Suppose, for example, that you wanted to write a macro that accepts a string of text, splits it into its constituent space-separated words, and outputs a table of those words sorted by their typeset width. Neither LaTeX nor Perl can easily do this on its own. LaTeX can measure word width but cannot easily split a string into words or sort a list; Perl cannot easily determine how wide a word will be when typeset but does have primitives for splitting and sorting strings.

A PerlTeX macro to do the job, named \splitandmeasure, is presented in Figure 2. It accepts a string, splits it into words, and writes LaTeX (or more accurately in this case, TeX) code which builds up a \measurements macro consisting of alternating words and word widths. This code is followed by a call to a second PerlTeX (helper) macro, \sortandtabularize, which accepts a list of alternating words and word widths (i.e., \measurements), sorts the list by word width, and outputs a tabular environment for LaTeX to typeset.

    % Given a list of words, build up a \measurements macro as alternating
    % words and word width in points, sorted by order of increasing width.
    \perlnewcommand{\splitandmeasure}[1]{
      return "\\edef\\measurements{}%\n" .
        join ("", map "\\setbox0=\\hbox{$_}%\n" .
                      "\\edef\\measurements{\\measurements\\space $_ \\the\\wd0}%\n",
                  split " ", $_[0]) .
        "\\sortandtabularize{\\measurements}%\n";
    }

    % Given the \measurements macro produced by \splitandmeasure, output a
    % two-column tabular showing each word and its width in points.
    \perlnewcommand{\sortandtabularize}[1]{
      %word2width = split " ", $_[0];
      return "\\begin{tabular}{|l|r|} \\hline\n" .
             " \\multicolumn{1}{|c|}{Word} &\n" .
             " \\multicolumn{1}{c|}{Width} \\\\ \\hline\\hline\n" .
             join ("", map (" $_ & $word2width{$_} \\\\ \\hline\n",
                            sort {$word2width{$a} <=> $word2width{$b}}
                                 keys %word2width)) .
             "\\end{tabular}\n";
    }

Figure 2: A PerlTeX-defined LaTeX macro that outputs a table of words sorted by typeset width

Figure 3 illustrates the step-by-step operation of \splitandmeasure. Processing begins with LaTeX invoking the \splitandmeasure macro, causing Perl to output LaTeX code which measures each word (Figure 3(a)). LaTeX then evaluates that code, producing the definition of \measurements shown in Figure 3(b) followed by an invocation of \sortandtabularize. Control once again passes to Perl, which sorts \measurements by word width and outputs a LaTeX tabular environment (Figure 3(c)). LaTeX then evaluates the tabular, producing the typeset output shown in Figure 3(d).

    \edef\measurements{}%
    \setbox0=\hbox{How}%
    \edef\measurements{\measurements\space How \the\wd0}%
    \setbox0=\hbox{now}%
    \edef\measurements{\measurements\space now \the\wd0}%
    \setbox0=\hbox{brown}%
    \edef\measurements{\measurements\space brown \the\wd0}%
    \setbox0=\hbox{cow?}%
    \edef\measurements{\measurements\space cow? \the\wd0}%
    \sortandtabularize{\measurements}%

    (a) Result of the call to \splitandmeasure{How now brown cow?}

    How 19.44447pt now 17.50003pt brown 26.97227pt cow? 21.11113pt

    (b) Final contents of \measurements after evaluating the code in Figure 3(a)

    \begin{tabular}{|l|r|} \hline
     \multicolumn{1}{|c|}{Word} &
     \multicolumn{1}{c|}{Width} \\ \hline\hline
     now & 17.50003pt \\ \hline
     How & 19.44447pt \\ \hline
     cow? & 21.11113pt \\ \hline
     brown & 26.97227pt \\ \hline
    \end{tabular}

    (c) Result of the call to \sortandtabularize{\measurements}

    Word     Width
    now      17.50003pt
    How      19.44447pt
    cow?     21.11113pt
    brown    26.97227pt

    (d) Final typeset table

Figure 3: Overall PerlTeX processing of \splitandmeasure{How now brown cow?}

Macros such as \splitandmeasure which pass control from LaTeX to Perl to LaTeX to Perl and back to LaTeX are comparatively easy to implement with PerlTeX: \splitandmeasure consists of a single Perl statement; its helper macro, \sortandtabularize, consists of only two Perl statements. However, it would be very difficult to implement comparable functionality without the help of PerlTeX.

The rest of this article proceeds as follows. Section 2 highlights some of the design decisions that went into PerlTeX's implementation. We contrast those design decisions to the ones made by similar projects in Section 3. Section 4 describes the mechanisms PerlTeX uses to transfer data between LaTeX and Perl. Defining Perl macros in LaTeX was the greatest challenge in implementing PerlTeX and required some fairly sophisticated LaTeX trickery. The solutions that were developed are described in Section 5. By comparison, the Perl side of PerlTeX is straightforward and is described briefly in Section 6. Section 7 presents some avenues for future enhancements to PerlTeX. Finally, we draw some conclusions in Section 8.

2 Design decisions

There are multiple ways that PerlTeX could have been implemented. The following are the primary alternatives:

• Use the semi-standard “\write18” mechanism to invoke the perl executable.
• Patch the TeX executable to interface with the Perl interpreter.
• Implement a Perl interpreter in LaTeX.
• Construct macros that enable LaTeX to communicate with an external Perl interpreter.

The final option is the one that was deemed best for PerlTeX. The “\write18” approach is a security risk; enabling it (e.g., using the -shell-escape command-line option present in some TeX distributions) permits not only PerlTeX but any LaTeX package to execute arbitrary programs on the user's system. Patching TeX is inconvenient for the user, who will need to recompile TeX (plus pdfTeX, ε-TeX, pdf-ε-TeX, Ω, and any other TeX-based system for which the user wants to add Perl support) and then re-dump the LaTeX2ε format file for each Perl-enhanced build of TeX. Implementing a Perl interpreter in LaTeX has the advantage of not requiring a separate Perl installation. However, a LaTeX-based Perl interpreter, besides being extremely difficult to implement, would necessarily support only a small subset of Perl, as much of the language cannot be expressed in terms of the mechanisms provided by TeX.

As this article will demonstrate, providing LaTeX-level mechanisms to facilitate communication between LaTeX and an external Perl interpreter enables safe execution of Perl code, ease of installation, compatibility with any underlying TeX implementation, and access to every feature of the Perl language.

3 Related work

PerlTeX is not the first system that attempts to augment LaTeX macro programming with a general-purpose programming language. However, PerlTeX's approach, as outlined in the previous section, makes it unique relative to other, similar systems. Note that many of the following systems support not only LaTeX but other formats as well (e.g., Plain TeX, ConTeXt, and Texinfo); for the purpose of exposition we limit our discussion to LaTeX.

After releasing PerlTeX, the author discovered an existing program written by Alexander Shibakov also called PerlTeX [6]. Unlike the PerlTeX described in this paper, Shibakov's version is implemented as a patch to TeX. That is, the user must recompile TeX (and all its variants) with the PerlTeX patches and re-dump the desired formats. The result is that Perl is more integrated into TeX than is otherwise possible. All code between \perl and \endperl is executed by Perl. Furthermore, Shibakov's PerlTeX supports two-way communication between TeX and Perl by enabling code within a \perl...\endperl block to insert characters and control sequences into the TeX input stream. While Shibakov's PerlTeX works with any TeX format (Plain TeX, LaTeX, ConTeXt, Texinfo, etc.), the PerlTeX described in this paper works only with LaTeX. However, this paper's PerlTeX has the important advantage of not requiring TeX recompilation, which is tedious and may not be possible when using a commercial TeX implementation.

Paraschenko takes a similar approach to Shibakov's with his sTeXme [4], which uses Scheme rather than Perl as the TeX extension language. sTeXme adds a single command to TeX: \stexme, which works like \input but accepts the name of a Scheme file rather than a TeX or LaTeX file. When the Scheme interpreter evaluates the given file, output procedures such as newline and display write into the TeX input stream. Two new procedures, pool-string and get-cmd, provide access to TeX internal state. As with Shibakov's PerlTeX, sTeXme's tight integration with TeX comes at the cost of having to recompile TeX and re-dump all of the format files before the extension language can be used.

TeX2page [7] also uses Scheme as a TeX extension language. However, its design is closer to that of (this paper's) PerlTeX than to sTeXme's. TeX2page provides an \eval macro which brackets Scheme code. The document is first compiled using the ordinary latex executable. As part of that process, \eval simply writes its argument to a file. The user then runs tex2page, which invokes the Scheme interpreter on the extracted Scheme code and writes the resulting LaTeX code to a file. Finally, the user re-runs latex and, on this pass, \eval loads the Scheme-produced LaTeX code into the document, where it is typeset normally. Although TeX2page's multi-pass approach supports two-way communication between LaTeX and Scheme, it does require an extra run of tex2page and an extra run of latex for each nesting level. For large documents or heavily nested \eval calls, this can be slow and tedious. PerlTeX, in contrast, requires no more latex runs than the document would otherwise require.

The idea behind PyTeX [1] is to use Python, not LaTeX, as the document's top-level language. With PyTeX, the user's Python code passes strings to a TeX dæmon [2] to evaluate. PyTeX supports only one-way communication (i.e., Python to LaTeX but not LaTeX to Python). PerlTeX, in contrast, supports two-way communication, which is necessary when writing code in a general-purpose language that requires access to typesetting information such as string widths, page counts, or register contents.

Amrita [5] presents an integration framework based on re-entrant here documents which supports communication among a variety of languages such as Perl, Python, LaTeX, Ruby, and POV-Ray. Each language can generate code to be executed by any other language. The result of each execution (which itself may recursively generate code for additional languages) is code to be executed by the parent language. While Amrita is a highly capable system, its power necessarily introduces an extra level of complexity to the user. Relative to the generality of Amrita, PerlTeX's niche is that it enables users to