From 96204afa0954ce68a0354e7eee74853e4421d1d1 Mon Sep 17 00:00:00 2001 From: bdemsky Date: Fri, 3 Feb 2006 01:14:13 +0000 Subject: [PATCH] checking in cup --- Robust/cup/CHANGELOG | 313 ++++++++ Robust/cup/INSTALL | 107 +++ Robust/cup/INSTALL.QUICK | 17 + Robust/cup/LICENSE | 20 + Robust/cup/README | 48 ++ Robust/cup/cup_logo.gif | Bin 0 -> 8054 bytes Robust/cup/manual.html | 1452 ++++++++++++++++++++++++++++++++++++++ 7 files changed, 1957 insertions(+) create mode 100644 Robust/cup/CHANGELOG create mode 100755 Robust/cup/INSTALL create mode 100644 Robust/cup/INSTALL.QUICK create mode 100644 Robust/cup/LICENSE create mode 100644 Robust/cup/README create mode 100644 Robust/cup/cup_logo.gif create mode 100644 Robust/cup/manual.html diff --git a/Robust/cup/CHANGELOG b/Robust/cup/CHANGELOG new file mode 100644 index 00000000..5365a357 --- /dev/null +++ b/Robust/cup/CHANGELOG @@ -0,0 +1,313 @@ +CUP version 0.10k is a maintenance release. + +CUP will now accept a filename on the command-line if it is the last +argument and does not start with "-". This allows better GUI +integration. Some unix-isms in end-of-line handling have been fixed, +too; thanks to Jean Vaucher for the tip. + +The java_cup.runtime.Scanner interface has been refined to allow +the scanner to return null to signal EOF. JLex and JFlex users will +like this, as it means they can use the default scanner EOF behavior. + +Bruce Hutton , Zerksis Umrigar , +and Vladimir Antonevich all sent bug +reports touching on erroneous error recovery in the parser runtime. +Dr. Hutton provided the fixes that I've adopted; Zerksis sent a very +helpful CUP-vs-bison test case. If you're in a position to notice +correct/incorrect error recovery and this release works better for you +than previous ones, thank them --- and send me email so I know whether +we've quashed this bug for good. + +Klaus Georg Barthelmann caught an +oversight in the constructors for java_cup.runtime.Symbol. I've also +taken an obsolete constructor allowing specification of a start state +for some symbol to package-scope; if this causes anyone backwards +compatibility problems, email me and I will consider changing it back. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +Jul-24-1999 [CSA] +------------------------------------------------------------------------ +CUP version 0.10j adds new features. + +A "-version" command-line option is now accepted, which prints out the +working version of CUP and halts. This allows automatic version-checking, +for those applications which require it. + +Broadened the CUP input grammar to allow CUP reserved words in package and +import statements, and in (non)terminal labels. In addition, semicolons +after 'action code', 'parser code', 'init code', and 'scan with' sections +have been made optional (if language noise annoys you). Also, these four +sections may now appear in any order, instead of the strict ordering +previously required. Finally, you can now spell 'non-terminal' as either +"non terminal" (old way) *or* "nonterminal" without upsetting CUP. +[Flexibility requested by Stefan Kahrs ] +[Package and import reserved word issues noted by Frank Rehberger, + Brandon Schendel, and Bernie Honeisen, among others.] + +Clarified the parse table dumps generated by the -dump* options. + +I have added code to lr_parser to detect illegal Symbol recycling by the +scanner and to throw an Error in this case. The scanner must return +a fresh Symbol object on each invocation, because these objects are +tagged with parse state and added to the parse stack. Object sharing +does evil things to the parser; don't do it (you won't get away with it). +[Symbol recycling problems reported by Ken Arnold ] + +Improved scanner interface, designed by David MacMahon . +The new java_cup.runtime.Scanner interface is used by the default +implementation of lr_parser.scan(). See the manual for more details. +Old parsers will work with the new runtime, but parsers generated with +0.10j will not work with the runtime from earlier versions unless you +specify the (new) "-noscanner" option. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +Jul-24-1999 [CSA] +------------------------------------------------------------------------ +CUP version 0.10i is a maintenance release. + +A one-off bug in the parser error-recovery code has been caught and corrected +by Chris Harris . + +The fields in the emitted symbol class have been made public, rather than +package scope, since the class already was public. + +The issues formerly addressed in Appendix D (accessing parser methods/fields +from the action class) have been partially addressed by adding a new +private final field named 'parser' to the action object that points to +the parser object. THIS INTRODUCES A POTENTIAL INCOMPATIBILITY if you had +previously defined a field named 'parser' in the 'action code {: ... :}' +portion of your grammar. The solution is to rename your field. + +Finally, incorporated Jako Andras' suggestions to make CUP more friendly +to makefiles. + +A reminder: please submit bug-fixes or feature-additions as *patches*, not +complete archives. Your patch will have a greater chance of integration +into the distribution if you package each feature or fix as a separate patch, +instead of lumping everything together and leaving it to me to figure out +what you've changed and why. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +Feb-18-1999 [CSA] +------------------------------------------------------------------------ +CUP version 0.10h is a maintenance release. + +Starting with this version, CUP encodes the various parser tables as strings +to get around java's 64k method-size limitation. This allows larger +parse tables and thus more complicated grammars. + +Furthermore, a long-standing buglet that would cause CUP to occasionally +generate "Attempt to construct a duplicate state" internal errors has been +fixed. + +Another contributed Microsoft-compatible makefile has also been added +to the distribution. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +Feb-10-1999 [CSA] +------------------------------------------------------------------------ +CUP version 0.10g contains bug fixes, added functionality, and +performance improvements. Thanks to Matthias Zenger, Peter Selinger, +Jaroslaw Kachinarz, Ian Davis and others for contributions. + +- New command line option '-interface' added. This causes JavaCUP to + emit an *interface* with the symbol constants, instead of a + *class*. Without the command-line flag, behavior is identical to + v0.10f and before: the symbols are emitted as a class. +- (minor) Added toString() method to java_cup.runtime.Symbol and + modified the debugging parser to use it. This allows you to + override toString() to allow a more intelligible debugging parse. +- The CUP grammar has been extended to allow one to declare array types + for javaCUP terminals and non-terminals. Matthias Zenger first + suggested this feature; Peter Selinger was the first to show the + right way to do it. +- The symbols prefixed with CUP$ now have the parser class file name + added to the prefix as well, to allow more than one parser object + per package. Thanks to Jaroslaw Kachniarz for pointing out this + problem. +- Fixed bug that prevented one from invoking the parser multiple times. + To quote Ian Davis, who found and diagnosed the bug: + Repeat invocations of the same instantiation of lr_parser.java to parse + distinct input statements fail for the simple reason that the stack is + not emptied at start of parsing, but the stack offset is reset to 0. + This has been fixed. +- Fixed bug with implicit start productions not receiving a RESULT. +- Fixed bug with RESULT assignments that are not right-most in the + production. +- Updated documentation. + +Known issues: +- All known bugs have been fixed. +- The java_cup.runtime.SymbolStack / java_cup.runtime.intStack + performance hack originally suggested by Matthias Zenger has been + postponed to the next release. By eliminating typecasts and + synchronized methods, a substantial performance improvement can be + obtained. Backwards-compatibility issues have forced the postponement + of the code merge. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +3/24/98 [CSA] +------------------------------------------------------------------------ +CUP version 0.10f is a maintenance release. The code has been cleaned up +for JDK 1.1 functionality. No major functionality has been added; any bugs +in 0.10e are still in 0.10f. + +- Removed trailing semicolons from class definitions which upset strict + compilers (not Sun's javac, for some reason). +- Changed 'PrintStream's to 'PrintWriter's to eliminate deprecation + warnings. + +As of this release, the javaCUP code is being maintained by +C. Scott Ananian. Suggestions and bug-fixes should be sent to +cananian@alumni.princeton.edu. + +Known issues: + +- Precedence bug: rules unmarked by precedence information are treated + as if they had existing, but very low, precedence. This can mask + parser conflicts. +- Efficiency hack: java.util.Stack will be replaced in the next + release with a special-purpose stack to eliminate + performance-robbing type-casts. +- It has been suggested that the symbol *class* should be an + *interface* instead. This will be a command-line option in the next + release. + +C. Scott Ananian +Laboratory for Computer Science +Massachusetts Institute of Technology +12/21/97 [CSA] +------------------------------------------------------------------------ +CUP version 0.10e contains a few bug fixes from 0.10a + +- %prec directive now works correctly + fixed by cananian@princeton.edu +- Shift reduce conflicts are now correctly reported + fixed by danwang@cs.princeton.edu +- Error with reporting the positon of the error token also fixed + fixed by cananian@princeton.edu +- INSTALL script now has a slightly more complex test. +- foo.java.diff included for changes from previous release +- Fixed more bugs with reporting of shift reduce conflicts. + fixed by danwang@cs.princeton.edu +- Fixed bug introduced by previous fix patches from + Added '\r' as a whitespace character for the lexer suggested by + (dladd@spyglass.com) +- Fixed botched relase +Daniel Wang +Department of Computer Science +Princeton University + +Last updated: 9/12/97 [DW] +------------------------------------------------------------------------ +Changes and Additions to CUP v0.9e + +CUP version 0.10a is a major overhaul of CUP. The changes are severe, +meaning no backwards compatibility to older versions. + +Here are the changes: + +1. CUP now interfaces with the lexer in a completely different +manner. In the previous releases, a new class was used for every +distinct type of terminal. This release, however, uses only one class: +The Symbol class. The Symbol class has three instance variables which +are significant to the parser when passing information from the lexer. +The first is the value instance variable. This variable contains the +value of that terminal. It is of the type declared as the terminal type +in the parser specification file. The second two are the instance +variables left and right. They should be filled with the int value of +where in the input file, character-wise, that terminal was found. + +2. Terminal and non-nonterminal declarations now can be declared in two +different ways to indicate the values of the terminals or non-terminals. +The previous declarations of the form + +terminal {classname} {terminal} [, terminal ...]; + +still works. The classname, however indicates the type of the value of +the terminal or non-terminal, and does not indicate the type of object +placed on the parse stack. + +A declaration, such as: + +terminal {terminal} [, terminal ...]; + +indicates the terminals in the list hold no value. + +3. CUP doesn't use the Symbol class for just terminals, but for all +non-terminals as well. When a production reduces to a non-terminal, a +new Symbol is created, and the value field is filled with the value of +that non-terminal. The user must know that the terminal and non terminal +declarations specify a type corresponding to the type of the value field +for the symbol representing that terminal or non-terminal. + +4. Label references do not refer to the object on the parse stack, as in +the old CUP, but rather to the value of the value instance variable of +the Symbol that represents that terminal or non-terminal. Hence, +references to terminal and non-terminal values is direct, as opposed to +the old CUP, where the labels referred to objects containing the value +of the terminal or non-terminal. + +5. The RESULT variable refers directly to the value of the non-terminal +to which a rule reduces, rather than to the object on the parse stack. +Hence, RESULT is of the same type the non-terminal to which it reduces, +as declared in the non terminal declaration. Again, the reference is +direct, rather than to something that will contain the data. + +6. For every label, two more variables are declared, which are the label +plus left or the label plus right. These correspond to the left and +right locations in the input stream to which that terminal or +non-terminal came from. These values are propagated from the input +terminals, so that the starting non-terminal should have a left value of +0 and a right value of the location of the last character read. + +7. A call to parse() or debug_parse() return a Symbol. This Symbol is +of the start non-terminal, so the value field contains the final RESULT +assignment. + +8. CUP now has precedenced terminals. a new declaration section, +occurring between the terminal and non-terminal declarations and the +grammar specifies the precedence and associativity of rules. The +declarations are of the form: + +precedence {left| right | nonassoc} terminal[, terminal ...]; +... + +The terminals are assigned a precedence, where terminals on the same +line have equal precedences, and the precedence declarations farther +down the list of precedence declarations have higher precedence. left, +right and nonassoc specify the associativity of these terminals. left +associativity corresponds to a reduce on conflict, right to a shift on +conflict, and nonassoc to an error on conflict. Hence, ambiguous +grammars may now be used. For a better explanation, see the manual. + +9. Finally the new CUP adds contextual precedence. A production may be +declare as followed: + +lhs ::= {right hand side list of terminals, non-terminals and actions} + %prec {terminal}; + +this production would then have a precedence equal to the terminal +specified after the "%prec". Hence, shift/reduce conflicts can be +contextually resolved. Note that the "%prec terminal" part comes after +all actions strings. It does not come before the last action string. + +For more information read the manual, found in manual.html + +Frank Flannery +Department of Computer Science +Princeton University + +Last updated: 7/3/96 [FF] + + diff --git a/Robust/cup/INSTALL b/Robust/cup/INSTALL new file mode 100755 index 00000000..49708521 --- /dev/null +++ b/Robust/cup/INSTALL @@ -0,0 +1,107 @@ +#!/bin/csh -f +# +# Cup install and test script +# Scott Hudson 8/31/95 +# +# Last revision 7/3/96 (for v0.10a) +# By Frank Flannery +# +# Last revision 11/16/96 (for v0.10b) +# By Daniel Wang +# +# Updated version number 7/24/99 for 0.10k +# By C. Scott Ananian +echo +echo "================================" +echo "Installing and testing Cup v0.10k" +echo "================================" +echo + +# check for this directory in CLASSPATH +# +set cwd = `pwd` +set c_path = `printenv CLASSPATH` +if ($c_path !~ "*$cwd*") then + echo " " + echo "WARNING:" + echo "WARNING: The current directory does not appear in your CLASSPATH" + echo "WARNING: it will be added for this install/test only" + echo "WARNING:" + echo " " + setenv CLASSPATH $cwd':'$c_path + echo "CLASSPATH now set to " + printenv CLASSPATH +endif + +# change to the demo directory +# +echo " " +echo "changing to simple_calc subdirectory..." +echo "cd java_cup/simple_calc" +cd java_cup/simple_calc + +# remove old copies of parser.java and sym.java +# +echo " " +echo "removing any old copies of parser.java and sym.java..." +echo "rm -f parser.java sym.java" +rm -f parser.java sym.java + +# compile java_cup and run it against the demo program +# the -cs (for "checksource") option here will force the +# java_cup and java_cup.runtime source to be compiled prior +# to running it. +# +echo " " +echo "compiling java_cup then generating demo program..." +echo "java -cs java_cup.Main < parser.cup" +java -cs java_cup.Main < parser.cup + +# make sure parser.java and sym.java now exist +# +if ( ! -e parser.java) then + echo " " + echo "ERROR: for some reason parser.java was not created" + echo "ERROR: install was not successful" + exit 1 +endif +if ( ! -e sym.java) then + echo " " + echo "ERROR: for some reason sym.java was not created" + echo "ERROR: install was not successful" + exit 1 +endif + +# run the demo +# again, the -cs option will cause compilation of all the parts +# of the demo program (including parser.java and sym.java that +# should have been generated in the previous step). +# +echo "removing old test results..." +echo "rm -f test_results" +rm -f test_results +echo " " +echo "executing the demo program..." +echo "echo '1*-2+2;' | java -cs java_cup.simple_calc.Main >& test_results" +echo '1*-2+2;' | java -cs java_cup.simple_calc.Main >& test_results + +# compare with standard results +# +set res = `tail -1 test_results` +if ("$res" !~ "= 0") then + echo "ERROR: test program produced the wrong results" + echo "ERROR: output was:" + cat test_results + echo "ERROR: install was not successful" + rm -f test_results + exit 2 +endif + +# all is well +# +rm -f test_results +echo " " +echo "==============================" +echo "Install and test was successful" +echo "==============================" +exit 0 diff --git a/Robust/cup/INSTALL.QUICK b/Robust/cup/INSTALL.QUICK new file mode 100644 index 00000000..bf708b67 --- /dev/null +++ b/Robust/cup/INSTALL.QUICK @@ -0,0 +1,17 @@ +This is a quick installation guide for the CUP system. +For most people, the following steps will suffice: + + 1) extract the java_cup files somewhere, and ensure that + the extraction directory is in your classpath. + + This will ensure that java_cup.Main properly refers to the + Main class in the java_cup subdirectory of this release. + + 2) compile java_cup and its runtime system. + From the installation directory: + javac java_cup/*.java java_cup/runtime/*.java + +That's it! Read the manual now. + --Scott + +24-Mar-1998 diff --git a/Robust/cup/LICENSE b/Robust/cup/LICENSE new file mode 100644 index 00000000..f3f0299f --- /dev/null +++ b/Robust/cup/LICENSE @@ -0,0 +1,20 @@ +CUP PARSER GENERATOR COPYRIGHT NOTICE, LICENSE AND DISCLAIMER. + +Copyright 1996 by Scott Hudson, Frank Flannery, C. Scott Ananian + +Permission to use, copy, modify, and distribute this software and its +documentation for any purpose and without fee is hereby granted, +provided that the above copyright notice appear in all copies and that +both the copyright notice and this permission notice and warranty +disclaimer appear in supporting documentation, and that the names of +the authors or their employers not be used in advertising or publicity +pertaining to distribution of the software without specific, written +prior permission. + +The authors and their employers disclaim all warranties with regard to +this software, including all implied warranties of merchantability and +fitness. In no event shall the authors or their employers be liable +for any special, indirect or consequential damages or any damages +whatsoever resulting from loss of use, data or profits, whether in an +action of contract, negligence or other tortious action, arising out of +or in connection with the use or performance of this software. diff --git a/Robust/cup/README b/Robust/cup/README new file mode 100644 index 00000000..58bacd79 --- /dev/null +++ b/Robust/cup/README @@ -0,0 +1,48 @@ +This directory contains the CUP v0.10k release in source form. You should +find the following files and subdirectories in this release: + + README This file. + INSTALL.QUICK Quick installation instruction. + CHANGELOG A brief overview of the changes in v0.10k + java_cup A subdirectory containing CUP, runtime, and test sources. + cup_logo.gif A logo image used by the manual. + manual.html A user's manual in HTML format. (OLD) + INSTALL A shell script to install and test the system + +To get started quickly, read INSTALL.QUICK. If you are on a Windows +platform, you might want to look in the winnt subdirectory. +For complete installation information, keep reading. + +To install the release, copy the contents of this directory (if you +haven't done so already) into a "classes" directory accessible to the +java interpreter (i.e., a directory that is listed in the colon +separated list of directories in your CLASSPATH environment variable). +Note: if you have an older version of the system already accessible +from your CLASSPATH you will want to temporarily remove it to avoid +conflicts. + +Once files have been copied to an appropriate location, you should be able to +both compile and test the system by executing the INSTALL shell script +(sorry, but non Unix users are still on their own). Again, be sure that +you have placed these sources in a directory listed in your CLASSPATH +environment variable (or changed your CLASSPATH to include this directory). + +A manual page that explains the operation and use of the system can be found +in manual.html and from the CUP home page mentioned below. + +Bug reports regarding the installation +process or the system as a whole should be sent to + with "JavaCUP" in the subject. + +The CUP home page where the latest information regarding CUP can be found +(e.g., new releases) is: + + http://www.cs.princeton.edu/~appel/modern/java/CUP/ + +Enjoy, + +Scott Hudson +Graphics, Visualization, and Usability Center +Georgia Institute of Technology + +Last updated: 23-Jul-1999 [CSA] diff --git a/Robust/cup/cup_logo.gif b/Robust/cup/cup_logo.gif new file mode 100644 index 0000000000000000000000000000000000000000..6d25eda453401e2eda28f173661fe1ea146cf621 GIT binary patch literal 8054 zcmV-+ABo^cNk%w1VaNfP0J8u9|NsB)?(W^)-OS9)ySuxps;ZfpnTUvpcXxMYW@c4Y zRYXKYGcz+HA|eqH5dZ)HEC2ui0LTHD00RX7aL7rky*TU5yZ@jh2tgnag>fKBqP#6~ z7zJ^S<(aPStHd*cEkrCDkI1BQ$!t2G(5Q5SGYJMemJ61U8HSbW2&>ZifM4)*&2GEj z@VI==B;^LE70U-h(9{!lCU^yUTW5YXfOL$Fj*pO$l656kWITs3Gc^{1V_7|X37d

s;fnwS14tfFJ7D!u7t6Cv!txOzQ4f1a8-k_Ji0$JwSk_v$GTkyio(>@*4Nk~ z&Bh3OFAG1>%LBx(S>6iZExg(7?(gt%f|fha>3u=AmGmn1;K|a2#2ZMkpg|8KZV@mL zOIEZm%oH-j1dUdygBUYvtd_9S7;fnQxUlP(!(5q=;YeNF$g-u&Naajo$`s?zkeJT2 z(ClLKrO%&0<=`xEE)*AK0?FMXdeUe(q$`6;ok|r1PLNK~B)wX6re9)w_6=h>=P0%)@1So%KD+&ccBP4-!9Ez@Y%*#0V}BSfDsD zm;w(PP(a{7u>-pp5J#qvKrsV&Wf?SRpuo?yws=VZm`%GjZ4+|m1jg#M@b0dn{$5 z5W=wj9|-|w3&`z8OinX?SpxVl_NoHY4bV*m1}KoL zsV*b{UaMHRpebFu&SFFVH;CzZBN%kSpy>don)<^k@aZR3H+14D?zniiri=h}IcY>f zC*)y>D1#C@f)ItgduWjsbav$ijZQ!Shy#>{reuj>=71U^hDt$}10a@bV$&IbVPfiK zHbG}+5%Ab)2Xy8@#Sc`#8O4ZM_@bGL^@=IP9~?$aBw`ol zdLfHgnPL-^_H$>84uCNp8>b>;0L;!w?$lJvGOKCU1ekf~T!Nobw`SQCV5s5*96sBjuNj^# zLCG1wCINa5ATUJ#b9Na#UuP{m{&|M6E+8hu=qUh(h|E#&@dPJ7NhaWj4=}g`g`*;) z0B?a?_3yw(gYAk=aNQKV0cBvkz4(k{0f%i<=xGI@QoynX=wbNfLJj6lumjB{EkS1% zvVARLpfz_|-WP|g0ftE@$TMX#z>ERg8HhMU_G(~g7l(|IpO!|12hdTVIbzBihe8#p zTvZQeqFb27DiEfXt?xU!(@*ad0K5(YPg?zu+!3yj6G7cadGi?(yiSNEPG!Ic*kOP! zlt#cY@k|4cnb5=>zyOuGWibti9t3YlshZueG@-LqVzh&S(j={8hr`(GGN3&S$WKfV z0$j9awxS~c+U|x`ELvi^^RFnmZ~|du-1RP0L&IT^0{?O!61j7V-RXgY9?awKT2mLT zIe}hD!Uqd80LZ}z2Zyz@K%|CNnFhphLlxYBV!8u84>%E5d?TQ8#HXgVWhs$5^Nb9q z^GHlh3^Zj#PgDvP8l+7QlUTrxs%TgM42>psvBMT(z_&8u49s%}R16pZ_DZ-RaAq8u zB}_EvJ3TtHY9RFE5dgsrqZ|@J6{seuEJ8?D@(YxPi{1nCml!S~MwwJx&RP~SvGVwD zQ}AnFoi1>}CF8&W_~PqfcZI6h1jPowuu0}fEQ&~o_}4cg_l(vRykFcSzHU4kXw*iv9&}GG-p92Bi;oj z=A?fOOk->6%;hrVrwcSFellC)k3N)DrEU&pdqXDQ#5VxSSZAo{Y=nOFVA9HbYyr{# z=%aM|xChlZ)CJ=LS8a=Mlle4c3yS@kzeM2One^eYlD!+?@=4tG#Y#deDbr-Rc?*eb03FHMAlbc>$I6%OMfBJ zt-B&%upLY(Yh=*J-6E(TeN2Gq7|;a}cViUlvZ7L6@m<191Dl1a@lyga7dvHUltdX- z=He*AQu?Z|d~FR4-lCeg!j-gZDn;5l;93KSW@RwCayIqUmJ~FVZK^HA3v3wU=3wmr z6H`@bMiVK`c{c$jx^I+rjh72-SxS@IFppZ9kp>N)1rD9hz1o0&_oEq!!h{^ynr1#z zU5*-ZO^4ZF>%)1>p`Pk8tRvJ*q0N-ELbn?NJJ2}O;sCV>?ocBDH4oG+f@AN(EJE1% z!3L@Y+>NR|Q-4b?QjZRi4`na}%PCS10WaH)c8h|9QwM8dy(|_WPVhw})Q7$MH{klc zaX@72fIDK*3QSBQlDDB-2?e(Yzv}2H(4{OZ2v-Y6(F%9tLB~M2F{kPO4fJ?h+!ts1 z$$ay&2KWTvEtKvc&1cXC6^ugW#Zi&VZ{U$j*gF|F4Mf*#3ig6$!%IGiC(~aJfB>u@ zuT`VyxXUetLX=^p(Dk7P%2C3Kgq`oz^t;*DZVIs6)s=*AdfRh8^URR==t_`_LhS@+ zy(`Zn@BTU;|Bm^eXL|F4PkbcUw(2mQRrF18I^%)I&vY>&0KU3YY1qAYcgqANMjyH* z0*^eQqy7{HP;p>+^?NC(9_v-R=O`@y1^~p~Y7O8y2ZD5ar~IHWM%q2uuOIf}|AGK; z7IYNEvW0;|J_#OHOSTe9_aq!2#8WHhSP=buzb=6J@~2&i6{ws^=#HOt|8j{pD(H7@w)e<>75|G*P7#V#M%7a=${ z0w@g176sXp02H@LjPNRx16>ya4p2aW@JB97WLS-mA1iQD9motN_+k|i1f?(rfl`EN zkb=PwU`~J{bO&v8AcGB~P_{#bO)wj)_d8({9g+Yi6-ZD)m`y3*X(`cta8!oyHVSBX z5Z&T!G*A!%S9gE|!P_#&;WdU<0tH0t+|^-+*eZ0Xq#?92MaI0z&|Pfpv>5AkZ;8^BT=&_c))|ypfRiVF{=?71h7!)@G*{pK_bE$Wg};E7An3r zd=lb5bC^$Df`AnB5B9@2z8HKCf`#q}hLgvI$oPrb2mn7QE@85Wo2Z7>C{yyrh!hDR z!oiRrpn*D~BZW9ZGZJ&iXD$ztjMeZ?{?Zv5(oj?qM`I#>^aL3dfIezuPz@Dl?&DKs zf;qi{VQUjnDd0$>!UPm3bqhHOvyo6aNLwC2FB!T2a~p{k&o~j)hz7pb1^{qZ7ASp6 zFp@7eHPIB3!Qf4=q;c$_K71kb335a*bR36;FLBfi;uTzQ)k zi2)>)Zoe~bURVti=2K^tMYhz6GzCFal1nH9B25)a77}cy`DAOLg^?M5i$RujaC}qq zC|jqSi`ftLk(CI-o=kC=?|G4IaERlHC)v^ee$zJ_9%hWoDFZN84d^jT2xkEg^fSVk zPNPwvn}QxN0)w5>Hcps2uf}#v`8$ZwL&wPm?-zfIQ<%V6hBP1-F(#ixL82*YpZAG~ zI%u97kRMOi4JM^-!b1%emS+KlL6HKU9k4XTW@{ZHp>bs$0VYsZ5?sv}0D`HIVXsHo@9XbKa^(x8NApj~v>&B6C05AZwY%ZiGUpj3&DKl;I zP$5NH9)cM#!Z38{2KnKlvNT!cH9VKM7yP%!zJ0wAVi$}4EP1c~;j z)H`OTTC5P+Lx%d8&bNg+Sg45yh87s2!#bmws-In`H3z_?Wr~1*sS0#j zG%1xip6R0;U}PbZBH$5`o_V2=lOg^jlUf<8WqK!sxJ;0Vt7*op8UdLWNSGdQs}U=A z$T}bRNiJ4Go05R4WrK}HU@3~ENN0nIis`a%03UuzF66VXdf9fpqoK5AvOnRG7O1Nn z;BD$kuMI$A6#%q4h?}dhnQqYkKnBrvvih!6=Lp^MZ4oQ6$S8X7F)Cd-hW<&ZeszH; zsj)W8vA$rfZ%_c_a}Y}VrxI0cQKSJ@gAa#JL<`Tye_^fmye80vW+;1hnyO7VBf> z87re(qtF|@%V{nHM;$mj0J%GeLVE+VTc(fEXi<^9`nkJ=$D-~)vzc4Gn=2H_s|UnO zulLt0&nvz3TfgL@SazZR0{u3!Qc51o=X7sTruXCpLtC>FOTJ794^o@8?R2~!a2rwc zzVOR~^J~8re8C{}Qn)+15gU;gaeM-7yVnb*2JB-9oWSl7m|VEN9$Fh%a3c~t!9G;M z7hJ>jyLsdYyE=Hnwt$1R8^4d{kn^g!4vfN(z`x`Ps8b-LA#x)_Ts4ouxVW3XHeAJb zf;%z{k*3?#|wK@_E_=X%SyY`-jtfpE0R4_PPZ z1{%w(gWHR}_<776dVa1fu`bxm)=PF+4~?81?b9=-#Axxt?0f(;4Uy89AHGtoWSazI zP0Ua&4C~CiS^WejpisN^F?^KIA<)xYtw)cNa90qSL;AMjqMqbno0S^BUs=$qOx0~I z1r05+@CpY75UZAY*LeMhTfNtD7t=9XfAAyL-wcrALJgV5xXljS zwVCVxl;PbAJ&ilmT@B}r-c<7fi@3n-{Mf~-B8F|O{#?XH9ki>R-$r@goE;DU4dCRm zZb1kiUi>I84b281sD$j*YVF+$F5(bAQfAD<<_+BzekZMQfBflmovb)QecuRh&acdH zsW;0`UC}3=gj8& zwIUl0rh{C(Q*GrDjpb+Twnfh6>V`G!;=<}0En@xT`pGM2KBUGi zsXV@}an0s)pypw%<#8_OEap;|;lAH3)QEh)l!`P}>jcAVfkRH{kN`DJ{>TcP=!$Ot zra7WwIWE?oEiOKghzkw{<4QH6QRSL`1f34+$_477&a=?5B8IMzCjONW8?5~u>u*2+ zkxtgMj@-6>>*QI;CcS7$Dc5AKzQ-=?A5Z|}z2=#_kI^N)oH=s9&uW(E4g%{|1 zKJA5`?Ai{g8m(GuO*NGM=l!YJ+`l~}??W5yG%oZt4+TO{tkb>!@BGfp ze@Nnkxbdtm-?HiHM^6VIPV(q6=>Glin%(nxjpcX@^WPY&>D#kcF9~N+!eGDRP4D!1 z-Qk$d^?42AjjVFa!jjupZDmJpLN*x&G+|Q9rzva z)(VdJH;A7M?9b<#_{07M`8j)=-}P_5v~NoEC>{E7pU`*D`J*TEc>)fOua%JR(~=*f za{susKY^GZcSGvqtX~8vK=0UE=|n#I*xr%Fu7g_l1Ty+|P^|gCKMDUcp1}z6`aSy| zo%b^hzMIa3s ze>Jfm_X1GP5HO?!h+UlZ=G}iVdy-=^&r<-7%;I5TY_#MG+#G&E^rfh(jz~jH`BV>mCr}Ub-OX_x*psKq0Gw zHjSOaIKify9E691y&3=msk2MWGZ;xDfB~Kk2N*RaE>AndJQB{uR@Yb9Ks!6Pj06M& zD}&e~t^vCT2T~MD(9C0GjRr`p&O(<4MI=+4r=42WNZxJkZts()T;mI>E34*j3LKrb z2RNaC=j8e1;Nh*6agC7zI-CH10R;vaPykh`5X00DF}U=nD#!=}m*q8!9v_0Z=}pfq&t;Bc$NlZ&wE z{342Bln?>}^pT|M!QID6Qb8=BV$xtmXA;qQ0Q8R!&pu1nTmR(NxeaC^fjLFg#!}FUr(pr z$BM_L;T zLBN5&Vj@mR$HC%%1L9LG?xS1xCKJv47-y}B`i|FfkqYo05w7`9iV8<2II1*%a_E|t2 zTc80dh9xY+)ucm$);1d0Qc-Zj6?fc4(NuHO8dPAxgJl?m_FZ_FKB(=1dpY@

    Wl(*rGd@(bz|8Kj1N#4QHI#WgC`N5I&xR7Fy)krZtRAFuw#7%x@R< zF$PG5-Yp*&;F9`kp)K;)&@U17I!p!}WsLxV$~OCKw9{65ZMNHP`(a!!h$f&Xv9_CH z@JwcPfQOaz`)|Ml7kqHS3n%=85^rMpZpL~4=BwsVBcHOP#T&OA@6S$^eDhpk;PrCQ zv;HJ*&P%^Gnw>*eJyrt*pwV>LQG2iQ)oYLR2`?azefMMR`P_Ekp&g2J-ivRN^5Bz) z*BV-kcOH{v;P_p6>Lb;&`RB8DF&L|>_kL16Wz&BA4Rku|d-Np@abu3j&%Uhl)0f{( zq%v-lEBATN5HW4%_dh?Fjlo;wB47axct8Xu5L`_O#bKbfA7Le6dPzW*1PGu2B4Dr& z1xSFfc2S9Dz$F2!(BK3J(1HdefEV;iSn7hXfEXD-STm#oR8)vJTIo%PJLF*xeW=4v zKqE0sAOIW+5RnYtZ~za8Nev}%00s>Ip#To(g96M#8PK$00u&j51K0#b6Bxh&3IIX@ zN+^UWE};O_a!LP=a7Gm9;E1_6;-<9tzac5>RKd{#&`cN}2>^19wlRnUj%b1guqj~$ zF<~=~Sg1e(@;jVB(RXOzqZ6obje(>h0Um(J5*7dw6DU9w1BnGney0V8JY^tVc>*y` zP?F$j2X{1h{zsV_6Xw)EWa2rZt9LaM6kjKtT(JgUSX-fQxq=KmtZ7PYrrO7z424 z7$_4@1K>rU8{;Rn7(f|bV)6w4`y{0V`gza+oRb0Rq(upy$rUDc@H-0#Cl#OA03aB0 z0trCVI|>!B0W@F_(^6tFtCJFTL4ue@uz)*O3I>u+;FAH}q8ds$Mqy18Jn^jOC&38> z0(g;|TV!WJqNq|iL2{v-Fu(&Ou}K3ERZfBAXfv!DNCoU+L*JRGhDPZdHjo}HOyJ%ME_aTHGC&wIuFf66SjAfq%SxcN^pti#y z>Kb_($P^-glk%m733z#eF^q5mz>;TZn;KzGjFARTY=cKI{M-p!;j6zLfB_7(rp&4^ zX7@p^awVu!6WH^E@hz`wis1r3DuE|QRKPBxVGzZ(#0{H8YOO4((`IlH5>d3$Vox~2 zJv^bu2hA!5_j+IjPoS(`i~=d6Yr(!MI8DQOZzU|yMg(|L5+e1bNYE@wN3y8^3o!9h z-5jET{u6^JePY7uc*$8vRx8c9y%>~Zsi*diumoeRCmd_c7=Z9;15e39A%vA?3wCf9 z9E6V@dDZXjZLAV*0?LvL<&WpZ?7av(A;GcGqEZ)PA$VQwHYIV>PCIXN@{ EJI^I>4gdfE literal 0 HcmV?d00001 diff --git a/Robust/cup/manual.html b/Robust/cup/manual.html new file mode 100644 index 00000000..5d6f6a8b --- /dev/null +++ b/Robust/cup/manual.html @@ -0,0 +1,1452 @@ + +CUP User's Manual + + + +
    +[CUP Logo Image] +
    +

    CUP User's Manual

    +

    +Scott E. Hudson
    + +Graphics Visualization and Usability Center
    + +Georgia Institute of Technology

    +Modified by Frank +Flannery, C. Scott Ananian, +Dan Wang with advice from +Andrew W. Appel
    +Last updated July 1999 (v0.10j) +
    + +

    Table of Contents

    +
    +
    i.
    About CUP Version 0.10 +
    1.
    Introduction and Example +
    2.
    Specification Syntax +
    3.
    Running CUP +
    4.
    Customizing the Parser +
    5.
    Scanner interface +
    6.
    Error Recovery +
    7.
    Conclusion +
    References +
    A.
    Grammar for CUP Specification Files +
    B.
    A Very Simple Example Scanner +
    C.
    Incompatibilites between CUP 0.9 and CUP 0.10 +
    D.
    Bugs +
    E.
    Change log +
    + + +

    i. About CUP Version 0.10

    +
    Version +0.10 of CUP adds many new changes and features over the previous releases +of version 0.9. These changes attempt to make CUP more like its +predecessor, YACC. As a result, the old 0.9 parser specifications for CUP are +not compatible and a reading of appendix C of the new +manual will be necessary to write new specifications. The new version, +however, gives the user more power and options, making parser specifications +easier to write. + + +

    1. Introduction and Example

    + +This manual describes the basic operation and use of the +Java(tm) +Based Constructor of Useful Parsers (CUP for short). +CUP is a system for generating LALR parsers from simple specifications. +It serves the same role as the widely used program YACC +[1] and in fact offers most of the features of YACC. +However, CUP is written in Java, uses specifications including embedded +Java code, and produces parsers which are implemented in Java.

    + +Although this manual covers all aspects of the CUP system, it is relatively +brief, and assumes you have at least a little bit of knowledge of LR +parsing. A working knowledge of YACC is also very helpful in +understanding how CUP specifications work. +A number of compiler construction textbooks (such as +[2,3]) cover this material, +and discuss the YACC system (which is quite similar to this one) as a +specific example. In addition, Andrew Appel's Modern Compiler +Implementation in Java textbook [4] uses +and describes CUP in the context of compiler construction. +

    + +Using CUP involves creating a simple specification based on the +grammar for which a parser is needed, along with construction of a +scanner capable of breaking characters up into meaningful tokens (such +as keywords, numbers, and special symbols).

    + +As a simple example, consider a +system for evaluating simple arithmetic expressions over integers. +This system would read expressions from standard input (each terminated +with a semicolon), evaluate them, and print the result on standard output. +A grammar for the input to such a system might look like:

    +  expr_list ::= expr_list expr_part | expr_part
    +  expr_part ::= expr ';'
    +  expr      ::= expr '+' expr | expr '-' expr | expr '*' expr 
    +	      | expr '/' expr | expr '%' expr | '(' expr ')'  
    +              | '-' expr | number 
    +
    +To specify a parser based on this grammar, our first step is to identify and +name the set of terminal symbols that will appear on input, and the set of +non-terminal symbols. In this case, the non-terminals are: + +
      expr_list, expr_part  and  expr .
    + +For terminal names we might choose: + +
      SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD, NUMBER, LPAREN,
    +and RPAREN
    + +The experienced user will note a problem with the above grammar. It is +ambiguous. An ambiguous grammar is a grammar which, given a certain +input, can reduce the parts of the input in two different ways such as +to give two different answers. Take the above grammar, for +example. given the following input:
    +3 + 4 * 6
    +The grammar can either evaluate the 3 + 4 and then multiply +seven by six, or it can evaluate 4 * 6 and then add three. +Older versions of CUP forced the user to write unambiguous grammars, but +now there is a construct allowing the user to specify precedences and +associativities for terminals. This means that the above ambiguous +grammar can be used, after specifying precedences and associativities. +There is more explanation later. + +Based on these namings we can construct a small CUP specification +as follows:
    +
    +
    // CUP specification for a simple expression evaluator (no actions)
    +
    +import java_cup.runtime.*;
    +
    +/* Preliminaries to set up and use the scanner.  */
    +init with {: scanner.init();              :};
    +scan with {: return scanner.next_token(); :};
    +
    +/* Terminals (tokens returned by the scanner). */
    +terminal            SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;
    +terminal            UMINUS, LPAREN, RPAREN;
    +terminal Integer    NUMBER;
    +
    +/* Non terminals */
    +non terminal            expr_list, expr_part;
    +non terminal Integer    expr, term, factor;
    +
    +/* Precedences */
    +precedence left PLUS, MINUS;
    +precedence left TIMES, DIVIDE, MOD;
    +precedence left UMINUS;
    +
    +/* The grammar */
    +expr_list ::= expr_list expr_part | 
    +              expr_part;
    +expr_part ::= expr SEMI;
    +expr      ::= expr PLUS expr 
    +            | expr MINUS expr  
    +            | expr TIMES expr  
    +            | expr DIVIDE expr  
    +            | expr MOD expr 
    +	    | MINUS expr %prec UMINUS
    +            | LPAREN expr RPAREN
    +	    | NUMBER
    +	    ;
    +
    +

    +We will consider each part of the specification syntax in detail later. +However, here we can quickly see that the specification contains four +main parts. The first part provides preliminary and miscellaneous declarations +to specify how the parser is to be generated, and supply parts of the +runtime code. In this case we indicate that the java_cup.runtime +classes should be imported, then supply a small bit of initialization code, +and some code for invoking the scanner to retrieve the next input token. +The second part of the specification declares terminals and non-terminals, +and associates object classes with each. In this case, the terminals +are declared as either with no type, or of type +Integer. The specified type of the +terminal or non-terminal is the type of the value of those terminals or +non-terminals. If no type is specified, the terminal or non-terminal +carries no value. Here, no type indicates that these +terminals and non-terminals hold no value. +The third part specifies the precedence and +associativity of terminals. The last precedence declaration give its +terminals the highest precedence. The final +part of the specification contains the grammar.

    + + +To produce a parser from this specification we use the CUP generator. +If this specification were stored in a file parser.cup, then +(on a Unix system at least) we might invoke CUP using a command like: +

     java java_cup.Main < parser.cup 
    +or (starting with CUP 0.10k): +
     java java_cup.Main parser.cup 
    +The system will produce two Java source files containing +parts of the generated parser: sym.java and parser.java +(these names can be changed with command-line options; see +below). +As you might expect, these two files contain declarations for the classes +sym and parser. The sym class contains a series of +constant declarations, one for each terminal symbol. This is typically used +by the scanner to refer to symbols (e.g. with code such as +"return new Symbol(sym.SEMI);" ). The parser class +implements the parser itself.

    + +The specification above, while constructing a full parser, does not perform +any semantic actions &emdash; it will only indicate success or failure of a parse. +To calculate and print values of each expression, we must embed Java +code within the parser to carry out actions at various points. In CUP, +actions are contained in code strings which are surrounded by delimiters +of the form {: and :} (we can see examples of this in the +init with and scan with clauses above). In general, the +system records all characters within the delimiters, but does not try to check +that it contains valid Java code.

    + +A more complete CUP specification for our example system (with actions +embedded at various points in the grammar) is shown below:
    +


    +
    // CUP specification for a simple expression evaluator (w/ actions)
    +
    +import java_cup.runtime.*;
    +
    +/* Preliminaries to set up and use the scanner.  */
    +init with {: scanner.init();              :};
    +scan with {: return scanner.next_token(); :};
    +
    +/* Terminals (tokens returned by the scanner). */
    +terminal           SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;
    +terminal           UMINUS, LPAREN, RPAREN;
    +terminal Integer   NUMBER;
    +
    +/* Non-terminals */
    +non terminal            expr_list, expr_part;
    +non terminal Integer    expr;
    +
    +/* Precedences */
    +precedence left PLUS, MINUS;
    +precedence left TIMES, DIVIDE, MOD;
    +precedence left UMINUS;
    +
    +/* The grammar */
    +expr_list ::= expr_list expr_part 
    +	      | 
    +              expr_part;
    +
    +expr_part ::= expr:e 
    +	      {: System.out.println("= " + e); :} 
    +              SEMI              
    +	      ;
    +
    +expr      ::= expr:e1 PLUS expr:e2    
    +	      {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} 
    +	      | 
    +              expr:e1 MINUS expr:e2    
    +              {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} 
    +	      | 
    +              expr:e1 TIMES expr:e2 
    +	      {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} 
    +	      | 
    +              expr:e1 DIVIDE expr:e2 
    +	      {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} 
    +	      | 
    +              expr:e1 MOD expr:e2 
    +	      {: RESULT = new Integer(e1.intValue() % e2.intValue()); :} 
    +	      | 
    +              NUMBER:n                 
    +	      {: RESULT = n; :} 
    +	      | 
    +              MINUS expr:e             
    +	      {: RESULT = new Integer(0 - e.intValue()); :} 
    +	      %prec UMINUS
    +	      | 
    +              LPAREN expr:e RPAREN     
    +	      {: RESULT = e; :} 
    +	      ;
    +
    +

    +Here we can see several changes. Most importantly, code to be executed at +various points in the parse is included inside code strings delimited by +{: and :}. In addition, labels have been placed on various +symbols in the right hand side of productions. For example in:
    +
      expr:e1 PLUS expr:e2    
    +	{: RESULT = new Integer(e1.intValue() + e2.intValue()); :} 
    +
    + +the first non-terminal expr has been labeled with e1, and +the second with e2. The left hand side value +of each production is always implicitly labeled as RESULT.

    + +Each symbol appearing in a production is represented at runtime by an +object of type Symbol on the parse stack. The labels refer to +the instance variable value in those objects. In the +expression expr:e1 PLUS expr:e2, e1 and e2 +refer to objects of type Integer. These objects are in the value fields +of the objects of type Symbol representing those non-terminals +on the parse stack. RESULT is of type Integer as +well, since the resulting non-terminal expr was declared as of +type Integer. This object becomes the value instance +variable of a new Symbol object.

    + +For each label, two more variables accessible to the user are declared. +A left and right value labels are passed to the code string, so that the +user can find out where the left and right side of each terminal or +non-terminal is in the input stream. The name of these variables is the +label name, plus left or right. for example, given +the right hand side of a production expr:e1 PLUS expr:e2 the +user could not only access variables e1 and e2, but +also e1left, e1right, e2left and e2right. these +variables are of type int.

    + + + +The final step in creating a working parser is to create a scanner (also +known as a lexical analyzer or simply a lexer). This routine is +responsible for reading individual characters, removing things things like +white space and comments, recognizing which terminal symbols from the +grammar each group of characters represents, then returning Symbol objects +representing these symbols to the parser. +The terminals will be retrieved with a call to the +scanner function. In the example, the parser will call +scanner.next_token(). The scanner should return objects of +type java_cup.runtime.Symbol. This type is very different than +older versions of CUP's java_cup.runtime.symbol. These Symbol +objects contains the instance variable value of type Object, +which should be +set by the lexer. This variable refers to the value of that symbol, and +the type of object in value should be of the same type as declared in +the terminal and non terminal declarations. In the +above example, if the lexer wished to pass a NUMBER token, it should +create a Symbol with the value instance variable +filled with an object of type Integer. Symbol +objects corresponding to terminals and non-terminals with no value +have a null value field.

    + +The code contained in the init with clause of the specification +will be executed before any tokens are requested. Each token will be +requested using whatever code is found in the scan with clause. +Beyond this, the exact form the scanner takes is up to you; however +note that each call to the scanner function should return a new +instance of java_cup.runtime.Symbol (or a subclass). +These symbol objects are annotated with parser information and pushed +onto a stack; reusing objects will result in the parser annotations +being scrambled. As of CUP 0.10j, Symbol reuse should be +detected if it occurs; the parser will throw an Error +telling you to fix your scanner.

    + +In the next section a more detailed and formal +explanation of all parts of a CUP specification will be given. +Section 3 describes options for running the +CUP system. Section 4 discusses the details +of how to customize a CUP parser, while section 5 +discusses the scanner interface added in CUP 0.10j. Section + 6 considers error recovery. Finally, Section 7 +provides a conclusion. + + +

    2. Specification Syntax

    +Now that we have seen a small example, we present a complete description of all +parts of a CUP specification. A specification has four sections with +a total of eight specific parts (however, most of these are optional). +A specification consists of: + +Each of these parts must appear in the order presented here. (A complete +grammar for the specification language is given in +Appendix A.) The particulars of each part of +the specification are described in the subsections below.

    + +

    Package and Import Specifications
    + +A specification begins with optional package and import +declarations. These have the same syntax, and play the same +role, as the package and import declarations found in a normal Java program. +A package declaration is of the form: + +
        package name;
    + +where name name is a Java package identifier, possibly in +several parts separated by ".". In general, CUP employs Java lexical +conventions. So for example, both styles of Java comments are supported, +and identifiers are constructed beginning with a letter, dollar +sign ($), or underscore (_), which can then be followed by zero or more +letters, numbers, dollar signs, and underscores.

    + +After an optional package declaration, there can be zero or more +import declarations. As in a Java program these have the form: + +

        import package_name.class_name;
    +
    +or +
        import package_name.*;
    +
    + +The package declaration indicates what package the sym and +parser classes that are generated by the system will be in. +Any import declarations that appear in the specification will also appear +in the source file for the parser class allowing various names from +that package to be used directly in user supplied action code. + +
    User Code Components
    + +Following the optional package and import declarations +are a series of optional declarations that allow user code to be included +as part of the generated parser (see Section 4 for a +full description of how the parser uses this code). As a part of the parser +file, a separate non-public class to contain all embedded user actions is +produced. The first action code declaration section allows code to +be included in this class. Routines and variables for use by the code +embedded in the grammar would normally be placed in this section (a typical +example might be symbol table manipulation routines). This declaration takes +the form: + +
        action code {: ... :};
    +
    + +where {: ... :} is a code string whose contents will be placed +directly within the action class class declaration.

    + +After the action code declaration is an optional +parser code declaration. This declaration allows methods and +variable to be placed directly within the generated parser class. +Although this is less common, it can be helpful when customizing the +parser &emdash; it is possible for example, to include scanning methods inside +the parser and/or override the default error reporting routines. This +declaration is very similar to the action code declaration and +takes the form: + +

        parser code {: ... :};
    +
    + +Again, code from the code string is placed directly into the generated parser +class definition.

    + +Next in the specification is the optional init declaration +which has the form: + +

        init with {: ... :};
    + +This declaration provides code that will be executed by the parser +before it asks for the first token. Typically, this is used to initialize +the scanner as well as various tables and other data structures that might +be needed by semantic actions. In this case, the code given in the code +string forms the body of a void method inside the parser +class.

    + +The final (optional) user code section of the specification indicates how +the parser should ask for the next token from the scanner. This has the +form: + +

        scan with {: ... :};
    + +As with the init clause, the contents of the code string forms +the body of a method in the generated parser. However, in this case +the method returns an object of type java_cup.runtime.Symbol. +Consequently the code found in the scan with clause should +return such a value. See section 5 for +information on the default behavior if the scan with +section is omitted.

    + +As of CUP 0.10j the action code, parser code, init code, and scan with +sections may appear in any order. They must, however, precede the +symbol lists.

    + +

    Symbol Lists
    + +Following user supplied code comes the first required part of the +specification: the symbol lists. These declarations are responsible +for naming and supplying a type for each terminal and non-terminal +symbol that appears in the grammar. As indicated above, each terminal +and non-terminal symbol is represented at runtime with a Symbol +object. In +the case of terminals, these are returned by the scanner and placed on +the parse stack. The lexer should put the value of the terminal in the +value instance variable. +In the case of non-terminals these replace a series +of Symbol objects on the parse stack whenever the right hand side of +some production is recognized. In order to tell the parser which object +types should be used for which symbol, terminal and +non terminal declarations are used. These take the forms: + +
        terminal classname name1, name2, ...;
    +    non terminal classname name1, name2, ...;
    +    terminal name1, name2, ...;
    +
    + +and + +
        non terminal name1, name2, ...;
    +
    + +where classname can be a multiple part name separated with +"."s. The +classname specified represents the type of the value of +that terminal or non-terminal. When accessing these values through +labels, the users uses the type declared. the classname +can be of any type. If no classname is given, then the +terminal or non-terminal holds no value. a label referring to such a +symbol with have a null value. As of CUP 0.10j, you may specify +non-terminals the declaration "nonterminal" (note, no +space) as well as the original "non terminal" spelling.

    + +Names of terminals and non-terminals cannot be CUP reserved words; +these include "code", "action", "parser", "terminal", "non", +"nonterminal", "init", "scan", "with", "start", "precedence", "left", +"right", "nonassoc", "import", and "package".

    + +

    Precedence and Associativity declarations
    + +The third section, which is optional, specifies the precedences and +associativity of terminals. This is useful for parsing with ambiguous +grammars, as done in the example above. There are three type of +precedence/associativity declarations: +
    
    +	precedence left     terminal[, terminal...];
    +	precedence right    terminal[, terminal...];
    +	precedence nonassoc terminal[, terminal...];
    +
    + +The comma separated list indicates that those terminals should have the +associativity specified at that precedence level and the precedence of +that declaration. The order of precedence, from highest to lowest, is +bottom to top. Hence, this declares that multiplication and division have +higher precedence than addition and subtraction: +
    
    +	precedence left  ADD, SUBTRACT;
    +	precedence left  TIMES, DIVIDE;
    +
    +Precedence resolves shift reduce problems. For example, given the input +to the above example parser 3 + 4 * 8, the parser doesn't know +whether to reduce 3 + 4 or shift the '*' onto the stack. +However, since '*' has a higher precedence than '+', it will be shifted +and the multiplication will be performed before the addition.

    + +CUP assigns each one of its terminals a precedence according to these +declarations. Any terminals not in this declaration have lowest +precedence. CUP also assigns each of its productions a precedence. +That precedence is equal to the precedence of the last terminal in that +production. If the production has no terminals, then it has lowest +precedence. For example, expr ::= expr TIMES expr would have +the same precedence as TIMES. When there is a shift/reduce +conflict, the parser determines whether the terminal to be shifted has a +higher precedence, or if the production to reduce by does. If the +terminal has higher precedence, it it shifted, if the production has +higher precedence, a reduce is performed. If they have equal +precedence, associativity of the terminal determine what happens.

    + +An associativity is assigned to each terminal used in the +precedence/associativity declarations. The three associativities are +left, right and nonassoc Associativities are also +used to resolve shift/reduce conflicts, but only in the case of equal +precedences. If the associativity of the terminal that can be shifted +is left, then a reduce is performed. This means, if the input +is a string of additions, like 3 + 4 + 5 + 6 + 7, the parser +will always reduce them from left to right, in this case, +starting with 3 + 4. If the associativity of the terminal is +right, it is shifted onto the stack. hence, the reductions +will take place from right to left. So, if PLUS were declared with +associativity of right, the 6 + 7 would be reduced +first in the above string. If a terminal is declared as +nonassoc, then two consecutive occurrences of equal precedence +non-associative terminals generates an error. This is useful for +comparison operations. For example, if the input string is +6 == 7 == 8 == 9, the parser should generate an error. If '==' +is declared as nonassoc then an error will be generated.

    + +All terminals not used in the precedence/associativity declarations are +treated as lowest precedence. If a shift/reduce error results, +involving two such terminals, it cannot be resolved, as the above +conflicts are, so it will be reported.

    + +

    The Grammar
    + +The final section of a CUP declaration provides the grammar. This +section optionally starts with a declaration of the form: + +
        start with non-terminal;
    +
    + +This indicates which non-terminal is the start or goal +non-terminal for parsing. If a start non-terminal is not explicitly +declared, then the non-terminal on the left hand side of the first +production will be used. At the end of a successful parse, CUP returns +an object of type java_cup.runtime.Symbol. This +Symbol's value instance variable contains the final reduction +result.

    + +The grammar itself follows the optional start declaration. Each +production in the grammar has a left hand side non-terminal followed by +the symbol "::=", which is then followed by a series of zero or more +actions, terminal, or non-terminal +symbols, followed by an optional contextual precedence assignment, +and terminated with a semicolon (;).

    + + + +Each symbol on the right hand side can optionally be labeled with a name. +Label names appear after the symbol name separated by a colon (:). Label +names must be unique within the production, and can be used within action +code to refer to the value of the symbol. Along with the label, two +more variables are created, which are the label plus left and +the label plus right. These are int values that +contain the right and left locations of what the terminal or +non-terminal covers in the input file. These values must be properly +initialized in the terminals by the lexer. The left and right values +then propagate to non-terminals to which productions reduce.

    + +If there are several productions for the same non-terminal they may be +declared together. In this case the productions start with the non-terminal +and "::=". This is followed by multiple right hand sides each +separated by a bar (|). The full set of productions is then terminated by a +semicolon.

    + +Actions appear in the right hand side as code strings (e.g., Java code inside +{: ... :} delimiters). These are executed by the parser +at the point when the portion of the production to the left of the +action has been recognized. (Note that the scanner will have returned the +token one past the point of the action since the parser needs this extra +lookahead token for recognition.)

    + + + +Contextual precedence assignments follow all the symbols and actions of +the right hand side of the production whose precedence it is assigning. +Contextual precedence assignment allows a production to be assigned a +precedence not based on the last terminal in it. A good example is +shown in the above sample parser specification: + +

    
    +	precedence left PLUS, MINUS;
    +	precedence left TIMES, DIVIDE, MOD;
    +	precedence left UMINUS;
    +
    +	expr ::=  MINUS expr:e             
    +	          {: RESULT = new Integer(0 - e.intValue()); :} 
    +	          %prec UMINUS
    +
    + +Here, there production is declared as having the precedence of UMINUS. +Hence, the parser can give the MINUS sign two different precedences, +depending on whether it is a unary minus or a subtraction operation. + + +

    3. Running CUP

    + +As mentioned above, CUP is written in Java. To invoke it, one needs +to use the Java interpreter to invoke the static method +java_cup.Main(), passing an array of strings containing options. +Assuming a Unix machine, the simplest way to do this is typically to invoke it +directly from the command line with a command such as: + +
        java java_cup.Main options < inputfile
    + +Once running, CUP expects to find a specification file on standard input +and produces two Java source files as output. Starting with CUP 0.10k, +the final command-line argument may be a filename, in which case the +specification will be read from that file instead of from standard input.

    + +In addition to the specification file, CUP's behavior can also be changed +by passing various options to it. Legal options are documented in +Main.java and include: +

    +
    -package name +
    Specify that the parser and sym classes are to be + placed in the named package. By default, no package specification + is put in the generated code (hence the classes default to the special + "unnamed" package). + +
    -parser name +
    Output parser and action code into a file (and class) with the given + name instead of the default of "parser". + +
    -symbols name +
    Output the symbol constant code into a class with the given + name instead of the default of "sym". + +
    -interface +
    Outputs the symbol constant code as an interface + rather than as a class. + +
    -nonterms +
    Place constants for non-terminals into the symbol constant class. + The parser does not need these symbol constants, so they are not normally + output. However, it can be very helpful to refer to these constants + when debugging a generated parser. + +
    -expect number +
    During parser construction the system may detect that an ambiguous + situation would occur at runtime. This is called a conflict. + In general, the parser may be unable to decide whether to shift + (read another symbol) or reduce (replace the recognized right + hand side of a production with its left hand side). This is called a + shift/reduce conflict. Similarly, the parser may not be able + to decide between reduction with two different productions. This is + called a reduce/reduce conflict. Normally, if one or more of + these conflicts occur, parser generation is aborted. However, in + certain carefully considered cases it may be advantageous to + arbitrarily break such a conflict. In this case CUP uses YACC + convention and resolves shift/reduce conflicts by shifting, and + reduce/reduce conflicts using the "highest priority" production (the + one declared first in the specification). In order to enable automatic + breaking of conflicts the -expect option must be given + indicating exactly how many conflicts are expected. Conflicts + resolved by precedences and associativities are not reported. + +
    -compact_red +
    Including this option enables a table compaction optimization involving + reductions. In particular, it allows the most common reduce entry in + each row of the parse action table to be used as the default for that + row. This typically saves considerable room in the tables, which can + grow to be very large. This optimization has the effect of replacing + all error entries in a row with the default reduce entry. While this + may sound dangerous, if not down right incorrect, it turns out that this + does not affect the correctness of the parser. In particular, some + changes of this type are inherent in LALR parsers (when compared to + canonical LR parsers), and the resulting parsers will still never + read past the first token at which the error could be detected. + The parser can, however, make extra erroneous reduces before detecting + the error, so this can degrade the parser's ability to do + error recovery. + (Refer to reference [2] pp. 244-247 or reference [3] pp. 190-194 for a + complete explanation of this compaction technique.)

    + + This option is typically used to work-around the java bytecode + limitations on table initialization code sizes. However, CUP + 0.10h introduced a string-encoding for the parser tables which + is not subject to the standard method-size limitations. + Consequently, use of this option should no longer be required + for large grammars. + +
    -nowarn +
    This options causes all warning messages (as opposed to error messages) + produced by the system to be suppressed. + +
    -nosummary +
    Normally, the system prints a summary listing such things as the + number of terminals, non-terminals, parse states, etc. at the end of + its run. This option suppresses that summary. + +
    -progress +
    This option causes the system to print short messages indicating its + progress through various parts of the parser generation process. + +
    -dump_grammar +
    -dump_states +
    -dump_tables +
    -dump +
    These options cause the system to produce a human readable dump of + the grammar, the constructed parse states (often needed to resolve + parse conflicts), and the parse tables (rarely needed), respectively. + The -dump option can be used to produce all of these dumps. + +
    -time +
    This option adds detailed timing statistics to the normal summary of + results. This is normally of great interest only to maintainers of + the system itself. + +
    -debug +
    This option produces voluminous internal debugging information about + the system as it runs. This is normally of interest only to maintainers + of the system itself. + +
    -nopositions +
    This option keeps CUP from generating code to propagate the left + and right hand values of terminals to non-terminals, and then from + non-terminals to other terminals. If the left and right values aren't + going to be used by the parser, then it will save some runtime + computation to not generate these position propagations. This option + also keeps the left and right label variables from being generated, so + any reference to these will cause an error. + +
    -noscanner +
    CUP 0.10j introduced improved scanner + integration and a new interface, + java_cup.runtime.Scanner. By default, the + generated parser refers to this interface, which means you cannot + use these parsers with CUP runtimes older than 0.10j. If your + parser does not use the new scanner integration features, then you + may specify the -noscanner option to suppress the + java_cup.runtime.Scanner references and allow + compatibility with old runtimes. Not many people should have reason + to do this. + +
    -version +
    Invoking CUP with the -version flag will cause it + to print out the working version of CUP and halt. This allows + automated CUP version checking for Makefiles, install scripts and + other applications which may require it. +
    + + +

    4. Customizing the Parser

    + +Each generated parser consists of three generated classes. The +sym class (which can be renamed using the -symbols +option) simply contains a series of int constants, +one for each terminal. Non-terminals are also included if the -nonterms +option is given. The source file for the parser class (which can +be renamed using the -parser option) actually contains two +class definitions, the public parser class that implements the +actual parser, and another non-public class (called CUP$action) which +encapsulates all user actions contained in the grammar, as well as code from +the action code declaration. In addition to user supplied code, this +class contains one method: CUP$do_action which consists of a large +switch statement for selecting and executing various fragments of user +supplied action code. In general, all names beginning with the prefix of +CUP$ are reserved for internal uses by CUP generated code.

    + +The parser class contains the actual generated parser. It is +a subclass of java_cup.runtime.lr_parser which implements a +general table driven framework for an LR parser. The generated parser +class provides a series of tables for use by the general framework. +Three tables are provided: +

    +
    the production table +
    provides the symbol number of the left hand side non-terminal, along with + the length of the right hand side, for each production in the grammar, +
    the action table +
    indicates what action (shift, reduce, or error) is to be taken on each + lookahead symbol when encountered in each state, and +
    the reduce-goto table +
    indicates which state to shift to after reduces (under each non-terminal +from each state). +
    +(Note that the action and reduce-goto tables are not stored as simple arrays, +but use a compacted "list" structure to save a significant amount of space. +See comments the runtime system source code for details.)

    + +Beyond the parse tables, generated (or inherited) code provides a series +of methods that can be used to customize the generated parser. Some of these +methods are supplied by code found in part of the specification and can +be customized directly in that fashion. The others are provided by the +lr_parser base class and can be overridden with new versions (via +the parser code declaration) to customize the system. Methods +available for customization include: +

    +
    public void user_init() +
    This method is called by the parser prior to asking for the first token + from the scanner. The body of this method contains the code from the + init with clause of the the specification. +
    public java_cup.runtime.Symbol scan() +
    This method encapsulates the scanner and is called each time a new + terminal is needed by the parser. The body of this method is + supplied by the scan with clause of the specification, if + present; otherwise it returns getScanner().next_token(). +
    public java_cup.runtime.Scanner getScanner() +
    Returns the default scanner. See section 5. +
    public void setScanner(java_cup.runtime.Scanner s) +
    Sets the default scanner. See section 5. +
    public void report_error(String message, Object info) +
    This method should be called whenever an error message is to be issued. In + the default implementation of this method, the first parameter provides + the text of a message which is printed on System.err + and the second parameter is simply ignored. It is very typical to + override this method in order to provide a more sophisticated error + reporting mechanism. +
    public void report_fatal_error(String message, Object info) +
    This method should be called whenever a non-recoverable error occurs. It + responds by calling report_error(), then aborts parsing + by calling the parser method done_parsing(), and finally + throws an exception. (In general done_parsing() should be called + at any point that parsing needs to be terminated early). +
    public void syntax_error(Symbol cur_token) +
    This method is called by the parser as soon as a syntax error is detected + (but before error recovery is attempted). In the default implementation it + calls: report_error("Syntax error", null);. +
    public void unrecovered_syntax_error(Symbol cur_token) +
    This method is called by the parser if it is unable to recover from a + syntax error. In the default implementation it calls: + report_fatal_error("Couldn't repair and continue parse", null);. +
    protected int error_sync_size() +
    This method is called by the parser to determine how many tokens it must + successfully parse in order to consider an error recovery successful. + The default implementation returns 3. Values below 2 are not recommended. + See the section on error recovery for details. +
    + +Parsing itself is performed by the method public Symbol parse(). +This method starts by getting references to each of the parse tables, +then initializes a CUP$action object (by calling +protected void init_actions()). Next it calls user_init(), +then fetches the first lookahead token with a call to scan(). +Finally, it begins parsing. Parsing continues until done_parsing() +is called (this is done automatically, for example, when the parser +accepts). It then returns a Symbol with the value +instance variable containing the RESULT of the start production, or +null, if there is no value.

    + +In addition to the normal parser, the runtime system also provides a debugging +version of the parser. This operates in exactly the same way as the normal +parser, but prints debugging messages (by calling +public void debug_message(String mess) whose default implementation +prints a message to System.err).

    + +Based on these routines, invocation of a CUP parser is typically done +with code such as: +

    +      /* create a parsing object */
    +      parser parser_obj = new parser();
    +
    +      /* open input files, etc. here */
    +      Symbol parse_tree = null;
    +
    +      try {
    +        if (do_debug_parse)
    +          parse_tree = parser_obj.debug_parse();
    +        else
    +          parse_tree = parser_obj.parse();
    +      } catch (Exception e) {
    +        /* do cleanup here - - possibly rethrow e */
    +      } finally {
    +	/* do close out here */
    +      }
    +
    + + +

    5. Scanner Interface

    + +In CUP 0.10j scanner integration was improved according to +suggestions made by David MacMahon. +The changes make it easier to incorporate JLex and other +automatically-generated scanners into CUP parsers.

    + +To use the new code, your scanner should implement the +java_cup.runtime.Scanner interface, defined as: +

    +package java_cup.runtime;
    +
    +public interface Scanner {
    +    public Symbol next_token() throws java.lang.Exception;
    +}
    +

    + +In addition to the methods described in section +4, the java_cup.runtime.lr_parser class has two new +accessor methods, setScanner() and getScanner(). +The default implementation of scan() +is: +

    +  public Symbol scan() throws java.lang.Exception {
    +    Symbol sym = getScanner().next_token();
    +    return (sym!=null) ? sym : new Symbol(EOF_sym());
    +  }
    +

    +The generated parser also contains a constructor which takes a +Scanner and calls setScanner() with it. In +most cases, then, the init with and scan +with directives may be omitted. You can simply create the +parser with a reference to the desired scanner: +

    +      /* create a parsing object */
    +      parser parser_obj = new parser(new my_scanner());
    +
    +or set the scanner after the parser is created: +
    +      /* create a parsing object */
    +      parser parser_obj = new parser();
    +      /* set the default scanner */
    +      parser_obj.setScanner(new my_scanner());
    +

    +Note that because the parser uses look-ahead, resetting the scanner in +the middle of a parse is not recommended. If you attempt to use the +default implementation of scan() without first calling +setScanner(), a NullPointerException will be +thrown.

    +As an example of scanner integration, the following three lines in the +lexer-generator input are all that is required to use a +JLex +or JFlex +scanner with CUP: +

    +%implements java_cup.runtime.Scanner
    +%function next_token
    +%type java_cup.runtime.Symbol
    +
    +The JLex directive %cup +abbreviates the above three directive in JLex versions 1.2.5 and above. +Invoking the parser with the JLex scanner is then simply: +
    +parser parser_obj = new parser( new Yylex( some_InputStream_or_Reader));
    +

    + +Note CUP handles the JLex/JFlex convention of returning null on EOF +without a problem, so an %eofval directive is not +required in the JLex specification (this feature was added in CUP 0.10k). + +The simple_calc example in the CUP distribution illustrates the use of +the scanner integration features with a hand-coded scanner. +The CUP website has a minimal CUP/JLex integration example for study.

    + + +

    6. Error Recovery

    + +A final important aspect of building parsers with CUP is +support for syntactic error recovery. CUP uses the same +error recovery mechanisms as YACC. In particular, it supports +a special error symbol (denoted simply as error). +This symbol plays the role of a special non-terminal which, instead of +being defined by productions, instead matches an erroneous input +sequence.

    + +The error symbol only comes into play if a syntax error is +detected. If a syntax error is detected then the parser tries to replace +some portion of the input token stream with error and then +continue parsing. For example, we might have productions such as: + +

        stmt ::= expr SEMI | while_stmt SEMI | if_stmt SEMI | ... |
    +	     error SEMI
    +	     ;
    + +This indicates that if none of the normal productions for stmt can +be matched by the input, then a syntax error should be declared, and recovery +should be made by skipping erroneous tokens (equivalent to matching and +replacing them with error) up to a point at which the parse can +be continued with a semicolon (and additional context that legally follows a +statement). An error is considered to be recovered from if and only if a +sufficient number of tokens past the error symbol can be successfully +parsed. (The number of tokens required is determined by the +error_sync_size() method of the parser and defaults to 3).

    + +Specifically, the parser first looks for the closest state to the top +of the parse stack that has an outgoing transition under +error. This generally corresponds to working from +productions that represent more detailed constructs (such as a specific +kind of statement) up to productions that represent more general or +enclosing constructs (such as the general production for all +statements or a production representing a whole section of declarations) +until we get to a place where an error recovery production +has been provided for. Once the parser is placed into a configuration +that has an immediate error recovery (by popping the stack to the first +such state), the parser begins skipping tokens to find a point at +which the parse can be continued. After discarding each token, the +parser attempts to parse ahead in the input (without executing any +embedded semantic actions). If the parser can successfully parse past +the required number of tokens, then the input is backed up to the point +of recovery and the parse is resumed normally (executing all actions). +If the parse cannot be continued far enough, then another token is +discarded and the parser again tries to parse ahead. If the end of +input is reached without making a successful recovery (or there was no +suitable error recovery state found on the parse stack to begin with) +then error recovery fails. + + +

    7. Conclusion

    + +This manual has briefly described the CUP LALR parser generation system. +CUP is designed to fill the same role as the well known YACC parser +generator system, but is written in and operates entirely with Java code +rather than C or C++. Additional details on the operation of the system can +be found in the parser generator and runtime source code. See the CUP +home page below for access to the API documentation for the system and its +runtime.

    + +This document covers version 0.10j of the system. Check the CUP home +page: + +http://www.cs.princeton.edu/~appel/modern/java/CUP/ +for the latest release information, instructions for downloading the +system, and additional news about CUP. Bug reports and other +comments for the developers should be sent to the CUP maintainer, +C. Scott Ananian, at + +cananian@alumni.princeton.edu

    + +CUP was originally written by + +Scott Hudson, in August of 1995.

    + +It was extended to support precedence by + +Frank Flannery, in July of 1996.

    + +On-going improvements have been done by + +C. Scott Ananian, the CUP maintainer, from December of 1997 to the +present.

    + + +

    References

    +
    + +
    [1] +
    S. C. Johnson, +"YACC &emdash; Yet Another Compiler Compiler", +CS Technical Report #32, +Bell Telephone Laboratories, +Murray Hill, NJ, +1975. + +
    [2] +
    A. Aho, R. Sethi, and J. Ullman, +Compilers: Principles, Techniques, and Tools, +Addison-Wesley Publishing, +Reading, MA, +1986. + +
    [3] +
    C. Fischer, and R. LeBlanc, +Crafting a Compiler with C, +Benjamin/Cummings Publishing, +Redwood City, CA, +1991. + +
    [4] +
    Andrew W. Appel, +Modern Compiler Implementation in Java, +Cambridge University Press, +New York, NY, +1998. + +
    + +

    +Appendix A. Grammar for CUP Specification Files (0.10j)

    +

    +
    java_cup_spec      ::= package_spec import_list code_parts
    +		       symbol_list precedence_list start_spec 
    +		       production_list
    +package_spec       ::= PACKAGE multipart_id SEMI | empty
    +import_list        ::= import_list import_spec | empty
    +import_spec        ::= IMPORT import_id SEMI
    +code_part          ::= action_code_part | parser_code_part |
    +                       init_code | scan_code
    +code_parts         ::= code_parts code_part | empty
    +action_code_part   ::= ACTION CODE CODE_STRING opt_semi
    +parser_code_part   ::= PARSER CODE CODE_STRING opt_semi
    +init_code          ::= INIT WITH CODE_STRING opt_semi
    +scan_code          ::= SCAN WITH CODE_STRING opt_semi
    +symbol_list        ::= symbol_list symbol | symbol
    +symbol             ::= TERMINAL type_id declares_term |
    +                       NON TERMINAL type_id declares_non_term |
    +		       NONTERMINAL type_id declares_non_term |
    +		       TERMINAL declares_term |
    +		       NON TERMINAL declares_non_term |
    +		       NONTERMIANL declared_non_term
    +term_name_list     ::= term_name_list COMMA new_term_id | new_term_id
    +non_term_name_list ::= non_term_name_list COMMA new_non_term_id |
    +	               new_non_term_id
    +declares_term      ::= term_name_list SEMI
    +declares_non_term  ::= non_term_name_list SEMI
    +precedence_list    ::= precedence_l | empty
    +precedence_l       ::= precedence_l preced + preced;
    +preced             ::= PRECEDENCE LEFT terminal_list SEMI
    +	               | PRECEDENCE RIGHT terminal_list SEMI
    +	               | PRECEDENCE NONASSOC terminal_list SEMI
    +terminal_list      ::= terminal_list COMMA terminal_id | terminal_id 
    +start_spec         ::= START WITH nt_id SEMI | empty
    +production_list    ::= production_list production | production
    +production         ::= nt_id COLON_COLON_EQUALS rhs_list SEMI
    +rhs_list           ::= rhs_list BAR rhs | rhs
    +rhs                ::= prod_part_list PERCENT_PREC term_id |
    +                       prod_part_list
    +prod_part_list     ::= prod_part_list prod_part | empty
    +prod_part          ::= symbol_id opt_label | CODE_STRING
    +opt_label          ::= COLON label_id | empty
    +multipart_id       ::= multipart_id DOT ID | ID
    +import_id          ::= multipart_id DOT STAR | multipart_id
    +type_id            ::= multipart_id
    +terminal_id        ::= term_id
    +term_id            ::= symbol_id
    +new_term_id        ::= ID
    +new_non_term_id    ::= ID
    +nt_id              ::= ID
    +symbol_id          ::= ID
    +label_id           ::= ID
    +opt_semi	   ::= SEMI | empty
    +
    +
    +

    + +

    Appendix B. A Very Simple Example Scanner

    +

    +
    +// Simple Example Scanner Class
    +
    +import java_cup.runtime.*;
    +import sym;
    +
    +public class scanner {
    +  /* single lookahead character */
    +  protected static int next_char;
    +
    +  /* advance input by one character */
    +  protected static void advance()
    +    throws java.io.IOException
    +    { next_char = System.in.read(); }
    +
    +  /* initialize the scanner */
    +  public static void init()
    +    throws java.io.IOException
    +    { advance(); }
    +
    +  /* recognize and return the next complete token */
    +  public static Symbol next_token()
    +    throws java.io.IOException
    +    {
    +      for (;;)
    +        switch (next_char)
    +	  {
    +	    case '0': case '1': case '2': case '3': case '4': 
    +	    case '5': case '6': case '7': case '8': case '9': 
    +	      /* parse a decimal integer */
    +	      int i_val = 0;
    +	      do {
    +	        i_val = i_val * 10 + (next_char - '0');
    +	        advance();
    +	      } while (next_char >= '0' && next_char <= '9');
    +	    return new Symbol(sym.NUMBER, new Integer(i_val));
    +
    +	    case ';': advance(); return new Symbol(sym.SEMI);
    +	    case '+': advance(); return new Symbol(sym.PLUS);
    +	    case '-': advance(); return new Symbol(sym.MINUS);
    +	    case '*': advance(); return new Symbol(sym.TIMES);
    +	    case '/': advance(); return new Symbol(sym.DIVIDE);
    +	    case '%': advance(); return new Symbol(sym.MOD);
    +	    case '(': advance(); return new Symbol(sym.LPAREN);
    +	    case ')': advance(); return new Symbol(sym.RPAREN);
    +
    +	    case -1: return new Symbol(sym.EOF);
    +
    +	    default: 
    +	      /* in this simple scanner we just ignore everything else */
    +	      advance();
    +	    break;
    +	  }
    +    }
    +};
    +
    + + +
    +

    Appendix C: Incompatibilites between CUP 0.9 and CUP 0.10

    + +CUP version 0.10a is a major overhaul of CUP. The changes are severe, +meaning no backwards compatibility to older versions. + +The changes consist of: + + +
    Lexical Interface
    + +CUP now interfaces with the lexer in a completely different +manner. In the previous releases, a new class was used for every +distinct type of terminal. This release, however, uses only one class: +The Symbol class. The Symbol class has three instance +variables which +are significant to the parser when passing information from the lexer. +The first is the value instance variable. This variable +contains the +value of that terminal. It is of the type declared as the terminal type +in the parser specification file. The second two are the instance +variables left and right. They should be filled with +the int value of +where in the input file, character-wise, that terminal was found.

    + +For more information, refer to the manual on scanners. + +

    Terminal/Non-Terminal Declarations
    + +Terminal and non-terminal declarations now can be declared in two +different ways to indicate the values of the terminals or +non-terminals. The previous declarations of the form + +
    
    +terminal classname terminal [, terminal ...];
    +
    + +still works. The classname, however indicates the type of the value of +the terminal or non-terminal, and does not indicate the type of object +placed on the parse stack. + +A declaration, such as: + +
    
    +terminal terminal [, terminal ...];
    +
    + +indicates the terminals in the list hold no value.

    + +For more information, refer to the manual on declarations. + +

    Label References
    + +Label references do not refer to the object on the parse stack, as in +the old CUP, but rather to the value of the value +instance variable of +the Symbol that represents that terminal or non-terminal. Hence, +references to terminal and non-terminal values is direct, as opposed to +the old CUP, where the labels referred to objects containing the value +of the terminal or non-terminal.

    + +For more information, refer to the manual on labels. + +

    RESULT Value
    + +The RESULT variable refers directly to the value of the +non-terminal +to which a rule reduces, rather than to the object on the parse stack. +Hence, RESULT is of the same type the non-terminal to which +it reduces, +as declared in the non-terminal declaration. Again, the reference is +direct, rather than to something that will contain the data.

    + +For more information, refer to the manual on RESULT. + +

    Position Propagation
    + +For every label, two more variables are declared, which are the label +plus left or the label plus right. These correspond +to the left and +right locations in the input stream to which that terminal or +non-terminal came from. These values are propagated from the input +terminals, so that the starting non-terminal should have a left value of +0 and a right value of the location of the last character read.

    + +For more information, refer to the manual on positions. + +

    Return Value
    + +A call to parse() or debug_parse() returns a +Symbol. This Symbol is the start non-terminal, so the value +instance variable contains the final RESULT assignment. + +
    Precedence
    + +CUP now has precedenced terminals. a new declaration section, +occurring between the terminal and non-terminal declarations and the +grammar specifies the precedence and associativity of rules. The +declarations are of the form: + +
    
    +precedence {left| right | nonassoc} terminal[, terminal ...];
    +...
    +
    +
    + +The terminals are assigned a precedence, where terminals on the same +line have equal precedences, and the precedence declarations farther +down the list of precedence declarations have higher precedence. +left, right and nonassoc specify the associativity +of these terminals. left +associativity corresponds to a reduce on conflict, right to a shift on +conflict, and nonassoc to an error on conflict. Hence, ambiguous +grammars may now be used.

    + +For more information, refer to the manual on precedence. + +

    Contextual Precedence
    + +Finally the new CUP adds contextual precedence. A production may be +declare as followed: + +
    
    +lhs ::= {right hand side list of terminals, non-terminals and actions}
    +        %prec {terminal};
    +
    + +this production would then have a precedence equal to the terminal +specified after the %prec. Hence, shift/reduce conflicts can be +contextually resolved. Note that the %prec terminal +part comes after all actions strings. It does not come before the +last action string.

    + +For more information, refer to the manual on contextual +precedence. + +These changes implemented by: +

    +Frank Flannery
    +Department of Computer Science
    +Princeton University
    +

    + +

    Appendix D: Bugs

    +In this version of CUP it's difficult for the semantic action phrases (Java code attached +to productions) to access the report_error method and other similar methods and +objects defined in the parser code directive. +

    +This is because the parsing tables (and parsing engine) are in one object (belonging to +class parser or whatever name is specified by the -parser directive), +and the semantic actions are in another object (of class CUP$actions). +

    +However, there is a way to do it, though it's a bit inelegant. +The action object has a private final field named +parser that points to the parsing object. Thus, +methods and instance variables of the parser can be accessed within semantic actions as: +

    +    parser.report_error(message,info);
    +    x = parser.mydata;
    +
    +

    +Perhaps this will not be necessary in a future release, and that +such methods and variables as report_error and +mydata will be available +directly from the semantic actions; we will achieve this by combining the +"parser" object and the "actions" object together. + +

    +For a list of any other currently known bugs in CUP, see + +http://www.cs.princeton.edu/~appel/modern/java/CUP/bugs.html. + + +

    Appendix E: Change log

    + +
    +
    0.9e
    March 1996, Scott Hudson's original version. +
    0.10a
    August 1996, several major changes to +the interface. +
    0.10b
    November 1996, fixes a few minor bugs. +
    0.10c
    July 1997, fixes a bug related to precedence declarations. +
    0.10e
    September 1997, fixes a bug introduced in 0.10c relating +to nonassoc precedence. Thanks to +Tony Hosking +for reporting the bug and providing the fix. +Also recognizes carriage-return character as white space and fixes a +number of other small bugs. +
    0.10f
    December 1997, was a maintenance release. The CUP source +was cleaned up for JDK 1.1. +
    0.10g
    March 1998, adds new features and fixes old bugs. +The behavior of RESULT assignments was normalized, and a problem +with implicit start productions was fixed. The CUP grammar was +extended to allow array types for terminals and non-terminals, and +a command-line flag was added to allow the generation of a symbol +interface, rather than class. Bugs associated with multiple +invocations of a single parser object and multiple CUP classes in one +package have been stomped on. Documentation was updated, as well. +
    0.10h-0.10i
    February 1999, are maintenance releases. +
    0.10j
    July 1999, broadened the CUP input grammar to allow more +flexibility and improved scanner integration via the +java_cup.runtime.Scanner interface. +
    + + +
    + + +Java and HotJava are +trademarks of Sun Microsystems, Inc., +and refer to Sun's Java programming language and HotJava browser +technologies. +CUP is not sponsored by or affiliated with Sun Microsystems, Inc. + +
    + + + -- 2.34.1