From 713b916890c3f4f5b9437e4992f5fb2f4bde1da5 Mon Sep 17 00:00:00 2001 From: Sean O'Connor Date: Sun, 9 Nov 2025 14:17:11 -0500 Subject: [PATCH] project10 - readme --- 10/readme.txt | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 10/readme.txt diff --git a/10/readme.txt b/10/readme.txt new file mode 100644 index 0000000..30cad0b --- /dev/null +++ b/10/readme.txt @@ -0,0 +1,115 @@ +Project 10 +In projects 10 and 11 you’ll develop a compiler for Jack, a simple, Java-like, object-based, high-level +language. You will write this compiler in two main stages. In this project you’ll develop a +compilation engine that performs syntax analysis (also called parsing). In project 11 you’ll extend +the compilation engine to a full-scale compiler that generates executable VM code. +Objective +Build a syntax analyzer that parses Jack programs according to the Jack grammar. To enable testing +for correctness, the analyzer will output XML code, reflecting the syntactic structure of the input +source code. This version of the syntax analyzer assumes that the source Jack code is error-free. +Error checking, reporting and handling can be added to later versions of the analyzer, but are not +part of this project. +Contract +Write a syntax analyzer for the Jack language, and test it on the supplied test files. The XML files +produced by your analyzer should be identical to the supplied compare files, up to white space. +Resources +The main tool in this project is the programming language that you will use for implementing the +syntax analyzer. You will also need the supplied TextComparer tool, or similar tools. This program +allows comparing files while ignoring white space. This will enable comparing the output files +generated by your analyzer with the supplied compare files. You may also want to inspect these +files using an XML viewer. Any standard web browser will do – just use the browser’s file>open +menu to open the XML file that you wish to inspect. +Test files +We provide several .jack files, for testing purposes. Projects/10/Square is a 3-class program that +enables moving a black square around the screen using the keyboard’s arrow keys. +Projects/10/ArrayTest is a single-class program that computes the average of a user-supplied +sequence of integers, using array processing. Both programs were discussed in project 9, so they +should be familiar (although their semantics and behaviors are irrelevant for this project). Note +though that we have made some harmless changes to the original code, to make sure that the +syntax analyzer will be fully tested on all aspects of the Jack language. For example, we've added a +static variable to Square/Main.jack, as well as a function, which are neither used nor called. These +neutral changes allow testing how the analyzer handles language elements that don't appear in the +original Square and ArrayTest files, like static variables, else, and unary operators. +Analyzer +The analyzer, which is the main program in this project, is invoked using the command +"JackAnalyzer source", where source is either a file name of the form Xxx.jack (the extension is +mandatory), or a folder name (in which case there is no extension). In the latter case, the folder +www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken + +contains one or more .jack files, and, possibly, other files as well. The file/folder name may include +a file path. If no path is specified, the analyzer operates on the current folder. +The analyzer handles each file separately. For each Xxx.jack input file, the analyzer constructs a +JackTokenizer for handling the input, and an output file for writing the output (XML code). +Tokenizer +Implement the JackTokenizer module described in the lecture. Test your implementation by writing +a basic JackAnalyzer. In this first version of the JackAnalyzer, the output file corresponding to each +input Xxx.jack file is named XxxT.xml (where "T" stands for tokenized output). After constructing a +JackTokenizer, the analyzer enters a loop that advances and handles all the tokens in the input file, +one token at a time, using the JackTokenizer services. Each token should be printed in a separate +line, as token , where tokenType is an XML tag coding one of the five +possible token types in the Jack language. Here is an example: + +Note that in the case of string constants, the program ignores the double quote characters. This +requirement is by design. +The generated output has two trivial technicalities dictated by XML conventions. First, an XML file +must be enclosed within some begin and end tags; this convention is satisfied by the and + tags. Second, four of the symbols used in the Jack language (<, >, ", &) are also used for +XML markup, and thus they cannot appear as data in XML files. Following convention, the analyzer +represents these symbols as <, >, ", and &, respectively. For example, when the +parser encounters the symbol < in the input file, it outputs the line " < ". +This so-called "escape sequence" is rendered by XML viewers as < , which is +what we want. + +www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken + +Testing +Apply your JackAnalyzer to the supplied .jack files, and verify that it operates correctly. View the +output files generated by the analyzer using a web browser, and use the supplied TextComparer +tool (or similar tools) to compare the output files generated by your JackAnalyzer to the supplied +.xml compare files. For example, compare the generated file SquareT.xml to the supplied compare +file SquareT.xml. +Since the generated and compare files have the same names, we suggest keeping them in separate +folders. +Parser +The next version of your syntax analyzer should be capable of parsing every element of the Jack +language, except for expressions and array-oriented commands. To that end, implement the +CompilationEngine module specified in the lecture, except for the compilation routines that handle +expressions and arrays. +For each Xxx.jack file, the analyzer constructs a JackTokenizer for handling the input, and an output +file for writing the output, named Xxx.xml. The analyzer then calls the compileClass routine of the +CompilationEngine. From this point onward, the CompilationEngine routines should call each other +recursively, emitting XML output. +Unit-test this version of your JackAnalyzer by applying it to the folder ExpressionlessSquare. This +folder contains versions of the files Square.jack, SquareGame.jack, and Main.jack, in which each +expression in the original code has been replaced with a single identifier (variable name in scope). +For example: + +Note that the replacement of expressions with variables results in nonsensical code. This is just +fine: The nonsensical code is syntactically correct, and that's all that matters for testing the parser. +Note also that the original and expression-less files have the same names, but are located in +separate folders. +View the output files generated by your JackAnalyzer using a web browser, and compare them to +the supplied .xml compare files. + +www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken + +Next, complete the CompilationEngine routines that handle expressions, and test them by applying +your JackAnalyzer to the Square folder. Finally, complete the routines that handle arrays, and test +them by applying your JackAnalyzer to the ArrayTest folder. +Some technical notes: +1. If you are not sure how to format the XML output, inspect the .xml files supplied in this project. +The outputs of your analyzer should be identical to these files. +2. The input Jack code processed by your syntax analyzer is assumed to be valid after all comments +and whitespace characters are removed. The whitespace is replaced by a single space. For +example, if we use underscores to represent spaces (only in this document, for illustration +purpose), then Jack input code like ”let__x___=_17;” should be treated as if it were +”let_x_=_17;” +3. Note that keywords in the Jack language cannot be used as identifiers in Jack programs. +4. When the syntax analyzer produces beginning and ending tags without contents between them, +the output must be formatted as follows: + + +Where "xxx" is the tag name, and the opening and closing tags must be separated by a line break. +This rule is especially relevant for empty parameter lists. +5. The indentation in the XML output is considered neutral, and is ignored. But, it makes the code +easier to read and inspect.