mirror of
https://github.com/soconnor0919/eceg431.git
synced 2025-12-10 06:04:43 -05:00
Project 10 In projects 10 and 11 you’ll develop a compiler for Jack, a simple, Java-like, object-based, high-level language. You will write this compiler in two main stages. In this project you’ll develop a compilation engine that performs syntax analysis (also called parsing). In project 11 you’ll extend the compilation engine to a full-scale compiler that generates executable VM code. Objective Build a syntax analyzer that parses Jack programs according to the Jack grammar. To enable testing for correctness, the analyzer will output XML code, reflecting the syntactic structure of the input source code. This version of the syntax analyzer assumes that the source Jack code is error-free. Error checking, reporting and handling can be added to later versions of the analyzer, but are not part of this project. Contract Write a syntax analyzer for the Jack language, and test it on the supplied test files. The XML files produced by your analyzer should be identical to the supplied compare files, up to white space. Resources The main tool in this project is the programming language that you will use for implementing the syntax analyzer. You will also need the supplied TextComparer tool, or similar tools. This program allows comparing files while ignoring white space. This will enable comparing the output files generated by your analyzer with the supplied compare files. You may also want to inspect these files using an XML viewer. Any standard web browser will do – just use the browser’s file>open menu to open the XML file that you wish to inspect. Test files We provide several .jack files, for testing purposes. Projects/10/Square is a 3-class program that enables moving a black square around the screen using the keyboard’s arrow keys. Projects/10/ArrayTest is a single-class program that computes the average of a user-supplied sequence of integers, using array processing. Both programs were discussed in project 9, so they should be familiar (although their semantics and behaviors are irrelevant for this project). Note though that we have made some harmless changes to the original code, to make sure that the syntax analyzer will be fully tested on all aspects of the Jack language. For example, we've added a static variable to Square/Main.jack, as well as a function, which are neither used nor called. These neutral changes allow testing how the analyzer handles language elements that don't appear in the original Square and ArrayTest files, like static variables, else, and unary operators. Analyzer The analyzer, which is the main program in this project, is invoked using the command "JackAnalyzer source", where source is either a file name of the form Xxx.jack (the extension is mandatory), or a folder name (in which case there is no extension). In the latter case, the folder www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken contains one or more .jack files, and, possibly, other files as well. The file/folder name may include a file path. If no path is specified, the analyzer operates on the current folder. The analyzer handles each file separately. For each Xxx.jack input file, the analyzer constructs a JackTokenizer for handling the input, and an output file for writing the output (XML code). Tokenizer Implement the JackTokenizer module described in the lecture. Test your implementation by writing a basic JackAnalyzer. In this first version of the JackAnalyzer, the output file corresponding to each input Xxx.jack file is named XxxT.xml (where "T" stands for tokenized output). After constructing a JackTokenizer, the analyzer enters a loop that advances and handles all the tokens in the input file, one token at a time, using the JackTokenizer services. Each token should be printed in a separate line, as <tokenType> token </tokenType>, where tokenType is an XML tag coding one of the five possible token types in the Jack language. Here is an example: Note that in the case of string constants, the program ignores the double quote characters. This requirement is by design. The generated output has two trivial technicalities dictated by XML conventions. First, an XML file must be enclosed within some begin and end tags; this convention is satisfied by the <tokens> and </tokens> tags. Second, four of the symbols used in the Jack language (<, >, ", &) are also used for XML markup, and thus they cannot appear as data in XML files. Following convention, the analyzer represents these symbols as <, >, ", and &, respectively. For example, when the parser encounters the symbol < in the input file, it outputs the line "<symbol> < </symbol>". This so-called "escape sequence" is rendered by XML viewers as <symbol> < </symbol>, which is what we want. www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken Testing Apply your JackAnalyzer to the supplied .jack files, and verify that it operates correctly. View the output files generated by the analyzer using a web browser, and use the supplied TextComparer tool (or similar tools) to compare the output files generated by your JackAnalyzer to the supplied .xml compare files. For example, compare the generated file SquareT.xml to the supplied compare file SquareT.xml. Since the generated and compare files have the same names, we suggest keeping them in separate folders. Parser The next version of your syntax analyzer should be capable of parsing every element of the Jack language, except for expressions and array-oriented commands. To that end, implement the CompilationEngine module specified in the lecture, except for the compilation routines that handle expressions and arrays. For each Xxx.jack file, the analyzer constructs a JackTokenizer for handling the input, and an output file for writing the output, named Xxx.xml. The analyzer then calls the compileClass routine of the CompilationEngine. From this point onward, the CompilationEngine routines should call each other recursively, emitting XML output. Unit-test this version of your JackAnalyzer by applying it to the folder ExpressionlessSquare. This folder contains versions of the files Square.jack, SquareGame.jack, and Main.jack, in which each expression in the original code has been replaced with a single identifier (variable name in scope). For example: Note that the replacement of expressions with variables results in nonsensical code. This is just fine: The nonsensical code is syntactically correct, and that's all that matters for testing the parser. Note also that the original and expression-less files have the same names, but are located in separate folders. View the output files generated by your JackAnalyzer using a web browser, and compare them to the supplied .xml compare files. www.nand2tetris.org / Copyright © Noam Nisan and Shimon Schocken Next, complete the CompilationEngine routines that handle expressions, and test them by applying your JackAnalyzer to the Square folder. Finally, complete the routines that handle arrays, and test them by applying your JackAnalyzer to the ArrayTest folder. Some technical notes: 1. If you are not sure how to format the XML output, inspect the .xml files supplied in this project. The outputs of your analyzer should be identical to these files. 2. The input Jack code processed by your syntax analyzer is assumed to be valid after all comments and whitespace characters are removed. The whitespace is replaced by a single space. For example, if we use underscores to represent spaces (only in this document, for illustration purpose), then Jack input code like ”let__x___=_17;” should be treated as if it were ”let_x_=_17;” 3. Note that keywords in the Jack language cannot be used as identifiers in Jack programs. 4. When the syntax analyzer produces beginning and ending tags without contents between them, the output must be formatted as follows: <xxx> </xxx> Where "xxx" is the tag name, and the opening and closing tags must be separated by a line break. This rule is especially relevant for empty parameter lists. 5. The indentation in the XML output is considered neutral, and is ignored. But, it makes the code easier to read and inspect.