2020/10/24 CS 241 — Fall 2020 — Assignment 5 https://student.cs.uwaterloo.ca/~cs241/a5/ 1/4 CS 241 — Fall 2020 — Assignment 5 Assignments for CS 241 ← Assignment 4 Assignment 5 Assignment 6 → Fri, Oct 23rd at 5:00 pm Fri, October 30th at 5:00 pm Fri, Nov 13th, at 5:00 pm P1 • P2 • P3 • P4 • P5 • P6 • P7 • P8 In this assignment, you will write a scanner for WLP4, and get some practice working with context-free grammars. Marmoset Notes For each question, Marmoset has public and release tests, and often has secret tests as well. When you release test a submission, Marmoset will only report on public and release tests. No information about secret tests will be visible until after the assignment deadline. Marmoset tests are not available yet. A post will be made on Piazza when they are available. The Marmoset test mark distribution for the questions will be made available on this page at the same time as the tests themselves. Part I. Scanning Problem 1 (filename: wlp4scan.rkt or wlp4scan.cc) Write a scanner (lexical analyzer) for the WLP4 programming language. Your scanner should read an entire WLP4 program and identify the tokens, in the order they appear in the input. For each token, your scanner should compute the kind of the token and the lexeme (the string of characters making up the token), and print it to standard output, one line per token. For example, the correct output for this WLP4 program // // WLP4 program with two integer parameters, a and b // returns the sum of a and b // int wain(int a, int b) { return a + b; // unhelpful comment about summing a and b } is: INT int WAIN wain LPAREN ( INT int ID a COMMA , 2020/10/24 CS 241 — Fall 2020 — Assignment 5 https://student.cs.uwaterloo.ca/~cs241/a5/ 2/4 INT int ID b RPAREN ) LBRACE { RETURN return ID a PLUS + ID b SEMI ; RBRACE } If the input cannot be scanned as a sequence of valid tokens (possibly separated by white space as detailed in the WLP4 language specification), your program should produce exactly one line of output to standard error containing the string ERROR, in upper case. We recommend that you try to make the error message informative. A reference implementation of the scanner called wlp4scan is available after running source /u/cs241/setup. When testing your program, you can compare its output to wlp4scan's to make sure it is correct. Run it as follows: wlp4scan < input.wlp4 > tokens.txt You may wish to modify the starter code from Assignment 3 to solve this problem. Note that the scanner in these starters is based on a DFA that repeatedly recognizes the longest prefix of a string that is a token. You should be able to replace the DFA (which recognizes MIPS assembly language tokens) by one which recognizes WLP4 tokens. Click here to go back to the top of the page. Part II. Context-free Grammars You are to create context-free grammars and derivations represented as CFG files each of which is a textual reprentation of a CFG, optionally followed by a textual representation of several derivations. The student.cs environment provides a tool cs241.cfgcheck which checks the validity of the CFG and derivations. Problem 2 (filename: a5p2.cfg) Create a file representing the CFG with the following production rules: S → BOF A EOF A → x A → A x Problem 3 (filename: a5p3.cfg) Add derivations for the following strings to the CFG file created for Problem 2: BOF x EOF BOF x x EOF BOF x x x EOF 2020/10/24 CS 241 — Fall 2020 — Assignment 5 https://student.cs.uwaterloo.ca/~cs241/a5/ 3/4 Problem 4 (filename: a5p4.cfg) The following production rules yield an ambiguous CFG: S → BOF A EOF A → x A → A x A Find a string for which the derivation is ambiguous. Compose a CFG file representing the CFG and two different derivations for your string. Problem 5 (filename: a5p5.cfg) Add a derivation to this context-free grammar for the following sequence of symbols: BOF id EOF Problem 6 (filename: a5p6.cfg) Add a derivation to this context-free grammar for the following sequence of symbols: BOF ( id - ( id - id ) ) - id EOF Problem 7 (filename: a5p7.cfg) Add a derivation to this augmented grammar for WLP4 for the following (augmented) sequence of WLP4 tokens: BOF INT WAIN LPAREN INT ID COMMA INT ID RPAREN LBRACE RETURN ID PLUS ID STAR ID SEMI RBRACE EOF Problem 8 (filename: galaxy.rkt or galaxy.cc) Write a Racket or C++ program that reads a CFG file that contains the same grammar as for Problems 5 and 6 with the example derivation replaced by another valid derivation for the same grammar. Assume that each occurrence of the symbol id represents the value 42, that each occurrence of - represents subtraction, that operations associate from left to right, except where overridden by parentheses. Output a single line giving the value of the derived expression. The correct output for the sample grammar and derivation is a single line containing -42. Notes: The grammar is fixed, so you may, if you wish, simply skip over the grammar part of the .cfg file, and embed assumptions about the syntax directly into your program. The input will have exactly one derivation following the grammar. 2020/10/24 CS 241 — Fall 2020 — Assignment 5 https://student.cs.uwaterloo.ca/~cs241/a5/ 4/4 The derivation will be valid. You may find these programs useful: derivationprinter.rkt derivationprinter.cc. Each reads a .cfg file, skips over the grammar, and prints the rules used in the derivation, with the indentation removed. You will want to replace the part that reads and prints the rules, so that it instead processes the rules to evaluate the expression. Hint: If you were only given the final string of terminal symbols, this problem would be a lot harder. The derivation contains useful information about the structure of the expression, and you should take advantage of this information. Click here to go back to the top of the page.
欢迎咨询51作业君