ECS 140A – Programming Languages
Project
Due: Sunday, May 29, 2022, 11:59pm PT

This project specification is subject to change at any time for clarification.

1 Overview

This is a group project. Each group may have up to 2 people. Instructions for submission are at the end of the file.

For this project you will write a program in Rust to parse language X. Language X is similar to the C language, but is simpler and slightly different. If a program written in language X is syntactically correct, your parser should convert it to a syntax-highlighted XHTML version. Otherwise, your parser should report an error.

The project has several stages. Follow them in order to maximize your grade in case you cannot get the whole project to work.

2 X Programming Language

The following EBNF describes the X language. The EBNF is a little different from what we have been using in class. The non-terminals on the right are identified by italics. Literal values, i.e., terminals, are specified in bold. The operators not in bold or italics describe the special options of the EBNF: {} for repetition, [ ] for optional, () for grouping, and | for or. The rule definition symbol is := instead of ::=.

The token rules specify what a valid identifier, integer constant or float constant is. For example, _ab and a2 are valid identifiers, but 2x is not.

Note that the X language allows nested function declarations, which are not allowed in C or Rust. The grammar also does not pass the disjointness test for FIRST sets, which means your parser will sometimes need to look ahead more than one lexeme to decide which option to take.

3 Instructions

3.1 Stage 1: Read in the File

Tasks:
1. Define a CStream struct that is able to read the input file character by character.
   • Unlike C++, Rust does not have a convenient way of reading a file character by character using FileInputStream.
     For simplicity, you may read the whole file at once and store it in a string or a vector of strings. Then you may read the stored string or vector of strings character by character.
   • But keep in mind that this is not how modern scanners work, because the input program may be too large to be stored as a second copy.
   • Do NOT modify the input program in any way. If it has an extra newline that makes it ugly, then leave it be.
2. Read the input file from command line.
3. Print the stored file to the screen.

Note:
• Stage 1 is worth 10 points.

3.2 Stage 2: Writing the Scanner (Lexical Analyzer)

Tasks:
1. Define an enumerated type called TokenType with the following elements: IntConstant, FloatConstant, Keyword, Operator, Identifier, and Invalid.
   • The keywords of language X are: unsigned, char, short, int, long, float, double, while, if, return, void, and main.
   • The operators of language X are: (, ,, ), {, }, =, ==, <, >, <=, >=, !=, +, -, *, /, and ;.
   • Invalid is for lexemes that don’t follow the token rules, e.g., 2x.
   • You can add extra elements for empty spaces, new lines, etc., but it’s not required.
2. Define a struct called Token.
   • The Token struct must have at least the following attributes:
     – text: of type String; the text of the token
     – token_type: of type TokenType; the token type of the token
     – line_num: of type i32; the line number of the token; the first line is numbered 0
     – char_pos: of type i32; the character position of the first character in the token text; starting at 0 for each new line
3. Write a struct called Scanner that will tokenize the file read from Stage 1 into the token types defined in TokenType.
   • The Scanner struct probably needs a function named get_next_token() that, when called, will return the next token as read from the .x file. The return type is Token.
4. Store all the tokens in the input program in order as a vector of tokens.
   • Name the vector all_tokens and create it in the main function.
   • Add comments around it so the TAs can easily find it.
   • You may implement a function in the Scanner struct that gets all the tokens by repeatedly calling get_next_token(), or do this outside the Scanner struct. There is no particular requirement.

Note:
• We will grade this stage by comparing the vector of tokens you returned. We will consider two tokens equal if they have the same text, same token type, same line number and same character position. So make sure you do NOT delete any empty spaces from the input file.
• Stage 2 is worth 10 points.

3.3 Stage 3: Writing the Parser (Syntax Analyzer)

Task:
• Write a recursive descent parser struct called Parser that will parse the .x file.
  – The Parser struct should implement one method per EBNF rule.
  – It should also be able to validate that the .x file conforms to the grammar specified. If an error is found in the .x file, the line number and character position of the error and the grammar rule where the error was found should be printed with an appropriate message. Then the parser terminates. (Also keep in mind that this is not how modern parsers work. But for simplicity, you do not need to find as many syntactic errors as possible.)
  – You’ll need to use the custom_error to return an error with the customized error message. The error message reads something like this:
    Error at Line 2 Character 10. The syntax should be: DeclarationType := DataType Identifier.
  – The line number and character position are those of the token that violated the grammar rule.
  – Depending on how you implement the functions for the grammar rules, the exact first token that is detected as violating the grammar rules might differ slightly, e.g., by 1 or 2, but not much.
  – If the .x file has no syntax error, print "Input program is syntactacilly correct."

Note:
• Stage 3 is worth 55 points.

3.4 Stage 4: Output XHTML File

Task:
• If the input .x file is syntactically correct, output a .xhtml file, i.e., a syntax highlighted file.
  – The background color is navy, the default font color is orange, and the font is Courier New.
  – The colors and styles of tokens in the .xhtml file are:

    Token type       Font color   Style
    Identifier       yellow
    Float constant   aqua         bold
    Int constant     aqua         bold
    Operator         white        bold
    Keyword          white        bold

  – Each nested block should be indented another level.
  – The .xhtml file should be a valid .html file that can be loaded into any web browser and viewed. You will probably need to begin your .xhtml file with: You may also need to have attributes in your html element. So it should look like:
  – You may get rid of extra empty spaces or lines in the .xhtml file.
  – Examples of a valid input .x file and the output .xhtml file can be found on Canvas.

Note:
• Stage 4 is worth 20 points.

4 Submission

You must submit a zipped folder that includes your entire Cargo project (named parser) and a README file. The Cargo project should include the example .x files as well. Each group MUST submit only one copy.

The README file is worth 5 points. The README file should include:
• The names, emails and student IDs of the members in your group.
• Which stages you were able to finish. This will help the TAs speed up the grading process.
• Instructions on how to run your code. It should simply be something like cargo run parser ex.x, but include them just in case.
• Any sources of code that you have viewed to help you complete this project.

You should avoid using existing source code that is currently available on the Internet as a primer (except for your own code or my version from previous homework). You are also not allowed to use the parser tools found on the Internet. All class projects will be submitted to MOSS to determine if students have excessively collaborated. Excessive collaboration, or failure to list external code sources, will result in the matter being referred to Student Judicial Affairs.
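
The data structures named in Stages 1 and 2 above could be sketched roughly as follows. This is only one possible layout, not a reference solution. The specification fixes the names CStream, TokenType, Token and the four Token attributes; everything else here (the CStream fields, from_text, from_file, peek_next_char, get_next_char, is_keyword, and taking the last command-line argument as the input path) is a hypothetical choice made for illustration.

```rust
use std::env;
use std::fs;

/// Token categories listed in Stage 2.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum TokenType {
    IntConstant,
    FloatConstant,
    Keyword,
    Operator,
    Identifier,
    Invalid,
}

/// Token record with the four attributes Stage 2 requires.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
    pub token_type: TokenType,
    pub line_num: i32, // first line is numbered 0
    pub char_pos: i32, // position of the token's first character, 0-based per line
}

/// Keywords of language X, copied from the Stage 2 list.
const KEYWORDS: [&str; 12] = [
    "unsigned", "char", "short", "int", "long", "float",
    "double", "while", "if", "return", "void", "main",
];

/// Hypothetical helper: true if `text` is one of the X keywords.
#[allow(dead_code)]
pub fn is_keyword(text: &str) -> bool {
    KEYWORDS.contains(&text)
}

/// Character stream over input that was read into memory in one go.
/// `line_num` and `char_pos` describe the NEXT character to be read.
pub struct CStream {
    chars: Vec<char>,
    index: usize,
    pub line_num: i32,
    pub char_pos: i32,
}

impl CStream {
    /// Build a stream from text already in memory (assumed helper name).
    pub fn from_text(text: &str) -> CStream {
        CStream { chars: text.chars().collect(), index: 0, line_num: 0, char_pos: 0 }
    }

    /// Read the whole file at once, without modifying its contents.
    pub fn from_file(path: &str) -> std::io::Result<CStream> {
        Ok(CStream::from_text(&fs::read_to_string(path)?))
    }

    /// Look at the next character without consuming it.
    pub fn peek_next_char(&self) -> Option<char> {
        self.chars.get(self.index).copied()
    }

    /// Consume and return the next character, updating the position counters.
    pub fn get_next_char(&mut self) -> Option<char> {
        let c = self.peek_next_char()?;
        self.index += 1;
        if c == '\n' {
            self.line_num += 1;
            self.char_pos = 0;
        } else {
            self.char_pos += 1;
        }
        Some(c)
    }
}

fn main() {
    // Assumption: the input path is the last command-line argument, so both
    // `cargo run ex.x` and `cargo run parser ex.x` would pick up ex.x.
    let path = match env::args().skip(1).last() {
        Some(p) => p,
        None => {
            eprintln!("usage: cargo run <file.x>");
            return;
        }
    };
    let mut stream = match CStream::from_file(&path) {
        Ok(s) => s,
        Err(e) => {
            eprintln!("cannot read {}: {}", path, e);
            return;
        }
    };
    // Stage 1: print the stored file exactly as it was read.
    while let Some(c) = stream.get_next_char() {
        print!("{}", c);
    }
}
```

Because the stream tracks the position of the next unread character, a scanner built on top of it can record line_num and char_pos just before consuming the first character of a lexeme and store those values in the Token.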