COMP3000: Tips for assignment 2 The assignment is all about syntax analysis; so we need to understand how all the Kiama stuff works. I will use examples from the weeks 5, 6 classes. Parsers Consider: lazy val term : PackratParser[Expression] = term ~ ("*" ~> factor) ^^ { case t ~ f => StarExp (t, f) } | term ~ ("/" ~> factor) ^^ { case t ~ f => SlashExp (t, f) } | factor This is defining a new parser (PackratParser) called term. It is going to produce an Expression. There are three possible things that can be parsed as a term and they are laid out in the next three lines. Note that those three lines must be separated by “|”. It is very easy to forget one of those at the end of a line (but not the last line). Note that the three alternatives for term are tried in the order they are listed. Consider why factor was listed last. Each of those lines is usually going to look like: some things to be parsed ^^ what to do with the data that comes from the parsers The right part of the line is done only if the left part was successfully parsed. Left of ^^ Each parser produces some data but not all of it is useful; e.g. term ~ ("*" ~> factor) Here there are three parsers but we don’t need any data from parsing “*”. It needs to be part of what gets parsed but once the parsing has been successful then any data from “*” can be ignored because there is no relevant data from it. Instead we want the data from term and factor. In the week 6 class we saw how to discard unwanted data by having the parser combinators pointing at what we want to keep. From the data above, we get: (data from term) ~ (data from factor) Here, we can think of ~ as binding these two data items together. The result of this is passed to the right- hand side of our line. ^^ Simplistically, ^^ is an operator that expects its left operand to be data and its right operand to be a function; and it passes that data as an argument to the function. E.g. a ^^ f evaluates f(a) Right of ^^ The right-hand side of our line is: { case t ~ f => StarExp (t, f) } This is a Scala partial function – its parameter must be of the form t ~ f otherwise an exception will be thrown. The data getting passed to it is indeed of that form: t corresponds to the data from term f corresponds to the data from factor This partial function must produce a result of type Expression because this is what the first line of the term parser declares (Packrat[Expression]). StarExp is a subclass of expression, so the result will be an Expression. ^^ with function without parameters ^^ is smarter than what has been described above; for this example, it is sufficient to write: term ~ ("*" ~> factor) ^^ StarExp | If ^^’s left operand has n items of data bound together by n-1 occurrences of ~ (e.g. a ~ b ~ c), and you have a function that expects n arguments then the above form can be used. This is often not possible. Consider the integer and factor parsers: lazy val factor : PackratParser[Expression] = integer ^^ (s => IntExp (s.toInt)) | idnuse ^^ IdnExp lazy val integer : PackratParser[String] = regex ("[0-9]+".r) We can see that the integer parser will return a String. In factor, for the integer alternative, integer returns a string but IntExp expects an Int; so this conversion has to be explicitly handled and a lambda expression (rather than a partial function) has been used because there is not requirement to match on the function parameter. ^^^ There is a variant of ^^ that is sometimes useful. In assignment 2 in the alternatives for the literal parser there is: "false" ^^^ BoolExp (false) Basically, this just says to ignore the result of the left operand (after the parse has been successful) and return the value of the right operand. Other lines Some lines are even simpler when the result of the parse is the result that is required; e.g. the third alternative for term is: factor This parser returns an Expression (as can be seen from the definition of factor) and that is what is required for term, so nothing further to do. Parsing repetition + and * can be used to express repetition in parsing; e.g. : (statement+) ^^ ExpProgram This is requiring one or more statements for a program. + and * will produce vectors. Note that the brackets around statement+ are required. Sometimes a list of things with a separator between each is needed, e.g. to parse a list of integers (“3, 1, 4”). This could be done with (assuming at least one integer): integer ~ ((“,” ~> integer)*) but this is a bit clunky. The simpler way to do this is (when there is a separator): rep1sep(integer, “,”) rep1sep corresponds to +, meaning at least one occurrence of the parser expression that is the first argument. The second argument is the separator. For zero or more occurrences (with a separator), use repsep. \ in Scala strings In a string delimited by “ (e.g. “abc”), \ is used to signal special characters (e.g. \n for newline). To get an actual \ in your string, you need \\. E.g. “ab\\c” is a string of length 4 containing the characters ‘a’, ‘b’, ‘\’, ‘c’. Scala also has multi-line strings delimited by “””. You will see this used in some of the bigger test cases. In a multi-line string, \ has no special meaning, so it can be used as is. Precedence and associativity You should not start writing code for the assignment until you have rewritten the assignment grammar to remove ambiguity from the grammar by using the rules of precedence and associativity as spelt out in the assignment spec. This needs to be done (separately) for: expressions – you will need expression parsers between factor and exp types – tipe and basictipe are enough type parsers; you just need to fill in their details patterns – pat and basicpat are enough pattern parsers; you just need to fill in their details There is an exercise in the week 7 class that shows how to rewrite the grammar. We tend to think of the rules as applying to binary operators but this is too narrow a view. A good example is function application. In HasLang, this is done just by juxtaposing two expressions – no brackets or binary operator required. It still needs full precedence and associativity treatment. What are precedence and associativity Associativity tells us how to bracket an ambiguous expression; e.g. 60 / 6 / 2 This can be bracketed as: (60 / 6) / 2 or 60 /( 6 / 2) [giving 5 or 20] Normally division is left associative (i.e. bracket from the left like the left one above; but for this assignment it is right associative. Operators also have different levels of precedence; e.g. multiplication has higher precedence than addition. This means that the ambiguous expression: 2 + 3 * 4 which can be bracketed as: (2 + 3) * 4 or 2 + (3 * 4) is done the second way because the operator with higher precedence (*) is done first. Precedence will always be an issue when there are two adjacent operators of different precedence (like the example above). FunLine In the definitions of variables, variable values (expressions) and multi-line functions are all represented by FunLine objects. A Funline takes three values: an identifier (string) a vector of patterns an expression For a definition of a variable with a value, the first two FunLine fields are irrelevant and are filled with “” and Vector(). For example for: x :: Int = 100 we get: Defn(IdnDef("x", IntType()), Vector(FunLine("", Vector(), IntExp(100)))) For a multi-line function, the identifier repeats the name of the function and there is a non-empty vector of patterns. For example for: length :: [Int] -> Int length [] = 0. length h:t = 1 + length t we get: Defn(IdnDef("length", FunType(ListType(IntType()), IntType())), Vector(FunLine("length", Vector(ListPat(Vector())), IntExp(0)), FunLine("length", Vector(ConsPat(IdentPat("h"), IdentPat("t"))), PlusExp(IntExp(1), AppExp(IdnUse("length"), IdnUse("t"))))) The first function line was used to populate the IdnDef with the name and type of the function. The second and third lines of the function (note the dot separator at the end of the second line) each have a FunLine with “length”, a pattern and an expression. Additional information based on consult session 13 Sep In SyntaxAnalysis.scala: in factor: don’t change the existing lines in factor, just add some extra lines, but always keep the failure line last (and don’t use failure anywhere else in the file) exp needs to be the top-level parser for expressions; you will need other expression parsers between factor and exp (based on precedence and associativity analysis) but do not rename exp None of the existing test cases address precedence or associativity. You will need to write some. Testing The way you will need to test that == is not associative is to parse “1 == 2 == 3” and find that it ignores the last part (“ == 3”) and produces a syntax tree just for “1 == 2”. Because I have made – and / right associative, this has some consequences if you have + and – or * and / next to each other. For example, “1 + 2 – 3” and “1 / 2 * 3” don’t produce what you might expect. There will be no test cases like these two. Some precedence and associativity tests don’t make sense from a type perspective but still need to be done. Additional information based on consult session 20 Sep In the HasLang syntax, function application is shown as: exp exp , i.e. two expressions next to each other. In the specification of associativity, the list cons operator is listed; this also refers to “:” when used in a pattern. Some parts of the syntax do not have precedence and associativity because there is no parsing ambiguity when they are used – they can be added to factor. Note that once you have rewritten the grammar using the precedence and associativity rules, your grammar will have extra rows (like the B row and the C row in the Precedence and Associativity presentation). The A row corresponds to factor and the D row corresponds to exp. For each of the rows in between, you will need a new parser, i.e. something that starts with lazy val your-parser-name : PackratParser[Exp] = You have rewritten the grammar but it is still just a grammar. Each line now needs to use Kiama’s parser combinators (~, ~>, <~) and rewritten like: ... ~ ... ^^ ... (with | at end if another alternative follows) See week 6 class for how the IF statement was incorporated into the parsing. There will be no testing involving HasLang comments. Just ignore comments. Distinguishing between a bracketed expression and a tuple may be difficult. The last page of the Precedence and Associativity presentation may provide a hint. 20 Sep 2021
欢迎咨询51作业君