COMP3000: Tips for assignment 2

The assignment is all about syntax analysis; so we need to understand how all the Kiama stuff works. I will

use examples from the weeks 5, 6 classes.

Parsers

Consider:

lazy val term : PackratParser[Expression] =

term ~ ("*" ~> factor) ^^ { case t ~ f => StarExp (t, f) } |

term ~ ("/" ~> factor) ^^ { case t ~ f => SlashExp (t, f) } |

factor

This is defining a new parser (PackratParser) called term. It is going to produce an Expression. There are

three possible things that can be parsed as a term and they are laid out in the next three lines. Note that

those three lines must be separated by “|”. It is very easy to forget one of those at the end of a line (but

not the last line).

Note that the three alternatives for term are tried in the order they are listed. Consider why factor was

listed last.

Each of those lines is usually going to look like:

some things to be parsed ^^ what to do with the data that comes from the parsers

The right part of the line is done only if the left part was successfully parsed.

Left of ^^

Each parser produces some data but not all of it is useful; e.g.

term ~ ("*" ~> factor)

Here there are three parsers but we don’t need any data from parsing “*”. It needs to be part of what gets

parsed but once the parsing has been successful then any data from “*” can be ignored because there is

no relevant data from it. Instead we want the data from term and factor. In the week 6 class we saw how

to discard unwanted data by having the parser combinators pointing at what we want to keep. From the

data above, we get:

(data from term) ~ (data from factor)

Here, we can think of ~ as binding these two data items together. The result of this is passed to the right-

hand side of our line.

^^

Simplistically, ^^ is an operator that expects its left operand to be data and its right operand to be a

function; and it passes that data as an argument to the function. E.g.

a ^^ f evaluates f(a)

Right of ^^

The right-hand side of our line is:

{ case t ~ f => StarExp (t, f) }

This is a Scala partial function – its parameter must be of the form t ~ f otherwise an exception will be

thrown. The data getting passed to it is indeed of that form:

t corresponds to the data from term

f corresponds to the data from factor

This partial function must produce a result of type Expression because this is what the first line of the term

parser declares (Packrat[Expression]). StarExp is a subclass of expression, so the result will be an

Expression.

^^ with function without parameters

^^ is smarter than what has been described above; for this example, it is sufficient to write:

term ~ ("*" ~> factor) ^^ StarExp |

If ^^’s left operand has n items of data bound together by n-1 occurrences of ~ (e.g. a ~ b ~ c), and you

have a function that expects n arguments then the above form can be used.

This is often not possible. Consider the integer and factor parsers:

lazy val factor : PackratParser[Expression] =

integer ^^ (s => IntExp (s.toInt)) |

idnuse ^^ IdnExp

lazy val integer : PackratParser[String] =

regex ("[0-9]+".r)

We can see that the integer parser will return a String. In factor, for the integer alternative, integer

returns a string but IntExp expects an Int; so this conversion has to be explicitly handled and a lambda

expression (rather than a partial function) has been used because there is not requirement to match on

the function parameter.

^^^

There is a variant of ^^ that is sometimes useful. In assignment 2 in the alternatives for the literal parser

there is:

"false" ^^^ BoolExp (false)

Basically, this just says to ignore the result of the left operand (after the parse has been successful) and

return the value of the right operand.

Other lines

Some lines are even simpler when the result of the parse is the result that is required; e.g. the third

alternative for term is:

factor

This parser returns an Expression (as can be seen from the definition of factor) and that is what is required

for term, so nothing further to do.

Parsing repetition

+ and * can be used to express repetition in parsing; e.g. :

(statement+) ^^ ExpProgram

This is requiring one or more statements for a program. + and * will produce vectors. Note that the

brackets around statement+ are required.

Sometimes a list of things with a separator between each is needed, e.g. to parse a list of integers (“3, 1,

4”). This could be done with (assuming at least one integer):

integer ~ ((“,” ~> integer)*)

but this is a bit clunky. The simpler way to do this is (when there is a separator):

rep1sep(integer, “,”)

rep1sep corresponds to +, meaning at least one occurrence of the parser expression that is the first

argument. The second argument is the separator. For zero or more occurrences (with a separator), use

repsep.

\ in Scala strings

In a string delimited by “ (e.g. “abc”), \ is used to signal special characters (e.g. \n for newline). To get an

actual \ in your string, you need \\. E.g. “ab\\c” is a string of length 4 containing the characters ‘a’, ‘b’, ‘\’,

‘c’.

Scala also has multi-line strings delimited by “””. You will see this used in some of the bigger test cases. In

a multi-line string, \ has no special meaning, so it can be used as is.

Precedence and associativity

You should not start writing code for the assignment until you have rewritten the assignment grammar to

remove ambiguity from the grammar by using the rules of precedence and associativity as spelt out in the

assignment spec.

This needs to be done (separately) for:

expressions – you will need expression parsers between factor and exp

types – tipe and basictipe are enough type parsers; you just need to fill in their details

patterns – pat and basicpat are enough pattern parsers; you just need to fill in their details

There is an exercise in the week 7 class that shows how to rewrite the grammar.

We tend to think of the rules as applying to binary operators but this is too narrow a view. A good

example is function application. In HasLang, this is done just by juxtaposing two expressions – no brackets

or binary operator required. It still needs full precedence and associativity treatment.

What are precedence and associativity

Associativity tells us how to bracket an ambiguous expression; e.g.

60 / 6 / 2

This can be bracketed as:

(60 / 6) / 2 or 60 /( 6 / 2) [giving 5 or 20]

Normally division is left associative (i.e. bracket from the left like the left one above; but for this

assignment it is right associative.

Operators also have different levels of precedence; e.g. multiplication has higher precedence than

addition. This means that the ambiguous expression:

2 + 3 * 4

which can be bracketed as:

(2 + 3) * 4 or 2 + (3 * 4)

is done the second way because the operator with higher precedence (*) is done first. Precedence will

always be an issue when there are two adjacent operators of different precedence (like the example

above).

FunLine

In the definitions of variables, variable values (expressions) and multi-line functions are all represented by

FunLine objects.

A Funline takes three values:

an identifier (string)

a vector of patterns

an expression

For a definition of a variable with a value, the first two FunLine fields are irrelevant and are filled with “”

and Vector(). For example for:

x :: Int = 100

we get:

Defn(IdnDef("x", IntType()),

Vector(FunLine("", Vector(), IntExp(100))))

For a multi-line function, the identifier repeats the name of the function and there is a non-empty vector

of patterns. For example for:

length :: [Int] -> Int

length [] = 0.

length h:t = 1 + length t

we get:

Defn(IdnDef("length", FunType(ListType(IntType()),

IntType())),

Vector(FunLine("length", Vector(ListPat(Vector())),

IntExp(0)),

FunLine("length",

Vector(ConsPat(IdentPat("h"),

IdentPat("t"))),

PlusExp(IntExp(1),

AppExp(IdnUse("length"),

IdnUse("t")))))

The first function line was used to populate the IdnDef with the name and type of the function.

The second and third lines of the function (note the dot separator at the end of the second line) each have

a FunLine with “length”, a pattern and an expression.

Additional information based on consult session 13 Sep

In SyntaxAnalysis.scala:

in factor: don’t change the existing lines in factor, just add some extra lines, but always keep the

failure line last (and don’t use failure anywhere else in the file)

exp needs to be the top-level parser for expressions; you will need other expression parsers

between factor and exp (based on precedence and associativity analysis) but do not rename exp

None of the existing test cases address precedence or associativity. You will need to write some.

Testing

The way you will need to test that == is not associative is to parse “1 == 2 == 3” and find that it ignores the

last part (“ == 3”) and produces a syntax tree just for “1 == 2”.

Because I have made – and / right associative, this has some consequences if you have + and – or * and /

next to each other. For example, “1 + 2 – 3” and “1 / 2 * 3” don’t produce what you might expect. There

will be no test cases like these two.

Some precedence and associativity tests don’t make sense from a type perspective but still need to be

done.

Additional information based on consult session 20 Sep

In the HasLang syntax, function application is shown as: exp exp , i.e. two expressions next to each

other.

In the specification of associativity, the list cons operator is listed; this also refers to “:” when used in a

pattern.

Some parts of the syntax do not have precedence and associativity because there is no parsing ambiguity

when they are used – they can be added to factor.

Note that once you have rewritten the grammar using the precedence and associativity rules, your

grammar will have extra rows (like the B row and the C row in the Precedence and Associativity

presentation). The A row corresponds to factor and the D row corresponds to exp. For each of the rows in

between, you will need a new parser, i.e. something that starts with

lazy val your-parser-name : PackratParser[Exp] =

You have rewritten the grammar but it is still just a grammar. Each line now needs to use Kiama’s parser

combinators (~, ~>, <~) and rewritten like:

... ~ ... ^^ ... (with | at end if another alternative follows)

See week 6 class for how the IF statement was incorporated into the parsing.

There will be no testing involving HasLang comments. Just ignore comments.

Distinguishing between a bracketed expression and a tuple may be difficult. The last page of the

Precedence and Associativity presentation may provide a hint.

20 Sep 2021

欢迎咨询51作业君