Monash University
Faculty of Information Technology
2
nd
Semester 2024
FIT2014
Assignment 2
Regular Languages, Context-Free Languages, Lexical analysis, Parsing,
Turing machines and Quantum Computation
DUE: 11:55pm, Friday 4 October 2024
In these exercises, you will
•implement a lexical analyser usinglex(Problem 3);
•implement parsers usinglexandyacc(Problems 1–6);
•program a Turing machine (Problem 7);
•learn about some aspects of quantum circuits and quantum registers, by applying our methods
to calculations with them (Problems 2–6);
•practise your skills relating to the Pumping Lemmas and Context-Free Languages (Problem 8).
Solutions to Problem 7 must be implemented in the simulator Tuatara. We are providing version
2.1 on Moodle under week 8; the file name istuatara-monash-2.1.jar.You must use exactly
this version, not some other version downloaded from the internet.Do not unpack this
file. It must be run directly using the Java runtime.
How to manage this assignment
•You should start working on this assignment now and spread the work over the time until it
is due. Do as much as possiblebeforeweek 10.
•Don’t be deterred by the length of this document! Much of it is an extended tutorial to get
you started withlexandyacc(pp. 7–11) and documentation for functions, written in C, that
are provided for you to use (pp. 12–17); some matrices and sample outputs also take up a fair
bit of space. There is an optional three-page introduction tosomeof the basics of quantum
computing (pp. 17–19), which is there for those who are interested in knowing more but is
not required for this assignment. Althoughlexandyaccare new to you, the questions about
them only require you to modify some existing input files for them rather than write your own
input files from scratch.
•The tasks required for the assignment are on pp. 3–6.
For Problems 1–5, read the background material on pp. 7–11.
For Problems 2–6, also read the background material on pp. 12–17.
For Problem 7, read the background material on p. 21.
For Problem 8, read the background material on p. 22.
Instructions
Instructions are mostly as for Assignment 1, except that some of the filenames have changed, and now
each Problem has its own directory. To begin working on the assignment, download the workbench
asgn2.zipfrom Moodle. Create a new Ed Workspace and upload this file, letting Ed automatically
extract it.Edit thestudent-idfile to contain your name and student ID.Refer to Lab 0
for a reminder on how to do these tasks.
Open a terminal and change into the directoryasgn2. You will find four other files already in
the directory:plus-times-power.l,plus-times-power.y,quant.handprob6.awk. You will not
modify these files directly; you will make copies of the first two and modify the copies, whilequant.h
andprob6.awkmust remain unaltered (although it’s ok to have copies of them in other directories).
1
You need to construct newlexfiles, usingplus-times-power.las a starting point, for Problems
1, 3, 4 & 5, and you’ll need to construct a newyaccfile fromplus-times-power.yfor Problems 4
& 5. Your submission must include:
•the filestudent-id, edited to contain your name and student ID
•alexfileprob1.lwhich should be obtained by modifying a copy ofplus-times-power.l
•a PDF fileprob2.pdfwhich should contain your solution to Problem 2 (with your name and
student ID at the start)
•alexfileprob3.lwhich should also be obtained by modifying a copy ofplus-times-power.l
•alexfileprob4.lwhich should be obtained by modifying a copy ofprob3.l
•ayaccfileprob4.ywhich should be obtained by modifying a copy ofplus-times-power.y
•alexfileprob5.lwhich should be obtained by modifying a copy ofprob4.l
•ayaccfileprob5.ywhich should be obtained by modifying a copy ofprob4.y
•atextfileprob6.txtwhich should contain ten lines, being your solution to Problem 6
•a Tuatara Turing machine fileprob7.tm
•a PDF fileprob8.pdfwhich should contain your solution to Problem 8 (with your name and
student ID at the start).
The original directory structure and file locations must be preserved. For each problem, the files
you are submitting must be in the corresponding subdirectory, i.e.problemxfor Problemx.
Each of these problem subdirectories contains empty files with the required filenames. These must
each be replaced by the files you write, as described above. Before submission,checkthat each of
these empty files is, indeed, replaced by your own file.
To submit your work, download the Ed workspace as a zip file by clicking on “Download All” in
the file manager panel. The “Download All” option preserves the directory structure of the zip file,
which is required to aid the marking process.You must submit this zip file to Moodle by the
deadline given above.
2
Assignment tasks
First, read about “Lex, Yacc and thePLUS-TIMES-POWERlanguage” on pp. 7–11.
Problem 1. [2 marks]
Constructprob1.l, as described on pp. 9–11, so that it can be used withplus-times-power.y
to build a parser forPLUS-TIMES-POWER.
Now refer to the document “Quantum circuits, registers and the languageQUANT”, pages 12–17.
In fact, for Problem 2, you can focus just on pages 15–17 and the languageQCIRC.
Problem 2. [3 marks]
Write a Context-Free Grammar for the languageQCIRCover the eleven-symbol alphabet
{I,H,X,Y,Z,CNOT,TOF,*,⊗,(,)}. It can be typed or hand-written, but must be in PDF
format and saved as a fileprob2.pdf.pdf.
Now we use some very basic regular expressions (in thelexfile,prob3.l) and a CFG (in the
yaccfile,prob4.y) to construct a lexical analyser (Problem 3) and a parser (Problem 4) forQCIRC.
Problem 3. [3 marks]
Using the file provided forPLUS-TIMES-POWERas a starting point, construct alexfile,
prob3.l, and use it to build a lexical analyser forQCIRC.
You’ll need to introduce simple regexps for the various tokens, among other things.
Sample output:
$./a.out
CNOT* ( H (x)I)
Token: CNOT; Lexeme: CNOT
Token and Lexeme: *
Token and Lexeme: (
Token: H; Lexeme: H
Token: KRONECKERPROD; Lexeme: (x)
Token: I; Lexeme: I
Token and Lexeme: )
Token and Lexeme:
Control-D
3
Problem 4. [6 marks]
Make a copy ofprob3.l, call itprob4.l, then modify it so that it can be used withyacc.
Then construct ayaccfileprob4.yfromplus-times-power.y. Then use theselexandyacc
files to build a parser forQCIRCthat can correctly evaluate any expression in that language.
Note that you do not have to program any of the quantum expression functions yourself.
They have already been written: see the Programs section of theyaccfile. Theactionsin your
yaccfile will need to call these functions, and you can do that by using the function call for
pow(. . .)inplus-times-power.yas a template.
The core of your task is to write the grammar rules in the Rules section, inyaccformat,
with associated actions, using the examples inplus-times-power.yas a guide. You also need
to do some modifications in the Declarations section; see page 10 and further details below.
When entering your grammar into the Rules section ofprob4.y, it is best to leave the existing
rules for the nonterminalstartunchanged, as this has some extra stuff designed to allow you to
enter a series of separate expressions on separate lines. So, take the Start symbol from your grammar
in Problem 2 and represent it by the nonterminallineinstead of bystart.
The specific modifications you need to do in the Declarations section should be:
•You need a new%tokendeclaration for the tokensI,H,X,Y,Z,CNOT,TOF, andKRONECKERPROD.
These have the same structure as the line for theNUMBERtoken, except that “num” is replaced
by “str” (since each of the above tokens represents a string, being names of matrices, registers
or operations, whereasNUMBERrepresents a number).
•You should include each of our two matrix operations,*(multiplication) andKRONECKERPROD
(the Kronecker product,⊗), in a%leftor%rightstatement. Such a statement specifies when
an operation is left- or right-associative, i.e., whether you do multiple consecutive operations
left-to-right or right-to-left. Mathematically, for*and⊗, it doesn’t matter. So, for these, you
can use either%leftor%right. But you should do one of them, because it helps the parser
determine an order in which to do the operations and removes some ambiguity. For operations
whose%leftor%rightstatements are on different lines, the operations with higher precedence
are those with higher line numbers (i.e.,laterin the file). Ordinary matrix multiplication
should have higher precedence than Kronecker product.
•For nonterminal symbols, you can include a%typeline that declares its type, i.e., the type of
value that is returned when an expression generated from this nonterminal is evaluated. E.g.,
%type
Here, “qmx” is the type name we are using for quantum matrices. The various type names can
be seen in the%unionstatement a little earlier in the file. But you do not need to know how
that works in order to do this task.
•You should still usestartas your Start symbol. If you use another name instead, you will
need to modify the%startline too.
Sample output:
$./a.out
CNOT* ( H (x)I)
4x4 matrix:
0.7071 0.0000 0.7071 0.0000
0.0000 0.7071 0.0000 0.7071
0.0000 0.7071 0.0000 -0.7071
0.7071 0.0000 -0.7071 0.0000
Control-D
4
Problem 5. [5 marks]
Make a copy ofprob4.l, call itprob5.l. Also, make a copy ofprob4.y, call itprob5.y. Then
modify theselexandyaccfiles further to build a parser forQUANTthat can correctly evaluate
any expression in that language.
Again, the core of your task is to write the grammar rules in the Rules section, inyacc
format with associated actions, in prob5.y. You will also need new tokensKET0andKET1,
which representk0andk1respectively. These tokens need appropriate rules inprob5.land
declarations inprob5.y, and you will use them in the grammar.
Problem 6. [5 marks]
In theproblem6directory you will find a file,prob6.awk. This is anawkprogram for converting
your student ID number to a particular quantum register expression.
Do this conversion by running thisawkprogram usingawk -f prob6.awk, then (when it
waits for your input) entering your student ID number. You will see the quantum register
expression appear as output.
(a)Copy this quantum register expression (which will be a member ofQUANT) and enter it
as the first line in the text fileprob6.txt.
(b)Run your parser on your expression from (a), and report the result of evaluating it by
appending the output to the fileprob6.txt.
The answer to (a) should be the first line of your fileprob6.txt. Your answer to (b) should
occupy the remaining nine lines. The file should therefore have ten lines altogether. You can
use wc prob6.txt to verify this.
For example, if your ID is 12345678, then your ten-line fileprob6.txtshould look like this:
(X (x) Y (x) Z) * (Y (x) CNOT) * (X (x) Y (x) Z) * (k0 (x) k0 (x) k0)
3-qubit register, 8-element vector:
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000+1.0000i
0.0000
0.0000
5
Turing machines
Now refer to the description of thenumber choice functionon page 21.
Problem 7. [7 marks]
Build, in Tuatara v2.1, a Turing machine that computes the number choice function, and save
the Turing machine as a fileprob7.tm.
Context-Free Languages
Now refer to the description ofUniversal Modelson page 22.
Problem 8. [9 marks]
(a) Prove, using the Pumping lemma for Regular Languages, that there is no Universal Regular
Expression.
(b) Prove, using the Pumping lemma for Context-Free Languages, that there is no Universal
Context-Free Grammar.
Your submission can be typed or hand-written, but it must be in PDF format and saved as
a single fileprob8.pdf.
6
Lex, Yacc and thePLUS-TIMES-POWERlanguage
In this part of the Assignment, you will use the lexical analyser generatorlex, initially by itself,
and then with the parser generatoryacc
1
.
Some useful references on Lex and Yacc:
•T. Niemann,Lex & Yacc Tutorial,http://epaperpress.com/lexandyacc/
•Doug Brown, John Levine, and Tony Mason,lex and yacc (2nd edn.), O’Reilly, 2012.
•thelexandyaccmanpages.
We will illustrate the use of these programs with a languagePLUS-TIMES-POWERbased on
simple arithmetic expressions involving nonnegative integers, using just addition, multiplication and
powers. Then you will uselexandyaccon some languages related to quantum computing.
PLUS-TIMES-POWER
The languagePLUS-TIMES-POWERconsists of expressions involving addition, multiplication and
powers of nonnegative integers, without any parentheses (except for those required by the function
Power). Example expressions include:
5 + 8,8 + 5,3 + 5∗2,13 + 8∗4 + Power(2,Power(3,2)),Power(1,3) + Power(5,3) + Power(3,3),
Power(999,0),0 + 99∗0 + 1,2014,10∗14 + 74 + 10∗13∗73,2∗3∗5∗7∗11∗13∗17∗19.
In these expressions, integers are written in unsigned decimal, with no leading zeros or decimal point
(so 2014, 86, 10, 7, and 0 are ok, but +2014,−2014, 86.0, A, 007, and 00 are not).
For lexical analysis, we treat every nonnegative integer as a lexeme for the token NUMBER.
Lex
An input file tolexis, by convention, given a name ending in.l. Such a file has three parts:
•definitions,
•rules,
•C code.
These are separated by double-percent,%%. Comments begin with/*and end with*/. Any
comments are ignored whenlexis run on the file.
You will find an input file,plus-times-power.l, among the files for this Assignment. Study
its structure now, identifying the three sections and noticing that various pieces of code have been
commented out. Those pieces of code are not neededyet, but some will be needed later.
We focus mainly on the Rules section, in the middle of the file. It consists of a series of statements
of the form
pattern{action}
where thepatternis a regular expression and theactionconsists of instructions, written in C,
specifying what to do with text that matches thepattern.
2
In our file, eachpatternrepresents a set
of possible lexemes which we wish to identify. These are:
1
actually, Linux includes more modern implementations of these programs calledflexandbison.
2
This may seem reminiscent ofawk, but note that: the pattern is not delimited by slashes,/. . ./, as inawk; the
actioncode is in C, whereas inawkthe actions are specified inawk’s own language, which has similarities with C but
is not the same; and theactionpertains only to the text that matches the pattern, whereas inawkthe action pertains
to the entire line in which the matching text is found.
7
•a decimal representation of a nonnegative integer, represented as described above;
– This is taken to be an instance of the token NUMBER (i.e., a lexeme for that token).
•the specific stringPower, which is taken to be an instance of the token POWER.
•certain specific characters:+,*,(,), and comma;
•the newline character;
•white space, being any sequence of spaces and tabs.
Note that all matching inlexis case-sensitive.
Ouractionis, in most cases, to print a message saying what token and lexeme have been found.
For white space, we take no action at all. A character that cannot be matched by any pattern yields
an error message.
If you runlexon the fileplus-times-power.l, thenlexgenerates the C programlex.yy.c.
3
This is the source code for the lexical analyser. You compile it using a C compiler such ascc.
For this assignment we useflex, a more modern variant oflex. We generate the lexical analyser
as follows.
$flex plus-times-power.l
$cc lex.yy.c
By default,ccputs the executable program in a file calleda.out
4
. This can be executed in
the usual way, by just entering./a.outat the command line. If you prefer to give the executable
program another name, such asplus-times-power-lex, then you can tell this to the compiler using
the-ooption:cc lex.yy.c -o plus-times-power-lex.
When you run the program, it will initially wait for you to input a line of text to analyse. Do
so, pressing Return at the end of the line. Then the lexical analyser will print, to standard output,
messages showing how it has analysed your input. The printing of these messages is done by the
printfstatements from the fileplus-times-power.l. Note how it skips over white space, and only
reports on the lexemes and tokens.
$./a.out
13+8 * 4 + Power(2,Power (3,2 ))
Token: NUMBER; Lexeme: 13
Token and Lexeme: +
Token: NUMBER; Lexeme: 8
Token and Lexeme: *
Token: NUMBER; Lexeme: 4
Token and Lexeme: +
Token: POWER; Lexeme: Power
Token and Lexeme: (
Token: NUMBER; Lexeme: 2
Token and Lexeme: ,
Token: POWER; Lexeme: Power
Token and Lexeme: (
Token: NUMBER; Lexeme: 3
Token and Lexeme: ,
Token: NUMBER; Lexeme: 2
Token and Lexeme: )
Token and Lexeme: )
Token and Lexeme:
Try running this program with some input expressions of your own. You can keep entering new
expressions on new lines, and enter Control-D to stop when you are finished.
3
The C program will have this same name,lex.yy.c, regardless of the name you gave to thelexinput file.
4
a.outis short forassembleroutput. In some other environments, it may be calleda.exe.
8
Yacc
We now turn to parsing, usingyacc.
Consider the following grammar forPLUS-TIMES-POWER.
S−→E
E−→I
E−→POWER(E, E)
E−→E∗E
E−→E+E
I−→NUMBER
In this grammar, the non-terminals areS,EandI. TreatNUMBERandPOWERas just single
tokens, and hence single terminal symbols in this grammar.
We now generate a parser for this grammar, which will also evaluate the expressions, with +,∗
interpreted as the usual integer arithmetic operations andPower(. . . ,. . .)interpreted as raising its
first argument to the power of its second argument.
To generate this parser, you need two files,prob1.l(forlex) andplus-times-power.y(for
yacc):
•Change into yourproblem1subdirectory and do the following steps in that directory.
•Copyplus-times-power.lto a new fileprob1.l, and then modifyprob1.las follows:
–in theDefinitionssection,uncomment the statement#include "y.tab.h";
–in theRulessection, in eachaction:
∗uncomment the statements of the form
·yylval.str = ...;
·yylval.num = ...;
·returnTOKENNAME;
·return *yytext;
·yyerror ...
∗Comment out theprintfstatements. These may still be handy if debugging is
needed, so don’t delete them altogether, but the lexical analyser’s main role now is
to report the tokens and lexemes to the parser, not to the user.
–in theC codesection, comment out the functionmain(), which in this case occupies
four lines at the end of the file.
•plus-times-power.y, the input file foryacc, is provided for you. You don’t need to modify
thisyet.
An input file foryaccis, by convention, given a name ending in.y, and has three parts, very loosely
analogous to the three parts of alexfile but very different in their details and functionality:
•Declarations,
•Rules,
•Programs.
These are separated by double-percent,%%. Comments begin with/*and end with*/.
Peruse the provided fileplus-times-power.y, identify its main components, and pay particular
attention to the following, since you will need to modify some of them later.
9
•in the Declarations section:
–lines like
int printMatrix(Matrix x);
Matrix identity();
.
.
.
Register kroneckerProductReg(Register v, Register w);
which aredeclarationsof functions (but they aredefinedlater, in the Programs section);
5
–declarations of the tokens to be used:
%token
%token
–some specifications that certain operations are left-associative (which helps determine the
order in which operations are applied and can help resolve conflicts and ambiguities):
%left ’+’
%left ’*’
–declarations of the nonterminal symbols to be used (which don’t need to start with an
upper-case letter):
%type
%type
%type
%type
–nomination of which nonterminal is the Start symbol:
%start start
•in the Rules section, a list of grammar rules in Backus-Naur Form (BNF), except that the
colon “:” is used instead of→, and there must be a semicolon at the end of each rule. Rules
with a common left-hand-side may be written in the usual compact form, by listing their
right-hand-sides separated by vertical bars, and one semicolon at the very end. The terminals
may be token names, in which case they must be declared in the Declarations section and also
used in thelexfile, or single characters enclosed in forward-quote symbols. Each rule has
anaction, enclosed in braces{. . .}. A rule for a Start symbol may print output, but most
other rules will have an action of the form$$=. . . . The special variable$$represents the
value to be returned for that rule, and in effect specifies how that rule is to be interpreted for
evaluating the expression. The variables$1,$2, . . . refer to the values of the first, second, . . .
symbols in the right-hand side of the rule.
•in the Programs section, various functions, written in C, that your parsers will be able to use.
You do not need to modify these functions, and indeed should not try to do so unless you are
an experienced C programmer and know exactly what you are doing! Most of these functions
are not used yet; some will only be used later, in Problem 4.
After constructing the newlexfileprob1.las above, the parser can be generated by:
$yacc -d plus-times-power.y
$flex prob1.l
$cc lex.yy.c y.tab.c -lm
The executable program, which is now a parser forPLUS-TIMES-POWER, is again nameda.out
by default, and will replace any other program of that name that is sitting in the same directory.
5
These functions for computing with quantum expressions are not needed byplus-times-power.y, but you will
need them later, when you make a modified version ofplus-times-power.yto parse quantum expressions.
10
$./a.out
13+8 * 4 + Power(2,Power (3,2 ))
557
13+8*4+Power(2,Power(3,2))
557
Power(1,3)+Power(5,3)+Power(3,3)
153
1+2+3+4+5+6+7+8+9+10
55
10*9*8*7*6*5*4*3*2*1
3628800
Power(999,0)
1
Control-D
Run it with some input expressions of your own. You can keep entering new expressions on new
lines, as above, and enter Control-D to stop when you are finished.
11
Quantum circuits, registers and the languageQUANT
Introduction
Roughly speaking, aquantum computeris a computer that is able to use certain capabilities
based on quantum physics in addition to the usual (“classical”) capabilities that computers have.
The idea to use quantum physics for computation arose in the 1980s with work by the physicists
Richard Feynman and David Deutsch. Interest increased considerably in the 1990s when Peter Shor
gave an efficient quantum algorithm for integer factorisation. It had been assumed that integer
factorisation was difficult; a fast factorisation algorithm could be used to break the RSA public-key
cryptosystem. RSA was the first public-key cryptosystem to be published and is still the most
widely used, underpinning the security of a large proportion of modern electronic communications.
6
.
Since then, many quantum algorithms have appeared, some improving considerably on the best
classical algorithms. But, to have any impact in practice, they will have to be implemented on
actual quantum computers and applied to large inputs.
Several dozen quantum computers have been built. Many of these are experimental, while some
have been used for highly specialised applications. But they are nowhere near powerful enough or
robust enough to be useful on a large scale. Practical quantum computing still faces huge technical
challenges including adequately protecting the delicate computations from being perturbed by the
surrounding environment. In spite of this, there is considerable optimism about their future, and
a widespread belief that, in time, their effect will be revolutionary (as well as some scepticism
about this). Although they cannot yet be used to break RSA, the threat they pose has been
taken very seriously by cryptographers, whose work has led to the development of new quantum-
resistant cryptosystems:https://www.nist.gov/news-events/news/2022/07/nist-announces-
first-four-quantum-resistant-cryptographic-algorithms.
For a recent review of the field’s current progress and prospects, see [1].
Mathematically, the heart of a quantum computer has three parts:
•aregisterthat stores information, and in fact can store multiple data items simultaneously, in
asuperposition, although these data items cannot be accessed in standard classical ways;
•aquantum circuit expression, in which certain basic components (calledgates) are combined in
order to produce a transformation that is applied to the register to produce another register;
•measurement, in which some part of a register is measured (i.e., read), but the result is de-
termined probabilistically by the entire contents of the register, and the act of measurement
reduces the amount of information held in the register.
In this assignment, we just consider registers and quantum circuit expressions, and in fact we only
consider restricted types of these. Nonetheless, our registers and expressions are sufficient, in prin-
ciple, to represent the pre-measurement parts of quantum computations arbitrarily accurately. We
define these concepts now.
Aregisteris a 2
n
-dimensional vector of unit length. It may contain complex numbers (e.g.,
i=
√
−1). Examples of registers include:
n= 0 :