COMPSCI 369

THE UNIVERSITY OF AUCKLAND

FIRST SEMESTER, 2020 —MID-SEMESTER TEST

Version 1

Campus: VIRTUAL

COMPUTER SCIENCE

Computational Science

(Time allowed: ONE hour)

NOTE: The test has 15 questions, and it is worth 100 marks in total.

This test contributes 10% to your final grade

Answer all questions in a single document and upload it as a pdf file to

Canvas, like you do for the assignments.

This test has been designed as an open book one hour test. However

you have a full 24 hours to work on this test. You are allowed to use the full

24 hours and you are allowed to use the provided materials.

When a question requests to explain your answer, that means you need

to justify how you came up with the solution or why you made a certain

choice. Be concise and as clear as possible.

The work must be your own. You are not allowed to copy text or code

or solutions. We will not accept quotes to be valid answers. So, use your own

words when answering the questions. We will run the test through Turnitin.

You will be asked to include a sentence which states that the test is your

own work.

CONTINUED

QUESTION SHEET 2 COMPSCI 369

Section A: Integrity Statement

Copy down and sign the following statement.

For the 24 hour duration of this test, I, (insert your name here), confirm that I will not discuss the

content of the test with anyone else. I will not give any assistance to another student taking this test. I

will not receive any assistance from any person or tutoring service. I will not post any information about

this test on the Internet. [2 marks]

Section B: Computational Biology and Numerical Integration

The Runge-Kutta (RK4) Algorithm

1. Does RK4 take greater or fewer computations per iteration than the Euler method? [2 marks]

2. Explain why RK4 is preferable to the Euler method. [5 marks]

3. The RK4 algorithm calculates four ‘k-terms’ before combining them together in a weighted aver-

age. In your own words, once they are all combined together, what is this value an estimation of?

[5 marks]

4. Pretend that you are teaching a friend how RK4 works. The friend has never studied the algorithm

before. Write down your detailed explanation of the k2 term. What does it estimate and how is it

calculated? [7 marks]

5. Explain why finding a numerical artifact in a simulation is not a good thing. [3 marks]

Building a dynamical model

Pretend you have recently been hired by a farmer to build a computational model to help her understand

her dairy farm. She wants the model to simulate her grassy grazing fields, which are shared by many

cows and many sheep. Both cows and sheep eat the grass in these fields, and the number of cows grows

proportionally to how much grass they eat. Similarly, the sheep grow proportionally to how much grass

they eat. The grass grows at a constant rate.

6. Write down a set of differential equations for modelling this system. [12 marks]

7. List each VARIABLE in your model, and explain what it represents. [5 marks]

8. List each PARAMETER in your model and explain what it represents. [4 marks]

9. Identify two assumptions that are implicit in your model. [6 marks]

10. Draw a sketch of what you imagine a timeseries of this system might look like. Explain the main

features of the plot and why [10 marks]

CONTINUED

QUESTION SHEET 3 COMPSCI 369

Section C: Sequence Alignments

11. You are given the following two alignments generated for the same pair of sequences. Given that

the match score is sm, the mismatch score is sn, and the linear gap score is −g, which one aligns

better the two sequences and why? Consider also the biological relevance as a criterion in the

decision. Is there a better alignment than the two presented? Could you say that the two sequences

are homologous? Explain your answers.

Alignment 1 Alignment 2

ATG-CCAGG- ATGCCAGG-

AT-CC-A-GG AT-CCA-GG

[7 marks]

12. What is the minimum length and the maximum possible length of an optimal sequence alignment

of two sequences A and B, where A can be any sequence of length 5 and B can be any sequence of

length 7, in the case of

(a) global alignment

(b) local alignment

(c) overlap alignment

You can assume that the mismatch and gap scores can take only non-positive values, and the match

score is always positive. Explain your answer. You can use examples to support your reasoning.

[8 marks]

13. The following alignment score matrices were produced with one of the algorithms we discussed in

the lectures.

F1 :

B1 B2 B3 B4

0 −1 −2 −3 −4

A1 −1 1 0 −1 −2

A2 −2 0 0 1 0

A3 −3 −1 −1 1 0

A4 −4 −2 −2 0 0

F2 :

B1 B2 B3 B4 B5 B6

0 0 0 0 0 0 0

A1 0 0 0 0 0 0 0

A2 0 1 1 1 1 1 1

A3 0 1 2 2 2 2 2

A4 0 1 2 2 3 3 3

A5 0 1 2 2 3 4 4

A6 0 1 2 2 3 4 5

A7 0 1 2 2 3 4 5

Given that the mismatch and gap scores can take only non-positive values, and the match score is

always positive, answer the following questions and explain your answers.

CONTINUED

QUESTION SHEET 4 COMPSCI 369

(a) What alignment algorithm could have produced the matrix F1? What is the optimal score?

How will the optimal alignment look like?

(b) What alignment algorithm could have produced the matrix F2? The elements in red colour

and bold face trace the optimal path. Write down the alignment for the traced path. Is this the

only solution?

[10 marks]

14. You work with a biologist who just sequenced the complete genome of a new bacterial organism.

She does not know if the sequence is new or already exists, but found an online database of all

bacterial genomes sequenced so far. She does not know how to solve the problem computationally.

So, she comes to you with the database containing M sequences of length N1 . . . Nm and the

new sequence X of length Nx, and asks you to help her find out if the sequence is known. If the

sequence is unknown, she wants to know what is the closest bacterial genome to X.

(a) Explain how you will solve this problem, be explicit about the type of alignment algorithm

you will use and justify your choices.

(b) What will be the time complexity of the whole data analysis? Explain your result.

[8 marks]

15. Can we solve the multiple alignment problem exactly and efficiently? Explain your answer.

[6 marks]