COMP1730/6730 S1 2021 — Project Assignment
Jeffrey Fisher
03-05-2021
Important
• The assignment is due 9:00 am Monday May 24 (in week 12).
• The code for the assignment can be developed in groups of up to four people.
• The report is individual - you must write it entirely on your own.
• COMP6730 students have to write a more extensive report.
• Include your university ID in every file you submit.
• Include the university ID of every member of your group in any code files you submit.
• This assignment is worth 25% of your grade for COMP1730/COMP6730.
Groups
• The code for the assignment can be completed in groups of up to 4 people. If you wish to work in a
group you should sign-up for one in the sign-up form in Wattle (all members should sign-up to the
same group).
• The report is individual, i.e. you must do it on your own. We will check for plagiarism and other
suspicious activities in the reports within groups as normal.
• If you do not wish to work with other people, please sign up to the “I will do the assignment on my own”
group instead.
• There is no difference in size/scope/marking criteria based on your group size. Unless you have strong
feelings otherwise, we recommend you work in a group of 3 or 4 people.
• COMP6730 and COMP1730 students can be part of the same group.
• You are not limited to members of your tutorial discussion groups - you can form a group with anyone
else enrolled in the course.
• The sign-up link in Wattle is here.
• Group sign-ups will close at 9:00am Monday 10 May. Anyone not signed up to a group at that point
will be added to the “I will do the assignment on my own” group.
Overview
In this assignment you will be doing a series of data analysis and modelling tasks using some real-world
geographical data. It is different to the homework assignments you have done up until this point in that for
almost all of the questions, there is no single “right” answer. You are also not given any tests against which
to check your answers (although you are encouraged to write your own to help you test the correctness of
your functions). Because there is no single “right” answer, it will be important to justify the decisions and
choices that you make while completing the assignment. This is important since it allows anyone relying
on your results and conclusions to understand how they were obtained and whether they are suitable for a
particular purpose.
The Cotter River provides the ACT with the majority of its water supply. The river stretches over 70
kilometers from the South West edge of the ACT, Northward until it joins with the Murrumbidgee River,
just below the Cotter Dam. In addition to the Cotter Dam, there are two other reservoirs along the Cotter
River, Bendora Dam and Corin Dam. These three dams store the majority of water used in the ACT.
The Cotter Dam (left) and Corin Dam (right) are both overflowing after the wet Summer and Autumn we’ve
had in Canberra this year.
For this assignment, we have obtained elevation data for the majority of the Cotter River catchment area,
at a 5 meter resolution. You will be analysing this data and answering some questions about the region,
including some questions that are relevant to Canberra’s drinking water supply.
The Data
We have provided you with two csv files - elevation_data_small.csv and elevation_data_large.csv.
These two csv files contain height information on a 5 metre grid. The elevation_data_small.csv file
contains just the Cotter Dam region. The elevation_data_large.csv contains the entire Cotter River
catchment area including the Cotter Dam, Bendora Dam and Corin Dam, as well as the surrounding mountain
ranges and the Namadgi National Park. You can see a heatmap of the two data sets in the images below.
Brighter colours are higher elevation. You can also see the Cotter Dam on Google maps here, Bendora Dam
here and Corin Dam here.
Elevation data for the Cotter Dam (left) and Cotter River catchment area (right).
The elevation_data_small file looks like this (but with a lot more rows and columns):
693.366,692.038,690.964,690.964,...
693.406,692.079,691.025,691.025,...
693.383,692.039,691.018,691.018,...
693.457,692.085,691.058,691.058,...
693.457,692.107,691.091,691.091,...
... ,... ,... ,... ,...
All elevation values are in meters.
This means that the elevation of the NorthWest most point in the region is 693.366m, the point 5 meters to
the East of it has elevation 692.038 meters and so forth.
If we need to refer to a specific cell in the data, we can do so using its x and y coordinates. We’ll use matrix
style indexing so the origin (x=0, y=0) is the top left grid cell, rather than the bottom left grid cell you might
see on a traditional graph. The coordinate at x = 2 and y = 4 means the 3rd column from the left and the
5th row from the top (highlighted in yellow in Figure 1).
Figure 1: Indexing Example
Be aware that the same location will have different coordinates in the two data sets.
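As a small illustration of the indexing convention described above, the sketch below assumes the grid is stored as a list of rows (one possible choice, not a requirement), so the row index y is selected before the column index x. The sample values are the five rows shown earlier; the helper name elevation_at is hypothetical and not part of the template.

```python
# Assumption: the grid is a list of rows, so grid[y][x] is the cell at (x, y).
sample_grid = [
    [693.366, 692.038, 690.964, 690.964],
    [693.406, 692.079, 691.025, 691.025],
    [693.383, 692.039, 691.018, 691.018],
    [693.457, 692.085, 691.058, 691.058],
    [693.457, 692.107, 691.091, 691.091],
]


def elevation_at(grid, x, y):
    """Return the elevation at column x, row y (origin at the top-left cell)."""
    return grid[y][x]


print(elevation_at(sample_grid, 0, 0))  # north-west corner of the sample
print(elevation_at(sample_grid, 2, 4))  # 3rd column from the left, 5th row from the top
```

Whatever data structure you choose, it is worth checking early that (x, y) is not accidentally swapped to (row, column) order somewhere in your code.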
If you are not sure how to read and process CSV files, have a look at Labs 6 and 8 in order to remind yourself.
Please also keep in mind that even the elevation_data_small file is not actually that small: it contains
roughly one million points. You may need to consider efficiency when completing the assignment.
The Task
You are provided with assignment_template.py, which contains the basic functions of the assignment. The
functions are incomplete. In this assignment, you will fill in the blanks and complete the missing functions.
However, we also encourage you to use functional decomposition where appropriate, i.e. you may (and should)
add additional functions as necessary. You will also write a short report about your functions and decisions.
For Questions 1 through 5 you should just make use of the elevation_data_small file. Question 6 requires
you to use the elevation_data_large file as well. Please be aware that if you try and test Questions 1
through 5 using the large data set, you will need to do the cleaning/preparation (described at the start of
Question 6) or you will get nonsensical results.
Question 1: Reading the Data - 10 marks
Write a function that takes the file path of the CSV file as input, reads the file, and returns the data in a
suitable format. The assignment template contains a function for you to fill in:
def read_dataset(filepath):
    pass
pass means “do nothing”, and you should remove it when you fill in this function. To load the data, you can
then run
data = read_dataset('elevation_data_small.csv')
as long as the CSV file is in the same directory as your assignment file. If it is elsewhere, you’ll need to
provide the file path instead of just the file name.
You should read the data from filepath, and return it in an easy-to-use format. This can be any data
type or data structure that you like, as long as it makes sense for the tasks you will be doing later in this
assignment. You will be using this returned value in all other questions of the assignment, so make sure your
choices here support your later solutions!
Hint - have a look at the remaining questions before deciding on what format to load your data in!
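To make the idea concrete, here is a minimal sketch of one possible reader, assuming a list of rows of floats is a suitable format for your later solutions (a NumPy array or another structure may suit you better). It uses only the standard csv module. The helper read_tiny_example and the file name tiny.csv are hypothetical and exist only to demonstrate the function on a throwaway file.

```python
import csv
import os
import tempfile


def read_dataset(filepath):
    """Read a CSV file of elevations into a list of rows of floats.

    Assumption: every non-empty line contains only comma-separated numbers.
    """
    with open(filepath, newline='') as csv_file:
        return [[float(value) for value in row]
                for row in csv.reader(csv_file) if row]


def read_tiny_example():
    """Write a two-row CSV to a temporary folder and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'tiny.csv')
        with open(path, 'w', newline='') as csv_file:
            csv_file.write('693.366,692.038\n693.406,692.079\n')
        return read_dataset(path)


if __name__ == '__main__':
    print(read_tiny_example())
```

Writing a tiny file by hand like this is also one way to test a function that needs a data set, since you know exactly what it should return.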
Question 2: Summary Statistics - 10 marks
Now that we have a function to read in the data set, it’s time to do some analysis. We’ll start by calculating
some basic statistics about the data.
There are three functions to fill in for Question 2:
def minimum_elevation(data_set):
    pass
def maximum_elevation(data_set):
    pass
def average_elevation(data_set):
    pass
The input to each of these functions should be the data set returned by Question 1. The output should be
the minimum elevation, the maximum elevation and the average (mean) elevation respectively of the region
covered by the data set. All return values should be in meters.
The minimum and maximum are each worth 3 marks. The average is worth 4 marks.
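The sketch below shows one way these three statistics could be computed, assuming the data set is a list of rows of floats (an assumption about the format chosen in Question 1, not something the template mandates). With a different data structure the same results might be available from built-in methods instead.

```python
def minimum_elevation(data_set):
    """Return the lowest elevation in the grid, in metres."""
    return min(min(row) for row in data_set)


def maximum_elevation(data_set):
    """Return the highest elevation in the grid, in metres."""
    return max(max(row) for row in data_set)


def average_elevation(data_set):
    """Return the mean elevation over every cell in the grid, in metres."""
    total = sum(sum(row) for row in data_set)
    cell_count = sum(len(row) for row in data_set)
    return total / cell_count


# A hand-made grid whose statistics are easy to check by eye.
tiny_grid = [[1.0, 2.0],
             [3.0, 6.0]]
print(minimum_elevation(tiny_grid), maximum_elevation(tiny_grid),
      average_elevation(tiny_grid))
```

Note that the mean is taken over all cells, not over the per-row means, which only coincide when every row has the same length.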
Question 3: Gradient - 10 marks
The Cotter River valley is pretty rugged country. There are steep mountain ranges on either side of the river
for most of its length. It’s useful to know how steeply sloped an area is. For example, slope is used when
planning walking or fire trails, estimating the risk of landslides, assessing bushfire risk, and so on.
For a given cell we calculate the slope by subtracting the elevation in the cell on its left from the elevation in
the cell on its right - then dividing by 10 (the horizontal distance). This is the x gradient. Then subtract the
elevation in the cell below from the elevation in the cell above, and again divide the result by 10. This is the
y gradient. Square both gradients, add them together, then take the square root. This is the slope, or total
gradient.
Mathematically, if $e_{x,y}$ is the elevation for cell $(x, y)$, the slope at cell $(x, y)$ is calculated as:
$$\mathrm{slope}_{x,y} = \sqrt{\left(\frac{e_{x+1,y} - e_{x-1,y}}{10}\right)^2 + \left(\frac{e_{x,y+1} - e_{x,y-1}}{10}\right)^2}$$
Fill in the function:
def slope(data_set, x_coordinate, y_coordinate):
    pass
It should take as inputs the data set (returned by Question 1), an x coordinate and a y coordinate and return
the total gradient at the corresponding cell.
Hint: You may need to consider the edges of the map separately.
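The calculation above can be sketched as follows, assuming the data set is a list of rows of floats and the grid is at least 2 cells wide and 2 cells high. The edge handling is an assumption made for illustration: on a border the missing neighbour is replaced by the cell itself and the difference is divided by 5 instead of 10. Other treatments are possible, and whichever you choose should be justified in your report.

```python
import math

CELL_SIZE = 5.0  # metres between the centres of neighbouring cells


def slope(data_set, x_coordinate, y_coordinate):
    """Return the total gradient at cell (x, y).

    Interior cells use the central difference from the assignment text.
    Assumption: edge cells fall back to a one-sided difference over 5 m.
    """
    height = len(data_set)
    width = len(data_set[0])
    # Clamp the neighbour indices so they never leave the grid.
    left = max(x_coordinate - 1, 0)
    right = min(x_coordinate + 1, width - 1)
    above = max(y_coordinate - 1, 0)
    below = min(y_coordinate + 1, height - 1)

    row = data_set[y_coordinate]
    x_gradient = (row[right] - row[left]) / ((right - left) * CELL_SIZE)
    y_gradient = ((data_set[below][x_coordinate] - data_set[above][x_coordinate])
                  / ((below - above) * CELL_SIZE))
    return math.hypot(x_gradient, y_gradient)


# A tilted plane: elevation = 3 * x + 4 * y, so the slope is the same everywhere.
plane = [[0.0, 3.0, 6.0],
         [4.0, 7.0, 10.0],
         [8.0, 11.0, 14.0]]
print(slope(plane, 1, 1), slope(plane, 0, 0))
```

A plane like this makes a convenient test case because the expected slope can be worked out by hand for both interior and edge cells.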
Question 4: Surface Area of the Dams - 10 marks
The areas covered by the two elevation data sets are particularly important for Canberra’s water supply.
There was a period of time in the not too distant past where the level in all the dams was dangerously low,
resulting in severe restrictions being placed on water usage in the ACT. One way of measuring the water in
the dam is by calculating its extent - or surface area. Our elevation data just contains the elevation at the
surface - regardless of whether it is water or land. However, if we assume that the dam is all approximately
the same level, then as long as we know the elevation of a single point on the dam, we can figure out the
surface area.
Complete the following function in the assignment template:
def surface_area(data_set, x_coordinate, y_coordinate):
    pass
The input to the function should be the data set (again in the format returned by Question 1), and the x and
y coordinates of a point on the dam. The function should return the surface area of the dam in square meters.
If you want to run your solution on the real data, in the small data set, the cell (x=794, y=234) lies on the
Cotter Dam. In the large data set, the cell (x=2878, y=242) lies on the dam (though keep in mind the note
about cleaning the large data set described in Question 6).
Hint: you’ll need to make an assumption about a tolerance level for elevation when determining what cells
are part of the dam. You’ll also need to figure out how to tell if a cell is actually connected to the dam - you
can’t just go through the whole data set looking for a certain elevation. You could also consider incorporating
slope as well if you want.
Hint 2: A plot could be a really useful check here - if you compare it to the heatmap above.
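One way to handle the connectivity requirement in the first hint is a flood fill, sketched below. This is only an illustration under several assumptions: the data set is a list of rows of floats, cells are connected through their four edge neighbours, and the tolerance of 0.1 m is an arbitrary placeholder that you would need to choose and justify yourself. The helper connected_cells and the constant names are hypothetical.

```python
from collections import deque

CELL_AREA = 25.0       # each cell covers 5 m x 5 m
LEVEL_TOLERANCE = 0.1  # metres; an assumed value, not one given in the assignment


def connected_cells(data_set, start_x, start_y, lowest, highest):
    """Return the set of (x, y) cells reachable from the start cell through
    4-connected neighbours whose elevation lies within [lowest, highest]."""
    height, width = len(data_set), len(data_set[0])
    seen = {(start_x, start_y)}
    queue = deque(seen)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in seen
                    and lowest <= data_set[ny][nx] <= highest):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


def surface_area(data_set, x_coordinate, y_coordinate):
    """Return the area in square metres of the flat region containing (x, y)."""
    level = data_set[y_coordinate][x_coordinate]
    cells = connected_cells(data_set, x_coordinate, y_coordinate,
                            level - LEVEL_TOLERANCE, level + LEVEL_TOLERANCE)
    return len(cells) * CELL_AREA


# Two bodies of "water" at 10 m, separated by higher ground at 11 m and 12 m.
toy_grid = [[10.0, 10.0, 12.0, 10.0],
            [10.0, 11.0, 12.0, 10.0],
            [12.0, 12.0, 12.0, 10.0]]
print(surface_area(toy_grid, 0, 0))
```

In the toy grid the right-hand column has the same elevation as the starting cell but is not counted, because it is not connected to it. The queue-based loop avoids recursion, which matters on a grid with around a million cells.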
Question 5: Waters Rising - 10 marks
While measuring the surface area of the dam is one thing, the reality is that the area of the dam changes as
the water level in the dam rises and falls. While it’s hard to predict what the area of the dam will look like if
the water level is lower (since we don’t have the elevation data for the bed of the dam), we can calculate
what the area might be if the water level rises.
Complete the following function:
def expanded_surface_area(data_set, water_level, x_coordinate, y_coordinate):
    pass
This is similar to the function for Question 4 above, but also takes a water_level parameter, which should
be greater than or equal to the current elevation of the dam surface. The function should return the surface
area the dam would have at this water level, i.e. calculate how it would spread out to cover more ground.
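A minimal sketch of this idea, assuming a list-of-rows data set and 4-connected spreading, is to grow outward from the given cell and keep every connected cell whose elevation is at or below water_level. Treating "at or below" as flooded, and ignoring anything that would stop the water (such as the dam wall overflowing), are simplifying assumptions for illustration only.

```python
CELL_AREA = 25.0  # each cell covers 5 m x 5 m


def expanded_surface_area(data_set, water_level, x_coordinate, y_coordinate):
    """Return the area in square metres covered if water rises to water_level,
    spreading from (x, y) through 4-connected cells at or below that level."""
    height, width = len(data_set), len(data_set[0])
    flooded = {(x_coordinate, y_coordinate)}
    to_visit = [(x_coordinate, y_coordinate)]
    while to_visit:
        x, y = to_visit.pop()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in flooded
                    and data_set[ny][nx] <= water_level):
                flooded.add((nx, ny))
                to_visit.append((nx, ny))
    return len(flooded) * CELL_AREA


# The 10 m cell in the bottom-right corner is cut off by 15 m ground.
basin = [[10.0, 11.0, 15.0],
         [10.0, 12.0, 15.0],
         [15.0, 15.0, 10.0]]
for level in (10.0, 11.0, 12.0, 15.0):
    print(level, expanded_surface_area(basin, level, 0, 0))
```

In a full solution this loop would probably share a helper with Question 4 to avoid repeating the flood-fill code; it is written out in full here only so the example stands alone.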
Question 6: Catchment Areas - 10 marks
For the last question, you are going to look at the catchment area of each of the three dams. This is a very
open ended question, and you’ll likely need to do some research of your own in order to solve it. It’s also the
only question which is really very specific to this data set, and you may need to do some further investigation
of the data as well.
It’s also likely a much harder question than the first 5, so please make sure you have a good working solution
to them, before you spend too much time on this question.
First things first, to answer this question, you’ll need the elevation_data_large file, but before you can
use it, you’ll need to clean it up a bit.
The elevation_data_small file is complete. However, for the elevation_data_large file, there are some
cells which don’t have a valid measurement. These are indicated in the data set by the value -3.403e+38 -
well beyond the edge of our galaxy! Needless to say, if you try and perform any analysis with these values
included, they will significantly distort the results.
There are lots of different approaches to imputing missing values on a grid, and you can even use something
very similar to the interpolation from Homework 5.
Fill in the following function:
def impute_missing_values(data_set):
    pass
so that it checks for missing values and replaces any that are found with something more appropriate. You
can either return a new copy of the data set or directly modify the function argument, whichever you think
is more appropriate.
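As one illustration of a simple imputation approach (not necessarily the best one, and not the interpolation from Homework 5), the sketch below replaces each missing cell with the mean of the valid cells among its eight neighbours, repeating until larger holes are filled from their edges inward. It assumes a list-of-rows data set, returns a new copy, and detects missing cells with a threshold, since comparing floats for exact equality with -3.403e+38 can be fragile. The names MISSING_THRESHOLD and is_missing are hypothetical.

```python
MISSING_THRESHOLD = -1e30  # assumption: anything below this is a "no data" marker


def is_missing(value):
    """Return True if the value looks like the missing-data marker."""
    return value < MISSING_THRESHOLD


def impute_missing_values(data_set):
    """Return a copy of the grid with each missing cell replaced by the mean
    of its valid neighbours (8-connected), filling large holes layer by layer."""
    result = [list(row) for row in data_set]
    height, width = len(result), len(result[0])
    missing = [(x, y) for y in range(height) for x in range(width)
               if is_missing(result[y][x])]
    while missing:
        updates = {}
        for x, y in missing:
            neighbours = [result[ny][nx]
                          for ny in range(max(y - 1, 0), min(y + 2, height))
                          for nx in range(max(x - 1, 0), min(x + 2, width))
                          if not is_missing(result[ny][nx])]
            if neighbours:
                updates[(x, y)] = sum(neighbours) / len(neighbours)
        if not updates:
            break  # no valid cells anywhere, so there is nothing to copy from
        # Apply the whole layer at once so the order of cells does not matter.
        for (x, y), value in updates.items():
            result[y][x] = value
        missing = [cell for cell in missing if cell not in updates]
    return result


NO_DATA = -3.403e+38
patchy = [[1.0, NO_DATA, 3.0],
          [1.0, 1.0, 3.0]]
print(impute_missing_values(patchy))
```

Because each pass rescans every remaining missing cell, this could be slow for very large holes; that trade-off is the kind of decision worth discussing in the report.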
The Cotter River has a large catchment area, consisting of much of the Namadgi National Park. Because
all three dams are on the same river, it isn’t as important which dam the water ends up in, it will still be
available for use in the ACT. However, when the dams start to become full (like they are at the moment),
excess water in the lowest dam (the Cotter Dam), can’t be collected and is essentially wasted.
Your task for this question is to produce one or more maps, showing the catchment areas for each of the
three dams. You can put all three catchment areas on one map if you want, provided it is clear which is
which. Alternatively, you can plot the catchment area for each dam on its own map.
You don’t need to consider any part of the catchment area that lies outside the data sets given.
It’s up to you what functions you want to define in order to solve this problem. However, your code for
this question should make it clear what we need to do in order to reproduce the plots when marking your
assignment.
You should include the plot(s) in your report.
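If you are unsure where to start your research, one commonly used idea in terrain analysis is "D8" flow routing: assume water in each cell flows to whichever of its eight neighbours gives the steepest descent, then collect every cell whose path ends at the dam. The sketch below is only a rough illustration of that idea, not the expected solution. It assumes a cleaned list-of-rows data set, ignores flat areas and pits (water that reaches a local minimum simply stops), and the function names downstream_cell and catchment_cells are hypothetical.

```python
import math
from collections import deque


def downstream_cell(data_set, x, y):
    """Return the (x, y) of the steepest-descent neighbour of a cell,
    or None if no neighbour is lower (a pit or flat area)."""
    height, width = len(data_set), len(data_set[0])
    best, best_drop = None, 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < width and 0 <= ny < height:
                # Drop per cell of distance; diagonals are sqrt(2) cells away.
                drop = (data_set[y][x] - data_set[ny][nx]) / math.hypot(dx, dy)
                if drop > best_drop:
                    best, best_drop = (nx, ny), drop
    return best


def catchment_cells(data_set, outlet_cells):
    """Return the set of cells whose steepest-descent path reaches an outlet."""
    height, width = len(data_set), len(data_set[0])
    # Build the reverse map: for each cell, which cells drain directly into it.
    upstream = {}
    for y in range(height):
        for x in range(width):
            target = downstream_cell(data_set, x, y)
            if target is not None:
                upstream.setdefault(target, []).append((x, y))
    # Walk upstream from the outlets, collecting everything that drains to them.
    catchment = set(outlet_cells)
    queue = deque(catchment)
    while queue:
        cell = queue.popleft()
        for source in upstream.get(cell, []):
            if source not in catchment:
                catchment.add(source)
                queue.append(source)
    return catchment


# A single-row "cross-section": two low points separated by a ridge at x = 4.
ridge = [[3.0, 2.0, 1.0, 2.0, 5.0, 3.0, 1.0]]
print(sorted(catchment_cells(ridge, [(2, 0)])))
print(sorted(catchment_cells(ridge, [(6, 0)])))
```

On real elevation data, small pits and flat water surfaces will stop the flow paths early, so some extra treatment would be needed; a pure Python loop over the whole large data set is also likely to be slow.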
Written Report - 20 marks COMP1730 or 40 marks COMP6730
Answer the following questions in your written report, answers.pdf. You should also include the plot(s) you
generate in Question 6.
Question 1:
Please run your code and answer the following questions:
• What are the minimum, maximum and average elevation values for the elevation_data_small data
set?
• What is the slope of the steepest point in the elevation_data_small data set?
• Given that the elevation of the dam wall of the Cotter Dam is 550.800m above sea level, this is the
maximum surface elevation of the dam (since if it was to go above this, it would overflow the dam
rather than spread out further). What is the maximum surface area of the Cotter Dam?
Question 2:
Justify your choice of data structure from Question 1. Why did you choose to use this particular data
structure? What are its advantages? Are there any questions where it wasn’t as effective or an alternative
data structure could have been better? What other data structures did you consider? Why did you discard
them?
Question 3:
How did you test that your functions are working correctly? Why do you think this is a good approach? How did
you come up with good test cases? How did you test your function for Question 1, which requires a data set?
Could you test much (or any) of your work for Question 6?
Question 4:
How did you validate your assumptions - particularly for Questions 3-6? What alternatives did you consider?
Did you test whether slightly different assumptions would significantly change the results? For example, how
did you select the tolerance level for the dam elevation in Question 4 - does a small change to this make a
significant difference to the calculated value?
In addition to answering these questions, please provide a copy of the plot(s) you generated in Question 6.
Additional Questions for COMP6730 students
Question 5: Even more important than the surface area of the dam is the volume - since it’s a much more
accurate measure of how much drinking water remains. At the time this data was collected, the Cotter
Dam wasn’t full. Just using the elevation data, how could we calculate the volume of extra water it would
take before it overflowed? Would this estimate be any good? Please note that you don’t actually have to
implement this - just describe how you would go about it if you were asked to.
Question 6: Run your solutions for Questions 3 - 5 on elevation_data_large (after doing the cleaning
described in Question 6). How much longer did they take? What is the computational complexity of your
solutions to Questions 3 - 6? Do you think this could be improved by using a different approach?
Requirements, Expectations, and Marking Criteria
What to submit:
You must submit the assignment through the submission link on Wattle. You must submit a zip folder (i.e. a
.zip archive) containing:
• assignment.py, the Python script containing your implementation of the assignment; and
• assignment_tests.py, which is a Python script containing any tests you have written to verify the
correctness of your functions.
• Any additional data files you used to test your work (please keep them relatively small).
• answers.pdf, a PDF version of your written report.
The assignment.py, assignment_tests.py and any data sets you submit must be the same for all members
of the group. Your report in answers.pdf must be individual. Note that while the report component accounts
for only a part of the assignment mark, the report is required. If you fail to submit an individual report,
your mark for the assignment may be reduced (even as far as to zero), regardless of the mark of your code.
Please make sure you submit a zip folder. Do not submit a rar file, a tar.gz file or any other kind of archive.
Please make sure your report is a pdf file, not a Word Document, Lotus Notes document or any other kind
of strange file type. All major operating systems have easy solutions to create both zip archives and pdf
files. (Just search the Internet if you haven’t had to do this before.) If you don’t follow the submission
instructions you will lose marks, and if we can’t open your submission, you will get a mark of
0.
Marking
The marks for the assignment are broken up as follows:
COMP1730 (marked out of 100)
• 60 functionality (10 each for Questions 1-6).
• 20 code quality and organisation
• 20 written report
COMP6730 (marked out of 120)
• 60 functionality (10 each for Questions 1-6).
• 20 code quality and organisation
• 40 written report
A marking rubric will be made available shortly.
For your code, we have the following requirements:
• Your code must be syntactically correct: it must run in Python 3.
• You can use any modules that are available in the Anaconda 3 Python distribution.
• Aside from the limitation on imports, you are free to use any aspect of Python you wish (classes and
objects, functional programming, etc.). However, none of this is necessary, it is perfectly possible to
complete the assignment using only programming concepts we have covered in the course.
• You must not change the names of the functions in the template, or change their parameters (including
their parameter names). You may add new functions if you wish (indeed, appropriate use of functional
decomposition is part of the marking criteria).
• You should not use any global variables or have any code outside of function definitions unless it is in
the if __name__ == '__main__' block.
• Appropriate use of global constants is OK.
• Your code shouldn’t raise any unintentional exceptions or warnings.
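The layout requirements above could look something like the following skeleton. It is only a sketch of the structure: the constant CELL_SIZE and the functions cell_area and main are hypothetical names chosen for illustration, not part of the template.

```python
CELL_SIZE = 5.0  # a global constant: acceptable, because it is never reassigned


def cell_area():
    """Return the ground area covered by one grid cell, in square metres."""
    return CELL_SIZE ** 2


def main():
    """Run the analysis; everything that executes lives inside a function."""
    print('Each cell covers', cell_area(), 'square metres')


if __name__ == '__main__':
    # Only runs when the file is executed as a script, not when it is
    # imported (for example by assignment_tests.py).
    main()
```

Keeping the runnable code behind this guard means a tests file can import your functions without triggering a full run over the data sets.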
We will mark your code based on correctness and quality. “Correctness” means that your functions run
without error and return acceptable answers for all valid inputs. “Quality” means your code is readable, well
documented, well organised, and efficient:
• You should use docstrings and comments where it is appropriate. The content of docstrings and
comments should be clear and accurate.
• Your function and variable names should make sense and be descriptive.
• You should use suitable data types to solve problems.
• You should organise your code appropriately, using additional functions where it is helpful to do so.
Avoid code repetition. In particular, although the assignment specifies a number of different functions
for you to implement, these are not (always) meant to be self-contained. If there is functionality that is
common between the questions and that can be isolated into one or more separate functions that are
reused in several places in your code, we expect you to do so.
• Your code should be reasonably efficient: don’t make the computer do too much unnecessary work.
We will also mark the answers in your written report based on correctness and clarity. If we cannot
understand or find your answers in the PDF file you submit, you may receive 0 marks. Your
written answers should be:
• clear (it should be easy to find and understand your answers);
• concise (write what is relevant to answer the questions; do not overcomplicate);
• well-organised (use headings and sub-headings where appropriate);
• relevant to the rest of your assignment submission; and
• 1–2 pages (COMP1730) or 2–3 pages (COMP6730). We will stop reading after the page limit.
The page limit includes your answers to the above questions and your plot(s) from Question 6. However,
your bibliography/references don’t count towards the page limit.
In this assignment, you will have to make some choices on how to design your solution to problems, and you
will be asked to justify these choices in your written report. You should show understanding of the problem
and your solution, and convince your marker that your solution solves the problem in an appropriate way.
Much like real life, many questions in this assignment do not always have a single correct answer, so it is
especially important to justify the decisions, assumptions, and solutions you’ve made.
Deadlines and Extensions
The assignment is due May 24, 2021, at 9:00 am. This deadline is hard. Late submissions will not be
accepted, unless you have received an approved extension before the deadline.
You can upload new versions of your submission up to the deadline. However, remember that we can only see
your latest submission, and that is what we will mark.
Due to the group nature of the assignment, extensions will only be granted on the individual report;
the code must be submitted by the stated deadline.
Extensions can only be given in extenuating circumstances as defined by ANU policy; this means accident,
illness, or other things that you could not reasonably have anticipated or avoided. Failure to plan in advance
to spend sufficient time working on the assignment is neither unforeseeable nor unavoidable. If you think you
have grounds for an extension, you should send an e-mail to [email protected] as soon as possible and
provide written evidence in support of your case (such as a medical certificate). The course convener will
then decide whether to grant an extension and inform you as soon as practical.
Please also note that you must inform us of any disruption as soon as it is practical to do so. If you ask for a
long extension the night before the deadline, based on information that you had been aware of for weeks, it
will likely not be granted.
Plagiarism and Collusion
You can only work with other people in your group (as organised in Wattle) on this assignment. In addition,
your report must be individual - you cannot write it in conjunction with other members of your group.
Both the report and code will be considered under the usual plagiarism rules. If you are unsure about what
constitutes plagiarism, please read through the ANU Academic Honesty Policy.
If you do include ideas or material from other sources (in your code or your report), then you clearly have
to make attribution by providing a reference to the material or source in your report. We do not require
a specific referencing format, as long as you are consistent and your references allow us to find the source,
should we need to while we are marking your assignment. However, marking will be based on original content;
if you have borrowed extensively from other sources, we will consider that in determining your mark, even if
it is correctly referenced.
Please note that you may only refer to resources written in English.
If you are found to have engaged in plagiarism - such as copying from or working with someone in
another group - you will usually receive a mark of 0 for the entire assignment. If you assist another student
in engaging in plagiarism, for example by giving another group your code or instructing them on how to solve
the problem, you will also normally receive a mark of 0 for the entire assignment. This may also affect the
mark of your group members. In either case, you may also receive additional penalties as appropriate under
the ANU Academic Honesty Policy.
References
• https://en.wikipedia.org/wiki/Cotter_River last accessed 3 May 2021
• https://www.iconwater.com.au/cotterdam last accessed 3 May 2021
• Elevation data obtained from https://elevation.fsdf.org.au/ accessed 22 April 2021
• Photos were taken by the course convener.