辅导案例-COMP20003-Assignment 1

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
COMP20003 Algorithms and Data Structures
Second (Spring) Semester 2020
[Assignment 1]
Melbourne Census Dataset
Information Retrieval using a Linked List
Handed out: Monday, 17 of August
Due: 8:00 AM, Friday, 28 of August
Marks: 10 (10% of total mark)
Purpose
The purpose of this assignment is for you to:
• Improve your proficiency in C programming and your dexterity with dynamic memory allocation.
• Demonstrate understanding of a concrete data structure (linked list).
• Practice multi-file programming and improve your proficiency in using UNIX utilities.
Background
A dictionary is an abstract data type that stores and supports lookup of key, value pairs. For example,
in a telephone directory, the (string) key is a person or company name, and the value is the phone
number. In a student record lookup, the key would be a student ID number and the value would be a
complex structure containing all the other information about the student.
Your task
In this assignment, you will create a simple dictionary based on an unsorted linked list to store in-
formation from the City of Melbourne Census of Land Use and Employment (CLUE). A user will be
able to search this dictionary to retrieve information about businesses in Melbourne using the business
name (key).
Your implementation will build the dictionary by reading census data from a file and inserting each
property record as a node in a linked list. You will also implement a method to search for a key in the
list, outputting any records that match the key. Note that keys are not guaranteed to be unique!
Dataset
The dataset comes from the City of Melbourne Open Data website (https://data.melbourne.vic.gov.au/),
which provides a variety of data about Melbourne that you can visualize online. The dataset used in
this project is a subset of the Business establishment trading name and industry classification
2018 dataset, accessed from:
https://data.melbourne.vic.gov.au/Business/Business-establishment-trading-name-and-industry-c/vesm-
c7r2
The dataset describes businesses in the Melbourne area. Each row in the dataset includes the following
fields:
Census year - the year in which surveying was completed (2018)
Block ID - an ID number to identify city blocks (about 606 in total)
Property ID - an ID number to identify an individual property
1
Visualization of the dataset. Colours indicate CLUE small area; open circles indicate locations with
multiple business establishments. You can make your own interactive graphs and visualizations of this
dataset here.
Base property ID - an ID number to identify a parcel of land (which may contain
multiple properties)
CLUE small area - city area name (e.g., Melbourne CBD)
Trading name - name of the business located at this property
Industry (ANZSIC4) code - numeric code to describe the industry in which the
business operates
Industry (ANZSIC4) description - name of the industry corresponding to the
code
x coordinate - longitude of the establishment
y coordinate - latitude of the establishment
Location - location as a (lat,long) pair (used for visualization)
The fields , , ,
and are alphabetic strings of varying length. You may assume that none of these fields
are more than 128 characters. The dataset is in csv format, with each field separated by a comma.
Note that string fields may contain commas (the field always contains a comma); in
these cases the string is enclosed in quotation marks.
For the purposes of this assignment, you may assume that the input data is well-formatted, that the
input file is not empty, and that the maximum length of an input record (a single full line of the csv
file) is 512 characters. This number could help you choose a reading buffer size.
The should be used as the key in your dictionary implementation.
Implementation Details
Your Makefile should produce an executable program called dict. This program should take two
command line arguments: (1) the name of the data file used to build the dictionary, and (2) the name
of an output file.
2
Your dict program should:
• Construct an unsorted linked list to store the information contained in the data file specified in
the command line argument. Each record (row) should be stored in a separate Node.
• Search the linked list for records, based on keys. The keys will be read in from stdin, i.e. from
the screen. Remember that the entries in the file do not necessarily have unique keys, so your
search must locate all keys matching the search key, and output all the data found.
• Your program will look up each key and output the information (the data found) to the output
file specified by the second command line parameter. If the key is not found in the tree, you
must output the word NOTFOUND.
For testing, it may be convenient to create a file of keys to be searched, one per line, and redirect the
input from this file. Use the UNIX operator < to redirect input from a file.
Examples of use:
• dict1 datafile outputfile then type in keys; or
• dict1 datafile outputfile < keyfile
Example output
This is an example of what might be output to the file after searching for two keys:
In a Rush Espresso −− > Census year: 2018 || Block ID: 44 || Property ID: 105956 || Base property
ID: 105956 || CLUE small area: Melbourne (CBD) || Industry (ANZSIC4) code: 4511 || Industry (ANZSIC4)
description: Cafes and Restaurants || x coordinate: 144.96174 || y coordinate: -37.81561 || Location:
(-37.81560561, 144.9617411) ||
In a Rush Espresso −− > Census year: 2018 || Block ID: 1101 || Property ID: 108973 || Base
property ID: 108973 || CLUE small area: Docklands || Industry (ANZSIC4) code: 4511 || Industry (ANZSIC4)
description: Cafes and Restaurants || x coordinate: 144.95223 || y coordinate: -37.81761 || Location:
(-37.81761044, 144.9522269) ||
Tim Hortons −− > NOTFOUND
The format need not be exactly as above. Variations in whitespace/tabs are permitted. The number of
comparisons above has been made up, do not take it as an example of a correct execution.
Requirements
The following implementation requirements must be adhered to:
• You must write your implementation in the C programming language.
• You must write your code in a modular way, so that your implementation could be used in
another program without extensive rewriting or copying. This means that the linked list opera-
tions are kept together in a separate .c file, with its own header (.h) file, separate from the main
program.
3
• Your code should be easily extensible to different dictionaries. This means that the functions
for insertion, search, and deletion take as arguments not only the item being inserted or a key
for searching and deleting, but also a pointer to a particular dictionary, e.g. insert(dict,
item).
• Your implementation must read the input file once only.
• Each record should be stored in a struct with separate variables to store the separate data fields
(e.g., separate variables for the record’s , , (ANZSIC4) code>, etc.). Each of these variables should be an appropriate type and size for
the data it stores.
• Your program should store strings in a space-efficient manner. If you are using malloc() to
create the space for a string, remember to allow space for the final end of string ‘\0’ (NULL).
• A Makefile is not provided for you. The Makefile should direct the compilation of your
program. To use the Makefile, make sure is in the same directory of your code, and type make
dict to make the dictionary. You must submit your makefile with your assignment.
Hint: If you haven’t used make before, try it on simple programs first. If it doesn’t work, read the error
messages carefully. A common problem in compiling multifile executables is in the included header
files. Note also that the whitespace before the command is a tab, and not multiple spaces. It is not a
good idea to code your program as a single file and then try to break it down into multiple files. Start
by using multiple files, with minimal content, and make sure they are communicating with each other
before starting more serious coding.
Resources: Programming Style (2 Marks)
Two locally-written papers containing useful guidelines on coding style and structure can be found
on the LMS Resources→ Project Coding Guidelines, by Peter Schachte, and below and adapted version
of the LMS Resources → C Programming Style, written for Engineering Computation COMP20005 by
Aidan Nagorcka-Smith. Be aware that your programming style will be judged with 2 marks.
1 /** ***********************
2 * C Programming S t y l e f o r Engineer ing Computation
3 * Created by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au) 13/03/2011
4 * D e f i n i t i o n s and inc ludes
5 * D e f i n i t i o n s are in UPPER CASE
6 * Inc ludes go before d e f i n i t i o n s
7 * Space between inc ludes , d e f i n i t i o n s and the main func t ion .
8 * Use d e f i n i t i o n s f o r any cons tan t s in your program , do not j u s t wr i te them
9 * in .
10 *
11 * Tabs may be s e t to 4−spaces or 8−spaces , depending on your e d i t o r . The code
12 * Below i s ``gnu ' ' s t y l e . I f your e d i t o r has ``bsd ' ' i t w i l l fo l low the 8−space
13 * s t y l e . Both are very standard .
14 */
15
16 /**
17 * GOOD:
18 */
19
20 #inc lude
21 #inc lude
22 #def ine MAX STRING SIZE 1000
23 #def ine DEBUG 0
24 i n t main( i n t argc , char **argv) {
25 . . .
26
4
27 /**
28 * BAD:
29 */
30
31 /* D e f i n i t i o n s and inc ludes are mixed up */
32 #inc lude
33 #def ine MAX STING SIZE 1000
34 /* D e f i n i t i o n s are given names l i k e v a r i a b l e s */
35 #def ine debug 0
36 #inc lude
37 /* No spac ing between inc ludes , d e f i n i t i o n s and main func t ion */
38 i n t main( i n t argc , char **argv) {
39 . . .
40
41 /** *****************************
42 * Va r i ab l e s
43 * Give them use fu l lower case names or camelCase . E i t he r i s f ine ,
44 * as long as you are c o n s i s t e n t and apply always the same s t y l e .
45 * I n i t i a l i s e them to something tha t makes sense .
46 */
47
48 /**
49 * GOOD: lower case
50 */
51
52 i n t main( i n t argc , char **argv) {
53
54 i n t i = 0;
55 i n t num_fifties = 0;
56 i n t num_twenties = 0;
57 i n t num_tens = 0;
58
59 . . .
60 /**
61 * GOOD: camelCase
62 */
63
64 i n t main( i n t argc , char **argv) {
65
66 i n t i = 0;
67 i n t numFifties = 0;
68 i n t numTwenties = 0;
69 i n t numTens = 0;
70
71 . . .
72 /**
73 * BAD:
74 */
75
76 i n t main( i n t argc , char **argv) {
77
78 /* Var i ab l e not i n i t i a l i s e d − causes a bug because we didn ' t remember to
79 * s e t i t be fore the loop */
80 i n t i ;
81 /* Var i ab l e in a l l caps − we ' l l get confused between t h i s and cons tan t s
82 */
83 i n t NUM_FIFTIES = 0;
84 /* Overly abbrev ia ted v a r i a b l e names make th ings hard . */
85 i n t nt = 0
86
87 while (i < 10) {
88 . . .
89 i++;
90 }
91
92 . . .
5
93
94 /** ********************
95 * Spacing :
96 * Space i n t e l l i g e n t l y , v e r t i c a l l y to group b locks of code tha t are doing a
97 * s p e c i f i c operat ion , or to separa te v a r i a b l e d e c l a r a t i o n s from other code .
98 * One tab of inden ta t ion within e i t h e r a func t ion or a loop .
99 * Spaces a f t e r commas .
100 * Space between ) and { .
101 * No space between the ** and the argv in the d e f i n i t i o n of the main
102 * func t ion .
103 * When dec l a r ing a po in te r v a r i a b l e or argument , you may place the a s t e r i s k
104 * ad jacent to e i t h e r the type or to the v a r i a b l e name .
105 * L ines at most 80 c h a r a c t e r s long .
106 * Clos ing brace goes on i t s own l i n e
107 */
108
109 /**
110 * GOOD:
111 */
112
113 i n t main( i n t argc , char **argv) {
114
115 i n t i = 0;
116
117 f o r (i = 100; i >= 0; i−−) {
118 i f (i > 0) {
119 printf( ”%d b o t t l e s of beer , take one down and pass i t around , ”
120 ” %d b o t t l e s of beer .\n” , i , i − 1) ;
121 } e l s e {
122 printf( ”%d b o t t l e s of beer , take one down and pass i t around . ”
123 ” We ' re empty .\n” , i) ;
124 }
125 }
126
127 re turn 0;
128 }
129
130 /**
131 * BAD:
132 */
133
134 /* No space a f t e r commas
135 * Space between the ** and argv in the main func t ion d e f i n i t i o n
136 * No space between the ) and { at the s t a r t of a func t ion */
137 i n t main( i n t argc , char ** argv){
138 i n t i = 0;
139 /* No space between v a r i a b l e d e c l a r a t i o n s and the r e s t of the func t ion .
140 * No spaces around the boolean opera tor s */
141 f o r (i=100;i>=0;i−−) {
142 /* No indenta t ion */
143 i f (i > 0) {
144 /* Line too long */
145 printf( ”%d b o t t l e s of beer , take one down and pass i t around , %d
146 b o t t l e s of beer .\n” , i , i − 1) ;
147 } e l s e {
148 /* Spacing f o r no good reason . */
149
150 printf( ”%d b o t t l e s of beer , take one down and pass i t around . ”
151 ” We ' re empty .\n” , i) ;
152
153 }
154 }
155 /* Clos ing brace not on i t s own l i n e */
156 re turn 0;}
157
158 /** ****************
6
159 * Braces :
160 * Opening braces go on the same l i n e as the loop or func t ion name
161 * Clos ing braces go on t h e i r own l i n e
162 * Clos ing braces go at the same indenta t ion l e v e l as the th ing they are
163 * c l o s i n g
164 */
165
166 /**
167 * GOOD:
168 */
169
170 i n t main( i n t argc , char **argv) {
171
172 . . .
173
174 f o r ( . . . ) {
175 . . .
176 }
177
178 re turn 0;
179 }
180
181 /**
182 * BAD:
183 */
184
185 i n t main( i n t argc , char **argv) {
186
187 . . .
188
189 /* Opening brace on a d i f f e r e n t l i n e to the fo r loop open */
190 f o r ( . . . )
191 {
192 . . .
193 /* Clos ing brace at a d i f f e r e n t indenta t ion to the th ing i t ' s
194 c l o s i n g
195 */
196 }
197
198 /* Clos ing brace not on i t s own l i n e . */
199 re turn 0;}
200
201 /** **************
202 * Commenting :
203 * Each program should have a comment exp la in ing what i t does and who created
204 * i t .
205 * Also comment how to run the program , inc lud ing op t iona l command l i n e
206 * parameters .
207 * Any i n t e r e s t i n g code should have a comment to exp la in i t s e l f .
208 * We should not comment obvious th ings − wri te code tha t documents i t s e l f
209 */
210
211 /**
212 * GOOD:
213 */
214
215 /* change . c
216 *
217 * Created by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au)
218 13/03/2011
219 *
220 * P r i n t the number of each coin tha t would be needed to make up some
221 change
222 * tha t i s input by the user
223 *
224 * To run the program type :
7
225 * . / co ins −−num coins 5 −−shape co ins t rapezo id −−output b l ab l a . t x t
226 *
227 * To see a l l the input parameters , type :
228 * . / co ins −−help
229 * Options : :
230 * −−help Show help message
231 * −−num coins arg Input number of co ins
232 * −−shape co ins arg Input co ins shape
233 * −−bound arg (=1) Max bound on xxx , d e f a u l t value 1
234 * −−output arg Output s o l u t i o n f i l e
235 *
236 */
237
238 i n t main( i n t argc , char **argv) {
239
240 i n t input_change = 0;
241
242 printf( ” P lease input the value of the change (0−99 cent s
243 i n c l u s i v e ) :\n” ) ;
244 scanf( ”%d” , &input_change) ;
245 printf( ”\n” ) ;
246
247 // Va l id change va lues are 0−99 i n c l u s i v e .
248 i f (input_change < 0 | | input_change > 99) {
249 printf( ” Input not in the range 0−99.\n” )
250 }
251
252 . . .
253
254 /**
255 * BAD:
256 */
257
258 /* No explanat ion of what the program i s doing */
259 i n t main( i n t argc , char **argv) {
260
261 /* Commenting obvious th ings */
262 /* Create a i n t v a r i a b l e c a l l e d input change to s t o r e the input from
263 the
264 * user . */
265 i n t input_change ;
266
267 . . .
268
269 /** ****************
270 * Code s t r u c t u r e :
271 * F a i l f a s t − input checks should happen f i r s t , then do the computation .
272 * S t ruc tu re the code so tha t a l l e r ro r handl ing happens in an easy to read
273 * l o c a t i o n
274 */
275
276 /**
277 * GOOD:
278 */
279 i f (input_is_bad) {
280 printf( ” Er ror : Input was not v a l i d . E x i t i n g .\n” ) ;
281 exit(EXIT_FAILURE) ;
282 }
283
284 /* Do computations here */
285 . . .
286
287 /**
288 * BAD:
289 */
290
8
291 i f (input_is_good) {
292 /* l o t s of computation here , pushing the e l s e par t o f f the screen .
293 */
294 . . .
295 } e l s e {
296 fprintf(stderr , ” Er ror : Input was not v a l i d . E x i t i n g .\n” ) ;
297 exit(EXIT_FAILURE) ;
298 }
Additional Support
Your tutors will be available to help with your assignment during the scheduled workshop times. Ques-
tions related to the assignment may be posted on the Piazza forum, using the folder tag assignment1
for new posts. You should feel free to answer other students’ questions if you are confident of your
skills.
A tutor will check the Discussion Forum regularly, and answer some questions, but be aware that for
some questions you will just need to use your judgment and document your thinking. For example, a
question like, “How much data should I use for the experiments?”, will not be answered; you must try
out different data and see what makes sense.
In this subject, we support MobaXterm for ssh to the CIS machines nutmeg.eng.unimelb.edu.au
and dimefox.eng.unimelb.edu.au, the excellent editor built into MobaXterm or Atom, and gcc
on the department machines. While you are free to use the platform and editor of your choice, these
are the only tools you can “expect” help with from the staff in this subject. We’ll always do our best to
help you learn. Your final program must compile and run on the department machines.
Submission
Your C code files (including your Makefile and any other files needed to run your code) should be
submitted through the LMS under Assignment 1: Code in the Assignments tab.
Your programs must compile and run correctly on JupyterHub. You may have developed your program
in another environment, but it still must run on JupyterHub at submission time. For this reason, and
because there are often small, but significant, differences between compilers, it is suggested that if you
are working in a different environment, you upload and test your code on JupyterHub at reasonably
frequent intervals.
A common reason for programs not to compile is that a file has been inadvertently omitted from the
submission. Please check your submission, and resubmit all files if necessary.
Assessment
There are a total of 10 marks given for this assignment.
Your C program will be marked on the basis of accuracy, readability, and good C programming struc-
ture, safety and style, including documentation. Safety refers to checking whether opening a file
returns something, whether mallocs do their job, etc. The documentation should explain all major
design decisions, and should be formatted so that it does not interfere with reading the code. As much
as possible, try to make your code self-documenting, by choosing descriptive variable names.
9
Plagiarism
This is an individual assignment. The work must be your own.
While you may discuss your program development, coding problems and experimentation with your
classmates, you must not share files, as this is considered plagiarism.
If you refer to published work in the discussion of your experiments, be sure to include a citation to
the publication or the web link.
“Borrowing” of someone else’s code without acknowledgment is plagiarism. Plagiarism is considered
a serious offense at the University of Melbourne. You should read the University code on Academic in-
tegrity and details on plagiarism. Make sure you are not plagiarizing, intentionally or unintentionally.
You are also advised that there will be a C programming component (on paper, not on a computer) in
the final examination. Students who do not program their own assignments will be at a disadvantage
for this part of the examination.
Late policy
The late penalty is 10% of the available marks for that project for each day (or part thereof) overdue.
Requests for extensions on medical grounds will need to be supported by a medical certificate. Any
request received less than 48 hours before the assessment date (or after the date!) will generally not
be accepted except in the most extreme circumstances. In general, extensions will not be granted if
the interruption covers less than 10% of the project duration. Remember that departmental servers
are often heavily loaded near project deadlines, and unexpected outages can occur; these will not be
considered as grounds for an extension.
Students who experience difficulties due to personal circumstances are encouraged to make use of the
appropriate University student support services, and to contact the lecturer, at the earliest opportunity.
Finally, we are here to help! There is information about getting help in this subject on the LMS.
Frequently asked questions about the project will be answered in Piazza.
10
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468