辅导案例-COMP20003-Assignment 1
COMP20003 Algorithms and Data Structures Second (Spring) Semester 2020 [Assignment 1] Melbourne Census Dataset Information Retrieval using a Linked List Handed out: Monday, 17 of August Due: 8:00 AM, Friday, 28 of August Marks: 10 (10% of total mark) Purpose The purpose of this assignment is for you to: • Improve your proficiency in C programming and your dexterity with dynamic memory allocation. • Demonstrate understanding of a concrete data structure (linked list). • Practice multi-file programming and improve your proficiency in using UNIX utilities. Background A dictionary is an abstract data type that stores and supports lookup of key, value pairs. For example, in a telephone directory, the (string) key is a person or company name, and the value is the phone number. In a student record lookup, the key would be a student ID number and the value would be a complex structure containing all the other information about the student. Your task In this assignment, you will create a simple dictionary based on an unsorted linked list to store in- formation from the City of Melbourne Census of Land Use and Employment (CLUE). A user will be able to search this dictionary to retrieve information about businesses in Melbourne using the business name (key). Your implementation will build the dictionary by reading census data from a file and inserting each property record as a node in a linked list. You will also implement a method to search for a key in the list, outputting any records that match the key. Note that keys are not guaranteed to be unique! Dataset The dataset comes from the City of Melbourne Open Data website (https://data.melbourne.vic.gov.au/), which provides a variety of data about Melbourne that you can visualize online. The dataset used in this project is a subset of the Business establishment trading name and industry classification 2018 dataset, accessed from: https://data.melbourne.vic.gov.au/Business/Business-establishment-trading-name-and-industry-c/vesm- c7r2 The dataset describes businesses in the Melbourne area. Each row in the dataset includes the following fields: Census year - the year in which surveying was completed (2018) Block ID - an ID number to identify city blocks (about 606 in total) Property ID - an ID number to identify an individual property 1 Visualization of the dataset. Colours indicate CLUE small area; open circles indicate locations with multiple business establishments. You can make your own interactive graphs and visualizations of this dataset here. Base property ID - an ID number to identify a parcel of land (which may contain multiple properties) CLUE small area - city area name (e.g., Melbourne CBD) Trading name - name of the business located at this property Industry (ANZSIC4) code - numeric code to describe the industry in which the business operates Industry (ANZSIC4) description - name of the industry corresponding to the code x coordinate - longitude of the establishment y coordinate - latitude of the establishment Location - location as a (lat,long) pair (used for visualization) The fields , , , and are alphabetic strings of varying length. You may assume that none of these fields are more than 128 characters. The dataset is in csv format, with each field separated by a comma. Note that string fields may contain commas (the field always contains a comma); in these cases the string is enclosed in quotation marks. For the purposes of this assignment, you may assume that the input data is well-formatted, that the input file is not empty, and that the maximum length of an input record (a single full line of the csv file) is 512 characters. This number could help you choose a reading buffer size. The should be used as the key in your dictionary implementation. Implementation Details Your Makefile should produce an executable program called dict. This program should take two command line arguments: (1) the name of the data file used to build the dictionary, and (2) the name of an output file. 2 Your dict program should: • Construct an unsorted linked list to store the information contained in the data file specified in the command line argument. Each record (row) should be stored in a separate Node. • Search the linked list for records, based on keys. The keys will be read in from stdin, i.e. from the screen. Remember that the entries in the file do not necessarily have unique keys, so your search must locate all keys matching the search key, and output all the data found. • Your program will look up each key and output the information (the data found) to the output file specified by the second command line parameter. If the key is not found in the tree, you must output the word NOTFOUND. For testing, it may be convenient to create a file of keys to be searched, one per line, and redirect the input from this file. Use the UNIX operator < to redirect input from a file. Examples of use: • dict1 datafile outputfile then type in keys; or • dict1 datafile outputfile < keyfile Example output This is an example of what might be output to the file after searching for two keys: In a Rush Espresso −− > Census year: 2018 || Block ID: 44 || Property ID: 105956 || Base property ID: 105956 || CLUE small area: Melbourne (CBD) || Industry (ANZSIC4) code: 4511 || Industry (ANZSIC4) description: Cafes and Restaurants || x coordinate: 144.96174 || y coordinate: -37.81561 || Location: (-37.81560561, 144.9617411) || In a Rush Espresso −− > Census year: 2018 || Block ID: 1101 || Property ID: 108973 || Base property ID: 108973 || CLUE small area: Docklands || Industry (ANZSIC4) code: 4511 || Industry (ANZSIC4) description: Cafes and Restaurants || x coordinate: 144.95223 || y coordinate: -37.81761 || Location: (-37.81761044, 144.9522269) || Tim Hortons −− > NOTFOUND The format need not be exactly as above. Variations in whitespace/tabs are permitted. The number of comparisons above has been made up, do not take it as an example of a correct execution. Requirements The following implementation requirements must be adhered to: • You must write your implementation in the C programming language. • You must write your code in a modular way, so that your implementation could be used in another program without extensive rewriting or copying. This means that the linked list opera- tions are kept together in a separate .c file, with its own header (.h) file, separate from the main program. 3 • Your code should be easily extensible to different dictionaries. This means that the functions for insertion, search, and deletion take as arguments not only the item being inserted or a key for searching and deleting, but also a pointer to a particular dictionary, e.g. insert(dict, item). • Your implementation must read the input file once only. • Each record should be stored in a struct with separate variables to store the separate data fields (e.g., separate variables for the record’s , , (ANZSIC4) code>, etc.). Each of these variables should be an appropriate type and size for the data it stores. • Your program should store strings in a space-efficient manner. If you are using malloc() to create the space for a string, remember to allow space for the final end of string ‘\0’ (NULL). • A Makefile is not provided for you. The Makefile should direct the compilation of your program. To use the Makefile, make sure is in the same directory of your code, and type make dict to make the dictionary. You must submit your makefile with your assignment. Hint: If you haven’t used make before, try it on simple programs first. If it doesn’t work, read the error messages carefully. A common problem in compiling multifile executables is in the included header files. Note also that the whitespace before the command is a tab, and not multiple spaces. It is not a good idea to code your program as a single file and then try to break it down into multiple files. Start by using multiple files, with minimal content, and make sure they are communicating with each other before starting more serious coding. Resources: Programming Style (2 Marks) Two locally-written papers containing useful guidelines on coding style and structure can be found on the LMS Resources→ Project Coding Guidelines, by Peter Schachte, and below and adapted version of the LMS Resources → C Programming Style, written for Engineering Computation COMP20005 by Aidan Nagorcka-Smith. Be aware that your programming style will be judged with 2 marks. 1 /** *********************** 2 * C Programming S t y l e f o r Engineer ing Computation 3 * Created by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au) 13/03/2011 4 * D e f i n i t i o n s and inc ludes 5 * D e f i n i t i o n s are in UPPER CASE 6 * Inc ludes go before d e f i n i t i o n s 7 * Space between inc ludes , d e f i n i t i o n s and the main func t ion . 8 * Use d e f i n i t i o n s f o r any cons tan t s in your program , do not j u s t wr i te them 9 * in . 10 * 11 * Tabs may be s e t to 4−spaces or 8−spaces , depending on your e d i t o r . The code 12 * Below i s ``gnu ' ' s t y l e . I f your e d i t o r has ``bsd ' ' i t w i l l fo l low the 8−space 13 * s t y l e . Both are very standard . 14 */ 15 16 /** 17 * GOOD: 18 */ 19 20 #inc lude 21 #inc lude 22 #def ine MAX STRING SIZE 1000 23 #def ine DEBUG 0 24 i n t main( i n t argc , char **argv) { 25 . . . 26 4 27 /** 28 * BAD: 29 */ 30 31 /* D e f i n i t i o n s and inc ludes are mixed up */ 32 #inc lude 33 #def ine MAX STING SIZE 1000 34 /* D e f i n i t i o n s are given names l i k e v a r i a b l e s */ 35 #def ine debug 0 36 #inc lude 37 /* No spac ing between inc ludes , d e f i n i t i o n s and main func t ion */ 38 i n t main( i n t argc , char **argv) { 39 . . . 40 41 /** ***************************** 42 * Va r i ab l e s 43 * Give them use fu l lower case names or camelCase . E i t he r i s f ine , 44 * as long as you are c o n s i s t e n t and apply always the same s t y l e . 45 * I n i t i a l i s e them to something tha t makes sense . 46 */ 47 48 /** 49 * GOOD: lower case 50 */ 51 52 i n t main( i n t argc , char **argv) { 53 54 i n t i = 0; 55 i n t num_fifties = 0; 56 i n t num_twenties = 0; 57 i n t num_tens = 0; 58 59 . . . 60 /** 61 * GOOD: camelCase 62 */ 63 64 i n t main( i n t argc , char **argv) { 65 66 i n t i = 0; 67 i n t numFifties = 0; 68 i n t numTwenties = 0; 69 i n t numTens = 0; 70 71 . . . 72 /** 73 * BAD: 74 */ 75 76 i n t main( i n t argc , char **argv) { 77 78 /* Var i ab l e not i n i t i a l i s e d − causes a bug because we didn ' t remember to 79 * s e t i t be fore the loop */ 80 i n t i ; 81 /* Var i ab l e in a l l caps − we ' l l get confused between t h i s and cons tan t s 82 */ 83 i n t NUM_FIFTIES = 0; 84 /* Overly abbrev ia ted v a r i a b l e names make th ings hard . */ 85 i n t nt = 0 86 87 while (i < 10) { 88 . . . 89 i++; 90 } 91 92 . . . 5 93 94 /** ******************** 95 * Spacing : 96 * Space i n t e l l i g e n t l y , v e r t i c a l l y to group b locks of code tha t are doing a 97 * s p e c i f i c operat ion , or to separa te v a r i a b l e d e c l a r a t i o n s from other code . 98 * One tab of inden ta t ion within e i t h e r a func t ion or a loop . 99 * Spaces a f t e r commas . 100 * Space between ) and { . 101 * No space between the ** and the argv in the d e f i n i t i o n of the main 102 * func t ion . 103 * When dec l a r ing a po in te r v a r i a b l e or argument , you may place the a s t e r i s k 104 * ad jacent to e i t h e r the type or to the v a r i a b l e name . 105 * L ines at most 80 c h a r a c t e r s long . 106 * Clos ing brace goes on i t s own l i n e 107 */ 108 109 /** 110 * GOOD: 111 */ 112 113 i n t main( i n t argc , char **argv) { 114 115 i n t i = 0; 116 117 f o r (i = 100; i >= 0; i−−) { 118 i f (i > 0) { 119 printf( ”%d b o t t l e s of beer , take one down and pass i t around , ” 120 ” %d b o t t l e s of beer .\n” , i , i − 1) ; 121 } e l s e { 122 printf( ”%d b o t t l e s of beer , take one down and pass i t around . ” 123 ” We ' re empty .\n” , i) ; 124 } 125 } 126 127 re turn 0; 128 } 129 130 /** 131 * BAD: 132 */ 133 134 /* No space a f t e r commas 135 * Space between the ** and argv in the main func t ion d e f i n i t i o n 136 * No space between the ) and { at the s t a r t of a func t ion */ 137 i n t main( i n t argc , char ** argv){ 138 i n t i = 0; 139 /* No space between v a r i a b l e d e c l a r a t i o n s and the r e s t of the func t ion . 140 * No spaces around the boolean opera tor s */ 141 f o r (i=100;i>=0;i−−) { 142 /* No indenta t ion */ 143 i f (i > 0) { 144 /* Line too long */ 145 printf( ”%d b o t t l e s of beer , take one down and pass i t around , %d 146 b o t t l e s of beer .\n” , i , i − 1) ; 147 } e l s e { 148 /* Spacing f o r no good reason . */ 149 150 printf( ”%d b o t t l e s of beer , take one down and pass i t around . ” 151 ” We ' re empty .\n” , i) ; 152 153 } 154 } 155 /* Clos ing brace not on i t s own l i n e */ 156 re turn 0;} 157 158 /** **************** 6 159 * Braces : 160 * Opening braces go on the same l i n e as the loop or func t ion name 161 * Clos ing braces go on t h e i r own l i n e 162 * Clos ing braces go at the same indenta t ion l e v e l as the th ing they are 163 * c l o s i n g 164 */ 165 166 /** 167 * GOOD: 168 */ 169 170 i n t main( i n t argc , char **argv) { 171 172 . . . 173 174 f o r ( . . . ) { 175 . . . 176 } 177 178 re turn 0; 179 } 180 181 /** 182 * BAD: 183 */ 184 185 i n t main( i n t argc , char **argv) { 186 187 . . . 188 189 /* Opening brace on a d i f f e r e n t l i n e to the fo r loop open */ 190 f o r ( . . . ) 191 { 192 . . . 193 /* Clos ing brace at a d i f f e r e n t indenta t ion to the th ing i t ' s 194 c l o s i n g 195 */ 196 } 197 198 /* Clos ing brace not on i t s own l i n e . */ 199 re turn 0;} 200 201 /** ************** 202 * Commenting : 203 * Each program should have a comment exp la in ing what i t does and who created 204 * i t . 205 * Also comment how to run the program , inc lud ing op t iona l command l i n e 206 * parameters . 207 * Any i n t e r e s t i n g code should have a comment to exp la in i t s e l f . 208 * We should not comment obvious th ings − wri te code tha t documents i t s e l f 209 */ 210 211 /** 212 * GOOD: 213 */ 214 215 /* change . c 216 * 217 * Created by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au) 218 13/03/2011 219 * 220 * P r i n t the number of each coin tha t would be needed to make up some 221 change 222 * tha t i s input by the user 223 * 224 * To run the program type : 7 225 * . / co ins −−num coins 5 −−shape co ins t rapezo id −−output b l ab l a . t x t 226 * 227 * To see a l l the input parameters , type : 228 * . / co ins −−help 229 * Options : : 230 * −−help Show help message 231 * −−num coins arg Input number of co ins 232 * −−shape co ins arg Input co ins shape 233 * −−bound arg (=1) Max bound on xxx , d e f a u l t value 1 234 * −−output arg Output s o l u t i o n f i l e 235 * 236 */ 237 238 i n t main( i n t argc , char **argv) { 239 240 i n t input_change = 0; 241 242 printf( ” P lease input the value of the change (0−99 cent s 243 i n c l u s i v e ) :\n” ) ; 244 scanf( ”%d” , &input_change) ; 245 printf( ”\n” ) ; 246 247 // Va l id change va lues are 0−99 i n c l u s i v e . 248 i f (input_change < 0 | | input_change > 99) { 249 printf( ” Input not in the range 0−99.\n” ) 250 } 251 252 . . . 253 254 /** 255 * BAD: 256 */ 257 258 /* No explanat ion of what the program i s doing */ 259 i n t main( i n t argc , char **argv) { 260 261 /* Commenting obvious th ings */ 262 /* Create a i n t v a r i a b l e c a l l e d input change to s t o r e the input from 263 the 264 * user . */ 265 i n t input_change ; 266 267 . . . 268 269 /** **************** 270 * Code s t r u c t u r e : 271 * F a i l f a s t − input checks should happen f i r s t , then do the computation . 272 * S t ruc tu re the code so tha t a l l e r ro r handl ing happens in an easy to read 273 * l o c a t i o n 274 */ 275 276 /** 277 * GOOD: 278 */ 279 i f (input_is_bad) { 280 printf( ” Er ror : Input was not v a l i d . E x i t i n g .\n” ) ; 281 exit(EXIT_FAILURE) ; 282 } 283 284 /* Do computations here */ 285 . . . 286 287 /** 288 * BAD: 289 */ 290 8 291 i f (input_is_good) { 292 /* l o t s of computation here , pushing the e l s e par t o f f the screen . 293 */ 294 . . . 295 } e l s e { 296 fprintf(stderr , ” Er ror : Input was not v a l i d . E x i t i n g .\n” ) ; 297 exit(EXIT_FAILURE) ; 298 } Additional Support Your tutors will be available to help with your assignment during the scheduled workshop times. Ques- tions related to the assignment may be posted on the Piazza forum, using the folder tag assignment1 for new posts. You should feel free to answer other students’ questions if you are confident of your skills. A tutor will check the Discussion Forum regularly, and answer some questions, but be aware that for some questions you will just need to use your judgment and document your thinking. For example, a question like, “How much data should I use for the experiments?”, will not be answered; you must try out different data and see what makes sense. In this subject, we support MobaXterm for ssh to the CIS machines nutmeg.eng.unimelb.edu.au and dimefox.eng.unimelb.edu.au, the excellent editor built into MobaXterm or Atom, and gcc on the department machines. While you are free to use the platform and editor of your choice, these are the only tools you can “expect” help with from the staff in this subject. We’ll always do our best to help you learn. Your final program must compile and run on the department machines. Submission Your C code files (including your Makefile and any other files needed to run your code) should be submitted through the LMS under Assignment 1: Code in the Assignments tab. Your programs must compile and run correctly on JupyterHub. You may have developed your program in another environment, but it still must run on JupyterHub at submission time. For this reason, and because there are often small, but significant, differences between compilers, it is suggested that if you are working in a different environment, you upload and test your code on JupyterHub at reasonably frequent intervals. A common reason for programs not to compile is that a file has been inadvertently omitted from the submission. Please check your submission, and resubmit all files if necessary. Assessment There are a total of 10 marks given for this assignment. Your C program will be marked on the basis of accuracy, readability, and good C programming struc- ture, safety and style, including documentation. Safety refers to checking whether opening a file returns something, whether mallocs do their job, etc. The documentation should explain all major design decisions, and should be formatted so that it does not interfere with reading the code. As much as possible, try to make your code self-documenting, by choosing descriptive variable names. 9 Plagiarism This is an individual assignment. The work must be your own. While you may discuss your program development, coding problems and experimentation with your classmates, you must not share files, as this is considered plagiarism. If you refer to published work in the discussion of your experiments, be sure to include a citation to the publication or the web link. “Borrowing” of someone else’s code without acknowledgment is plagiarism. Plagiarism is considered a serious offense at the University of Melbourne. You should read the University code on Academic in- tegrity and details on plagiarism. Make sure you are not plagiarizing, intentionally or unintentionally. You are also advised that there will be a C programming component (on paper, not on a computer) in the final examination. Students who do not program their own assignments will be at a disadvantage for this part of the examination. Late policy The late penalty is 10% of the available marks for that project for each day (or part thereof) overdue. Requests for extensions on medical grounds will need to be supported by a medical certificate. Any request received less than 48 hours before the assessment date (or after the date!) will generally not be accepted except in the most extreme circumstances. In general, extensions will not be granted if the interruption covers less than 10% of the project duration. Remember that departmental servers are often heavily loaded near project deadlines, and unexpected outages can occur; these will not be considered as grounds for an extension. Students who experience difficulties due to personal circumstances are encouraged to make use of the appropriate University student support services, and to contact the lecturer, at the earliest opportunity. Finally, we are here to help! There is information about getting help in this subject on the LMS. Frequently asked questions about the project will be answered in Piazza. 10