The University of Queensland
School of Electrical Engineering and Computer Science
CSSE2310 – Semester 2, 2024
Assignment 3 (Version 1.0)
Marks: 75
Weighting: 15%
Due: 3:00pm Friday 4 October, 2024

This specification was created for the use of Jiaxiang LIANG (s4568784) only. Do not share this document. Sharing this document may result in a misconduct penalty.

Introduction

The goal of this assignment is to demonstrate your skills and ability in fundamental process management and communication concepts (pipes and signals), and to further develop your C programming skills with a moderately complex program.

You are to create a program (called uqzip) which allows users to compress a file (or set of files) using any one of a set of external compression programs (zip, gzip, xz or bzip2). This will create a .uqz archive file. The program will also allow decompression of the files that it creates.

The assignment will also test your ability to code to a particular programming style guide, and to use a revision control system appropriately.

Student Conduct

This section is unchanged from assignment one – but you should remind yourself of the referencing requirements.

This is an individual assignment. You should feel free to discuss general aspects of C programming and the assignment specification with fellow students, including on the discussion forum. In general, questions like “How should the program behave if 〈this happens〉?” would be safe, if they are seeking clarification on the specification.

You must not actively help (or seek help from) other students or other people with the actual design, structure and/or coding of your assignment solution. It is cheating to look at another person’s assignment code and it is cheating to allow your code to be seen or shared in printed or electronic form by others. All submitted code will be subject to automated checks for plagiarism and collusion.
If we detect plagiarism or collusion, formal misconduct actions will be initiated against you, and those you cheated with. That’s right, if you share your code with a friend, even inadvertently, then both of you are in trouble. Do not post your code to a public place such as the course discussion forum or a public code repository. (Code in private posts to the discussion forum is permitted.) You must assume that some students in the course may have very long extensions so do not post your code to any public repository until at least three months after the result release date for the course (or check with the course coordinator if you wish to post it sooner). Do not allow others to access your computer – you must keep your code secure. Never leave your work unattended.

You must follow these code usage and referencing rules for all code committed to your SVN repository (not just the version that you submit):

Code Origin: Code provided by teaching staff this semester
Code provided to you in writing this semester by CSSE2310 teaching staff (e.g., code hosted on Blackboard, found in /local/courses/csse2310/resources on moss, posted on the discussion forum by teaching staff, provided in Ed Lessons, or shown in class).
Usage/Referencing: Permitted. May be used freely without reference. (You must be able to point to the source if queried about it – so you may find it easier to reference the code.)

Code Origin: Code you wrote this semester for this course
Code you have personally written this semester for CSSE2310 (e.g. code written for A1 reused in A3) – provided you have not shared or published it.
Usage/Referencing: Permitted. May be used freely without reference. (This assumes that no reference was required for the original use.)

© 2024 The University of Queensland
Document created for Jiaxiang LIANG (s4568784) only.
Version 1.0
Document generated 2024-09-06 16:04

Code Origin: Unpublished code you wrote earlier
Code you have personally written in a previous enrolment in this course or in another UQ course or for other reasons and where that code has not been shared with any other person or published in any way.

Code Origin: Code from man pages on moss
Code examples found in man pages on moss. (This does not apply to code from man pages found on other systems or websites unless that code is also in the moss man page.)

Usage/Referencing (for the two code origins above): Conditions apply, references required. May be used provided you understand the code AND the source of the code is referenced in a comment adjacent to that code (in the required format – see the style guide). If such code is used without appropriate referencing then this will be considered misconduct.

Code Origin: Code and learning from AI tools
Code written by, modified by, debugged by, explained by, obtained from, or based on the output of, an artificial intelligence tool or other code generation tool that you alone personally have interacted with, without the assistance of another person. This includes code you wrote yourself but then modified or debugged because of your interaction with such a tool. It also includes code you wrote where you learned about the concepts or library functions etc. because of your interaction with such a tool. It also includes where comments are written by such a tool – comments are part of your code.
Usage/Referencing: Conditions apply, references & documentation req’d. May be used provided you understand the code AND the source of the code or learning is referenced in a comment adjacent to that code (in the required format – see the style guide) AND an ASCII text file (named toolHistory.txt) is included in your repository and with your submission that describes in detail how the tool was used. (All of your interactions with the tool must be captured.) The file must be committed to the repository at the same time as any code derived from such a tool.
If such code is used without appropriate referencing and without inclusion of the toolHistory.txt file then this will be considered misconduct. See the detailed AI tool use documentation requirements on Blackboard – this tells you what must be in the toolHistory.txt file.

Code Origin: Code copied from sources not mentioned above
Code, in any programming language:
• copied from any website or forum (including StackOverflow and CSDN);
• copied from any public or private repositories;
• copied from textbooks, publications, videos, apps;
• copied from code provided by teaching staff only in a previous offering of this course (e.g. previous assignment one solution);
• written by or partially written by someone else or written with the assistance of someone else (other than a teaching staff member);
• written by an AI tool that you did not personally and solely interact with;
• written by you and available to other students; or
• from any other source besides those mentioned in earlier table rows above.
Usage/Referencing: Prohibited. May not be used. If the source of the code is referenced adjacent to the code then this will be considered code without academic merit (not misconduct) and will be removed from your assignment prior to marking (which may cause compilation to fail and zero marks to be awarded). Copied code without adjacent referencing will be considered misconduct and action will be taken. This prohibition includes code written in other programming languages that has been converted to C.

Code Origin: Code that you have learned from
Examples, websites, discussions, videos, code (in any programming language), etc. that you have learned from or that you have taken inspiration from or based any part of your code on but have not copied or just converted from another programming language. This includes learning about the existence of and behaviour of library functions and system calls that are not covered in class.
Usage/Referencing: Conditions apply, references required. May be used provided you do not directly copy code AND you understand the code AND the source of the code or inspiration or learning is referenced in a comment adjacent to that code (in the required format – see the style guide). If such code is used without appropriate referencing then this will be considered misconduct.

You must not share this assignment specification with any person (other than course staff), organisation, website, etc. Uploading or otherwise providing the assignment specification or part of it to a third party including online tutorial and contract cheating websites is considered misconduct. The university is aware of many of these sites and many cooperate with us in misconduct investigations. You are permitted to post small extracts of this document to the course Ed Discussion forum for the purposes of seeking or providing clarification on this specification.

In short – Don’t risk it! If you’re having trouble, seek help early from a member of the teaching staff. Don’t be tempted to copy another student’s code or to use an online cheating service. Don’t help another CSSE2310/7231 student with their code no matter how desperate they may be and no matter how close your relationship. You should read and understand the statements on student misconduct in the course profile and on the school website: https://eecs.uq.edu.au/current-students/guidelines-and-policies-students/student-conduct.

Specification

The uqzip program will allow a user to create a .uqz archive file that will hold a compressed file (or files) that can later be extracted from the archive (decompressed) using uqzip.
uqzip won’t actually do the compression itself – it will rely on external programs (such as zip, gzip, xz or bzip2 available on moss) to do the compression and will package up the resulting data into a .uqz archive. The .uqz archive file will also store the method of compression used and the filename for each compressed file stored in the archive.

When used for decompression, uqzip will extract each compressed data stream from the .uqz archive file and pipe it to an external decompression program whose output will be saved to a file in the current directory with the original filename.

uqzip is to support both sequential and parallel compression and decompression. Sequential processing means files are compressed (or decompressed) one after the other. Parallel processing means files are compressed (or decompressed) in parallel – each using a separate child process of uqzip (or two child processes per file in the case of parallel decompression).

Command Line Arguments

Your uqzip program is to accept command line arguments as follows when doing file compression, i.e. creating a .uqz archive file:

./uqzip [--parallel] [--xz|--gzip|--bz|--none|--zip] [--output outFilename] filename ...

and as follows when doing decompression (archive extraction):

./uqzip [--parallel] --extract archive-file

The square brackets ([]) indicate optional arguments or groups of arguments. The pipe symbol (|) indicates a choice. The italics indicate placeholders for user-supplied value arguments (outFilename, filename and archive-file above). An ellipsis (...) indicates the previous argument can be repeated. Note that the option arguments (those beginning with “--” and their associated value, if any) can be in any order.

When used for compression, uqzip will accept an optional argument specifying the compression method. When none of --bz, --gzip, --zip, --xz or --none are specified then the program is to act as if --none is specified – i.e.
no compression is used when creating the archive file.

When used for compression, an optional output file name (outFilename) can be specified by using the --output option argument. If this argument pair is not specified, then the default filename out.uqz is to be used. It can be assumed that the outFilename (or default output filename) is not also given as one of the filename arguments on the command line (i.e. we will not test that situation).

When used for compression, one or more existing filenames to be compressed into the .uqz archive must also be specified after any option arguments. It can be assumed that the first (or only) file name argument does not start with the characters “--”. (If the user wants to specify a file whose name does start with “--” then they should prefix the name with “./”. Any arguments after the first file name are also assumed to be file names, even if they begin with “--”.)

When used for decompression (with the --extract option argument specified), uqzip expects a filename argument to be present (archive-file, which must be the last argument). This is the name of the .uqz archive file to be decompressed. (Note: the term “.uqz archive file” will be used to describe the file format but filenames may or may not have the .uqz file extension – it is not a requirement and will not be added if not present. The default filename (out.uqz) will use this extension.)

If the --parallel option argument is present, then uqzip is to use child processes running in parallel to do the compression or decompression. If this argument is not present, then uqzip will use child processes running sequentially. Full details of the required functionality are provided later in this document.

Some examples of how the program might be run include the following (this is not an exhaustive list and does not show all possible combinations of arguments):

./uqzip /usr/dict/share/words
./uqzip --parallel --bz one.txt two.txt three.txt
./uqzip --zip --output lists.uqz --parallel list1 list2 list3 list4 list5 list6
./uqzip --output archive --xz /usr/share/dict/words local/dictionary
./uqzip --extract archive.abc
./uqzip --parallel --extract out.uqz

More details on the expected behaviour of the program are provided later in this document.

Prior to doing anything else, your program must check the command line arguments for validity. If the program receives an invalid command line then it must print the following (two line) message:

Usage: ./uqzip [--parallel] [--xz|--gzip|--bz|--none|--zip] [--output outFilename] filename ...
   Or: ./uqzip [--parallel] --extract archive-file

to standard error (with a newline at the end of each line), and exit with an exit status of 20. Note that the colon (:) symbols on each line are aligned, i.e. three spaces precede “Or:”.

Invalid command lines include (but may not be limited to) any of the following:
• More than one of the option arguments --bz, --gzip, --zip, --xz, --none or --extract is given on the command line.
• Any of the option arguments is listed more than once.
• The --output option argument is given but it is not followed by a filename argument.
• No filenames are given on the command line.
• --extract is given on the command line along with --output.
• --extract is given on the command line along with more than one filename.
• An unexpected argument is present.
• Any argument is the empty string.

Checking whether files exist or can be opened is not part of the usage checking (other than checking that filename values are not empty). This is checked after command line validity as described below.

File Checking

Compression

If uqzip is creating an archive file (i.e.
--extract is NOT specified on the command line) then it must attempt to open the output file (the given outFilename from the command line, or, if none is present, then the default output filename – out.uqz) for writing. If the file already exists, it must be truncated and overwritten. If the file is unable to be opened for writing then uqzip must print the message:

uqzip: can’t write to file "filename"

to standard error (with a following newline) where filename is replaced by the name of the file that could not be opened for writing. uqzip must then exit with status 8.

The files to be compressed are not checked for existence.

Decompression

If --extract is specified on the command line and uqzip is unable to open the given archive filename (archive-file argument) for reading, then your program must print the message:

uqzip: unable to read from file "filename"

to standard error (with a following newline) where filename is replaced by the name of the .uqz archive file from the command line. The double quotes must be present. uqzip must then exit with status 18.

Program Behaviour – Compression (Archive Creation)

If the command line and file checks described above are successful and an archive file is to be created (the argument --extract is NOT specified on the command line) then uqzip is to behave as described below.

First, uqzip must write out the header section for the archive file. (See Table 1 for details of the file format, including the header section.) Placeholders should initially be used for the file record offsets because these aren’t known yet. These will need to be updated in the file after the compressed files are added to the archive.

Table 1: .uqz file format.
The file will consist of a header section plus one file record for every file stored in the archive. Multi-byte numbers are stored in little-endian format (i.e. the same format used on moss).

Number of Bytes | Data Type | Description

Header Section
3 | Characters | File signature – “UQZ” – this fixed string at the start of the file indicates that this is a .uqz archive file. Note that the string is not null terminated.
1 | 8-bit unsigned integer | Method – this integer indicates the compression method used in the file. The number must be one of the method numbers shown in Table 2.
4 | 32-bit unsigned integer | Number of files – this integer is the number of files contained in this archive (say N). N must not be zero.
4×N | 32-bit unsigned integers | File record offsets – for each of the files, this field contains the byte number in this file where the record for this file starts. For the first file (file 0), this number will be 3+1+4+4×N (i.e. the size of this header section). For the second file (file 1), this number will be the size of this header section plus the size of the file record for file 0, etc.

File Record (one per file in the archive)
4 | 32-bit unsigned integer | Data section length – number of bytes (say M) in the data section of this record (see below). The length may be zero.
1 | 8-bit unsigned integer | Filename length – number of characters (say F) in the filename section of this record (see below). The length must not be zero.
F | Characters | Filename section – the name of this file (without any directory part). The name will not contain any ‘/’ or null characters. The name is not null terminated.
M | Bytes | Data section – the compressed data for this file. The bytes may have any value. Note that this section may be missing if the data section length is specified as zero.
0 to 3 | Null bytes | Padding – null (zero valued) padding bytes added to ensure that the file record is a multiple of 4 bytes in size. These are required even on the last file record.
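The size arithmetic implied by Table 1 can be sketched as follows. This is only an illustration of the layout rules, not a required design: the function names (header_size, record_padding, record_size, compute_offsets) and the idea of holding all lengths in arrays are assumptions made for the example. In uqzip itself the data lengths only become known as each compression job finishes, which is why placeholder offsets are written first.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the header section for an archive holding n files:
 * 3 (signature) + 1 (method) + 4 (file count) + 4 * n (record offsets). */
uint32_t header_size(uint32_t n)
{
    assert(n > 0); /* Table 1: the number of files must not be zero */
    return 3 + 1 + 4 + 4 * n;
}

/* Number of null padding bytes (0 to 3) needed so that a file record
 * with the given data and filename lengths is a multiple of 4 bytes. */
uint32_t record_padding(uint32_t dataLen, uint8_t nameLen)
{
    uint32_t unpadded = 4 + 1 + nameLen + dataLen;
    return (4 - unpadded % 4) % 4;
}

/* Total size of one file record, including its padding. */
uint32_t record_size(uint32_t dataLen, uint8_t nameLen)
{
    return 4 + 1 + nameLen + dataLen + record_padding(dataLen, nameLen);
}

/* Fill offsets[0..n-1] with the byte number at which each record starts.
 * (Hypothetical helper: assumes all lengths are already known.) */
void compute_offsets(const uint32_t* dataLens, const uint8_t* nameLens,
        uint32_t n, uint32_t* offsets)
{
    uint32_t position = header_size(n);
    for (uint32_t i = 0; i < n; i++) {
        offsets[i] = position;
        position += record_size(dataLens[i], nameLens[i]);
    }
}
```

For example, a record with a 4-character filename and 10 data bytes occupies 4 + 1 + 4 + 10 = 19 bytes before padding, so one null byte is added to bring it to 20. In a two-file archive the header is 3 + 1 + 4 + 4×2 = 16 bytes, so that record would start at byte 16 and the next record at byte 36.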
Table 2: Commands to be used for compression and decompression for each method. For compression commands, output goes to stdout and filename is replaced by the name of the file being compressed. For decompression commands, input comes from stdin and output goes to stdout.

Method Num | uqzip Command Line Arg | Compression Command | Decompression Command
1 | --none | cat filename | cat
2 | --bz | bzip2 --stdout filename | bzip2 -dc
3 | --gzip | gzip --best --stdout filename | gzip -dc
4 | --xz | xz --stdout filename | xz -dc
5 | --zip | zip -D -fz- - filename | funzip

Sequential Compression

Individual files specified on the uqzip command line are to be compressed (in the order given on the command line) using a separate child process for each file, with each child running the compression command shown in Table 2. (Programs are to be found on the user’s PATH.) The output of each command must be piped back to the parent (uqzip) and uqzip must add a complete file record to the archive file. (See Table 1 for details of the file record format.) The filename within the record must be only the “basename” of the given filename – i.e. the name excluding any directory path. In other words, if a ‘/’ character is present in the supplied filename then only that part of the name after the last ‘/’ is to be saved in the archive file[3]. For example, if the filename /etc/motd is given on the command line, then it will be saved in the archive using the filename motd.[4] When EOF is detected when reading from the pipe, the child process is to be reaped.

If a compression program is unable to be executed (e.g. not found on the user’s PATH) then the child process that attempted the exec must send a SIGUSR1 signal to itself to terminate itself. (By default this signal is not caught and causes process termination.)
If the parent (uqzip) detects that a child has died due to SIGUSR1 then it must print the following message to stderr (with a trailing newline):

uqzip: Unable to execute "command"

where command is replaced by the name of the command whose execution failed (e.g. “gzip” – no arguments are included). The double quotes must be present. uqzip must then exit with status 6.

If a child compression process fails for any other reason (i.e. does not exit normally with a zero exit status), then your program must print the following message to stderr (with a trailing newline):

uqzip: "command" command failed for filename "filename"

where command is replaced by the name of the command being executed (e.g. “xz”) and filename is replaced by the basename of the file being compressed. The double quotes must be present. uqzip must then exit with status 16.

If either of these failures occurs then uqzip must abort the archive creation (no further files are to be processed) and the (incomplete) archive file must be removed prior to the program exiting.

If a file record is able to be successfully added to the archive file, then uqzip should move on to compressing the next file using the approach described above.

If all file records can be successfully added to the archive file, then uqzip should ensure the file offsets section of the archive file header is correct and then exit with status 0.

Parallel Compression

If the --parallel argument is supplied on the command line, then uqzip is to behave as described above for sequential execution except that all child compression processes must be started before the result of any of these are checked. Once all are started, then the parent (uqzip) must read the stdout of each process in turn (i.e. in the same order as filenames are listed on the command line) and add a corresponding file record to the archive file.
When EOF is detected, the child process must be reaped. If an execution error is detected then the program must behave as described above for sequential operation (i.e. printing the appropriate message, removing the incomplete archive file, and exiting), but in addition (prior to exiting), must send a SIGTERM signal to each child process yet to be reaped and reap all remaining children. No further or additional error messages are to be printed. You can assume that SIGTERM will terminate a child process.

Program Behaviour – Decompression (Archive Extraction)

If the command line and file checks described above are successful and an archive file is to be decompressed (--extract is specified on the command line) then uqzip is to behave as described below, with the precise behaviour depending on whether the --parallel command line argument is supplied or not.

If any read errors occur when reading data that is expected to be present in an archive file (e.g. a filename of N bytes is to be read but EOF is reached before reading that many bytes) then your program must print the following message to stderr (with a trailing newline):

uqzip: Archive file "filename" has invalid format

where filename is replaced by the name of the archive file (as it appears on the command line). Your program must then exit with status 1. (This applies to any read errors when reading the archive file, not just when reading the header section.)

You can assume that the archive file itself will not be overwritten as part of the decompression process (i.e. we will not test the decompression where the archive file is in the current directory and the name of that file appears in one of the file records within the file).

[3] Note that it is possible a filename given on the command line does not have a basename (e.g. it ends in /).
Such a filename will cause an error when the compression program is run (because it is not a file) and will be picked up as a child compression process failure.

[4] It is valid for the same basename to appear in an archive file more than once – this may happen if a filename is listed twice on the command line or files with the same name from two different directories are added to the archive. Your program does not have to check for this. Note that it will not be possible to extract both files from the archive.

If you do not encounter any read errors, you can assume that the file is formatted validly, e.g. the method is valid, the offsets are correct, etc. We will not test your program undertaking decompression with invalidly formatted files other than those that will be detected by read errors.

Sequential Decompression

Each file in the .uqz archive file is to be extracted in turn (in the order that file records are present in the archive file) by using a child process running the decompression command shown in Table 2. (If the file size in the file record is zero then no command should be run: an empty output file can just be created and your program can move on to the next file record.) The standard input of the child process must come via pipe from the parent (the data section of the file record). The standard output of the child process must be redirected to a file (in the current directory) with the name given in the file record. This file must be created if it does not exist and overwritten (truncated) if it does exist[5] – even if a file with that name was previously extracted from this same archive file.
If it is not possible for uqzip to open an output file for writing then no child process should be created and uqzip must print the following to stderr (with a trailing newline):

uqzip: can’t write to file "filename"

where filename is replaced by the name of the file (from the filename section of the file record). uqzip must then exit with status 8. The complete file record should be read before an attempt is made to open the output file for writing. If this error occurs, no attempts should be made to extract any other file records.

After creating each child process, uqzip must wait for it to be completed and reap it.

If a decompression program is unable to be executed (e.g. not found on the user’s PATH) then the child process that attempted the exec must send a SIGUSR1 signal to itself to terminate itself. (By default this signal is not caught and causes process termination.) If the parent (uqzip) detects that a child has died due to SIGUSR1 then it must print the following message to stderr (with a trailing newline):

uqzip: Unable to execute "command"

where command is replaced by the name of the command whose execution failed (e.g. “gzip” – no arguments are included). The double quotes must be present. uqzip must then exit with status 6.

If a child decompression process fails for any other reason (i.e. does not exit normally with a zero exit status), then your program must print the following message to stderr (with a trailing newline):

uqzip: "command" command failed for filename "filename"

where command is replaced by the name of the command being executed (e.g. “xz”) and filename is replaced by the name of the file being extracted (from the file record). The double quotes must be present. uqzip must then exit with status 16.
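The two failure cases just described (a child killed by SIGUSR1 versus any other unsuccessful outcome) can be told apart from the status value that waitpid() fills in. The following is a minimal sketch under stated assumptions: the ChildOutcome type and the classify_status and run_and_classify names are invented for illustration, and the pipe, output redirection, message printing and clean-up that the specification requires are deliberately left out.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome labels; these names are not from the specification. */
typedef enum { CHILD_OK, CHILD_EXEC_FAILED, CHILD_FAILED } ChildOutcome;

/* Classify a status value filled in by waitpid(). */
ChildOutcome classify_status(int status)
{
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGUSR1) {
        return CHILD_EXEC_FAILED; /* child signalled itself after exec failed */
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return CHILD_OK; /* normal exit with a zero exit status */
    }
    return CHILD_FAILED; /* non-zero exit status, or some other signal */
}

/* Run argv[0] (searched for on PATH) in a child process, wait for it to
 * finish, and report how it ended. */
ChildOutcome run_and_classify(char* const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        return CHILD_FAILED; /* fork failure is outside the scope of this sketch */
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached if the exec failed. Restore the default action in
         * case SIGUSR1 was inherited as ignored, then signal ourselves. */
        signal(SIGUSR1, SIG_DFL);
        raise(SIGUSR1);
        _exit(1); /* not expected to be reached */
    }
    int status = 0;
    waitpid(pid, &status, 0); /* blocking wait: no WNOHANG, no busy waiting */
    return classify_status(status);
}
```

In this sketch a command that cannot be found on PATH yields CHILD_EXEC_FAILED, a command that exits with a non-zero status yields CHILD_FAILED, and only a normal zero exit yields CHILD_OK.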
If either of these failures occurs then uqzip must abort the archive extraction (no further file records are to be processed) and the (incomplete) destination file must be removed before the program exits but previously extracted files are to remain (assuming they don’t have the same name as the current destination file).

When a file is able to be successfully extracted from the archive file, then uqzip must print the following to stdout (with a trailing newline):

"filename" successfully extracted

where filename is replaced by the name of the file extracted (from the file record). The double quotes must be present. uqzip must then move on to extracting the next file using the approach described above.

If all file records can be successfully extracted from the archive file, then uqzip must exit with status 0.

[5] Be careful about testing your program in a directory with files that you want to keep. It is strongly recommended that you test the decompression functionality in a separate directory to your SVN working directory.

Parallel Decompression – Advanced Functionality

This functionality is more advanced. It is recommended that you get other functionality operating first.

If the --parallel argument is supplied on the command line, then, for each file in the archive (that doesn’t have size zero), uqzip must create two child processes connected by a pipe. (Files with size zero must be handled as with sequential execution – an empty output file can be created but no child processes are to be created.) The child process with the reading end of the pipe (which must be connected to that process’s standard input) will run the decompression program as described above for sequential execution with its standard output redirected to the output file. The child
process with the writing end of the pipe will send the data (from the data section of the file record) over the pipe to the decompression process and then close the pipe and exit[7].

As with sequential execution, if it is not possible for uqzip to open an output file for writing then no child processes should be created for that file and the program must exit with the stderr message and exit status described above. No further child processes are to be created for any remaining files and, before uqzip exits, any existing child processes must be sent a SIGTERM signal and reaped. All output files opened for writing must be removed.

Once all processes are started, uqzip must reap all child processes as they exit (i.e. not necessarily in order of creation). If a decompression program is unable to be executed, or execution fails for some other reason, then uqzip must print a message and exit as described above for sequential execution. Before exiting, uqzip must send a SIGTERM signal to all remaining child processes and reap them. No additional error messages are printed, even if execution errors are detected. All output files successfully created (i.e. the decompression program ran successfully for that file) should be preserved (not deleted). All other output files opened for writing must be removed. We will not test parallel decompression situations where the same filename appears twice in the archive file (i.e. your program can behave however it likes in this situation).

You can assume that SIGTERM will terminate a child process.

Interrupting uqzip

If uqzip receives a SIGINT signal (as usually sent by pressing Ctrl-C) when running in sequential mode then it must allow the current compression/decompression job to finish (and reap any associated child process(es) etc. as required) and not commence processing any further files.
If the current file is the last file in the sequence then uqzip should behave as if the signal was not received and exit as described above. If files remain to be processed and archive creation is being undertaken, then uqzip must remove the archive. If archive extraction is being undertaken then existing files that have been extracted successfully should remain. Your program must then print the following message to standard error (with a trailing newline):

    uqzip: Execution interrupted

and exit with status 10.

If uqzip is undertaking parallel execution then the SIGINT signal must be ignored.

[7] This approach, with extra processes, allows for parallel processing – all of the decompression processes can be fed data in parallel. Without it, a single parent process may get blocked on writing if it fills up a pipe buffer to one of the child decompression processes and would then be unable to write data to any of the other children until that buffer is read. Another approach (not to be implemented in this assignment) could be to use non-blocking writes from a single parent to each of the children. This would have to be coupled with select() or poll() or similar to avoid busy waiting for pipes to become available for writing.

Other Requirements

Your program must also meet all of the following requirements:

• uqzip must free all dynamically allocated memory before exiting. (This requirement does not apply to child processes of uqzip, only to the original process.)
• uqzip must use memory judiciously. When compressing, uqzip must have no more than one compressed file in memory at a time and must not construct the archive file in memory (i.e. it should write file records out to the archive file as compression jobs complete). When decompressing, no uqzip process should have more than one file record in memory at a time (i.e. the complete archive file must not be read into memory of any one process).
• Child processes of uqzip must not inherit any unnecessary open file descriptors opened by uqzip. (Open file descriptors that uqzip inherits from its parent and that are passed to a child must remain open in the child.)
• uqzip is not to leave behind any orphan processes (i.e. when uqzip exits normally then none of its children must still be running). uqzip is also not to leave behind any zombie processes – when doing sequential processing, all child processes from processing one file must be reaped before a child process is created for the next file.
• uqzip must not busy wait, i.e. it should not repeatedly check for something (e.g. process termination) in a loop. This means that use of the WNOHANG option when waiting is not permitted.
• All commands run by uqzip when processing files must be direct children of uqzip, i.e. the use of grandchild processes is not permitted.
• No child processes must ever output anything to stderr.
• Any file created by uqzip must be created with rw-rw---- permissions.
• uqzip must never exit due to SIGPIPE.

We will not test for unexpected system call or library failures in an otherwise correctly-implemented program (e.g. if fork() or malloc() fails due to insufficient available resources). Your program can behave however it likes in these cases, including crashing.

We will also not check the behaviour of uqzip with very large files. You can assume that the sizes of files to be compressed and input/output archive files will always fit in a 32 bit unsigned integer.
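The interruption and SIGPIPE requirements above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the required design: the names interrupted, note_interrupt and setup_signal_handling are hypothetical, and ignoring SIGPIPE is only one possible way to avoid exiting because of it. sigaction(2) is used because this specification forbids signal(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical name. This is the single bool flag that the specification
// permits as a global, used only to record that SIGINT has arrived.
volatile bool interrupted = false;

// Handler for SIGINT in sequential mode: record the event and return.
// The main loop is assumed to check the flag between files.
static void note_interrupt(int signum)
{
    (void)signum; // unused
    interrupted = true;
}

// Install signal dispositions (hypothetical helper).
// parallel == true:  SIGINT is ignored.
// parallel == false: SIGINT sets the flag above.
// In both cases SIGPIPE is ignored (an assumption of this sketch), so a
// write to a closed pipe fails with EPIPE instead of killing the process.
// Returns 0 on success, -1 on failure.
int setup_signal_handling(bool parallel)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; // restart interrupted system calls
    sa.sa_handler = parallel ? SIG_IGN : note_interrupt;
    if (sigaction(SIGINT, &sa, NULL) == -1) {
        return -1;
    }
    sa.sa_handler = SIG_IGN;
    return sigaction(SIGPIPE, &sa, NULL);
}
```

With this arrangement, sequential processing could test interrupted after each file has been fully handled and its children reaped. Note that ignored signals stay ignored across exec, whereas handlers are reset to the default, so a real solution may need to consider what dispositions its children should start with.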
Provided Library

A library has been provided to you with the following functions which your program may use:

    UqzHeaderSection* read_uqz_header_section(FILE* stream);
    void free_uqz_header_section(UqzHeaderSection* header);

See the man pages on moss for details.

To use the library, you will need to add the #include directive for the library's header file to your code and use the compiler flag -I/local/courses/csse2310/include when compiling your code so that the compiler can find the include file. You will also need to link with the library containing these functions. To do this, use the compiler arguments -L/local/courses/csse2310/lib -lcsse2310a3.

Style

Your program must follow version 3 of the CSSE2310/CSSE7231 C programming style guide available on the course Blackboard site. Your submission must also comply with the Documentation required for the use of AI tools if applicable.

Note that a single global variable of type bool may be used in your assignment – for the implementation of signal handling. Any other use of global variables will be heavily penalised – see the Style Marking section below.

Hints

1. A demo program is available on moss that implements the required functionality: demo-uqzip. Make sure you try it out to see how your program is expected to behave.
2. A 32-bit unsigned integer can be read from a file using fread(3) or read(2). For example, if FILE* handle; is a stream that is open for reading, then the following code will read a single 4-byte (32-bit) unsigned integer from the current file position:

    uint32_t number;
    size_t itemsRead = fread(&number, sizeof(uint32_t), 1, handle);

   Your code should #include <stdint.h> to use types such as uint32_t. Such quantities can be written in a similar way using fwrite(3) or write(2).
3. You can change the file position in a file using fseek(3) (if using standard C IO with FILE* handles) or lseek(2) (if using file descriptors).
4. A process can send a signal to another process with the kill(2) system call or it can send a signal to itself with the raise(3) function.
5. The basename(3) function may be used to return the basename of a path (though note that it will modify the string that is passed to it).
6. You can round a number n up to a multiple of 4 using the operation ((n+3)/4)*4 or, equivalently, (n+3)&~3. (These operations add 3 to n and then zero out the two least significant bits of the result.)
7. A file can be removed using the remove(3) library function or the unlink(2) system call.
8. For a given process, you can examine the file descriptors that it has open by running

    ls -l /proc/PID/fd

   where PID is replaced by the process ID.
9. You can use the --trace-children=no and --child-silent-after-fork=yes options to valgrind when checking for memory leaks. This will look only at uqzip and ignore child processes.
10. Review the provided assignment one sample solution to see what you can learn from it. You may freely use code from this solution without reference.
11. Review and understand the code examples covered in the week 6 and 7 lectures. You may freely use code from these without reference.
12. Complete the week 6 and 7 Ed Lessons[8]. These lessons cover important functionality and system calls that will be needed in this assignment. You can freely use code from these and other Ed Lessons.

Suggested Approach

It is suggested that you follow the following steps.
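Before working through the steps, hints 2 and 6 from the Hints section can be sketched in C as follows. This is an illustrative sketch only: the helper names round_up_to_four, read_uint32 and write_uint32 are hypothetical, and the values are read and written in the host's byte order, which is assumed here to be what the archive format expects.

```c
#include <stdint.h>
#include <stdio.h>

// Round n up to the next multiple of 4 (hint 6): add 3, then clear the
// two least significant bits.
uint32_t round_up_to_four(uint32_t n)
{
    return (n + 3) & ~(uint32_t)3;
}

// Read one 32-bit unsigned integer from the current position of an open
// stream (hint 2). Returns 1 on success, 0 on a short read or error.
int read_uint32(FILE* handle, uint32_t* number)
{
    return fread(number, sizeof(uint32_t), 1, handle) == 1;
}

// Write one 32-bit unsigned integer at the current position of an open
// stream. Returns 1 on success, 0 on failure.
int write_uint32(FILE* handle, uint32_t number)
{
    return fwrite(&number, sizeof(uint32_t), 1, handle) == 1;
}
```

Checking the return value of each read is one way to notice a truncated archive, since a short read makes fread(3) return fewer items than requested.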
Test your program at each stage and commit to your SVN repository frequently. Note that the specification text above is the definitive description of the expected program behaviour. The list below does not cover all required functionality.

1. Write a program that checks the command line arguments and exits with a usage error if they are invalid.
2. It is recommended that you implement compression functionality first. Implement the file checking functionality (i.e. that the output file can be opened for writing).
3. Implement sequential compression. It is recommended that you write functions to (a) write out the header section of an archive file (with placeholders for offsets); (b) create a pipe, fork a child and execute the required compression program for a given file; (c) for a given file, read data from a pipe and write out a file record and reap the child process; (d) fix up the offsets in an archive file.
4. Add code to check for execution failure and handle that appropriately.
5. Add support for parallel execution – save process ids and pipe file descriptors from the children, then write file records and reap child processes as required.
6. Implement sequential decompression next.
7. Implement remaining functionality as required . . .

Forbidden Functions, Statements etc.

You must not use any of the following C statements/directives/etc. If you do so, you will get zero (0) marks for the assignment.

• goto
• #pragma
• gcc attributes (other than the possible use of __attribute__((unused)) as described in the style guide)

You must not use any of the following C functions. If you do so, you will get zero (0) marks for any test case that calls the function.

• longjmp(3) and equivalent functions
• system(3)
• popen(3)
• mkfifo(3) or mkfifoat(3)
• signal(2), sigpending(2), sigqueue(3), sigwaitinfo(2), sigtimedwait(2), sigsuspend(2)
• sleep(3), usleep(3), nanosleep(2)
• Any pthread* functions
• As noted in Other Requirements, you must not use the WNOHANG option when waiting.
• Functions described in the man page as non standard, e.g. strcasestr(3). Standard functions will conform to a POSIX standard – often listed in the “CONFORMING TO” section of a man page. Note that getopt_long(3) and getopt_long_only(3) are an exception to this – these functions are permitted if desired.

The use of comments to control the behaviour of clang-format and/or clang-tidy (e.g., to disable style checking) will result in zero marks for automatically marked style.

[8] These lessons require at least the week 2 lessons to be completed first.

Testing

You are responsible for ensuring that your program operates according to the specification. You are encouraged to test your program on a variety of scenarios. A variety of programs will be provided to help you in testing:

• A demonstration program (called demo-uqzip) that implements the correct behaviour is available on moss. You can test your program with a set of command line arguments and also run the same test (perhaps with a different output filename) with demo-uqzip to check that you get the same output. You can create archives with one program and extract files with the other. You can also use demo-uqzip to check the expected behaviour of the program if some part of this specification is unclear.
• A test script will be provided on moss that will test your program against a subset of the functionality requirements – approximately 50% of the available functionality marks. The script will be made available about 10 to 14 days before the assignment deadline and can be used to give you some confidence that you’re on the right track. The “public tests” in this test script will not test all functionality and you should be sure to conduct your own tests based on this specification. The “public tests” will be used in marking, along with a set of “private tests” that you will not see.
• The Gradescope submission site will also be made available about 10 to 14 days prior to the assignment deadline. Gradescope will run the test suite immediately after you submit. When this is complete[9] you will be able to see the results of the “public tests”. You should check these test results to make sure your program is working as expected. Behaviour differences between moss and Gradescope may be due to memory initialisation assumptions in your code, so you should allow enough time to check (and possibly fix) any issues after submission.

[9] Gradescope marking may take only a few minutes or more than 30 minutes depending on the functionality and efficiency of your code.

When testing your program, be very careful about the names of archive files you create and the names of files you place in archive files. It is very easy for uqzip to overwrite an existing file (e.g. when extracting a file from an archive) which may cause you to lose assignment work if you overwrite a source file. You should do your testing in a separate directory to your assignment working directory and commit your work to your SVN repo regularly to minimise the risk of losing your work.

Submission

Your submission must include all source and any other required files (in particular you must submit a Makefile). Do not submit compiled files (e.g. .o, compiled programs) or .uqz archive files.

Your program (named uqzip) must build on moss.labs.eait.uq.edu.au and in the Gradescope environment with:

    make

Your program must be compiled with gcc with at least the following options:

    -Wall -Wextra -pedantic -std=gnu99

You are not permitted to disable warnings or use pragmas to hide them. You may not use source files other than .c and .h files as part of the build process – such files will be removed before building your program.

If any errors result from the make command (i.e. the uqzip executable can not be created) then you will receive 0 marks for functionality (see below). Any code without academic merit will be removed from your program before compilation is attempted (and if compilation fails, you will receive 0 marks for functionality).

Your program must not invoke other programs or use non-standard headers/libraries.

Your assignment submission must be committed to your Subversion repository under

    svn+ssh://source.eait.uq.edu.au/csse2310-2024-sem2/csse2310-s4568784/trunk/a3

Only files at this top level will be marked so do not put source files in subdirectories. You may create subdirectories for other purposes (e.g. your own test files) but these will not be considered in marking – they will not be checked out of your repository.

You must ensure that all files needed to compile and use your assignment (including a Makefile) are committed and within the trunk/a3 directory in your repository (and not within a subdirectory) and not just sitting in your working directory. Do not commit compiled files or binaries.
You are strongly encouraged to check out a clean copy for testing purposes.

To submit your assignment, you must run the command

    2310createzip a3

on moss and then submit the resulting zip file on Blackboard (a GradeScope submission link will be made available in the Assessment area on the CSSE2310/7231 Blackboard site)[10]. The zip file will be named

    s4568784_csse2310_a3_timestamp.zip

where timestamp is replaced by a timestamp indicating the time that the zip file was created.

The 2310createzip tool will check out the latest version of your assignment from the Subversion repository, ensure it builds with the command ‘make’, and if so, will create a zip file that contains those files and your Subversion commit history and a checksum of the zip file contents. You may be asked for your password as part of this process in order to check out your submission from your repository. You will be asked to confirm references in your code and also to confirm your use (or not) of AI tools to help you.

You must not create the zip file using some other mechanism and you must not modify the zip file prior to submission. If you do so, you will receive zero marks. Your submission time will be the time that the file is submitted via GradeScope on Blackboard, and not the time of your last repository commit nor the time of creation of your submission zip file.

Multiple submissions to Gradescope are permitted. We will mark whichever submission you choose to “activate” – which by default will be your last submission, even if that is after the deadline and you made submissions before the deadline. Any submissions after the deadline[11] will incur a late penalty – see the CSSE2310 course profile for details.

Marks

Marks will be awarded for functionality and style and documentation.
Marks may be reduced if you attend an interview about your assignment and you are unable to adequately respond to questions – see the CSSE2310 Student Interviews section below.

Functionality (60 marks for CSSE2310)

Provided your code compiles (see above) and does not use any prohibited statements/functions (see above), and your zip file has been generated correctly and has not been modified prior to submission, then you will earn functionality marks based on the number of features your program correctly implements, as outlined below. Not all features are of equal difficulty.

Partial marks will be awarded for partially meeting the functionality requirements. A number of tests will be run for each marking category listed below, testing a variety of scenarios. Your mark in each category will be proportional (or approximately proportional) to the number of tests passed in that category.

If your program does not allow a feature to be tested then you will receive 0 marks for that feature, even if you claim to have implemented it. For example, if your program can never create an archive file successfully then we can not determine if your program can correctly run compression programs. Your tests must run in a reasonable time frame, which could be as short as a few seconds for usage checking to many tens of seconds when valgrind is used to test for memory leaks. If your program takes too long to respond, then it will be terminated and you will earn no marks for the functionality associated with that test.

Exact matching of output messages and output files is used for functionality marking. Strict adherence to this specification is critical to earn functionality marks.

The markers will make no alterations to your code (other than to remove code without academic merit). Marks will be assigned in the following categories.
1. Program correctly handles invalid command lines (usage errors) (4 marks)
2. Program correctly handles file checking (2 marks)
3. Program can correctly create a single file archive (for all 5 formats) (5 marks)
4. Program can correctly create multi-file archives in sequential mode (4 marks)
5. Program can correctly create multi-file archives in parallel mode (4 marks)
6. Program correctly handles errors when executing compression programs (sequential mode) (3 marks)
7. Program correctly handles errors when executing compression programs (parallel mode) (3 marks)
8. Program can correctly extract a file from a single file archive (for all 5 formats) (5 marks)
9. Program can correctly extract files from a multi-file archive in sequential mode (4 marks)
10. Program can correctly extract files from a multi-file archive in parallel mode (4 marks)
11. Program handles read errors when extracting files from an archive file (4 marks)
12. Program correctly handles errors when executing decompression programs (sequential mode) (3 marks)
13. Program correctly handles errors when executing decompression programs (parallel mode) (3 marks)
14. Program correctly implements interruption (signal handling) for a variety of execution scenarios (4 marks)
15. Program correctly closes all unnecessary file descriptors in child processes (for a variety of execution scenarios) (4 marks)
16. Program operates correctly, uses memory judiciously and frees all memory upon exit (for a variety of scenarios) (4 marks)

Some functionality may be assessed in multiple categories. For example, if your program can’t correctly create any archive file then it will fail most compression tests.

[10] You may need to use scp or a graphical equivalent such as WinSCP, Filezilla or Cyberduck in order to download the zip file to your local computer and then upload it to the submission site.
[11] Or your extended deadline if you are granted an extension.

Style Marking

Style marking is based on the number of style guide violations, i.e. the number of violations of version 3 of the CSSE2310/CSSE7231 C Programming Style Guide (found on Blackboard). Style marks will be made up of two components – automated style marks and human style marks. These are detailed below. Your style marks can never be more than your functionality mark – this prevents the submission of well styled programs which don’t meet at least a minimum level of required functionality.

You should pay particular attention to commenting so that others can understand your code. The marker’s decision with respect to commenting violations is final – it is the marker who has to understand your code.

You are encouraged to use the 2310reformat.sh and 2310stylecheck.sh tools installed on moss to correct and/or check your code style before submission. The 2310stylecheck.sh tool does not check all style requirements, but it will determine your automated style mark (see below). Other elements of the style guide are checked by humans.

All .c and .h files in your submission will be subject to style marking. This applies whether they are compiled/linked into your executable or not[12].

Automated Style Marking (5 marks)

Automated style marks will be calculated over all of your .c and .h files as follows. If any of your submitted .c and/or .h files are unable to be compiled by themselves then your automated style mark will be zero (0). If your code uses comments to control the behaviour of clang-format and/or clang-tidy then your automated style mark will be zero.
If any of your source files contain C functions longer than 100 lines of code[13] then your automated and human style marks will both be zero. If you use any global variables (other than a single flag of bool type for signal handling) then your automated and human style marks will both be zero.

[12] Make sure you remove any unneeded files from your repository, or they will be subject to style marking.
[13] Note that the style guide requires functions to be 50 lines of code or fewer. Code that contains functions whose length is 51 to 100 lines will be penalised somewhat – one style violation (i.e. one mark) per function. Code that contains functions longer than 100 lines will be penalised very heavily – no marks will be awarded for human style or automatically marked style.

If your code does compile and does not contain any C functions longer than 100 lines and does not use any global variables (other than a single flag of bool type for signal handling) and does not interfere with the expected behaviour of clang-format and/or clang-tidy then your automated style mark will be determined as follows: Let

• W be the total number of distinct compilation warnings recorded when your .c files are individually built (using the correct compiler arguments)
• A be the total number of style violations detected by 2310stylecheck.sh when it is run over each of your .c and .h files individually[14].

Your automated style mark S will be

    S = 5 − (W + A)

If W + A ≥ 5 then S will be zero (0) – no negative marks will be awarded.
If you believe that 2310stylecheck.sh is behaving incorrectly or inconsistently then please bring this to the attention of the course coordinator prior to submission, e.g., it is possible the style checker may report different issues on moss than it does in the Gradescope environment. Your automated style mark can be updated if this is deemed to be appropriate. You can check the result of Gradescope style marking soon after your Gradescope submission – when the test suite completes running.

[14] Every .h file in your submission must make sense without reference to any other files, e.g., it must #include any .h files that contain declarations or definitions used in that .h file. You can check that a header file compiles by itself by running gcc -c filename.h – with any other gcc arguments as required.

Human Style Marking (5 marks)

The human style mark (out of 5 marks) will be based on the criteria/standards below for “comments”, “naming” and “modularity”. Note that if your code contains any functions longer than 100 lines or uses a global variable (other than a single flag of bool type for signal handling) then your human style mark is zero and the criteria/standards below are not relevant.

The meanings of words like appropriate and required are determined by the requirements in the style guide.

Comments (3 marks)

Mark  Description
0     25% or more of the comments that are present are inappropriate AND/OR at least 50% of the required comments are missing
1     At least 50% of the required comments are present AND the vast majority (75%+) of comments present are appropriate AND the requirements for a higher mark are not met
2     All or almost all required comments are present AND all or almost all comments present are appropriate AND the requirements for a mark of 3 are not met
3     All required comments are present AND all comments present are appropriate AND additional comments are present as appropriate to ensure clarity

Naming (1 mark)

Mark  Description
0     At least a few names used are inappropriate
0.5   Almost all names used are appropriate
1     All names used are appropriate

Modularity (1 mark)

Mark  Description
0     There are two or more instances of poor modularity (e.g. repeated code blocks)
0.5   There is one instance of poor modularity (e.g. a block of code repeated once)
1     There are no instances of poor modularity

SVN Commit History Marking (5 marks)

Markers will review your SVN commit history for your assignment up to your zip file creation time. This element will be graded according to the following principles:

• Appropriate use and frequency of commits (e.g. a single monolithic commit of your entire assignment will yield a score of zero for this section). Progressive development is expected, i.e., no large commits with multiple features in them.
• Appropriate use of log messages to capture the changes represented by each commit. (Meaningful messages explain briefly what has changed in the commit (e.g. in terms of functionality, not in terms of specific numbered test cases in the test suite) and/or why the change has been made and will usually be more detailed for significant changes.)

The standards expected are outlined in the following rubric. The mark awarded will be the highest for which the relevant standard is met.

Mark (out of 5)  Description
0                Minimal commit history – only one or two commits OR all commit messages are meaningless.
1                Some progressive development evident (three or more commits) AND at least one commit message is meaningful.
2                Progressive development is evident (multiple commits) AND at least half the commit messages are meaningful.
3                Multiple commits that show progressive development of almost all or all functionality AND at least two-thirds of the commit messages are meaningful.
4                Multiple commits that show progressive development of ALL functionality AND meaningful messages for all but one or two of the commits.
5                Multiple commits that show progressive development of ALL functionality AND meaningful messages for ALL commits.

Total Mark

Let

• F be the functionality mark for your assignment (out of 60 for CSSE2310 students).
• S be the automated style mark for your assignment (out of 5).
• H be the human style mark for your assignment (out of 5).
• C be the SVN commit history mark (out of 5).
• V be the scaling factor (0 to 1) determined after interview (if applicable – see the CSSE2310 Student Interviews section below) – or 0 if you fail to attend a scheduled interview without having evidence of exceptional circumstances impacting your ability to attend.

Your total mark for the assignment will be:

    M = (F + min{F, S + H} + min{F, C}) × V

out of 75 (for CSSE2310 students).

In other words, you can’t get more marks for style or SVN commit history than you do for functionality. Pretty code that doesn’t work will not be rewarded!

Late Penalties

Late penalties will apply as outlined in the course profile.

CSSE2310 Student Interviews

This section is unchanged from assignment one.

The teaching staff will conduct interviews with a subset of CSSE2310 students about their submissions, for the purposes of establishing genuine authorship. If you write your own code, you have nothing to fear from this process.
If you legitimately use code from other sources (following the usage/referencing requirements outlined in this assignment, the style guide, and the AI tool use documentation requirements) then you are expected to understand that code. If you are not able to adequately explain the design of your solution and/or adequately explain your submitted code (and/or earlier versions in your repository) and/or be able to make simple modifications to it as requested at the interview, then your assignment mark will be scaled down based on the level of understanding you are able to demonstrate and/or your submission may be subject to a misconduct investigation where your interview responses form part of the evidence. Failure to attend a scheduled interview will result in zero marks for the assignment unless there are documented exceptional circumstances that prevent you from attending.

Students will be selected for interview based on a number of factors that may include (but are not limited to):

• Feedback from course staff based on observations in class, on the discussion forum, and during marking;
• An unusual commit history (versions and/or messages), e.g. limited evidence of progressive development;
• Variation of student performance, code style, etc. over time;
• Use of unusual or uncommon code structure/functions etc.;
• Referencing, or lack of referencing, present in code;
• Use of, or suspicion of undocumented use of, artificial intelligence or other code generation tools; and
• Reports from students or others about student work.
Specification Updates

Any errors or omissions discovered in the assignment specification will be added here, and new versions released with adequate time for students to respond prior to the due date. Potential specification errors or omissions can be discussed on the discussion forum.