Algorithms and Analysis COSC 2123 Assignment 2: Algorithm Design & Complexity Analysis Assessment Type Individual Assignment. Submit online via Canvas → Assign- ments → Assignment 2. Clarifications/updates/FAQ can be found in Canvas Announcements and Discussion → Assign- ment 2 General Discussion. Due Date Week 12, 11:59pm, 18th of Oct, 2020 Marks 30 IMPORTANT NOTES • If you are asked to develop an algorithm, you need to describe it in words first, say a paragraph, and then provide an unambiguous pseudo code. The description must include enough details to understand how the algorithm runs and what the com- plexity is. All algorithm descriptions and pseudo codes required in this assignment are at most half a page in length. • Standard array operations such as sorting, linear search, binary search, sum, max/min elements, can be used straight away and no further details are required. However, if some modification is needed, you have to provide a full description. If you are not clear if certain operations are standard or not, post it to Canvas Discussion Forum or drop us an email at
[email protected]. • Questions marked with ‘*’ or ‘**’ can be slightly or significantly more challenging. • In the submission (your PDF file), you will be required to certify that the submitted solution represents your own work only by agreeing to the following statement: I certify that this is all my own original work. If I took any parts from elsewhere, then they were non-essential parts of the assignment, and they are clearly attributed in my submission. I will show that I agree to this honour code by typing “Yes": Problem 1 [3 marks] Determine if the following statements are true or false. In either case, provide a formal proof using the definitions of the big-O, big-Omega, and big-Theta notations. For instance, to formally prove that f (n) ∈O(g(n)) or f (n) ∉O(g(n)), we need to demonstrate the existence of a constant c and a sufficient large n0 such that f (n)≤ cg(n) for all n≥ n0, or showing that there are no such values. a) [1 mark] 10000n2 ∈O(n4). b) [1 mark*] 2nn2 ∈Ω(3n). c) [1 mark] p 3n2+4n ∈Θ(n). Problem 2. [3 marks] Consider the following recursive algorithm. Algorithm Mystery(n) if n=1 then Execute Task A; // Requires Θ(1) operations else Mystery(n/3); Mystery(n/3); Mystery(n/3); Execute Task B; //Requires 2n operations end if Let C(n) be the complexity of Mystery(n). Use the method of backward substitution (introduced in Week 3’s lecture) to determine C(n) in three steps. a) [1 mark] Write the recurrence relation for C(n) including the initial condition. b) [1 mark] Write at least two substitution steps for C(n) and identify the pattern. c) [1 mark] Determine the complexity class of the algorithm in terms of Θ(·). Note that there is the so-called Master Theorem, which can deal with a general recur- rence relation of the type T(n)= aT(n/b)+O(nc), where a,b, c are constants (see Levitin’s textbook Chapter 5). However, here it is required that you present the method of back- ward substitution to understand how a proof of the Master Theorem should work. Problem 3 [2 marks] Let A be an integer array of length n. Design a divide and conquer algorithm (description and pseudo code) to find the index of an element of the mini- mum value in the array. If there are several such elements, your algorithm must return the index of the rightmost element. For instance, if A = {0,2,4,5,2,0,3,10}, then the algorithm should return 5, which is the index of the second 0. 2 Problem 4 [4 marks] After graduating from RMIT with distinction, you flew to Mas- sachusetts to join Boston Dynamics, a famous robotics company. You are immediately tasked with developing a set of algorithms for a librarian robot that can put newly ar- rived books to the main book shelf (of a sufficient capacity) while maintaining an alpha- betical order. Your first goal is to design an algorithm that minimises Ccmp, the number of comparisons of two books’ labels in the alphabetical order, and Cmov, the number of book movements (one movement is counted whenever a book is moved from one location to another) in the worst case. Suppose there are n books on the main shelf, which have already been sorted in an ascending alphabetical order. The m newly arrived books are carried into the library on a portable shelf. a) [2 marks*] Scenario 1: The newly arrived books have also been sorted in an ascend- ing alphabetical order and you are allowed to use a temporary shelf of an infinite capacity. Design an algorithm (description + pseudo code) for the robot to ar- range the new books onto the main shelf using Ccmp =Θ(n+m) label comparisons and Cmov = Θ(n+m) book movements in the worst case. Explain why your algo- rithm has the stated worst case complexity (ignoring the constant and choosing the dominant term in your analysis). Algorithm ArrangingBooksScenario1(A,B,C,n,m) //A, B, and C represent the arrays of book labels for the main, the portable, and the temporary shelves //COMPLETE WITH YOUR PSEUDO CODE HERE b) [2 mark*] Scenario 2: There is no temporary shelf to use in the library and the number of newly arrived books, m, is a small constant compared to n, for instance, m = 10. Design an algorithm (description + pseudo code) for the robot to ar- range the new books onto the main shelf using Ccmp =Θ(log(n)) label comparisons in the worst case. What is the number of book movements Cmov incurred in the worst-case of your algorithm and why? Algorithm ArrangingBooksScenario2(A,B,n,m) //A and B represent the arrays of book labels for the main and the portable shelves //COMPLETE WITH YOUR PSEUDO CODE HERE Problem 5 [4 marks] Two years later, to broaden your experience, you leave Boston Dynamics to join Storj Labs, which is developing a fast, safe, and affordable decentralised object storage network. Your very first task at Storj is to contribute to the design of a peer-to-peer (P2P) computer network. According to the description handed to you by the project leader, there are N = 2k computers and each computer node is labelled by a sequence of k bits 0 and 1, referred to as the nodeID, determined as follows. The N nodes can be arranged as leaves of a full rooted binary tree. One traverses the tree from the root to the leaves and label the edges linking the current node with the left child and the right child by 0 and 1, respectively. The nodeID of each leaf is obtained by collecting 3 the edge labels going from the root to the node (see Fig. 1 for an example when N = 8). Moreover, to be space-efficient, each computer node maintains a set of nodeIDs and IP 0 0 0 0 0 0 0 1 1 1111 000 001 010 011 100 101 110 111 1 NodeID root Sender Receiver Figure 1: Example of how the nodeIDs are created for N = 8 nodes. The sender 000 sends a message to its neighbour 011, which forwards it to its neighbour 110, which then forwards it to its neighbour 101, which turns out to be the receiver. The message takes three hops to reach the destination. You need to design an algorithm for a general N = 2k, and not just N = 8. This is just a toy example. addresses of only k= log2(N) other nodes, referred to as the neighbourSet. Each node can only send a message to one of its neighbours and not the other nodes in the network. a) [2 marks*] Your first task is to design a simple policy to assign the set of neigh- bours for each of the N node AND a strategy to pick a neighbour that plays a role of the next hop to relay the message message to the destination target_nodeID. In particular, you need to implement (description and pseudo code) Algorithm pickANeighbour() to complete Algorithm sendToNextHop(), which plays the role of a so-called distributed1 algorithm that handles the relay of the message in the P2P network. You need to show that your algorithm always terminates after a finite number of calls2 to sendToNextHop(). In other words, you must show that a message always takes a finite number of hops to be delivered. How many hops are needed in the worst case using the big-O notation? Algorithm pickANeighbour(neighbourSet,target_nodeID) //YOU NEED TO COMPLETE THIS ALGORITHM return neighbour_nodeID; b) [2 marks**] After the company’s demo has run successfully, you are assigned the second task, which is the same as your first task except that now there is an extra requirement that each message, regardless of the sender and the intended receiver, always takes at most k = log2(N) hops to reach the target. Provide the description 1A distributed algorithm is an algorithm that runs locally at each computer node while still guarantee a network-wide operation such as message transmission. 2An example when the algorithm cannot terminate: if each of the four nodes 000, 001, 010, 011, has the other three as its only neighbours, then there is no way for 000 to send a message to 100, for instance, regardless of how the function pickANeighbour() is designed. 4 Algorithm sendToNextHop(nodeID,neighbourSet,target_nodeID,message) if nodeID= target_nodeID then Read(message); // If the message is intended for the current node then read it else neighbour_nodeID :=pickANeighbour(neighbourSet,target_nodeID); forwardToNeighbour(neighbour_nodeID,target_nodeID,message); //Forward the message to the chosen neighbour end if and the pseudo code and explain why your design of the neighbour sets and pick- ANeighbour() achieves that goal. Hint: greedy. Problem 6 [8 marks] Thanks to your great achievements at Boston Dynamics, Storj, and numerous other high-profile companies, you have been selected to join a much more ambitious mission: Oil Extraction on Mars! Initial explorations have indicated a vast amount of oil reserves in a particular area on Mars. Successful extractions would allow us to fuel our vehicles, generate electricity, and build up another civilization outside the planet Earth. There are n deep wells of oil located on a narrow strip of land, the reserves of which have been estimated thanks to the exploration team (see Fig. 2). Denote by A the array of n elements where A[i], i = 1,2, . . . ,n, is the estimation of the i-th well (in millions of barrels). Due to the lack of equipments as well as the peculiar geological condition on Mars, some wells are marked as "non-extractable" and the corresponding element A[i] is set to be ‘X’. Moreover, extracting two adjacent wells are prohibitive as it may cause instability and collapses of wellbores. The question that faces your team is: which wells should be drilled to maximise the total amount of oil extracted? This is the task you, the only computer scientist in the team, are assigned, and you are of course well equipped to do it (or so everyone thinks!). 1 3 X 6 10 7 6 X 5 2 Figure 2: A toy example of an array of oil wells. The number inside each well indicates the estimated reserve. The wells labelled by ’X’ are non-extractable. A valid selection of well includes wells {1,4,6,9}, which amounts to 19 millions barrels. a) [1 mark] Design a brute-force algorithm (detailed description + unambiguous pseudo code) to find the best wells to drill, given the input A and n. Determine the time complexity of your algorithm (big-O notation) in terms of n. b) [2 marks] Design a greedy algorithm (detailed description + unambiguous pseudo code) to find the best wells to drill, given the input A and n. Demonstrate on the toy example in Fig. 2. Determine the time complexity of your algorithm (big-O notation) in terms of n? 5 1 3 X 6 10 7 6 X 5 2 5 2 1 4 X 3 5 4 1 5 Figure 3: A toy example of two arrays of oil wells. The number inside each well indicates the estimated reserve. The wells labelled by ’X’ are non-extractable. A valid selection includes wells {1,4,6,9} in the top array and wells {2,7,10} in the bottom one, which amounts to 31 millions barrels. c) [2 marks*] Design a (bottom-up) dynamic programming algorithm (explain in de- tail the recurrence and backtrace, and provide an unambiguous pseudo code) to find the best wells to drill, given the input A and n. Demonstrate the re- currence and the backtrace on the toy example in Fig. 2. Determine the space and time complexity of your algorithm (big-O notation) in terms of n. d) [3 marks**] In several newly discovered sites there are even two parallel arrays of n wells each, the reserves of which are represented by A and B. That means more oils can be extracted by your team, which is fantastic. However, your task is now even more challenging (your boss thinks otherwise, of course!), as there is a new technical requirement that horizontally or vertically adjacent wells cannot be both drilled. Design a (bottom-up) dynamic programming algorithm (explain in detail the recurrence and backtrace, the pseudo code is needed only for the backtrace part) to find the best wells to drill, given the input A, B, and n. Demonstrate the recurrence and the backtrace on the toy example in Fig. 3. Problem 7 [6 marks] While you team were celebrating the huge success achieved with the oil project, a string of bad luck struck. First, the radars and the whole telecom unit were struck by lightning and heavily damaged. Then, a mysterious virus silently wiped out every single piece of software in all the computers. You and your fellow telecom engineer are now facing an immense task: to find a way to transmit a 100-page technical report detailing the exploration and extraction success back to Earth, so that the CEO can call for more investments and send up a reinforcement team with better equipments and of course, virus-free computers. Besides, you’ve got to admit that after spending two years on Mars, you start to miss your folks and friends and wish to come home. Anyway, your colleague has just built up a makeshift transmitter, which can transmit binary bits at a very low rate. Forget about the sweet NBN speed of 100Mbps, the transmitter can only perform long-range transmission reliably at a horrifying speed of 1 bit/sec!!! You must find a way to compress the text as much as you can. The Huffman algorithm miraculously pops into your head and you decide immedi- ately to use it to compress the English letters in the report. To test it out, you first try a sample string of letters whose probabilities of appearance are given in Table 4. As all the software libraries have been erased, you need to build a priority queue using a min-heap from the scratch, apart from the Huffman trees/codes. 6 Letter A G H I L M O R T Probability 0.08 0.2 0.15 0.02 0.1 0.35 0.05 0.01 0.04 Figure 4: Probabilities of appearance/frequencies of different letters in a string. a) [1 mark] Construct a min-heap using the bottom-up heap construction to represent the probabilities in Table 4. You may use, e.g. “A: 0.08”, for the corresponding node in the heap. Describe (draw) all the steps required (use two-head arrows to indicate an exchange between two nodes). Let us remind you that a min-heap is the same as a max-heap, except that the key stored at a parent node is required to be smaller than the keys stored at its two children nodes. b) [1 mark] Illustrate (draw) the process of dequeuing the two letters of the smallest probabilities from the heap one by one (i.e. dequeue -> repair -> dequeue -> repair). c) [1 mark] Illustrate (draw) the process of enqueuing the new element (i.e. enqueue -> repair), which results from the combination of probabilities of the two dequeued elements earlier. For example, if ‘A:0.08’ and ‘G:0.2’ were dequeued, then the new element ‘AG:0.28’ would be enqueued. d) [2 marks] Provide a complete construction of the Huffman trees together with the codes for all letters. All intermediate steps need to be provided. When con- structing the trees, the following rule applies: for every node from the root, the probability of its left child must be smaller than or equal to that of its right child. e) [1 mark] Suppose the report is 100,000 character long and the characters follow the same frequencies as in Table 4 (pretending that there are no other characters). [0.5 mark] How long would it take to transmit the report back to Earth using your codes at the speed of 1 bit/sec? Explain your answer. [0.5 mark] If you use fixed-length codes instead of Huffman’s, how long would it take? Explain your answer. 7 1 Submission The final submission (via Canvas) will consist of: • Your solutions to all questions in a PDF file of font size 12pt and your agreement to the honour code (see the first page). Late Submission Penalty: Late submissions will incur a 10% penalty on the total marks of the corresponding assessment task per day or part of day late, i.e, 3 marks per day. Submissions that are late by 5 days or more are not accepted and will be awarded zero, unless Special Consideration has been granted. Granted Special Considerations with new due date set after the results have been released (typically 2 weeks after the deadline) will automatically result in an equivalent assessment in the form of a practical test, assessing the same knowledge and skills of the assignment (location and time to be arranged by the coordinator). Please ensure your submission is correct and up-to-date, re-submissions after the due date and time will be considered as late sub- missions. The core teaching servers and Canvas can be slow, so please do double check ensure you have your assignments done and submitted a little before the submission deadline to avoid submitting late. Assessment declaration: By submitting this assessment, you agree to the assess- ment declaration - https://www.rmit.edu.au/students/student-essentials/ assessment-and- exams/assessment/assessment-declaration 2 Plagiarism Policy University Policy on Academic Honesty and Plagiarism: You are reminded that all sub- mitted work in this subject is to be the work of you alone. It should not be shared with other students. Multiple automated similarity checking software will be used to compare submissions. It is University policy that cheating by students in any form is not permit- ted, and that work submitted for assessment purposes must be the independent work of the student(s) concerned. Plagiarism of any form will result in zero marks being given for this assessment, and can result in disciplinary action. For more details, please see the policy at https://www.rmit.edu.au/students/student-essentials/assessment-and-results/ academic-integrity. 3 Getting Help There are multiple venues to get help. There are weekly consultation hours (see Can- vas for time and location details). We will also be posting common questions on Canvas and we encourage you to check and participate in the discussion forum on Canvas. Al- though we encourage participation in the forums, please refrain from posting solutions or suggestions leading to solutions. 8
欢迎咨询51作业君