CS3114 (Fall 2019) PROGRAMMING ASSIGNMENT #3 Due Thursday, Nov 14 @ 11:00 PM for 100 points Early bonus date: Tuesday, Nov 12 @ 11:00 PM for a 10 point bonus Background setup Assume that there is a big company (let’s take Apple) that would like to recruit students from all over the country. They have their own evaluation system, in which each student is given a score. We will call it apple score, or in short “A score”. A score is a positive real number that we will store as a double data type. And then each student will be assigned a pid tag, which is an integer that we will store as a long data type. Each school is assigned a unique tag header and VT’s tag header will be 909, then a student will have another 9 digits, which makes the pid tag a positive integer with up to 12 digits. By design, that 9 digits after the header is exactly the pid # stored in our student info file (We set up this way just to avoid another round of search). For example, a pid TAG with 909123456789 will be a VT student with pid # 123456789. That student’s name can then be found in our student info database. Note: Not all VT students are selected and given an A score. External Sorting For this project, you will implement an external sorting algorithm for binary data. The input data file will have a size at least 8 blocks, where a block is 16K (16,384) bytes. Each block will contain a series of records, where each record has 16 bytes. The first 8-byte field is a positive integer value (long) for the pid TAG and the second 8-byte field is a double value for the A score, which will be used for sorting. Thus each block contains 1,024 records. Your job is to sort the file (in descending order of the key values, where the largest key value will rank as #1, and so on…), as follows: Using replacement selection (as described in Section 9.6 in the OpenDSA in Canvas), you will sort sections of the file in a working memory that is 8 blocks long. To be precise, the heap will be 8 blocks in size; in addition you will also have a one block input buffer, a one-block output buffer and any additional working variables that you need. If you use a class to represent your record, it is allowed for you to use a little larger memory, but your total memory usage is still limited by 8 blocks of data. Please be aware that byte operations will be more efficient in your codes. To process, read the first 8 blocks of the input file into memory and use replacement selection to create the longest possible run. As it is being created, the run is output to the one block output buffer. Whenever this output buffer becomes full, it is written to an output file called the run file. During that process, you will load input data into the input buffer whenever that input buffer is empty. When the first run is complete, continue on to the rest of the input file, adding the second run to the end of the run file. When the process of creating runs is complete, the run file will contain some number of runs, each run being at least 8 blocks long (except that the last run may have arbitrary number of data), with the data sorted within each run. You may begin each run in a new block or store all runs continuously. In this process, you will use a list (either an array or a linked list) to store the information necessary for you to retrieve the runs. You will then use a multi-way merge to combine the runs into a single sorted file. You must also re-use the same 8-block memory, which was used for the heap in the run-building step, to store working data from the runs during the merge step. Multi-way merging is done by reading the first block from each of the runs currently being merged into your working area, and merging these runs into the one block output buffer. When the output buffer fills up, the data in the output buffer are written to another output file. Whenever one of the input blocks is exhausted, read in the next block for that particular run. This step requires random access (using seek) to the run file, and a sequential writing of the output file. Depending on the size of all records, you may need multiple passes of multiway-merging to sort the whole file. Program Invocation and Operation: The program will take the names of one file from the command line, like this: java Ascoresorting You may assume that the specified record file does exist in our test cases. This record file is the file to be sorted. At the end of your program, the record file (on disk) should be sorted. So this program does modify the input data file. Be careful to keep a copy of the original when you do your testing. In addition to sorting the data file, you must report some information about the execution of your program. You will need to report the ranking of those VT students. The ranking is basically the order of their corresponding A score, sorted from large to small. Specifically, your program will print (to standard output) the ranking #, the student name (full name and last name) and then the A score, in order, from the sorted data file. The records are to be printed 1 record a line (detailed format will be given by our TA later). For that purpose, you will need to take the corresponding pid # from the pid TAG and then search through the studentinfo database (similar to what we have in project 2) to find the names of those ranked VT students. The studentinfo binary file will be provided by our TA with the format same as in project 2. A sample will be posted on piazza as well. For output purpose, just print the first 100 ranked VT students. If there are less than 100 VT students in the original file, print all of them. Programming Standards: You must conform to good programming/documentation standards. Some specifics: • You must include a header comment, preceding main(), specifying the compiler and operating system used and the date completed. • Your header comment must describe what your program does; don't just plagiarize language from this spec. • You must include a comment explaining the purpose of every variable or named constant you use in your program. • You must use meaningful identifier names that suggest the meaning or purpose of the constant, variable, function, etc. • Always use named constants or enumerated types instead of literal constants in the code. • Precede every major block of your code with a comment explaining its purpose. You don't have to describe how it works unless you do something so sneaky it deserves special recognition. • You must use indentation and blank lines to make control structures more readable. • Precede each function and/or class method with a header comment describing what the function does, the logical significance of each parameter (if any), and pre- and post- conditions. • Decompose your design logically, identifying which components should be objects and what operations should be encapsulated for each. Neither the GTAs nor the instructors will help any student debug an implementation unless it is properly documented and exhibits good programming style. Be sure to begin your internal documentation right from the start. You may only use codes you have written, either specifically for this project or for earlier programs, or code taken from the textbook. Note that the textbook code is not designed for the specific purpose of this assignment, and is therefore likely to require modification. It may, however, provide a useful starting point. Design and Testing: Sample data files will be provided in piazza to help you test your program. This is not the data file that will be used in grading your program. The test data provided to you will attempt to exercise the various syntactic elements of the command specifications. It makes no effort to be comprehensive in terms of testing the data structures required by the program. Thus, while the test data provided should be useful, you should also do testing on your own test data to ensure that your program works correctly. For this project, testing design will be quite different from what you have been familiar with in projects 1 and 2. You will test your codes by using another codes to generate different types of binary data. For example, you should generate testing data that’s already sorted (best case), reversely sorted (worst case), then well sorted data plus one extra un-sorted data, and data of size 8 blocks, 16 blocks, 32 blocks, and so on. Don’t simply count on our sample data to test your codes, and somehow you are limited by the data type or size for the files allowed to be uploaded to WebCAT. It is also recommended to use bytebuffer and its corresponding wrap function to build a connection between your data class and byte array. The original data are in bytes. It will be more efficient to use byte array for many operations. On the other hand, it is up to the code to interpret the bytes stored in the byte array. So the same byte array can be used for different data structures. That’ll be the case here. We will use the same memory space for both replacement selection and multiway merge. The reason is based on the fact that external sorting is used when your data is too large to fit into memory. So we do have a firm limitation on the total number of data you are allowed to process in memory. It is also required that you use just two files for your external sorting. Please do not use an individual file for each run. Deliverables: You will implement your project using Eclipse, and you will submit your project using the Eclipse plugin to Web-CAT. Links to the Web-CAT client are posted at the class website. If you make multiple submissions, only your last submission will be evaluated. There is no limit to the number of submissions that you may make. You are required to submit your own test cases (should cover at least 70% of your code) with your program. Of course, your program must pass your own test cases. Your grade will be fully determined by test cases that are provided by the graders (TAs). Web-CAT will report to you which test files have passed correctly, and which have not. Note that you will not be given a copy of these test files, only a brief description of what each accomplished in order to guide your own testing process in case you did not pass one of our tests. To be able to write tests that can be run by webcat, you should go to the following link: http://web-cat.org/junit-quickstart/ and download the student.jar file. You then need to reference this file as an external library. The library extends the existing JUnit library, so you can write tests using regular JUnit syntax. The webpage also includes some helpful hints to help you test better. When structuring the source files of your project, use a directory structure; that is, your source files will all be contained in the project “src" directory. Any subdirectories in the project will be ignored. You are permitted (and encouraged) to work with a partner on this project. When you work with a partner, then only one member of the pair will make a submission. Both names and emails must be included in the documentation and selected on any Web-CAT submission. The last submission from either of the pair members will be graded. In this project, you should pay attention to the efficiency of your code. There will be test cases that even though your code is correct, its running time may exceed our limit and that will be considered a failed test. The time limit is set from TAs’ running time with those test cases plus some extra time. Pledge: Your project submission must include a statement, pledging your conformance to the Honor Code requirements for this course. Specifically, you must include the following pledge statement near the beginning of the file containing the function main() in your program. // On my honor: // // - I have not used source code obtained from another student, // or any other unauthorized source, either modified or // unmodified. // // - All source code and documentation used in my program is // either my original work, or was derived by me from the // source code published in the textbook for this course. // // - I have not discussed coding details about this project with // anyone other than my partner (in the case of a joint // submission), instructor, ACM/UPE tutors or the TAs assigned // to this course. I understand that I may discuss the concepts // of this program with other students, and that another student // may help me debug my program so long as neither of us writes // anything during the discussion or modifies any computer file // during the discussion. I have violated neither the spirit nor // letter of this restriction. Programs that do not contain this pledge will not be graded.