The Australian National University
Research School of Computer Science, CECS

COMP8430 – Data Wrangling – 2020
Assignment 4

Due 11:55 pm on Friday 30 October 2020
Worth 10% of the final grade for COMP8430

This assignment is only for Master students enrolled in COMP8430.

Draft – Last update August 28, 2020

Overview and Objectives

This assignment requires you to select and read a research paper relevant to data wrangling, and then to summarise and critically analyse this paper. You will need to provide a summary of the paper in your own words, and answer a set of questions about certain aspects of the paper that you have selected.

Important

• The answers to this assignment have to be submitted online in Wattle, see the link Assignment 4 Submission in week 12 (26 to 30 October).
• Follow instructions given for maximum text length in free format answers. If your answers are too long this will attract a penalty (for details see the individual questions below and the corresponding answer submission forms in Wattle).
• You can edit your answers many times and they will be saved by Wattle.
• Make sure you submit the final version of your assignment answers before the submission deadline.
• Note that Wattle does not allow us to access any earlier edited versions of your answers, so check very carefully what you submit as the final version!

You can only submit your assignment once! Make sure you do not forget to submit your assignment!

Penalties

Textual questions have maximum line and maximum word limits. If you write more than these provided limits we will have to apply an over-word-limit penalty. For details of limits see the individual questions below and the corresponding pages in the assignment submission in Wattle.

Deadlines, Extensions, and Late Submissions

The assignment is due 11:55 pm on Friday 30 October 2020.
Students will only be granted an extension on the submission deadline in extenuating circumstances, as defined by ANU policy (http://www.anu.edu.au/students/program-administration/assessments-exams/deferred-examinations). If you think you have grounds for an extension, you must notify the course convener as soon as possible and provide written evidence in support of your case (such as a medical certificate). The course convener will then decide whether to grant an extension and inform you as soon as practical.

In accordance with the CECS and ANU late submission policy, no late submissions will be accepted, except where an extension has been approved by the course convener.

Assignment Structure

The assignment consists of five (5) tasks as described below. Make sure you answer all aspects of each task. If you have any questions on the assignment, please post them on Wattle. However, do not post any partial solutions, program code, URLs, etc., or any hints on how to solve any of the assignment tasks.

Plagiarism

No group work is permitted for this assignment. We do encourage you to discuss your work, but we expect you to do the assignment work by yourself. If you are unsure about what constitutes plagiarism, make sure you carefully read the ANU Academic Honesty Policy (http://academichonesty.anu.edu.au/).

If you do include ideas or material from other sources, then you must clearly attribute them by providing a reference to the material or source in your submitted assignment answers. We do not require a specific referencing format, as long as you are consistent and your references allow us to find the source, should we need to while we are marking your assignment.

Marking

This assignment will be marked out of 10, and it will contribute 10% of your final course mark. Note that not all questions might be equally difficult. For some questions there is no single right or wrong answer.
Marks will be awarded based on your description, reasoning, and explanations, as well as clarity and correctness of writing.

We will endeavour to release your marks and feedback within two teaching weeks after the submission deadline. If you feel we have made an error in marking, you have two weeks following the release of marks to raise any issues with the course convener, after which time your mark will be considered final. If you request that we re-mark your assignment, we will re-mark the entire assignment and your mark may go up or down as a result.

Assignment Tasks

On Wattle in week 6, under the Assignment 4 specification (this document), you will find links to seven scientific publications. These are papers from our group working in data wrangling here at the ANU. All these papers have been published at different Pacific-Asia Conferences on Knowledge Discovery and Data Mining (PAKDD) in the past few years. We selected these papers because we are very familiar with their topics and content, because they all cover different topics related to record linkage, and because they all have the same length and format.

The seven listed papers are:

1. Adaptive Temporal Entity Resolution on Dynamic Databases (Christen and Gayler, 2013)
2. Efficient Interactive Training Selection for Large-Scale Entity Resolution (Wang, Vatsalan, and Christen, 2015)
3. Improving Temporal Record Linkage Using Regression Classification (Hu, Wang, Vatsalan, and Christen, 2017)
4. Pattern-Mining Based Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage (Christen, Vidanage, Ranbaduge, and Schnell, 2018)
5. A Scalable and Efficient Subgroup Blocking Scheme for Multidatabase Record Linkage (Ranbaduge, Vatsalan, and Christen, 2018)
6. Robust Temporal Graph Clustering for Group Record Linkage (Nanayakkara, Christen, and Ranbaduge, 2019)
7. Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage (Ranbaduge, Christen, and Schnell, 2020)

For this assignment, you must select one of these seven papers, address the following five tasks, and provide your answers in the corresponding answer fields in Wattle (under the Assignment 4 Submission link in week 12).

• Task 1: Paper topic and the research question addressed (2 marks): Describe in your own words the topic the paper covers, and the research question(s) the paper aims to address. Write a maximum of 250 words (around 10 lines).
• Task 2: Proposed methods (2 marks): Describe in your own words the method / approach proposed by this paper. How does this method work? What are the building blocks / components / techniques used by the proposed method? Again write a maximum of 250 words (around 10 lines).
• Task 3: Data set(s) and evaluation used (2 marks): Describe in your own words the data set(s) used by the paper and how the proposed method was evaluated. What measures were used, and what aspects of the proposed method were evaluated (runtime, scalability, quality, accuracy, privacy, etc.)? Write a maximum of 250 words (around 10 lines).
• Task 4: Critiques and shortcomings (2 marks): In your own words, describe any criticism you have of this paper. This can include, but is not limited to, an unclear or even wrong description of the method, an inappropriate or incomplete evaluation that uses unsuitable evaluation measures or data sets, unclear or inappropriate conclusions, and so on. Again write a maximum of 250 words (around 10 lines).
• Task 5: Paper summary (2 marks): Finally, summarise the paper in your own words using a maximum of 250 words (around 10 lines). Briefly describe the method the paper proposes, how this method is assessed or evaluated, and what the main findings are.

Other Aspects

You do not need to include in your answers a reference to the paper you selected.
For all answers in this assignment, English writing mistakes and typographical errors will attract small penalties.