CS418 Multimedia Technologies and Applications
Course Project

1. Objectives

The objectives of this course project are for students to gain hands-on experience in multimedia programming and to develop an image retrieval application. The project is interesting because we learn how to find a particular object (a football in our case) in a set of images. You are given an image retrieval program written in C++/OpenCV and are asked to extend it with additional features. The project involves first extracting different features from the input image, and then improving the image/object matching performance through different ways of combining the extracted features.

2. Requirements of the Course Project

This course project can be carried out as an individual or group project. The maximum number of members in each group is 3. However, we expect more work and better results from a larger group, and the responsibility of each group member should be clearly indicated in the report.

You are given an OpenCV-based demo program. The package includes two image datasets (dataset1 and dataset2). dataset1 contains many images, some of which contain footballs, for the Image Retrieval task. dataset2 contains only football images for the Object Detection task. Using the matching methods given in the demo program, you can correctly retrieve only a few of the images that contain the desired object (i.e., a football in this case), or locate only some of the desired objects. You are asked to improve the retrieval performance of this program by adding more feature extractors.

There are two levels of requirements for the project, basic and advanced, to cater for students of different backgrounds and interests. The basic requirements are designed for all students to practice some multimedia programming skills.
The advanced requirements are for students who would like to go further and create an application, and are more flexible in terms of what you would like to do. The basic and advanced requirements account for 80% and 25%, respectively, of the grade for this project. The total final mark is capped at 100%.

2.1 Basic Requirements (80%)

Students are required to finish the following two tasks in the basic requirements:

Task 1: Image Retrieval

Find the images containing footballs (i.e., images 990.jpg, 991.jpg, …, 999.jpg) in dataset1. Each time, you pick one of these football images as the input image to the program. The program will return n images. (n is set to 10 by default, but you may change it.) As there are a total of 10 football images in dataset1, the final retrieval performance is computed as the average of the 10 retrieval results.
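The averaging described above can be sketched in plain C++ as follows. This is only an illustration, not the evaluation code shipped with the demo program: the names RetrievalScore, scoreQuery and averageScores are hypothetical, and the sketch assumes that precision is the fraction of returned images that contain a football, that recall is the fraction of all football images in the dataset that were returned, and that each returned file name appears only once.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper type: precision and recall for one query image.
struct RetrievalScore {
    double precision;  // relevant retrieved / total retrieved
    double recall;     // relevant retrieved / all relevant images in the dataset
};

// Scores one query. `retrieved` holds the file names returned by the program
// (assumed to be distinct); `relevant` holds the file names of all images in
// the dataset that contain the target object.
RetrievalScore scoreQuery(const std::vector<std::string>& retrieved,
                          const std::set<std::string>& relevant) {
    std::size_t hits = 0;
    for (const std::string& name : retrieved) {
        if (relevant.count(name) > 0) {
            ++hits;
        }
    }
    RetrievalScore score{0.0, 0.0};
    if (!retrieved.empty()) {
        score.precision = static_cast<double>(hits) / retrieved.size();
    }
    if (!relevant.empty()) {
        score.recall = static_cast<double>(hits) / relevant.size();
    }
    return score;
}

// Averages the per-query scores, e.g. over the 10 football query images.
RetrievalScore averageScores(const std::vector<RetrievalScore>& scores) {
    RetrievalScore avg{0.0, 0.0};
    if (scores.empty()) {
        return avg;
    }
    for (const RetrievalScore& s : scores) {
        avg.precision += s.precision;
        avg.recall += s.recall;
    }
    avg.precision /= scores.size();
    avg.recall /= scores.size();
    return avg;
}
```

For example, if a query returns n = 10 images and 6 of them are among the 10 football images, scoreQuery gives a precision of 0.6 and a recall of 0.6 for that query. Note that changing n affects the two values differently, since n is the denominator of precision but not of recall.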
- Improvement on the Precision (20%)
  The target of this requirement is to achieve an average retrieval precision of 60%. This means that, given an input football image, the program will return some matched images from dataset1, and at least 60% of these returned images contain footballs. (30% precision gets 5% of marks, 60% precision gets 20% of marks, etc.)

- Improvement on the Recall (20%)
  The target of this requirement is to retrieve, on average, 60% of all the images in dataset1 that contain footballs. (30% recall gets 5% of marks, 60% recall gets 20% of marks, etc.)

Precision and recall are defined as:

  Precision = number of (retrieved images AND containing football) / number of retrieved images

  Recall = number of (retrieved images AND containing football) / number of images in the dataset containing footballs

Task 2: Object Detection

Detect and locate the football in each image in dataset2. Use the given football image (filename: football.png) as input and generate bounding boxes to indicate the locations of the football in the images, as shown in the demo program.

- Improvement on Top 10 Detection Accuracy (20%)
  Top 10 accuracy requires that at least one of the top 10 detected bounding boxes is a correct match with the ground truth bounding box, based on the intersection over union (IoU) metric. IoU is the area of the intersection of two bounding boxes divided by the area of their union. (Of these two bounding boxes, one is the ground truth bounding box provided by us and the other is detected by your program.) Here, for each retrieved image, if the best IoU among the top 10 returned bounding boxes is more than 0.1, we consider this image a correct detection. The final accuracy is the proportion of images that are considered correct detections. To be exact, if all 10 images are considered correct detections, your algorithm has 100% accuracy. See http://www.mathworks.com/help/vision/ref/bboxoverlapratio.html for more information.
  Note: You should use the same setting to test all the images in dataset2 and report your accuracy. (Evaluation code has already been included in the demo program. 40% accuracy gets 5% of marks, 70% accuracy gets 20% of marks, etc.)

- Improvement on IoU (20%)
  This is to try and improve the localization accuracy measured by IoU. The higher the IoU that you get, the higher the mark that you will receive. The final IoU score is computed by averaging the IoU obtained from each of the images in dataset2. A 10% improvement in localization accuracy gets 5% of marks, a 20% improvement gets 10% of marks, and an improvement of 30% or above gets 20% of marks.

2.2 Advanced Requirements (25%)

Students are expected to extend the program into an application. The extension can be done along two directions: technical improvement and/or UI design. The technical improvement may include speeding up the retrieval time and advancing the retrieval performance with new techniques (such as machine learning methods, high-dimensional data indexing techniques, efficient searching of sub-regions of each image instead of using a sliding window, or a crawler to obtain images from the internet). A UI may include real-time display of the regions of each image being compared and their scores, or allowing users to select different objects to be retrieved from the database.

3. Grading

The coursework component contributes 40% of the final course mark/grade. Attendance contributes 5%. For the remaining 35%, I will select whichever of the following distributions maximizes your coursework mark:
• 15% for course project, 20% for quiz
• 17.5% for course project, 17.5% for quiz
• 20% for course project, 15% for quiz

Note that we will use a PC with the following configuration to grade the course projects:
• Windows with Visual Studio 2017
• OpenCV 2.4.13

Unfortunately, we do not have a Mac to grade the course projects.
I understand that SCM students may not have a PC for the course project. I have asked cslab to install the above tools on all the PCs in room MMW2410 in the cslab, so you may use those PCs for your course project if you like.

4. Submission Details

Due date: November 10, 2019

Each group needs to submit the following items on a CD or a USB drive, together with a hardcopy report summarizing the work (see /Report below):

/Program:
(1) A source subdirectory containing all the source files and the necessary files.
(2) A binary subdirectory containing the executable file of the program and relevant files, including image files or libraries. The executable file should output the retrieved results (e.g., the list of retrieved images), the precision and recall values, and the IoU (in the Object Detection task). Note that it is important to make sure that we only need to click on the executable file to run the program. You will need to try the executable file on a different machine before you submit the work. We will not be able to give you marks if we fail to run your executable file.
(3) A readme file with instructions on how to compile and execute the program.

/Demo:
A demo video that guides the marker through the main contributions of the work. This video should be captured while you are running the program, so that we can see the inputs and the outputs.

/Report:
The purpose of this report is just to indicate the main contributions of the work. We will not be marking the report itself. Instead, the report should show us what has been done so that we may grade the work appropriately. Hence, there is no need to submit a large report. It can be just a few pages providing the following information:
(1) A cover that indicates your name(s) and student ID(s).
(2) A brief description of the final program, including the main modules and the relationship between these modules. (The description may be in the form of short paragraphs or a flow diagram.)
(3) A list of features added to the original program, including the names of the modified modules (in reference to point (2) above), brief explanations, and screen captures of the results.
(4) A listing of your entire program output. The demo has been rewritten to output some required information. You should report this information, including the precision and recall for each football image and their average values in the Image Retrieval task, and the IoU of the top 10 detected bounding boxes for each image in the Object Detection task. You may organize these results into several tables if you prefer.
(5) Responsibilities of each group member (if applicable), including
  ▪ The programmer of each added function
  ▪ The author of each major section of the report
  ▪ The person who has done the survey, group coordination, etc.

Note that your submission must contain the above items. Marks may be deducted if any is missing. There is no need to submit the image database.
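
As a reference when tabulating the Object Detection results for item (4) of the report, the IoU metric and the Top 10 criterion from Section 2.1 can be sketched in plain C++ as below. This is a minimal sketch under stated assumptions, not the evaluation code included in the demo program: the Box struct and the function names iou, bestIoU and isCorrectDetection are hypothetical, boxes are assumed to be axis-aligned and given as top-left corner plus width and height, and the detections are assumed to be sorted from best to worst score.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box: top-left corner (x, y), width w, height h.
struct Box {
    double x, y, w, h;
};

// Intersection over union: overlap area divided by the area of the union.
double iou(const Box& a, const Box& b) {
    // Overlap extent along each axis, clamped to zero when the boxes do not overlap.
    double overlapW = std::max(0.0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    double overlapH = std::max(0.0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    double intersection = overlapW * overlapH;
    double unionArea = a.w * a.h + b.w * b.h - intersection;
    return unionArea > 0.0 ? intersection / unionArea : 0.0;
}

// Best IoU between the ground truth box and the first k detections.
double bestIoU(const std::vector<Box>& detections, const Box& truth, std::size_t k = 10) {
    double best = 0.0;
    std::size_t n = std::min(k, detections.size());
    for (std::size_t i = 0; i < n; ++i) {
        best = std::max(best, iou(detections[i], truth));
    }
    return best;
}

// An image counts as a correct detection when the best IoU among the
// top k boxes is strictly greater than the threshold (0.1 in Task 2).
bool isCorrectDetection(const std::vector<Box>& detections, const Box& truth,
                        std::size_t k = 10, double threshold = 0.1) {
    return bestIoU(detections, truth, k) > threshold;
}
```

For example, two 10 x 10 boxes offset from each other by 5 pixels in both directions overlap in a 5 x 5 region, so their IoU is 25 / (100 + 100 - 25) = 25/175, which is about 0.143 and therefore above the 0.1 threshold. The Top 10 accuracy is then the fraction of images in dataset2 for which isCorrectDetection is true, and the final IoU score is the average over those images of the per-image IoU.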