CS418 Multimedia Technologies and Applications
Course Project
1. Objectives
The objectives of this course project are for students to gain hands-on experience in
multimedia programming and to develop an image retrieval application. The project is
interesting because it teaches how to find a particular object (a football in our case) in a set of images.
You are given an image retrieval program written in C++/OpenCV, and are asked to extend it to
provide additional features. The project involves first extracting different features from the input
image, and then improving the image/object matching performance through different ways of
combining the extracted features.
2. Requirements of the Course Project
This course project can be carried out as an individual or group project. The maximum number of
members in each group is 3. However, we expect more work and better results from a group with
more people, and the responsibility of each group member should be clearly indicated in the report.
You are given an OpenCV-based demo program. The package includes two image datasets (dataset1
and dataset2). dataset1 contains many images, some of which contain footballs, for the
Image Retrieval task. dataset2 contains only football images, for the Object Detection task.
With the matching methods given in the demo program, you can correctly retrieve only a few
matched images that contain the desired object (i.e., a football in this case), or locate only some of
the desired objects. You are asked to improve the retrieval performance of this program by adding
more feature extractors.
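One way to combine several feature extractors is to score each candidate image by a weighted sum of per-feature similarities and return the best n. The sketch below is only an illustration under stated assumptions: the names `Histogram`, `histogramIntersection`, `combinedScore` and `rankByScore` are hypothetical and not part of the demo program, each feature is assumed to be a normalized histogram, and the weights are values you would have to tune yourself. It uses only the C++ standard library, so it would sit alongside the OpenCV code that actually extracts the features.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical feature representation: one normalized histogram
// (bins sum to 1) per feature type, e.g. colour or texture.
using Histogram = std::vector<double>;

// Histogram intersection: 1.0 for identical normalized histograms,
// 0.0 for histograms with no overlapping mass.
double histogramIntersection(const Histogram& a, const Histogram& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::min(a[i], b[i]);
    }
    return sum;
}

// Weighted sum of the per-feature similarities between a query image
// and one candidate image. The weights are an assumption to be tuned.
double combinedScore(const std::vector<Histogram>& query,
                     const std::vector<Histogram>& candidate,
                     const std::vector<double>& weights) {
    assert(query.size() == candidate.size());
    assert(query.size() == weights.size());
    double score = 0.0;
    for (std::size_t f = 0; f < query.size(); ++f) {
        score += weights[f] * histogramIntersection(query[f], candidate[f]);
    }
    return score;
}

// Indices of the n highest-scoring candidates, best first.
std::vector<std::size_t> rankByScore(const std::vector<double>& scores,
                                     std::size_t n) {
    std::vector<std::size_t> order(scores.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&scores](std::size_t a, std::size_t b) {
                         return scores[a] > scores[b];
                     });
    if (order.size() > n) {
        order.resize(n);
    }
    return order;
}
```

Changing the weights changes which feature dominates the ranking, so they are a natural place to experiment when trading precision against recall.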
There are two levels of requirements for the project, basic and advanced, to cater for students of
different backgrounds and interests. The basic requirements are designed for all the students to
practice some multimedia programming skills. The advanced requirements are for those students who
would like to go further to create an application, and are more flexible in terms of what you would
like to do. The basic requirements and advanced requirements account for 80% and 25%, respectively,
of the grade for this project. Since these add up to 105%, the total final mark is capped at 100%.
2.1 Basic Requirements (80%)
Students are required to finish the following two tasks in the basic requirements:
Task 1: Image Retrieval
Find the images containing footballs (i.e., images 990.jpg, 991.jpg, …, 999.jpg) from dataset1. Each
time, you pick one of these football images (i.e., images 990.jpg, 991.jpg, …, 999.jpg) as the input
image to the program. The program will return n images. (n is set to 10 by default, but you may
change it.) As there are a total of 10 football images in dataset1, the final retrieval performance is
computed as the average of the 10 retrieval results. 

- Improvement on the Precision (20%)
The target of this requirement is to achieve an average of 60% retrieval precision. This means that
given an input football image, the program will return some matched images from dataset1.
Among these returned images, at least 60% of them contain footballs. (30% precision gets 5% of
marks, 60% precision gets 20% of marks, etc.)

- Improvement on the Recall (20%)
The target of this requirement is to be able to retrieve an average of 60% of all the images in
dataset1 containing footballs. (30% recall gets 5% of marks, 60% recall gets 20% of marks, etc.)
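The precision and recall of one query, and their averages over the 10 queries, can be computed as in the following sketch. It is not the evaluation code of the demo program: the names `PrecisionRecall`, `evaluateQuery` and `averageOverQueries` are hypothetical, and the sketch assumes that images are identified by file name and that the retrieved list contains no duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type for one query.
struct PrecisionRecall {
    double precision;
    double recall;
};

// precision = hits / number of retrieved images
// recall    = hits / number of relevant (football) images in the dataset
// where hits is the number of retrieved images that are relevant.
// Assumes `retrieved` holds no duplicate file names.
PrecisionRecall evaluateQuery(const std::vector<std::string>& retrieved,
                              const std::set<std::string>& relevant) {
    std::size_t hits = 0;
    for (const std::string& name : retrieved) {
        if (relevant.count(name) > 0) {
            ++hits;
        }
    }
    PrecisionRecall result;
    result.precision = retrieved.empty()
        ? 0.0 : static_cast<double>(hits) / retrieved.size();
    result.recall = relevant.empty()
        ? 0.0 : static_cast<double>(hits) / relevant.size();
    return result;
}

// Arithmetic mean of precision and recall over all queries.
PrecisionRecall averageOverQueries(const std::vector<PrecisionRecall>& perQuery) {
    PrecisionRecall mean = {0.0, 0.0};
    if (perQuery.empty()) {
        return mean;
    }
    for (const PrecisionRecall& pr : perQuery) {
        mean.precision += pr.precision;
        mean.recall += pr.recall;
    }
    mean.precision /= perQuery.size();
    mean.recall /= perQuery.size();
    return mean;
}
```

Note how n affects the two measures: with 10 football images in dataset1, returning more than 10 images can raise recall but puts an upper bound on precision, since at most 10 of the returned images can be correct.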

Task 2: Object Detection
Detect and locate the football in each image in dataset2. Use the given football image (filename:
football.png) as input and generate bounding boxes to indicate the locations of the football in the
images, as shown in the demo program.
- Improvement on Top 10 Detection Accuracy (20%)
Top 10 accuracy requires that at least one of the top 10 detected bounding boxes is a correct match
with the ground truth bounding box under the intersection over union (IoU) metric. IoU is the area
of the intersection of two bounding boxes divided by the area of their union. (Of these two bounding
boxes, one is the ground truth bounding box provided by us and the other is detected by your
program.) For each image, if the best IoU among the top 10 returned bounding boxes is more than
0.1, we count this image as a correct detection. The final accuracy is the fraction of images that
are counted as correct detections. For example, if all 10 images are counted as correct detections,
your algorithm has 100% accuracy. See
http://www.mathworks.com/help/vision/ref/bboxoverlapratio.html for more information.
Note: You should use the same setting to test all the images in dataset2 and report your accuracy.
(Evaluation code has already been included in the demo program. 40% accuracy gets 5% of marks,
70% accuracy gets 20% of the marks, etc.)
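The IoU and top 10 accuracy described above can be sketched as follows. This is an illustration only and not the evaluation code shipped with the demo program: the `Box` struct and the functions `iou`, `bestIoU` and `detectionAccuracy` are hypothetical names, and boxes are assumed to be axis-aligned rectangles given by their top-left corner, width and height.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box: top-left corner plus size.
struct Box {
    double x;
    double y;
    double width;
    double height;
};

// Intersection area divided by union area; 0 if the boxes do not overlap.
double iou(const Box& a, const Box& b) {
    double overlapW = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
    double overlapH = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
    double intersection = std::max(0.0, overlapW) * std::max(0.0, overlapH);
    double unionArea = a.width * a.height + b.width * b.height - intersection;
    return unionArea > 0.0 ? intersection / unionArea : 0.0;
}

// Best IoU among the first k detections (k = 10 for top 10 accuracy).
// Assumes `detections` is already sorted from best to worst match.
double bestIoU(const std::vector<Box>& detections, const Box& groundTruth,
               std::size_t k = 10) {
    double best = 0.0;
    std::size_t count = std::min(k, detections.size());
    for (std::size_t i = 0; i < count; ++i) {
        best = std::max(best, iou(detections[i], groundTruth));
    }
    return best;
}

// Fraction of images whose best IoU is strictly above the threshold.
double detectionAccuracy(const std::vector<double>& bestIoUPerImage,
                         double threshold = 0.1) {
    if (bestIoUPerImage.empty()) {
        return 0.0;
    }
    std::size_t correct = 0;
    for (double value : bestIoUPerImage) {
        if (value > threshold) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / bestIoUPerImage.size();
}
```

For example, two 10 x 10 boxes offset by 5 pixels in both directions overlap in a 5 x 5 region, giving an IoU of 25 / 175, or about 0.14, which passes the 0.1 threshold.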
- Improvement on IoU (20%)
The aim here is to improve the localization accuracy measured by IoU. The higher the IoU that you
get, the higher the mark that you will receive. The final IoU score is computed by averaging the IoU
obtained from each of the images in dataset2. A 10% accuracy improvement gets 5% of the marks, a 20%
improvement gets 10% of the marks, and an improvement of 30% or above gets 20% of the marks.
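For Task 1, precision and recall are defined as:

$$\text{Precision} = \frac{\text{number of (retrieved images AND containing football)}}{\text{number of retrieved images}}$$

$$\text{Recall} = \frac{\text{number of (retrieved images AND containing football)}}{\text{number of images in the dataset containing footballs}}$$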
2.2 Advanced Requirements (25%)
Students are expected to extend the program into an application. The extension can be done along two
directions: technical improvement and/or UI design. The technical improvement may include
speeding up the retrieval time and advancing the retrieval performance with new techniques (such as
using machine learning methods, high dimensional data indexing techniques, efficient searching of
sub-regions of each image instead of using sliding window, or a crawler to obtain images from the
internet). A UI may include real-time display of the regions of each image being compared and their
scores, or allowing users to select different objects to be retrieved from the database.
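To see why replacing the sliding window can speed up detection, it helps to count how many sub-regions an exhaustive search visits. The sketch below enumerates the windows for one window size and stride. It is an assumption about how a sliding-window search is typically organized, not a description of the demo program, and the names `Window` and `slidingWindows` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical candidate region inside an image, in pixels.
struct Window {
    int x;
    int y;
    int width;
    int height;
};

// Enumerates every window of the given size that fits inside the image,
// moving `stride` pixels at a time in both directions.
std::vector<Window> slidingWindows(int imageWidth, int imageHeight,
                                   int windowWidth, int windowHeight,
                                   int stride) {
    assert(stride > 0 && windowWidth > 0 && windowHeight > 0);
    std::vector<Window> windows;
    for (int y = 0; y + windowHeight <= imageHeight; y += stride) {
        for (int x = 0; x + windowWidth <= imageWidth; x += stride) {
            Window w = {x, y, windowWidth, windowHeight};
            windows.push_back(w);
        }
    }
    return windows;
}
```

The number of windows grows roughly with the image area divided by the square of the stride, and it multiplies again for every additional window size. Any method that proposes only a small set of promising sub-regions therefore reduces the number of feature comparisons per image.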
3. Grading
The coursework component contributes 40% of the final course mark/grade. Attendance
contributes 5%. For the remaining 35%, I will select whichever of the following distributions
maximizes your coursework mark:
• 15% for course project, 20% for quiz
• 17.5% for course project, 17.5% for quiz
• 20% for course project, 15% for quiz
Note that we will use a PC with the following configurations to grade the course projects:
• Windows with Visual Studio 2017
• OpenCV 2.4.13
Unfortunately, we do not have a Mac to grade the course projects. I understand that SCM
students may not have a PC for the course project. I have asked cslab to install the above tools
on all the PCs in room MMW2410 in the cslab, so you may use those PCs for your course
project if you like.
4. Submission Details
Due date: November 10, 2019
Each group needs to submit the following items on a CD or a USB drive, together with a hardcopy
report summarizing the work (see /Report below):
/Program:
(1) A source subdirectory containing all the source files and the necessary files.
(2) A binary subdirectory containing the executable file of the program and relevant files,
including image files or libraries. The executable file should output the retrieved results
(e.g., the list of retrieved images), precision, recall values and IoU (in the Object Detection
task). Note that it is important to make sure that we only need to click on the executable file
to run the program. You will need to try the executable file on a different machine before
you submit the work. We will not be able to give you marks if we fail to run your
executable file.
(3) A readme file with instructions on how to compile and execute the program.
/Demo:
A demo video that guides the marker through the main contributions of the work. This video
should be captured while you are running the program, so that we can see the inputs and the
outputs.
/Report:
The purpose of this report is just to indicate the main contributions of the work. We will not
mark the report itself. Instead, the report should show us what has been done so that we
may grade the work appropriately. Hence, there is no need to submit a large report. It can just be a
few pages providing the following information:
(1) A cover that indicates your name(s) and student ID(s)
(2) A brief description of the final program, including the main modules and the relationship of
these modules. (The description may be in the form of short paragraphs or a flow diagram.)
(3) A list of features added to the original program, including the names of the modified
modules (in reference to point (2) above), brief explanations, and screen captures of the
results.
(4) A listing of your entire program output. The demo has been rewritten to output some required
information. You should report this information, including the precision and recall for each
football image and their average values in the Image Retrieval task, and the IoU of the top 10
detected bounding boxes for each image in the Object Detection task. You may organize
these results into several tables if you prefer.
(5) Responsibilities of each group member (if applicable), including
▪ The programmer of each added function
▪ The author of each major section of the report
▪ The person who has done the survey, group coordination, etc.
Note that your submission must contain the above items. Marks may be deducted if any is missing.
There is no need to submit the image database.