Assignment 6: Integrating the Extraction and Retrieval Service
Due Oct 20 by 3:30am
Points 100
Submitting a website url
PREREQUISITES: Review the Visual Search lectures. Pay special attention to the modules pertaining to
the Rectification and Interface Service.
REQUIRED PYTHON PACKAGES (in 'requirements.txt'):
torch
torchvision
facenet-pytorch
Pillow (provides the PIL module)
faiss-cpu
REQUIRED RESOURCES:
multi_image_identities.tar (https://jhu.instructure.com/courses/82952/files/12141248?wrap=1)
(https://jhu.instructure.com/courses/82952/files/12141248/download?download_frd=1)
The dataset for this case study is 'multi_image_identities.tar'. The 'probe' directory contains the names and images of probes you can use to test and analyze your system. The 'gallery' directory contains the names and images of IronClad's known identities (personnel). Place the 'gallery' directory under a directory called 'storage/'. Place the 'probe' directory under 'simclr_resources/'. Do not check these image files into your repository!
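Before running anything, a quick check can confirm that the extracted directories sit where the later tasks expect them. This is a minimal sketch: the helper name check_layout and the assumption that it is run from the project root are illustrative only, not part of the provided modules.

```python
from pathlib import Path

# Directories the assignment asks for, relative to the project root.
EXPECTED_DIRS = ("storage/gallery", "simclr_resources/probe")


def check_layout(root: str = ".") -> list[str]:
    """Return the expected directories that are missing under root (hypothetical helper)."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

Calling check_layout() from the project root returns an empty list when both directories are in place; otherwise it lists the missing paths.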
OBJECTIVES: You will be tasked with integrating the extraction (https://github.com/creating-ai-enabled-systems-fall-2024/student-resources/tree/main/ironclad/modules/extraction) and retrieval (https://github.com/creating-ai-enabled-systems-fall-2024/student-resources/tree/main/ironclad/modules/retrieval) services introduced in the lectures. Review the code found in the modules/ directory. Each module includes a runner (an if __name__ == "__main__": block) that shows how to run it. For this assignment, update your repository by adding a directory called 'ironclad' in your project root directory.
The extraction/embedding.py module borrows the implementation of facenet-pytorch (https://github.com/timesler/facenet-pytorch/tree/master?tab=readme-ov-file#complete-detection-and-recognition-pipeline). You can configure the embedding class it defines with two constructor arguments:
2024/10/24 21:18Assignment 6: Integrating the Extraction and Retrieval Service
https://jhu.instructure.com/courses/82966/assignments/889125?return_to=https%3A%2F%2Fjhu.instructure.com%2Fcalendar%23view_name%3Dmo...1/3
- The pretrained argument selects the source of the pre-trained weights: either 'casia-webface' or 'vggface2'.
- The device argument is set at instantiation (e.g., 'cpu' or 'cuda'), making the class flexible across different hardware setups.
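Both arguments correspond to keyword arguments of facenet-pytorch's InceptionResnetV1, which accepts pretrained and device. The sketch below is an assumption about how a wrapper might validate them before building the model; EmbeddingConfig and build_model are hypothetical names, and the provided extraction/embedding.py may be organized differently. The facenet-pytorch import is deferred so the validation logic runs even where the package is not installed.

```python
from dataclasses import dataclass

# Pre-trained weight sources accepted by facenet-pytorch's InceptionResnetV1.
SUPPORTED_WEIGHTS = ("casia-webface", "vggface2")


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical settings object for the wrapper in extraction/embedding.py."""

    pretrained: str = "casia-webface"
    device: str = "cpu"  # any torch device string, e.g. "cpu" or "cuda:0"

    def __post_init__(self) -> None:
        # Reject unsupported weight sources early with a clear message.
        if self.pretrained not in SUPPORTED_WEIGHTS:
            raise ValueError(
                f"pretrained must be one of {SUPPORTED_WEIGHTS}, got {self.pretrained!r}"
            )


def build_model(config: EmbeddingConfig):
    """Return an InceptionResnetV1 in eval mode (requires facenet-pytorch)."""
    from facenet_pytorch import InceptionResnetV1  # deferred so validation runs without it

    return InceptionResnetV1(pretrained=config.pretrained, device=config.device).eval()
```

Switching weight sources then becomes a one-argument change, e.g. build_model(EmbeddingConfig(pretrained="vggface2", device="cuda")), which is convenient when comparing the two models in Task 3.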
To complete this assignment, install the Python packages listed in requirements.txt (e.g., pip install -r requirements.txt).
Task 1: Compute Embeddings
In the pipeline.py module, define a Pipeline class with the following method:
1. A method called __encode(image) that extracts the embedding vector from a single preprocessed image. We will use this method to embed probes at test time.
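One way to structure __encode is to delegate to the embedding model and return a flat list of floats. Note that a double-underscore name such as __encode is name-mangled by Python to _Pipeline__encode: it is callable as self.__encode inside the class, but not as pipeline.__encode from outside. In this sketch the model is injected as a plain callable (embed_fn, a hypothetical parameter) so the structure can be exercised without torch, and embed_probe is a hypothetical public wrapper.

```python
from typing import Callable, Sequence


class Pipeline:
    """Task 1 skeleton (sketch): embed_fn is a hypothetical injection point for the model."""

    def __init__(self, embed_fn: Callable[[object], Sequence[float]]) -> None:
        # embed_fn maps ONE preprocessed image to its embedding vector. With
        # facenet-pytorch it might look like (assumes `model` and `device` exist):
        #     def embed_fn(image):
        #         with torch.no_grad():
        #             return model(image.unsqueeze(0).to(device)).squeeze(0).tolist()
        self._embed_fn = embed_fn

    def __encode(self, image: object) -> list[float]:
        # Python name-mangles this method to _Pipeline__encode.
        return [float(x) for x in self._embed_fn(image)]

    def embed_probe(self, image: object) -> list[float]:
        # Hypothetical public wrapper for embedding a probe at test time.
        return self.__encode(image)
```

The unsqueeze(0)/squeeze(0) pair in the comment adds and then removes the batch dimension, since the model expects a batch but __encode works on a single image.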
Task 2: Index the Embeddings
In the pipeline.py module, add the following methods to the Pipeline class:
1. A method called __precompute(gallery_directory) that extracts embeddings from ALL images stored in 'storage/gallery' (by calling __encode(image)) and stores them in a FAISS index.
2. A method called __save_embeddings() that stores the embeddings in FAISS's serialized binary format (see the FaissIndex.save() method). Write this binary file to 'storage/catalog'.
3. A method called search_gallery(probe, k) that returns the k nearest neighbors of a probe. For each neighbor, it should return the individual's name, the source image filename, and the embedding vector.
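The three methods can be sketched together as below. This is a standard-library stand-in under stated assumptions, not the provided FaissIndex API: vectors are kept in a Python list in insertion order (mirroring how faiss.IndexFlatL2 assigns ids 0..n-1, so metadata[i] describes vector i), and the gallery is assumed to hold one sub-directory per identity. The file names index.bin and metadata.json, the injected embed_fn and load_image callables, the catalog_dir parameter, and the public build_catalog method are all hypothetical. search_gallery embeds the probe image with __encode and ranks by squared L2 distance, which is also what faiss.IndexFlatL2.search reports. With the real library, the corresponding calls would be index.add(...), faiss.write_index(index, path), and index.search(queries, k).

```python
import heapq
import json
import os
from array import array
from typing import Callable, Sequence

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed gallery image types


class Pipeline:
    """Stdlib stand-in for the Task 2 methods (sketch; not the provided FaissIndex API)."""

    def __init__(self, embed_fn: Callable[[object], Sequence[float]],
                 load_image: Callable[[str], object]) -> None:
        self._embed_fn = embed_fn      # hypothetical: preprocessed image -> embedding
        self._load_image = load_image  # hypothetical: file path -> preprocessed image
        self.vectors: list[list[float]] = []  # row i has id i, like faiss.IndexFlatL2
        self.metadata: list[dict] = []        # metadata[i] describes vectors[i]

    def __encode(self, image: object) -> list[float]:
        return [float(x) for x in self._embed_fn(image)]

    def __precompute(self, gallery_directory: str) -> None:
        # Assumes one sub-directory per identity: <gallery>/<name>/<image file>.
        # Sorting keeps ids deterministic across runs.
        for name in sorted(os.listdir(gallery_directory)):
            person_dir = os.path.join(gallery_directory, name)
            if not os.path.isdir(person_dir):
                continue
            for filename in sorted(os.listdir(person_dir)):
                if os.path.splitext(filename)[1].lower() not in IMAGE_EXTENSIONS:
                    continue
                image = self._load_image(os.path.join(person_dir, filename))
                self.vectors.append(self.__encode(image))  # FAISS: index.add(...)
                self.metadata.append({"name": name, "filename": filename})

    def __save_embeddings(self, catalog_dir: str) -> None:
        # Raw float32 rows plus a JSON sidecar; with FAISS the vector file
        # would come from faiss.write_index(index, path) instead.
        os.makedirs(catalog_dir, exist_ok=True)
        with open(os.path.join(catalog_dir, "index.bin"), "wb") as fh:
            array("f", (x for row in self.vectors for x in row)).tofile(fh)
        with open(os.path.join(catalog_dir, "metadata.json"), "w") as fh:
            json.dump(self.metadata, fh)

    def build_catalog(self, gallery_directory: str,
                      catalog_dir: str = "storage/catalog") -> None:
        """Hypothetical public entry point that runs both private steps."""
        self.__precompute(gallery_directory)
        self.__save_embeddings(catalog_dir)

    def search_gallery(self, probe: object, k: int) -> list[dict]:
        # Embed the probe image, then rank gallery rows by squared L2 distance
        # (exact search, as faiss.IndexFlatL2.search would do).
        query = self.__encode(probe)

        def sq_l2(vec: Sequence[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(query, vec))

        nearest = heapq.nsmallest(k, ((sq_l2(v), i) for i, v in enumerate(self.vectors)))
        return [
            {
                "name": self.metadata[i]["name"],
                "filename": self.metadata[i]["filename"],
                "embedding": self.vectors[i],
                "distance": dist,
            }
            for dist, i in nearest
        ]
```

Because a FAISS index persists only the vectors, saving the names and filenames alongside it is what lets search_gallery map result ids back to identities after the catalog is reloaded.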
Task 3: Analyze the pre-trained models (i.e., "vggface2" and "casia-webface")
1. In a notebook called notebooks/embedding_analysis.ipynb, analyze the performance of the two models, 'casia-webface' and 'vggface2'. Provide a discussion of the following:
   - Describe the general performance of each prediction model.
   - Describe how the noise transformations pertinent to the case, applied at varying degrees of severity, affect performance.
   - For each result, discuss how your findings will impact your system design.
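A severity sweep is one way to organize the noise analysis: perturb each probe at increasing severity, search the gallery, and record top-k accuracy per model. The sketch below assumes a search callable that returns gallery names ranked nearest-first (for example, names taken from search_gallery results) and a hypothetical perturb(image, severity) callable. In the notebook, perturb could wrap torchvision transforms such as GaussianBlur or ColorJitter, with severity mapped to the blur sigma or brightness factor. Top-k accuracy is an assumed metric choice, not one the assignment prescribes.

```python
from typing import Callable, Iterable, Sequence


def top_k_accuracy(ranked_names: Sequence[Sequence[str]],
                   true_names: Sequence[str], k: int) -> float:
    """Fraction of probes whose true identity appears among the first k results."""
    if not true_names:
        return 0.0
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_names, true_names))
    return hits / len(true_names)


def severity_sweep(probes: Sequence[tuple[object, str]],
                   search: Callable[[object], Sequence[str]],
                   perturb: Callable[[object, float], object],
                   severities: Iterable[float],
                   k: int = 1) -> dict[float, float]:
    """Top-k accuracy at each noise severity (sketch; search and perturb come from the caller).

    probes: (image, true_name) pairs. search: returns gallery names ranked
    nearest-first. perturb: applies the noise at a given severity (0 = clean).
    """
    true_names = [name for _, name in probes]
    results = {}
    for severity in severities:
        ranked = [search(perturb(image, severity)) for image, _ in probes]
        results[severity] = top_k_accuracy(ranked, true_names, k)
    return results
```

Running the sweep once per pre-trained model with the same probes and severities yields directly comparable accuracy-versus-severity curves, and the severity-0 entry doubles as the clean baseline for the general-performance discussion.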
SUBMISSION: Check in the following files, along with any supporting Python modules:
ironclad/extraction/embedding.py
ironclad/extraction/processing.py
ironclad/retrieval/index.py
ironclad/retrieval/search.py
ironclad/retrieval/pipeline.py
ironclad/notebooks/embedding_analysis.ipynb
Provide the GitHub URL of your ironclad/notebooks/embedding_analysis.ipynb file via Canvas to get credit for this submission.

Assignment 6: Integrating the Extraction and Retrieval Service (rubric)
Task 1: 25 pts
Task 2: 25 pts
Task 3: 50 pts, evaluated based on the thoroughness and complexity of the analysis:
   50 pts for "highest quality"
   40 pts for "good quality"
   30 pts for "average quality"
   20 pts for "minimum effort"
   10 pts for "low quality"
Total Points: 100