Assignment 9: Tackling the Cold Start Problem
Due Thursday by 3am
Points 100
Submitting a website url
PREREQUISITES: Review the lectures on Recommender Systems. Pay particular attention to
Rule-based, Collaborative Filtering, and Content-Based Filtering methods, as well as strategies to
handle the cold start problem.
REQUIRED PYTHON PACKAGES (in 'requirements.txt'):
pandas
numpy
scikit-learn
surprise
matplotlib
nltk (optional)
REQUIRED RESOURCES:
storage/u.* (MovieLens [100K] dataset)
The student-resources repository provides scripts to load and explore the dataset. Review the
accompanying storage/README.md.
OBJECTIVES: You will develop three types of recommendation systems: Rule-based, Collaborative
Filtering, and Content-Based Filtering. You will also analyze the cold start problem and propose
solutions. Review the code found in the modules/ directory. The modules include runners
(if __name__ == "__main__") as examples of how to run each module. Review the provided datasets.
For this assignment, update your repository by adding a directory called 'moviemate' in your project
root directory.
Task 1: Study and Partition the Dataset
In the module pipeline.py, define a Pipeline class with the following methods:
1. load_dataset(file_path):
Load the dataset using pandas and inspect its structure.
2. partition_data(ratings_df, partition_type=None):
Split the data into training and testing sets, supporting both user-stratified sampling and temporal
sampling (selected via partition_type).
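The two methods above could be sketched roughly as follows. This is a minimal illustration, not the required implementation: the column names, the partition_type values ("user_stratified", "temporal"), the default behavior when partition_type is None, and the extra test_fraction and seed parameters are all assumptions that do not appear in the assignment text.

```python
import pandas as pd


class Pipeline:
    """Illustrative sketch only; names and defaults below are assumptions."""

    # u.data in MovieLens 100K is tab-separated with no header row.
    COLUMNS = ["user_id", "item_id", "rating", "timestamp"]

    def load_dataset(self, file_path):
        ratings_df = pd.read_csv(file_path, sep="\t", names=self.COLUMNS)
        # Quick structural inspection.
        print(ratings_df.head())
        print(ratings_df.dtypes)
        return ratings_df

    def partition_data(self, ratings_df, partition_type=None,
                       test_fraction=0.2, seed=42):
        # test_fraction and seed are hypothetical extras for this sketch.
        if partition_type in (None, "user_stratified"):
            # Hold out the same fraction of every user's ratings.
            test = ratings_df.groupby("user_id").sample(
                frac=test_fraction, random_state=seed
            )
            train = ratings_df.drop(test.index)
            return train, test
        if partition_type == "temporal":
            # Oldest ratings go to train, the most recent ones to test.
            ordered = ratings_df.sort_values("timestamp", kind="stable")
            n_test = int(round(len(ordered) * test_fraction))
            cut = len(ordered) - n_test
            return ordered.iloc[:cut], ordered.iloc[cut:]
        raise ValueError(f"Unknown partition_type: {partition_type!r}")
```

With this sketch, a call such as Pipeline().partition_data(ratings_df, partition_type="temporal") returns a (train, test) pair in which every test rating is at least as recent as every training rating.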
Create a notebook notebooks/data_exploration.ipynb to demonstrate:
11/22/24, 9:07 PMAssignment 9: Tackling the Cold Start Problem
https://jhu.instructure.com/courses/82966/assignments/906046?module_item_id=42470521/3
Loading the dataset.
Partitioning the data and visualizing rating distributions and timestamp distributions.
Task 2: Model Comparison
Implement the following in moviemate/modules/:
Rule-based Recommendation (modules/adaptive/filters/rule_based.py)
Recommend top-rated movies overall or by genre.
Collaborative Filtering (modules/adaptive/filters/collaborative_filtering.py)
Implement User-User and Item-Item Collaborative Filtering models using the Surprise library.
Content-Based Filtering (modules/adaptive/filters/content_based.py)
Use genres as item features.
Calculate similarities between movies using TF-IDF and cosine similarity.
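As a rough illustration of the content-based step, the sketch below builds TF-IDF vectors from genre strings and compares movies with cosine similarity using scikit-learn. The small catalogue, the column names, and the most_similar helper are hypothetical stand-ins; in the assignment the genres would come from the MovieLens item file instead.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue for illustration only (not loaded from the dataset).
movies = pd.DataFrame({
    "title": ["Toy Story", "GoldenEye", "Aladdin", "Heat"],
    "genres": [
        "Animation Children Comedy",
        "Action Adventure Thriller",
        "Animation Children Comedy Musical",
        "Action Crime Thriller",
    ],
})

# Each genre word becomes a TF-IDF feature; rarer genres get more weight.
vectorizer = TfidfVectorizer()
genre_matrix = vectorizer.fit_transform(movies["genres"])

# Pairwise cosine similarity between all movies (n_movies x n_movies).
similarity = cosine_similarity(genre_matrix)


def most_similar(title, top_n=2):
    """Return the top_n movies most similar to `title` (hypothetical helper)."""
    idx = movies.index[movies["title"] == title][0]
    scores = pd.Series(similarity[idx], index=movies["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n)
```

Note that the default TfidfVectorizer tokenizer splits on punctuation, so hyphenated genre names such as Sci-Fi would be broken into separate tokens unless they are normalized first.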
Create a notebook notebooks/model_selection.ipynb to demonstrate:
Training each model on the training dataset.
Evaluating each model on the test set (see Task 1) using metrics such as RMSE, MAE, and nDCG.
Visualizing performance comparisons and explaining the results.
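For reference, the metrics named above can be computed with a few lines of plain Python. This is a sketch under stated assumptions: the function names are hypothetical, and the nDCG shown uses the linear-gain form (relevance divided by log2 of rank + 1), which is one common variant; the lectures may use a different gain.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between true and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    # enumerate starts at 0, so the item at rank r is discounted by log2(r + 1).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal
```

Here ranked_relevances is the list of true ratings for one user's recommended items, in the order the model ranked them; averaging ndcg_at_k over users gives a single score per model.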
Task 3: Tackle the Cold Start Problem
Create a notebook notebooks/cold_start_analysis.ipynb to simulate and analyze the cold start scenario.
The notebook must:
Simulate scenarios where a new user has rated few and/or no movies.
Explain and offer alternative strategies to mitigate cold starts.
Provide performance analysis and visualizations for each alternative strategy under the cold start
scenarios.
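One possible way to set up the simulation is to hide most or all of a user's ratings from the training data and fall back to a popularity rule when too little history remains. The sketch below assumes a ratings DataFrame with user_id, item_id, rating, and timestamp columns; the function names and the min_ratings threshold are hypothetical, and popularity is only one of several possible mitigation strategies.

```python
import pandas as pd


def simulate_cold_start(ratings_df, user_id, n_known=0):
    """Keep only the user's n_known earliest ratings in train; hold out the rest."""
    user_rows = ratings_df[ratings_df["user_id"] == user_id].sort_values("timestamp")
    held_out = user_rows.iloc[n_known:]
    train = ratings_df.drop(held_out.index)
    return train, held_out


def popularity_fallback(train_df, top_n=10, min_ratings=2):
    """Rank items by mean rating, ignoring items with too few ratings."""
    stats = train_df.groupby("item_id")["rating"].agg(["mean", "count"])
    eligible = stats[stats["count"] >= min_ratings]
    ranked = eligible.sort_values(["mean", "count"], ascending=False)
    return ranked.head(top_n).index.tolist()
```

Calling simulate_cold_start with n_known=0 models a brand-new user, while small positive values model a user with only a few ratings; the held-out rows can then be scored with the same metrics used for model comparison.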
SUBMISSION: You will need to check in the following files and any supporting python modules:
moviemate/modules/rule_based.py
moviemate/modules/collaborative_filtering.py
moviemate/modules/content_based.py
moviemate/modules/cold_start_analysis.py
moviemate/pipeline.py
moviemate/notebooks/data_exploration.ipynb
moviemate/notebooks/model_selection.ipynb
moviemate/notebooks/cold_start_analysis.ipynb
Provide the GitHub URL of your moviemate/notebooks/cold_start_analysis.ipynb file via Canvas
to get credit for this submission.