Fall 2023 Machine Learning Project
Machine Learning (CS-GY 6923) Exact Date TBD
No late assignments accepted
Objective of the Project
In this project, you will improve the performance of machine learning algorithms that you have previously implemented in class. You will identify and implement three distinct improvements to these algorithms. These improvements can be applied to the same algorithm or to different ones.
Working as a Team:
This project can be completed individually or with a partner. If working in a team, hand in JUST ONE project for the two of you, with both of your names on it.
The goal of this project is for you to explore how you could improve your implementations of machine learning algorithms.
Project Guidelines
You will make three extensions to one or more of the algorithms you implemented in a homework assignment.
Here is how to make one extension (the other two follow the same steps):¹
1. Select a machine learning algorithm you implemented in a homework assignment that can be extended to improve the accuracy on a dataset that was mentioned in class, used on a homework assignment, or included on the list given by the CA on EdStem.²
¹ You might need to try many different combinations of algorithm × extension × dataset until you find one where the extension improves the accuracy. Do not start step 2 until you have found an extension that improves the accuracy on a dataset.
² How do you know it will improve the accuracy? I don’t want you to code it up and then find out that your implementation didn’t improve the accuracy on one of the datasets you were allowed to use. So instead, you will have one more step in the process: learn how to use Scikit-learn’s implementation of this algorithm (make sure you set the parameters so that Scikit-learn’s implementation is close to your homework assignment’s implementation).
Then find an extension of the basic algorithm that Scikit-learn implements and that improves accuracy on one of the datasets. In your write-up, include the accuracy provided by the Scikit-learn implementation of the extension you implemented.
2. Implement this extension by adding code to your homework assignment implementation.³ ⁴
3. Select a new dataset of your choice and show the accuracy on this dataset using your homework assignment implementation, both without and with the new extension you just added.
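Steps 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: it assumes the homework algorithm is k-nearest neighbours and that the extension is distance-weighted voting, neither of which this handout prescribes. The function names `knn_predict`, `accuracy`, and `make_toy_data` are hypothetical, and the synthetic data stands in for a dataset of your choice. Only NumPy is used.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5, distance_weighted=False):
    """Predict labels with k-nearest neighbours using NumPy only.

    distance_weighted=False is the assumed baseline (majority vote);
    distance_weighted=True is the assumed extension (closer neighbours
    get larger votes).
    """
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = X_test[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    classes = np.unique(y_train)
    votes = np.zeros((X_test.shape[0], classes.size))
    for j, c in enumerate(classes):
        is_c = y_train[nearest] == c
        if distance_weighted:
            # Inverse-distance weights; the small constant avoids division by zero.
            w = 1.0 / (np.take_along_axis(dists, nearest, axis=1) + 1e-12)
        else:
            w = np.ones_like(is_c, dtype=float)
        votes[:, j] = (is_c * w).sum(axis=1)
    return classes[np.argmax(votes, axis=1)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def make_toy_data(n=300, seed=0):
    """Two overlapping Gaussian blobs; a stand-in for a real dataset."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=0.0, scale=1.0, size=(n // 2, 2))
    X1 = rng.normal(loc=1.5, scale=1.0, size=(n // 2, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n // 2, dtype=int), np.ones(n // 2, dtype=int)])
    perm = rng.permutation(n)
    return X[perm], y[perm]


X, y = make_toy_data()
split = int(0.7 * len(X))  # simple 70/30 train/test split
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

acc_baseline = accuracy(y_test, knn_predict(X_train, y_train, X_test, k=5))
acc_extended = accuracy(
    y_test, knn_predict(X_train, y_train, X_test, k=5, distance_weighted=True)
)
print(f"baseline accuracy: {acc_baseline:.3f}")
print(f"extended accuracy: {acc_extended:.3f}")
```

Keeping the extension behind a flag means the same code path produces both numbers, so any difference in accuracy comes from the extension alone. The extension is not guaranteed to help on every dataset, which is why the comparison is reported either way.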
Details
1. If Scikit-learn does not support your algorithm, you may consider other libraries (for example, PyTorch or TensorFlow).
2. When implementing the upgraded algorithm, be sure to:
• not use any external libraries aside from NumPy.
• not copy the existing open-source implementations offered by existing machine learning frameworks.
3. You will turn in your (working) Python notebook and a written report (at most 2 pages for each improvement). In addition to reporting the accuracy, your report should give the rationale for why your addition improves our existing algorithm, first at a high level and then in more detail. In your report, make sure you have a chart that shows the accuracy obtained for the different datasets and the different algorithms/implementations.
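One possible way to lay out the accuracy chart is a plain-text table with one row per dataset and one column per implementation. The sketch below is an assumption about the layout, not a required format. The helper name `format_accuracy_table` is hypothetical, and the numbers in the usage example are placeholders rather than measured results.

```python
def format_accuracy_table(results):
    """Build a text table from a dict mapping (dataset, implementation) -> accuracy.

    Missing combinations are shown as "n/a".
    """
    datasets = sorted({d for d, _ in results})
    impls = sorted({m for _, m in results})
    # Column width fits the longest label plus some padding.
    width = max(len(s) for s in datasets + impls + ["dataset"]) + 2
    header = "dataset".ljust(width) + "".join(m.rjust(width) for m in impls)
    lines = [header, "-" * len(header)]
    for d in datasets:
        cells = []
        for m in impls:
            acc = results.get((d, m))
            cells.append(("n/a" if acc is None else f"{acc:.3f}").rjust(width))
        lines.append(d.ljust(width) + "".join(cells))
    return "\n".join(lines)


# Placeholder numbers for illustration only, not measured results.
example_results = {
    ("class dataset", "homework"): 0.5,
    ("class dataset", "homework + extension"): 0.5,
    ("class dataset", "scikit-learn + extension"): 0.5,
    ("new dataset", "homework"): 0.5,
    ("new dataset", "homework + extension"): 0.5,
}
print(format_accuracy_table(example_results))
```

Combinations that were never run show up as "n/a", which makes gaps in the comparison easy to spot before the report is submitted.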
Your grade will depend on:
• the complexity of your extension
• how much your extension improved the accuracy on a dataset relative to our existing algorithm
• how well you explained the reason behind your improvement, and your understanding of why your code works
³ You will write the additional code that you add to your existing assignment from scratch.
⁴ Note: if Scikit-learn’s implementation with the extension showed an improvement in accuracy, but your implementation showed a smaller improvement or none at all with the same extension, that is OK. Please comment on why you think this was the case.