Final project for COGS118A, Fall 2020
Due on December 15, 2020, 11:59 PM CA time. No late submission is possible without extremely unusual circumstances.

Read this paper first: https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf

Your project is to replicate part of the analysis done in the paper by Caruana & Niculescu-Mizil (hereafter referred to as CNM06). You will write a report of more than 1,000 words (excluding references & appendices). The main sections are: a) abstract, b) introduction, c) method, d) experiment, e) conclusion, and f) references. You can follow the paper format of leading machine learning journals such as the Journal of Machine Learning Research (https://www.jmlr.org/format/authors-guide.html) or IEEE Trans. on Pattern Analysis and Machine Intelligence (http://www.computer.org/web/tpami), or of leading conferences like NeurIPS (the conference formerly known as NIPS; https://papers.nips.cc/) and ICML (https://icml.cc/Conferences/2020/StyleAuthorInstructions).

Experiments & results:
From CNM06, pick 3 of the datasets (available from the UCI machine learning repository) and pick 3 of the algorithms (different kernels of SVM are not 2 different classifiers; pick truly different ones). For each classifier/dataset combo, you will run 3 trials (CNM06 does 5; we are making it easier for you). That's 3 x 3 x 3 = 27 total trials. In each trial you will follow the procedure laid out in CNM06: randomly choose 5,000 data samples and use 5-fold cross-validation to select the hyperparameters via a grid search. CNM06 section 2.1 lays out the hyperparameter values they searched for each algorithm; use those for your search too. Each algorithm has a different number of hyperparameter settings to try, so the exact number of train/validate cycles you do will depend on the algorithm you choose.
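To make one trial concrete, here is a minimal sketch using scikit-learn. The synthetic dataset, the small parameter grid, and the choice of Logistic Regression are placeholders; substitute your UCI data and the CNM06 grids. (Note that scikit-learn's C is the inverse of the ridge penalty CNM06 describe.)

```python
# Sketch of a single trial: sample 5,000 training points, grid-search
# hyperparameters with 5-fold CV, retrain on all 5,000, score the rest.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in for a UCI dataset -- replace with your own loading code.
X, y = make_classification(n_samples=8000, n_features=20, random_state=0)

# Randomly choose 5,000 samples for training; everything else is the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=5000, random_state=0)

# 5-fold cross-validated grid search. This toy grid is smaller than
# CNM06's; C is scikit-learn's inverse regularization strength.
param_grid = {"C": [10.0 ** k for k in range(-4, 5)]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# GridSearchCV refits the best setting on all 5,000 samples (refit=True),
# so we can score the held-out test set directly.
test_acc = search.score(X_test, y_test)
print(search.best_params_, round(test_acc, 3))
```

One such run corresponds to a single trial for one algorithm/dataset combo; repeat it 3 times (with different random samples) per combo.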
For example, CNM06 say about Logistic Regression: "train both unregularized and regularized models, varying the ridge (regularization) parameter by factors of 10 from 10^-8 to 10^4." That is 14 total settings (13 ridge values plus one unregularized model), yielding 14 hyperparameter settings * 5 folds = 70 train/validate cycles for a single trial of Logistic Regression. At the end of those 70 train/validate cycles, you will select the hyperparameter setting that did best on the mean over all 5 folds. Then you will train the model one more time on all 5,000 training samples and measure model performance on the test set (all the data in the dataset other than the 5,000 random samples).

Model performance will be measured by a single performance metric; here again we make it easier for you than CNM06's 8 different performance metrics. Default to accuracy, but if you think that a particular dataset would benefit from a different performance metric or metrics, that is your call. The obvious case is a dataset with very imbalanced classes, which would require something like F1 or AUC to better understand performance.

Your main results will be something similar to tables 2 & 3 in CNM06:
- A table of mean (across 3 trials) test set performance for each algorithm/dataset combo [1]. This table should be annotated with uncorrected 2-sample t-tests to compare across algorithms. The difference between your table and CNM06 is that you are using only a single performance metric; therefore you do not have to normalize & calibrate your performance metrics to compare across them as in CNM06. [2]
- A table of mean (across 3 trials x 3 datasets) test set performance for each algorithm. This table should be annotated with uncorrected 2-sample t-tests to compare across algorithms. The difference between your table and CNM06 is that you are using only a single performance metric; therefore you do not have to normalize & calibrate your performance metrics to compare across them as in CNM06. [3]

Secondary results you should report:
- A main matter table showing mean training set performance for the optimal hyperparameters on each dataset/algorithm combo, and a discussion of the difference between each algorithm's training and test set performance
- An appendix table with the raw test set scores, not just the mean scores of the main table
- An appendix table with the p-values of the comparisons across algorithms in the different main matter tables

Secondary results you may wish to report (extra credit land):
- An analysis of the time complexity of the algorithms you tried
- A learning curve per algorithm/dataset combo: test set performance for the best hyperparameter choice as you vary the number of training samples for a given dataset
- A heatmap-style plot of validation performance vs. hyperparameter setting for your algorithms

[1] Obviously standard deviation is almost laughable at 3 trials per setup; but it's the thought that counts, I guess?
[2] This could change if you wanted to try multiple error metrics. Please discuss this with me if you're headed this way.
[3] This could change if you wanted to try multiple error metrics. Please discuss this with me if you're headed this way.

Grading and extra credit:
You will turn in your report and code in a single PDF, with the code in an appendix. You should be prepared to provide your code in a working format (.ipynb, .py, .cpp, whatever) to the grader upon request via email.
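For the uncorrected 2-sample t-tests that annotate the results tables, a minimal sketch using SciPy; the trial scores below are made-up numbers standing in for your per-trial test set accuracies.

```python
# Uncorrected two-sample t-test comparing two algorithms' trial scores.
from scipy.stats import ttest_ind

# Hypothetical test-set accuracies from 3 trials of two algorithms
# on the same dataset (replace with your real results).
algo_a_scores = [0.91, 0.89, 0.90]
algo_b_scores = [0.85, 0.86, 0.84]

t_stat, p_value = ttest_ind(algo_a_scores, algo_b_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With only 3 trials per setup the test is underpowered (see footnote [1] above), but it is what the table annotations call for.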
The project is marked out of 100 points:
- 50 points for the technical aspects of what you implement being done correctly
- 30 points for the write-up (clarity & correctness of each section; we will only mark down English mistakes if they rise to the level of making the text hard to understand)
- 10 points for code quality and legibility/commenting
- 10 points for hardness of the undertaking, aesthetics, and other quality issues

If you feel that your work deserves bonus points due to reasons such as:
- novel ideas and algorithms or state-of-the-art results,
- large efforts in your own data collection/preparation,
- doing more than the 27 trials / 1 metric required for the main analysis,
- doing lots of secondary results,
then please create a "Bonus Points" section to describe why you deserve bonus points. I'm not setting an upper limit on bonus points, but FYI, if someone did a couple of extra algorithms/metrics/secondary analyses well, I would probably grant 5-10 points of extra credit.

Implementation and advice:
You may use what you'd like, but I highly recommend scikit-learn, as it implements all these algorithms plus many important helper functions for cross-validation, scoring metrics, etc. As you decide which algorithms to use, do some time testing on a single train/validate cycle. If it takes 5 minutes per train/validate cycle and there are > 1,000 cycles to do to get all the hyperparameters done... well, you see the problem that 3.5 days of computation for a single algorithm/dataset combo could land you in. Note that you might not get exactly the same algorithm ranking results as CNM06 did. The test set performances and optimal parameters vary due to the particular ML libraries you are using as well as the random sampling of the data, but the overall differences should be reasonable and interpretable, and certain overall trends should be the same. If you have a question or a problem, seek advice from the instruction team ASAP!!!
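A quick way to do that time testing, sketched with scikit-learn on synthetic data; the grid size and trial counts below are hypothetical placeholders for your own plan.

```python
# Time one train/validate cycle, then extrapolate to the full budget
# of (hyperparameter settings x folds x trials) before committing.
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, train_size=0.8,
                                            random_state=0)

start = time.perf_counter()
SVC(kernel="rbf").fit(X_tr, y_tr).score(X_val, y_val)
elapsed = time.perf_counter() - start

# Hypothetical budget: adjust to your actual grid and trial counts.
n_settings, n_folds, n_trials = 100, 5, 3
est_total = elapsed * n_settings * n_folds * n_trials
print(f"one cycle: {elapsed:.2f}s, estimated total: {est_total / 60:.1f} min")
```

If the estimate comes out in days rather than hours, pick a faster algorithm, a smaller grid, or parallelize (e.g., GridSearchCV's n_jobs) before running the full experiment.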
There's not much time left for going down the wrong path. This project is based on previous projects in this course, but this quarter is different from other ones as we deal with the pandemic. If we need to adjust expectations as we go, we will. But to do that I need communication from you!