COSC 051 Page 1 of 4 Overview of Deliverables 1. Design 2. Implement DataSets Class and subclasses 3. Implement ClassifierAlgorithms Class, simplekNNClassifier subclass and Experiment Class 4. Implement a. ROC method for Experiment Class, b. decisionTreeClassifier 5. Advanced Algorithms and final package Software Design Requirements Overview The toolbox will be implemented using OOP practices and will take advantage of inheritance and polymorphism. Specifically, the toolbox will consist of 3 main classes some of which have subclasses and member methods as noted below. You will also submit a demo script for each submission that tests the capabilities of your newly created toolbox. 1. Class Hierarchy a. DataSet i. TimeSeriesDataSet ii. TextDataSet iii. QuantDataSet iv. QualQuanDataSet COSC 051 Page 2 of 4 b. ClassifierAlgorithm i. simplekNNClassifier c. Experiment 2. Member Methods for each Super Class (subclasses will have more specified members as well) Classifier i. init ii. Load() iii. train(…) iv. test(…) b. Experiment i. runCrossVal(…) ii. score(…) iii. confusionMatrix(…) 3. Demo Details for Deliverable #3. Classifer class and simplekNNClassifier subclass. There are 4 main components for your deliverable this week. Focus on not only correctness, but also focus on efficiency of your algorithmic solution. Consider both time and space efficiency. 1. Using Python, you will implement the Classifier Class (as an ABC … note this was largely completed in previous deliverable) , simplekNNClassifier subclass, and Experiment Class. Implement methods train and test for the Classifier class and subclass. Implement runCrossVal, score, and confusionMatrix for the Experiment Class. You will also need to have member attributes to help store data and possibly state information. a. simplekNNClassifier i. The train method for simplekNNClassifier will have input parameters trainingData and true labels. It will simply store the data and labels member attributes (simple!!). ii. The test method will have parameters testData and k, and will find the k closest training samples and return the mode of the k labels associated from the k closest training samples. The predicted labels will be stored in a member attribute and will also be returned. b. Experiment i. The Experiment Constructor will take as input one data set , labels, and a list of classifiers. Each will be stored in a member attribute. 1. Assume that each classifier will be applied to the data set, and if they are not applicable, one of the intrinsic exceptions to the classifier will be thrown. COSC 051 Page 3 of 4 ii. crossValidation method will take as input kFolds. This method will perform k-fold crossvalidation, and for each fold will train all classifiers (on the training folds), and test all classifiers on the testing folds. 1. Predicted labels will be stored in a matrix numSamples x numClassifiers iii. The score method will compute the accuracy of each classifier and present the result as a table. iv. The confusionMatrix method will compute and display a confusion matrix for each classifier. 2. Perform formal computational Complexity Analysis on the following methods. Include a space count S(n) and step count T(n) function (where n is the size of the input) as well as a tight-fit upperbound using Big-O notation. Assume worst case and justify your analyses. a. simpleKNNClassifier test method b. Experiment score method c. Experiment confusion matrix method 3. Using python you will implement a demo script that tests the functionality of your code. You will test to the full functionality of the new code submitted for this deliverable. You will test all constructors and methods. a. Your demo test script must run without error. Be sure to include all data files and supporting files in the zip submission. Also be sure that all paths are relative and will work from the zip folder (once unzipped) as a base directory. 4. Using Doxygen (or another UML-like documentation tool), UPDATE your documentation which illustratively describes the class hierarchy, member attributes, and member methods. The description should include structural and functional details. Constraints All coding must be done by you – this means you must implement knn, score, and ROC “from scratch”. For example, you must compute the distances used in knn from scratch (you cannot simply call dist() or math.dist(), instead you must code this yourself). You may not import any libraries / modules EXCEPT for those listed below, and you cannot repurpose any other code for this submission. The libraries below may be used to help with loading the data and visualizing the data only. If you wish to use another library, please ask for approval. Approved libraries to aid with reading in data, use of data structures, io, and visualizing data: csv nltk numpy matplotlib.pyplot os wordcloud COSC 051 Page 4 of 4 Submission Details Upload (as instructed by your professor) a zip folder containing ALL files (.py, .pdf, and/or .html files). Use the following folder name:
P3. For example, I would create a folder named jeremyBoltonP3 which contained all files. I would then zip this folder creating file jeremyBoltonP3.zip . I would then submit this zip file. Late submissions will be penalized heavily. If you are late you may turn in the project to receive feedback but the grade may be zero. In general, requests for extensions will not be considered. 欢迎咨询51作业君