FNCE 491 Python Programming with Applications in Finance

Group Homework 8: Supervised Learning

Due by 12/8/2023 5:00 PM (50 pts)

Please read the lecture notes and practice the in-class examples before you do the homework. Each group makes one submission, which includes the programming code and its output.

Write the scripts in Python through Google Colab, execute each step, and keep the output boxes. Save the code as a PDF file and submit this file together with your Colab notebook share link on Canvas (in the "Print" settings, choose "Default" for "Margins" and select the "Headers and footers" option, and you will find the link at the bottom of your PDF file). In the Python file, write down your group number and members in a comment (#) on the first line (without this information, your group will lose one point).

Please use comments (#), NOT a text block, to indicate the question number and to explain your design if necessary. Also, you are welcome to submit more than one solution.

Please submit your answers on Canvas by the deadline. Late submission will not be accepted.

1. K-nearest neighbors classifier: for this question, please submit a PDF or Word file, or save your answer in a text box in Google Colab.

Lisa has lost the species information for one of the Iris flowers. Can you help her make a better decision using a kNN classifier based on sepal width and sepal length?

The flower with the missing species information is: Species ----, Sepal width 3.5 cm, Sepal length 5.1 cm. Let us use K = 3 nearest neighbors.

Fill in the blank cells in the table below to carry out the kNN calculation. (No Python coding is needed.)

Species      Width (cm)   Length (cm)   Distance                        Ranking number   Belongs to the neighborhood (Yes or No)
Setosa       3.0          4.9           (3.5-3)^2+(5.1-4.9)^2=0.29
Setosa       3.2          4.7           (3.5-3.2)^2+(5.1-4.7)^2=0.25
Versicolor   2.9          6.6
Versicolor   3.3          6.2
Virginica    2.5          5.1
Virginica    2.8          5.7

Count of the Setosa neighborhood members:

Count of Versicolor neighborhood members:

Count of Virginica neighborhood members:

Class based on the majority vote, species that gets most hits:

Here is the definition of the kNN algorithm given in Lecture 10:

1). Determine K = number of nearest neighbors

2). Calculate the distance between the test sample and all the training samples

3). Sort the distances* (the shortest should get ranking number one) and determine the nearest neighbors of the test sample

 

4). Gather the category classes of the nearest neighbors

5). Use the simple majority voting to predict the class of the test sample: the class that occurs the most frequently in the nearest neighbors wins.

*Here, we use the squared Euclidean distance to estimate the distances between the training samples and the test sample.
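Although Question 1 asks only for a hand calculation, the following minimal sketch (using just the six training flowers in the table above) can be used to double-check your entries; the variable names are illustrative.

# Optional check of the Question 1 hand calculation (not required).
# Squared Euclidean distance on (sepal width, sepal length), K = 3.
from collections import Counter

train = [("Setosa", 3.0, 4.9), ("Setosa", 3.2, 4.7),
         ("Versicolor", 2.9, 6.6), ("Versicolor", 3.3, 6.2),
         ("Virginica", 2.5, 5.1), ("Virginica", 2.8, 5.7)]
test = (3.5, 5.1)  # the flower with the missing species label

# Step 2): distance between the test sample and every training sample
dists = [(sp, (test[0] - w) ** 2 + (test[1] - l) ** 2) for sp, w, l in train]
# Step 3): sort the distances (the shortest gets ranking number one)
dists.sort(key=lambda pair: pair[1])
neighbors = dists[:3]  # the K = 3 nearest neighbors
print(neighbors)
# Steps 4)-5): simple majority vote among the neighbor classes
votes = Counter(sp for sp, _ in neighbors)
print("Predicted species:", votes.most_common(1)[0][0])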

2. Logistic Regression classifier – predicting stock price movement at the intraday level

Use the intraday.csv file from Lecture 12. This time, try to use a logistic regression classifier to predict the stock price movement. Please note that the stock price movement may be more correlated with recent price changes than with distant changes, so we do not shuffle when splitting the data.

Step 1: split the data into a training set and a testing set (70% / 30%), with no shuffling.

Step 2: train the model using the training set.

Step 3: test and evaluate the performance using both the training and test sets.

Step 4: choose C = 0.1, 1, and 100 and check how the performance changes.
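A minimal sketch of Steps 1-4, assuming the feature columns are named 'lag_ret' and 'lag_mkt_ret' and the label column 'direction' (replace these with the actual headers of the intraday.csv file from Lecture 12):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('intraday.csv')
X = df[['lag_ret', 'lag_mkt_ret']]   # assumed lagged-return feature columns
y = df['direction']                  # assumed up/down movement label

# Step 1: 70/30 split; shuffle=False preserves the time order of the intraday data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False)

# Steps 2-4: fit the classifier for each C and report train/test accuracy
for C in [0.1, 1, 100]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    print(f"C={C}: train accuracy={clf.score(X_train, y_train):.3f}, "
          f"test accuracy={clf.score(X_test, y_test):.3f}")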

3. Naïve Bayes classifier – predicting stock price movement at the intraday level

Work on the intraday.csv data again. Assign an integer label to the lagged stock returns and the lagged market returns. For example, if the lagged return > 0, then lag_sign = 2; if the lagged return = 0, then lag_sign = 1; and if the lagged return < 0, then lag_sign = 0.

Now use the Naïve Bayes classifier: use the lagged stock movement labels to predict the future stock price movement. Follow the same Steps 1-3 as in Question 2.
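A minimal sketch under the same column-name assumptions as in Question 2; the choice of CategoricalNB is also an assumption (use whichever Naïve Bayes variant Lecture 12 covers):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

df = pd.read_csv('intraday.csv')

def sign_label(r):
    # lagged return > 0 -> 2, = 0 -> 1, < 0 -> 0, as described above
    return np.where(r > 0, 2, np.where(r == 0, 1, 0))

X = np.column_stack([sign_label(df['lag_ret']),      # assumed column names
                     sign_label(df['lag_mkt_ret'])])
y = df['direction']                                  # assumed up/down movement label

# Step 1: 70/30 split with no shuffling, as in Question 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False)

# Steps 2-3: fit and evaluate on both sets
nb = CategoricalNB(min_categories=3).fit(X_train, y_train)  # 3 possible labels per feature
print("train accuracy:", nb.score(X_train, y_train))
print("test accuracy:", nb.score(X_test, y_test))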

4. Naïve Bayes classifier – fraud detection for credit card transactions

The dataset (creditcard.csv) contains transactions made with credit cards over two days in September 2013 by European cardholders: 492 frauds out of 284,807 transactions. The dataset is highly unbalanced; the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation* (see the definition below). Unfortunately, due to confidentiality issues, the original features and more background information about the data are not available. Features V1, V2, ..., V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset, and the feature 'Amount' is the transaction amount. The target 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.

*Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. (from wikipedia)
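As a side illustration only (not part of the assignment), the V1, ..., V28 columns were produced by the data provider with a transformation along the lines of the toy example below, which decorrelates the columns of a raw feature matrix:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 5))           # toy raw features
raw[:, 1] = raw[:, 0] + 0.1 * raw[:, 1]   # make two columns strongly correlated

components = PCA().fit_transform(raw)     # linearly uncorrelated principal components
print(np.round(np.corrcoef(components, rowvar=False), 2))  # off-diagonals are ~0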

 

Step 1: load the data. Examine the shape, first five rows, and column headers of the data.

Step 2: define X and y, then split the data into an 80% training set and a 20% test set.

Step 3: use the Naïve Bayes classifier to conduct the classification and check its performance.

Step 4: also try the logistic regression classifier and compare its performance with that of Step 3.
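A minimal sketch of Steps 1-4, assuming GaussianNB for the Naïve Bayes classifier (use whichever variant the lecture covers); 'Class' is the target column named in the data description above. Because the data are highly unbalanced, the confusion matrix is printed alongside accuracy.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Step 1: load and inspect the data
df = pd.read_csv('creditcard.csv')
print(df.shape)
print(df.head())
print(df.columns)

# Step 2: define X and y, then make an 80/20 split
X = df.drop(columns=['Class'])
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Steps 3-4: fit both classifiers and compare their performance
for model in [GaussianNB(), LogisticRegression(max_iter=1000)]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(type(model).__name__, "test accuracy:", model.score(X_test, y_test))
    print(confusion_matrix(y_test, pred))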

 

 

