CSCM38
Advanced Topics: Artificial Intelligence and Cyber Security
by Dr. Jingjing Deng
Released on 2nd March 2020
The assignment consists of multiple tasks that are designed to be completed during the lab
sessions and signed off by either the module instructor or a teaching assistant. If you are not
able to complete the tasks during the lab sessions, you should do them at home and have
them ready to be marked off in a lab session by the deadline. All lab tasks must also be
uploaded to Blackboard before the deadline stated on each assignment sheet.
Any report or dissertation must be written and submitted in PDF format. Source
code must be organised and formatted neatly; sufficient and clear comments are very welcome
and necessary for markers to assess your work. Submissions and feedback will be handled via
the Blackboard-Turnitin system. Plagiarism will not be tolerated. Zip all your files with the
following naming convention for submission:
• [Student Number]-[Last Name][First Initial]-[Assignment][Number].zip
• For example: 123456-DengJ-Assignment1.zip
CSCM38 Assignment 2 Complete by 20/03/2020
Background: The dataset to be audited consists of a wide variety of
intrusions simulated in a military network environment. The environment was set up to acquire
raw TCP/IP dump data by simulating a typical US Air Force LAN. The LAN was
operated like a real environment and subjected to multiple attacks. A connection is a sequence of
TCP packets starting and ending at well-defined times, during which data flows between a
source IP address and a target IP address under some well-defined protocol. Each connection
is labelled either as normal or as an attack with exactly one specific attack type. Each connection
record consists of about 100 bytes. For each TCP/IP connection, 41 features
(3 qualitative and 38 quantitative) are obtained from normal and attack data.
The class variable has two categories: Normal or Anomalous.1 A copy of the dataset can be
downloaded from Blackboard.
Task 2.1 – AI-Driven Network Intrusion Detection (15 marks)
This assignment is to construct an artificial intelligence model using a machine learning approach
to detect anomalous network flows. The dataset consists of a training file and a testing
file in CSV format, and the problem can be formulated as a supervised classification problem
given the network measurements (all columns except the last one) and the intrusion label
(the last column). The following steps must be completed using the Python 3 programming
language in Jupyter Notebook format. The machine learning package Scikit-Learn2, the data
manipulation package Pandas3 and the visualisation packages matplotlib4 and ggplot2 5 can be used
in this assignment.
1 Anonymised reference for module assignment purposes.
• Load the Dataset (2 marks): Read the training and the testing data from the two CSV
files into your program. Hints: You can use either the Python built-in CSV reader6 or the
Pandas CSV reader7 to achieve this. Be aware that some attributes are categorical
variables in string format, which you may need to convert into numerical variables.
• Visualise the Features (3 marks): Select a few attributes to visually assess the
differences between the normal and anomalous categories. Hints: For the same attribute,
you can plot the frequencies or distributions of the two categories using a bar chart or
histogram.
• Train a Machine Learning Model (6 marks): To predict the class of each network flow, train
a machine learning model on the training data in a supervised fashion. Hints: Several
algorithms can be used for prediction, such as nearest neighbours, linear discriminant
analysis, random forests, naive Bayes classifiers and neural networks.8 Normally, you
need to pre-process the data before the learning step, which may include data
normalisation, feature extraction and selection.
• Test the Model (4 marks): Evaluate your model on the testing data and calculate the
prediction accuracy. Hints: For a detection problem, you can compute the
confusion matrix9 from your model's predictions and the labels provided in the data.
In addition, you can plot the ROC (Receiver Operating Characteristic) curve10 to illustrate
the diagnostic ability of a binary classifier as its discrimination threshold is varied.
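As a rough illustration only, the four steps above might be sketched as follows. This is not a model answer. Because the real CSV files are not reproduced here, the sketch generates a small synthetic stand-in; the column names (duration, protocol_type, src_bytes, class), the label strings "normal" and "anomaly", and the helper make_csv are all assumptions made for the example. Random forests are just one of the algorithms listed in the hints.

```python
import io

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_csv(n_rows, seed):
    """Return synthetic CSV text (ASSUMPTION: stands in for the real data files)."""
    rng = np.random.default_rng(seed)
    label = rng.choice(["normal", "anomaly"], size=n_rows)
    is_attack = label == "anomaly"
    frame = pd.DataFrame({
        "duration": rng.exponential(2.0, n_rows) + 5.0 * is_attack,
        "protocol_type": rng.choice(["tcp", "udp", "icmp"], size=n_rows),
        "src_bytes": rng.integers(0, 1000, n_rows) + 2000 * is_attack,
        "class": label,  # last column holds the label
    })
    return frame.to_csv(index=False)


# 1. Load: with the real data, pass the two file paths to pd.read_csv instead.
train = pd.read_csv(io.StringIO(make_csv(400, seed=0)))
test = pd.read_csv(io.StringIO(make_csv(200, seed=1)))

# Features are all columns except the last; the last column is the label.
X_train, X_test = train.iloc[:, :-1], test.iloc[:, :-1]
y_train = (train.iloc[:, -1] == "anomaly").astype(int)  # 1 = anomalous (assumed label text)
y_test = (test.iloc[:, -1] == "anomaly").astype(int)

# 2. Visualise: per-class counts of one categorical attribute, i.e. the
#    numbers a grouped bar chart of that attribute would display.
class_counts = pd.crosstab(train["protocol_type"], train["class"])
print(class_counts)

# 3. Train: one-hot encode string columns, standardise numeric columns,
#    then fit a random forest on the training data.
categorical = [c for c in X_train.columns
               if not pd.api.types.is_numeric_dtype(X_train[c])]
numeric = [c for c in X_train.columns if c not in categorical]
model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)

# 4. Test: accuracy, confusion matrix and area under the ROC curve.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])  # rows: true, columns: predicted
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("accuracy:", accuracy)
print("confusion matrix:\n", cm)
print("ROC AUC:", auc)
```

Fitting the encoder and scaler inside the pipeline means they are learned from the training data only and then reused unchanged on the testing data. To draw the ROC curve itself rather than only summarise it, sklearn.metrics.roc_curve returns the false-positive and true-positive rates that can be passed to a matplotlib line plot.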
2 https://scikit-learn.org/stable/
3 https://pandas.pydata.org/
4 https://matplotlib.org/
5 https://ggplot2.tidyverse.org/
6 https://docs.python.org/3/library/csv.html
7 https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
8 https://scikit-learn.org/stable/user_guide.html
9 https://en.wikipedia.org/wiki/Confusion_matrix
10 https://en.wikipedia.org/wiki/Receiver_operating_characteristic