辅导案例-FIT2086-Assignment 1
FIT2086 Assignment 1
Due Date: Sunday, 30/08/2020, 11:55PM
1 Introduction
There are a total of five questions worth 4 + 6+ 5+ 7+ 8 = 30 marks in this assignment. Please note
that working and/or justification must be shown for all questions that require it.
This assignment is worth a total of 10% of your final mark, subject to hurdles and any other matters
(e.g., late penalties, special consideration, etc.) as specified in the FIT2086 Unit Guide or elsewhere
in the FIT2086 Moodle site (including Faculty of I.T. and Monash University policies).
Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in par-
ticular, of Monash University’s policies on academic integrity. In submitting this assignment, you
acknowledge your awareness of Monash University’s policies on academic integrity and that work is
done and submitted in accordance with these policies.
Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle.
Scans of handwritten answers are acceptable but theymust be clean and legible. You must ensure your
submission contains answers to the questions in the order they appear in the assignment. Submission
must occur before 11:55 PM Sunday, 30th of August, and late submissions will incur penalties as per
Faculty of I.T. policies.
2 Questions
1. In Lecture 1 we learned about several different types of general data science techniques/applica-
tions: (i) risk prediction, (ii) recommendation systems, (iii) forecasting, (iv) anomaly detection,
(v) image recognition systems. For each of the following problems, suggest which of these appli-
cation types the problem belongs to and justify your selection:
(a) Discovering the hidden patterns in the behaviour of Netflix users? [1 mark]
(b) Estimating the number of people at the platform of a train station from CCTV footage? [1
mark]
(c) Predicting the number of new cases of COVID-19 over the coming month? [1 mark]
(d) By using the recent usage habits/actions of the users of a mobile app, predicting whether
or not these users will continue using the app in the near future? [1 mark]
2. An electro-cardiograph (ECG) is used for detecting the presence of a wide range of heart diseases
such as heart valve disease, heart arrhythmias, etc. Despite this, an ECG does not provide
1
H = 0 H = 1
T = 0 ? ?
T = 1 ? ?
Table 1: Empty table of the proportions of having (H = 1)/not having (H = 0) a heart attack given
that an individual had normal (T = 0)/abnormal (T = 1) ECG result six months prior.
evidence for the presence of all diseases that can cause a heart attack. Even if a individual’s
ECG test does not show any sign of heart disease there is still a chance that the individual will
suffer a heart attack at a later stage.
The file heart.disease.csv contains data on a number of patients. For each patient (row)
the file contains the results of their ECG, indicating whether the ECG was normal (T = 0) or
abnormal (T = 1), as well as whether the patient subsequently suffered a heart attack six months
after the test (H = 1) or not (H = 0). Using this data please answer the following questions;
you must provide working/justification.
(a) Using the data in heart.disease.csv, write R code to find the frequency with which a
individual had/did not have a heart attack given that their ECG results were normal/ab-
normal. Using these frequencies, fill in the entries of Table 1 with the proportions of the
times those events occured, i.e., estimates of the joint probabilities of a normal/abnormal
ECG followed by a heart attack/no heart attack in six months time. [2 marks]
(b) Using these proportions, calculate the marginal probability of having a heart-attack irre-
spective of whether ECG results were normal or abnormal, i.e., P(H = 1). [1 mark]
(c) What is the probability that a person will have a heart attack given that their ECG results
were abnormal? [1 mark]
(d) What is the probability that a person will have a heart attack given that their ECG results
were normal? [1 mark]
(e) Do you believe that the ECG test is a good predictor of possible heart attack? You must
justify your answer. [1 mark]
3. Imagine that we roll a fair six-sided die (i.e., all six sides have equal probability) and toss a fair
coin. Let X1 and Y1 be the independent random variables representing the outcomes of those
events respectively (the outcome of the coin is represented numerically as head: Y1 = 1, tail:
Y1 = 0.). Let S = X1 + 3Y1 be the sum of the outcome of the roll and three times the outcome
of the coin toss. Please answer the following questions with appropriate working/justification.
(a) What is the variance of S, i.e., what is V [S]? [1 mark]
(b) Determine the probability distribution of S, i.e., the probability that S = {1, . . . , 9}. [1
mark]
(c) What is the expected value of

S, i.e., what is E
[√
S
]
? [1 mark]
(d) Calculate the approximate value of E
[√
S
]
using the Taylor-series procedure discussed in
Lecture 2. [1 mark]
(e) Imagine that we roll a second fair six-sided die; call the outcome of this roll X2. What is
the expected value of (X1 + 3Y1 +X2)2, i.e., what is E
[
(X1 + 3Y1 +X2)2
]
? [1 mark]
2
4. Imagine that a continuous random variable X defined on the range (−s, s) follows the probability
density function
p(X = x | s) =
{
s− |x|
s2
for x ∈ (−s, s)
0 everywhere else
.
Answer the following questions; you must include working if appropriate.
(a) Plot the probability density function of X when s = 1 and s = 2 for x ∈ (−3, 3). [2 marks]
(b) Determine the expected value of X, i.e., E [X]. [1 mark]
(c) Determine the variance of X, i.e., V [X] (hint: it will be a function of s). [2 marks]
(d) Determine the cumulative distribution function for this distribution, i.e., P(X ≤ x). [1
mark]
(e) Determine the expected value of |X|, i.e., E [abs(X)]. [1 mark]
5. The file heights.hk.csv contains the records of the heights of 18 year old children based on
a survey that was conducted in 1993 in Hong Kong1. Imagine that during the same year, an
amateur football team in Hong Kong is planning to select players from their local school district
and the coach wants to determine how likely it is that they can find players of different heights
to fill various positions on the field. They decide to fit a normal distribution to this data and
use it to model the height of 18 year old people in the city.
(a) Fit a normal distribution to the height data using the maximum likelihood estimator for µ
and the unbiased estimator of variance for σ2. What are the values of these parameters for
this data? [2 marks]
(b) Plug these estimates µˆ and σˆ2 into the normal distribution, and use these to answer the
following questions posed by the coach:
i. Imagine that the coach groups the players according to following 4 height ranges:
(< 1.65m), (1.65m− 1.75m), (1.75m− 1.85m), (> 1.85m).
What are the proportions of candidate players that would fall into each of these height
ranges as predicted by your Gaussian model? [2 marks]
ii. If a new candidate player registers for the selection trial, which height range are they
most likely to be in? [1 mark]
iii. A football team consists of 18 players (11 on the field and 7 on the bench). The coach
would like to have at least 4 players in their team that are taller than 1.80m for use in
defence and attack. Assuming that the height of the players and their chances of being
selected to the team are independent, what is the probability that the coach will have
at least 4 players taller than 1.80m available in their team? [1 mark]
(c) Is the normal distribution an appropriate model for this height data? Plot the observed
probabilities of the different heights (using a histogram) against the probability density
predicted by your Gaussian (normal distribution) model. Justify whether or not you believe
the distribution appears to be a good fit to this data. [2 marks]
1Data source: Maternal and Child Health Centres (MCHC)
3
51作业君 51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: ITCSdaixie