ECE2191 Probability Models in Engineering
Assignment 1
Second semester 2020 Dr. Faezeh Marzbanrad
1 About the assignment
In this assignment, you will use your knowledge about probability concepts to extract some
necessary information from a given data set. The aim is to diagnose a heart condition in a group
of patients. This assignment is going to account for 15 percent of your total mark. Thus, please
pay attention to the following notes:
• Some tasks of the assignment need to be completed using Matlab only, and the rest can
be completed either manually or by using Matlab.
• You will need to submit your code and a PDF file of your report, which contains your
answers to the different tasks of the assignment.
• Any form of plagiarism must be avoided. Your code will be checked with the MOSS software
developed at Stanford University to find any possible similarities.
• Note that the assignment is to be completed individually.
2 Problem description
Atrial fibrillation (AF) is an abnormal heart rhythm (arrhythmia) characterized by the rapid
and irregular beating of the atrial chambers of the heart. It often begins as short periods of
abnormal beating, which become longer or continuous over time. In order to diagnose AF, we
analyze the ECG signal. An ECG is a time-varying physiological signal which reflects the ionic
current flow causing the cardiac (heart) fibers to contract and relax subsequently. It is obtained
by recording the potential difference between two electrodes placed on the skin. ECG represents
the successive atrial depolarization and repolarization as well as ventricular depolarization and
repolarization occurring at every normal cycle of heartbeat. These events manifest as the peaks
and troughs of the ECG waveform, namely P, Q, R, S, and T, shown in Figure 1. Note that
R-R interval refers to the time between successive R-peaks.
Figure 1: ECG waveform for a normal cardiac cycle.
Figure 2: ECG of atrial fibrillation (top) and normal sinus rhythm (bottom). The purple arrow
indicates a P wave, which is lost in atrial fibrillation.
In the following, we explain that obtaining information about R-R intervals and the absence
or presence of P-waves helps us to diagnose AF more effectively. When a person is diagnosed
with AF, typical characteristics of the ECG are the absence of P waves, and irregular R–R
intervals due to irregular conduction of impulses to the ventricles. At very fast heart rates,
atrial fibrillation may look more regular, which may make it more difficult to separate from
other conditions. Fig. 2 shows the ECGs of a normal rhythm and an AF rhythm, where
it can be seen that in the case of AF, the R-R intervals are irregular and the P-waves are almost
gone.
3 Data
We have collected relevant information on 1613 patients in the file "Assignment_Data.mat".
More specifically, we have extracted the standard deviation (STD) of the R-R intervals for each
ECG recording. In addition, for each patient, we have checked whether the P-wave is present
or not. For simplicity, from now on, we refer to the STD of the R-R intervals as SDRR.
A screenshot of how the data looks is shown in Fig. 3. As can be seen, the first column shows
the ID of the patients, the second column determines whether a P-wave has been detected in
the ECG signal or not, the third column shows the SDRR, and lastly, the fourth column shows
whether the patient has been clinically diagnosed with AF or not. In the second column, a 1
indicates presence of a P-wave and a 0 indicates absence of a P-wave. In the fourth column, a
1 indicates a positive diagnosis of AF and a 0 indicates a negative diagnosis (normal). As an
example, for the first patient, the SDRR is 0.0726, and a P-wave has been detected. However,
this patient has not been diagnosed with AF. Keep in mind that the SDRR values and the P-wave
detection can be affected by noise. For example, the P-wave might have been missed in a normal
case purely because of noise, not because of an abnormal ECG.
Once you import this table into your Matlab workspace (using "load('Assignment_Data.mat')"),
you can access the element in the i-th row and the SDRR column by the command "Data.SDRR(i)",
and similarly can use other column names to access their elements. In addition, in order to access
a subset of the table, you can use ":". For example, in order to access the data from row i to j of
the ID column, you can use the command "Data.ID(i : j)". You can also view specific contents
of the table by addressing the row and column numbers. For example, the command
"Data(2 : 5, 1 : 3)" will give you all the data from row 2 to row 5 and from column 1 to column
3. The result of executing "Data(2 : 5, 1 : 3)" is shown in Fig. 4. However, to work
with the values in the table and perform operations on them, you need to use the Data.[column
name] format (e.g. "Data.SDRR(i)").
Figure 3: The provided table of data
Figure 4: The result of executing "Data(2 : 5, 1 : 3)"
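The difference between the two access styles can be tried on a small stand-in table. The sketch below is only illustrative: the column names ID and SDRR come from the text above, while P_wave and AF are assumed names for the second and fourth columns, and every value except the first row is made up.

```matlab
% Hypothetical 5-row stand-in for the assignment table. P_wave and AF are
% assumed column names; all values except the first row are made up.
ID     = (1:5)';
P_wave = [1; 0; 1; 1; 0];
SDRR   = [0.0726; 0.2100; 0.0650; 0.0900; 0.1800];
AF     = [0; 1; 0; 0; 1];
Data   = table(ID, P_wave, SDRR, AF);

x   = Data.SDRR(3);          % numeric scalar: SDRR in the third row
ids = Data.ID(2:4);          % numeric column vector with three IDs
sub = Data(2:5, 1:3);        % still a table: rows 2 to 5, columns 1 to 3

% Arithmetic needs numeric arrays, so use dot notation (or brace indexing).
m1 = mean(Data.SDRR(2:5));
m2 = mean(Data{2:5, 'SDRR'});   % braces also return plain numbers
```

Parenthesis indexing such as Data(2:5, 1:3) keeps the result as a table, which is convenient for viewing, whereas dot or brace indexing extracts the underlying numbers for computation.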
4 Preliminary tasks [compulsory but not graded]
In the following sections, we are going to work with a subset of the data to train a probability
model. To this end, we will need some of the data for training the model and the rest for testing.
By running the following script you randomly select 1200 patients’ data and put them in a new
table as the training table and put the rest in another table as the test table. Then these two
data tables are saved as "Train.mat" and "Test.mat". You should submit these two files with
your code. Note that these train and test sets and hence the results will be unique
to your student ID. From now on you will work with these two sets, instead of the original
Data.
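% Seed the random number generator with your student ID so that the split
% is reproducible and unique to you.
id=input('What is your student ID? ');
rng(id);
K=1200;                       % number of patients in the training set
N=length(Data.ID);            % total number of patients
i_n=randperm(N);              % random permutation of the row numbers 1..N
% The ID values are used as row indices below, which relies on Data.ID(k)
% being equal to k for every row.
i_tr=Data.ID(i_n(1:K));       % IDs of the training patients
i_ts=Data.ID(i_n(K+1:end));   % IDs of the test patients
Train=Data(i_tr,:);
Test=Data(i_ts,:);
save('Train','Train');        % writes Train.mat
save('Test','Test');          % writes Test.mat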
Note: When you run the code, it asks for your student ID; you should type in your
own student ID. Run this code "only once" when you start the assignment; the train
and test sets are then saved. If you exit MATLAB or clear your workspace,
you can load the saved Test and Train data.
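Why seeding with the student ID makes the split repeatable can be seen in a small sketch. The seed 12345 and the sizes 20 and 15 are made-up stand-ins, not values from the assignment.

```matlab
% 12345 is a made-up stand-in for a student ID.
rng(12345);  p1 = randperm(20);
rng(12345);  p2 = randperm(20);
sameSplit = isequal(p1, p2);     % same seed gives the same permutation

% Cutting one permutation at position K yields two disjoint index sets
% that together cover every row exactly once.
K = 15;
trainRows = p1(1:K);
testRows  = p1(K+1:end);
noOverlap = isempty(intersect(trainRows, testRows));
coversAll = isequal(sort([trainRows, testRows]), 1:20);
```

After a restart, load('Train.mat') and load('Test.mat') restore the saved tables without rerunning the split.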
5 Probabilities
Note: This part should be completed using Matlab.
If we choose a subject randomly from the Train set,
Q5.1: What is the probability of the subject being normal?
Q5.2: What is the probability of the subject having AF disease?
Q5.3: What is the probability of presence of P-wave for a subject in the train set?
Q5.4: Find the mean and variance of SDRR for subjects in the train set (Matlab built-in
functions can be used).
Q5.5: Find the range of SDRR values (in the 3rd column of the Train set). Divide the range
into 10 equal-sized intervals (bins). Next, find the probability of an SDRR value lying in
each bin. The probability found for the i-th interval approximates $\int_{x_i}^{x_i+d} f_X(x)\,dx$, where
$f_X(x)$ is the PDF of SDRR and $[x_i, x_i + d]$ is the i-th bin. Plot a bar graph showing the
probability of the SDRR value in each bin. [Hint: use the histogram and bar functions.]
Q5.6: What is the probability of the SDRR value being in the fourth bin?
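The binning idea in Q5.5 can be sketched on synthetic numbers. This is one possible approach, not the required one: it uses histcounts (the non-plotting counterpart of the histogram function named in the hint), and the variable sdrr is a made-up stand-in for the SDRR column of the Train set.

```matlab
% Synthetic stand-in for the SDRR column (made-up values, not the data set).
rng(0);
sdrr = 0.05 + 0.25 * rand(1200, 1);

nBins = 10;
% Ten equal-width bins spanning the observed range [min, max].
edges = linspace(min(sdrr), max(sdrr), nBins + 1);

% Count the samples in each bin, then divide by the sample size to get
% relative frequencies, which estimate the integral of the PDF over each bin.
counts = histcounts(sdrr, edges);
pBin   = counts / numel(sdrr);

bar(1:nBins, pBin);
xlabel('Bin index');
ylabel('Estimated probability');
```

Because the last bin of histcounts includes its right edge, the maximum value is counted and the ten estimates sum to one.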
Marking scheme
2 marks = 1 mark for correctness of your approach (you should explain the concepts behind the
code, in your report) + 0.5 mark for correctness of the codes + 0.5 mark for demonstration
6 Conditional probabilities
Note: This part should be completed using Matlab.
If we randomly select a subject from the Train set, find the following:
Q6.1: The probability of the P-wave being present if we know that the subject is not diagnosed
with AF.
Q6.2: The probability of the P-wave being present if we know that the subject is diagnosed
with AF.
Q6.3: For each of the bins found in Q5.5, find an approximation of $\int_{x_i}^{x_i+d} f_{X|N}(x)\,dx$, where
$f_{X|N}$ is the conditional PDF of SDRR given that the subject is Normal (not AF). Plot
the bar graph showing the conditional probability of SDRR lying in each bin.
Q6.4: For each of the bins found in Q5.5, find an approximation of $\int_{x_i}^{x_i+d} f_{X|A}(x)\,dx$, where
$f_{X|A}$ is the conditional PDF of SDRR given that the subject is diagnosed with AF. Plot
the bar graph showing the conditional probability of SDRR lying in each bin.
Q6.5: If we know that a patient has AF, what is the probability that their SDRR is in the seventh
bin?
Q6.6: The mean of SDRR for the patients diagnosed with AF.
Q6.7: The mean of SDRR for the patients with normal condition.
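A conditional probability estimated from data is a relative frequency computed within the conditioning group only. The sketch below illustrates this on synthetic data; the variables af, pw and sdrr and all numeric parameters are made-up assumptions, not properties of the assignment data.

```matlab
% Synthetic stand-ins (all made up): af = diagnosis, pw = P-wave present,
% sdrr = standard deviation of the R-R intervals.
rng(1);
n    = 1000;
af   = rand(n, 1) < 0.5;                      % true means AF
pw   = rand(n, 1) < (0.9 - 0.7 * af);         % P-wave is rarer under AF
sdrr = 0.05 + 0.10 * rand(n, 1) + 0.10 * af;  % AF shifted to larger SDRR

% Conditional probability of a P-wave: restrict to the group, then average.
pPwGivenN  = mean(pw(~af));
pPwGivenAF = mean(pw(af));

% Conditional bin probabilities: keep the bin edges from the whole set,
% but count and normalise within each group.
edges       = linspace(min(sdrr), max(sdrr), 11);
pBinGivenN  = histcounts(sdrr(~af), edges) / sum(~af);
pBinGivenAF = histcounts(sdrr(af),  edges) / sum(af);

% Conditional means follow the same pattern.
meanGivenN  = mean(sdrr(~af));
meanGivenAF = mean(sdrr(af));
```

Reusing the same edges for both groups keeps the bins comparable, so the two bar graphs can be read side by side.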
Marking scheme
2.5 marks = 1 mark for correctness of your approach (you should explain the concepts behind
the code, in your report) + 1 mark for correctness of the codes + 0.5 mark for demonstration
7 Classification based on P-wave
Note: Parts 7.1 to 7.5 can be completed either using Matlab or manually (on
paper). You can choose based on your preference.
From the probabilities found for the Train data, find the following (Q7.1-Q7.4):
Q7.1: The probability of having a Normal condition if no P-wave has been detected
Q7.2: The probability of having AF if no P-wave has been detected
Q7.3: The probability of having a Normal condition if P-wave has been detected
Q7.4: The probability of having AF if P-wave has been detected
Q7.5: Based on your answers to Q7.1 to Q7.4, which diagnosis is more likely for the four subjects
with IDs 12, 100, 125 and 132 in the "original Data" set? Compare your prediction with
the actual diagnosis. How do you explain your finding?
Q7.6: For all the data in the "Test set", use the conditional probabilities you found in Q7.1
to Q7.4, based on the presence of the P-wave, to make a prediction whether the patient
has AF or has a normal condition. Compare your predictions with the actual
results for the subjects in the Test set, and find the accuracy, sensitivity and specificity
of your prediction. Hint: Accuracy means the percentage of correct predictions.
Moreover, in medical diagnosis, test sensitivity is the ability of a test to correctly identify
those with the disease (true positive rate), whereas test specificity is the ability
of the test to correctly identify those without the disease (true negative rate). Watch:
https://www.youtube.com/watch?v=FnJ3L-63Cf8
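The three metrics in the hint can be computed from counts of true and false positives and negatives. The sketch below uses eight made-up labels, with AF treated as the positive class; the vectors actual and predicted are hypothetical names.

```matlab
% Made-up labels for eight subjects: 1 = AF (positive), 0 = normal (negative).
actual    = [1 1 1 0 0 0 0 1]';
predicted = [1 0 1 0 0 1 0 1]';

TP = sum(predicted == 1 & actual == 1);   % AF correctly predicted
TN = sum(predicted == 0 & actual == 0);   % normal correctly predicted
FP = sum(predicted == 1 & actual == 0);   % normal predicted as AF
FN = sum(predicted == 0 & actual == 1);   % AF predicted as normal

accuracy    = (TP + TN) / numel(actual);  % fraction of correct predictions
sensitivity = TP / (TP + FN);             % true positive rate
specificity = TN / (TN + FP);             % true negative rate
```

For these labels the counts are TP = 3, TN = 3, FP = 1 and FN = 1, so all three metrics equal 0.75.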
Marking scheme
3 marks = 2 marks for correctness of your approach (if you choose the Matlab option, you should
explain the concepts behind the code, in your report) + 0.5 mark for correctness of the codes
(Q7.6) + 0.5 mark for demonstration
8 Classification based on SDRR
Note: The parts 8.1 to 8.3 can be completed either using Matlab or manually (on
paper). You can choose based on your preference.
From the probabilities found for the Train data, find the following (Q8.1-Q8.2):
Q8.1: For each of the bins found in Q5.5, find the probability of having a Normal condition
when SDRR lies in that bin. For example, if the range of the i-th bin is $[x_i, x_i + d]$, then find
the probability of having a normal condition when SDRR is in $[x_i, x_i + d]$. Repeat this
process for all 10 bins.
Q8.2: For each of the bins found in Q5.5, find the probability of having AF when
SDRR is in the range of the corresponding bin (similar process to Q8.1).
Q8.3: Based on your answers to Q8.1 and Q8.2, which diagnosis is more likely for the four
subjects with IDs 4, 64, 86, 191 in the original Data set? How do you justify this result?
Q8.4: For all the data in the "Test set", based on the bin that the patient’s SDRR belongs to,
make a prediction whether the patient has AF or has a normal condition. Next, compare
your predictions with the actual results, and find the accuracy, sensitivity and specificity
of your prediction (see the hint for Q7.6).
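The per-bin probabilities in this section are linked to the conditional bin probabilities of Section 6 by Bayes' rule. As a sketch, write $B_i = [x_i, x_i + d]$ for the i-th bin, $N$ for Normal and $A$ for AF (the symbol $B_i$ is introduced here only for illustration):

```latex
P(N \mid X \in B_i)
  = \frac{P(X \in B_i \mid N)\, P(N)}
         {P(X \in B_i \mid N)\, P(N) + P(X \in B_i \mid A)\, P(A)},
\qquad
P(A \mid X \in B_i) = 1 - P(N \mid X \in B_i)
```

The denominator is the total probability $P(X \in B_i)$. If a bin contains no training samples, this probability is zero and the posterior for that bin is undefined, so such bins need a separate decision.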
Marking scheme
3 marks = 2 marks for correctness of your approach (if you choose the Matlab option, you should
explain the concepts behind the code, in your report) + 0.5 mark for correctness of the codes
(Q8.4) + 0.5 mark for demonstration
9 Decision based on P-wave and SDRR
Note: Parts 9.1 to 9.5 of the assignment can be completed either using Matlab or
manually (on paper). You can choose based on your preference.
Assuming that P-wave and SDRR are "independent", find the following values:
Q9.1: For each of the 10 bins found in Q5.5, find the probability of having a Normal condition
given that P-wave is present and SDRR is in the corresponding bin. For example, if the
range of the i-th bin is $[x_i, x_i + d]$, then find the probability of having a Normal condition
when SDRR is in $[x_i, x_i + d]$ and P-wave is present. Repeat this process for all i.
Q9.2: For each of the 10 bins found in Q5.5, find the probability of having a Normal condition
when P-wave is absent and SDRR is in the corresponding bin (similar process to
Q9.1).
Q9.3: For each of the 10 bins found in Q5.5, find the probability of having AF when P-wave is
present and SDRR is in the corresponding bin.
Q9.4: For each of the 10 bins found in Q5.5, find the probability of having AF when P-wave is
absent and SDRR is in the corresponding bin.
Q9.5: Which diagnosis is more likely for subjects with IDs 4, 21 and 26 in the original Data
set? How do you justify this result?
Q9.6: For all the data in the Test set, based on the bin that the patient’s SDRR belongs to
and the presence or absence of P-wave, make a prediction whether the patient has AF
or has a normal condition. Next, compare your predictions with the actual results, and
find the accuracy, sensitivity and specificity of your prediction.
Q9.7: In Sections 7, 8, and 9, we created three models for predicting whether a patient has
AF or not. For each of these models, you found the accuracy, sensitivity, and specificity
in the Test set. Compare these metrics among the three models and briefly explain the
result of the comparison.
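One way to combine the two features is sketched below. It reads "independent" as conditional independence of the P-wave indicator and SDRR given the diagnosis, which is an assumption on our part since the text does not say which form of independence is meant. Here $W \in \{0, 1\}$ marks the absence or presence of a P-wave, $B_i = [x_i, x_i + d]$ is the i-th bin, and $N$ and $A$ denote Normal and AF (all symbols are introduced only for illustration):

```latex
P(N \mid W = w,\, X \in B_i)
  = \frac{P(W = w \mid N)\, P(X \in B_i \mid N)\, P(N)}
         {\sum_{c \in \{N, A\}} P(W = w \mid c)\, P(X \in B_i \mid c)\, P(c)}
```

The corresponding AF posterior is one minus this value, and a prediction rule can pick whichever class has the larger posterior.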
Marking scheme
3 marks = 2 marks for correctness of your approach (if you choose the Matlab option, you should
explain the concepts behind the code, in your report) + 0.5 mark for correctness of the codes
(Q9.6) + 0.5 mark for demonstration
10 Additional questions
Q10.1: In this assignment, the probabilities of a patient having AF or a Normal condition were
almost equal. However, in general, the incidence rate of AF is 0.02. Do your proposed
models suit the real-world scenarios? Do you need to modify them? How?
Q10.2: If we don’t assume independence between the P-wave and SDRR random variables, how
would you incorporate both P-wave and SDRR in automatic diagnosis? Just explain; no Matlab
implementation is required.
Q10.3: Find a Gaussian PDF model (just the formula on paper) for the PDF of SDRR if we
know that the patient has AF.
Marking scheme
1.5 marks = 0.5 mark for each question (only based on your report)
11 Submission
You are required to submit your code, "Train.mat" and "Test.mat" files (all in a single .zip
archive) and a brief report (at most six A4 pages in .pdf format) via Moodle submission links
by Monday 5 October at 9am.
Your report should include your answers to all questions in parts 5 to 10. For Matlab-based
questions, you should explain the concepts behind the code as well.