STA509 Homework #5
Due Wednesday, November 18th at 5pm
1. In this problem, our goal is to examine the impact of testing a marker as opposed to
testing the DSL in a case-control study. Let’s assume (and this could be shown) that
the required sample size (number of cases = number of controls = $n_{\mathrm{DSL}}$) necessary for
testing the DSL is given by
$$n_{\mathrm{DSL}} = \frac{K\,p_A(1 - p_A)}{\delta_A^2}$$
where $K$ is a constant, $p_A$ is the DSL allele frequency, and
$\delta_A = \Pr(A \mid \text{case}) - \Pr(A \mid \text{control})$. Similarly, for the marker near the DSL we have
$$n_{\mathrm{marker}} = \frac{K\,p_B(1 - p_B)}{\delta_B^2}$$
where $K$ is the same constant as above, $p_B$ is the marker allele frequency, and
$\delta_B = \Pr(B \mid \text{case}) - \Pr(B \mid \text{control})$. Our goal is to relate $n_{\mathrm{DSL}}$ and $n_{\mathrm{marker}}$. To do so, recall
from our notes that
$$\delta_B = \delta_A \cdot \bigl(\Pr(B \mid A) - \Pr(B \mid a)\bigr)$$
and
$$r = \frac{p_{AB} - p_A p_B}{\sqrt{p_A\,p_a\,p_B\,p_b}}.$$
Step 1: using the two equations immediately above, rewrite $r$ in terms of $\delta_A$, $\delta_B$,
$p_A$, $p_a$, $p_B$, $p_b$. Then use this expression and the equations for $n_{\mathrm{marker}}$ and $n_{\mathrm{DSL}}$ to
derive $n_{\mathrm{marker}}$ in terms of $r$ and $n_{\mathrm{DSL}}$.
With this relationship in hand, imagine a scenario where we had sufficient power with
100 case and 100 control samples to find the DSL. With that information, approximately
how many samples would be necessary to find a marker in LD with the DSL
where $r = 0.70$?
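One possible starting point for Step 1 is sketched below. It assumes the usual haplotype-frequency notation; the symbol $D$ is introduced only for this sketch and does not appear in the problem statement. The identity uses $\Pr(B \mid A) = p_{AB}/p_A$, $\Pr(B \mid a) = (p_B - p_{AB})/p_a$, and $p_A + p_a = 1$.

```latex
\begin{align*}
D &\equiv p_{AB} - p_A p_B, \\
\Pr(B \mid A) - \Pr(B \mid a)
  &= \frac{p_{AB}}{p_A} - \frac{p_B - p_{AB}}{p_a}
   = \frac{p_{AB}(p_A + p_a) - p_A p_B}{p_A\,p_a}
   = \frac{D}{p_A\,p_a}.
\end{align*}
```

Since the numerator of $r$ is the same quantity $D$, this links $\delta_B / \delta_A$ to $r$ and the allele frequencies; the remaining algebra is the exercise itself.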
2. In this problem, we compare the power between a chi-square test on a 2 × 2 table and
a trend test. Let’s simulate for 2000 subjects in each group (case or control). Let the
cell probabilities for our 2 × 3 table be summarized in Table 1. Now let’s simulate
data when assuming HWE holds for both the cases and controls. Let the frequency of
the A allele be 0.35 for the cases and 0.25 for the controls.
            AA          Aa          aa
cases       $p_{2D}$    $p_{1D}$    $p_{0D}$
controls    $p_{2C}$    $p_{1C}$    $p_{0C}$
Table 1: Table for problem 2
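As an illustration of how the cell probabilities in Table 1 follow from HWE, here is a minimal sketch. The assignment asks for R; Python's standard library is used here only to show the logic. The function names `hwe_probs` and `simulate_counts` and the seed are hypothetical choices for this sketch, not part of the assignment.

```python
import random

def hwe_probs(p):
    """Genotype probabilities (AA, Aa, aa) under Hardy-Weinberg
    equilibrium when the A allele has frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def simulate_counts(n, probs, rng):
    """Draw n genotypes with the given probabilities and
    return the counts [AA, Aa, aa]."""
    draws = rng.choices(range(3), weights=probs, k=n)
    return [draws.count(g) for g in range(3)]

rng = random.Random(509)           # arbitrary seed, for reproducibility only
case_probs = hwe_probs(0.35)       # p2D, p1D, p0D
control_probs = hwe_probs(0.25)    # p2C, p1C, p0C

cases = simulate_counts(2000, case_probs, rng)
controls = simulate_counts(2000, control_probs, rng)
print("cases:   ", cases)
print("controls:", controls)
```

Each row is one multinomial draw of 2000 subjects, which is the same thing a single call to R's `rmultinom` per group would produce.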
(a) Assuming HWE, simulate in R a 2 × 3 table corresponding to the above
probabilities for each of the sample sizes. Include your R code.
(b) Test the 2 × 3 table for significance using the trend test. Report the null
hypothesis, test statistic, p-value and conclusion. Does your conclusion agree with the
truth regarding cases and controls?
(c) Convert your 2 × 3 table into the appropriate 2 × 2 table and use the chi-squared
test (chisq.test) with the continuity correction turned off (correct = F) to test
for significance. Does your conclusion agree with the truth?
(d) Use a for loop and repeat the above 100 times and report the average p-value
from each test and the percentage of times the p-value from each test was less
than 0.05 for each sample size.
(e) Comment on your answers comparing the power in light of what you learned from
HW4. That is, from HW4 what’s the main difference in assumptions between a
chi-squared test on a 2 by 2 table and a trend test? What’s the cost involved (in
terms of power) for not making that assumption?
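The two tests compared in parts (b) and (c) can be sketched as follows. This is an illustration in Python rather than the R the assignment requires, and it makes two assumptions: the trend test is the Cochran-Armitage test with additive scores (2, 1, 0) for (AA, Aa, aa), and "the appropriate 2 × 2 table" is the table of allele counts. The function names and the example counts are hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def trend_test(cases, controls, scores=(2, 1, 0)):
    """Cochran-Armitage trend test on a 2 x 3 table of genotype counts.
    Returns (chi-square statistic, p-value) on 1 df."""
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]
    # Score-weighted difference between case and control counts
    t = sum(x * (r * S - s * R) for x, r, s in zip(scores, cases, controls))
    sum_xn = sum(x * n for x, n in zip(scores, col))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col))
    # Variance of t under the null hypothesis of no association
    var_t = (R * S / N) * (N * sum_x2n - sum_xn ** 2)
    stat = t * t / var_t
    return stat, chi2_sf_df1(stat)

def allele_chi2(cases, controls):
    """Pearson chi-square (no continuity correction) on the 2 x 2 table
    of allele counts derived from genotype counts [AA, Aa, aa]."""
    a = 2 * cases[0] + cases[1]        # A alleles in cases
    b = 2 * cases[2] + cases[1]        # a alleles in cases
    c = 2 * controls[0] + controls[1]  # A alleles in controls
    d = 2 * controls[2] + controls[1]  # a alleles in controls
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_df1(stat)

# Hypothetical genotype counts [AA, Aa, aa], not simulated data
example_cases = [30, 50, 20]
example_controls = [20, 50, 30]
print("trend test: ", trend_test(example_cases, example_controls))
print("allele test:", allele_chi2(example_cases, example_controls))
```

Note that the allele table counts 2N alleles as if they were independent observations, while the trend test works on N subjects; that difference is what part (e) asks about.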
3. Suppose that a candidate gene case-control study has been undertaken. The SNP of
interest is given in Table 2.
                 Genotype
            CC      CG      GG
Cases       25      20      15
Controls     5      20      45
Table 2: Table for candidate SNP
In addition, 20 other markers have been genotyped, and the values of their respective
$\chi^2$ statistics from a trend test are summarized in Table 3.
(a) For the candidate SNP assume the disease allele is C and test for association using
the trend test.
(b) Test for a dominant mode of inheritance model.
(c) Give the odds ratio and the 95% confidence interval for the dominant mode odds
ratio.
 0.1101    0.1485    1.3233    0.0191    0.1935
 0.0158    1.0742    0.0642    0.0039    0.0101
 0.7083    0.0268    2.4173    2.2925    6.4653
 0.2750   10.8274    4.7093    0.3573    0.5707
Table 3: Table of $\chi^2$ statistics
(d) Compute the p-value for each statistic in Table 3 and for the candidate SNP. Is
the candidate SNP associated with the disease when the family-wise error rate is
controlled at 0.05 via the Bonferroni, Holm, and Hochberg procedures?
(e) Is the candidate SNP significant when controlling the false discovery rate at 0.05
via the Benjamini and Hochberg procedure?
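The four procedures named in parts (d) and (e) differ only in their thresholds and in whether they step down or step up through the sorted p-values. A minimal Python sketch is given below; in R, `p.adjust` covers the same methods. The helper `_stepwise` and the example p-values are hypothetical and are not taken from Table 3.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def _stepwise(pvals, thresholds, step_up):
    """Compare sorted p-values with thresholds; return rejection flags
    in the original order of pvals."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    passed = [pvals[idx] <= thr for idx, thr in zip(order, thresholds)]
    if step_up:
        # Step-up: reject everything up to the LAST p-value that passes
        k = max((i + 1 for i, ok in enumerate(passed) if ok), default=0)
    else:
        # Step-down: reject until the FIRST p-value that fails
        k = next((i for i, ok in enumerate(passed) if not ok), len(passed))
    reject = [False] * len(pvals)
    for idx in order[:k]:
        reject[idx] = True
    return reject

def holm(pvals, alpha=0.05):
    m = len(pvals)
    return _stepwise(pvals, [alpha / (m - i) for i in range(m)], step_up=False)

def hochberg(pvals, alpha=0.05):
    m = len(pvals)
    return _stepwise(pvals, [alpha / (m - i) for i in range(m)], step_up=True)

def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    return _stepwise(pvals, [(i + 1) * alpha / m for i in range(m)], step_up=True)

# Hypothetical p-values chosen so that Holm and Hochberg disagree
pvals = [0.20, 0.001, 0.90, 0.014, 0.013]
for name, fn in [("Bonferroni", bonferroni), ("Holm", holm),
                 ("Hochberg", hochberg), ("Benjamini-Hochberg", benjamini_hochberg)]:
    print(name, fn(pvals))
```

Holm and Hochberg use identical thresholds, so any difference between them comes purely from the stopping rule. For the assignment, the p-values would come from the upper tail of a chi-square distribution with 1 degree of freedom, with the candidate SNP included in the family of tests.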