Tutoring Case - STA 247
STA 247 - Assignment #2

Due: April 3, 2020 @ 11:59 PM - Submit through Crowdmark

This is an individual assignment - all work and ideas presented should be entirely your own. You should not discuss the assignment with others or brainstorm ideas together. This is not a group assignment! This also means that posting publicly on Piazza is NOT permitted.

Remember to show all your work. Solutions without justifications will not earn any marks. Assignments are an opportunity for you to demonstrate how well you are able to apply what you have been learning in the course, with constructive feedback returned to you so that you can make improvements for evaluations in the course. While there may be different paths to the solution, it is your responsibility to show, without a doubt, through your solutions that you have learned the course material. This includes clearly defining any relevant random variables/events and their full distributions as appropriate, using appropriate notation, and interpreting your results in plain English.

For problems that require R, consider using R-Studio Cloud so you have the flexibility of saving plots and exporting your script as a .pdf file. Sample scripts will be provided for you under ‘R Files’ on Quercus by end of day Friday.

Problem 1 [22 points]. Consider a binomial random variable X ∼ Bin(n, p). As you will soon learn, under certain conditions a binomial distribution can be well approximated by a normal distribution. In this exercise, you will compare and contrast probabilities calculated using the exact distribution and the approximate distribution. For parts that use R, type and save your code in the R-script section of R-Studio.

Submission Instructions: Follow the instructions on Crowdmark carefully.
There will be three submission parts:
(i) Your written responses, including calculations and numerical answers,
(ii) Your two histograms,
(iii) A .pdf file of your R script for all parts of the problem. Use comments (#Insert comments using the hashtag symbol#) to separate the different parts of the problem.

Failure to submit according to instructions will result in an automatic deduction of 5 points.

a) (1 point) Suppose that X1 ∼ Bin(100, 0.2). Using R, find the exact probability that X1 is between 5 and 15, inclusive.

b) (5 points) Since the parameters of a normal distribution are the mean and variance of the random variable, find the mean and variance of X1. Using a normal distribution with those parameters, calculate the approximate probability that X1 is between 5 and 15, inclusive. Remember to apply any continuity corrections as needed.

c) (1 point) Compare your results in (a) and (b).

d) (4 points) Using the syntax provided in the lecture slides, create a vector in R that will save 10,000 samples from a Bin(100, 0.2) distribution. Plot a histogram of your samples in R. Label your axes accordingly and include ‘Bin(100, 0.2)’ as the title of your histogram. You can do this easily using the ‘hist( )’ command in R. If ever you’re unsure how to use a command in R, simply type ‘?(command)’ in the console in R-Studio (e.g. to find out how to use the histogram command, including any features, type ?hist in the console and read the help page that pops up).

e) (3 points) Repeat (a) and (b) for X2 ∼ Bin(100, 0.02).

f) (2 points) How do the exact and approximate probabilities compare in (e)? Is the approximation for X2 significantly better, significantly worse, or neither compared with the approximation for X1?

g) (4 points) Repeat (d) by plotting a histogram of 10,000 binomial outcomes from a Bin(100, 0.02) distribution, with appropriately labeled axes. Include ‘Bin(100, 0.02)’ as the title of your histogram.
h) (2 points) Using your histograms, come up with an explanation for how well/poorly the normal distribution approximates the binomial distribution.

Problem 2 [5 points]. Prove that any two independent random variables X and Y with finite variance have zero covariance. Show this for two cases: when X and Y are both discrete, and when they are both continuous.

Problem 3 [20 points]. An exam consists of a problem section and a short-answer section. Let X1 denote the amount of time in hours that a student spends on the problem section and X2 denote the amount of time the same student spends on the short-answer section. Suppose the joint probability density function of these two times is:

$$
f(x_1, x_2) =
\begin{cases}
c\,x_1 x_2, & \dfrac{x_1}{3} < x_2 < \dfrac{x_1}{2},\ 0 < x_1 < 1 \\
0, & \text{otherwise}
\end{cases}
$$

a) (3 points) Sketch and fully label the support (i.e. label your axes, include the number scale on your graph). Shade in the area corresponding to the support, and label all boundaries.

b) (4 points) Find the value of c that would make this a valid probability density function.

c) (6 points) Derive the marginal distributions of X1 and X2. Are the two times independent? Explain how you determine this.

d) (4 points) If the student spends exactly 0.25 hours on the short-answer section, what is the probability that at most 0.60 hours was spent on the problem section?

e) (3 points) If a student spends 0.25 hours on the short-answer section, what is the expected time they will spend on the problem section?

Problem 4 [8 points]. If X is uniformly distributed on [-3, 1], find the probability density function of Y = |X|. Hint: Sketch a graph of the transformation to help you determine the corresponding supports of Y.

Problem 5 [5 points]. Complete textbook exercise 5.67 on page 232, without using moment generating functions. Numerical results without appropriate work and detail in steps will not earn credit.

Problem 6 [6 points].
Use the appropriate transformation method to find the distribution of the sample mean of n independent observations from a Gamma(α, β) distribution. The sample mean is the average of a collection (x1, x2, ..., xn) of observations from a population (e.g. the average height of 10 randomly selected students) and is denoted by

$$
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
$$

Remember that finding the distribution means identifying the distribution type, where possible, including all relevant parameters.

Problem 7 [18 points]. A random variable R has probability density function given by:

$$
g(r) =
\begin{cases}
k\,r^{6} e^{-r/5}, & r > 0 \\
0, & \text{otherwise}
\end{cases}
$$

a) (2 points) What is the distribution of R?

b) (2 points) Find the value of k that makes g(r) a density function.

c) (2 points) Find the mean and variance of R.

d) (6 points) Suppose R models the total distance (in km) between 8 randomly selected inter-city bus stops, where the distance is computed as the distance to the nearest bus stop to its east. That is, R = D1 + D2 + ... + D7, where Di is the distance between the ith and the (i + 1)th bus stops. What distribution might be used to model Di? State at least two assumption(s) that must be made about Di. Critique whether the assumption(s) is/are reasonable or not in the given context.

e) (3 points) Referring to part (d), what is the probability that the next closest bus stop east of the first bus stop is within 5 km?

f) (3 points) Referring to part (d), give a lower bound estimate for the probability that the total distance between the first and eighth bus stop is at most 82 km.
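
Note on Problem 1 (b) and (e): as a reminder of the general form rather than a worked solution, one common way to write a continuity-corrected normal approximation for an integer-valued random variable X with mean μ and standard deviation σ is sketched below. The endpoints a and b are placeholder integers, Φ denotes the standard normal CDF, and the notation is an assumption that may differ from what the lecture slides use.

```latex
% Continuity correction for an inclusive interval [a, b] (a, b integers; placeholders, not assignment values)
P(a \le X \le b) \;\approx\;
\Phi\!\left(\frac{b + 0.5 - \mu}{\sigma}\right)
- \Phi\!\left(\frac{a - 0.5 - \mu}{\sigma}\right)
```

The half-unit shifts widen the interval so that the full probability mass at each integer endpoint is included when the discrete distribution is replaced by a continuous one.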