LUBS5000 Quantitative Methods
Indicative Outline Solutions to Mock Exam

Section A
Indicative / outline answers only. Fuller discussion and explanation required.

1. Average profits are about £4000 higher in Region B. However, in both regions the median is considerably lower than the mean, implying a relatively small number of extreme values at the top end of the distribution, i.e. a positively skewed distribution. The high standard deviation in both data sets indicates considerable variability around the mean. The interquartile range indicates that the gap between the lower-quartile performers and the upper-quartile performers is larger in Region B (again, more positive skew in Region B than in Region A) → Diagram.

2. The sampling distribution of means is the distribution obtained by taking all possible samples of a fixed size n from a population and plotting the mean of each sample. The distribution is close to a normal distribution, has a mean equal to the population mean, and has a standard deviation known as the standard error of the mean (STEM). STEM measures the sampling error, i.e. how far sample means typically lie from the population mean; when it is estimated from sample data, different samples will give a different STEM.

3. (i) Mean value = 2 × 2 = 4
$P(x \geq 3) = 1 - [P(0) + P(1) + P(2)]$ (can calculate this manually)
$P(0) = e^{-4} \times 4^{0} / 0! = 0.0183$
$P(1) = e^{-4} \times 4^{1} / 1! = 0.0733$
$P(2) = e^{-4} \times 4^{2} / 2! = 0.1465$
$P(x \geq 3) = 1 - [0.0183 + 0.0733 + 0.1465] = 0.7619$
OR from cumulative Poisson tables with (r = 2, m = 4): 1 − 0.2381 = 0.7619
The probability that 3 or more injuries occur within a two-year period is 76.19%.

(ii) $P(\text{exactly 3 sales}) = \frac{8!}{3!\,(8-3)!} \times (0.2)^{3} \times (0.8)^{5} = 56 \times 0.008 \times 0.32768 = 0.1468$
OR from cumulative binomial tables: (n = 8, r = 3, p = 0.2) − (n = 8, r = 2, p = 0.2) = 0.9437 − 0.7969 = 0.1468
The probability that exactly 3 sales are made when 8 visits are made is 14.68%.

4.
Examples of sampling methods include:
Random – with this method every member of the target population has an equal chance of being selected, e.g. a raffle or the use of random number tables.
Stratified – if you think the responses you will get from a survey are likely to be determined partly by different categories (unemployed/employed, different industry sectors), then you need to ensure that your sample contains each category in the correct proportions.
Multi-stage – a mixture of the two above; if the target population covers a wide geographical area, then the area to be surveyed is divided into smaller areas and a number of these are chosen at random.
Systematic – the idea here is that every nth member of the population is selected, the value of n being determined by the size of the population and the required sample size.

Section C
Outline answers only. Fuller discussion and explanation required.

(b) Size of sample – explain why the t-statistic is used when the sample size is less than 30, and explain the test briefly. Explain why the normal distribution is used when the sample size is greater than 30, and calculate Z (the standard normal variable).
Number of samples – when working with one sample the above holds; when comparing two independent samples use the modified Z and t statistics, again depending on sample size:
$[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] / \sigma_{\bar{x}_1 - \bar{x}_2}$
If the samples are paired, i.e. not independent, the paired t-statistic is needed. Explain the form of each test and why each is appropriate.

1(c) $H_0: \mu = 975$, $H_1: \mu < 975$
$\bar{x} = 956$, $s = 58$ and $n = 25$.
This requires a t-statistic of the form $(\bar{x} - \mu) / \text{STEM}$, and under the null hypothesis this has a t-distribution with n − 1 (= 24) degrees of freedom.
$\text{STEM} = 58 / \sqrt{25} = 11.6$
$t = (956 - 975) / 11.6 = -1.64$ (magnitude 1.64)
From the t-table, $t_{24}(0.05) = 1.711$ (one-tailed test).
Because the magnitude of the test statistic (1.64) is less than the critical value (1.711), we cannot reject the null hypothesis. We therefore conclude that, at the 5% level of significance, there is no statistical evidence that the average life is significantly below the company's claim of 975 hours, even though the observed average is below 975 hours.

1(d) H0: there is no association between the variables
H1: there is an association between the variables

O     E       O − E    (O − E)^2    (O − E)^2 / E
23    14.55    8.45    71.4         4.907
7     15.45   -8.45    71.4         4.621
25    19.40    5.6     31.36        1.616
15    20.60   -5.6     31.36        1.522
30    33.95   -3.95    15.6         0.459
40    36.05    3.95    15.6         0.433
17    19.40   -2.4     5.76         0.297
23    20.60    2.4     5.76         0.280
2     9.70    -7.7     59.29        6.112
18    10.30    7.7     59.29        5.756
Chi-squared                         26.003

Expected values are worked out by multiplying the row total by the column total and dividing by the overall total (taken from the contingency table in the question), e.g. 30 × 97 / 200 = 14.55 and 30 × 103 / 200 = 15.45.
The number of degrees of freedom is given by (no. of rows − 1)(no. of columns − 1) = (5 − 1)(2 − 1) = 4.
The critical value of chi-squared at the 5% level of significance (from tables) is 9.488. As 9.488 is less than the calculated value (26.003), the null hypothesis is rejected and it is concluded that there is an association between the two variables.
NB: Make sure this is explained in a clear and accessible way.

Outline answers only. Fuller discussion and explanation required.

b. Explain what each test you are using is and what it is testing for.
(i) $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
Carry out t-tests (coefficient / standard error) and compare against the critical values to find out which coefficients are significant (with n = 200 the critical values are ±1.96 for 5% significance and ±2.58 for 1% significance; include a diagram of the critical values and the decision criteria for rejecting or not rejecting H0).
t-tests, where $t = \hat{b} / s_{\hat{b}}$ (coefficient / standard error): EDUC = 4.35, EXPER = 2.60, GENDER = −2.76, AGE = −0.22
The first three are statistically significant at the 5% level of significance.

(ii) F statistic. Explain the following:
$H_0: R^2 = 0$ ($\beta_1 = \beta_2 = \dots = 0$)
$H_1: R^2 \neq 0$ (one or more of the parameters is not equal to zero)
Look at the F statistic and its significance to see whether the model as a whole has explanatory power. We find that we reject the null hypothesis that $R^2 = 0$.

(iii) $R^2 = 0.512$
This measures the proportion of the variability of wages that is explained by the explanatory variables. The reported figure shows an acceptable level of explanatory power for the independent variables, but there is room for improvement (see d). This will depend on whether, in practical terms, all explanatory factors can be measured or observed.

(c) Interpretation (statistically significant variables only)
An extra year of education (post eighth grade) will, on average, increase monthly wages by £142.51, everything else held constant.
An extra year of work experience will, on average, increase monthly wages by £43.23, everything else held constant.
Women will tend, on average, to have monthly wages £81.43 lower than men, everything else held constant.
[Age has no statistically significant relationship with the dependent variable.]

(d) Explain and relate to the question.
Omitted variable bias is a statistical term for the following issue: IF (1) we exclude explanatory variables that should be in the regression AND (2) these omitted variables are correlated with the included explanatory variables, THEN the OLS estimates of the coefficients on the included explanatory variables will be biased. Therefore you should always include explanatory variables which you think might possibly explain your dependent variable. This will reduce the risk of omitted variable bias.
[For multicollinearity you would need to explain what it is, what problems it creates, how you would identify it and how you would try to remedy it.
Remember the measure of tolerance, which indicates how much of the variability of a specified independent variable is not explained by the other independent variables in the model. Small values (<0.1) are often interpreted as indicating multicollinearity. VIF is 1/tolerance, so a tolerance below 0.1 corresponds to a VIF above 10.]
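A minimal sketch of the tolerance and VIF calculation described above. The data are hypothetical and are not taken from the exam question; the names `pearson_r` and `tolerance_and_vif` are illustrative only. With just two explanatory variables, the auxiliary R² from regressing one on the other equals their squared correlation, which is what the sketch assumes; with more explanatory variables the auxiliary R² would come from regressing the chosen variable on all the others.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(aux_r_squared):
    """Tolerance = 1 - auxiliary R^2; VIF = 1 / tolerance."""
    tolerance = 1.0 - aux_r_squared
    vif = 1.0 / tolerance
    return tolerance, vif


# Hypothetical, deliberately collinear data (NOT from the exam question).
age = [25, 30, 35, 40, 45, 50, 55, 60]
exper = [3, 9, 12, 19, 22, 30, 33, 41]

# Two-variable case only: auxiliary R^2 is the squared correlation.
r = pearson_r(age, exper)
tol, vif = tolerance_and_vif(r ** 2)

# Rule of thumb from the notes: tolerance below 0.1 suggests multicollinearity.
flagged = tol < 0.1
print(f"tolerance = {tol:.4f}, VIF = {vif:.1f}, flagged = {flagged}")
```

Here the two hypothetical variables move almost in step, so the tolerance falls well below 0.1 and the pair would be flagged.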
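The numerical answers to Section A question 3 and Section C questions 1(c) and 1(d) can be cross-checked with a short script. This is only a sketch using Python's standard library: the helper names are illustrative, the observed counts are the O column of the 1(d) table arranged as five rows of two columns, and the critical values (1.711 and 9.488) are simply the table values quoted in the answers rather than computed here.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def binomial_pmf(r, n, p):
    """P(X = r) for a binomial distribution with n trials, success prob p."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


# Section A, 3(i): P(x >= 3) with mean 4.
p_three_or_more = 1 - sum(poisson_pmf(k, 4) for k in range(3))

# Section A, 3(ii): exactly 3 sales in 8 visits with p = 0.2.
p_exactly_three = binomial_pmf(3, 8, 0.2)

# Section C, 1(c): one-sample t-statistic.
stem = 58 / math.sqrt(25)          # standard error of the mean
t_stat = (956 - 975) / stem        # negative: sample mean is below 975
reject_t = abs(t_stat) > 1.711     # 1.711 is the quoted table value

# Section C, 1(d): chi-squared test of association.
observed = [(23, 7), (25, 15), (30, 40), (17, 23), (2, 18)]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi_squared += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
reject_chi = chi_squared > 9.488   # 9.488 is the quoted table value

print(round(p_three_or_more, 4), round(p_exactly_three, 4))
print(round(t_stat, 2), reject_t)
print(round(chi_squared, 3), df, reject_chi)
```

The chi-squared total computed without intermediate rounding can differ from the tabulated 26.003 in the third decimal place, because the table sums rounded contributions.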