THE UNIVERSITY OF SYDNEY - SCHOOL OF ECONOMICS
ECMT2150 INTERMEDIATE ECONOMETRICS, S1 2021
ASSIGNMENT

Due Date: 30 May 2021 (11:59pm sharp)

Instructions:
• Anonymous marking: Do NOT put your name anywhere on your assignment or in the file name. Identify yourself only by your student number.
• Answer all questions.
• A total of 100 points are available and marks for each question are indicated throughout.
• The assignment is worth 15% of your final grade for this UoS.
• You will need to use STATA (or another regression software program, e.g. R) to complete this assignment. Do not use Excel.

Submission Instructions:
• Answers to Parts A-D are to be submitted via the Canvas Quiz, “Assignment Quiz”.
  As with Quiz 2, I encourage you to work through all of the following analysis of the data in Stata or another software package before heading to the quiz to answer the questions there. There are no trick questions, so if you have completed each of the following questions, kept a copy of your output and made a note of your answers, there will be no surprises when you are taking the quiz. You should not need to use Stata during the quiz at all. That said, the quiz is not timed, so you could leave and come back to the quiz if you need to.
  Remember: because it is an untimed quiz, it will not automatically submit at the due date. You must click submit yourself.
  You will get only one attempt at the quiz. Allowing multiple attempts tends to make marking complicated.
• You must upload your Stata output and commands/do file in the final question of the Canvas quiz. This upload is worth 5 points. This document should be no more than 5 pages long. It should show your commands and your output. Think of this as a way of showing your working.
  NB: In Stata, if you highlight some of the output in the Results window and then right-click, you can
  a) copy, and then paste this into Word or some other word processing software (in Word, the font Courier New in size 9 works well), or
  b) copy as a table or picture. This will capture your commands and output. Then you can paste this table or image into Word or some other word processing software.
• You must submit your answer to Part E through the Assignment dropbox. Part E must be typed. It will be checked using Turnitin for plagiarism.

Assignment: Multiple Linear Regression Inference, Heteroskedasticity, Endogeneity and IVs

The topic and information on the dataset

This assignment involves the application of a range of econometric methods in analysing the effect of class size in primary school on academic achievement. This topic has been the focus of a large research literature by economists as well as other social scientists going back decades. To quote Angrist and Lavy:

    When asked about their views on class size in surveys, parents and teachers generally report that they prefer smaller classes. This may be because those involved with teaching believe that smaller classes promote student learning, or simply because smaller classes offer a more pleasant environment…

    Social scientists and school administrators also have a longstanding interest in the class-size question. Class size is often thought to be easier to manipulate than other school inputs, and it is a variable at the heart of policy debates on school quality and the allocation of school resources in many countries…

    This broad interest in the consequences of changing class size notwithstanding, causal effects of class size on pupil achievement have proved very difficult to measure. Even though the level of educational inputs differs substantially both between and within schools, these differences are often associated with factors such as remedial training or students’ socioeconomic background.

This quote comes from a published study by Angrist and Lavy, “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement” (Quarterly Journal of Economics, May 1999, pp. 533-575). It is available through the library. Reading the article is not required for this assignment (but I encourage you to take a look).

The data used in our assignment is a subset of that used in the paper. Download the data from the ‘Assignment Instructions’ page where you found these instructions. Head to the ‘Assignments’ area on our Canvas site:
https://canvas.sydney.edu.au/courses/30993/assignments

Note: there are a few different versions of the data and each student will have a link to just one of these. I have edited the data slightly for each version, and by enough that you need to work on your own data. If you work on one of your classmates’ data sets, you may answer one or more questions in the quiz incorrectly and lose marks or be referred to the academic integrity office.

The data is a cross-section of 5th grade (Year 5) classes. The number of rows in the dataset is the sample size, and is equal to the number of classes we have observations for. So, there is one row (one observation) for each class. There are 8 columns. The columns correspond to the variables:

Variable name   Description
schlcode        school code
classid         class sequence number
enroll          number of students in the grade
pdisadv         percent of students in the grade that are socio-economically disadvantaged
avgmath         mathematics score, class average
avgverb         grammar score, class average
classize        class size in number of students
cs_rule         class size rule (explained immediately below)

NB - the variable cs_rule: There is an administrative rule in Israel, where the data comes from, that restricts classes to be no larger than 40 students. This administrative rule can potentially be exploited as an instrumental variable.
The rule implies that (again quoting Angrist and Lavy) “class size increases one-for-one with enrollment until 40 pupils are enrolled, but when 41 students are enrolled, there will be a sharp drop in class size, to an average of 20.5 pupils. Similarly, when 80 pupils are enrolled, the average class size will again be 40, but when 81 pupils are enrolled the average class size drops to 27”. Following the paper, I have constructed the variable cs_rule as

    cs_rule = enroll/[(int((enroll-1)/40))+1].

Part A: Descriptive Statistics for the Sample [9 marks]

Quiz questions 1-4: [4 marks]
Investigate the distribution of class size, the average maths score, number of students in the grade (enrollment), and percentage of students that are socio-economically disadvantaged in the sample. For each, find the average, standard deviation, minimum, maximum, median and the 10th, 25th, 75th and 90th percentile of its sample distribution. Construct and keep a copy of a histogram for classize and a histogram for avgmath.
In the quiz you will be asked to report selected summary statistics either to 4 decimal places or to the nearest whole number. You will also answer a multiple-choice question on the histograms.

Quiz question 5: [2 marks]
Pause and think about what you have found, both the summary statistics and the histograms. In the quiz you will be asked to comment on anything that seems unusual or noteworthy.

Quiz questions 6-7: [3 marks]
Construct and keep a copy of a scatter plot of classize against cs_rule. In a quiz question, you will be asked to comment on what we observe from the scatter plot about the impact of the administrative rule limiting class sizes on actual class size.

Part B: Simple & Multiple Regression Model - Estimation and Testing [25 marks]

Quiz question 8: [2 marks]
Estimate the simple regression model in (EQ.1):

\[ \text{avgmath} = \beta_0 + \beta_1\,\text{classize} + u \qquad \text{(EQ.1)} \]

In the quiz you will report selected coefficient estimates, standard errors and the R-squared to 4 decimal places.
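
NB - checking cs_rule: The construction of cs_rule given just before Part A can be checked against the enrolment thresholds in the Angrist and Lavy quote. The following is a minimal sketch in Python, used here purely for illustration (the assignment itself is done in Stata or R). The function name cs_rule and the helper variable n_classes are assumptions made for this sketch and are not part of the dataset.

```python
def cs_rule(enroll: int) -> float:
    """Predicted class size under the 40-pupil cap.

    Mirrors cs_rule = enroll/[(int((enroll-1)/40))+1] for enroll >= 1.
    """
    # (enroll - 1) // 40 counts how many times the cap of 40 has been
    # exceeded; adding 1 gives the number of classes the grade is split into.
    n_classes = (enroll - 1) // 40 + 1
    return enroll / n_classes


# Enrolment values mentioned in the quote: 40, 41, 80 and 81 pupils.
for enroll in (40, 41, 80, 81):
    print(enroll, cs_rule(enroll))
```

Under these assumptions the function returns 40 at an enrolment of 40 and 20.5 at 41, then 40 again at 80 and 27 at 81, which matches the sharp drops described in the quote.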

Quiz questions 9-11: [4 marks]
Economists and educators have long debated whether it’s worth paying the extra labour costs (i.e., teachers’ wages) required to reduce class size. What should the sign of the achievement/class-size relationship be if the investment is worthwhile? What is the sign of this relationship from your estimates of (EQ.1)? Interpret the estimated slope coefficient. Based on this model, are smaller class sizes associated with better student performance?

Quiz questions 12-14: [3 marks]
Is the estimated slope coefficient in (EQ.1) significantly different from zero?
In the quiz, you will not need to set out all the steps of the hypothesis test, but you will need to write down the null and alternative hypotheses for the test, report the p-value, and report whether it is or is not statistically significant.
In your quiz answers, writing H0 and H1, beta1, beta1hat, etc. is fine; you are not required to use subscript formatting or typeset maths in your quiz answers. But distinguishing between beta1hat and beta1, and using each correctly, is important. To write not equal to 0, you can write it out in words, or write neq or not=.

Quiz question 15: [3 marks]
Do you think the estimated slope coefficient in (EQ.1) is a causal estimate? Briefly explain.

Quiz question 16: [3 marks]
Now estimate the effect of class size in a model that includes controls for both the percentage of disadvantaged students and the number of students in the grade:

\[ \text{avgmath} = \beta_0 + \beta_1\,\text{classize} + \beta_2\,\text{pdisadv} + \beta_3\,\text{enroll} + u \qquad \text{(EQ.2)} \]

In the quiz you will report selected coefficient estimates, standard errors and the R-squared to 4 decimal places.

Quiz questions 17-19: [4 marks]
• Report the 90% confidence interval on classize. Report your answer to 4 decimal places, but be sure to make any calculations using all of the decimal places given in your Stata output.
• Using your confidence interval, is classize statistically significant at the 10% significance level? (Yes/No)
• Describe how you used the confidence interval you calculated to determine whether classize is statistically significant at the 10% significance level.
In your quiz answers, writing H0 and H1, beta1, beta1hat, etc. is fine; you are not required to use subscript formatting or typeset maths in your quiz answers. But distinguishing between beta1hat and beta1, and using each correctly, is important. To write not equal to 0, you can write it out in words, or write neq or not=.

Quiz question 20: [1 mark]
Based on your estimated results for (EQ.2), are smaller class sizes associated with better student performance? (Yes/No)

Quiz questions 21-23: [5 marks]
Test whether the effect of an additional student in the grade on average maths scores is equal to the effect of an additional student in the class on average maths scores, holding all else constant.
In the quiz, you will
• write down the null and alternative hypotheses for this test,
• report either the test statistic or the p-value for the test, and
• provide the formal conclusion from your test.
In your quiz answers, writing H0 and H1, beta1, beta1hat, etc. is fine; you are not required to use subscript formatting or typeset maths in your quiz answers. But distinguishing between beta1hat and beta1, and using each correctly, is important. To write not equal to 0, you can write it out in words, or write neq or not=.

Part C: Heteroskedasticity [10 marks]

Quiz questions 24-28: [7 marks]
Apply the Breusch-Pagan test for the presence of heteroskedasticity to model (EQ.2), using a 5% significance level. What do you conclude?
NB: For full marks, you must conduct all the steps of the test as per the lecture notes or as described in the textbook.
In the quiz you will
• report selected coefficient estimates and the R-squared from your auxiliary regression, each to 4 decimal places,
• report the test statistic, the degrees of freedom and either the critical value or the p-value for the test, and
• provide the conclusion from your test.

Quiz questions 29-30: [3 marks]
Re-estimate the model (EQ.2) with robust standard errors. In the quiz you will report selected standard errors to 4 decimal places. You will be asked to comment on the differences between the robust standard errors and the regular standard errors you found above in Part B for (EQ.2).

Part D: Endogeneity and Instrumental Variables [41 marks]

Quiz question 31: [4 marks]
Arguably, class size is still endogenous in (EQ.2) despite the addition of the two control variables, pdisadv and enroll. If so, does the multiple regression model in (EQ.2) capture a causal relationship between student achievement and class size? Why or why not? What does this imply about E(u|classize)?

Quiz question 32: [3 marks]
What is a plausible reason for, or source of, this endogeneity? Carefully explain.

Quiz question 33: [3 marks]
If the variable classize is endogenous in (EQ.2), state the impact on your estimates and inference if you estimate model (EQ.2) using OLS.

Quiz question 34: [4 marks]
The class size rule (cs_rule) provides a potential instrumental variable we could use to cleanly identify the causal effect of class size on student performance. What two key conditions must each instrumental variable satisfy in order for the IV estimator to be consistent? State whether each of these conditions can be tested.

Quiz question 35: [4 marks]
Discuss whether, and why or why not, we could expect the IV, cs_rule, to satisfy the two conditions that you gave in Question 34. To do this, use intuition or simple economic theory.

Quiz question 36: [3 marks]
Estimate the first-stage (or reduced form) regression needed to use cs_rule as an IV for classize in (EQ.2). Use robust standard errors.
In the quiz you will report the coefficient estimates, robust standard errors and the R-squared to 4 decimal places in a table. In the quiz, you can make a table using the little table creation drop-down icon, or you can make a table in Word or Excel and copy and paste it in.
For example, your table might look something like this, where you replace ‘Variable name’ with the appropriate variable name or label.

Variable          Coefficient estimate    Robust standard error
Variable name
Variable name
Variable name
Intercept
R-squared

Quiz questions 37-39: [5 marks]
Use the results from your reduced form regression (in Question 36) to test the relevance of the IV, cs_rule (also known as a test of identification). Use a 1% level of significance.
In the quiz, you will
• write down the null and alternative hypotheses for this test,
• report the test statistic and the critical value, both to 2 decimal places, and
• provide the formal conclusion from your test.
In your quiz answers, writing H0 and H1, beta1, beta1hat, etc. is fine; you are not required to use subscript formatting or typeset maths in your quiz answers. But distinguishing between beta1hat and beta1, and using each correctly, is important. To write not equal to 0, you can write it out in words, or write neq or not=.

Quiz question 40: [3 marks]
Re-estimate model (EQ.2) using cs_rule as an IV for classize. Estimate the model using robust standard errors.
In the quiz you will report the coefficient estimates and robust standard errors to 4 decimal places in a table. In the quiz, you can make a table using the little table creation drop-down icon, or you can make a table in Word or Excel and copy and paste it in. For example, your table might look something like this, where you replace ‘Variable name’ with the appropriate variable name or label.

Variable          Coefficient estimate    Robust standard error
Variable name
Variable name
Variable name
Intercept

Quiz question 41: [3 marks]
Interpret the IV estimate for $\beta_1$, the coefficient on classize.

Quiz question 42: [3 marks]
Comment on the differences between the IV and OLS estimates and their standard errors (from Question 40 and Question 29, respectively) for (EQ.2).

Quiz question 43: [1 mark]
What can we now conclude: are smaller class sizes associated with better student performance?

Quiz question 44: [5 marks]
Upload your Stata output/do file in the final question of the Canvas quiz.
• This upload is worth 5 points.
• This document should be no more than 5 pages long. It should show your commands and your output.
• Think of this as a way of showing your working.

Part E: Conclusions [15 marks]

Provide an executive summary or conclusion for your findings on the effect of class size on student achievement. Summarise your key finding(s). Be sure to comment on your conclusions regarding the causal effect of class size on student performance. Explain the reasons for your conclusions.
NB:
• Include your word count in your document.
• Your answer for Part E should be 1-2 paragraphs and no more than 400 words long. Answers that exceed the word count may be penalized.
• The most important aspect of an answer to this question is the presentation of a reasoned discussion based on the estimation results you have found for the project.
• You must type up your answer to this question in your own words and submit it through the assignment dropbox.
• It will be checked using Turnitin for plagiarism.