SCHOOL OF ECONOMICS

ECO-7000A: Econometric Methods
ECO-7009A: Financial Econometrics
ECO-7014A: Banking Econometrics

Take-home Assignment - Autumn Semester 2020

This piece of work should be submitted on Blackboard no later than 3pm on Monday 9 November 2020 (week 7). It accounts for 40% of the overall mark for the module.

The exercise is divided into many parts. Parts marked with an asterisk (*) are the most important.

The file HOUSE_2020 contains data on 4,150 residential properties traded in Norwich between January 2014 and October 2020. The variables are:

house:    Property number
price:    Sale price in thousands of pounds
beds:     Number of bedrooms
baths:    Number of bathrooms
recs:     Number of recreation rooms
garages:  Number of garages
type:     1 if empty plot of land
          2 if flat
          3 if bungalow
          4 if chalet
          5 if terraced house
          6 if end-terraced house
          7 if semi-detached house
          8 if detached house
pcode:    1 if post code is NR1 (South and East Central Norwich)
          2 if NR2 (West Central Norwich)
          3 if NR3 (North Central Norwich)
          4 if NR4 (South-West Norwich)
          5 if NR5 (West Norwich)
          6 if NR6 (North Norwich)
          7 if NR7 (East Norwich)
          8 if NR8 (North-West Norwich)
sqm:      Internal area in square metres
dg:       One if property has double glazing; zero otherwise
ch:       One if property has central heating; zero otherwise
gsize:    Size of garden in square metres
poll:     Air pollution at property (measured in millionths of a gram of particulate matter per cubic metre of air)
noise:    Level of traffic noise at property (measured in decibels, dB)
age:      Age of property in years
month:    Month of transaction: 1 if Jan 2014; 2 if Feb 2014; ...; 82 if Oct 2020

(a) Compare mean and median price (sum price, detail), and obtain a histogram of price (hist price). Describe the distribution of property prices in Norwich. What does the comparison of mean and median tell us about the nature of the distribution?

(b) Find the mean price for each of the eight postcodes (tab pcode, sum(price)).
Rank the eight postcodes by mean property price.

(c) Estimate a regression model using the OLS estimator, with price as the dependent variable, with sqm, beds, baths, recs and garages as quantitative explanatory variables, and with a set of dummy variables for type, using “empty plot” as the base case. (To obtain the type dummies, you just need to include i.type in the regress command.)[1] Present the results in a table.

[1] Dummy variables are explanatory variables that take on only two values, 0 and 1. We will talk about dummy variables in the lectures soon.

(d) Explain why one of the eight type dummies must be omitted in order for estimation to be possible.

(e) Evaluate how well the model in part (c) fits the data. That is, quote and interpret R-squared; quote the F-statistic for overall significance, conduct an F-test for overall significance, and interpret the result.

(f) Does the intercept parameter have a meaningful interpretation in the model of (c)?

(g) Number of bedrooms and number of recreation rooms both appear to have a negative effect on price. Does this have a logical explanation?

(h) Using economic concepts where appropriate, interpret each of the other coefficients in the model of (c). Briefly indicate which are significantly different from zero.

(i) Still using the results from the model of (c), report a 95% confidence interval for the slope parameter associated with “sqm”. Interpret this interval estimate.

(j) Extend the model of part (c) to include a set of postcode dummies, with NR1 as the “base case” (just add i.pcode to the regress command). Report the regression results.

(k) Using the formula for the F-test given in the lecture notes, conduct an F-test for the significance of the postcode dummies, in order to assess the importance of location in price determination. (This is a test of the model of (c) as a restricted version of the model of (j).)
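As a rough illustration of the arithmetic behind part (k), the sketch below uses the standard restricted-versus-unrestricted form of the F-statistic, F = [(RSS_r - RSS_u) / q] / [RSS_u / (n - k)], which may differ in notation from the version in the lecture notes. The observation and parameter counts follow from the specifications in (c) and (j), assuming all dummy categories are present in the data. The two residual sums of squares are hypothetical placeholders, not results from HOUSE_2020, and the function name is illustrative.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_obs, k_unrestricted, n_restrictions):
    """Return F = [(RSS_r - RSS_u) / q] / [RSS_u / (n - k)]."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - k_unrestricted)
    return numerator / denominator


# Counts implied by the assignment text.
N_OBS = 4150                 # properties in the data set
K_UNRESTRICTED = 1 + 5 + 7 + 7   # intercept, 5 quantitative variables, 7 type dummies, 7 postcode dummies
N_RESTRICTIONS = 7           # the 7 postcode dummies set to zero under the null

# HYPOTHETICAL residual sums of squares, for illustration only.
rss_model_c = 5_000_000.0    # restricted model (no postcode dummies)
rss_model_j = 4_500_000.0    # unrestricted model (with postcode dummies)

f_value = f_statistic(rss_model_c, rss_model_j, N_OBS, K_UNRESTRICTED, N_RESTRICTIONS)
print(f"F({N_RESTRICTIONS}, {N_OBS - K_UNRESTRICTED}) = {f_value:.2f}")
```

The resulting value would then be compared with the critical value of the F distribution with 7 and 4,130 degrees of freedom at the chosen significance level.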
(l)* Draw up a ranking of the eight postcodes, based on ceteris paribus price comparisons (i.e. based on your regression results). Does your answer contradict your answer to (b)? If so, why?

(m) Using the model with postcode dummies (part j), predict the price of a terraced house in East Norwich, with 2 bedrooms, 1 bathroom, 1 recreation room, no garage and an internal area of 60 square metres. (Give the answer in pounds.) Predict the price of a detached house in North-West Norwich, with 5 bedrooms, 2 bathrooms, 3 recreation rooms, two garages and an internal area of 200 square metres.

(n) Which of the 4,150 properties appears to have been the best in terms of “value for money”, and which worst? (Hint: Look at the residuals.)

FOR PARTS (o)-(r), DO NOT PROVIDE TABLES OF RESULTS; JUST FOCUS ON THE PARTS OF THE RESULTS THAT ARE RELEVANT IN ANSWERING THE QUESTIONS.

(o)* Add the variables age and age-squared to the model of (j). (To do this, you need to add c.age##c.age to the regress command.) Test the individual and joint significance of age and age-squared. Plot predicted price against age, with other variables set to means (to do this, you need the two commands: margins, at(age=(0(20)200)) atmeans, followed by marginsplot). There are two economic arguments for why age affects property price: the depreciation effect (the value of a property declines as it gets older); and the vintage effect (older properties attract a premium). Are you finding evidence of the depreciation effect, or the vintage effect, or both?

(p) Add the variable month to the model of (o), and interpret the coefficient of month. Why is this information useful to a homeowner?

(q)* Add the variable month-squared as well as month. (To do this, add c.month##c.month to the regress command.) Similarly to part (o), obtain a plot of predicted price against month. In which month (if any) did prices reach a maximum? If not, predict when a maximum price will be reached.
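For the quadratic specifications in (o) and (q), the turning point of b1*x + b2*x^2 lies at x* = -b1 / (2*b2), and it is a maximum only when b2 is negative. The sketch below shows that calculation for month and converts the result back to a calendar date using the coding month 1 = Jan 2014 from the variable list. The coefficient values are hypothetical placeholders, not estimates from the data, and the function names are illustrative.

```python
def turning_point(b_linear, b_squared):
    """Return x* = -b1 / (2 * b2), the stationary point of b1*x + b2*x**2."""
    if b_squared == 0:
        raise ValueError("No squared term, so the fitted curve has no turning point.")
    return -b_linear / (2.0 * b_squared)


def month_label(month_index):
    """Convert the data set's month index (1 = Jan 2014) to 'YYYY-MM'."""
    year = 2014 + (month_index - 1) // 12
    month_in_year = (month_index - 1) % 12 + 1
    return f"{year}-{month_in_year:02d}"


# HYPOTHETICAL coefficients on month and month-squared, for illustration only.
b_month = 1.2
b_month_sq = -0.01

peak_month = turning_point(b_month, b_month_sq)
is_maximum = b_month_sq < 0                  # concave curve: the turning point is a peak
within_sample = 1 <= peak_month <= 82        # sample runs from month 1 to month 82

print("Turning point at month", round(peak_month, 1), "=", month_label(round(peak_month)))
print("Maximum:", is_maximum, "| inside sample period:", within_sample)
```

If the turning point falls beyond month 82, it is an out-of-sample prediction of when the peak will be reached rather than an observed peak. The same formula applies to age and age-squared in part (o).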
(r) Test for heteroskedasticity in the model of (q) using the command hettest sqm age, fstat. Suggest reasons why the variables sqm and age are likely to cause heteroskedasticity in the present situation.

(s)* Other variables are available in the data set. Experiment with these by adding them in different combinations (with squared terms and interaction terms where appropriate) to the model (you should continue to include the variables you have previously used). You might also try making the necessary correction for heteroskedasticity (if you found evidence of it in (r)). REPORT ONLY ONE COMPLETE SET OF RESULTS; THIS SET OF RESULTS SHOULD BE FROM YOUR MOST PREFERRED SPECIFICATION. Make sure you explain why it is your most preferred specification. Interpret the coefficients of the variables you have added (there is no need to interpret coefficients of variables previously used).

Hints for part (s):

The relationship between price and garden size is almost certainly non-linear, with the marginal value of an additional square metre falling as garden size rises. For this reason it is inappropriate to use gsize itself as an explanatory variable. Try log(gsize). Take care in interpreting the coefficient.

Something called “Covid19” happened from March 2020 onwards. Did it have a significant effect on property prices in Norwich? Try including a “Covid dummy” in your model.
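The two hints can be made concrete with a small sketch. Under the coding month 1 = Jan 2014, March 2020 is month 75, so one plausible Covid dummy equals one for month 75 onwards. With price in levels and gsize in logs, a 1% rise in garden size changes predicted price by roughly beta/100 thousand pounds. The coefficient value below is a hypothetical placeholder, and the function names are illustrative rather than part of the assignment.

```python
import math

COVID_START_MONTH = 75  # March 2020, given that month 1 = Jan 2014 and month 82 = Oct 2020


def covid_dummy(month):
    """Return 1 for transactions from March 2020 onwards, 0 otherwise."""
    return 1 if month >= COVID_START_MONTH else 0


def price_change_from_gsize(beta_log_gsize, pct_change):
    """Exact change in predicted price (thousands of pounds) when gsize rises by pct_change percent.

    In a level-log model, the change is beta * ln(1 + pct_change / 100).
    Note: log(gsize) is undefined at zero, so how to treat any property
    without a garden is a separate modelling choice not covered here.
    """
    return beta_log_gsize * math.log(1.0 + pct_change / 100.0)


# HYPOTHETICAL coefficient on log(gsize), for illustration only.
beta_log_gsize = 10.0

approx_change = beta_log_gsize / 100.0                       # rule of thumb for a 1% increase
exact_change = price_change_from_gsize(beta_log_gsize, 1.0)  # exact level-log calculation

print("Covid dummy for months 74, 75, 82:", [covid_dummy(m) for m in (74, 75, 82)])
print("Approximate effect of +1% garden size:", approx_change)
print("Exact effect of +1% garden size:", round(exact_change, 4))
```

The approximation and the exact figure are close for small percentage changes and drift apart as the change in garden size becomes large.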