UNIVERSITY COLLEGE LONDON
DEPARTMENT OF POLITICAL SCIENCE

POLS0010 Data Analysis

ESSAY QUESTIONS 2019–2020 (TERM 1 – Part 2)

Guidelines for Completing and Submitting POLS0010 Essay

- Read the guidelines below to avoid losing marks unnecessarily.
- The assessment is due on 14th January 2020, 14.00 hours. Please follow all designated Department of Political Science submission guidelines. THESE MAY BE DIFFERENT TO THOSE OF YOUR HOME DEPARTMENT. The submission guidelines are available on the Moodle page for this module. You must submit one copy of your essay via Turnitin. The word limit is 1,500 words, excluding tables and graphs, references, and your R script appendix (see below).
- This is an assessed piece of coursework for the POLS0010 module; collaboration and/or discussion with anyone is strictly prohibited. The rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of classmates will be taken seriously.
- The dataset for the essay can be found in the ‘Dataset’ folder on Moodle.
- The data for Part A come from the British Election Study, 2010, a multi-stage stratified random sample survey of individuals' political attitudes. A subset of variables from the post-election fieldwork is included in your dataset. The data are collected about the individuals (e.g. age and qualifications).
- You may open the dataset and work on the essay questions at any time up until the submission date. There is no limit on the number of times you may open the data files. Be sure to save your data file and R script file.
- The essay questions comprise two sections; you must complete each part of each section.
- Where appropriate, answers should be written in complete sentences; no bulleting or outlining. Be sure to answer all parts of the questions posed and interpret output statistically and substantively.
- In Part A, include tabular and graphical output alongside your written answers; in Part B, do not include any tabular or graphical output (see below).
- You should include a copy of your R script as an appendix to your essay. FAILURE TO INCLUDE THE R SCRIPT WILL INCUR A 5 POINT PENALTY. Note that your R script file should include comments indicating the question being addressed. Your R script file should contain only the exercises/questions asked here.
- All variable names are shown in italics.
- You should discuss the interpretation of your results and how they relate to the questions you were asked.
- You may assume the methods you have used (e.g. linear regression) are understood by the reader and do not need definitions, but you do need to say which techniques you have used and why.
- As this is an assessed piece of work, you may not email/ask the module tutors questions about the essay questions.
- 10 points will be awarded for presentation.
- This assessment is out of 100 marks and will count towards 50% of the term 1 mark.

The data file is bes_2010.dta. You can copy this file in the usual way from Moodle.
The variables are:

Variable name   Variable label
cserial         Unique respondent identification number
bq1             Interest in politics
bq16_1          Trust in political parties
bq16_2          Trust in parliament
bq16_3          Trust in British politicians
bq16_4          Trust in police
bq16_5          Trust in courts
bq16_6          Trust in banks in Britain
bq62            Life satisfaction
bq68            Most people can be trusted
bq92_1          Physical or mental impairment
bq107           Self-rated health
zgor            Government Office Region
zq88            Gender
zq89            Age
zq90            Marital status
zq93_1          Household tenure
zq95_1          Age finished full-time education
zq95_3          Highest qualification
zq96            Annual household income
zq97            Main source of income
zq101           Ethnicity
consname        Constituency name
wardid          Electoral ward ID
wardname        Electoral ward name
postwgt         Survey weight

More information about these data can be found from the UKDS:
http://doc.ukdataservice.ac.uk/doc/7529/mrdoc/pdf/7529_technical_report_2011.pdf

DOs and DON’Ts

- DON’T include raw variable names in the text or tables
- DON’T use too many decimal places, but be consistent
- DON’T include unedited R output in the main text of your essay or you will lose marks
- DO make sure tables and figures have titles and are referenced in the text
- DO make sure your tables and figures can be understood without reading the text
- DO make sure you have given a clear enough description of what you have done so that the reader can reproduce any numbers/results that you present
- DO be careful how you use the terms ‘significant’ and ‘correlation’ because they have specific meanings in social statistics.

PART A: Multiple Linear Regression (60 Points)

This question uses the bes_2010.dta dataset. You have been asked to write a short report for an academic publication on the relationship between individuals’ education and their trust in institutions.
You should create one summary score of trust in the following: political parties, parliament, politicians, police, courts and banks, and use this score to fit a multiple linear regression model(s) predicting trust in institutions by education. Your data contain multiple measures of education. You may choose to add additional explanatory variables to your model that may explain the relationship between education and trust in institutions. You may choose to recode or transform some variables in your data. You should report any decisions you take to adjust for individual non-response in your data and to take account of any complex survey design. The decisions should form an introduction that also includes a description of your dataset, descriptive statistics and your research hypothesis. Briefly explain any limitations to your analysis in a concluding section that also summarises your main substantive finding.

PART B: Regression Interpretation (30 Points)

The model below is from a paper published in a leading social science journal on the impact of living in a deprived neighbourhood on mental health among those living in social rented housing. The model contains a number of independent variables: age, age-squared, gender, ethnicity and qualifications as well as interactions between these variables and neighbourhood deprivation. The unit of analysis is individuals living in social rented accommodation in Great Britain. The dependent variable (General Health Questionnaire) ranges from 0 (least distressed) to 36 (most distressed) and is taken from data collected as part of the 2010 UK Household Longitudinal Study.

Your task is to interpret the model and write up the results as if you were writing the results section and conclusion of a report for civil servants at the Department for Communities and Local Government. Interpret the model statistically and substantively and detail its limitations. Your results section should report on descriptive and statistical findings.
Your conclusion should discuss whether the results support a causal effect of neighbourhood deprivation on mental health and describe limitations of the analysis. A table containing descriptive statistics is appended.

Linear regression of mental health (GHQ-12) on neighbourhood deprivation (ND)

                    No interactions   With interactions   Significant interactions   F-test
Coefficient of ND   1.27***           5.85**              No                         1.57
Standard error      (0.37)            (2.69)
Adjusted R2         0.030             0.033
N                   1075              1075

Notes: ***significant at the 1 per cent level; **significant at the 5 per cent level; *significant at the 10 per cent level. Control variables are: age, age squared, female, non-White, GCSEs. Interactions are between neighbourhood deprivation and all the control variables. The F-test compares the restricted (no interactions) and unrestricted (with interactions) models.

Sample characteristics, UK Household Longitudinal Study, 2010

Variable                       Description                          N      Mean    SD      Min     Max
Dependent variable
  Mental health                General Health Questionnaire         1075   12.40   5.96    0       36
Exposure variable
  Neighbourhood disadvantage   Standardised mean summed scores      1118   0.80    0.99    -1.66   3.43
                               of % aged 16+ who are economically
                               active, % households without a
                               car, % households renting their
                               home, % households overcrowded
Control variables
  Age                          Age in years                         1118   44.17   18.35   16      90
  Gender                       Female                               1118   0.61    0.49    0       1
  Ethnicity                    Non-white                            1118   0.04    0.19    0       1
  Qualifications               GCSE or above                        1118   0.41    0.49    0       1

10 points are reserved for clear presentation and clarity of answers, especially with regard to the production of tabular and graphical outputs.

- 8-10: clear answers with outputs shown in concise format
- 5-7: correct answers with outputs that can be understood but are cumbersome
- 0-4: confused answers with unclear outputs.
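
Note on the Part B F-test. The notes to the regression table say that the F-test compares the restricted (no interactions) and unrestricted (with interactions) models. One way to write that comparison out is sketched below. It assumes, since the table does not say so explicitly, that each of the five listed controls enters as a single term, giving five interaction terms and 11 regressors in the unrestricted model. The symbols X_k, gamma_k, delta_k, RSS_R and RSS_U are notation introduced here for illustration and do not come from the original paper.

```latex
% Assumed notation: X_1..X_5 = age, age squared, female, non-White, GCSEs.

% Restricted model (no interactions)
\[
  \mathrm{GHQ}_i = \beta_0 + \beta_1 \mathrm{ND}_i
                 + \sum_{k=1}^{5} \gamma_k X_{ki} + \varepsilon_i
\]

% Unrestricted model (ND interacted with every control)
\[
  \mathrm{GHQ}_i = \beta_0 + \beta_1 \mathrm{ND}_i
                 + \sum_{k=1}^{5} \gamma_k X_{ki}
                 + \sum_{k=1}^{5} \delta_k \,(\mathrm{ND}_i \times X_{ki})
                 + \varepsilon_i
\]

% Joint test of H_0: \delta_1 = \dots = \delta_5 = 0
\[
  F = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/q}{\mathrm{RSS}_U/(n - p - 1)},
  \qquad q = 5,\; n = 1075,\; p = 11
\]
```

Under these assumptions the statistic is referred to an F distribution with 5 and 1063 degrees of freedom, whose 5 per cent critical value is about 2.2. The reported value of 1.57 falls below it, which is consistent with the "No" in the "Significant interactions" column. In the unrestricted model, beta_1 is the slope on ND for a respondent with every X_k equal to zero, so unless the controls were centred (the table notes do not say), the coefficient of 5.85 is not directly comparable with the 1.27 from the restricted model.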