Research School of Finance, Actuarial Studies and Statistics
Semester 2, 2024
STAT1008 Quantitative Research Methods
ASSIGNMENT 2
DUE DATE: Wednesday 16 Oct, 4pm

OBJECTIVES
The main goal of this assignment is to help you apply the statistical tools that you have learned in this course in a realistic data analysis setting. This assignment is a great way to solidify your understanding of exploratory data analysis, a key preliminary step in any data analysis task. It will also give you hands-on practice in conducting statistical hypothesis testing. Hypothesis testing supports evidence-based decision making, and this assignment will give you practice in carrying out a hypothesis test of your choice and interpreting the results.

REQUIREMENTS
In this assignment you are required to analyse a dataset of your choice using the statistical tools and methods discussed in Chapters 1, 2, 3, 7, 8, 9 and 10 of the textbook. You will need to write a report to present and discuss the results of your analysis. You will need to decide which data summaries and graphical displays to produce, and which numerical descriptive measures to calculate for inclusion in your report.

You are required to carry out one hypothesis test on your chosen data set. You will need to decide what applied question you want to answer with your chosen data set, formulate the question in terms of a hypothesis test, provide step-by-step details of your calculations, and interpret your results in relation to your original question of interest. Specific requirements of the report are described in detail under REPORT GUIDELINES.

Listed below are a few applied questions that illustrate the type of research question you could investigate by collecting relevant data and running a hypothesis test:
• Do NBA players over 2 metres tall have a higher average two-point field goal percentage than NBA players who are 2 metres or less in height?
• Do piano players have longer fingers than non-piano players?
• Do athletes sleep less than the recommended 7 hours per night?

The dataset you choose could be of academic or personal interest to you. It can come from any field, such as economics, law, medicine, education, sport, psychology or politics. The freedom to choose your own dataset is an opportunity to follow your intellectual curiosity, so find data you are excited to work with! Your analysis must be original and must not be copied from another source. Once you have chosen your dataset, you may wish to confirm with your tutor or the course convenor that it is suitable for the assignment in terms of complexity and structure.

CAUTIONS!!
• YOU MUST ANALYSE RAW, INDIVIDUAL-LEVEL DATA (NOT AGGREGATED DATA). DO NOT ANALYSE DATA ALREADY SUMMARISED IN A FREQUENCY OR SUMMARY TABLE. (Note: data tables available for public download from the Australian Bureau of Statistics tend to be already aggregated, so these data are not suitable for the assignment.) An example of a data table that is already aggregated is shown in Figure 1.
[Figure 1: Example of aggregated (not individual-level) data]
• THE STATISTICAL HYPOTHESIS TESTING TOOLS WE LEARN IN THIS COURSE REQUIRE INDEPENDENT OBSERVATIONS. FOR EXAMPLE, SERIALLY CORRELATED DATA (that is, data taken at yearly or other regular time intervals) ARE NOT INDEPENDENT. So time series data (e.g. share prices over time) are not suitable for the assignment.

DATA SOURCES
If you have a specific topic or data field in mind, try running an online search for your topic, e.g. "basketball data", "human rights data" or "cost of living data".
For a general search of datasets across a variety of topics, the following websites may be useful:
• The Data and Story Library https://dasl.datadescription.com/
• UC Irvine Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
For those interested in global and country-level data, the following websites may be useful:
• Our World in Data https://ourworldindata.org/
• OECD Data https://data.oecd.org/

As an alternative to sourcing data from the internet, you might like to collect your own data (for example, by conducting your own survey) to answer a question of interest to you. For example, according to the 2017 Universities Australia survey, domestic undergraduate students work an average of 16.3 hours per week during semester. Do ANU domestic undergraduate students work, on average, more or less than this amount per week?

A minimum sample size of 50 is recommended for a data set sourced from the internet. A minimum sample size of 25 is recommended if you conduct your own survey to collect your own data.

REPORT GUIDELINES
You must submit a written report to communicate your project findings. Please include the following sections in your report:
• INTRODUCTION:
  • State your research objective. What applied question are you trying to answer? State your null hypothesis and your alternative hypothesis explicitly, using the framework discussed in class (H0: .....) (HA: .....).
  • State why your research question is of personal and practical interest.
• DATA SET DESCRIPTION:
  • State the source of your data set. Either provide the website address(es) if you downloaded the data from the internet, or state that you conducted your own study to collect the data.
  • Target population and data collection method. What is your population of interest? What was the date of data collection from the original source (e.g. GDP by country as at 31 Dec 2020)? How were the records in your data set chosen for inclusion in your sample? If there are biases in the data collection method, be sure to comment on how they may affect the validity of your results.
  • Data set size and variables. How many observations are in the data set? Which variables will you analyse? Classify the variables by type (numerical, categorical, etc.).
    Note: You do not need to analyse all the variables in your chosen data set. For example, suppose you have a data set that was collected to study the relationship between exercise and sleep patterns. The data set has 20+ variables containing demographic and lifestyle information on the study participants. However, you are particularly interested in whether there is an association between the number of hours slept per night and hours of exercise per week, so you focus on these two variables for your analysis.
• DATA SUMMARIES:
  • Provide summary (frequency or contingency) tables for your chosen variables. Include some graphical displays (bar charts, histograms, scatter plots, etc.).
  • Provide some numerical descriptive measures of your chosen variables (sample means, sample variances, sample proportions). Include a box and whisker plot for your numerical variable(s). You do not need to show working for any numerical descriptive measures that you calculate in this section. You can simply report the result, for example, "the sample mean is 6.8 hours of sleep per night".
  • From your data summaries, what conclusions can you draw about the shape of the distribution of your variables or the relationships between variables? Try to explain any patterns you notice in the data.
  • You must submit your data set as an Excel spreadsheet, using the separate tab provided for this in the online submission box. This spreadsheet must also show your Excel output with the relevant data summaries referred to in your report.
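The following Python sketch is a rough illustration of the numerical descriptive measures listed above. It uses only the standard library to compute a sample mean, a sample variance, a sample proportion and the five-number summary behind a box and whisker plot. It then shows how the same summary values are substituted into the one-sample t statistic described under HYPOTHESIS TESTS - RESULTS.

Several things in the sketch are assumptions:
• The sleep data are hypothetical values invented for illustration. The 7-hour benchmark comes from the example question in the Requirements section.
• Quartile conventions differ between tools. `method="inclusive"` is assumed to agree with Excel's QUARTILE.INC, but check it against your own spreadsheet output.
• The sketch does not compute a p-value. The report still requires that to come from Excel's T.DIST function.

```python
import math
import statistics

# Hypothetical sample (illustrative values only): hours slept per night by 10 athletes.
sleep_hours = [6.5, 7.0, 6.0, 7.5, 6.8, 6.2, 7.1, 6.9, 6.4, 7.6]

n = len(sleep_hours)
x_bar = statistics.mean(sleep_hours)      # sample mean
var_x = statistics.variance(sleep_hours)  # sample variance (divides by n - 1)
s_x = statistics.stdev(sleep_hours)       # sample standard deviation (divides by n - 1)

# Sample proportion of athletes sleeping fewer than 7 hours per night.
p_hat = sum(h < 7 for h in sleep_hours) / n

# Five-number summary for a box and whisker plot.
# Quartile conventions differ between tools; "inclusive" interpolation is
# assumed here to agree with Excel's QUARTILE.INC.
q1, median, q3 = statistics.quantiles(sleep_hours, n=4, method="inclusive")
five_number = (min(sleep_hours), q1, median, q3, max(sleep_hours))

# One-sample t statistic for H0: mu = 7 against HA: mu < 7, with the specific
# sample values substituted into (x_bar - mu_0) / (s_x / sqrt(n)).
mu_0 = 7.0
t_stat = (x_bar - mu_0) / (s_x / math.sqrt(n))
df = n - 1

print(f"n = {n}, mean = {x_bar:.2f}, variance = {var_x:.4f}, sd = {s_x:.4f}")
print(f"proportion under 7 hours = {p_hat:.2f}")
print("five-number summary:", [round(v, 3) for v in five_number])
print(f"t = ({x_bar:.2f} - {mu_0}) / ({s_x:.4f} / sqrt({n})) = {t_stat:.4f}, df = {df}")
```

Here the alternative is lower-tailed (HA: mu < 7). For that case, one way to obtain the p-value in Excel is =T.DIST(t, df, TRUE), which returns the left-tail probability of the t distribution with df = n - 1 degrees of freedom. An upper-tailed or two-sided alternative needs a different tail calculation.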
• HYPOTHESIS TESTS - RESULTS:
  • Carry out your hypothesis test. Restate H0: ..... and HA: ..... as given in your introduction. Clearly state the test statistic you calculated and report the p-value of your test. You must show all working. Specifically, you need to provide the mathematical expressions for your test statistic calculation and your p-value calculation. Stating only the generic formula is not acceptable, for example, $t = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$ for a one-sample t-test on a population mean. You need to insert the specific numeric values of $\bar{x}$, $\mu$, $s_x$ and $n$ that you used in your calculations. Report the exact p-value calculated using the Excel function T.DIST(...) as demonstrated in lectures. Also state your chosen significance level.
  • Justify that your data variables conform to the assumptions required by your hypothesis test.
• HYPOTHESIS TESTS - DISCUSSION:
  • Interpret your results in relation to your applied research question and provide some practical, intuitive reasoning behind your results. For example, suppose a significant positive correlation is found between exercise hours and sleep hours. This makes sense, as more rest time may be required to recover from the additional physical exertion of exercise.
• CONCLUSIONS:
  • Briefly summarise your key findings.
  • Discuss any limitations of your analysis and potential future improvements. Are there any further questions you would like to answer if you had the relevant statistical knowledge or access to additional data?
• REFERENCE LIST: If applicable, please use the Harvard referencing style as detailed at https://www.anu.edu.au/students/academic-skills/academic-integrity/referencing/harvard.

SUBMISSION GUIDELINES
• Total length: 5-8 pages (including graphs, excluding the reference list). Note that this is a guideline on total length. A submission longer than 10 pages will be penalised, and pages beyond the page limit will not be graded.
On the other hand, it is doubtful that all elements of the assignment can be adequately addressed in 2 pages, so a minimum length of around 5 pages is expected.
• The assignment must be submitted online on the Wattle course website via the Turnitin submission box. Please submit your report as a ‘.doc’ or ‘.pdf’ file.
• Turnitin is ‘text-matching’ software. It will compare your submission against an archive of Internet documents, Internet data, a repository of previously submitted papers, and a subscription repository of periodicals, journals and publications. Turnitin then creates an ‘Originality Report’, which can be viewed by both lecturers and tutors and identifies where the text within a student submission has matched another source. It is important to note that Turnitin does not detect plagiarism. It only matches the text within a student’s assignment to text located elsewhere (e.g. on the Internet, in journals or in databases of student papers).
• No late assignments or hard copy assignments will be accepted without prior permission from the course convenor. Extension requests are to be submitted online on the course Wattle site (see the assessment extension block on the right-hand side of the Wattle site).

ACADEMIC INTEGRITY
• This assignment is to be done individually and not in collaboration with other students in the class.
• Students should not have another person or entity do any portion of the assignment for them. This includes hiring a person or a company to complete any portion of the assignment.
• All parts of your assignment must uphold the principles of academic integrity, as defined in the ANU Policy: Code of Practice for Students and the University Academic Integrity Rule (https://services.anu.edu.au/learning-teaching/academic-integrity). You must attach a completed RSFAS ASSESSMENT INTEGRITY DECLARATION form to the front of your assessment when submitted. This form is available on the course Wattle site.
• Analytical and critical thinking skills, and effective communication of statistical ideas, are part of the learning outcomes of this course. Developing strong competencies in these areas will prepare you for a competitive workplace. The assignment you submit must present your own, original work.

USE OF ARTIFICIAL INTELLIGENCE (AI)
• It is acceptable to use AI tools (like ChatGPT) to (i) generate project ideas and/or (ii) edit your own work. If you choose to use AI in these ways, you must reference your use of AI in detail as follows:
  1. Include the following declaration after the Introduction to your report (and before the Data Set Description):
     I acknowledge the use of [insert name of AI tool and provide the website address] to prepare my report. I chose to use AI because [insert reason(s) for using AI]. I used AI to [list the tasks you used AI for, e.g. to brainstorm ideas]. Screenshots of all AI prompts I used and the output generated are provided in the Appendix.
  2. In the Appendix, you must provide screenshots of all prompts you used and the output generated by the AI tool.
• You should note that material generated by AI programs may be inaccurate, incomplete or otherwise problematic. Using AI may therefore result in a lower-quality product, as AI may be unable to produce the sophistication required for this task. AI tools should be used with caution and proper citation. AI is not a replacement for your own thinking and research.
• It is very important that you do not use AI merely to ‘do’ your assignment for you. Submissions that have been generated entirely by AI are not permitted and will be treated as plagiarism and a breach of ANU’s Academic Integrity Rule.
• Failure to properly cite your use of AI as described above is also a violation of the ANU Academic Integrity Rule.
MARKING GUIDELINES
The assignment will be marked according to the following rubric (item, total marks available, notes):
• Clarity (3 marks): Is your report logically structured, easy to follow and neatly presented? Are your research objectives clearly stated and justified? Have you included a project title? Are your graphs clear and well labelled?
• Interest (3 marks): Did you state why your data set is of personal interest to you? Did you describe the practical importance of your research questions? Did your data set have a mix of different variable types? Is the sample size big enough?
• Dataset description (5 marks): Did you answer all the questions in this section as required?
• Data summaries (6 marks): Did you calculate a variety of numerical descriptive measures? Have you included some summary tables and graphs? Have you provided some commentary and/or interpretation of your summary tables and graphs? Have you submitted a copy of your data set as an Excel file that also shows the data summaries you generated?
• Hypothesis test (7 marks): Are your null and alternative hypotheses clearly stated using the statistical notation shown in lectures? Did you provide step-by-step calculation details, including the mathematical expressions used to calculate your test statistic? Did you calculate an exact p-value using the Excel function T.DIST(...)? Did you state your chosen significance level? Did you justify the assumptions required for the statistical methods you used?
• Discussion (3 marks): Did you interpret your hypothesis test results in relation to your research questions? Do your results confirm your initial hypothesis, or are they counterintuitive? Did you provide some practical or intuitive explanation for your results?
• Conclusions (3 marks): Did you summarise your key findings? Did you discuss the limitations of your analysis? Have you discussed potential future improvements to your study?
Total: 30 marks

Some tips:
• Choose a dataset you have a personal interest in.
• Consult the teaching staff for advice and to check whether your chosen dataset is appropriate.
• There is no need to use statistical methods that we haven’t discussed in class yet.

Additional notes:
• For copyright reasons and to avoid plagiarism, sample assignment reports cannot be made available to students.
• The teaching staff will be happy to answer specific questions or concerns you have related to your project write-up. However, the teaching staff will not be available to read full drafts of your assignment or provide detailed feedback before submission.