MATH 4044 – Statistics for Data Sciences
Case Study SP5 2023
Due 29th Oct 2023 by 11:59pm
Instructions
• This assignment is worth 35% of your final mark. It is due no later than 29th Oct 2023 by 11:59pm.
• You will need to submit your assignment via learnonline.
• The submitted assignment needs to be a single file, in either Microsoft Word (doc or docx) or pdf format, 25 pages at most, excluding any appendices.
• The assignment is out of 100 marks. To achieve maximum marks for each question, you should aim to:
– Complete the requested statistical analysis in SAS using appropriate tasks or procedures (40%).
– Include only the output most relevant to the question and interpret all key results (40%). Do not include every piece of output produced by SAS!
– Discuss the results more broadly in the context of the given scenario (20%).
• Assignments submitted late, without an extension being granted, will attract a penalty of 10 marks per working day, or any part thereof, beyond the due date and time.
Introduction
Rental bikes have been introduced in many cities to improve mobility and comfort. Making rental bikes available and accessible to the public at the right time matters because it reduces waiting time. As a result, providing the city with a stable supply of rental bikes has become a major concern. We would like to see how the rented count varies with factors such as season, rain, temperature and day of the week.
Data Description
The data file for this assignment is called Seoulbike.sas7bdat. If you are using SAS University Edition, the file can be downloaded from the assessment page. The data contain the count of public bike rentals at the peak-demand hour (6pm–7pm) in the Seoul Bike Hiring System. This is a processed version of the data downloaded from
The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rain), the number of bikes rented at the peak hour, and date and season information. The variable descriptions are as follows.
Variable          Description
date              date-month-year
rented            peak rented bike count
temperature       temperature in Celsius
humidity          Relative humidity (%)
windspeed         Wind speed in m/s
visibility        visibility (multiples of 10m)
dewpoint          Dew point temperature in Celsius
SolarRadiation    Solar radiation (MJ/m2)
Rain              1: rainy, 0: no rain
Snowfall          Snow fall (cm)
seasons           Winter, Spring, Summer, Autumn
Holiday           Yes; No
wkday             Day of the week, Monday – Sunday
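Before starting the tasks, it may help to confirm that the file loads and that the variables match the descriptions above. The following is a minimal sketch only: the folder path and the libref name bike are assumptions, not part of the assignment, and should be changed to wherever Seoulbike.sas7bdat was saved. On Unix-like hosts, SAS may expect the data set file name to be all lowercase, so the file might need renaming.

```sas
/* Hypothetical folder and libref: point the path at the folder */
/* that holds Seoulbike.sas7bdat on your own system.            */
libname bike "/folders/myfolders/casestudy";

/* List variable names, types and formats for comparison with   */
/* the variable descriptions.                                   */
proc contents data=bike.seoulbike;
run;

/* Counts per weekday, per rain level, and per combination,     */
/* to see how balanced the groups are.                          */
proc freq data=bike.seoulbike;
  tables wkday rain wkday*rain / norow nocol nopercent;
run;

/* Basic summaries of the response and the covariate.           */
proc means data=bike.seoulbike n mean std min max;
  var rented temperature;
run;
```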
Case Study Tasks
In all questions, provide relevant SAS outputs and interpretations. Remember to check for the relevant assumptions, examine and comment on the residuals.
Question 1 (55 marks)
a) (25 marks) Carry out a one-way analysis of variance relating rented to wkday. Use a contrast to test at least one a priori hypothesis of your choice. Examine and comment on the residuals. Also carry out appropriate post-hoc comparisons and discuss your results. Comment on the suitability of ANOVA in this study.
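One possible way to set up the pieces of part (a) in PROC GLM is sketched below. It is an illustration under stated assumptions, not a model answer: the libref bike and its path are hypothetical, the weekdays-versus-weekend contrast is only an example hypothesis, and the contrast coefficients assume that wkday is stored as full English day names, which SAS would sort alphabetically (Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday). Check the Class Level Information table in the output and reorder the coefficients if your levels differ.

```sas
/* Hypothetical folder and libref; adjust to your own setup.    */
libname bike "/folders/myfolders/casestudy";

ods graphics on;

proc glm data=bike.seoulbike plots=diagnostics;
  class wkday;
  model rented = wkday / ss3;

  /* Example a priori contrast: weekdays vs weekend.            */
  /* ASSUMED level order: Friday Monday Saturday Sunday         */
  /* Thursday Tuesday Wednesday. Coefficients sum to zero.      */
  contrast 'Weekdays vs weekend' wkday 2 2 -5 -5 2 2 2;

  /* Levene's test for equal variances and Tukey post-hoc       */
  /* comparisons of the weekday means.                          */
  means wkday / hovtest=levene tukey cldiff;

  /* Save residuals and fitted values for further checks.       */
  output out=resid_1a r=resid p=pred;
run;
quit;

/* Normality checks on the saved residuals.                     */
proc univariate data=resid_1a normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

The diagnostics panel, Levene's test and the normality output together give the evidence needed to comment on whether ANOVA is suitable here.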
b) (20 marks) Extend the analysis in part (a) to test whether there is evidence of interaction between wkday and rain. Study the simple effects. Carry out appropriate post-hoc comparisons and discuss your results.
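A two-way model with an interaction term could be sketched as follows. The libref bike and its path are assumptions, and using the SLICE= option of LSMEANS is one common way to examine simple effects in PROC GLM, not the only one.

```sas
/* Hypothetical folder and libref; adjust to your own setup.    */
libname bike "/folders/myfolders/casestudy";

ods graphics on;

proc glm data=bike.seoulbike plots=diagnostics;
  class wkday rain;
  /* Main effects plus the wkday-by-rain interaction.           */
  model rented = wkday rain wkday*rain / ss3;

  /* Simple effects: effect of wkday within each rain level,    */
  /* and effect of rain within each weekday.                    */
  lsmeans wkday*rain / slice=rain;
  lsmeans wkday*rain / slice=wkday;

  /* Tukey-adjusted pairwise comparisons of the cell means.     */
  lsmeans wkday*rain / adjust=tukey pdiff;
run;
quit;
```

With seven weekdays and two rain levels there are 14 cells, so the pairwise table is large; only the comparisons relevant to the discussion need to be reported.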
c) (10 marks) If ANOVA is not suitable for the study in part (a), carry out the Kruskal-Wallis test relating rented to wkday. If appropriate, carry out the post-hoc analysis. Discuss your results. Note: consider using the option dscf to produce post-hoc comparisons.
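A minimal sketch of the Kruskal-Wallis test with the dscf option mentioned in the note is shown below. The libref bike and its path are assumptions.

```sas
/* Hypothetical folder and libref; adjust to your own setup.    */
libname bike "/folders/myfolders/casestudy";

/* WILCOXON requests the Kruskal-Wallis test when the CLASS     */
/* variable has more than two levels; DSCF adds the             */
/* Dwass, Steel, Critchlow-Fligner pairwise comparisons.        */
proc npar1way data=bike.seoulbike wilcoxon dscf;
  class wkday;
  var rented;
run;
```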
Question 2 (30 marks) Use SAS to perform a one-way ANCOVA relating rented to wkday with temperature as a covariate, including appropriate post-hoc comparisons:
• Confirm that there is a linear relationship between the response variable and the covariate (a scatterplot and a correlation coefficient plus a comment will suffice).
• Check the two additional ANCOVA assumptions (report and comment only on the parts of the output most directly relevant to condition checking):
– Independence of the covariate and the treatment effect (perform a one-way ANOVA test). Will the covariate help enhance the difference in rented by seasons, or will it be a confounding factor?
– Equality of slopes (add the interaction term and check its significance).
• Report and briefly discuss your results. Compare your results with Question 1a).
Technical note: Make sure you obtain and examine the Type III sums of squares (ss3). Also obtain estimates of ‘least squares means’ (lsmeans), which are the treatment means adjusted for the covariate.
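The steps listed for Question 2 could be laid out in SAS roughly as below. This is a sketch under assumptions rather than a prescribed solution: the libref bike and its path are hypothetical, and the choice of Tukey adjustment for the post-hoc comparisons is one reasonable option among several.

```sas
/* Hypothetical folder and libref; adjust to your own setup.    */
libname bike "/folders/myfolders/casestudy";

ods graphics on;

/* Linearity: scatterplot with an overall fitted line, plus     */
/* the Pearson correlation between response and covariate.      */
proc sgplot data=bike.seoulbike;
  scatter x=temperature y=rented / group=wkday;
  reg x=temperature y=rented / nomarkers;
run;

proc corr data=bike.seoulbike pearson;
  var rented temperature;
run;

/* Independence of covariate and treatment: one-way ANOVA of    */
/* the covariate on the treatment factor.                       */
proc glm data=bike.seoulbike;
  class wkday;
  model temperature = wkday / ss3;
run;
quit;

/* Equality of slopes: add the factor-by-covariate interaction  */
/* and check its Type III test.                                 */
proc glm data=bike.seoulbike;
  class wkday;
  model rented = wkday temperature wkday*temperature / ss3;
run;
quit;

/* ANCOVA model with covariate-adjusted (least squares) means   */
/* and Tukey-adjusted pairwise comparisons.                     */
proc glm data=bike.seoulbike plots=diagnostics;
  class wkday;
  model rented = wkday temperature / ss3 solution;
  lsmeans wkday / adjust=tukey pdiff cl;
run;
quit;
```

The final model is only appropriate if the interaction term in the equality-of-slopes check is not significant; otherwise the slopes differ by weekday and the adjusted means are hard to interpret.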
Question 3 (15 marks)
Write a summary of your findings from Questions 1 and 2. Keep the technical details of the analyses that led you to these conclusions to the absolute minimum. Rather, focus on practical significance and present your findings in non-specialist terms. One to two paragraphs (up to a page) will be sufficient.