Exploring COVID-19 data for Toronto, Canada
STA303/1002 Data exploration assessment
Information | Note
Name | Data exploration
Type | Type 1
Value | 10%
Due | Friday, Feb 12 at 6:00 p.m. ET
Instructions and submission link | Submission: via Crowdmark (link will be active once you receive an email from Crowdmark about this activity). Instructions: https://q.utoronto.ca/courses/204826/assignments/415115
Late submission policy | For assessments in Type 1, late assessments will still be accepted through Crowdmark, but only if they are your first submission. You will lose 10 percentage points on the assessment per day, with submissions accepted for up to 3 days (i.e., 72 hours) after the initial due date.
Accommodations and extension policy | If you miss a Type 1 assessment due to illness or a serious personal emergency, please complete this form within ONE week of the due date of the assignment. Upon receipt of your form, we will contact you via email within 3 business days to arrange an accommodation.
Introduction
For this assessment you will be working with the most up-to-date COVID data for the City of Toronto.
To complete this assessment, there are two documents you need to be aware of:
(1) this set of instructions (i.e., what you’re currently reading), and
(2) sta303_data-exploration_template.Rmd, a template you should save your own copy of.
The template will help you load the required data and has some helper functions for preparing your
submission for Crowdmark.
In tasks 1 and 2, our aim is to create versions of the ‘Cases by Day’ and ‘Cases by Outbreak Type and Week’
graphs you can find on the Toronto COVID portal under ‘Daily Status of Cases’.
In task 3 you will use data about Toronto’s neighbourhoods, drawn once again from the Toronto COVID
portal, this time under ‘Neighbourhood Maps’, as well as neighbourhood profile data from the 2016 census,
from the OpenData Toronto portal.
The graphs in your final submission may look a little different from the ones in this document, as you will
get the most up to date version of the data before submitting. That should be the only difference between
your plots and the ones shown here.
Code last run 2021-01-20.
Daily: Data as of January 18, 2021.
Neighbourhood: Data as of January 17, 2021.
STA303/1002 Winter 2021 TASK 1: DAILY CASES
Task 1: Daily cases
Data wrangling
Prepare your data for visualization with the following data wrangling requirements:
• Start with reported_raw. (See the template code for the code to read the data in.)
• Your new wrangled dataset should be saved as an object called reported.
• Replace all NA values with 0 in the recovered, active and deceased columns. See the note, below.
• Make sure the reported_date column is in date format by explicitly overwriting it with a date version
of itself. Use the date() function.
• Assess whether the data is currently tidy. If yes, proceed. If no, alter it to be tidy. Specifically, it
needs to be in the correct format to be useful for creating the figure for this section.
• Note how “Recovered”, “Active” and “Deceased” appear in the figure for this section. Make appropriate
alterations to your data that makes sure these names are a) capitalized appropriately and b) will appear
in the correct order in the legend of the figure for this section.
Note
Part marks will be given for any code that achieves this. For full marks, your code should have the following
(filled in appropriately) as the first two lines:
reported <- ____ %>%
  ____(____,
       ____, replace=0)
(You don’t need to use the same line breaks.)
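As a rough illustration of the wrangling steps listed above, here is a minimal sketch on invented toy data. The object names toy_raw and toy and all of the values are hypothetical, and the real reported_raw may be laid out differently, so treat this as one possible approach rather than the expected solution.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Hypothetical toy data standing in for reported_raw (values are invented)
toy_raw <- tibble(
  reported_date = c("2021-01-01", "2021-01-02"),
  recovered = c(10, NA),
  active = c(NA, 5),
  deceased = c(1, NA)
)

toy <- toy_raw %>%
  # Replace NA with 0 in every numeric column
  mutate_if(is.numeric, replace_na, replace = 0) %>%
  # Overwrite the character column with a Date version of itself
  mutate(reported_date = date(reported_date)) %>%
  # Tidy: one row per date and case status
  pivot_longer(-reported_date, names_to = "status", values_to = "cases") %>%
  # Capitalize the names, then fix the order the legend will use
  mutate(status = factor(str_to_sentence(status),
                         levels = c("Active", "Recovered", "Deceased")))
```

Storing the status as a factor with explicit levels is what controls the legend order in ggplot; a plain character column would be ordered alphabetically.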
Data visualization
Create a barchart of active, recovered and deceased cases by date. For full marks your graph must be exactly
the same as the one below (except for the updated data) and must be created from the code you’ve written.
• The bar chart should be stacked, as opposed to side by side.
• The title should be: “Cases reported by day in Toronto, Canada”.
• The subtitle should be: “Confirmed and probable cases”.
• The axis labels must match my image (pay attention to capitalization).
• There should be small text below the graph that includes the following: “Created by: for STA303/1002,
U of T” and “Source: Ontario Ministry of Health, Integrated Public Health Information System and
CORES” and the date the data was last updated, set programmatically, not by hand. See notes 1, 2
and 3 below.
• The limits of the x-axis should be from January 1, 2020 to the present day, with the present day set
programmatically, not by hand. See note 4.
• Any warnings should be suppressed so they are not printed in your final document.
• No legend title.
• Legend position is set to c(.15, .8).
• Active cases are coloured “#003F5C”, recovered cases are coloured “#86BCB6”, and deceased cases
are coloured “#B9CA5D”.
Notes
1. You can use \n to create a line break in text strings you are using in ggplot.
2. date_daily[1,1] will give you the date the data was last updated, as a character string.
3. str_c() is useful for combining text and code to output a final text string.
4. date("2020-01-01") and Sys.Date() will be helpful.
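Notes 1 to 4 can be combined along the following lines. This is only a sketch: data_date is a hypothetical stand-in for whatever date_daily[1,1] actually contains, and "Your Name" is a placeholder.

```r
library(stringr)

# Hypothetical stand-in for date_daily[1,1]; the real string may differ
data_date <- "Data as of January 18, 2021"

# str_c() joins the pieces; "\n" starts a new line in the plot caption
caption_text <- str_c(
  "Created by: Your Name for STA303/1002, U of T\n",
  "Source: Ontario Ministry of Health, Integrated Public Health Information System and CORES\n",
  data_date
)

# Axis limits: fixed start date, end date computed when the code is run
x_limits <- c(as.Date("2020-01-01"), Sys.Date())
```

The resulting caption_text could be passed to labs(caption = ...) and x_limits to the limits argument of a date scale.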
[Figure: stacked bar chart. Title: “Cases reported by day in Toronto, Canada”. Subtitle: “Confirmed and probable cases”. x-axis: “Date”, labelled 01 Jan 20 to 01 Jan 21. y-axis: “Case count”, 0 to 2000. Legend (no title): Active, Recovered, Deceased. Caption: “Created by: for STA303/1002, U of T”, “Source: Ontario Ministry of Health, Integrated Public Health Information System and CORES”, “Data as of January 18, 2021”.]
Task 2: Outbreak type
Data wrangling
• Start with outbreak_raw.
• Your new wrangled dataset should be saved as an object called outbreak.
• Make sure the episode_week column is in date format by explicitly overwriting it with a date version
of itself. Use the date() function.
• Assess whether the data is currently tidy. If yes, proceed. If no, alter it to be tidy. Specifically, it
needs to be in the correct format to be useful for creating the figure for this section.
• Note how “Sporadic” and “Outbreak associated” appear in the figure for this section. Make alterations
to your data that make sure these names are a) correctly worded/capitalized and b) will appear in the
correct order in the legend of the figure for this section.
• Create a new variable, total_cases, that indicates the total number of cases in the episode week, i.e.,
the sum of sporadic cases and outbreak associated cases.
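One way to compute a per-week total is sketched below on invented data. The column names outbreak_or_sporadic and cases are assumptions made for illustration; check the actual column names in outbreak_raw.

```r
library(dplyr)

# Hypothetical toy data in long format (values are invented)
toy_outbreak <- tibble(
  episode_week = as.Date(c("2021-01-03", "2021-01-03",
                           "2021-01-10", "2021-01-10")),
  outbreak_or_sporadic = c("Sporadic", "Outbreak associated",
                           "Sporadic", "Outbreak associated"),
  cases = c(100, 40, 80, 20)
)

toy_outbreak <- toy_outbreak %>%
  # Sum both case types within each episode week
  group_by(episode_week) %>%
  mutate(total_cases = sum(cases)) %>%
  ungroup()
```

Using mutate() rather than summarise() keeps one row per week and case type, which is the shape a stacked bar chart needs.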
Data visualization
Create a barchart of cases by outbreak type and week. For full marks your graph must be exactly the same
as the one below (except for the updated data) and must be created from the code you’ve written.
• The bar chart should be stacked, as opposed to side by side.
• The title should be: “Cases by outbreak type and week in Toronto, Canada”.
• The subtitle should be: “Confirmed and probable cases”.
• The axis labels must match my image (pay attention to capitalization).
• There should be small text below the graph that includes the following: “Created by: for STA303/1002,
U of T” and “Source: Ontario Ministry of Health, Integrated Public Health Information System and
CORES” and the date the data was last updated, set programmatically, not by hand. See Notes 1, 2
and 3 below.
• The x-axis labels should be formatted as day, month, year in the form “01 Jan 20”. Complete the code
in Note 4 to achieve this.
• The limits of the x-axis should be from January 1, 2020 to the present day + 7 days, with the present
day set programmatically, not by hand. See notes 4 and 5.
• The limits of the y-axis should be from 0 up to the maximum of the total_cases variable you made.
Set this programmatically, not by hand.
• Any warnings should be suppressed so they are not printed in your final document.
• No legend title.
• Legend position is set to c(.15, .8).
• Sporadic cases are coloured “#86BCB6”, and outbreak associated cases are coloured “#B9CA5D”.
Notes
1. You can use \n to create a line break in text strings you are using in ggplot.
2. date_daily[1,1] will give you the date the data was last updated, as a character string.
3. str_c() is useful for combining text and code to output a final text string.
4. Complete and use this code: scale_x_date(labels = scales::date_format("%d %b %y"), limits = ____).
5. date("2020-01-01") and Sys.Date()+7 will be helpful.
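The programmatic limits and the label format described above might be prepared as follows. This is a sketch using base R only; the total_cases vector is invented and stands in for the variable created during wrangling.

```r
# Hypothetical weekly totals standing in for outbreak$total_cases
total_cases <- c(140, 100, 260)

# x-axis: fixed start, end one week past the day the code is run
x_limits <- c(as.Date("2020-01-01"), Sys.Date() + 7)

# y-axis: zero up to the largest weekly total
y_limits <- c(0, max(total_cases))

# The format string from note 4: two-digit day, abbreviated month name
# (locale dependent), two-digit year
example_label <- format(as.Date("2020-01-01"), "%d %b %y")
```

Because both limit vectors are computed from Sys.Date() and the data, the plot updates itself whenever the data are refreshed.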
[Figure: stacked bar chart. Title: “Cases by outbreak type and week in Toronto, Canada”. Subtitle: “Confirmed and probable cases”. x-axis: “Date”, labelled 01 Jan 20 to 01 Jan 21. y-axis: “Case count”, 0 to 6000. Legend (no title): Sporadic, Outbreak associated. Caption: “Created by: for STA303/1002, U of T”, “Source: Ontario Ministry of Health, Integrated Public Health Information System and CORES”, “Data as of January 18, 2021”.]
Task 3: Neighbourhoods
Data wrangling: part 1
Our goal here is to prepare a dataset that has the percentage of 18 to 64 year-olds who are classified as low
income in each neighbourhood.
• Start with the nbhood_profile dataset.
• Your new wrangled dataset should be saved as an object called income.
• Filter to only include the row(s) that are relevant, i.e. the row(s) that give the percentage of 18 to 64
year-olds who are classified as low income.
• Assess whether the data is currently tidy. If yes, proceed. If no, alter it to be tidy.
• Make sure that the percentages are stored as numbers, not character strings. The function
parse_number() may be of use to you for this.
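For reference, parse_number() drops non-numeric characters around a number and returns a numeric vector. The strings below are invented examples, not values from nbhood_profile.

```r
library(readr)

# Invented examples of numbers stored as text
pct_text <- c("12.5", "7.9%", "1,234.5")

# parse_number() ignores the "%" and the grouping comma
pct_num <- parse_number(pct_text)
```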
Data wrangling: part 2
Our goal here is to merge the income data from part 1 with the nbhoods_shape_raw data, which
will allow us to draw a map of Toronto with its neighbourhoods.
• Start with nbhoods_shape_raw.
• Your new wrangled dataset should be saved as an object called nbhoods_all.
• Each row of this dataset represents a neighbourhood of Toronto. Use the AREA_NAME variable to create
a new variable called neighbourhood_name that removes the number in parentheses (and the space
before it) to get a clean neighbourhood name. See note 1 below.
• Merge appropriately so that you have the cases per 100,000 people by neighbourhood and the percentage
of 18 to 64 year-olds who are classified as being low income.
• Ensure the neighbourhoods are correctly matched. (Hint: Toronto has 140 neighbourhoods.) See note
2 below.
• Assess whether the data is currently tidy. If yes, proceed. If no, alter it to be tidy. Specifically, it
needs to be in the correct format to be useful for creating the figure for this section.
• Rename your case rate variable to rate_per_100000 if it isn’t already.
Notes
1. "\\s" matches the space character, "\\(" and "\\)" match parentheses, and "\\d" by itself will
match any of the digits between 0 and 9 once. The + will match as many of that type as exist in a
row, so "\\d+" would match a string of digits of any length. Putting a $ at the end says we want
to match this pattern at the end of our character string.
2. ‘City of Toronto’ is not a neighbourhood, but may appear in your data, depending on how you merge.
Merge and/or filter appropriately so it is not included.
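Putting the pieces from note 1 together, a pattern for a space, an opening parenthesis, one or more digits and a closing parenthesis at the end of the string could be used as in this sketch. The area_name vector is illustrative only and is not read from nbhoods_shape_raw.

```r
library(stringr)

# Illustrative values in the same style as AREA_NAME
area_name <- c("Yonge-St.Clair (97)", "Agincourt North (129)")

# Remove " (digits)" only when it appears at the end of the string
neighbourhood_name <- str_remove(area_name, "\\s\\(\\d+\\)$")
```

After merging, checking that the result has 140 rows and no missing values in the joined columns is a quick way to confirm that every neighbourhood name matched.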
Data wrangling: part 3
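Our goal here is to use nbhoods_all from part 2 and create a new variable that indicates how each neighbourhood’s low income percentage and case rate compare with the medians across Toronto’s neighbourhoods.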
• Create a new dataset called nbhoods_final from nbhoods_all.
• Create two new variables med_inc and med_rate that are the median percentage of 18 to 64 year-olds
who are classified as low income and the median case rate per 100,000 people, across Toronto’s 140
neighbourhoods, respectively.
• Create a new variable called nbhood_type that takes the following values under the following conditions:
– “Higher low income rate, higher case rate” for neighbourhoods where the percentage of 18 to 64
year-olds who are low income is greater than or equal to the median percentage for Toronto
neighbourhoods and the number of cases per 100,000 people is also greater than or equal to
the median.
– “Higher low income rate, lower case rate” for neighbourhoods where the percentage of 18 to 64
year-olds who are low income is greater than or equal to the median percentage for Toronto
neighbourhoods and the number of cases per 100,000 people is lower than the median for Toronto
neighbourhoods.
– “Lower low income rate, higher case rate” for neighbourhoods where the percentage of 18 to 64
year-olds who are low income is less than the median percentage for Toronto neighbourhoods
and the number of cases per 100,000 people is greater than or equal to the median.
– “Lower low income rate, lower case rate” for neighbourhoods where the percentage of 18 to 64
year-olds who are low income is less than the median percentage for Toronto neighbourhoods and
the number of cases per 100,000 people is also lower than the median for Toronto neighbourhoods.
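The four conditions above map naturally onto case_when(). The sketch below uses invented data; the column name pct_low_income is hypothetical, and the labels follow the legend of the third map.

```r
library(dplyr)

# Hypothetical toy data (values are invented)
toy_nbhoods <- tibble(
  neighbourhood_name = c("A", "B", "C", "D"),
  pct_low_income = c(30, 25, 10, 5),
  rate_per_100000 = c(5000, 1000, 4000, 500)
) %>%
  mutate(
    # Medians across all neighbourhoods, repeated on every row
    med_inc = median(pct_low_income),
    med_rate = median(rate_per_100000),
    # Each row matches exactly one of the four conditions
    nbhood_type = case_when(
      pct_low_income >= med_inc & rate_per_100000 >= med_rate ~
        "Higher low income rate, higher case rate",
      pct_low_income >= med_inc & rate_per_100000 < med_rate ~
        "Higher low income rate, lower case rate",
      pct_low_income < med_inc & rate_per_100000 >= med_rate ~
        "Lower low income rate, higher case rate",
      pct_low_income < med_inc & rate_per_100000 < med_rate ~
        "Lower low income rate, lower case rate"
    )
  )
```

Here the medians are 17.5 and 2500, so neighbourhoods A to D fall into the four categories in the order listed.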
Data visualization
Create three maps:
1. One showing the percentage of 18 to 64 year-olds that are classified as low income in each of Toronto’s
140 neighbourhoods.
2. One showing the number of COVID-19 cases per 100,000 people in each neighbourhood.
3. One showing each neighbourhood coloured by its case rate and low income combination (nbhood_type).
Use the following ‘starter code’ to help you. You can add a fill command within an aes() command
in geom_sf() (simple feature geometry) to colour each neighbourhood. These example maps should NOT
appear in your final submission.
# Basic code
ggplot(data = nbhoods_final) +
  geom_sf() +
  theme_map()

# Example with each neighbourhood coloured
ggplot(data = nbhoods_final) +
  geom_sf(aes(fill = neighbourhood_name)) +
  theme_map() +
  theme(legend.position = "none")
Notes
• For the first figure, you can set the colour with the following line of code: scale_fill_gradient(name=
"% low income", low = "darkgreen", high = "lightgrey").
• For the second figure, you can use similar code to the above, but the colours should go from white
(low) to darkorange (high).
• For the third figure, use the palette called ‘Set1’ from RColorBrewer. (Hint: scale_fill_brewer()).
[Figure: map of Toronto neighbourhoods shaded by “% low income” (legend scale 10 to 30). Title: “Neighbourhoods of Toronto, Canada”. Subtitle: “Percentage of 18 to 64 year olds living in a low income family (2015)”. Caption: “Created by: for STA303/1002, U of T”, “Source: Census Profile 98-316-X2016001 via OpenData Toronto”, “Data as of January 17, 2021”.]
[Figure: map of Toronto neighbourhoods shaded by “Cases per 100,000 people” (legend scale 2000 to 6000). Title: “COVID-19 cases per 100,000, by neighbourhood in Toronto, Canada”. Caption: “Created by: for STA303/1002, U of T”, “Source: Ontario Ministry of Health, Integrated Public Health Information System and CORES”, “Data as of January 17, 2021”.]
[Figure: map of Toronto neighbourhoods coloured by nbhood_type. Legend title: “% of 18 to 64 year-olds in low income families and COVID-19 case rates”. Legend entries: “Higher low income rate, higher case rate”, “Higher low income rate, lower case rate”, “Lower low income rate, higher case rate”, “Lower low income rate, lower case rate”. Title: “COVID-19 cases per 100,000, by neighbourhood in Toronto, Canada”. Caption: “Created by: for STA303/1002, U of T”, “Income data source: Census Profile 98-316-X2016001 via OpenData Toronto”, “COVID data source: Ontario Ministry of Health, Integrated Public Health Information System and CORES”, “Data as of January 17, 2021”.]