MAS 627 - Homework 1
Your Name Here
Due Wednesday, September 2nd by Midnight
Please submit BOTH your .RMD file and the knitted PDF file to Blackboard.
Instructions
• One line of code per question (Parts 1 and 2).
• R output is enough for an answer; you do not need to type out the answer to each question as well.
• No entering numbers manually.
– Example: What percent of people like the color yellow?
– Good: mean(favColor=='Yellow') <- this will remain correct if the data changes
– Bad: 6/15, after looking at the data and determining that 6 of the 15 had yellow as their favorite color
– Bad: sum(favColor=='Yellow')/15 <- this will be incorrect if the data changes
• No unnecessary or irrelevant output in your document. Keep it organized, relevant, and well formatted.
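The difference between the good and bad answers in the yellow example can be seen in a small sketch. The favColor vector here is made up for illustration and is not part of the assignment data.

```r
# Hypothetical survey responses (made-up data, not from the assignment)
favColor <- c("Yellow", "Blue", "Yellow", "Red", "Yellow")

# A comparison yields a logical vector; its mean is the proportion of TRUEs
pctYellow <- mean(favColor == "Yellow")

# Suppose one more response arrives
favColor2 <- c(favColor, "Green")

# The hard-coded denominator is now stale, while mean() adapts on its own
hardCoded <- sum(favColor2 == "Yellow") / 5
robust <- mean(favColor2 == "Yellow")
```

Because mean() works out the denominator from the vector itself, the same line stays correct however many rows the data has.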
Part 1
stateData <- data.frame(state.x77, Region=state.region)
1. What is the dimension of this data set?
2. What variables does it contain?
3. Rename the variables Life.Exp and HS.Grad to LifeExp and HSGrad.
4. What is the mean population size?
5. What is the area of the United States?
6. How many states are in the ‘West’ region?
7. Use the table() function to see how many states are in each region.
8. What percent of states are in the ‘Northeast’ region?
9. What is the total area of the ‘North Central’ region?
10. In one line of code, determine the total area of each region.
11. Which states have the lowest illiteracy rate?
12. Which states in the South have above average income?
13. Which states have an area of over 100,000 square miles, life expectancies greater than 70 years, and
more than 50% high-school graduates?
14. Which 3 states have life expectancies over 73 years or murder rates below 2 per 100,000?
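Several of the questions above rely on the same few idioms: logical subsetting, matching a minimum, and grouped totals. The sketch below shows those idioms on the built-in mtcars data set, which stands in for stateData here. The variable names are illustrative only, and this is one possible approach rather than the expected answer to any question.

```r
# mtcars ships with R; it stands in for stateData in this sketch
cars <- mtcars

# Combine conditions with & (and) or | (or), then keep the matching row names
heavyAndStrong <- rownames(cars)[cars$wt > 5 & cars$hp > 200]

# Rows attaining the minimum of a column (all ties are returned)
lowestMpg <- rownames(cars)[cars$mpg == min(cars$mpg)]

# Total of one variable within each level of a grouping variable
hpByCyl <- tapply(cars$hp, cars$cyl, sum)
```

Each result comes from a single expression, in keeping with the one-line-per-question rule.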
Part 2
• Read in the Largest Companies by Revenue Wikipedia page using the htmltab package/function.
– Data can be found here - https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue
• Data contains information on the 50 largest companies by revenue.
• Convert the data into the format given below.
– Pay attention to variable types.
str(data)
'data.frame': 50 obs. of 7 variables:
 $ Rank     : chr "1" "2" "3" "4" ...
 $ Name     : chr "Walmart" "Sinopec Group" "State Grid" "China National Petroleum" ...
 $ Industry : chr "Retail" "Oil and gas" "Electricity" "Oil and gas" ...
 $ Revenue  : num 523964 407009 383906 379130 352106 ...
 $ Profits  : num 14881 6793 7970 4433 15842 ...
 $ Employees: num 2200000 582648 907677 1344410 83000 ...
 $ Country  : chr "United States" "China" "China" "China" ...
head(data)
  Rank                     Name    Industry Revenue Profits Employees
2    1                  Walmart      Retail  523964   14881   2200000
3    2            Sinopec Group Oil and gas  407009    6793    582648
4    3               State Grid Electricity  383906    7970    907677
5    4 China National Petroleum Oil and gas  379130    4433   1344410
6    5        Royal Dutch Shell Oil and gas  352106   15842     83000
7    6             Saudi Aramco Oil and gas  329784   88211     79000
        Country
2 United States
3         China
4         China
5         China
6   Netherlands
7  Saudi Arabia
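Scraped tables usually arrive with every column as character, so the numeric columns shown above need converting. A minimal sketch of one way to do that follows. The raw strings are hypothetical, and the actual formatting on the Wikipedia page may differ.

```r
# Hypothetical raw values as they might arrive from a scraped table
rawRevenue <- c("$523,964", "$407,009", "$383,906")

# Strip everything except digits, decimal points and minus signs, then convert
revenue <- as.numeric(gsub("[^0-9.-]", "", rawRevenue))

# Confirm the resulting type
str(revenue)
```

Checking the result with str() after each conversion makes it easy to compare against the target format.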
Additional Questions:
1. What is the average revenue by industry?
2. What proportion of the companies listed are in the Oil and Gas industry?
3. How many employees are employed by the 10 largest (by revenue) companies? Note that the data is
already sorted high to low by revenue.
4. Among these companies, what percent of total revenue does the financial industry capture?
5. What percent of oil and gas companies are based in the United States?
Part 3
The data for Part 3 comes from the Miami Dolphins schedule page on ESPN, located here - https://www.espn.com/nfl/team/schedule/_/name/mia.
It looks a bit hectic when you read it in, but if you look at it online you should see what is going on
(preseason games are at the top; the regular season starts about midway down). You will extract and
clean the regular season table.
• Don’t be afraid of trial and error. You can always re-read in the dataset if you accidentally
overwrite something.
• vs/@ in the Opponent variable corresponds with Home/Away
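The vs/@ convention can be turned into a Home/Away variable with pattern matching. The sketch below uses made-up opponent strings in that style. The exact spacing and prefixes in the real file are an assumption and should be checked against the data.

```r
# Hypothetical opponent strings in the vs/@ style described above
opponent <- c("@ New England", "vs Buffalo", "@ Jacksonville")

# Strings starting with "vs" are home games; the rest are treated as away
location <- ifelse(grepl("^vs", opponent), "Home", "Away")

# Drop the leading marker and any following spaces to leave the team name
team <- sub("^(vs|@)[[:space:]]*", "", opponent)
```

Anchoring the pattern with ^ keeps it from matching "vs" or "@" that might appear elsewhere in a string.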
I’m giving you a CSV file to read in, but if you are curious about pulling it directly from ESPN, here is the
rvest (... like “harvest”) code I used for it:
# This code is just for reference.
# Data is read-in in the next chunk.
library(rvest)
url <- 'https://www.espn.com/nfl/team/schedule/_/name/mia'
page <- read_html(url)
data <- data.frame(html_table(page, fill = TRUE))
str(data)
'data.frame': 16 obs. of 7 variables:
 $ WEEK    : chr "1" "2" "3" "4" ...
 $ DATE    : Date, format: "2020-09-13" "2020-09-20" ...
 $ OPPONENT: chr "New England" "Buffalo" "Jacksonville" "Seattle" ...
 $ TIME    : chr "1:00 PM" "1:00 PM" "8:20 PM" "1:00 PM" ...
 $ TV      : chr "CBS" "CBS" "NFL" "FOX" ...
 $ PRICE   : num 163 318 82 354 64 98 132 132 51 206 ...
 $ LOCATION: chr "Away" "Home" "Away" "Home" ...
data
   WEEK       DATE      OPPONENT    TIME  TV PRICE LOCATION
9     1 2020-09-13   New England 1:00 PM CBS   163     Away
10    2 2020-09-20       Buffalo 1:00 PM CBS   318     Home
11    3 2020-09-24  Jacksonville 8:20 PM NFL    82     Away
12    4 2020-10-04       Seattle 1:00 PM FOX   354     Home
13    5 2020-10-11 San Francisco 4:05 PM FOX    64     Away
14    6 2020-10-18        Denver 4:05 PM CBS    98     Away
15    7 2020-10-25   Los Angeles 1:00 PM CBS   132     Home
16    8 2020-11-01   Los Angeles 1:00 PM FOX   132     Home
17    9 2020-11-08       Arizona 4:25 PM CBS    51     Away
18   10 2020-11-15      New York 4:05 PM CBS   206     Home
20   12 2020-11-29      New York 1:00 PM CBS    41     Away
21   13 2020-12-06    Cincinnati 1:00 PM CBS   132     Home
22   14 2020-12-13   Kansas City 1:00 PM CBS   272     Home
23   15 2020-12-20   New England 1:00 PM CBS   197     Home
24   16 2020-12-27     Las Vegas     TBD       230     Away
25   17 2020-01-03       Buffalo 1:00 PM CBS    34     Away