COMP2550/COMP4450/COMP6445 - Quantitative/Qualitative
Methods Assignment
Jooyoung Lee (ANU)
11 March 2019
Maximum marks: 100
Programming language: R
Assignment questions: Post to the Wattle Discussion forum only.
Deadline: 31 March 2020, 11:59pm (online via Wattle)
Introduction
In this assignment, the questions are based on the lab materials but use a different dataset. Download the
new dataset from: https://drive.google.com/file/d/1-7s1ti1L_dHkBuHu10x53oRWpRu-0nnR/view
Find some information about the dataset. Each data entry is a Twitter user with other related features.
data_df <- read.csv("/Users/joolee/Downloads/dataset.csv", sep = ",", stringsAsFactors = FALSE)
str(data_df)
## 'data.frame': 554615 obs. of 13 variables:
## $ UserId : num 8.89e+07 9.64e+17 9.43e+06 3.55e+09 9.60e+17 ...
## $ UserName : chr "trudy gonzales \U0001f1fa\U0001f1f2" "La Tanya" "Mark Allerton" "Roop Singh Insan" ...
## $ ScreenName : chr "trudygonzales" "TanyaStebbing8" "MarkAllerton" "RoopSinghInsan1" ...
## $ BackgroundImgUrl: chr "http://abs.twimg.com/images/themes/theme1/bg.png" "http://abs.twimg.com/images/themes/theme1/bg.png" "http://abs.twimg.com/images/themes/theme2/bg.gif" "http://abs.twimg.com/images/themes/theme1/bg.png" ...
## $ DatePosted : chr "Sat Feb 08 05:11:07 +0000 2020" "Sat Feb 08 05:11:07 +0000 2020" "Sat Feb 08 05:11:07 +0000 2020" "Sat Feb 08 05:11:07 +0000 2020" ...
## $ Text : chr "b\"RT @KTforBiden: \\xe2\\x80\\x9cYou wanna beat trump and save democracy, Joe's your guy\\xe2\\x80\\x9d\\nYou "| __truncated__ "b\"RT @GeoffsNZViews: As I keep saying, the #ClimateChange handwringers are fundamentally anti-science but they"| __truncated__ "b'RT @colinmckerrache: An EV already has much lower lifetime CO2 emissions than even the most efficient hybrids"| __truncated__ "b'RT @insaharjinder2: #ServeDifferentlyAbled\\n@Gurmeetramrahim \\n\\n@derasachasauda\\n does 134 humanity work"| __truncated__ ...
## $ Hashtags : chr "[{'text': 'NeverBernie', 'indices': [115, 127]}, {'text': 'Warren', 'indices': [131, 138]}]" "[{'text': 'ClimateChange', 'indices': [41, 55]}]" "[]" "[{'text': 'ServeDifferentlyAbled', 'indices': [20, 42]}]" ...
## $ statusCount : int 98997 22175 23945 1249 820 112968 2209 77395 1822 340 ...
## $ FollowerCount : int 9241 839 719 8 145 180 238 1789 18 94 ...
## $ FriendsCount : int 9289 439 1306 41 699 938 279 2252 309 195 ...
## $ Retweets : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location : chr "" "Auckland Central, Auckland" "North Vancouver, British Columbia" "" ...
## $ Links : chr "[]" "[]" "[]" "[]" ...
Question1 (5 pts)
1.1 (5 pts) Find out how many unique Users, Locations, and Hashtags are in the given dataset. To get full marks,
enter the numbers in the table below and provide your code.
# User
# Location
# Hashtags
Question2 (15 pts)
The given dataset ranges from Feb. 8, 2020 to Feb. 10, 2020.
2.1 (5 pts) How many tweets (or retweets) were posted each day?
2.2 (5 pts) How many users were active (posting tweets/retweets) each day?
                  Feb08    Feb09    Feb10
# (re)Tweets
# Active users
2.3 (5 pts) How many users were active on all 3 days? For these users, plot a histogram of the number of
tweets/retweets they produced on each day.
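The per-day questions above depend on turning DatePosted into a day label. The following is only a minimal sketch on made-up rows, assuming the timestamp format shown in the str() output and an English locale for the day and month names; the names toy_df, posted_at and Day are illustrative and not part of the dataset.

```r
# Toy rows that mimic the DatePosted format; these values are invented.
toy_df <- data.frame(
  UserId = c(1, 2, 1, 3, 1),
  DatePosted = c(
    "Sat Feb 08 05:11:07 +0000 2020",
    "Sat Feb 08 23:59:59 +0000 2020",
    "Sun Feb 09 00:00:01 +0000 2020",
    "Sun Feb 09 12:30:00 +0000 2020",
    "Mon Feb 10 08:15:00 +0000 2020"
  ),
  stringsAsFactors = FALSE
)

# Parse the timestamp; %a and %b assume English day and month names.
posted_at <- as.POSIXct(toy_df$DatePosted,
                        format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Day label in UTC, e.g. "Feb08".
toy_df$Day <- format(posted_at, "%b%d", tz = "UTC")

# Number of rows (tweets or retweets) per day.
tweets_per_day <- table(toy_df$Day)

# Number of distinct users per day.
active_users_per_day <- tapply(toy_df$UserId, toy_df$Day,
                               function(ids) length(unique(ids)))

# Users that appear on every day present in the data.
n_days <- length(unique(toy_df$Day))
days_per_user <- tapply(toy_df$Day, toy_df$UserId,
                        function(d) length(unique(d)))
all_days_users <- names(days_per_user)[days_per_user == n_days]
```

Whether a day should be counted in UTC or in some local time zone is a modelling decision this sketch does not settle; it simply uses UTC because the sample timestamps carry a +0000 offset.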
Question3 (40 pts)
We are interested in testing the relationship between the statusCount and FriendsCount of each User. Our research
hypothesis is:
Ha: statusCount and FriendsCount of Users are positively correlated
3.1 (10 pts) Define the null hypothesis, then show and discuss the result of a t-test. Do we reject Ha?
3.2a (3 pts) Plot the distribution of Users by statusCount.
3.2b (3 pts) Plot the distribution of Users by FriendsCount.
3.3 (4 pts) Now plot the distributions of Users by FriendsCount and statusCount in a single plot.
3.4 (10 pts) (Qualitative) Describe the plot from 3.3. Do you think the plot is telling you something
different from the result of the t-test?
3.5 (10 pts) (Qualitative) Are there interesting correlations among other attributes that can be observed
in the given dataset? Can you explain them from a qualitative research perspective?
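As a reminder of how a correlation hypothesis such as Ha can be tested with a t statistic in R, here is a hedged sketch on synthetic numbers. It assumes Pearson's correlation via cor.test(), which reports a t statistic with n - 2 degrees of freedom; the vectors toy_friends and toy_status are simulated and say nothing about the real dataset, and other tests may be equally defensible.

```r
set.seed(1)

# Synthetic, skewed counts standing in for two user attributes.
n <- 200
toy_friends <- rexp(n, rate = 1 / 500)
toy_status <- 2 * toy_friends + rexp(n, rate = 1 / 1000)

# H0: the true correlation is not greater than 0.
# Ha: the true correlation is greater than 0 (one-sided test).
result <- cor.test(toy_status, toy_friends,
                   alternative = "greater", method = "pearson")

result$estimate   # sample correlation coefficient
result$statistic  # t statistic
result$parameter  # degrees of freedom, n - 2
result$p.value    # one-sided p-value
```

Count data of this kind is often heavy-tailed, so it may be worth comparing the result with a rank-based alternative such as method = "spearman" before drawing conclusions.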
Question4 (20 pts)
We are interested in the top 100 active users. Let’s define activeness as the ratio of statusCount to
FollowerCount:
$$\mathrm{activeScore}_u = \frac{\mathrm{statusCount}}{\mathrm{FollowerCount}}$$
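The ratio above has an edge case when FollowerCount is 0. A small sketch on invented users shows what R returns in that situation; the data frame toy_users is hypothetical, and dropping non-finite scores before ranking is only one possible choice, not a prescribed one.

```r
# Invented users; the column names follow the dataset, the values do not.
toy_users <- data.frame(
  ScreenName = c("a", "b", "c", "d"),
  statusCount = c(100L, 50L, 10L, 0L),
  FollowerCount = c(10L, 0L, 5L, 0L),
  stringsAsFactors = FALSE
)

# Ratio as defined above; x / 0 gives Inf and 0 / 0 gives NaN in R.
toy_users$activeScore <- toy_users$statusCount / toy_users$FollowerCount

# Assumption: keep only finite scores before ranking.
finite_users <- toy_users[is.finite(toy_users$activeScore), ]

# Top 100 by score (here only two users remain).
ranked <- finite_users[order(finite_users$activeScore, decreasing = TRUE), ]
top_users <- head(ranked, 100)
```

How users with zero followers are treated changes which users count as "top", so the choice is worth stating explicitly in any written answer.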
4.1 (5 pts) Now show a boxplot of activeScore of users and explain the result (plot).
4.2 (7 pts) (Qualitative) Is the definition of activeScore trustworthy? Justify with reference to credibility,
transferability, dependability, and confirmability.
4.3 (8 pts) (Qualitative) Can you provide a better definition of activeScore? Please justify why your
definition is better.
Question5 (20 pts)
We want to analyze usage of Hashtags.
5.1 (5 pts) What is the most frequent hashtag for all time? (ignore the numbers in brackets)
5.2 (5 pts) Show the frequency of the top 10 ‘Hashtags’. Decide what would be the most effective way of plotting
them and show the plot.
5.3 (5 pts) (Qualitative) We have already discussed the top 10 frequent hashtags. Describe other kinds of
top 10 hashtags which would be useful, for example, top 10 popular hashtags or top 10 sensitive hashtags.
Show your definition.
5.4 (5 pts) For each day, plot the top 10 frequent Hashtags along with the top 10 Hashtags of your definition
from 5.3. Briefly describe their differences in relation to spreadability.
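The Hashtags column stores each entry as a string of 'text' and 'indices' pairs, so the tag names need to be extracted before they can be counted. The following sketch shows one way to do that with base R regular expressions on sample strings copied from the str() output above, plus one invented row; the pattern assumes tag text is always wrapped in single quotes, which may not hold for every row.

```r
# Sample strings in the format of the Hashtags column (last row is invented).
toy_hashtags <- c(
  "[{'text': 'NeverBernie', 'indices': [115, 127]}, {'text': 'Warren', 'indices': [131, 138]}]",
  "[{'text': 'ClimateChange', 'indices': [41, 55]}]",
  "[]",
  "[{'text': 'ClimateChange', 'indices': [10, 24]}]"
)

# Find every "'text': '...'" fragment; rows with "[]" yield no matches.
pattern <- "'text': '[^']*'"
matches <- regmatches(toy_hashtags, gregexpr(pattern, toy_hashtags))

# Strip the surrounding key and quotes to keep only the tag name.
tags <- unlist(matches)
tags <- sub("^'text': '", "", tags)
tags <- sub("'$", "", tags)

# Frequency table, most frequent first; the 'indices' numbers are ignored.
tag_counts <- sort(table(tags), decreasing = TRUE)
top_tags <- head(tag_counts, 10)
```

Whether tags that differ only in letter case should be merged (for example with tolower()) is a separate decision that affects both the unique count and the ranking.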
Grading Scheme
The quantitative questions will be graded based on the correctness of the answer and the code. Qualitative
questions will be graded based on the quality of the analysis and the breadth and depth of the techniques tried. Full
marks will be given for formulations that provide well-reasoned and succinct explanations. Therefore, for
quantitative questions, provide your code along with the answers, and for qualitative questions, support your
answer with enough evidence.
Submission
Submit on Wattle an archive entitled assignQQ_.zip which contains:
* the PDF of the written document;
* the R code;
MS Word or other document formats are not accepted.