Where business comes to life

Business in Practice: Data Analytics

Week 8

Dr. Markos Kyritsis

OBJECTIVES – LEARNING OUTCOMES

• By the end of this week, you should be able to:

• Critically discuss the difference between correlation and causation

• Use a correlation test, and argue when it is suitable to use each one

• Pearson’s r or Spearman’s Rho

• Report the results of a correlation test

CORRELATIONS

WHAT IS A CORRELATION?

• A symmetrical relationship between two numeric variables

POSITIVE CORRELATION. AS X GOES UP, Y GOES UP

NEGATIVE CORRELATION. AS X GOES UP, Y GOES DOWN

STRONG VS WEAK CORRELATION

CORRELATION AND CAUSALITY

• Often confused. A symmetrical relationship does not imply a causal effect. A third variable (a mediator or a confounder) may be the actual cause of the bivariate relationship.

[Figure: diagram linking "GDP", "Declining Poverty" and "Quality of Life"]
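A small simulation can make the point above concrete. The sketch below is illustrative only: the variables x, y and z are hypothetical and randomly generated, not the GDP, poverty or quality-of-life data from the slide. A third variable z drives both x and y, so x and y correlate strongly even though neither affects the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(8)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical third variable z drives both x and y; x never affects y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)  # strong, although neither variable causes the other

# Remove z's contribution: what is left is independent noise.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)  # close to zero

print(round(r_xy, 2), round(r_rest, 2))
```

Once the third variable is accounted for, the apparent relationship between x and y largely disappears, which is why a correlation on its own cannot establish a causal effect.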

DIRECTIONAL RELATIONSHIPS

• Clearly there is a direction here (which of the two has a causative effect?):

[Figure: diagram linking "EU Refugee/Migrant crisis" and "Increase in % votes for far right"]

UNCLEAR RELATIONSHIPS

• The direction of this relationship may not be so clear:

• Are you more likely to become frustrated with an increase in the number of errors you make while using a system (e.g., a CRM)?

• Does frustration increase the number of errors you make while using a system?

[Figure: diagram linking "Number of errors" and "Frustration"]

NOT ALL CORRELATIONS ARE MEANINGFUL

• There are pages dedicated to finding correlations between seemingly unrelated variables.

Source url: https://www.tylervigen.com/spurious-correlations

CORRELATION

• By standardising the covariance we can get an idea of the true effect size of the relationship. This makes the measurement independent of variable scale and allows us to compare relationships between variables on any scale.

• The correlation coefficient (denoted as r) ranges from -1 to 1, with 0 being no linear relationship, 1 being the strongest possible positive relationship, and -1 being the strongest possible negative relationship.

• The formula for standardising the covariance is:

$$r = \frac{\mathrm{cov}_{xy}}{s_x \, s_y}$$

• Where $s_x$ and $s_y$ are the standard deviations of x and y.

• This coefficient is called the Pearson Correlation Coefficient.
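The standardisation described above can be sketched in a few lines of Python. This is a minimal illustration using made-up height and weight values (an assumption, not data from the module), with the usual n - 1 sample denominator. It shows that rescaling a variable changes the covariance but leaves r unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(xs):
    """Sample standard deviation: square root of a variable's covariance with itself."""
    return math.sqrt(covariance(xs, xs))


def pearson_r(xs, ys):
    """Covariance standardised by both standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Made-up data: the same heights expressed in metres and in centimetres.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55, 68, 70, 80, 85]
height_cm = [h * 100 for h in height_m]

# The covariance depends on the unit of measurement...
print(covariance(height_m, weight_kg), covariance(height_cm, weight_kg))
# ...but the correlation coefficient does not.
print(pearson_r(height_m, weight_kg), pearson_r(height_cm, weight_kg))
```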

PARAMETRIC ASSUMPTION FOR HYPOTHESIS TEST (PEARSON’S R)

• For the hypothesis test part, we use a z or t distribution. Therefore, we assume bivariate normality (so test both variables using Shapiro-Wilk, or look at plots for large n)

• If the assumption is violated, switch to a non-parametric test (Spearman’s Rho is the most popular).
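For the t-distribution route, a commonly used test statistic is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The slides do not show this formula, so treat the sketch below as an assumption about what statistical software typically computes. The values of r and n are illustrative and are not taken from the salaries dataset.

```python
import math


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 degrees of freedom)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Illustrative values only.
t = t_statistic(r=0.5, n=27)
print(round(t, 3), "with", 27 - 2, "degrees of freedom")

# The same small r gives a much larger t as the sample grows.
t_small_n = t_statistic(r=0.05, n=100)
t_large_n = t_statistic(r=0.05, n=10000)
print(round(t_small_n, 3), round(t_large_n, 3))
```

The statistic is then compared against a t distribution to obtain the p-value. That step is left to statistical software here, because the Python standard library has no t distribution.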

SALARY AND YEARS OF SERVICE

• In the salaries dataset, let’s check if there is a correlation between salary and years of service

BIVARIATE NORMALITY

PEARSON’S R OR SPEARMAN’S RHO?

• The bivariate normality assumption is violated (it’s actually not terrible, but let’s play it safe)

• So let’s use Spearman’s rho
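Spearman's rho can be understood as Pearson's r computed on ranks rather than raw values, which is why it is less sensitive to skew and outliers. The sketch below assumes that definition, with tied values sharing their average rank. The numbers are made up for illustration and are not the salaries dataset used in the slides.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up values with one very large salary.
years_of_service = [1, 3, 5, 8, 12, 20]
salary = [30000, 32000, 31000, 45000, 60000, 150000]

rho = spearman_rho(years_of_service, salary)
print(round(rho, 3))
```

Because only the ordering matters, the very large final salary has no more influence on rho than any other top-ranked value would.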

STEP 1

STEP 2

Hold the ‘Ctrl’ key to select both variables

STEP 3

REPORTING THE RESULTS

• There was a medium correlation between salary and years of service [rs = 0.43, p < 0.05] *

EFFECT SIZE ESTIMATES

Coefficient (absolute value, i.e., positive or negative)    Effect Size
|r| < 0.3                                                   Small
0.3 <= |r| < 0.5                                            Medium
|r| >= 0.5                                                  Large

It is possible to have a negligible but significant correlation, especially as sample size increases
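The thresholds above can be turned into a small helper that labels a coefficient and drafts a reporting sentence. This is a sketch: the function names `effect_size` and `report` are hypothetical, and the p-value text is passed in by the caller rather than computed.

```python
def effect_size(r):
    """Label a correlation coefficient using the thresholds in the table above."""
    size = abs(r)  # the sign gives direction, not strength
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


def report(var_a, var_b, rs, p_text):
    """Draft a one-sentence report for a Spearman correlation."""
    return (
        f"There was a {effect_size(rs)} correlation between {var_a} "
        f"and {var_b} [rs = {rs:.2f}, {p_text}]"
    )


print(report("salary", "years of service", 0.43, "p < 0.05"))
```

The label depends only on the size of the coefficient, so a coefficient that is significant in a very large sample can still be labelled small.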

SUMMARY

• Correlation is the symmetrical relationship between two variables

• Correlation is not necessarily causation

• Pearson’s r is the parametric test

• Spearman’s rho is non-parametric

• Report r or rs along with the p-value
