Lecture 1: Exploratory Data Analysis of Multivariate Data
1.1 Data organisation
1.2 Basic summaries
1.3 Visualisation
1.4 Software
UNSW MATH5855 2020T3 Lecture 1 Slide 1
1.1 Data organisation
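Representation
case (a.k.a. item, individual, or experimental trial): the unit of analysis; p ≥ 1 variables are recorded on each case.
$x_{ij}$: the ith (of p) variable observed on the jth (of n) case.
data matrix (p × n):
\[
X =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{in} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pj} & \cdots & x_{pn}
\end{pmatrix}
\tag{1.1}
\]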
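The layout in (1.1) can be sketched in code. The example below is a minimal illustration in plain Python (not the SAS or R used in the course), with made-up toy values; the names `entry` and `case` are hypothetical helpers introduced only for this sketch. Rows are variables and columns are cases, as in (1.1).

```python
# Hypothetical toy data: p = 2 variables (rows) observed on n = 4 cases (columns).
X = [
    [1.0, 2.0, 3.0, 4.0],  # variable 1 on cases 1..4
    [2.0, 4.0, 6.0, 9.0],  # variable 2 on cases 1..4
]

p = len(X)     # number of variables (rows of X)
n = len(X[0])  # number of cases (columns of X)


def entry(X, i, j):
    """Return x_ij using the 1-based indices of (1.1)."""
    return X[i - 1][j - 1]


def case(X, j):
    """Return the j-th case: the values of all p variables for one unit."""
    return [row[j - 1] for row in X]


print(p, n)            # 2 4
print(entry(X, 2, 4))  # 9.0
print(case(X, 3))      # [3.0, 6.0]
```

Note that many software packages store the transpose instead (cases in rows, variables in columns), so it is worth checking the orientation before computing summaries.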
1.2 Basic summaries
Univariate summaries
sample mean (of variable i):
\[
\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}
\]
sample variance (of variable i):
\[
s_i^2 = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2
\]
• Sometimes, we will use a divisor of n − 1 instead.
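As a rough illustration of the two univariate formulas, here is a minimal sketch in plain Python. The function names and the `divisor_n_minus_1` switch are assumptions made for this example, and the data are made up.

```python
def sample_mean(x):
    """Sample mean of one variable: (1/n) * sum of the observations."""
    return sum(x) / len(x)


def sample_variance(x, divisor_n_minus_1=False):
    """Sample variance with divisor n (default) or n - 1."""
    n = len(x)
    xbar = sample_mean(x)
    sum_sq = sum((v - xbar) ** 2 for v in x)  # sum of squared deviations
    return sum_sq / (n - 1 if divisor_n_minus_1 else n)


# Hypothetical observations of one variable on n = 4 cases.
x1 = [1.0, 2.0, 3.0, 4.0]

print(sample_mean(x1))                               # 2.5
print(sample_variance(x1))                           # 1.25 (divisor n)
print(sample_variance(x1, divisor_n_minus_1=True))   # 5/3, about 1.667
```

The two versions differ by the factor n/(n − 1), so the choice matters mostly for small n.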
Bivariate summaries
sample covariance (of variables i and k):
\[
s_{ik} = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)
\]
• Measures linear association only!
• Symmetric: $s_{ik} \equiv s_{ki}$.
sample correlation (of variables i and k):
\[
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}} \sqrt{s_{kk}}} \equiv \frac{s_{ik}}{s_i s_k}
\]
• A unitless measure.
• Also symmetric.
• Cauchy–Bunyakovsky–Schwarz inequality $\implies |r_{ik}| \le 1$.
• Also measures linear association only; for nonlinear association, the quotient correlation can be used instead.
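The covariance and correlation formulas can be sketched as follows, again in plain Python with made-up data and hypothetical function names. The second data set is included to illustrate the "linear association only" caveat: w is an exact function of u, yet the sample covariance is zero.

```python
import math


def sample_cov(x, y):
    """Sample covariance of two variables, with divisor n."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n


def sample_corr(x, y):
    """Sample correlation: s_ik / (sqrt(s_ii) * sqrt(s_kk))."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Hypothetical observations of two variables on the same n = 4 cases.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 9.0]
print(sample_cov(x1, x2))   # 2.875
print(sample_corr(x1, x2))  # about 0.994

# Perfect but nonlinear dependence: w = u squared.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [v ** 2 for v in u]
print(sample_cov(u, w))     # 0.0
```

Because the divisor cancels in the ratio, the correlation is the same whether n or n − 1 is used in the covariance and the variances, provided the same divisor is used throughout.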
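Calculations on matrix data
The descriptive statistics discussed so far are usually organised into arrays, namely:
Vector of sample means:
\[
\bar{x} = \begin{pmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \end{pmatrix}^{\top}
\]
Matrix of sample variances and covariances (p × p):
\[
S =
\begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix}
\tag{1.2}
\]
Matrix of sample correlations (p × p):
\[
R =
\begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}
\tag{1.3}
\]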
1.3 Visualisation
Graphical representations
Some simple characteristics of the data are worth studying before the actual multivariate analysis begins:
• drawing scatterplots of the data;
• calculating simple univariate descriptive statistics for each variable;
• calculating sample correlation and covariance coefficients; and
• linking multiple two-dimensional scatterplots.
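One common way of linking two-dimensional scatterplots is a scatterplot matrix, with one panel per pair of variables. The sketch below does no plotting; it only enumerates which panels such a display would contain, assuming hypothetical variable names. With p variables there are p(p − 1)/2 distinct pairs.

```python
from itertools import combinations


def scatterplot_pairs(names):
    """List every unordered pair of variables, one per scatterplot panel."""
    return list(combinations(names, 2))


# Hypothetical variable names for p = 3 variables.
names = ["height", "weight", "age"]
pairs = scatterplot_pairs(names)

for x_name, y_name in pairs:
    print(f"panel: {x_name} vs {y_name}")

print(len(pairs))  # 3, i.e. p * (p - 1) / 2 with p = 3
```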
1.4 Software
SAS: In SAS, the procedures used for this purpose are proc means, proc plot and proc corr. Please study their short descriptions in the included SAS handout.
R: In R, these are implemented in base::rowMeans, base::colMeans, stats::cor, graphics::plot, graphics::pairs, and GGally::ggpairs. Here, the format is PACKAGE::FUNCTION, and you can learn more by running
    library(PACKAGE)
    ?FUNCTION