
STAT7038 Regression Modelling
Semester 1 2025, Week 1

Course Information

Teaching staff
• Lecturer: Yuan Gao ([email protected])
• Consultation: Wednesday 2-3 pm (Zoom link) or after each lecture
• Tutors:
  • Houren Hong (head tutor): [email protected]
  • Ruby Turner: [email protected]
  • Leo (Tiancheng) Huang: [email protected]
  • Chuang Xu: [email protected]
  • Rui Shen: [email protected]
• Tutors' consultation hours will be updated on Wattle.

Zoom
• You need to log in to Zoom with your school account.
• The Zoom client needs to be installed.
• See information on: https://services.anu.edu.au/information-technology/software-systems/anu-zoom-client

Communication
• Use consultation time.
• Use the discussion forum on Wattle.
• To be fair to all students, the lecturer will read but may not be able to reply to emails about course materials (except for private questions). Instead, please post such questions in the discussion forum, where the lecturer and tutors will reply.
• Please get in touch with the lecturer about issues and concerns including grades, illness, falling behind, and academic accessibility.

Tutorials
• Begin in Week 2.
• Read through the tutorial sheet and attempt the questions before going to the tutorial.
• Best opportunity to learn the skills and techniques that will be required in the assignments.
• Your tutors are your main source of help.

Textbook
• The required textbook for this course is: Applied Linear Regression Models, 4th ed., by Michael H. Kutner et al.
• Free ebook from the ANU library: link on Wattle.
• Linear Models with R by Julian J. Faraway is another good resource.

Wattle site
• Access for all enrolled students
• Course announcements
• Lecture resources
• Echo360 lecture recordings
• Data sets
• Tutorial questions, selected solutions
• Assessments
• Please check this site frequently!

Assessment
• Must be completed independently!

Assessment Task      Value   Due Date
Online Quiz          5%      Week 5
Assignment 1         15%     Week 6
Assignment 2         15%     Week 11
Final Examination    65%     Central Exam Period

Introduction to R and RStudio

R and RStudio
• R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.
• Please see the course website for installation instructions for R and RStudio (choosing English in the language options is suggested).
• You may attempt Tutorial Week 2 - Intro to R before your first tutorial.
• Learn the R cheatsheet, pp. 1-4.
• This course ≠ R: the more important thing in this course is to understand statistical concepts.

Your R project
• Set your working directory.
• Write your code in an R script file.
• Import data from an external file.
  • "read" functions
• How to get help in R?
  • ? or ??
  • Google!

Data types in R
• Three basic types:
  • Numeric (numbers)
  • Character (names)
  • Logical (TRUE / FALSE)

Data structures
• R operates on named data structures.
• A vector is a single entity consisting of an ordered collection of values of the same type (for example, numbers).
• Matrices, or more generally arrays, are multi-dimensional generalisations of vectors.
• Lists are a general form of vector in which the various elements need not be of the same type and are often themselves vectors or lists.
• Data frames are matrix-like structures in which the columns can be of different types. Think of data frames as "data matrices" with one row per observation but with (possibly) both numerical and categorical variables.

• Give meaningful names to your data.

R packages
• In R, a package is a structured collection of R functions, data, and compiled code that enhances the capabilities of the base R environment.
• R comes with a standard set of packages, and many more are available for download and installation. Once installed, a package's contents can be made available in the current R session by loading the package.
  • install.packages("x")
  • library("x")
• The Comprehensive R Archive Network (CRAN) is a central repository that hosts a vast array of R packages contributed by users worldwide.
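
The R basics above (importing data, the three basic types, the main data structures, and loading a package) can be sketched in a few lines. This is only an illustration under stated assumptions: the file `heights.csv`, its columns, and the object names are made up for the example, and a temporary folder stands in for your own working directory.

```r
# Assumption: a throwaway CSV in a temporary folder stands in for your own data file.
# In your own project you would first call setwd() on your project folder.
csv_path <- file.path(tempdir(), "heights.csv")
writeLines(c("name,height_cm,is_adult",
             "Ann,165.2,TRUE",
             "Bob,180.5,TRUE",
             "Cat,120.0,FALSE"), csv_path)

# Import data from an external file with a "read" function.
heights <- read.csv(csv_path, stringsAsFactors = FALSE)

# The three basic data types, one per column of this (hypothetical) file.
class(heights$height_cm)   # numeric
class(heights$name)        # character
class(heights$is_adult)    # logical

# Data structures, each given a meaningful name.
height_vec <- c(165.2, 180.5, 120.0)                  # vector
height_mat <- matrix(1:6, nrow = 2, ncol = 3)         # matrix (2 rows, 3 columns)
student    <- list(name = "Ann",                      # list: elements of mixed type
                   scores = c(80, 92),
                   enrolled = TRUE)
str(heights)                                          # data frame: columns of mixed type

# Packages: install once (needs internet, so left commented out), then load per session.
# install.packages("MASS")
library(stats)   # 'stats' ships with R, so loading it always works

# Help (run interactively): ?read.csv opens the help page, ??regression searches all help.
```

Note that R is case-sensitive, so `library("x")` works while `Library("x")` does not.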

Revision on Basic Statistics

Population & sample
• Population (true world)
  • A collection of the whole of something
  • Parameters: true values describing the population
  • E.g. $\mu$, $\sigma^2$
  • Unknown
• Sample (your subjective world)
  • A set of individuals drawn from a population
  • Statistics: calculated from the sample, serving as estimates of the parameters
  • E.g. $\bar{x}$, $s^2$
  • Known

Properties of estimators
• Random variables (e.g. $\bar{X}$)
• Probability distribution
  • $E(\bar{X}) = \mu$
  • $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
• Central Limit Theorem (CLT)
  • $\bar{X}$ is asymptotically normally distributed
• Make inferences
  • Confidence interval
  • Hypothesis testing

Linear Regression

Regression analysis
• Statistical methodology that utilises the relation between two or more quantitative variables so that a response or outcome variable can be predicted from the other (or others).
• This methodology is widely used in business, the social and behavioural sciences, the biological sciences, and many other disciplines.
• Examples
  • Predict sales of a product using the relationship between sales and the amount spent on advertising. (Simple linear regression, SLR)
  • Predict the performance of an employee using the relationship between performance and aptitude test score. (SLR)
  • Predict the size of a child's vocabulary using the relationship between vocabulary size, the age of the child, and the amount of education of the parents. (Multiple linear regression, MLR)

Relation between Variables
• We should distinguish between a functional relation and a statistical relation between variables.
• A functional relation between two variables is expressed as a mathematical formula, $Y = f(X)$.
• A functional relation is a "perfect" mapping from $X$ to $Y$.
• A statistical relation is not perfect. The observations do not fall directly on the curve of relationship; they are typically scattered around it.

Regression models
• Historical origins
  • The term regression was first used by Francis Galton in the late 19th century to explain a biological phenomenon he observed: "regression towards the mean".
  • The heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group.

Galton Families Dataset
• This data set lists the individual observations for 934 children in 205 families on which Galton (1886) based his cross-tabulation.
• How to formally describe the relationship?

Construction of regression models
• Selection of variables
  • $X$: independent variable, predictor, regressor, covariate
  • $Y$: dependent variable, response, outcome, output
  • Only a limited number of useful covariates should be included in the regression model.
  • How do you choose? Through exploratory studies, theory, etc.
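
To make several ideas from this part concrete (a sample drawn from a population, a statistical rather than functional relation, and regression towards the mean), here is a small simulation. It is only a sketch: the heights are simulated, not Galton's data, and the values of `mu`, `sigma`, and `rho` are made-up assumptions chosen for illustration.

```r
set.seed(7038)  # arbitrary seed so the run is reproducible

# Assumed "population" parameters (illustrative values, not Galton's).
mu    <- 170   # mean height (cm) of both parents and children
sigma <- 7     # standard deviation of heights (cm)
rho   <- 0.5   # correlation between parent and child height

# Draw a sample of n parent-child pairs from that population.
n      <- 500
parent <- rnorm(n, mean = mu, sd = sigma)
child  <- mu + rho * (parent - mu) +
  rnorm(n, mean = 0, sd = sigma * sqrt(1 - rho^2))

# Sample statistics: estimates of the parameters (known here only because we simulated).
c(mean = mean(child), sd = sd(child), cor = cor(parent, child))

# Exploratory look: the points scatter around a trend instead of lying on a curve.
plot(parent, child, xlab = "Parent height (cm)", ylab = "Child height (cm)")

# Regression towards the mean: compare group averages for tall and short parents.
tall  <- parent > mu + sigma
short <- parent < mu - sigma
c(tall_parents  = mean(parent[tall]),  their_children = mean(child[tall]))
c(short_parents = mean(parent[short]), their_children = mean(child[short]))
```

In this simulated setting, children of tall parents are on average taller than `mu` but shorter than their parents, and the reverse holds for short parents.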

Construction of regression models
• Functional form of regression relation
  • The choice of $f$ in the functional form $Y = f(X)$ is tied to the choice of covariate(s).
  • Sometimes the relevant theory may indicate the appropriate form for $f$.
  • Typically $f$ needs to be determined empirically from the data. A scatter plot may help.
  • Linear or quadratic regression functions are often a good first approximation.

Construction of regression models
• Scope of model
  • We usually need to restrict the coverage of the model to some interval or region of values.
  • The scope is determined either by the design of the investigation or by the range of data at hand.
  • The model may perform badly on previously unobserved data.

Use of regression
• Regression serves three major purposes:
  • Description (how one variable influences the other)
  • Control (set standards, monitor operations, etc.)
  • Prediction (given new observations)
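
The points about functional form and scope can be illustrated with simulated data. This is a sketch under assumptions: the "true" relation below is an invented quadratic, the data are observed only for `x` between 0 and 10, and all object names are hypothetical.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Assumed true relation (quadratic), observed only for x between 0 and 10.
x   <- runif(100, min = 0, max = 10)
y   <- 2 + 0.5 * x + 0.3 * x^2 + rnorm(100, sd = 2)
dat <- data.frame(x = x, y = y)

# A scatter plot may help suggest the functional form.
plot(dat$x, dat$y, xlab = "x", ylab = "y")

# Two candidate forms for f: linear and quadratic.
fit_linear    <- lm(y ~ x, data = dat)
fit_quadratic <- lm(y ~ x + I(x^2), data = dat)

# Within the observed range, compare the residual sums of squares.
rss <- c(linear    = sum(resid(fit_linear)^2),
         quadratic = sum(resid(fit_quadratic)^2))
rss

# Scope: predicting at x = 20, far outside the observed range.
new_x     <- data.frame(x = 20)
true_mean <- 2 + 0.5 * 20 + 0.3 * 20^2
pred_lin  <- unname(predict(fit_linear, newdata = new_x))
pred_quad <- unname(predict(fit_quadratic, newdata = new_x))
c(truth = true_mean, linear = pred_lin, quadratic = pred_quad)
```

The linear fit can look like a reasonable first approximation inside the observed range and still be far from the true mean once it is used outside that range.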

Regression and causality
• The existence of a statistical relation between the response $Y$ and the covariate $X$ does not imply in any way that $Y$ depends causally on $X$.
  • (correlation ≠ causation)
• Funny examples?
  • Do high ice-cream sales lead to more drownings?
• Reverse causality: does $X$ lead to $Y$, or does $Y$ lead to $X$?
• To reach causal conclusions, experimental studies should be conducted.

Simple linear regression model (SLR)

Formal statement of the SLR
• One predictor variable
• Linear
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \dots, n$
• Where
  • $Y_i$: the value of the response variable in the $i$th trial.
  • $\beta_0$ and $\beta_1$ are parameters.
  • $X_i$: a known constant, the covariate value in the $i$th trial.
  • $\varepsilon_i$: a random error term with mean 0 and variance $\sigma^2$ for all $i$.
  • $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated for all $i \neq j$.

Important features of SLR
• The response $Y_i$ is a random variable since it is the sum of two components:
  • The constant term $\beta_0 + \beta_1 X_i$
  • The random error $\varepsilon_i$
• Since $E(\varepsilon_i) = 0$, it follows that
  $E(Y_i) = E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \beta_0 + \beta_1 X_i + E(\varepsilon_i) = \beta_0 + \beta_1 X_i$
• $Y_i$ is a random variable whose probability distribution has mean $E(Y_i) = \beta_0 + \beta_1 X_i$.
• It is more reasonable to describe the linear regression model as $E(Y) = \beta_0 + \beta_1 X$.
• $Y_i$ is a random variable whose probability distribution has variance
  $\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \varepsilon_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2$,
  because $\beta_0 + \beta_1 X_i$ is a constant.
• Our model assumes that the $Y_i$'s come from probability distributions with mean $\beta_0 + \beta_1 X_i$ and variance $\sigma^2$.

[Figure: the distribution of $Y_i$]

Regression parameters
• The parameters $\beta_0$ and $\beta_1$ are called regression coefficients.

• The intercept: $\beta_0$
• The slope: $\beta_1$
• The slope gives the change in the mean of $Y$ per unit increase in $X$.
• The intercept (when the scope of the model includes $X = 0$) gives the mean of the probability distribution of $Y$ at $X = 0$.

Fitting the model
• Data generated from a true model (unknown): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• What we observe:
  • Only $n$ pairs of values $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$
• Find the best estimated model: $\hat{Y}_i = b_0 + b_1 X_i$,
  • meaning finding a straight line that is "closest" to all the observed data points.

Fitting the model: method of least squares
• What is the "closest" straight line to all observed data points?
• For the observation $(X_i, Y_i)$ in each case, we consider the deviation of $Y_i$ from its expected value: $Y_i - (\beta_0 + \beta_1 X_i)$
• The method of least squares considers the sum of the $n$ squared deviations:

  $Q = \sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2$
• The estimators of $\beta_0$ and $\beta_1$ are the values $b_0$ and $b_1$ that minimise $Q$ given the observation pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$.

Properties of LS estimators
• Unbiased: $E[b_0] = \beta_0$, $E[b_1] = \beta_1$
• Minimum variance
  • More precise/efficient than any other linear unbiased estimator.

More than fitting a model: what needs to be considered in real practice?
• What is your question of interest?
  • Statistical formulation of the question.
• Source of the data
  • Sample size, data cleaning (such as combining data from different sources and checking for missing data), data mining (too many variables)
• Exploratory Data Analysis
  • Summary statistics, boxplots, histograms, scatterplots, etc.
• What model should be used?
  • Linear/non-linear, simple regression/multiple regression
• Fitting a model is the easy part.
• Consider the appropriateness of the model.
  • Ensuring the assumptions are met.
  • Diagnostics for a model to check for validity and significance.
  • Remedies for violations of assumptions.
• Finally, make inferences and predictions.

• Read Ch 1.1-1.6 of the textbook.
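
As a closing illustration of the simple linear regression model and the method of least squares, the following sketch simulates data from an assumed "true" model and fits it. The parameter values and object names are invented for the example. The closed-form expressions for `b1` and `b0` are the standard least-squares solution for a single predictor; they are not stated in the slides and are included here only as a cross-check against `lm()`.

```r
set.seed(2025)  # arbitrary seed so the run is reproducible

# Assumed "true" model (unknown in practice): Y_i = beta0 + beta1 * X_i + eps_i
beta0 <- 10
beta1 <- 2
sigma <- 3
n     <- 50
x <- seq(1, 20, length.out = n)                  # known constants X_i
y <- beta0 + beta1 * x + rnorm(n, sd = sigma)    # errors with mean 0, variance sigma^2

# Least-squares criterion: sum of squared deviations for a candidate line.
Q <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Standard closed-form minimisers of Q for one predictor.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() performs the same least-squares fit.
fit <- lm(y ~ x)
coef(fit)
c(b0 = b0, b1 = b1)

# The fitted line has a Q no larger than the true line has on this sample.
c(Q_fitted = Q(b0, b1), Q_true = Q(beta0, beta1))

# Unbiasedness, checked by simulation: average the slope estimate over many samples.
slopes <- replicate(2000, {
  y_sim <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y_sim ~ x))[["x"]]
})
mean(slopes)

# Basic diagnostics: fitted line, then residuals against fitted values.
plot(x, y)
abline(fit)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With real data the true parameters are unknown, so only the fitted coefficients and the diagnostic plots are available.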
