ETC2420/ETC5242 Statistical Thinking 2020
Learning goals and additional points, week by week

Each workshop week below lists its learning goals, followed by the additional specific areas of focus for the exam.

Week 1: Introduction to R and RStudio
Learning goals:
• Learn how to set up R and RStudio on your own device.
• Learn to install and load R packages.
• Learn what RMarkdown files and reproducible research are.
• Learn what 'the tidyverse' is.
• Learn some basic R commands to manipulate and plot data.
Additional specific areas of focus for exam:
• Recognise R code
• Be able to explain what reproducibility means and why it is important
• head()
• tibble::glimpse()

Week 2: Introduction to data, visualisation and wrangling
Learning goals:
• Identify types of variables, summarise them appropriately, and characterise relationships between them.
• Describe scientific data collection principles.
• Classify variables as being numerical or categorical.
• Illustrate 'tidy data' organisational principles.
• Produce descriptive summaries of numerical and categorical data using appropriate ggplot2, tidyr and dplyr functions.
Additional specific areas of focus for exam:
• Know the meaning of commonly used descriptive statistics and types of plots for different types of data, e.g. histograms, kernel density plots, bar plots, column plots

Week 3: Randomisation and simulation for testing proportions
Learning goals:
• Explain terms relevant to statistical hypothesis testing and inference problems
• Demonstrate the sampling distribution of a statistic
• Construct a randomisation test for independence of two binary variables
• Build parametric tests for one and two proportions using the Central Limit Theorem
• Reiterate the framework for frequentist inference
Additional specific areas of focus for exam:
• Permutations
• Sampling without replacement
• Permutation test for simulation consistent with H0: p1 = p2
• Recognise prop.test() function and its use for one sample, and two independent samples (not paired)
• Construction of CLT-based confidence intervals
• One-sided and two-sided alternative hypotheses

Weeks 4/5: Resampling techniques for assessing variability in means
Learning goals:
• Review the Central Limit Theorem
• Apply one and two sample t-tests and confidence intervals
• Build Bootstrap confidence interval for numerical data
• Distinguish between independent and paired samples
Additional specific areas of focus for exam:
• Recognise t.test() function and its use for one sample, and two samples (both paired and independent)
• Construction of CLT-based confidence intervals
• Sampling with replacement
• Bootstrap sampling distribution
• Bootplot.f()
• Use of simulation to understand methodology
• Interpretation of confidence intervals, p-values
• Permutation test for independent means, H0: µ1 = µ2

Week 6: Distributional models and maximum likelihood
Learning goals:
• Apply elementary probability and conditional probability rules
• Identify common discrete and continuous univariate distributions
• Develop distributional models for i.i.d. data and estimate them using maximum likelihood methods
• Use CLT- and Bootstrap-based confidence intervals to characterise uncertainty in MLEs
Additional specific areas of focus for exam:
• Use of MASS::fitdistr() function and interpretation of its output
• How to obtain the "fitted" theoretical distribution using an estimate of the parameters (e.g. the MLE)
• Use N(0,1) quantiles for CLT test/confidence intervals
• Implement bootstrap for both scalar and vector-valued parameters
• Interpretation of MLE-based confidence intervals, p-values

Week 7: Updating discrete probabilities
Learning goals:
• Discuss model assessment tools for distributions fitted using MLE
• Transition to Bayesian Statistical Thinking
• Apply Bayes theorem in discrete cases
Additional specific areas of focus for exam:
• Different strategies for QQ-plots
• Application of Bayes theorem to a spell-checking algorithm
• Continuous density for Y|θ with discrete prior for θ is still a discrete application of Bayes theorem
• Denominator of Bayes theorem is a constant (in terms of θ)

Week 8: Bayesian inference for numerical data and Decision rules
Learning goals:
• Review Bayesian statistical thinking
• Apply Bayes theorem with conjugate continuous priors
• Consider loss functions and decision rules
• Construct credibility factors
Additional specific areas of focus for exam:
• Bayesian A/B testing (Application of 2 independent Beta-Binomials)
• Denominator of Bayes theorem is a constant function of θ
• Use of simulation in place of analytical posterior
• Relationship between A/B testing and 2 independent proportions test

Week 9: Regression models
Learning goals:
• Synthesise the Bayesian approach
• Compare frequentist and Bayesian inference
• Recognise when transformations may be required
• Review frequentist simple linear regression
• Diagnose problems with a regression model
Additional specific areas of focus for exam:
• Fit MLE to Olympic medal count data
• Relationship between Lognormal and Normal
• Adding a constant before taking a log (if zeroes in data)
• Checking MLE fit using QQ-plots
• MLE-based prediction distribution
• Check regression fit using residual plots, LOOCV
• R-squared
• broom::tidy(), glance(), augment()
• Leverage and Cook's D

Week 10: Multiple Linear Regression
Learning goals:
• Apply multiple linear regression models
• Diagnose issues related to multi-collinearity
• Apply model performance measures
• Formulate a general strategy for building a regression model
Additional specific areas of focus for exam:
• Techniques to select model regressors
• Multicollinearity and the variance inflation factor
• ggscatmat()
• meifly::fitall()
• adjusted R-squared
• AIC and negAIC
• BIC and negBIC

Week 11: Bayesian Multiple Linear Regression
Learning goals:
• Introduce Markov chain Monte Carlo methods
• Apply Bayesian multiple regression models
• Consider Bayesian ensembles
Additional specific areas of focus for exam:
• Conditionally conjugate N times independent IG prior for regression model
• MCMCpack::MCMCregress
• Posterior trace plots
• Bayesian prediction averages conditional predictions with respect to posterior of θ
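
Worked sketch for Week 3 (permutation test and prop.test()). The following is a minimal illustration, not the course's own code, of a permutation test consistent with H0: p1 = p2 next to the CLT-based prop.test(). The counts (22 of 40 and 18 of 50), the seed and the helper name diff_prop are all hypothetical and chosen only for illustration.

```r
set.seed(2020)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical data: a binary outcome (1 = success) in two independent groups
group   <- rep(c("A", "B"), times = c(40, 50))
outcome <- c(rep(c(1, 0), times = c(22, 18)),   # group A: 22 successes out of 40
             rep(c(1, 0), times = c(18, 32)))   # group B: 18 successes out of 50

# Test statistic: difference in sample proportions (A minus B)
diff_prop <- function(y, g) mean(y[g == "A"]) - mean(y[g == "B"])
obs_diff  <- diff_prop(outcome, group)

# Under H0: p1 = p2 the group labels are exchangeable, so we shuffle them.
# sample(group) is a permutation, i.e. sampling WITHOUT replacement.
n_perm     <- 5000
perm_diffs <- replicate(n_perm, diff_prop(outcome, sample(group)))

# Two-sided p-value: share of shuffled differences at least as extreme as observed
p_perm <- mean(abs(perm_diffs) >= abs(obs_diff))

# CLT-based test and confidence interval for two independent proportions
clt <- prop.test(x = c(22, 18), n = c(40, 50), alternative = "two.sided")

p_perm
clt$p.value
clt$conf.int
```

A one-sided alternative can be requested in prop.test() with alternative = "greater" or alternative = "less"; the permutation p-value would then count only differences in the corresponding direction.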
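
Worked sketch for Weeks 4/5 (bootstrap and t.test()). As a rough illustration of sampling with replacement and the bootstrap sampling distribution, the sketch below builds a percentile bootstrap confidence interval for a mean and compares it with the interval from t.test(). The simulated sample, the seed and the number of resamples are assumptions made for this example; the course helper Bootplot.f() is not used here because its definition is not given in this outline.

```r
set.seed(2420)  # arbitrary seed, for reproducibility of the sketch only

# Hypothetical skewed sample standing in for real numerical data
x <- rexp(60, rate = 1 / 5)

# Bootstrap: resample the data WITH replacement and recompute the mean each time
B          <- 5000
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the population mean
ci_boot <- quantile(boot_means, probs = c(0.025, 0.975))

# Interval from the one-sample t-test, for comparison
ci_t <- t.test(x, conf.level = 0.95)$conf.int

ci_boot
ci_t
```

A histogram of boot_means gives a picture of the bootstrap sampling distribution; with a sample of this size the two intervals would typically be close but need not coincide.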
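
Worked sketch for Week 6 (MASS::fitdistr()). The sketch below shows one way the Week 6 items could fit together: fit a distribution by maximum likelihood, form a CLT-based interval from N(0,1) quantiles, evaluate the "fitted" density at the MLE, and bootstrap the estimator. It assumes the MASS package is installed, and it uses a simulated exponential sample chosen purely for illustration.

```r
set.seed(6)  # arbitrary seed, for reproducibility of the sketch only

# Hypothetical i.i.d. sample; the true rate of 0.5 is an assumption of this example
x <- rexp(200, rate = 0.5)

# Maximum likelihood fit; fitdistr() returns estimates and standard errors
fit      <- MASS::fitdistr(x, densfun = "exponential")
rate_hat <- unname(fit$estimate["rate"])
se_hat   <- unname(fit$sd["rate"])

# CLT-based 95% confidence interval using N(0,1) quantiles
ci_clt <- rate_hat + c(-1, 1) * qnorm(0.975) * se_hat

# "Fitted" theoretical density: plug the MLE into the exponential density
grid           <- seq(0, max(x), length.out = 200)
fitted_density <- dexp(grid, rate = rate_hat)

# Bootstrap the MLE (for the exponential model the MLE of the rate is 1 / mean)
boot_rates <- replicate(2000, 1 / mean(sample(x, replace = TRUE)))
ci_boot    <- quantile(boot_rates, probs = c(0.025, 0.975))

ci_clt
ci_boot
```

For a vector-valued parameter (for example the shape and rate of a gamma model) the same resampling loop would store a vector of estimates per resample instead of a single number.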
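
Worked sketch for Week 8 (Bayesian A/B testing). To illustrate the Week 8 items, the sketch below treats A/B testing as two independent Beta-Binomial models and uses simulation from the posteriors in place of an analytical answer. The conversion counts, the Beta(1, 1) priors and the seed are hypothetical choices for this example, not values from the course.

```r
set.seed(5242)  # arbitrary seed, for reproducibility of the sketch only

# Hypothetical A/B data: successes out of trials for each variant
x_a <- 45; n_a <- 300
x_b <- 60; n_b <- 310

# Assumed prior: independent Beta(1, 1), i.e. uniform, on each success probability
a0 <- 1; b0 <- 1

# Conjugacy: a Beta(a0, b0) prior with a Binomial likelihood gives a
# Beta(a0 + x, b0 + n - x) posterior, so we can sample from it directly
n_sim   <- 10000
theta_a <- rbeta(n_sim, a0 + x_a, b0 + n_a - x_a)
theta_b <- rbeta(n_sim, a0 + x_b, b0 + n_b - x_b)

# Posterior probability that B has the higher success probability
prob_b_better <- mean(theta_b > theta_a)

# 95% credible interval for the difference theta_b - theta_a
cred_int <- quantile(theta_b - theta_a, probs = c(0.025, 0.975))

# Frequentist counterpart: test of two independent proportions
freq <- prop.test(x = c(x_a, x_b), n = c(n_a, n_b))

prob_b_better
cred_int
freq$p.value
```

The two analyses answer different questions: prob_b_better is a posterior probability about the parameters, while the prop.test() p-value is computed under H0: p1 = p2.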
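
Worked sketch for Weeks 9 and 10 (regression diagnostics). The sketch below uses only base R to illustrate leverage, Cook's D, LOOCV, the variance inflation factor and the usual model performance measures. The built-in mtcars data and the model mpg ~ wt + hp are stand-ins chosen for this example; the course's own data and the broom and meifly helpers are not used here.

```r
# mtcars ships with base R and is used here only as a stand-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Leverage (diagonal of the hat matrix) and Cook's distance for each observation
h     <- hatvalues(fit)
cooks <- cooks.distance(fit)

# LOOCV for a linear model via the shortcut e_i / (1 - h_i):
# this equals the prediction error for observation i when it is left out
loocv_resid <- residuals(fit) / (1 - h)
loocv_mse   <- mean(loocv_resid^2)

# Brute-force check: refit the model n times, leaving one row out each time
loocv_brute <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ])
})

# Variance inflation factor for wt: 1 / (1 - R^2) from regressing wt
# on the other regressors (here just hp)
vif_wt <- 1 / (1 - summary(lm(wt ~ hp, data = mtcars))$r.squared)

# Model performance measures
c(r2     = summary(fit)$r.squared,
  adj_r2 = summary(fit)$adj.r.squared,
  AIC    = AIC(fit),
  BIC    = BIC(fit))
```

Plotting residuals(fit) against fitted(fit) gives the basic residual plot; large values of h or cooks flag observations worth a closer look.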