
Lecture 1
ECON 2100, FALL 2024

Overview
• Population vs Sample
• Methods of Sampling
• Types of Variables
• Data Visualization
• Descriptive Statistics
• Population Parameters

Population vs. Sample
• POPULATION: A population contains all of the items or individuals of interest that we seek to study.
• SAMPLE: A sample contains only a portion of a population of interest.

Population vs. Sample
Population: All the items or individuals about which we want to draw conclusion(s).
Sample: A portion of the population of items or individuals.

Say we wish to find the fraction of students at RPI who speak Spanish: the entire student body at RPI is the population, while the Math majors alone would be a sample.

Lucio wants to know whether the food he serves in his restaurant is within a safe range of temperatures. He randomly selects 70 entrees and measures their temperatures just before he serves them to his customers. Identify the population and the sample:
a. The population is all of the hot entrees Lucio serves; the sample is the entrees that are a safe temperature.
b. The population is the 70 selected entrees; the sample is the entrees that are a safe temperature.
c. The population is all of the entrees Lucio serves; the sample is the 70 selected entrees.
Answer: c.

Probability Sample: Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or from computer random number generators.
  - We will see how to do this in R.
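The course itself uses R for this; purely as an illustration, the two selection modes described above can be sketched with Python's standard library. The function name `simple_random_sample` and the frame of 850 numbered items are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items from frame so that every item is equally likely."""
    rng = random.Random(seed)
    if replace:
        # With replacement: a drawn item goes back and may be drawn again.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: each item can appear at most once.
    return rng.sample(frame, n)

frame = list(range(1, 851))  # hypothetical frame: item numbers 1..850
without = simple_random_sample(frame, 5, replace=False, seed=1)
with_repl = simple_random_sample(frame, 5, replace=True, seed=1)
print(without, with_repl)
```

Fixing `seed` only makes the draw reproducible; leave it as `None` for a fresh sample each run.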

Selecting a Simple Random Sample Using a Random Number Table

Sampling Frame For Population With 850 Items
Item Name    Item #
Bev R.       001
Ulan X.      002
...          ...
Joann P.     849
Paul F.      850

Portion Of Random Number Table
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The first 5 items in a simple random sample (reading the table three digits at a time):
Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002

Probability Sample: Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group.
• Select every kth individual thereafter.
Example: N = 40, n = 4, k = 10.
[Figure: frame of 40 items with the first group of 10 marked]

Probability Sample: Stratified Sample
• Divide the population into two or more subgroups (called strata) according to some common characteristic.
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
• Samples from the subgroups are combined into one.
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
[Figure: population divided into 4 strata]

Probability Sample: Cluster Sample
• The population is divided into several “clusters,” each representative of the population.
• A simple random sample of clusters is selected.
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]

Probability Sample: Comparing Sampling Methods
Simple random sample and systematic sample:
◦ Simple to use.
◦ May not be a good representation of the population’s underlying characteristics.
Stratified sample:
◦ Ensures representation of individuals across the entire population.
Cluster sample:
◦ More cost effective.
◦ Less efficient (need a larger sample to acquire the same level of precision).

Stratified Sampling vs. Cluster Sampling
Stratified Sampling:
◦ Researcher decides the criterion for division.
◦ Homogeneity within subgroups and heterogeneity between subgroups.
◦ Ex. Students at RPI are divided based on year/major and then individuals are sampled from each subgroup.
Cluster Sampling:
◦ Natural division.
◦ Heterogeneity within subgroups and homogeneity between subgroups.
◦ Ex. Determine the proportion of students in the Capital Region who are science majors. Divide into clusters based on schools. Then, randomly sample schools.

1. Interview every 10th student who enters the school in the morning.
a. Random Sampling b. Cluster Sampling c. Systematic Sampling d. Stratified Sampling
Answer: c.
2. Assign each car in a dealership a number and then use a random-number table to select the cars to be inspected.
a. Random Sampling b. Cluster Sampling c. Systematic Sampling d. Stratified Sampling
Answer: a.
3. A teacher wants to know how well her students are doing on a topic. She randomly picks one class to survey.
a. Random Sampling b. Cluster Sampling c. Systematic Sampling d. Stratified Sampling
Answer: b.
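The systematic and stratified procedures from this part of the lecture can be sketched as follows. This is a minimal Python illustration, not the course's R code; the function names and the two strata are hypothetical, while N = 40 and n = 4 follow the systematic-sample slide.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start inside the first group of k = N // n items, then every kth item."""
    rng = random.Random(seed)
    k = len(frame) // n
    start = rng.randrange(k)  # random position within the first group
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        size = round(n * len(items) / total)  # proportional allocation, rounded
        sample.extend(rng.sample(items, size))
    return sample

frame = list(range(1, 41))                        # N = 40
sys_sample = systematic_sample(frame, 4, seed=0)  # n = 4, so k = 10

strata = {                                        # hypothetical strata
    "first_year": list(range(100, 160)),          # 60 students
    "senior": list(range(200, 240)),              # 40 students
}
strat_sample = stratified_sample(strata, 10, seed=0)
print(sys_sample, strat_sample)
```

Because the stratum sizes are rounded, the combined sample can differ slightly from n when the proportions do not divide evenly; here 60% and 40% of 10 give exactly 6 and 4.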

Classifying Variables By Type
• Categorical (qualitative) variables take categories as their values, such as “yes”, “no”, or “blue”, “brown”, “green”.
• Numerical (quantitative) variables have values that represent a counted or measured quantity.
  ⸰ Discrete variables arise from a counting process.
  ⸰ Continuous variables arise from a measuring process.

Examples of Types of Variables
Question: Do you have an Instagram profile?
  Responses: Yes or No
  Variable Type: Categorical
Question: How many text messages have you sent in the past three days?
  Responses: ---------------
  Variable Type: Numerical (discrete)
Question: How long did the mobile app update take to download?
  Responses: ---------------
  Variable Type: Numerical (continuous)

Types of Variables
Variables
  Categorical (Defined Categories). Examples: Marital Status, Political Party, Eye Color
    Nominal
    Ordinal (Ordered Categories). Examples: Ratings (Good, Better, Best; Low, Med, High)
  Numerical
    Discrete (Counted Items). Examples: Number of Children, Defects per hour
    Continuous (Measured Characteristics). Examples: Weight, Voltage

1. List all quantitative variables. Age, Height, LDL, # children
2. List all qualitative variables. Gender, BG, Happy, Smoke, SC
3. List all continuous variables. Height, Age, LDL
4. List all discrete variables. # children
5. List all ordinal variables. Happy, SC
6. List all nominal variables. Gender, BG, Smoke

Visualizing Categorical Data:

The Bar Chart
• The bar chart visualizes a categorical variable as a series of bars. The length of each bar represents either the frequency or percentage of values for each category.

Reason For Shopping Online?           Percent
Better prices                         37%
Avoiding holiday crowds or hassles    29%
Convenience                           18%
Better selection                      13%
Ships directly                        3%

Visualizing Categorical Data:
The Pie Chart
• The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.
[Figure: pie chart of the same “Reason For Shopping Online?” percentages]

Visualizing Numerical Data:
The Histogram

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

[Figure: Histogram: Age Of Students (a second, partly legible title reads “Histogram: Temperature”). Horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More; vertical axis: Frequency, 0 to 8.]
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class.)
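As an illustration of how the frequency, relative frequency, and percentage columns of such a table relate, here is a small Python sketch. The 20 raw values are hypothetical, chosen only so that they fall into the five classes with the counts shown in the table; the function name `frequency_table` is likewise an assumption.

```python
from collections import Counter

def frequency_table(values, lower, width, n_classes):
    """Rows of (class start, class end, frequency, relative frequency, percentage)."""
    upper = lower + width * n_classes
    # Class index of each value; a value v belongs to the class with start <= v < end.
    counts = Counter((v - lower) // width for v in values if lower <= v < upper)
    total = sum(counts.values())
    rows = []
    for i in range(n_classes):
        start = lower + i * width
        freq = counts.get(i, 0)
        rows.append((start, start + width, freq, freq / total, 100 * freq / total))
    return rows

# Hypothetical raw data (20 observations).
values = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
          32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

rows = frequency_table(values, lower=10, width=10, n_classes=5)
for start, end, freq, rel, pct in rows:
    print(f"{start} but less than {end}: {freq}  {rel:.2f}  {pct:.0f}")
```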

Visualizing Two Numerical Variables: The Scatter Plot
• Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables.
• One variable is measured on the vertical axis and the other variable is measured on the horizontal axis.
• Scatter plots are used to examine possible relationships between two numerical variables.

Scatter Plot Example
Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

[Figure: Cost per Day vs. Production Volume. Horizontal axis: Volume per Day (20 to 70); vertical axis: Cost per Day (0 to 250).]

Summary Definitions
• The central tendency is the extent to which the values of a numerical variable group around a typical or central value.
• The variation is the amount of dispersion or scattering away from a central value that the values of a numerical variable show.
• The shape is the pattern of the distribution of values from the lowest value to the highest value.

Measures of Central Tendency: The Mean
• The arithmetic mean (often just called the “mean”) is the most common measure of central tendency.
  ◦ For a sample of size n:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

  Where \( \bar{X} \) (pronounced “x-bar”) is the sample mean, n is the sample size, and \( X_i \) is the ith observed value.

Measures of Central Tendency: The Mean
• The most common measure of central tendency.
• Mean = sum of values divided by the number of values.
• Affected by extreme values (outliers).
  Values 11, 12, 13, 14, 15: Mean = (11 + 12 + 13 + 14 + 15)/5 = 65/5 = 13
  Values 11, 12, 13, 14, 20: Mean = (11 + 12 + 13 + 14 + 20)/5 = 70/5 = 14

Measures of Central Tendency: The Median
• In an ordered array, the median is the “middle” number (50% above, 50% below).
• Less sensitive than the mean to extreme values.
  For both data sets above (11, 12, 13, 14, 15 and 11, 12, 13, 14, 20): Median = 13

Measures of Central Tendency: Locating the Median
• The location of the median when the values are in numerical order (smallest to largest):

\[ \text{Median position} = \frac{n+1}{2} \ \text{position in the ordered data} \]

• If the number of values is odd, the median is the middle number.
• If the number of values is even, the median is the average of the two middle numbers.
Note that (n+1)/2 is not the value of the median, only the position of the median in the ranked data.

• n = 7, then median is on which position? Ans: (7+1)/2 = 4th position
• n = 8, then median is on which position? Ans: (8+1)/2 = 4.5th position, i.e., average of 4th and 5th positions
• Ex 1. Find the median for 1, 4, 5, 9, 21, 22. Ans: (5+9)/2 = 7
• Ex 2. Find the median for 12, 32, 35, 78, 90. Ans: 35

Measures of Central Tendency: The Mode
• Value that occurs most often.
• Not affected by extreme values.
• Used for either numerical or categorical data.
• There may be no mode.
• There may be several modes.
[Figure: two dot plots, one on a 0 to 14 axis with Mode = 9, one on a 0 to 6 axis with No Mode]

Measures of Central Tendency: Review Example
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum $3,000,000
§ Mean: $3,000,000/5 = $600,000
§ Median: middle value of ranked data = $300,000
§ Mode: most frequent value = $100,000

Measures of Central Tendency: Which Measure to Choose?
• The mean is generally used, unless extreme values (outliers) exist.
• The median is often used, since the median is not sensitive to extreme values. For example, median home prices may be reported for a region; the median is less sensitive to outliers.
• In many situations it makes sense to report both the mean and the median.

Quiz
1. What is the mode of the following numbers? 4, 9, 6, 3, 4, 2. Ans: 4
2. What is the median of the following numbers? 3, 5, 6, 7, 9, 6, 8. Ans: 6
3. A data set can have more than one median. True or False? Ans: False
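The three measures of central tendency can be checked on the lecture's own numbers with a short Python sketch using the standard `statistics` module. The helper `median_position` is a hypothetical name for the (n+1)/2 rule; the course itself works in R.

```python
from statistics import mean, median, multimode

def median_position(n):
    """Position (not value) of the median in the ranked data: (n + 1) / 2."""
    return (n + 1) / 2

# House prices from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
avg = mean(prices)
mid = median(prices)
modes = multimode(prices)  # list of all most-frequent values

# The mean moves when one value becomes extreme; the median does not.
base = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]

print(avg, mid, modes)
print(mean(base), mean(with_outlier), median(base), median(with_outlier))
print(median_position(7), median_position(8))
```

`multimode` returns a list because, as the slide notes, a data set may have several modes.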

Measures of Variation
• Measures of variation give information on the spread or variability or dispersion of the data values.
• Variation: Range, Variance, Standard Deviation.
[Figure: two distributions with the same center but different variation]

Measures of Variation: The Range
• Simplest measure of variation.
• Difference between the largest and the smallest values:
  Range = Xlargest – Xsmallest
Example: Range = 13 - 1 = 12

Measures of Variation: Why the Range Can Be Misleading
• Does not account for how the data are distributed.
  [Figure: two dot plots on a 7 to 12 axis with different distributions; in both, Range = 12 - 7 = 5]
• Sensitive to outliers.
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119

Measures of Variation: The Sample Variance
• Average (approximately) of squared deviations of values from the mean.
  ◦ Sample variance:

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]

  Where \( \bar{X} \) = arithmetic mean, n = sample size, \( X_i \) = ith value of the variable X

Measures of Variation: The Sample Standard Deviation
• Most commonly used measure of variation.
• Shows variation about the mean.
• Is the square root of the variance.
• Has the same units as the original data.
  ◦ Sample standard deviation:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \]

Measures of Variation: Comparing Standard Deviations
[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation]

Locating Extreme Outliers: Z-Score
• To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
• The Z-score is the number of standard deviations a data value is from the mean.
• A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
• The larger the absolute value of the Z-score, the farther the data value is from the mean.

\[ Z = \frac{X - \bar{X}}{S} \]

  Where X represents the data value, \( \bar{X} \) is the sample mean, and S is the sample standard deviation

Locating Extreme Outliers: Z-Score
• Suppose the mean math SAT score is 490, with a standard deviation of 100.
• Compute the Z-score for a test score of 620.

\[ Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3 \]

A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
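The sample variance, sample standard deviation, and Z-score formulas can be written out directly. The sketch below is a plain-Python illustration with hypothetical function names; the five data values are illustrative, while the SAT figures (mean 490, standard deviation 100, score 620) come from the example.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance; same units as the data."""
    return sqrt(sample_variance(values))

def z_score(x, x_bar, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - x_bar) / s

def is_extreme_outlier(z):
    """Rule from the slide: Z below -3.0 or above +3.0."""
    return z < -3.0 or z > 3.0

data = [11, 12, 13, 14, 15]        # illustrative values
var = sample_variance(data)        # squared deviations 4 + 1 + 0 + 1 + 4 = 10, over n - 1 = 4
z = z_score(620, 490, 100)         # SAT example
print(var, sample_std(data), z, is_extreme_outlier(z))
```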

Numerical Descriptive Measures for Population
• Descriptive statistics discussed previously described a sample, not the population.
• Summary measures describing a population, called parameters, are denoted with Greek letters.
• Important population parameters are the population mean, variance, and standard deviation.
• The population parameter is a constant while the sample statistic is variable.

Numerical Descriptive Measures for Population: The Mean µ
• The population mean is the sum of the values in the population divided by the population size, N.

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]

  Where μ (“mu”) = population mean, N = population size, \( X_i \) = ith value of the variable X

Numerical Descriptive Measures for Population: The Variance σ²
• Average of squared deviations of values from the mean.
  ◦ Population variance:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

  Where μ = population mean, N = population size, \( X_i \) = ith value of the variable X

Numerical Descriptive Measures for Population: The Standard Deviation σ
• Most commonly used measure of variation.
• Shows variation about the mean.
• Is the square root of the population variance.
• Has the same units as the original data.
  ◦ Population standard deviation (σ, “sigma”):

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Sample Statistics vs. Population Parameters
Measure               Population Parameter    Sample Statistic
Mean                  µ                       X̄
Variance              σ²                      S²
Standard Deviation    σ                       S

Quartile Measures
• Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each).
  ⸰ The first quartile, Q1, is the value for which 25% of the values are smaller and 75% are larger.
  ⸰ Q2 is the same as the median (50% of the values are smaller and 50% are larger).
  ⸰ Only 25% of the values are greater than the third quartile, Q3.

Quartile Measures: Locating Quartiles
• Find a quartile by determining the value in the appropriate position in the ranked data, where:
  First quartile position: Q1 = (n+1)/4 ranked value
  Second quartile position: Q2 = (n+1)/2 ranked value
  Third quartile position: Q3 = 3(n+1)/4 ranked value
  Where n is the number of observed values.

Quartile Measures: Calculation Rules
When calculating the ranked position use the following rules:
◦ If the result is a whole number then it is the ranked position to use.
◦ If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.) then average the two corresponding data values.
◦ If the result is not a whole number or a fractional half then round the result to the nearest integer to find the ranked position.

Quartile Measures: Calculating The Quartiles: Example
Sample Data in Ordered Array: [data values lost in extraction; only the value 22 survives]
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so Q1 = (12+13)/2 = 12.5.
Q2 is in the (9+1)/2 = 5th position of the ranked data, so Q2 = median = 16.
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, so Q3 = (18+21)/2 = 19.5.
Q1 and Q3 are measures of non-central location. Q2 = median, is a measure of central tendency.

Quartile Measures: Calculating The Quartiles: Example
Sample Data in Ordered Array: [data values lost in extraction]
(n = 8)
Q1 is in the (8+1)/4 = 2.25 position of the ranked data, so Q1 = 12.
Q2 is in the (8+1)/2 = 4.5 position of the ranked data, so Q2 = median = 16.
Q3 is in the 3(8+1)/4 = 6.75 position of the ranked data, so Q3 = 18.
Q1 and Q3 are measures of non-central location. Q2 = median, is a measure of central tendency.
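The position formulas and the three calculation rules can be combined into a short Python sketch. The data arrays below are an assumption: the slide's actual values did not survive, so the arrays were chosen only to be consistent with the quartiles quoted in the two examples. The function names are hypothetical as well.

```python
def value_at(ranked, position):
    """Value at a 1-based ranked position, following the calculation rules."""
    if position == int(position):
        # Whole number: use that ranked position directly.
        return ranked[int(position) - 1]
    if position * 2 == int(position * 2):
        # Fractional half (2.5, 7.5, ...): average the two neighbouring values.
        lower = int(position)
        return (ranked[lower - 1] + ranked[lower]) / 2
    # Anything else: round the position to the nearest integer.
    nearest = int(position + 0.5)
    return ranked[nearest - 1]

def quartiles(values):
    """Return (Q1, Q2, Q3) using positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    positions = ((n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4)
    return tuple(value_at(ranked, p) for p in positions)

data9 = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # assumed array, n = 9
data8 = data9[:-1]                            # assumed array, n = 8

print(quartiles(data9))
print(quartiles(data8))
```

Statistical packages often use other interpolation conventions, so their quartiles can differ slightly from the ones produced by these rules.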

Quartile Measures: The Interquartile Range (IQR)
• The IQR is Q3 – Q1 and measures the spread in the middle 50% of the data.
• The IQR is also called the midspread because it covers the middle 50% of the data.
• The IQR is a measure of variability that is not influenced by outliers or extreme values.
• Measures like Q1, Q3, and IQR that are not influenced by outliers are called resistant measures.

Calculating the Interquartile Range
[Figure: boxplot marking Xminimum, Q1, Median (Q2), Q3, and Xmaximum, with 25% of the data in each segment; surviving axis labels: 12 and 70]
Example: Interquartile range = 57 – 30 = 27

The Five Number Summary
The five numbers that help describe the center, spread and shape of data are:
⸰ Xlargest
⸰ Third Quartile (Q3)
⸰ Median (Q2)
⸰ First Quartile (Q1)
⸰ Xsmallest
[Figure: the five numbers split the data into four segments, each holding 25% of the data]

Five Number Summary and The Boxplot
• The boxplot is a graphical display of the data based on the five-number summary:
  Xsmallest -- Q1 -- Median -- Q3 -- Xlargest
[Figure: example boxplot labeled Xsmallest, Q1, Median, Q3, Xlargest]

Five Number Summary: Shape of Boxplots
• If data are symmetric around the median then the box and central line are centered between the endpoints.
• A boxplot can be shown in either a vertical or horizontal orientation.
[Figure: boxplot labeled Xsmallest, Q1, Median, Q3, Xlargest]

Distribution Shape and The Boxplot
[Figure: three boxplots, each marking Q1, Q2, Q3: Left-Skewed, Symmetric, Right-Skewed]

Two Measures of the Relationship Between Two Numerical Variables
• Scatter plots allow us to examine the relationship between two numerical variables.
• Two quantitative measures of such relationships:
  ⸰ The Covariance
  ⸰ The Coefficient of Correlation

The Covariance
• The covariance measures the strength of the linear relationship between two numerical variables (X & Y).
• The sample covariance:

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} \]

• Only concerned with the strength of the relationship.
• No causal effect is implied.

Interpreting Covariance
• Covariance between two variables:
  cov(X,Y) > 0: X and Y tend to move in the same direction.
  cov(X,Y) < 0: X and Y tend to move in opposite directions.
• The covariance has a major flaw:
  ◦ It is not possible to determine the relative strength of the relationship from the size of the covariance.

Coefficient of Correlation
• Measures the relative strength of the linear relationship between two numerical variables.
• Sample coefficient of correlation:

\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]

  Where,

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}, \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}, \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}} \]

Features of the Coefficient of Correlation
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Either ρ or r has the following features:
  ◦ Unit free.
  ◦ Range between –1 and 1.
  ◦ The closer to –1, the stronger the negative linear relationship.
  ◦ The closer to 1, the stronger the positive linear relationship.
  ◦ The closer to 0, the weaker the linear relationship.

Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plots of Y against X with r = -1, r = -.6, r = 0, r = +.3, r = +1]

Quiz
1. Suppose we measure height-weight correlation. The heights in the data set are changed from feet to inches, so all values are multiplied by 12. The correlation coefficient for the new data will be:
a. 12 times the original
b. 144 times larger than the original
c. The same as original
2. The pairs in a data set are exchanged, so the x-coordinates are now the y-coordinates and all values are multiplied by 12. The correlation coefficients for the original data and for the new data are:
a. Opposites
b. Reciprocals
c. The same
3. Let Y be a random variable. Then V(Y) equals:
a. \( \sqrt{E[(Y - \mu_Y)^2]} \)
b. \( E[\,|Y - \mu_Y|\,] \)
c. \( E[(Y - \mu_Y)^2] \)
d. \( E[(Y - \mu_Y)] \)
4. To infer the political tendencies of the students at your university, you sample 150 of them. Only one is a simple random sample:
a. make sure that the proportion of minorities is the same in your sample as in the entire student body.
b. call every fiftieth person in the student directory at 9 a.m. If the person does not answer the phone, you pick the next name listed, and so on.
c. go to the main dining hall on campus and interview students randomly there.
d. have your statistical package generate 150 random numbers in the range from 1 to the total number of students in your academic institution, and then choose the corresponding names in the student telephone directory.
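The sample covariance and coefficient of correlation can be computed directly from their definitions. The Python sketch below uses the volume and cost pairs from the scatter plot example earlier in the lecture; the function names are hypothetical, and the last lines check numerically how rescaling one variable or exchanging the pairs affects each measure.

```python
from math import sqrt

def sample_cov(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y); cov(X, X) is the sample variance of X."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]          # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # cost per day

cov_xy = sample_cov(volume, cost)
r = sample_corr(volume, cost)

# Rescale one variable by 12 (for example, feet to inches).
scaled = [12 * v for v in volume]
cov_scaled = sample_cov(scaled, cost)
r_scaled = sample_corr(scaled, cost)

# Exchange the roles of X and Y.
r_swapped = sample_corr(cost, volume)

print(cov_xy, cov_scaled)      # the covariance depends on the units
print(r, r_scaled, r_swapped)  # the correlation is unit free
```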