
Statistics

ST 361: Statistics for Engineers
Numerical Descriptive Statistics

Kimberly Weems
ksweems@ncsu.edu
5260 SAS Hall

Numeric Measures
Why?
Kim is in an introductory history class. On the midterm exam Kim scored 64 out of 100. Did she do well? The class average was 42.

By knowing the average for the class we can make a comparison.


Numeric Measures
Allow us to make comparisons
Of individuals to the group
Of groups to other groups

Measures of center
Give an idea about the main chunk of the data


Measures of Central Tendency


Mean: the average
Notation:
Population mean: $\mu$ (mu)
Sample mean: $\bar{y}$ (y-bar)


Measures of Central Tendency


Summation Notation

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + y_3 + \dots + y_n$$

Left side: sum of y
Right side: individual values


Measures of Central Tendency


$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Numerator: sum of the values
Denominator: sample size


Measures of Central Tendency


Median: the middle value in a data set when the values are put in increasing order
50% of the values lie above and 50% below
With an even number of observations, average the middle two values.
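
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `median` and the two small data sets are made up for the example.

```python
def median(values):
    """Return the middle value of the data after sorting."""
    ordered = sorted(values)          # put values in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[mid]
    # even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 1, 5]        # hypothetical data, sorted: 1 5 7
even_data = [7, 1, 5, 3]    # hypothetical data, sorted: 1 3 5 7
print(median(odd_data))     # 5
print(median(even_data))    # 4.0
```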


Simple Example:
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
Mean: 15

$$\bar{y} = \frac{\sum y_i}{n} = \frac{9+9+6+15+12+14+40}{7} = \frac{105}{7} = 15$$
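
The same arithmetic can be checked in Python. This is only a sketch; the variable names (`soda_oz`, `total`, `y_bar`) are chosen for illustration and do not come from the slides.

```python
soda_oz = [9, 9, 6, 15, 12, 14, 40]   # amounts from the example, in ounces

total = sum(soda_oz)       # sum of the values
n = len(soda_oz)           # sample size
y_bar = total / n          # sample mean

print(total, n, y_bar)     # 105 7 15.0
```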


Simple Example:
Soda consumed
In increasing order: 6 9 9 12 14 15 40
Median: 12


Mean vs Median
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
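[Figure: soda amounts on a number line with ticks at 10, 20, 30, and 40 ounces; the median (12) and the mean (15) are marked]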


Problem with the mean:


Sensitive to unusual values and skewed data
These pull the mean away from the median

Skewed Right
Mean greater than median

Skewed Left
Mean less than median

Symmetric
Mean and median are about the same.
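
To see this sensitivity on the soda data, the sketch below recomputes both measures with the unusual value 40 left out. It uses Python's standard `statistics` module; the list name `without_40` is an illustrative choice, and dropping the value is done here only to show the effect, not as a recommended practice.

```python
from statistics import mean, median

soda_oz = [9, 9, 6, 15, 12, 14, 40]            # data from the example
without_40 = [y for y in soda_oz if y != 40]   # same data, unusual value removed

# With the value 40 present the mean sits above the median (right skew).
print("with 40:   ", mean(soda_oz), median(soda_oz))
# Without it, the mean drops by about 4 ounces; the median moves by 1.5.
print("without 40:", mean(without_40), median(without_40))
```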


Trimmed Mean
A compromise between the mean and the median.
Less sensitive to outliers.
Observations are ordered from smallest to largest.
A trimming percentage 100r% is chosen, where r is a number between 0 and 0.5.
Suppose r = 0.1, so that the trimming percentage is 10%. If n = 20, then 10% of 20 is 2, and the trimmed mean is the average of the 16 observations that remain after deleting (trimming) the 2 largest and the 2 smallest.
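
A minimal Python sketch of this procedure follows. The function name `trimmed_mean` and the 20-value data set are hypothetical, and the sketch assumes that when r times n is not a whole number the count trimmed from each end is rounded down; the slides do not say how that case is handled.

```python
def trimmed_mean(values, r):
    """Mean after dropping the k smallest and k largest values, k = floor(r * n)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    ordered = sorted(values)            # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("trimmed mean of an empty data set is undefined")
    # Assumption: round down. The small epsilon guards against float error
    # in products such as 0.1 * 30.
    k = int(r * n + 1e-9)
    kept = ordered[k:n - k]             # drop k values from each end
    return sum(kept) / len(kept)


# Hypothetical data: 1, 2, ..., 19 plus one extreme value (n = 20).
data = list(range(1, 20)) + [1000]
print(sum(data) / len(data))     # 59.5, pulled up by the value 1000
print(trimmed_mean(data, 0.1))   # 10.5, the mean of 3, 4, ..., 18
```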

Coal
Emissions Uncertainty Project (2009-10), Alissa Anderson, Colin Geisenhoffer, Brody Heffner, Michael Shaw & Emily Wisner

[Figure panels: Before 2% Trim; After 2% Trim]


Measures of Variability
Why?
Tell us about consistency and predictability
Allow comparison of groups
Give a scale of reference for comparing individuals


Measures of Variability
Range: the difference between the maximum and the minimum
How spread out are the values?
Soda amounts: Range = 40 - 6 = 34

[Figure: soda amounts on a number line with ticks at 10, 20, 30, and 40]


Measures of Variability
Problem: Range only looks at two values.
Does not quantify spread of the others.

Solution: Look at all values and ask how far they are from the mean.
Variance: summarizes the distance between all individuals and the mean
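
The limitation of the range can be illustrated with two data sets that share the same minimum and maximum. In the sketch below, `other` is a made-up data set (it is not from the slides), `value_range` is an illustrative helper name, and `statistics.variance` from the Python standard library computes the sample variance.

```python
from statistics import variance

soda_oz = [9, 9, 6, 15, 12, 14, 40]     # data from the example
other = [6, 23, 23, 23, 23, 23, 40]     # hypothetical data, same min and max


def value_range(values):
    """Difference between the maximum and the minimum."""
    return max(values) - min(values)


# Both ranges are 34, because only the two extreme values are used.
print(value_range(soda_oz), value_range(other))
# The variances differ, because every value contributes.
print(variance(soda_oz), variance(other))
```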


Measures of Variability
Important notation:
Population variance: $\sigma^2$ (sigma squared)
Sample variance: $s^2$


Measures of Variability
Important Formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$$

Calculate the average of the squared distances
Squared to get rid of negatives


Measures of Variability
Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

Numerator: sum of squares
Divides by n - 1 instead of N
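
The only difference between the two formulas is the divisor, which the following Python sketch makes explicit. The function names and the three-value data set are hypothetical; for the population version the data are assumed to be the whole population, so their average plays the role of mu.

```python
def sum_of_squares(values):
    """Sum of squared distances between each value and the mean."""
    center = sum(values) / len(values)
    return sum((y - center) ** 2 for y in values)


def population_variance(values):
    """Divide the sum of squares by N (data treated as the whole population)."""
    return sum_of_squares(values) / len(values)


def sample_variance(values):
    """Divide the sum of squares by n - 1 (data treated as a sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    return sum_of_squares(values) / (n - 1)


data = [2, 4, 6]                      # hypothetical data, mean 4
print(sum_of_squares(data))           # 8.0
print(population_variance(data))      # 8 / 3, about 2.67
print(sample_variance(data))          # 4.0
```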


Simple Example:
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
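
As a worked check of the sample variance for these amounts, the Python sketch below lists each value's distance from the mean and its square, then divides the sum of squares by n - 1. The variable names are illustrative and not taken from the slides.

```python
soda_oz = [9, 9, 6, 15, 12, 14, 40]           # amounts in ounces
n = len(soda_oz)
y_bar = sum(soda_oz) / n                      # sample mean, 15.0

deviations = [y - y_bar for y in soda_oz]     # distance of each value from the mean
squared = [d ** 2 for d in deviations]        # squaring removes the negatives
ss = sum(squared)                             # sum of squares
s2 = ss / (n - 1)                             # sample variance

for y, d, sq in zip(soda_oz, deviations, squared):
    print(f"{y:>3} {d:>6.1f} {sq:>7.1f}")
print(f"sum of squares = {ss}, s^2 = {s2:.2f}")   # sum of squares = 788.0, s^2 = 131.33
```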


What does it tell us?


By itself, not much.
Some people try lots of tricks to try to recreate the data set from this number.
The purpose of the number is to make a comparison with other data sets.

Example: Another group of teens had soda consumption with a variance of 473.2.
The other group was more spread out than our group (variance 131.33).


The Standard Deviation


Variance is not on the same scale as the original data.
Standard deviation: the square root of the variance.
Has the same units as the original data
Allows more direct comparisons


The Standard Deviation


For amount of soda

$$s = \sqrt{s^2} = \sqrt{131.33} = 11.46$$
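
The value can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. This is a quick check rather than part of the original slides.

```python
import math
from statistics import stdev, variance

soda_oz = [9, 9, 6, 15, 12, 14, 40]   # amounts in ounces

s2 = variance(soda_oz)    # sample variance, in squared ounces
s = stdev(soda_oz)        # sample standard deviation, in ounces

print(round(s2, 2))       # 131.33
print(round(s, 2))        # 11.46
```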


What does it tell us?


Helps us understand the variability in the data.
Shows which data set is more consistent.



City Temperature

City        Mean   Median   SD
Raleigh     59     61       15
Fargo       42     43       24
Fairbanks   28     31       28
Honolulu    77     77       3

Coefficient of Variation
Coefficient of Variation (CV): the ratio of the standard deviation to the mean
Used to compare variability when scales are very different.

$$\text{C.V.} = \frac{s}{\bar{y}}$$

Example:
Students in a midwestern state take an end-of-grade exam that has a maximum of 100 points. A class testing a new teaching method had a standard deviation of 10.
Students in an east coast state take an end-of-grade exam that has a maximum of 500 points. A class testing the new teaching method had a standard deviation of 30.
Which is more varied?

Example
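The mean for the midwestern state was 70.

$$\text{C.V.} = \frac{s}{\bar{y}} = \frac{10}{70} \approx 0.143 = 14.3\%$$

The mean for the east coast state was 350.

$$\text{C.V.} = \frac{s}{\bar{y}} = \frac{30}{350} \approx 0.086 = 8.6\%$$

Relative to its mean, the midwestern class was more varied (14.3% versus 8.6%).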
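
The two ratios can be checked with a short Python sketch. The function name `coefficient_of_variation` and the variable names are illustrative choices, not part of the slides.

```python
def coefficient_of_variation(s, y_bar):
    """Ratio of the standard deviation to the mean."""
    if y_bar == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return s / y_bar


cv_midwest = coefficient_of_variation(10, 70)    # 100-point exam
cv_east = coefficient_of_variation(30, 350)      # 500-point exam

print(f"{cv_midwest:.1%}")   # 14.3%
print(f"{cv_east:.1%}")      # 8.6%
```

Because each standard deviation is divided by its own mean, the two exams can be compared even though their point scales differ.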
