
Chapter 2: Statistical Measures



Keith E. Emmert
Department of Mathematics Tarleton State University

June 7, 2011

Chapter 2: Statistical Measures Outline

Parameters and Statistics

Central Tendency of a Data Set

Variation or Spread of a Data Set

Chapter 2: Statistical Measures Parameters and Statistics

Some Basic Definitions

The population is the entire group of objects or individuals under study, about which information is wanted.
A unit is an individual object or person in the population. The units are often called subjects if the population consists of people.
A sample is a part of the population that is actually used to get information.
A variable is a characteristic of interest to be measured for each unit in the sample.
The size of the population is denoted by the capital letter N.
The size of the sample is denoted by the small letter n.


Example
Population, Unit, Sample, Size

[Figure: a population of N = 20 units, with a single unit and a sample of n = 5 units labeled.]


Some Basic Definitions

A parameter is a numerical value that would be calculated using all of the values of the units in the population.
A statistic is a numerical value that is calculated using all of the values of the units in a sample.


Example
Parameter or Statistic?

According to the Campus Housing Fact Sheet at a Big-Ten University, 60% of the students living in campus housing are in-state residents. In a sample of 200 students living in campus housing, 56.5% were found to be in-state residents. Circle your answer.

1. In this particular situation, the value of 60% is a (parameter, statistic).
2. In this particular situation, the value of 56.5% is a (parameter, statistic).


Definitions

A unit is the item or object we observe. When the object is a person, we refer to the unit as a subject.
An observation is the information or characteristic recorded for each unit.
A characteristic that can vary from unit to unit is called a variable.
A collection of observations on one or more variables is called a data set.


Definitions
More About Variables

Qualitative variables are those which classify the units into categories. The categories may or may not have a natural ordering to them. Qualitative variables are also called categorical variables.
Quantitative variables have numerical values that are measurements (length, weight, and so on) or counts (of how many). Arithmetic operations on such numerical values do have meaning.
A discrete variable can only take on a finite (or countable) number of values.
A continuous variable can take on any value in an interval (or collection of intervals).


Example
Unit, Observation, Variables

Composer                    Period                Age     Siblings
Ludwig Van Beethoven        Classique of Vienna   56.25   6
Nikolai Karlovich Medtner   Romantique            71.5    5
Jacques Offenbach           Modernism             77.17   6

Identify the following:
Unit
Observation
Variables (Qualitative or Quantitative; Discrete or Continuous)


Example
What Type of Random Variable is Weight?

Packages are brought to a mailing center and weighed. Their results are recorded. Is weight discrete or continuous?
Packages are brought to a mailing center and weighed. Their weights are recorded to the nearest pound. Is weight discrete or continuous?
Packages under 5 pounds are classified as light, those weighing between 5 and 20 pounds are classified as medium, and those over 20 pounds are classified as heavy. We record the variable weight, which takes on the values light, medium, or heavy. Now the variable weight is qualitative.
Random variables are determined by their context in experiments, not by general categories. It is important to ask many questions about the data and how they were obtained.

Chapter 2: Statistical Measures Central Tendency of a Data Set

Measures of Center
What single number would best represent the most typical age for the 20 subjects?

Subject  Gender  Age    Subject  Gender  Age
1        M       45     11       M       41
2        M       41     12       F       44
3        F       51     13       F       47
4        F       46     14       F       49
5        F       47     15       M       45
6        F       42     16       F       42
7        M       43     17       M       41
8        F       50     18       F       40
9        M       39     19       M       45
10       M       32     20       M       37

Measures of center are numerical values that tend to report (in some sense) the middle of the data. We shall focus on two such measures: the mean and the median.


Measures of Center
Mean

The mean of a set of n observations is the sum of the observations divided by the number of observations, n.
If the observations are a sample of a larger group, then we denote the mean by $\bar{x}$ (pronounced "x-bar").
If the observations are the entire group, i.e. the entire population, then we denote the mean by the Greek letter $\mu$.
Math Trip: If $x_1, x_2, \ldots, x_n$ denote the observations, then the mean is calculated by
$$\frac{(x_1 + x_2 + \cdots + x_n)}{n}.$$
Note the parentheses in the numerator... if you forget these in your calculator, things will go horribly wrong!


Example
Measures of Center: Mean

Consider the following data.

Subject  Gender  Age    Subject  Gender  Age
1        M       45     11       M       41
2        M       41     12       F       44
3        F       51     13       F       47
4        F       46     14       F       49
5        F       47     15       M       45
6        F       42     16       F       42
7        M       43     17       M       41
8        F       50     18       F       40
9        M       39     19       M       45
10       M       32     20       M       37

Find the mean of the ages of the male subjects.
$$\bar{x} = \frac{(45 + 41 + 43 + 39 + 32 + 41 + 45 + 41 + 45 + 37)}{10} = \frac{409}{10} = 40.9$$


Example
Measures of Center: Mean

Suppose that the number of children in a simple random sample of 10 households is as follows: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4.

1. Calculate the sample mean number of children per household.
2. Interpret your answer.
3. Suppose that the observation for the last household in the above list was incorrectly recorded as 40 instead of 4. What would happen to the mean?


Solution
Measures of Center: Mean

1. Calculate the sample mean number of children per household.
   $\bar{x} = \frac{(2+3+0+2+1+0+3+0+1+4)}{10} = \frac{16}{10} = 1.6$.
2. Interpret your answer. Note that 1.6 is not rounded up to, say, 2. We are reporting a value that we would expect on average, over many samples of 10 households.
3. Suppose that the observation for the last household in the above list was incorrectly recorded as 40 instead of 4. What would happen to the mean?
   $\bar{x} = \frac{(2+3+0+2+1+0+3+0+1+40)}{10} = \frac{52}{10} = 5.2$.

Thus we say the mean is sensitive to extreme observations. Most graphical displays would detect this... always graph your data!
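The effect of the single mis-recorded household can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `mean` and the variable names are not from the slides.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

# Number of children in the 10 sampled households.
children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

# Same data, but with the last household recorded as 40 instead of 4.
recorded_wrong = children[:-1] + [40]

print(mean(children))        # 1.6
print(mean(recorded_wrong))  # 5.2
```

One extreme value out of ten more than triples the mean, which is the sensitivity described above.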


Let's Do It!
Measures of Center: Mean

Suppose a sample of size n = 10 observations is observed.
Can $\bar{x}$ be larger than the maximum value or less than the minimum value? If yes, give an example.
Can $\bar{x}$ be the minimum value? If yes, give an example.
Can $\bar{x}$ be the maximum value? If yes, give an example.
Can $\bar{x}$ be exactly the midpoint between the minimum and maximum value (when the minimum does not equal the maximum)? If yes, give an example.
Can $\bar{x}$ be exactly the second smallest value (out of the 10, not all equal observations, when they are ordered from smallest to largest)? If yes, give an example.
Can $\bar{x}$ be not equal to any value in the sample? If yes, give an example.


Let's Do It!
A Mean Is Not Always Representative

Kim's biology test scores are 7, 98, 25, 19, and 26. Calculate Kim's mean test score. Explain why the mean does not do a very good job at summarizing Kim's test scores.


Let's Do It!
Combining Means

We have seven students. The mean score for three of these students is 54 and the mean score for the four other students is 76. What is the mean score for all seven students?


The Mean
As an Equilibrium Point

The mean = the point of equilibrium, the point where the distribution would balance.

[Figure: two dot plots balanced at their means. Left: axis from 1 to 3, Mean = 2. Right: axis from 1 to 7, Mean = 3.]

If the distribution is symmetric, as in the first picture at the left, the mean would be exactly at the center of the distribution. As the largest observation is moved further to the right, making this observation somewhat extreme, the mean shifts towards the extreme observation. If a distribution appears to be skewed, we may wish also to report a more resistant measure of center.


Frequency Tables
Sometimes data is grouped into classes. This is called a frequency table. The data represent the number of miles run during one week for a sample of 20 runners.

7, 13, 15, 18, 18, 20, 22, 22, 24, 24, 25, 26, 27, 28, 29, 33, 34, 35, 37, 40.

This can be grouped into the following frequency table (based upon given classes).

Class         Frequency
5.5 - 10.5    1
10.5 - 15.5   2
15.5 - 20.5   3
20.5 - 25.5   5
25.5 - 30.5   4
30.5 - 35.5   3
35.5 - 40.5   2


The Mean of Grouped Data/Frequency Tables


Unfortunately, if the original data is not available, then finding the mean becomes a bit more interesting. Assume that all observations in a given class, a - b, are at the midpoint, $x_m = \frac{a+b}{2}$.

Class         Frequency, f   Midpoint, x_m           f x_m
5.5 - 10.5    1              (5.5 + 10.5)/2 = 8      8
10.5 - 15.5   2              (10.5 + 15.5)/2 = 13    26
15.5 - 20.5   3              18                      54
20.5 - 25.5   5              23                      115
25.5 - 30.5   4              28                      112
30.5 - 35.5   3              33                      99
35.5 - 40.5   2              38                      76
Sum           n = 20                                 490

The mean is given by $\bar{x} = \frac{\sum f \cdot x_m}{n} = \frac{490}{20} = 24.5$ miles.
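The midpoint calculation can be sketched in Python as follows. The function name `grouped_mean` and the list layout are assumptions made for this illustration; the classes, frequencies, and raw data are the ones from the runner example.

```python
def grouped_mean(classes, freqs):
    """Approximate the mean by placing every observation at its class midpoint."""
    midpoints = [(a + b) / 2 for a, b in classes]
    n = sum(freqs)
    total = sum(f * xm for f, xm in zip(freqs, midpoints))
    return total / n

classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

# Raw weekly miles, used only to compare against the grouped approximation.
miles = [7, 13, 15, 18, 18, 20, 22, 22, 24, 24,
         25, 26, 27, 28, 29, 33, 34, 35, 37, 40]

print(grouped_mean(classes, freqs))  # 24.5
print(sum(miles) / len(miles))       # 24.85
```

The grouped value is an approximation: the raw data average 24.85 miles, while the midpoint assumption gives 24.5.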


Let's Do It!
The Mean of Grouped Data/Frequency Tables

Eighty randomly selected light bulbs were tested to determine their lifetime in hours. The frequency table of the results is shown below. Find the average lifetime of a light bulb.

Class      Frequency, f   Midpoint, x_m   f x_m
53-63      6
64-74      12
75-85      25
86-96      18
97-107     14
108-118    5
Sum        n =                            

The mean is given by $\bar{x} = \frac{\sum f \cdot x_m}{n} =$


Let's Do It!
The Mean of Grouped Data/Frequency Tables

The cost per load (in cents) of 35 laundry detergents tested by a consumer organization is given below.

Class: 13-19, 20-26, 27-33, 34-40, 41-47, 48-54, 55-61, 62-68
Frequency, f: 2, 7, 12, 5, 6, 1, 0
n =
Midpoint, x_m:
f x_m:

The mean is given by $\bar{x} = \frac{\sum f \cdot x_m}{n} =$


Measures of Center
Median

A measure of center that is more resistant to extreme values is the median.
The median, M, of a set of n observations, ordered from smallest to largest, is a value such that half of the observations are less than or equal to that value and half the observations are greater than or equal to that value.
If the number of observations is odd, the median is the middle observation.
If the number of observations is even, the median is any number between the two middle observations, including either of the two middle observations. To be consistent, we will define the median as the mean or average of the two middle observations.


Example
Median

Find the median, M, of the ages of the following 8 subjects.

30  37  39  40  (M)  41  42  43  44

The two middle observations are 40 and 41. So, $M = \frac{40 + 41}{2} = 40.5$.
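The odd/even rule for the median can be written as a short Python function. This is a sketch under the convention adopted on these slides (average the two middle observations when n is even); the function name `median` is chosen for illustration.

```python
def median(values):
    """Middle observation if n is odd, mean of the two middle ones if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [30, 37, 39, 40, 41, 42, 43, 44]
print(median(ages))  # 40.5
```

Python's standard library offers `statistics.median`, which follows the same convention.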


Let's Do It!
Median

The number of children in each of 10 households is shown below.

Number of Children: 2 3 0 1 4 0 3 0 1 2

Median, M =
What happens to the median if the fifth observation in the first list was incorrectly recorded as 40 instead of 4?
What happens to the median if the third observation in the first list was incorrectly recorded as -20 instead of 0?
The median is resistant; that is, it does not change, or changes very little, in response to extreme observations.


Measures of Center
Mode

The mode of a set of observations is the most frequently occurring value; it is the value having the highest frequency among the observations.
The mode of the values {0, 0, 0, 0, 1, 1, 2, 2, 3, 4} is 0.
The values {0, 0, 0, 1, 1, 2, 2, 2, 3, 4} have two modes, 0 and 2 (bimodal).
The mode for {0, 1, 2, 4, 5, 8} is none!
The mode is not often used as a measure of center for quantitative data.
The mode can be computed for qualitative (non-numeric) data.
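A minimal Python sketch of these three cases. The helper name `modes` is hypothetical, and returning an empty list when every value occurs exactly once is an assumption made here to match the "none" case above; other conventions (and library functions) treat that case differently.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: report no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([0, 0, 0, 0, 1, 1, 2, 2, 3, 4]))  # [0]
print(modes([0, 0, 0, 1, 1, 2, 2, 2, 3, 4]))  # [0, 2]
print(modes([0, 1, 2, 4, 5, 8]))              # []
print(modes(["red", "blue", "red"]))          # ['red']
```

The last call shows that the same counting works for qualitative data, where a mean or median would not make sense.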


Measures of Center
Different Measures Can Give Different Impressions

Consider the annual incomes of five families in a neighborhood: $12,000, $12,000, $30,000, $90,000, $100,000

1. Calculate the average income. Calculate the median income. Calculate the modal income.
2. If you were trying to promote that this is an affluent neighborhood, which measure might you prefer to present?
3. If you were trying to argue against a tax increase, which measure might you prefer to present?
4. If you want to represent these values with the income that is in the middle, which measure might you prefer to present?


Measures of Center
Shapes of Distributions

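[Figure: four distribution shapes with the positions of the mean, median, and mode marked.]
Symmetric Distribution: Mean = Median = Mode
Bimodal Distribution: two modes, with Mean = Median between them
Left Skewed: Mean, Median, Mode (from left to right)
Right Skewed: Mode, Median, Mean (from left to right)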


Homework

Homework 1: page 37: 1, 4, 5, 8, 12, 19

Chapter 2: Statistical Measures Variation or Spread of a Data Set

Measures of Variation or Spread


Consider the following data sets.
List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65
List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85

[Figure: dot plots of List 1 and List 2 on a common axis from 35 to 85.]

Median = Mean = Mode = 60 for both sets. The spread is different!


Measures of Variation or Spread


Range

Range is just the difference between the largest value and the smallest value. Consider the following data sets.
List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65
List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85
Range of List 1: 65 - 55 = 10.
Range of List 2: 85 - 35 = 50.
Clearly, List 2 is spread out more than List 1.
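As a quick check, the following Python sketch computes the three measures of center and the range for both lists. The helper name `data_range` is an illustrative choice; `statistics.mean`, `statistics.median`, and `statistics.mode` are standard-library functions.

```python
import statistics

list1 = [55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65]
list2 = [35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

for name, data in (("List 1", list1), ("List 2", list2)):
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data),
          "range:", data_range(data))
```

Both lists share the same center of 60, and only the range (10 versus 50) tells them apart.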


Measures of Variation or Spread


Problems with Range

Consider the following data sets.

[Figure: dot plots of List 1 and List 2 on a common axis from 35 to 85.]

Both lists have ranges of 50. List 1 has more data concentrated in the middle. List 2 has more data concentrated on the ends.


Measures of Variation or Spread


Quartiles

The three values that divide the data into four parts are called the quartiles, represented by $Q_1$, $Q_2 = M$ = Median, and $Q_3$.
Finding the quartiles:
1. Find the median of all of the observations.
2. First Quartile = $Q_1$ = median of observations that fall below the median.
3. Third Quartile = $Q_3$ = median of observations that fall above the median.


Measures of Variation or Spread


Quartiles

Some things to remember:
When the number of observations is odd, the middle observation is the median. This observation is not included in either of the two halves when computing $Q_1$ and $Q_3$.
Although different books, calculators, and computers may use slightly different ways to compute the quartiles, they are all based on the same idea.
In a left-skewed distribution, the first quartile will be farther from the median than the third quartile.
If the distribution is symmetric, the quartiles should be the same distance from the median.
In a right-skewed distribution, the third quartile will be farther from the median than the first quartile.


Example
Quartiles

Find the quartiles of the ages of the following 8 subjects.

30  37  (Q1)  39  40  (M)  41  42  (Q3)  43  44

$Q_1 = \frac{37 + 39}{2} = 38$
$M = Q_2 = \frac{40 + 41}{2} = 40.5$
$Q_3 = \frac{42 + 43}{2} = 42.5$


Measures of Variation or Spread


Interquartile Range

The interquartile range measures the spread of the middle 50% of the data and is defined to be $IQR = Q_3 - Q_1$.
Find the interquartile range of the ages of the following 8 subjects.

30  37  (Q1)  39  40  (M)  41  42  (Q3)  43  44

Recall that $Q_1 = 38$ and $Q_3 = 42.5$. Hence $IQR = Q_3 - Q_1 = 42.5 - 38 = 4.5$.


Measures of Variation or Spread


Interquartile Range

The pth percentile is the value such that p% of the observations fall at or below that value and (100 - p)% of the observations fall at or above that value.
The first quartile $Q_1$ is the 25th percentile since 25% of the data fall below and 75% of the data fall above.
The second quartile $Q_2 = M$ (the median) is the 50th percentile since 50% of the data fall below and 50% of the data fall above.
The third quartile $Q_3$ is the 75th percentile since 75% of the data fall below and 25% of the data fall above.


Measures of Variation or Spread


Five Number Summary

One well used measure of variation is the five number summary, defined to be the Minimum, $Q_1$, Median, $Q_3$, and Maximum of the data set.
Find the five number summary of the ages of the following 8 subjects.

30  37  (Q1)  39  40  (M)  41  42  (Q3)  43  44

Solution: Min = 30, $Q_1$ = 38, M = 40.5, $Q_3$ = 42.5, Max = 44.
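The five number summary can be computed with the median-of-halves rule from these slides, sketched below in Python. The function names `median` and `five_number_summary` are illustrative; as noted earlier, other books and software may compute quartiles slightly differently, so library results need not match this sketch exactly.

```python
def median(values):
    """Middle observation if n is odd, mean of the two middle ones if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

ages = [30, 37, 39, 40, 41, 42, 43, 44]
print(five_number_summary(ages))  # (30, 38.0, 40.5, 42.5, 44)
```

The interquartile range follows directly as the fourth entry minus the second, here 42.5 - 38 = 4.5.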


Measures of Variation or Spread


Boxplots

A boxplot is a graphical representation of the five number summary of a data set.
1. List the data values in order from smallest to largest.
2. Find the five number summary: Minimum, $Q_1$, Median, $Q_3$, and Maximum.
3. $Q_1$ and $Q_3$ determine the ends of the box, and a line is drawn inside the box to mark the value of the Median.
4. Draw lines (called whiskers) from the midpoints of the ends of the box out to the Minimum and Maximum.

[Figure: a boxplot with Min, Q1, M, Q3, and Max labeled.]


Example
Boxplots

Construct a boxplot for the ages of the following 8 subjects.

30  37  (Q1)  39  40  (M)  41  42  (Q3)  43  44

Recall: Min = 30, $Q_1$ = 38, M = 40.5, $Q_3$ = 42.5, Max = 44.

[Figure: boxplot of Age on an axis from 30 to 45, with Min = 30, Q1 = 38, M = 40.5, Q3 = 42.5, Max = 44, and the IQR marked.]


Measures of Variation or Spread


1.5 × IQR Rule to Identify Outliers and Build Modified Boxplots

1. Find the five number summary.
2. Draw the box part of the boxplot using $Q_1$, $M$, and $Q_3$.
3. Find the Interquartile Range, $IQR = Q_3 - Q_1$.
4. Compute the quantity $\text{STEP} = 1.5 \times IQR$.
5. Find the location of the inner fences.
   Lower Inner Fence = $Q_1 - \text{STEP}$
   Upper Inner Fence = $Q_3 + \text{STEP}$
6. Draw whiskers from the midpoints of the ends of the box to the smallest and largest values within the inner fences. These whiskers end with small vertical lines.
7. All of the observations that fall outside the inner fences are potential outliers and are plotted with solid dots.


Example
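Modified Boxplots

Construct a modified boxplot for the ages of the following 8 subjects: 30, 37, 39, 40, 41, 42, 43, 44.
Recall: Min = 30, $Q_1$ = 38, M = 40.5, $Q_3$ = 42.5, Max = 44.
Note: $IQR = Q_3 - Q_1 = 42.5 - 38 = 4.5$ and $\text{STEP} = 1.5 \times IQR = 1.5(4.5) = 6.75$.
Lower Fence: $Q_1 - \text{STEP} = 38 - 6.75 = 31.25$
Upper Fence: $Q_3 + \text{STEP} = 42.5 + 6.75 = 49.25$
Since 30 falls below the lower fence of 31.25, it is a potential outlier; the whiskers extend to 37 and 44, the smallest and largest values within the fences.

[Figure: modified boxplot of Age on an axis from 30 to 50, with Min = 30, the Lower Fence, Q1 = 38, M = 40.5, Q3 = 42.5, Max = 44, the Upper Fence, and the IQR marked.]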
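The fence calculation for this example can be sketched in Python. The function name `inner_fences` and the parameter `k` are illustrative assumptions; the quartiles are taken as given from the example rather than recomputed.

```python
def inner_fences(q1, q3, k=1.5):
    """Return (lower, upper) inner fences, each k * IQR beyond the quartiles."""
    step = k * (q3 - q1)
    return q1 - step, q3 + step

ages = [30, 37, 39, 40, 41, 42, 43, 44]
q1, q3 = 38, 42.5  # quartiles from the example

lower, upper = inner_fences(q1, q3)
outliers = [x for x in ages if x < lower or x > upper]

# Whiskers stop at the most extreme observations still inside the fences.
whisker_low = min(x for x in ages if x >= lower)
whisker_high = max(x for x in ages if x <= upper)

print(lower, upper)               # 31.25 49.25
print(outliers)                   # [30]
print(whisker_low, whisker_high)  # 37 44
```

Only the age 30 lies outside the fences, so it would be drawn as a solid dot while the lower whisker stops at 37.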


Example
Side-by-Side Boxplots

Side-by-side boxplots are helpful for comparing two or more distributions with respect to the five-number summary.

Although the median of the first process is closer to the target value of 20.000 cm, the second process produces a less variable distribution.


Let's Do It!
Modified Boxplots

Variable = age for 23 children randomly assigned to one of two treatment groups.

Amoxicillin: 8 9 9 10 10 11 11 12 14 14 17
Cefadroxil:  7 8 9 9 9 10 10 11 12 13 14 16

(a) Give the five-number summary for each of the two treatment groups. Comment on your results.
(b) Make side-by-side boxplots for the antibiotic study data in part (a).
(c) Using our rule of thumb, are there any outliers for the Amoxicillin group? If so, modify your boxplot above.
(d) Using our rule of thumb, are there any outliers for the Cefadroxil group? If so, modify your boxplot above.


Let's Do It!
Modified Boxplots

For each of the following modified boxplots, report the corresponding five-number summary and list the values for all outliers (if any).

(a) [Figure: modified boxplot on an axis from 0 to 100.]
Min ____, Q1 ____, M ____, Q3 ____, Max ____, Outliers ____

(b) [Figure: modified boxplot on an axis from 0 to 100.]
Min ____, Q1 ____, M ____, Q3 ____, Max ____, Outliers ____

(c) [Figure: modified boxplot on an axis from 0 to 100.]
Min ____, Q1 ____, M ____, Q3 ____, Max ____, Outliers ____


Standard Deviation
The Idea

Standard deviation is a measure of the spread of the observations from the mean. Think of the standard deviation as roughly an average (or standard) distance of the observations from their mean. If all of the observations are the same, then the standard deviation will be 0 (i.e. no spread). Otherwise the standard deviation is positive and the more spread out the observations are about their mean, the larger the value of the standard deviation.


Standard Deviation
The Idea

Suppose you make three observations: 0, 5, 7. Then, the sample mean is $\bar{x} = \frac{0+5+7}{3} = 4$.

[Figure: the observations 0, 5, and 7 on a number line with $\bar{x} = 4$ marked. The deviations from the mean are -4, 1, and 3.]

Problem: The average of the deviations is zero!
$$\frac{-4 + 1 + 3}{3} = \frac{0}{3} = 0.$$
(That's boring!) It turns out that the average of the deviations from the mean will always be zero... so we need a little trick.


Standard Deviation
The Idea

Suppose you make three observations: 0, 5, 7. Then, the sample mean is $\bar{x} = 4$. Solution: Use the squared deviations from the mean.

Deviations from the Mean   Squared Deviations
-4                         16
1                          1
3                          9

The average, which is called the sample variance, is $\frac{16 + 1 + 9}{3 - 1} = \frac{26}{2} = 13$.
The sample standard deviation is $\sqrt{13} \approx 3.60555$.
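The same steps can be traced in a few lines of Python. This is a sketch of the calculation for the three observations only; the variable names are illustrative.

```python
import math

obs = [0, 5, 7]
n = len(obs)
xbar = sum(obs) / n                       # 4.0

deviations = [x - xbar for x in obs]      # [-4.0, 1.0, 3.0]
squared = [d ** 2 for d in deviations]    # [16.0, 1.0, 9.0]

sample_variance = sum(squared) / (n - 1)  # 26 / 2 = 13.0
sample_sd = math.sqrt(sample_variance)

print(sum(deviations))   # 0.0
print(sample_variance)   # 13.0
print(round(sample_sd, 5))  # 3.60555
```

The deviations cancel to zero, while their squares do not, which is why the squared deviations are used to measure spread.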


Standard Deviation
The Idea

Notes
When calculating sample variance in the previous example, $\frac{16 + 1 + 9}{3 - 1} = \frac{26}{2} = 13$, we subtract one in the denominator. This is because we estimated the mean and hence have used up some information. If you want more information, then take advanced statistics courses.
When calculating (sample or population) variance, we square all of the deviations and then add them, so the variance is measured in squared units. We take a square root to return to the original units.
Just as the mean is not a resistant measure of center, the standard deviation, which uses the mean in its definition, is not a resistant measure of spread. It is heavily influenced by extreme values.


Standard Deviation
The Math

Let $x_1, x_2, \ldots, x_n$ denote n observations.

The sample variance is denoted by $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.
The sample standard deviation is $s = \sqrt{s^2}$.

Suppose $\mu$ is the population mean. The population variance is denoted by $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$.
The population standard deviation is $\sigma = \sqrt{\sigma^2}$.
Note that when dealing with population variance or standard deviation, we do not divide by $n - 1$ since we have not estimated the mean... the population mean can be calculated exactly.
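Python's standard `statistics` module provides both versions, which makes the difference between the two denominators easy to see. The sketch below reuses the household data from earlier in the chapter purely as an illustration, treating the same ten values first as a sample and then as a whole population.

```python
import statistics

children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

# Sample versions divide by n - 1.
s2 = statistics.variance(children)
s = statistics.stdev(children)

# Population versions divide by n.
sigma2 = statistics.pvariance(children)
sigma = statistics.pstdev(children)

print(round(s2, 4), round(s, 4))
print(round(sigma2, 4), round(sigma, 4))
```

The sum of squared deviations is the same in both cases, so the sample variance is always the larger of the two by a factor of n/(n - 1).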


Standard Deviation
The Math - Shortcut Formulas for Sample Variance or Sample Standard Deviation

Some shortcut formulas are presented for calculating the sample variance and sample standard deviation. Let $x_1, x_2, \ldots, x_n$ denote a sample of n observations. Then,

Variance: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$

Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$
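A short Python sketch can confirm that the shortcut formula agrees with the definition. The function names are illustrative, and the weekly miles from the frequency table example are reused only as convenient test data.

```python
import math

def variance_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """Shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

miles = [7, 13, 15, 18, 18, 20, 22, 22, 24, 24,
         25, 26, 27, 28, 29, 33, 34, 35, 37, 40]

print(variance_definition(miles))
print(variance_shortcut(miles))
print(math.isclose(variance_definition(miles), variance_shortcut(miles)))  # True
```

The shortcut is handy on a calculator because it needs only two running totals. In floating-point arithmetic it can lose precision when the values are large relative to their spread, so the definition is often the safer choice in code.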


Standard Deviation
The Math - Shortcut Formulas for Population Variance or Population Standard Deviation

Some shortcut formulas are presented for calculating the population variance and population standard deviation. Let $x_1, x_2, \ldots, x_n$ denote all n observations in a population. Then,

Variance: $\sigma^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n}$

Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n}}$


Example
Standard Deviation

In a recent study of the effect of a certain diet on weight reduction, 11 subjects were put on the diet for two weeks and their weight loss/gain in lbs was measured (positive values indicate weight loss).
1, 1, 2, 2, 3, 2, 1, 1, 3, 2.5, -23.
What is the standard deviation of the weight loss?

$$\sum_{i=1}^{11} x_i = 1 + 1 + 2 + 2 + 3 + 2 + 1 + 1 + 3 + 2.5 + (-23) = -4.5$$

$$\sum_{i=1}^{11} x_i^2 = 1^2 + 1^2 + 2^2 + 2^2 + 3^2 + 2^2 + 1^2 + 1^2 + 3^2 + 2.5^2 + (-23)^2 = 569.25$$


Example
Standard Deviation (Continued)

We've already computed $\sum_{i=1}^{11} x_i = -4.5$ and $\sum_{i=1}^{11} x_i^2 = 569.25$.

The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1} = \frac{569.25 - (-4.5)^2/11}{11 - 1} \approx 56.741.$$
The sample standard deviation is $s = \sqrt{s^2} = \sqrt{56.741} \approx 7.533$. So, our answer is s = 7.5 lbs.


Let's Do It!
Standard Deviation

The following are the ages of a sample of 20 patients seen in the emergency room of a hospital on a Friday night. 35 37 32 53 21 45 43 23 39 64 60 10 36 34 12 22 54 36 45 55

Find the standard deviation of the ages.


Variance and Standard Deviation of Grouped Data/Frequency Tables


Unfortunately, if the original data is not available, then finding the sample standard deviation becomes a bit more interesting.

Class         Frequency, f   Midpoint, x_m   f x_m   f x_m^2
5.5 - 10.5    1              8               8       64
10.5 - 15.5   2              13              26      338
15.5 - 20.5   3              18              54      972
20.5 - 25.5   5              23              115     2,645
25.5 - 30.5   4              28              112     3,136
30.5 - 35.5   3              33              99      3,267
35.5 - 40.5   2              38              76      2,888
Sum           n = 20                         490     13,310

Bravely turn to the next page...


Variance and Standard Deviation of Grouped Data/Frequency Tables


The Formulas

We found n = 20, $\sum f \cdot x_m = 490$, and $\sum f \cdot x_m^2 = 13{,}310$.
The formula for sample variance of grouped data is
$$s^2 = \frac{\sum f \cdot x_m^2 - \left(\sum f \cdot x_m\right)^2 / n}{n - 1} = \frac{13{,}310 - (490)^2/20}{20 - 1} \approx 68.68$$
The formula for sample standard deviation of grouped data is $s = \sqrt{s^2} = \sqrt{68.68} \approx 8.28734$. So, our final answer is s = 8.3.
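The grouped-data calculation can be reproduced with a short Python sketch. The variable names are illustrative; the midpoints and frequencies are those of the runner table.

```python
import math

midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]

n = sum(freqs)                                               # 20
sum_fx = sum(f * xm for f, xm in zip(freqs, midpoints))      # 490
sum_fx2 = sum(f * xm ** 2 for f, xm in zip(freqs, midpoints))  # 13310

# Shortcut formula for the sample variance of grouped data.
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(n, sum_fx, sum_fx2)         # 20 490 13310
print(round(s2, 2), round(s, 1))  # 68.68 8.3
```

As with the grouped mean, the result is an approximation, because every observation in a class is treated as if it sat at the class midpoint.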


Sample Standard Deviation of Grouped Data


The data show the distribution of the birth weight (in oz.) of 100 consecutive deliveries. Find the variance and the standard deviation.

Class             Frequency, f   Midpoint, x_m   f x_m   f x_m^2
29.50-69.45       5
69.50-89.45       10
89.50-99.45       11
99.50-109.45      19
109.50-119.45     17
119.50-129.45     20
129.50-139.45     20
139.50-169.45     6
Sum


Homework

HW page 37: 2, 3, 9, 13