3 - Analyze - Inferential Statistics

Analyze Phase
Inferential Statistics
Welcome to Analyze
“X” Sifting
Inferential Statistics
– Inferential Statistics
– Nature of Sampling
– Central Limit Theorem
Intro to Hypothesis Testing
Hypothesis Testing ND P1
Hypothesis Testing ND P2
Hypothesis Testing NND P1
Hypothesis Testing NND P2
Wrap Up & Action Items
LSS Green Belt v11.1 MT - Analyze Phase © Open Source Six Sigma, LLC
Nature of Inference
in·fer·ence (n.) The act or process of deriving logical conclusions
from premises known or assumed to be true. The act of reasoning
from factual knowledge or evidence. [1]
Inferential Statistics – To draw inferences about the process or
population being studied by modeling patterns of data in a way that
accounts for randomness and uncertainty in the observations. [2]
[1] Dictionary.com
[2] Wikipedia.com
Putting the pieces of the puzzle together…
5 Step Approach to Inferential Statistics
1. What do you want to know?
2. What tool will give you that information?
3. What kind of data does that tool require?
4. How will you collect the data?
5. How confident are you with your data summaries?
So many questions…?
Types of Error
1. Error in sampling
– Error due to differences among samples drawn at random from the
population (luck of the draw).
– This is the only source of error that statistics can accommodate.
2. Bias in sampling
– Error due to lack of independence among random samples or due to
systematic sampling procedures (height of horse jockeys only).
3. Error in measurement
– Error in the measurement of the samples (MSA/GR&R).
4. Lack of measurement validity
– The measurement does not actually measure what it is intended to
measure (placing a probe in the wrong slot, or measuring temperature
with a thermometer that is right next to a furnace).
Population, Sample, Observation
Population
– EVERY data point that has ever been or ever will be generated from a
given characteristic.
Sample
– A portion (or subset) of the population, either at one time or over time.
Observation
– An individual measurement.
Significance
Significance is all about differences…
Practical difference and significance is:
– The amount of difference, change or improvement that will be of
practical, economic or technical value to you.
– The amount of improvement required to pay for the cost of making
the improvement.
Statistical difference and significance is:
– The magnitude of difference or change required to distinguish
between a true difference, change or improvement and one that
could have occurred by chance.
Twins: Sure there are differences… but do they matter?
The Mission
Mean Shift Variation Reduction
Both
A Distribution of Sample Means
Imagine you have some population. The individual values of this
population form some distribution.
Take a sample of some of the individual values and calculate the
sample Mean.
Keep taking samples and calculating sample Means.
Plot a new distribution of these sample Means.
The Central Limit Theorem says that as the sample size becomes
large this new distribution (the sample Mean distribution)
approaches a Normal Distribution, whatever the shape of the
population distribution of individuals (provided the population
has a finite Standard Deviation).
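The steps above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the exponential population, the sample sizes (1, 5, 30), the number of draws and the helper names `sample_means` and `skewness` are illustrative choices, not part of the course material. Skewness is used here as a rough indicator of how far the shape is from a symmetric, Normal-like curve.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly skewed population (exponential, mean about 10).
population = [random.expovariate(1 / 10) for _ in range(100_000)]


def sample_means(data, n, draws=2_000):
    """Draw `draws` random samples of size n and return each sample's Mean."""
    return [statistics.fmean(random.sample(data, n)) for _ in range(draws)]


def skewness(values):
    """Moment-based skewness; values near 0 indicate a symmetric shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


skew_by_n = {}
for n in (1, 5, 30):
    means = sample_means(population, n)
    skew_by_n[n] = skewness(means)
    print(f"n={n:>2}  mean of means={statistics.fmean(means):6.2f}  "
          f"stdev of means={statistics.stdev(means):5.2f}  "
          f"skewness={skew_by_n[n]:5.2f}")
```

With n = 1 the "sample Means" are just the individuals, so the skew of the population shows through; as n grows the skewness should move toward zero while the Mean of the Means stays near the population Mean.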
Sampling Distributions—The Foundation of Statistics
Population (values shown on the slide):
3, 5, 2, 12, 10, 1, 6, 12, 5, 6, 12, 14, 3, 6, 11, 9, 10, 10, 12

• Samples from the population, each with five observations:

        Sample 1   Sample 2   Sample 3
            1          9          2
           12          8          3
            9          5          6
            7         14         11
            8         10         10
Mean       7.4        9.2        6.4

• In this example we have taken three samples out of the
population, each with five observations in it. We computed a
Mean for each sample. Note the Means are not the same!
• Why not?
• What would happen if we kept taking more samples?
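As a quick check of the arithmetic, the three sample Means can be recomputed in Python. The sample values below are as read from the slide's table; the dictionary and variable names are illustrative only.

```python
import statistics

# Five observations per sample, as read from the slide.
samples = {
    "Sample 1": [1, 12, 9, 7, 8],
    "Sample 2": [9, 8, 5, 14, 10],
    "Sample 3": [2, 3, 6, 11, 10],
}

sample_means = {name: statistics.fmean(values) for name, values in samples.items()}
for name, mean in sample_means.items():
    print(f"{name}: Mean = {mean:.1f}")  # 7.4, 9.2 and 6.4

# The spread between the largest and smallest sample Mean is sampling error.
range_in_means = max(sample_means.values()) - min(sample_means.values())
print(f"Range in Mean = {range_in_means:.1f}")
```

Each sample comes from the same population, yet the Means differ by almost 3 units purely through the luck of the draw.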
Constructing Sampling Distributions
Open Minitab Worksheet “Die Example”.
Roll ‘em!
Sampling Distributions
Calc> Random Data> Sample from Columns…
Sampling Error
Calculate the Mean and Standard Deviation for each column and
compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Descriptive Statistics: Population, Sample1, Sample2, Sample3, Sample4, Sample5
Variable       N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Population  1000   0  3.5510   0.0528  1.6692   1.0000  2.0000  4.0000  5.0000   6.0000
Sample1        5   0   3.400    0.927   2.074    1.000   1.500   3.000   5.500    6.000
Sample2        5   0   4.600    0.678   1.517    2.000   3.500   5.000   5.500    6.000
Sample3        5   0   4.200    0.663   1.483    2.000   3.000   4.000   5.500    6.000
Sample4        5   0   3.800    0.917   2.049    2.000   2.000   3.000   6.000    6.000
Sample5        5   0   3.600    0.872   1.949    1.000   2.000   3.000   5.500    6.000
Range in Mean: 1.2 (4.600 – 3.400)
Range in StDev: 0.591 (2.074 – 1.483)
Sampling Error
Create 5 more columns of data, sampling 10 observations from the
population.
Sampling Error - Reduced
Calculate the Mean and Standard Deviation for each column and
compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Variable       N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Sample6       10   0   3.600    0.653   2.066    1.000   1.750   3.500   6.000    6.000
Sample7       10   0   4.100    0.567   1.792    1.000   2.750   4.500   6.000    6.000
Sample8       10   0   3.200    0.442   1.398    1.000   2.000   3.500   4.250    5.000
Sample9       10   0   3.500    0.563   1.780    1.000   2.000   3.500   5.250    6.000
Sample10      10   0   3.300    0.616   1.947    1.000   1.750   3.000   5.250    6.000
Range in Mean: 0.9 (4.100 – 3.200)
Range in StDev: 0.668 (2.066 – 1.398)
With 10 observations the range in sample Means is smaller (0.9
versus 1.2). In this particular draw the range in StDev is slightly
larger (0.668 versus 0.591): sampling error shrinks on average as
the sample size grows, not necessarily in every set of five samples.
Sampling Error - Reduced
Stat> Basic Statistics> Display Descriptive Statistics…
Variable     N   Mean  StDev
Sample 11   30  3.733  1.818
Sample 12   30  3.800  1.562
Sample 13   30  3.400  1.868
Sample 14   30  3.667  1.768
Sample 15   30  3.167  1.487
Range in Mean: 0.63 (3.800 – 3.167)
Range in StDev: 0.381 (1.868 – 1.487)
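The same exercise can be sketched outside Minitab. The Python below is an illustration only: it assumes a simulated population of 1000 fair die rolls as a stand-in for the "Die Example" worksheet, and the helper name `range_in_means` is hypothetical. Because five samples are too few to show a stable pattern, each comparison is repeated 1000 times and averaged.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Stand-in for the worksheet population: 1000 simulated rolls of a fair die.
population = [random.randint(1, 6) for _ in range(1000)]


def range_in_means(n, columns=5):
    """Draw `columns` samples of size n (without replacement) and
    return the largest sample Mean minus the smallest."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(columns)]
    return max(means) - min(means)


average_range = {}
for n in (5, 10, 30):
    trials = [range_in_means(n) for _ in range(1000)]
    average_range[n] = statistics.fmean(trials)
    print(f"n={n:>2}  average Range in Mean over 1000 repeats = {average_range[n]:.2f}")
```

Any single set of five samples can buck the trend, but averaged over many repeats the Range in Mean should fall steadily as the sample size goes from 5 to 10 to 30.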

In theory if we kept taking samples of size n = 5 and n = 10 and
calculated the sample Means we could see how the sample Means are
distributed.
Simulate this in MINITAB™ by creating ten columns of 1000 rolls of
a die.
Calc> Random Data> Integer…
Feeling lucky…?
For each row calculate the Mean of five columns (C1-C5) and store
the result in Mean5.
Calc> Row Statistics…
Repeat this command to calculate the Mean of C1-C10 and store the
result in Mean10.
Create a Histogram of C1, Mean5 and Mean10.
Graph> Histogram> Simple…
Multiple Graphs… On separate graphs… Same X, including same bins
Select “Same X, including same bins” to facilitate comparison.
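For readers without Minitab, the simulation can be approximated in Python. This is a sketch, not the course's procedure: the names `rolls`, `c1`, `mean5` and `mean10` simply mirror the worksheet columns, and a summary of Mean and Standard Deviation stands in for the histograms.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

ROWS, COLUMNS = 1000, 10
# Ten columns (C1-C10) of 1000 simulated die rolls, stored row by row.
rolls = [[random.randint(1, 6) for _ in range(COLUMNS)] for _ in range(ROWS)]

c1 = [row[0] for row in rolls]                       # individual rolls
mean5 = [statistics.fmean(row[:5]) for row in rolls]  # row Mean of C1-C5
mean10 = [statistics.fmean(row) for row in rolls]     # row Mean of C1-C10

summary = {}
for name, column in (("C1", c1), ("Mean5", mean5), ("Mean10", mean10)):
    summary[name] = (statistics.fmean(column), statistics.stdev(column))
    print(f"{name:<6} Mean={summary[name][0]:.3f}  StDev={summary[name][1]:.3f}")
```

All three columns should center near 3.5, while the Standard Deviation shrinks from C1 to Mean5 to Mean10.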
Different Distributions
[Figure: histograms of the individual observations and of the sample Means]
What is different about the three distributions?
What happens as the number of die throws increases?
As the sample size (number of die rolls) increases from 1 to 5 to 10,
there are three points to note:
1. The Center remains the same.
2. The variation decreases.
3. The shape of the distribution changes - it tends to become
Normal.
The Mean of the sample Mean distribution:
$\mu_{\bar{x}} = \mu$
The Standard Deviation of the sample Mean distribution, also known
as the Standard Error:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Good news: the Mean of the sample Mean distribution is the Mean of
the population.
Better news: I can reduce my uncertainty about the population Mean
by increasing my sample size n.
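For the die example these quantities can be worked out exactly. The sketch below assumes a fair six-sided die and uses the standard relationship that the Standard Error equals the population Standard Deviation divided by the square root of n; the variable names are illustrative.

```python
import math
import statistics

faces = [1, 2, 3, 4, 5, 6]            # a fair die: each face equally likely
mu = statistics.fmean(faces)           # population Mean
sigma = statistics.pstdev(faces)       # population Standard Deviation

standard_error = {n: sigma / math.sqrt(n) for n in (1, 5, 10, 30)}

print(f"population Mean = {mu:.2f}, population StDev = {sigma:.3f}")
for n, se in standard_error.items():
    print(f"n={n:>2}  Mean of sample Means = {mu:.2f}  Standard Error = {se:.3f}")
```

The Mean of the sample Means stays at 3.5 for every n, while the Standard Error falls from about 1.71 for single rolls to about 0.31 for samples of 30.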
Central Limit Theorem
If all possible random samples, each of size n, are taken from any
population with a Mean μ and Standard Deviation σ, the distribution
of sample Means will:
– have a Mean $\mu_{\bar{x}} = \mu$
– have a Std Dev $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
– be Normally Distributed when the parent population is Normally
Distributed, or be approximately Normal for samples of size 30 or
more when the parent population is not Normally Distributed.
This approximation improves with samples of larger size.
Bigger is Better!
So What?
So how does this theorem help me understand the risk I am taking
when I use sample data instead of population data?
Recall that about 95% of Normally Distributed data is within ± 2
Standard Deviations of the Mean. Therefore the probability is about
95% that my sample Mean is within 2 standard errors of the true
population Mean.
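The 95% figure can be checked by simulation. The sketch below assumes a hypothetical Normal population (Mean 50, Standard Deviation 10) and samples of size 30; those numbers and the variable names are illustrative and do not come from the course material.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population Mean and Standard Deviation
N, TRIALS = 30, 5000     # sample size and number of repeated samples

standard_error = SIGMA / math.sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    # Count the sample if its Mean lands within 2 standard errors of MU.
    if abs(statistics.fmean(sample) - MU) <= 2 * standard_error:
        hits += 1

coverage = hits / TRIALS
print(f"Share of sample Means within 2 standard errors: {coverage:.1%}")
```

The share should come out close to 95%, which is the sense in which a single sample Mean is "probably" within 2 standard errors of the true population Mean.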
A Practical Example
Let’s say your project is to reduce the setup time for a large
casting:
– Based on a sample of 20 setups you learn your baseline average is
45 minutes with a Standard Deviation of 10 minutes.
– Because this is just a sample the 45 minute average is an
estimate of the true average.
– The standard error is 10 / √20 ≈ 2.24 minutes, so 2 standard
errors is about 4.5 minutes.
– Using the Central Limit Theorem you can be about 95% confident the
true average is somewhere between 40.5 and 49.5 minutes.
– Therefore do not get too excited if you made a process change
resulting in a reduction of only 2 minutes.
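The arithmetic behind the interval can be laid out as a short Python sketch. It follows the slide's rule of thumb of 2 standard errors (an approximation; with only 20 setups a t-based multiplier would give a slightly wider interval), and the variable names are illustrative.

```python
import math

n = 20          # number of setups sampled
mean = 45.0     # baseline average setup time, minutes
stdev = 10.0    # sample Standard Deviation, minutes

standard_error = stdev / math.sqrt(n)   # about 2.24 minutes
margin = 2 * standard_error             # rule-of-thumb 95% margin
lower, upper = mean - margin, mean + margin

print(f"Standard Error = {standard_error:.2f} min")
print(f"Approximate 95% interval: {lower:.1f} to {upper:.1f} min")  # 40.5 to 49.5

# A new average of 43 minutes (a 2 minute reduction) still sits inside
# the interval, so it cannot yet be told apart from sampling error.
new_mean = mean - 2
inside_interval = lower <= new_mean <= upper
print(f"Is {new_mean:.0f} min inside the interval? {inside_interval}")
```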
Sample Size and the Mean
[Figure: distribution of individuals in the population, with the theoretical distributions of sample Means for n = 2 and n = 10]
Standard Error of the Mean
The Standard Deviation for the distribution of Means is called the
standard error of the Mean and is defined as:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
The rate of change in the Standard Error flattens out toward zero
at a sample size of about 30.
[Figure: Standard Error (vertical axis) versus Sample Size (horizontal axis, 0 to 30)]
This is why a sample size of 30 is often recommended when generating
summary statistics such as the Mean and Standard Deviation.
This is also the point at which the t and Z distributions become
nearly equivalent.
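The diminishing return from larger samples can be tabulated directly. The sketch below assumes a population Standard Deviation of 1 so that the Standard Error reads as a fraction of σ; the chosen sample sizes and names are illustrative.

```python
import math

sigma = 1.0  # assumed population Standard Deviation (results scale with it)
sizes = (2, 5, 10, 20, 30, 50, 100)

se = {n: sigma / math.sqrt(n) for n in sizes}

previous = None
for n in sizes:
    if previous is None:
        print(f"n={n:>3}  Standard Error = {se[n]:.3f}")
    else:
        # Average reduction in Standard Error per extra observation
        # since the previous sample size in the table.
        gain = (se[previous] - se[n]) / (n - previous)
        print(f"n={n:>3}  Standard Error = {se[n]:.3f}  "
              f"reduction per extra observation = {gain:.4f}")
    previous = n
```

Going from 5 to 10 observations cuts the Standard Error far more than going from 30 to 50, which is the flattening the chart describes.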
Summary
At this point you should be able to:
• Explain the term “Inferential Statistics”
• Explain the Central Limit Theorem
• Describe what impact sample size has on your estimates of
population parameters
• Explain Standard Error