"There are three kinds of lies: lies, damned lies, and statistics." --Mark Twain
Statistics are tools. Like any other tool they can be misused, which may result in misleading, distorted, or incorrect conclusions.
It is not sufficient to be able to do the computations. One must also be able to make the correct interpretations.
The Most Important Analysis Tool: Plot the Data. Always, Always, Always, Always.
It is amazing what you can see just by looking. --Yogi Berra
[Figure: dot diagram for a sample of 60 launches of the catapult; axis from 77 to 90]
The dot diagram enables the experimenter to quickly see the general location and spread of the observations.
Histograms
[Figure: histogram of launch distance; x-axis: Distance, y-axis: Density]
The histogram shows the general location, spread, and general shape of the distribution of the data.
[Figure: frequency diagram of launch distance; x-axis: Distance, y-axis: Frequency]
What if we were able to graph ALL possible catapult launches? As more observations are added, bumps in the frequency diagram due to sampling variation tend to disappear.
[Figure: smooth density curve of launch distance; x-axis: Distance, y-axis: Density]
Imagine the grouping interval in the histogram being made smaller and smaller without limit, until the histogram is represented by a continuous distribution.
[Figure: samples drawn from the entire population; sample histogram with x-axis: Distance, y-axis: Frequency]
Sample Statistics
A sample is a set of n observations actually obtained and a statistic is a numerical value that describes the sample.
Population Parameters
A population is a hypothetical set of N observations from which the sample is obtained (typically N is very large), and a parameter is a numerical value that describes the population.
Measures of Location
Mean: Arithmetic average of a set of values.
  Reflects the influence of all values.
  Strongly influenced by extreme values.
  Would you prefer your income to be the mean or the median?
Median: Reflects the 50% rank, that is, the center number after a set of numbers has been sorted from low to high.
  Does not include all values in the calculation.
  Is robust to extreme outlier scores.
Why would we use the mean instead of the median in process improvement?
$\text{Mean} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Examples:
Coating weights: 8.47, 8.67, 9.34, 7.99
Coating AVERAGE = (8.47 + 8.67 + 9.34 + 7.99) / 4 = 8.62
Batting performance: 0, 0, 1, 0, 1 (0 = no hit, 1 = hit)
BATTING AVERAGE = (0 + 0 + 1 + 0 + 1) / 5 = 0.400
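The two averages above can be checked with a few lines of Python. This is a minimal sketch using the standard library; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean

coating_weights = [8.47, 8.67, 9.34, 7.99]
at_bats = [0, 0, 1, 0, 1]  # 0 = no hit, 1 = hit

# The arithmetic mean is the sum of the values divided by how many there are.
coating_average = mean(coating_weights)
batting_average = mean(at_bats)

print(f"Coating average: {coating_average:.2f}")
print(f"Batting average: {batting_average:.3f}")
```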
Mean = Average
Sample Median
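Assume that $x_1, x_2, \ldots, x_n$ is a list of sample data sorted in ascending order. Then
$\tilde{X} = \begin{cases} \text{the middle value}, & \text{if } n \text{ is odd} \\ \text{the average of the two middle values}, & \text{if } n \text{ is even} \end{cases}$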
Find the sample mean and median for the two data sets below:
X: Data Set 1: 10, 12, 11, 14, 11, 13, 12, 14, 16, 13    $\bar{X} =$ ____    $\tilde{X} =$ ____
Y: Data Set 2: 10, 12, 11, 14, 11, 13, 12, 14, 44, 13    $\bar{Y} =$ ____    $\tilde{Y} =$ ____
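One way to check the exercise is to let Python's standard library compute both statistics. The sketch below uses the two data sets exactly as given; the variable names are illustrative.

```python
from statistics import mean, median

data_set_1 = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]
data_set_2 = [10, 12, 11, 14, 11, 13, 12, 14, 44, 13]

for name, data in (("X", data_set_1), ("Y", data_set_2)):
    # median() sorts the data and averages the two middle values when n is even
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

The two sets differ in a single value (16 versus 44). That one extreme value moves the mean from 12.6 to 15.4, while the median stays at 12.5.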
[Figure: three histograms (y-axis: Frequency) with the mean and median marked. Normal: symmetric, $\bar{y} = \tilde{y}$. Neg Skew and Pos Skew: the mean and median are marked at different positions.]
Standard Deviation
Deviation is the distance from the mean.
Deviation score = observation - true mean ($\mu$ = population mean).
Variance = mean or average of the squared deviation scores. $\sigma^2$ is the symbol for variance.
Standard deviation = square root of the variance. $\sigma$ is the symbol for the standard deviation.
Measures of Variation
Sample variance: $s^2 = \hat{\sigma}^2$ (an estimate of $\sigma^2$)
$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Uses every value in the data set in its computation. Approximately the mean squared distance from the mean (the divisor is n-1 rather than n).
Sample standard deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
The square root of the variance. It provides a measure of the standard distance from the mean.
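The sample variance and standard deviation can be computed directly from their definitions. The sketch below reuses the coating weights from the mean example. The helper name `sample_variance` is an illustrative assumption, and the standard library's `pvariance` (which divides by N) is printed only for comparison.

```python
from math import sqrt
from statistics import mean, pvariance, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

coating_weights = [8.47, 8.67, 9.34, 7.99]

s2 = sample_variance(coating_weights)
s = sqrt(s2)  # the standard deviation is the square root of the variance

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
print(f"variance with divisor N instead of n - 1: {pvariance(coating_weights):.4f}")
```

Dividing by n - 1 gives a slightly larger value than dividing by N, and the difference shrinks as the sample grows.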
[Figure: normal curve p(d) showing the mean $\mu$, the point of inflection $1\sigma$ from the mean, the target T, and the USL $3\sigma$ from the target]
The distance between the point of inflection and the mean constitutes a standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.
Upper Specification Limit (USL)
Target Specification (T)
Lower Specification Limit (LSL)
Mean of the distribution ($\mu$)
Standard Deviation of the distribution ($\sigma$)
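Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample Mean: $\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Sample Standard Deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$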
Degrees of Freedom
Suppose we were going to choose a sample of size n = 3 and we calculated the mean = 10. How many free choices would we have in choosing the 3 values that make up our sample? If we knew that X1 = 8 and X2 = 10, what must X3 equal? The three values must sum to 3 x 10 = 30, so X3 must equal 12. Our choice for X3 is constrained by the first two choices and the mean. Therefore our degrees of freedom are 2, not 3, that is, n-1.
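The constraint can be made concrete in code. Once the mean and n - 1 of the values are fixed, the last value is forced. This is a minimal sketch, and the function name `last_value` is hypothetical.

```python
def last_value(known_values, n, sample_mean):
    """Return the one remaining value forced by the mean and the other n - 1 values."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 values must be known")
    # The n values must sum to n * mean, so the last one is whatever is left over.
    return n * sample_mean - sum(known_values)

x3 = last_value([8, 10], n=3, sample_mean=10)
print(x3)  # 12
```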
[Figure: samples drawn from the population; sample histogram (x-axis: Distance, y-axis: Frequency) beside the population density curve (x-axis: Distance, y-axis: Density)]
Variation
[Figure: target diagrams with shots marked X; labels: On-Target, Center Process, Reduce Spread]
Six Sigma methodology identifies processes that are off-target and/or have a high degree of variation, and corrects the process.
Another View
[Figure: two distributions plotted against the LSL and USL. Panels: Off-Target; Large Variation. Annotation: Reduce Spread.]
Accuracy
Precision
Accuracy
[Figure: reported measurements, each marked x, plotted along a scale]
Does the average of the reported measurements deviate from the true value?
Precision
[Figure: reported measurements, each marked x, plotted along a scale]
What is the spread of the reported measurements?
Standard deviation = 0.41
Standard deviation = 0.04
The smaller the standard deviation, the lower the amount of variation. Variation is the Enemy!
DPM
DPM = defects per million units = (proportion of observations outside spec) x 1,000,000
[Figure: three distributions (1st, 2nd, 3rd) plotted against the lower and upper spec limits; the area outside the limits is labeled Defects]
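When raw observations are available, the DPM definition can be applied by counting. The sketch below is only an illustration: the measurements and spec limits are made-up values, and the function name `dpm` is an assumption.

```python
def dpm(observations, lower_spec, upper_spec):
    """Defects per million: share of observations outside the spec limits, times 1,000,000."""
    defects = sum(1 for x in observations if x < lower_spec or x > upper_spec)
    return defects / len(observations) * 1_000_000

# Hypothetical measurements and spec limits, for illustration only.
measurements = [44.1, 45.3, 46.0, 47.8, 45.2, 44.9, 42.2, 45.5, 45.0, 46.1]
result = dpm(measurements, lower_spec=42.5, upper_spec=47.5)
print(f"DPM = {result:.0f}")
```

With a sample this small the estimate is very coarse, which is one reason to model the process with a distribution instead of counting defects directly.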
Probability
Relationships between samples and populations most often are described in terms of probability.
There is a 20% chance that the next defect found on the enclosure will be due to a missing fastener.
We make this statement based on the relative frequency of this defect from the sample data.
Probability is the link that lets one predict population behavior based on a sample.
[Figure: population density curve; x-axis: Distance, y-axis: Density, with the points y1 and y2 marked on the x-axis]
1. The probability Pr(y<y1) will be equal to the area under the histogram to the left of y1.
2. The probability Pr(y>y1) will be equal to the area under the histogram to the right of y1.
What is the probability Pr(y1<y<y2)?
How can we calculate the area under the curve?
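The probability Pr(y1<y<y2) is the area between the two points: the area to the left of y2 minus the area to the left of y1. The sketch below assumes the population is normal, which the slides have not stated at this point. The mean, standard deviation, y1, and y2 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical population: distances normally distributed, mean 85, standard deviation 3.
population = NormalDist(mu=85, sigma=3)
y1, y2 = 82, 88

p_below_y1 = population.cdf(y1)                      # Pr(y < y1): area to the left of y1
p_above_y1 = 1 - population.cdf(y1)                  # Pr(y > y1): area to the right of y1
p_between = population.cdf(y2) - population.cdf(y1)  # Pr(y1 < y < y2): area between them

print(f"{p_below_y1:.4f} {p_above_y1:.4f} {p_between:.4f}")
```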
Normal Distribution
Perhaps the most important distribution because many processes can be described as approximating it.
$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
Parameters: $\mu$, $\sigma^2$
Since the normal probability density function cannot be integrated in closed form, probabilities relating to normal distributions are usually obtained from tables. These tables use the standard normal distribution, namely the normal distribution with $\mu = 0$ and $\sigma = 1$.
$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2} \, dt$
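In place of a printed table, F(z) can be evaluated numerically. The sketch below uses the standard identity F(z) = (1 + erf(z / sqrt(2))) / 2 with Python's `math.erf`. This identity is not taken from the slides, and the function name is illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """F(z) for the standard normal distribution, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for z in (0.0, 1.0, 2.5):
    print(f"F({z}) = {standard_normal_cdf(z):.4f}")
```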
Standardized Z Transformation
The standardized Z transformation:
$Z = \frac{X - \mu}{\sigma}$
Suppose the diameters of shafts are normally distributed with a mean of 45 and a variance of 1, X ~ N(45, 1). The customer-derived upper specification limit is 47.5. What is the DPM for this process?
$Z = \frac{X - \mu}{\sigma} = \frac{47.5 - 45}{1} = 2.5$
[Figure: normal curve with the area above 47.5 labeled DEFECTS]
From a Z table (or the NORMSDIST function in Excel), the probability that a shaft is less than 47.5 is 99.38%. The probability of a defect is 1 - 0.9938 = 0.0062, or 0.62%.
DPM = 0.0062 x 1,000,000 = 6,200
Knowing the Distribution and the Specification Limits Allows the Prediction of Capability
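The shaft example can be reproduced without a Z table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, with the mean, variance, and upper specification limit from the example. The variable names are illustrative.

```python
from statistics import NormalDist

mean_diameter = 45.0
std_dev = 1.0       # the variance is 1, so the standard deviation is also 1
upper_spec = 47.5

z = (upper_spec - mean_diameter) / std_dev
p_defect = 1 - NormalDist().cdf(z)   # area above the upper specification limit
dpm = p_defect * 1_000_000

print(f"Z = {z}, P(defect) = {p_defect:.4f}, DPM = {dpm:.0f}")
```

Carrying full precision gives a DPM of roughly 6,200. A hand calculation from a four-digit table lands close to this figure, differing only by rounding.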
Normal Curve and Probability Areas
[Figure: standard normal curve from -4 to 4; x-axis: Output. Areas: 68% within 1, 95% within 2, and 99.73% within 3 standard deviations of the mean.]
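The three areas can be verified numerically. The sketch below computes the area under the standard normal curve within k standard deviations of the mean. The helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.2%}")
```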
The distributions seen so far are normal distributions. However, the following rules apply to most distributions you'll find in the real world:
Rule 1: Roughly 60-75% of the data are within a distance of one standard deviation on either side of the mean.
Rule 2: Usually 90-98% of the data are within a distance of two standard deviations on either side of the mean.
Rule 3: Approximately 99% of the data are within a distance of three standard deviations on either side of the mean.
[Figure: three distributions labeled Distribution One, Distribution Two, and Distribution Three]
The means are the same but the standard deviations differ.
[Figure: Catapult Launch, histogram (Frequency) and normal probability plot (Probability). Average: 83.5822, StDev: 2.99316, N: 60]
[Figure: Normal Distribution (C1), histogram (Frequency) and normal probability plot (Probability). Average: 70, Std Dev: 10, N of data: 500. Anderson-Darling Normality Test: A-Squared 0.418, p-value 0.328]
[Figure: Pos Skew (C2), histogram (Frequency) and normal probability plot (Probability). Average: 70, Std Dev: 10, N of data: 500. Anderson-Darling Normality Test: A-Squared 46.447, p-value 0.000]
[Figure: Neg Skew (C3), histogram (Frequency) and normal probability plot (Probability). Average: 70, Std Dev: 10, N of data: 500]
The central limit theorem (CLT) states that the distribution of the sample mean, our estimate of $\mu$, can be approximated by a normal distribution when the sample size is sufficiently large, even though the original population may be non-normal.
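A small simulation can illustrate the central limit theorem. The sketch below repeatedly draws samples from a strongly skewed exponential population and looks at the resulting sample means. The population, sample size, number of samples, and random seed are all hypothetical choices, not values from the slides.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 30     # hypothetical choice
num_samples = 2000   # hypothetical choice

# Exponential population with mean 1 and standard deviation 1: skewed, not normal.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
spread_of_means = stdev(sample_means)

print(f"mean of sample means:  {mean_of_means:.3f}")
print(f"stdev of sample means: {spread_of_means:.3f}")
print(f"sigma / sqrt(n):       {1 / sample_size ** 0.5:.3f}")
```

The sample means cluster around the population mean with a spread close to sigma / sqrt(n). A histogram of `sample_means` looks far more symmetric than the exponential population it was drawn from.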
Summary
Continuous Distributions
Normal
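$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
$Z = \frac{X - \mu}{\sigma}$
Between                                   Percent of area under normal curve
$\mu - 3\sigma$ and $\mu + 3\sigma$       99.7
$\mu - 2\sigma$ and $\mu + 2\sigma$       95
$\mu - 1\sigma$ and $\mu + 1\sigma$       68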
[Figure: normal curve p(d) showing the point of inflection $1\sigma$ from the mean, the target T, and the USL $3\sigma$ from the target]