
Basic Statistics

STATISTICS

Statistics is Communicating Information from Data. --Schilling

There are three kinds of lies: lies, damned lies, and statistics. --Mark Twain

Statistics are tools. Like any other tool, they can be misused, which may result in misleading, distorted, or incorrect conclusions.

It is not sufficient to be able to do the computations. One must also be able to make the correct interpretations.

The Most Important Analysis Tool: Plot the Data. Always. Always. Always. Always.
It is amazing what you can see just by looking. --Yogi Berra

[Figure: Dot diagram for a sample of 60 launches of the catapult. Horizontal axis runs from 77 to 90.]

The dot diagram enables the experimenter to quickly see the general location and spread of the observations.

Histograms
[Figure: Histogram for a sample of 60 launches of the catapult. Axes: Distance (horizontal), Density (vertical).]

A histogram is a visual display of a set of measurements. It shows the general location, spread, and general shape of the distribution of the data.
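As a rough illustration of how a histogram is built, the sketch below groups measurements into equal-width intervals and counts how many fall in each. The distances are hypothetical values, not the data behind the slide's figure, and `histogram_counts` is a helper name made up for this example.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [edge, edge + bin_width)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)      # index of the bin holding v
        counts[start + k * bin_width] += 1     # key is the bin's left edge
    return dict(sorted(counts.items()))

# Hypothetical launch distances (assumed for illustration only).
distances = [81.2, 83.5, 84.1, 84.9, 85.0, 85.7, 86.3, 86.8, 88.4, 90.1]

counts = histogram_counts(distances, bin_width=2, start=80)
for left_edge, n in counts.items():
    # One star per observation gives a quick text histogram.
    print(f"{left_edge}-{left_edge + 2}: {'*' * n}")
```

Changing `bin_width` shows how the apparent shape depends on the grouping interval.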

As the number of observations increases

[Figure: Frequency histogram of 600 observations of a catapult launch. Axes: Distance (horizontal), Frequency (vertical).]

Bumps in the frequency diagram due to sampling variation tend to disappear as the number of observations increases. What if we were able to graph ALL possible catapult launches?

[Figure: Conceptual population of catapult launches, drawn as a smooth density curve. Axes: Dist. (horizontal), Density (vertical).]

Imagine the grouping interval in the histogram being made smaller and smaller without limit, until the histogram is represented by a continuous distribution.

[Figure: The ENTIRE POPULATION with a SAMPLE drawn from within it (a subset). Inset: histogram of the sample. Axes: Distance (horizontal), Frequency (vertical).]

Sample Statistics
A sample is a set of n observations actually obtained, and a statistic is a numerical value that describes the sample.

$\bar{X}$ = Sample Mean
$s^2$ = Sample Variance
$s$ = Sample Standard Deviation

Population Parameters
A population is a hypothetical set of N observations from which the sample is obtained (typically N is very large), and a parameter is a numerical value that describes the population.

$\mu$ = Population Mean
$\sigma^2$ = Population Variance
$\sigma$ = Population Standard Deviation

Sample Statistics Estimate Population Parameters

Measures of Location
Mean: Arithmetic average of a set of values
- Reflects the influence of all values
- Strongly influenced by extreme values
- Would you prefer your income to be the mean or the median?

Median: Reflects the 50% rank, the center number after a set of numbers has been sorted from low to high.
- Does not include all values in the calculation
- Is robust to extreme outlier scores

Why would we use the mean instead of the median in process improvement?

Sample Mean for a Distribution

For a discrete function:

$$\bar{X} = \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

$\sum y$ means "add up all the y's".

Examples:
Coating weights: 8.47, 8.67, 9.34, 7.99
Coating AVERAGE = (8.47 + 8.67 + 9.34 + 7.99) / 4 = 8.62
Batting performance: 0, 0, 1, 0, 1 (0 = no hit, 1 = hit)
BATTING AVERAGE = (0 + 0 + 1 + 0 + 1) / 5 = 0.400

Mean = Average
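The two averages can be reproduced with a few lines of Python. This is a minimal sketch. The function name `mean` and the variable names are chosen here for illustration.

```python
def mean(values):
    """Arithmetic average: add up all the values, then divide by how many there are."""
    return sum(values) / len(values)

coating_weights = [8.47, 8.67, 9.34, 7.99]
at_bats = [0, 0, 1, 0, 1]  # 0 = no hit, 1 = hit

coating_average = mean(coating_weights)
batting_average = mean(at_bats)

print(round(coating_average, 2))   # 8.62
print(f"{batting_average:.3f}")    # 0.400
```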

Sample Median
Assume that $x_1, x_2, \ldots, x_n$ is a list of sample data sorted in ascending order. Then

$$\tilde{X} = \begin{cases} \text{the middle value}, & \text{if } n \text{ is odd} \\ \text{the average of the two middle values}, & \text{if } n \text{ is even} \end{cases}$$

Find the sample mean and median for the two data sets below:
X: Data Set 1: 10, 12, 11, 14, 11, 13, 12, 14, 16, 13    $\bar{X}$ =    $\tilde{X}$ =
Y: Data Set 2: 10, 12, 11, 14, 11, 13, 12, 14, 44, 13    $\bar{Y}$ =    $\tilde{Y}$ =
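One way to check the exercise is with Python's standard `statistics` module. The sketch below uses the two data sets exactly as given. The variable names are chosen here for illustration.

```python
import statistics

data_set_1 = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]
data_set_2 = [10, 12, 11, 14, 11, 13, 12, 14, 44, 13]  # 16 replaced by 44

for name, data in (("X", data_set_1), ("Y", data_set_2)):
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
# X mean = 12.6 median = 12.5
# Y mean = 15.4 median = 12.5
```

The single extreme value moves the mean from 12.6 to 15.4 while the median stays at 12.5, which is what "robust to extreme outlier scores" means in practice.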

Relationship of the Mean and Median


[Figure: Three frequency histograms showing where the mean and median fall.]

Symmetric (Normal): $\bar{y} = \tilde{y}$
Skewed left (Neg Skew, tail on left): $\bar{y} < \tilde{y}$
Skewed right (Pos Skew, tail on right): $\bar{y} > \tilde{y}$

Standard Deviation
- Deviation is the distance from the mean. Deviation score = observation - true mean.
- Variance = mean (average) of the squared deviation scores. $\sigma^2$ is the symbol for variance.
- Standard deviation = square root of the variance. $\sigma$ is the symbol for the standard deviation.

[Figure: Distribution with the population mean $\mu$ marked and a deviation shown as the distance from the mean.]

The Standard Deviation is a Measure of Variability

Measures of Variation
Sample Variance: $s^2 = \hat{\sigma}^2$ (an estimate of $\sigma^2$)

$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

Uses every value in the data set in its computation. Mean squared distance from the mean.

Sample Standard Deviation: $s = \hat{\sigma}$

$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

The square root of the variance; it provides a measure of the standard distance from the mean.
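The sample variance and standard deviation formulas can be sketched directly in Python. The data below reuse Data Set 1 from the median exercise. The function name `sample_variance` is chosen here for illustration, and the result is compared against the standard library as a sanity check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]

s2 = sample_variance(data)   # 28.4 / 9, about 3.156
s = math.sqrt(s2)            # about 1.776

print(f"s^2 = {s2:.3f}, s = {s:.3f}")
print("library check:", statistics.variance(data), statistics.stdev(data))
```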

The Standard Deviation

[Figure: Normal curve p(d) with mean $\mu$ at the target T, the point of inflection at a distance of $1\sigma$ from the mean, and the USL at a distance of $3\sigma$ from the target.]

The distance between the point of inflection and the mean constitutes a standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.

Legend:
Upper Specification Limit (USL)
Target Specification (T)
Lower Specification Limit (LSL)
Mean of the distribution ($\mu$)
Standard Deviation of the distribution ($\sigma$)
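Counting how many standard deviations fit between the target and a specification limit is a one-line calculation. The sketch below uses made-up numbers, and `sigma_capability` is a name invented for this example rather than a standard function.

```python
def sigma_capability(target, spec_limit, sigma):
    """Number of standard deviations that fit between the target and a spec limit."""
    return abs(spec_limit - target) / sigma

# Hypothetical process: target 100, upper specification limit 106.
print(sigma_capability(target=100, spec_limit=106, sigma=2))  # 3.0 -> three sigma
print(sigma_capability(target=100, spec_limit=106, sigma=1))  # 6.0 -> six sigma
```

Halving the standard deviation doubles the number of sigmas that fit inside the same specification.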

Population Vs. Sample


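Population Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample Mean

$$\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample Standard Deviation

$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$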

Degrees of Freedom

Suppose we were going to choose a sample of size n = 3 and we calculated the mean = 10. How many free choices would we have in choosing the 3 values that make up our sample? If we knew that X1 = 8 and X2 = 10, what must X3 equal? Our choice for X3 is constrained by the first two choices and the mean. Therefore our degrees of freedom are 2, not 3; that is, n - 1.

DEGREE OF FREEDOM = n-1
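The constraint can be made concrete: once the mean and the first n - 1 values are fixed, the last value is forced. In this small sketch the helper name `forced_last_value` is invented for the illustration.

```python
def forced_last_value(known_values, n, mean):
    """The one remaining value is determined by: n * mean = sum of all n values."""
    return n * mean - sum(known_values)

x3 = forced_last_value([8, 10], n=3, mean=10)
print(x3)  # 12, the only value that keeps the mean at 10
```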



[Figure: Histogram of the sample (Frequency vs. Distance) next to the population density curve (Density vs. Dist.).]

Sample Statistics: $\bar{X} = 85.6$, $s^2 = 8.27$, $s = 2.7$

Population Parameters: $\mu = 84$, $\sigma^2 = 9$, $\sigma = 3$

The Sample Statistics Approximate the Population Parameters
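A short simulation shows the same behaviour. It is a sketch that assumes the population is normal with the parameters quoted above (mean 84, standard deviation 3) and draws 60 simulated launches. The seed is arbitrary, and the output will not match the slide's sample values.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

MU, SIGMA = 84, 3   # population parameters taken from the slide
N_LAUNCHES = 60     # same sample size as the catapult example

sample = [random.gauss(MU, SIGMA) for _ in range(N_LAUNCHES)]

x_bar = statistics.mean(sample)    # estimates the population mean
s2 = statistics.variance(sample)   # estimates the population variance (n - 1 divisor)
s = statistics.stdev(sample)       # estimates the population standard deviation

print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
```

Re-running with a different seed gives different statistics each time, all scattered around the fixed population parameters.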

THE NATURE OF THE PROBLEM


[Figure: Target diagrams. Off-Target: shots clustered away from the center; the fix is to center the process (On-Target). Variation: shots scattered widely; the fix is to reduce spread.]

Six Sigma methodology identifies processes that are off-target and/or have a high degree of variation, and corrects the process.

THE NATURE OF THE PROBLEM - A STATISTICAL LOOK

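Another View: the statistical view of a problem

[Figure: Distributions drawn against the spec limits. Off-Target: distribution shifted relative to LSL and USL; center the process to get on-target. Large Variation: distribution wider than the spec limits; reduce spread.]

LSL = Lower spec limit
USL = Upper spec limit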

Accuracy & Precision

Accuracy Describes Centering
Precision Describes Spread

Accuracy
Does the average of the reported measurements deviate from the true value?

Precision
What is the spread of the reported measurements?

Standard Deviation as it relates to specifications


If we superimpose the customer derived specification limits on top of two distributions with different standard deviations...

[Figure: Two distributions drawn between the Lower Specification Limit (LSL) and the Upper Specification Limit (USL). With standard deviation = .41, some points fall outside of the spec. limits; with standard deviation = .04, all points are in spec.]

The smaller the standard deviation, the lower the amount of variation. Variation is the Enemy!

DPM
DPM = defects per million units = proportion of observations outside spec × 1,000,000

[Figure: Three distributions (1st, 2nd, 3rd) with increasing standard deviation drawn between the lower spec and upper spec; the area outside the spec limits (defects) grows.]

As the standard deviation increases, DPM increases.
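The DPM definition translates directly into code. The sketch below uses hypothetical measurements and spec limits; the function name `dpm` is chosen here for illustration.

```python
def dpm(observations, lsl, usl):
    """Defects per million: proportion of observations outside [lsl, usl] times 1,000,000."""
    defects = sum(1 for x in observations if x < lsl or x > usl)
    return defects / len(observations) * 1_000_000

# Hypothetical measurements and specification limits (assumed for illustration).
measurements = [9.8, 10.1, 10.4, 9.5, 10.0, 10.7, 9.9, 10.2]

print(dpm(measurements, lsl=9.6, usl=10.5))  # 2 of 8 outside spec -> 250000.0
```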

Probability
Relationships between samples and populations most often are described in terms of probability.

There is a 20% chance that the next defect found on the enclosure will be due to a missing fastener.

We make this statement based on the relative frequency of this defect from the sample data.
Probability is the link that lets one predict population behavior based on a sample.

Probability Density Function


Suppose we again launch the catapult. What predictions can we make about how far the ball will travel?
[Figure: Probability density function for the catapult launch, with points y1 and y2 marked on the distance axis. Axes: Dist. (horizontal), Density (vertical).]

1. The probability Pr(y < y1) is equal to the area under the density curve to the left of y1.
2. The probability Pr(y > y1) is equal to the area under the density curve to the right of y1.

What is the probability Pr(y1 < y < y2)? How can we calculate the area under the curve?

The Distribution Can Be Used to Make Predictions About Future Events
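If the catapult distances are assumed to follow a normal distribution with the population parameters quoted earlier (mean 84, standard deviation 3), the areas can be computed with `statistics.NormalDist` from the Python standard library. The cut points y1 = 80 and y2 = 90 are hypothetical values picked for this sketch.

```python
from statistics import NormalDist

# Assumption: launches are normal with the population parameters from the slides.
catapult = NormalDist(mu=84, sigma=3)

y1, y2 = 80, 90  # hypothetical distances of interest

p_below_y1 = catapult.cdf(y1)                  # area to the left of y1
p_above_y1 = 1 - p_below_y1                    # area to the right of y1
p_between = catapult.cdf(y2) - catapult.cdf(y1)  # area between y1 and y2

print(f"Pr(y < {y1}) = {p_below_y1:.4f}")
print(f"Pr(y > {y1}) = {p_above_y1:.4f}")
print(f"Pr({y1} < y < {y2}) = {p_between:.4f}")
```

The area between two points is the difference of two cumulative areas, which answers the Pr(y1 < y < y2) question.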

Normal Distribution
Perhaps the most important distribution because many processes can be described as approximating it.

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Parameters:
$\mu$ = mean
$\sigma$ = standard deviation (the distance from the mean to the point of inflection)

Since the normal probability density function cannot be integrated in closed form, probabilities relating to normal distributions are usually obtained from tables. These tables use the standard normal distribution, namely the normal distribution with $\mu = 0$ and $\sigma = 1$.

$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\, dt$$

Standardized Z Transformation
The standardized Z transformation:

$$Z = \frac{X - \mu}{\sigma}$$

Suppose the diameters of shafts are normally distributed with a mean of 45 and a variance of 1, X ~ N(45, 1). The customer-derived upper specification limit is 47.5. What is the DPM for this process?

$$Z = \frac{X - \mu}{\sigma} = \frac{47.5 - 45}{1} = 2.5$$

[Figure: Normal curve with the area above 47.5 shaded as DEFECTS.]

From a Z table (or the NORMSDIST function in Excel), the probability that a shaft is less than 47.5 is 99.38%. The probability of a defect is 1 - 0.9938 = 0.0062, or 0.62%.

DPM = 0.0062 × 1,000,000 = 6,200

Knowing the Distribution and the Specification Limits Allows the Prediction of Capability
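The shaft calculation can be checked without a Z table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the table lookup; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

mu, sigma = 45.0, 1.0   # shaft diameter distribution, X ~ N(45, 1)
usl = 47.5              # customer-derived upper specification limit

z = (usl - mu) / sigma              # standardized Z value: 2.5
p_in_spec = NormalDist().cdf(z)     # standard normal area below z, about 0.9938
p_defect = 1 - p_in_spec            # about 0.0062
dpm = p_defect * 1_000_000          # about 6,210 before rounding the probability

print(f"Z = {z}, P(defect) = {p_defect:.4f}, DPM = {dpm:.0f}")
```

Rounding the defect probability to 0.0062 before multiplying gives 6,200. Carrying full precision gives roughly 6,210.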

The Distribution of Data with Respect to the Standard Deviation


Although Z tables are readily accessible, the following area relationships are used so frequently that they should be memorized.

Between                                   Percent of area under normal curve
$\mu - 3\sigma$ and $\mu + 3\sigma$       99.73 ≈ 99.7
$\mu - 2\sigma$ and $\mu + 2\sigma$       95.45 ≈ 95
$\mu - 1\sigma$ and $\mu + 1\sigma$       68.27 ≈ 68

[Figure: Normal Curve and Probability Areas. Horizontal axis (Output) runs from -4 to 4 standard deviations, with the 68%, 95%, and 99.73% regions marked out to $\mu \pm 3\sigma$.]
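These areas can be recomputed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library, and the results agree with the memorized figures to within rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3.
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s) of the mean: {area:.2%}")
```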

The Empirical Rule of the Standard Deviation

The distributions seen so far have been normal distributions. However, the following rules apply to most distributions you'll find in the real world:
Rule 1: Roughly 60-75% of the data are within a distance of one standard deviation on either side of the mean.
Rule 2: Usually 90-98% of the data are within a distance of two standard deviations on either side of the mean.
Rule 3: Approximately 99% of the data are within a distance of three standard deviations on either side of the mean.

The Normal Distribution takes Different Forms

[Figure: Distribution One, Distribution Two, and Distribution Three, three normal curves centered on the same mean with different spreads.]

The Means are the Same but the Standard Deviations Differ

Normal Probability Plots


If we are going to use the normal distribution to estimate our capability, how do we know the distribution is normal? A normal probability plot (NOPP) uses the cumulative percentage distribution of the sample data to give a visual display of the likely shape of the process output distribution.

[Figure: Catapult Launch Histogram and Normal Probability Plot. Histogram axes: Catapult Launch (horizontal), Frequency (vertical). Probability plot axes: Catapult Launch (horizontal), Probability from .001 to .999 (vertical).]

Average: 83.5822
StDev: 2.99316
N: 60
Anderson-Darling Normality Test: A-Squared: 0.208, P-Value: 0.858
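The coordinates behind a normal probability plot can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the software that produced the figures: the plotting position (i - 0.5) / n is one common convention among several, the distances are hypothetical, and `normal_plot_points` is a name invented here.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Pair each sorted observation with the standard normal quantile
    of its cumulative plotting position."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # (i - 0.5) / n is an assumed plotting-position convention; others exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical launch distances (assumed for illustration only).
distances = [84.1, 81.2, 86.3, 85.0, 83.5, 88.4, 84.9, 86.8, 85.7, 90.1]

points = normal_plot_points(distances)
for quantile, value in points:
    print(f"{quantile:6.3f}  {value}")
```

If the data are roughly normal, plotting the second column against the first gives points that fall close to a straight line; curvature suggests skew.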

Normal Probability Plots

[Figure: Histogram (Frequency) and normal probability plot (Probability from .001 to .999) for each of three data sets, labeled with Average: 70, Std Dev: 10, N of data: 500.]

Normal Distribution (C1): Anderson-Darling Normality Test A-Squared: 0.418, p-value: 0.328
Positive Skewed Distribution (C2): Anderson-Darling Normality Test A-Squared: 46.447, p-value: 0.000
Negative Skewed Distribution (C3): Anderson-Darling Normality Test A-Squared: 43.953, p-value: 0.000

Where could these distributions occur?

Central Limit Theorem - definition

The central limit theorem (CLT) states that, for a sufficiently large sample size, the distribution of the sample mean, our estimate of $\mu$, can be approximated with a normal distribution even though the original population may be non-normal.

The Distribution of the Averages is Approximately Normal
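A small simulation illustrates the theorem. The sketch below draws repeated samples from a strongly right-skewed population and examines the resulting averages. The exponential population, the sample size of 30, the 2,000 repetitions, and the seed are all arbitrary choices made for this example.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

def sample_mean(n):
    """Average of n draws from an exponential population with mean 1 (right-skewed)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

n = 30
means = [sample_mean(n) for _ in range(2000)]

grand_mean = statistics.mean(means)   # should sit near the population mean, 1
spread = statistics.stdev(means)      # should sit near 1 / sqrt(n), about 0.183

# Share of averages within one standard deviation of their mean;
# a normal distribution would give about 68%.
within_one_sd = sum(abs(m - grand_mean) <= spread for m in means) / len(means)

print(f"mean of averages = {grand_mean:.3f}")
print(f"std dev of averages = {spread:.3f}")
print(f"within one std dev = {within_one_sd:.1%}")
```

Even though individual draws are far from normal, the averages cluster symmetrically enough that the one-standard-deviation share lands near the normal value.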

Summary
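Continuous Distributions
Normal

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

$$Z = \frac{X - \mu}{\sigma}$$

Between                                   Percent of area under normal curve
$\mu - 3\sigma$ and $\mu + 3\sigma$       99.7
$\mu - 2\sigma$ and $\mu + 2\sigma$       95
$\mu - 1\sigma$ and $\mu + 1\sigma$       68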

The Standard Deviation


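[Figure: Normal curve p(d) with mean $\mu$, the point of inflection at $1\sigma$ from the mean, the target T, and the USL; $6\sigma$ fit between the target and the USL, compared with $3\sigma$ in the earlier figure.]

This is a 6 Sigma Process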
