
Basic Statistics

1. Introduction to Statistics
2. Probability distributions
- Binomial distribution
- Poisson Distribution
- Normal distribution
3. Sampling distributions and Estimation.

1. The concept of Statistics


Why Statistics?
Through the advancement of electronics and computers,
today's society is inundated with vast amounts of data. In its
raw form, this data is of little use. But with statistical
analysis, the data can be transformed into valuable
information. This knowledge is vital for drawing
conclusions and making decisions.
"Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write."
H.G. Wells

The field of statistics can be broken into two major
areas: descriptive statistics and inferential
statistics.

Descriptive statistics: describes some of the
fundamental features of a set of data (population or
sample), such as the mean, median, and standard deviation.

Inferential statistics: deals with drawing
conclusions about a population based on information from a
sample drawn from that population.

Probability and Statistics


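[Diagram: a Population and a Sample drawn from it. Probability reasons from the population to the sample; inferential statistics reasons from the sample back to the population; descriptive statistics summarizes the data.]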

Data Collection
A decision can be no better than the data
upon which it was based.
Why do we need to collect data?
1. To identify and/or verify a problem.
2. To analyze a problem.
3. To understand, describe, or monitor a process.
4. To test a hypothesis.
5. To find a relationship between the inputs and outputs of a process.

Two kinds of Numerical Data


Continuous data: length, height, volume, ...
Discrete data: number of defects, number of failures, ...

Population and Sample


Population is a set or collection of all possible
objects or individuals of interest.
Finite population: e.g., the number of employees in
Samsung Electro-Mechanics as of January 1, 2001.
Infinite population: e.g., MLCC chips coming from the
production line.

Population and Sample


A sample is any subset of a population.

A random sample of size n is a sample chosen
in such a way that every possible sample of size n has an
equal chance of being chosen (unbiased).
The true population parameters are rarely known, so
conclusions must be drawn from sample statistics.

Characteristics of distribution
Statistical analysis identifies the characteristics of a data distribution
and expresses those characteristics as numbers.

Central tendency (mean, median, mode)
- Shows the location where the data is centered.
Variation (range, variance, standard deviation)
- Shows how widely the data is scattered around the arithmetic mean.
Shape
- Shows in which direction the data is skewed.

Central tendency
Mode
Most frequently occurring value in a data set.

Median
The value at the 50% rank (the middle) of a sorted set of values.
1) Odd number of data points: the middle value
2) Even number of data points: (sum of the two middle values)/2

Mean (arithmetic mean)
Mean of population:
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X_i}{N}$$
Mean of sample:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X_i}{n}$$
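As a minimal sketch of the three measures of central tendency, Python's standard `statistics` module can compute them directly. The data values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical data set (not from the slides), with an even number of values
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of the values divided by their count
median_value = statistics.median(data)  # even count: average of the two middle values (3 and 5)
mode_value = statistics.mode(data)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Because the data set has an even number of values, the median is the average of the two middle values, which matches rule 2) for the median.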

Variability
Range
Numerical distance between the highest and the lowest
values in a data set.
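Variance and standard deviation
Population variance:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
Sample variance:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
Sample standard deviation:
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$$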
The arithmetic mean is expressed in the original units of the data, while the
variance is expressed in squared units. Taking the square root of the variance
gives the standard deviation, which is again in the original units. For a
sample, one degree of freedom is used up in estimating the mean, so the
divisor is n-1 rather than n.
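The difference between the divisor N (population) and the divisor n-1 (sample) can be seen in a short sketch using Python's standard `statistics` module. The data values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data (not from the slides): mean is 5, sum of squared deviations is 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # population variance, divisor N = 8
samp_var = statistics.variance(data)   # sample variance, divisor n-1 = 7
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print("population variance:", pop_var, " standard deviation:", pop_sd)
print("sample variance:", samp_var, " standard deviation:", samp_sd)
```

The sample variance is always slightly larger than the population variance computed from the same values, because the same sum of squares is divided by a smaller number.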

Comparison of symbols between parameters and statistics

Value                     Population (parameter)   Sample (statistic)
Number of elements        N                        n
Mean                      μ                        X̄
Variance                  σ²                       s²
St. dev                   σ                        s
Correlation coefficient   ρ                        r
Regression coefficient    α, β                     a, b
Error                     ε                        e

2. Probability Distribution
The probability distribution is the major pillar of the bridge that allows us
to make inferences about a population based on information
obtained from a sample.

The probability distribution of a discrete random variable is an assignment of
probabilities to each of the possible values that
the random variable can take on. Its
mathematical model is the probability mass function (for a continuous
random variable, the probability density function).

(1) Binomial distribution


The binomial distribution addresses the problem of determining the
probability of a given number of defective items.

A binomial distribution needs to satisfy the following conditions:
1) A sequence of n Bernoulli trials (only two possible outcomes).
2) Trials are identical.
3) Trials are independent.
4) The probability of success is the same on every trial.

Example
<Problem>
In a certain diode manufacturing process, the defective rate is known to
be 1%. When the inspector takes a random sample of 50 every hour, what is
the probability of finding no more than 1 defective?
<Solution>
The solution can be obtained by adding the probability of finding none
and the probability of finding one.
First, we find the probability of finding no defectives.

From the Minitab menu:
Calc>Probability Distributions>Binomial

This is the place where all the probability distributions can be found!

Probability of finding no defectives

[Minitab dialog callouts: number of random samples (50), defective rate (0.01), no defective (0)]

Result in Session window

[Session window callouts: defective rate of 1%, number of random samples]

The probability of no defective is 0.6050.

Next, the probability of one defective

In this case, we enter 1 as the input value.

The result is 0.3056.

Total probability:
0.6050 + 0.3056 = 0.9106
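The two Minitab results can be cross-checked with a short Python sketch that evaluates the binomial formula P(X=x) = nCx p^x (1-p)^(n-x) directly. The function name `binomial_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x defectives in n independent trials with defective rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 50, 0.01                       # sample size and defective rate from the problem
p_none = binomial_pmf(0, n, p)        # probability of no defective
p_one = binomial_pmf(1, n, p)         # probability of exactly one defective
p_at_most_one = p_none + p_one        # probability of no more than one defective

print(round(p_none, 4), round(p_one, 4), round(p_at_most_one, 4))
```

Rounded to four decimals, the values agree with the Session window results of 0.6050 and 0.3056 and with the total of 0.9106.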

Another way to calculate, using the worksheet

Prepare the following worksheet:
- Enter the numbers of defects in C1 (named x).
- Prepare a column for the probability (named p).

From the Minitab menu:
Calc>Probability Distributions>Binomial

[Dialog screenshot callout: "We use this"]

The result is:

[Worksheet callouts: probability of no defective, probability of one defective]

The final answer is the sum of the two probabilities.

To find the cumulative probability in one step, check the
"Cumulative probability" option in the dialog.

Understanding of Binomial Distribution


The binomial probability distribution is defined by

$$P(X = x) = {}_nC_x \, p^x (1-p)^{n-x}$$

$${}_nC_x = \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

The binomial distribution is used frequently in quality control. It is
the appropriate probability model for sampling from an infinitely large
population, where p represents the defective rate and x the number of
defectives in a sample of size n.

The control chart of defects is based on the binomial
distribution, with the mean and variance given below.

The property of binomial distribution


Form of binomial distribution

[Figure: Binomial distribution for n=4, p=1/2. Bar chart of P(X), vertical axis from 1/16 to 6/16]

1) The probability distribution is always symmetric when p=0.5, even when n is small.

[Figure: Binomial distribution for n=9, p=1/3. Bar chart of P(X) for X = 0 to 9, vertical axis from 0.1 to 0.3]

2) As n increases, the probability distribution approaches symmetry even when p is not 0.5.

Expectation value, variance, and standard deviation of the binomial distribution
Expectation value: $\mu = E(X) = np$
Variance: $\sigma^2 = Var(X) = np(1-p) = npq$
Standard deviation: $\sigma = \sqrt{np(1-p)} = \sqrt{npq}$
where q = 1 - p.
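The symmetry property and the formulas for the mean and variance can be checked numerically. The sketch below uses exact fractions so that no rounding is involved; the helper name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Case 1: n=4, p=1/2 gives a symmetric distribution
pmf_4 = [binomial_pmf(x, 4, Fraction(1, 2)) for x in range(5)]

# Case 2: n=9, p=1/3, used to check E(X) = np and Var(X) = np(1-p)
n, p = 9, Fraction(1, 3)
pmf_9 = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf_9))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf_9))

print("n=4, p=1/2:", [str(px) for px in pmf_4])
print("n=9, p=1/3: mean =", mean, " variance =", variance)
```

For n=4 and p=1/2 the probabilities read the same forwards and backwards, and for n=9 and p=1/3 the computed mean and variance equal np = 3 and np(1-p) = 2.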

(2) Poisson distribution


The Poisson distribution is characterized by the form

"the number of occurrences per unit interval"

Occurrences: defect, electrical or mechanical failure, an arrival, a call, ...
Interval: time, space, area, ...

Example

<Problem>
Suppose that the number of wire-bonding defects per unit that
occur in a semiconductor device is Poisson distributed with
mean = 4. What is the probability that a randomly selected
semiconductor device will contain two or fewer wire-bonding
defects?

From the Minitab menu:

File>New>Minitab Worksheet

In the worksheet, make one column for the number of defects (x)
and another column for the cumulative probability (p).

Calc>Probability Distributions>Poisson

1. Select Cumulative probability.
2. Set Mean = 4.
3. Enter the defect number column (input) and the output column.

[Worksheet result callouts: probability of no defect; cumulative probability of 0, 1; cumulative probability of 0, 1, 2]
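The cumulative probabilities in the worksheet can be reproduced with a short Python sketch that sums the Poisson probabilities P(X=x) = e^(-m) m^x / x! for x = 0, 1, 2. The function names are illustrative and not part of the slides.

```python
from math import exp, factorial

def poisson_pmf(x: int, m: float) -> float:
    """Probability of exactly x occurrences when the mean number of occurrences is m."""
    return exp(-m) * m**x / factorial(x)

def poisson_cdf(x: int, m: float) -> float:
    """Cumulative probability of x or fewer occurrences."""
    return sum(poisson_pmf(k, m) for k in range(x + 1))

m = 4  # mean number of wire-bonding defects per unit, from the problem
cumulative = {x: poisson_cdf(x, m) for x in (0, 1, 2)}

for x, prob in cumulative.items():
    print(f"P(X <= {x}) = {prob:.4f}")
```

The last value, P(X <= 2), is the answer to the problem: the probability of two or fewer wire-bonding defects, about 0.238.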

Examples for Poisson Distribution


1. The number of speeding tickets issued in a certain county
per week
2. The number of disk drive failures per month for a particular
kind of disk drive
3. The number of calls arriving at an emergency dispatch
station per hour.
4. The number of flaws per square yard in a certain type of
fabric.


Relationship with RTY

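$$P(X = x) = \frac{e^{-m} m^x}{x!}$$

m: average number of occurrences
x: number of occurrences

When x = 0, the probability of no defects is the rolled throughput yield (RTY),
with m equal to the defects per unit (dpu):

$$RTY = e^{-dpu}$$
$$dpu = -\ln(RTY)$$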
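The two RTY relations are inverses of each other, which a small Python sketch makes concrete. The function names and the dpu value of 0.5 are hypothetical and chosen only for illustration.

```python
from math import exp, log

def rty_from_dpu(dpu: float) -> float:
    """RTY = e^(-dpu): Poisson probability of zero defects when the mean is dpu."""
    return exp(-dpu)

def dpu_from_rty(rty: float) -> float:
    """dpu = -ln(RTY): the inverse relation."""
    return -log(rty)

example_dpu = 0.5                              # hypothetical value, not from the slides
example_rty = rty_from_dpu(example_dpu)
recovered_dpu = dpu_from_rty(example_rty)

print(f"dpu = {example_dpu} gives RTY = {example_rty:.4f}")
print(f"RTY = {example_rty:.4f} gives dpu = {recovered_dpu:.4f}")
```

Converting a dpu value to RTY and back returns the starting value, up to floating-point rounding.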

(3) Normal distribution


The normal distribution is probably the most important
distribution in quality control and statistical analysis.

$$X \sim N(\mu, \sigma^2)$$

X: variable
N: normal distribution
μ: mean
σ: standard deviation

The normal distribution is defined by the mean and
standard deviation.

The shape of normal distribution?


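- Symmetric
- Unimodal
- Bell-shaped

[Figure: normal curve on an axis marked in units of sigma, with the areas within 1, 2, and 3 sigma of the mean labeled 68.3%, 95.5%, and 99.73%]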

What is Sigma?
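Sigma is the distance from the mean to the inflection point of the curve.

[Figure: normal curve on an axis marked in units of sigma, with the areas within 1, 2, and 3 sigma of the mean labeled 68.3%, 95.5%, and 99.73%]

68.3% of the population values fall between the limits defined by the mean
plus and minus one sigma.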
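The percentages on the figure can be recomputed from the cumulative distribution of the standard normal. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between (mean - k sigma) and (mean + k sigma) for k = 1, 2, 3
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sigma: {100 * area:.2f}%")
```

The two-sigma area comes out at about 95.45%, which the figure labels as 95.5%.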

Probability density function


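The probability density function is
defined by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$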

Shapes of Normal curve


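[For different μ and σ]
- μ1 ≠ μ2, σ1 = σ2
- μ1 = μ2, σ1 ≠ σ2
- μ1 ≠ μ2, σ1 ≠ σ2

[Figure: pairs of normal curves illustrating the three cases, with the 68.3%, 95.5%, and 99.73% areas marked on an axis in units of sigma]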

Standard Normal Distribution


$$Z = \frac{X - \mu}{\sigma}$$

This is used for coordinate transformation.

Z follows a normal distribution with mean = 0 and
standard deviation = 1, N(0, 1²).

[Figure: standard normal curve centered at 0, with the 68.3%, 95.5%, and 99.73% areas marked]

Minitab application
Calc>Probability Distributions>Normal

- Cumulative probability: find the area (probability) for a known x.
- Inverse cumulative probability: find x for a known probability.

Minitab treats the area to the left of x as the cumulative probability.

Normal distribution Example 1


<Problem> The tensile strength of a certain product is an
important quality characteristic. It is known that the strength is
normally distributed with mean = 40 and standard
deviation = 2, denoted N(40, 2²).
When the customer wants a strength of at least 35, what is the
probability of customer satisfaction?

Solution

[Figure: N(40, 2²) curve with the mean at 40 and the known spec. at 35. The question is the area to the right of 35; the Minitab solution provides the area to the left of 35.]

Calc>Probability Distributions>Normal

[Dialog settings: check Cumulative probability; mean is 40; st. deviation is 2; X is 35]

The area we want (probability) is
1 - 0.0062 = 0.9938
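The same calculation can be sketched in Python with `statistics.NormalDist`, whose `cdf` method also returns the area to the left of x. The variable names are illustrative.

```python
from statistics import NormalDist

strength = NormalDist(mu=40, sigma=2)   # tensile strength distribution from the problem

area_left = strength.cdf(35)            # probability that strength is below the spec of 35
p_satisfied = 1 - area_left             # probability that strength is at least 35

print(f"area to the left of 35: {area_left:.4f}")
print(f"probability of customer satisfaction: {p_satisfied:.4f}")
```

Subtracting the left-hand area from 1 gives the right-hand area, which is the probability of customer satisfaction.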

Example 2
It is known that the quality characteristic of a certain process
follows the standard normal distribution (mean = 0, st. dev. = 1). When the
defective rate is 1%, what is the sigma level?

<Solution> The problem is to find the value of z when the
cumulative probability is known. In Minitab, the inverse
cumulative probability is used.

[Dialog settings: check Inverse cumulative probability; input 1 - 0.01 = 0.99]

Z is 2.33
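The inverse cumulative probability is available in Python as `NormalDist.inv_cdf`. The sketch below repeats the Minitab step under the same inputs; the variable names are illustrative.

```python
from statistics import NormalDist

defective_rate = 0.01
cumulative_probability = 1 - defective_rate   # area to the left of z

# Inverse cumulative probability of the standard normal distribution
z = NormalDist(mu=0, sigma=1).inv_cdf(cumulative_probability)

print(f"z = {z:.2f}")
```

Rounded to two decimals, this matches the Minitab result of 2.33.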

3. Sampling Distributions and Estimation


Question:
When we do not know the mean of the
population, we use a sample. But how
accurately does the sample mean represent
the population mean?

Standard Error of the Mean

Mean of the sample mean:
$$\mu_{\bar{X}} = \mu$$
Variance of the sample mean:
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
Standard error of the mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Central Limit Theorem


For almost all populations, the sampling distribution of
the mean can be approximated closely by a normal
distribution, provided the sample size is sufficiently
large.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
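A small simulation can illustrate the theorem and the standard error of the mean. The sketch below is only an illustration under assumptions that are not in the slides: the population is uniform on [0, 1), the sample size is 30, the experiment is repeated 5000 times, and the random seed is fixed at 0 so that the run is repeatable.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)     # fixed seed (assumption) so the run is repeatable
n = 30                     # size of each sample (assumption)
repeats = 5000             # number of samples drawn (assumption)

# Draw many samples from a uniform population and record each sample mean
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(repeats)
]

population_mean = 0.5                   # mean of the uniform [0, 1) population
population_sigma = sqrt(1 / 12)         # standard deviation of the uniform [0, 1) population
theoretical_se = population_sigma / sqrt(n)

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.4f} (population mean {population_mean})")
print(f"observed standard error: {observed_se:.4f} (theoretical {theoretical_se:.4f})")
```

Although the population is flat rather than bell-shaped, the sample means cluster around the population mean with a spread close to sigma divided by the square root of n.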

Estimation
Estimate population parameters from a sample.

1) Point estimation
Estimate the parameter with a single number.

2) Interval estimation
Estimate the parameter with a confidence interval.

Confidence interval for population mean.


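1) Known standard deviation: use the normal distribution

$$P(L < \mu < U) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

This is the 100(1-α)% confidence interval for μ.

If α = 0.05 (confidence level: 95%), then α/2 = 0.025, Z0.025 = 1.96, and -Z0.025 = -1.96.

[Figure: standard normal curve with an area of α/2 = 0.025 in each tail, beyond -Z0.025 = -1.96 and Z0.025 = 1.96]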
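As a minimal sketch of the known-sigma interval, the function below computes the mean plus and minus Z(α/2) times sigma over the square root of n. The function name and the example numbers (sample mean 50, sigma 4, n = 16) are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    half_width = z_half_alpha * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs chosen for easy arithmetic: sigma / sqrt(n) = 1
lower, upper = z_confidence_interval(x_bar=50.0, sigma=4.0, n=16)

print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

With a standard error of exactly 1, the half-width of the interval is Z0.025 itself, so the interval is the sample mean plus and minus about 1.96.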

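2) Unknown standard deviation: use the t-distribution

$$P(L < \mu < U) = 1 - \alpha$$
$$P\left(-t_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha$$
$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$

This is the 100(1-α)% confidence interval for μ (for example, α = 0.05 for a confidence level of 95%).

Here t(α/2) = t(α/2, n-1) is taken from the t-distribution with n-1 degrees of freedom.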

Example
1. A random sample of 64 customers at a local
supermarket showed that their average shopping
time was 33 minutes with a sample standard
deviation of 16 minutes. Find a 90% confidence
interval for the true average shopping time.

2. A test on a random sample of 9 cigarettes yielded an
average nicotine content of 15.6 milligrams and a
standard deviation of 2.1 milligrams. Construct a 99%
confidence interval for the true but unknown average
nicotine content of this particular brand of cigarette.
Assume that nicotine content is normally distributed.
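One possible way to set up the second exercise is sketched below. It assumes the t-based interval applies, since the sample is small and the population standard deviation is unknown. The critical value t(0.005, 8) = 3.355 is taken from a standard t-table and is not stated in the slides; the function name is illustrative. The Python standard library has no t-distribution, so the critical value is passed in by hand.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean using a t critical value supplied by the caller."""
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Assumption: t critical value for alpha/2 = 0.005 and n-1 = 8 degrees of freedom,
# read from a standard t-table (not given in the slides)
T_CRIT_99_DF8 = 3.355

lower, upper = t_confidence_interval(x_bar=15.6, s=2.1, n=9, t_crit=T_CRIT_99_DF8)

print(f"99% confidence interval: ({lower:.2f}, {upper:.2f}) milligrams")
```

The first exercise can be handled the same way by passing the critical value that matches its sample size and 90% confidence level.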