[Figure: samples n1, n2, n3 drawn from a population; each sample represents the result of observation]
Population (N): what we want to observe
Sample (n): obtained by data collection
Population (N = 5,000), parameters: mean µ, standard deviation σ
Sample (n = 50), statistics: mean x̄, standard deviation s
Descriptive statistics
Let's think about the following
(Exercise)
Discrete data : , , .
Types of Data
(Continued)
                  Variable (Continuous)       Attribute (Discrete)
Characteristics   measurable                  countable
                  continuous                  discrete units or occurrences
                  may derive from counting    good/bad

Ordinal
  Description: Data is arranged in some order, but differences between values cannot be determined or are meaningless.
  Example: Product defects are tabulated as follows: A 16, B 32, C 42, D 30, where A defects are more critical than D defects.
Interval
  Description: Data is arranged in order and differences can be found (difference is meaningful). However, there is no inherent starting point and ratios are meaningless.
  Example: The temperatures of three aluminum ingots were 200°F, 400°F and 600°F. Note that three times 200°F is not the same as 600°F as a measurement of warmth.
Ratio
  Description: An extension of the interval level that includes an inherent zero starting point. Both differences and ratios are meaningful.
  Example: Product A costs $300 and product B costs $600. Note that $600 is twice as much as $300.
(A) Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
<ex> 2, 9, 11, 5, 6: $\bar{x} = \frac{2 + 9 + 11 + 5 + 6}{5} = 6.6$
(B) Population mean: µ
[Figure: relative positions of the mean, median, and mode on a distribution curve]
Midrange (M): midpoint of the largest and smallest values
$M = \frac{X_{max} + X_{min}}{2}$
Harmonic mean (H):
$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$
used to calculate average speed
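The measures of central tendency listed so far can be checked numerically. The sketch below reuses the <ex> data 2, 9, 11, 5, 6; the helper names `midrange` and `harmonic` are illustrative, not part of the original slides.

```python
from statistics import mean, median, multimode, harmonic_mean

data = [2, 9, 11, 5, 6]  # the <ex> data used for the sample mean

def midrange(values):
    """Midpoint between the largest and the smallest observation."""
    return (max(values) + min(values)) / 2

def harmonic(values):
    """H = n / (1/x1 + 1/x2 + ... + 1/xn); every value must be nonzero."""
    return len(values) / sum(1 / x for x in values)

x_bar = mean(data)        # sample mean
x_med = median(data)      # middle value of the sorted data
modes = multimode(data)   # every value occurs once here, so all are returned
m = midrange(data)
h = harmonic(data)

print(x_bar, x_med, modes, m, round(h, 3))
```

The hand-written `harmonic` follows the slide formula directly; the standard library's `harmonic_mean` should give the same value.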
(Exercise) Diameter of the bolts (5 data)
7.08 7.00 7.04 7.02 6.96
1) Sample mean ($\bar{x}$):
2) Median ($\tilde{x}$):
3) Mode ($M_0$):
4) Midrange ( M ) :
5) Harmonic mean (H) :
Measure of Dispersion
Central tendency does not necessarily provide enough information to describe data adequately. Consider the bursting strengths obtained from two samples of six bottles each (unit: psi):
● Sample 1: 230 250 245 258 265 240
○ Sample 2: 190 228 305 240 265 260
The mean of both samples is 248 psi.
[Fig. Bursting-strength data: dot plot of both samples around the sample mean = 248]
Variance and Standard Deviation
Consider the sample 2 data:
Observation    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
X1 = 190       -58                3364
X2 = 228       -20                400
X3 = 305       57                 3249
X4 = 240       -8                 64
X5 = 265       17                 289
X6 = 260       12                 144
$\bar{x}$ = 248      Sum = 0            Sum = 7510
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$
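The deviation table can be reproduced for both bottle samples to show that equal means can hide very different spreads. This is a minimal sketch; the function name `sample_variance` is illustrative.

```python
from math import sqrt

sample_1 = [230, 250, 245, 258, 265, 240]  # psi
sample_2 = [190, 228, 305, 240, 265, 260]  # psi

def sample_variance(values):
    """Return (mean, deviations, s^2), with s^2 using the n - 1 divisor."""
    n = len(values)
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]
    s2 = sum(d ** 2 for d in deviations) / (n - 1)
    return x_bar, deviations, s2

for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    x_bar, deviations, s2 = sample_variance(sample)
    print(f"{name}: mean = {x_bar:.0f}, sum of deviations = {sum(deviations):.0f}, "
          f"s^2 = {s2:.1f}, s = {sqrt(s2):.2f}")
```

For sample 2 the sum of squared deviations is 7510, as in the table, so s² = 7510 / 5 = 1502.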
[Figure: two relative-frequency histograms, X axis from 0 to 8]
(A) Variance of population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
(B) Variance of sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
Measure of Dispersion
■ Variance
<Example> Compute $s^2$ for the following measurement data:
Data: 5, 7, 1, 2, 4
<Sol>
$x_i$    $x_i^2$
5        25
7        49
1        1
2        4
4        16
$\sum x_i = 19$    $\sum x_i^2 = 95$
$s^2 = \frac{95 - 19^2 / 5}{5 - 1} = 5.7$
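As a check on the example, the computational form of the sample variance can be compared with the definitional form. This is a small sketch using only the five data values from the example.

```python
data = [5, 7, 1, 2, 4]
n = len(data)

sum_x = sum(data)                   # 19, as in the table
sum_x2 = sum(x * x for x in data)   # 95, as in the table

# Computational form: (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definitional form: sum of squared deviations from the mean / (n - 1)
x_bar = sum_x / n
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(round(s2_shortcut, 4), round(s2_definition, 4))
```

Both forms agree up to floating-point rounding; the shortcut needs only the two column totals.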
cov(data 1) = $s / \bar{x}$ = 0.103
cov(data 2) = $s / \bar{x}$ = 0.067
(cov: coefficient of variation)
Dispersion
■ Skewness
measure of symmetry
$\alpha_3 = \frac{\mu_3}{\sigma^3} = \frac{E[(X - \mu)^3]}{\sigma^3}$
; $\alpha_3 = 0$ (mean = median = mode)
; $\alpha_3 > 0$ (positively skewed: mean > median > mode)
■ Kurtosis
measure of peakedness
$\alpha_4 = \frac{\mu_4}{\sigma^4} = \frac{E[(X - \mu)^4]}{\sigma^4}$
; the greater $\alpha_4$, the sharper the peak
; normal distribution: $\alpha_4 = 3$
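The two coefficients can be sketched as standardized moments. The code below assumes the data are treated as a whole population (divide-by-N moments), and both data sets are hypothetical values chosen only for illustration.

```python
def standardized_moment(values, order):
    """alpha_order = mu_order / sigma**order, using divide-by-N central moments."""
    n = len(values)
    mu = sum(values) / n

    def central(k):
        return sum((x - mu) ** k for x in values) / n

    sigma = central(2) ** 0.5
    return central(order) / sigma ** order

symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical, long right tail

print("alpha_3 (symmetric):   ", standardized_moment(symmetric, 3))
print("alpha_3 (right-skewed):", standardized_moment(right_skewed, 3))
print("alpha_4 (symmetric):   ", standardized_moment(symmetric, 4))
```

The symmetric set gives a third moment of zero, while the long right tail gives a positive value, matching the sign convention above.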
Degrees of Freedom
The (n-1) degrees of freedom result from the fact that the n deviations $X_1 - \bar{x}, X_2 - \bar{x}, \ldots, X_n - \bar{x}$ always sum to zero, so specifying the values of any (n-1) of these quantities automatically determines the remaining one.
Thus only (n-1) of the n deviations $X_i - \bar{x}$ are independent.
Display Measurement Data
Histogram
<ex>
* Relative frequency = frequency / n
[Figure: relative-frequency histogram; vertical axis: relative frequency (ticks 4/30 to 7/30), horizontal axis: X (1.85 to 2.85)]
Display Measurement Data
Histogram
1) Collect data (50 to 200 observations)
2) Find Xmax and Xmin: R = Xmax − Xmin
3) Determine the number of classes (k)
7) Frequency count
8) Relative frequency
9) Draw the histogram
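Display Measurement Data
Histogram
Class   Class boundaries   Midpoint   Frequency   Relative frequency
4       ·                  ·          ·           ·
5       ·                  ·          ·           ·
6       ·                  ·          ·           ·
7       ·                  ·          ·           ·
8       ·                  ·          ·           ·
9       49.45~49.95        49.70      7           0.07
10      49.95~50.45        50.20      3           0.03
[Figure: relative-frequency histogram of these classes; vertical axis ticks 0.01 to 0.03; horizontal axis labels 44.2, 45.70, 50.0, 50.95]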
Histogram
(Exercise)
1) Measurement data (unit : cm)
163 172 178 174 174 167 182 176 159 158
169 171 174 182 178 180 164 163 154 181
175 183 173 164 165 172 170 169 177 172
172 156 158 168 164 178 173 175 176 171
163 164 170 172 170 169 168 164 186 174
[Blank worksheet for the exercise: rows 1 to 12; unit: cm]
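The histogram procedure can be sketched on the 50 height measurements of this exercise. The slides do not show how the class width and boundaries are chosen, so the rules below are assumptions: k = 12 classes (matching the twelve worksheet rows), a width rounded up to a whole measurement unit, and a first boundary half a unit below Xmin. The function name `frequency_table` is illustrative.

```python
import math

heights = [
    163, 172, 178, 174, 174, 167, 182, 176, 159, 158,
    169, 171, 174, 182, 178, 180, 164, 163, 154, 181,
    175, 183, 173, 164, 165, 172, 170, 169, 177, 172,
    172, 156, 158, 168, 164, 178, 173, 175, 176, 171,
    163, 164, 170, 172, 170, 169, 168, 164, 186, 174,
]  # cm, copied from the exercise

def frequency_table(data, k, unit=1):
    """Return (lower, upper, midpoint, frequency, relative frequency) per class."""
    n = len(data)
    x_min, x_max = min(data), max(data)
    data_range = x_max - x_min                                  # R = Xmax - Xmin
    # Assumed rule: width = (R + unit) / k, rounded up to a multiple of the unit.
    width = math.ceil((data_range + unit) / (k * unit)) * unit
    # Assumed rule: start half a unit below Xmin so no value sits on a boundary.
    start = x_min - unit / 2
    counts = [0] * k
    for x in data:
        counts[int((x - start) // width)] += 1                  # frequency count
    rows = []
    for j, freq in enumerate(counts):
        lower = start + j * width
        rows.append((lower, lower + width, lower + width / 2, freq, freq / n))
    return rows

table = frequency_table(heights, k=12)
for lower, upper, mid, freq, rel in table:
    print(f"{lower:6.1f} ~ {upper:6.1f}  mid = {mid:6.1f}  f = {freq:2d}  rel = {rel:.2f}")
```

A different choice of k or of the boundary rule changes the shape of the histogram, so the table is one possible answer rather than the only one.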
Display Measurement Data
Histogram
7 15 71 5
or $\tilde{x} = \frac{65 + 68}{2} = 66.5$
Display Measurement Data
2) 1st quartile: $\frac{1}{4}(40) + 0.5 = 10.5$ (ranks 10 and 11), or $\frac{53 + 54}{2} = 53.5$
3) 3rd quartile: $\frac{3}{4}(40) + 0.5 = 30.5$; $(79 + 81) / 2 = 80$
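The position rule used for the quartiles, (q/4)·n + 0.5, can be sketched as follows. The 40 original observations are not reproduced in the slides, so the data here are a hypothetical placeholder (the integers 1 to 40), and interpolating linearly between neighbouring ranks is an assumption that reduces to averaging when the position ends in .5.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional rank (assumes n >= 2)."""
    lower = int(position)            # rank just below the position
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

def quartile(values, q):
    """q = 1, 2 or 3; position = (q / 4) * n + 0.5."""
    ordered = sorted(values)
    position = q / 4 * len(ordered) + 0.5
    return value_at_position(ordered, position)

placeholder = list(range(1, 41))     # hypothetical data with n = 40

print(quartile(placeholder, 1))      # average of ranks 10 and 11
print(quartile(placeholder, 2))      # average of ranks 20 and 21
print(quartile(placeholder, 3))      # average of ranks 30 and 31
```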
[Figure: box plots of viscosity for mixtures 1 and 2; vertical axis from 20.0 to 27.0; labeled values 26.67, 25.70, 25.19, 24.67, 23.68, 22.37, 22.02, 21.88, 21.08, 20.33]
• Mixture 1 has higher viscosity than mixture 2
• Distribution is not symmetric
• The maximum viscosity value in mixture 2 seems unusually large.
The Box-plot
■ Graphical display that simultaneously displays several important features of the data (location, central tendency, spread, departure from symmetry, outliers)
[Figure: box plot with the median marked as Q2 = 120.6]
Tchebysheff's Theorem
■ Given a number k greater than or equal to 1 and a set of n measurements X1, X2, …, Xn, at least $[1 - \frac{1}{k^2}]$ of the measurements will lie within k standard deviations of their mean.
k      At least $(1 - \frac{1}{k^2}) \times 100$ (%)
1      0
1.5    55.6
2      75
2.5    84
3      88.9
[Figure: relative-frequency curve over X with the interval from µ − kσ to µ + kσ marked]
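The table of lower bounds follows directly from 1 − 1/k². A minimal sketch (the function name `tchebysheff_bound` is illustrative):

```python
def tchebysheff_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2

for k in (1, 1.5, 2, 2.5, 3):
    print(f"k = {k:<4} at least {100 * tchebysheff_bound(k):.1f}%")
```

The bound holds for any set of measurements, whatever the shape of the distribution, which is why it is much weaker than the figures for a bell-shaped distribution.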
Tchebysheff's Theorem
<Example> The mean and variance of a sample of n = 25 measurements are 75 and 100, respectively. Use Tchebysheff's Theorem to describe the distribution of measurements.
<Sol> We are given $\bar{x} = 75$ and s = 10 ($s^2 = 100$).
① At least ¾ of the 25 measurements lie in the interval $\bar{x} \pm 2s = 75 \pm 2(10)$, that is, [55, 95].
[Figure: distribution over X with at least (25)(3/4) of the data lying between 55 and 95]
Empirical Rule (Rule of Thumb)
■ Given a distribution of measurements that is approximately bell-shaped, the interval
① (µ ± σ) contains approximately 68% of the measurements
② (µ ± 2σ) contains approximately 95% of the measurements
③ (µ ± 3σ) contains almost all (more than 99%) of the measurements
[Figure: bell-shaped relative-frequency curve over X with 68% of the area within µ ± σ]
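The percentages in the Empirical Rule are the coverage probabilities of an exactly normal distribution, which can be checked with the error function. This sketch assumes exact normality; real bell-shaped data will only match approximately.

```python
from math import erf, sqrt

def normal_coverage(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal random variable X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {100 * normal_coverage(k):.2f}%")
```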
Empirical Rule
<Example> A time study is conducted to determine the length of time necessary to perform a specified operation in a manufacturing plant. The length of time necessary to complete the operation is measured for each of n = 40 workers. The mean and standard deviation are found to be 12.8 and 1.7, respectively. Describe the sample data by using the Empirical Rule.
<Sol>
$(\bar{x} \pm s) = 12.8 \pm 1.7$, that is, [11.1, 14.5]
$(\bar{x} \pm 2s) = 12.8 \pm 2(1.7)$, that is, [9.4, 16.2]
We expect approximately 68% of the measurements to fall into the interval from 11.1 to 14.5, and approximately 95% to fall into the interval from 9.4 to 16.2.
Normal Distribution
• What is normal?
• How can we predict from the distribution?
• How can we make a distribution normal?
• Where can we use it?
Symmetric, bell-shaped, centered at µ
Can predict what will happen
Can transform a random variable (r.v.) X to be normal
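Normal Distribution
[Figure: normal curve on an axis from -4 to 4 standard deviations; 68.3% of the area lies within ±1σ, 95.5% within ±2σ, and 99.73% within ±3σ]
[Figure: pairs of normal curves for three cases: µ1 ≠ µ2 with σ1 = σ2; µ1 = µ2 with σ1 ≠ σ2; µ1 ≠ µ2 with σ1 ≠ σ2]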
Normal probability plot
■ Estimate process yields and fallouts, % to fail, etc.
• slope = standard deviation
• $\hat{\sigma}$ = 84th percentile − 50th percentile
■ Test normality
<Normal probability plot>
[Figure: normal probability plot; horizontal axis: bottle strength (190 to 350 psi), vertical axis: cumulative % (up to 99.9)]
If the spec. on bottle strength is LSL = 210 psi, we can estimate that about 25% of the bottles manufactured by this process (equipment) would be below this limit.
Normal probability plot
(Ex) Use probability plotting to determine the parameters of the normal distribution given the following data set.
<bottle strength data> (unit: psi)
193 198 219 197 …
218 231 294 318 …
X axis: data
Y axis: cumulative (%)
• $F(x) = \frac{i}{n + 1} \times 100$ (%): mean rank
• $F(x) = \frac{i - 0.3}{n + 0.4} \times 100$ (%): median rank
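The two plotting-position formulas can be sketched as small functions. Because the bottle-strength list above is elided, the demonstration uses the six sample 2 bursting strengths from the dispersion example instead; the function names are illustrative.

```python
def mean_rank(i, n):
    """Cumulative % for the i-th smallest of n values: i / (n + 1) * 100."""
    return i / (n + 1) * 100

def median_rank(i, n):
    """Approximate median rank: (i - 0.3) / (n + 0.4) * 100."""
    return (i - 0.3) / (n + 0.4) * 100

def plotting_positions(data, rank=mean_rank):
    """Pair each sorted observation (X axis) with its cumulative % (Y axis)."""
    ordered = sorted(data)
    n = len(ordered)
    return [(x, rank(i, n)) for i, x in enumerate(ordered, start=1)]

sample_2 = [190, 228, 305, 240, 265, 260]  # psi, from the dispersion example

for x, f in plotting_positions(sample_2, median_rank):
    print(f"{x:4d}  {f:5.1f}%")
```

Sorting comes first because the rank i refers to the i-th smallest observation, not to the order of collection.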
Normal probability plot
<Table> Time to fail data (xx model); unit: month
t     F(t)     t     F(t)
31    4.8      48    52.4
35    9.5      48    57.1
36    14.3     51    61.9
41    19.0     52    66.7
42    23.8     54    71.4
44    28.6     54    76.2
45    33.3     55    81.0
46    38.1     56    85.7
46    42.9     57    90.5
47    47.6     59    95.2
* use $F(t) = \frac{i}{n + 1} \times 100$ (%)
Normal probability plot
[Figure: normal probability plot of the time-to-fail data; horizontal axis: time to fail (30 to 60), vertical axis: F(x) % (0.01 to 99.99), with the levels µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ marked]
① µ = ?
② σ = (µ + σ) − (µ)
③ F(x) = 10% → ?
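Reading µ and σ off the plot can be imitated numerically. The sketch below is a stand-in for the graphical method, not the slide's own procedure: it converts the mean-rank positions of the 20 time-to-fail values to standard normal scores and fits a least-squares line t = µ + σ·z, so the slope plays the role of σ and the intercept the role of µ.

```python
from statistics import NormalDist

times = [31, 35, 36, 41, 42, 44, 45, 46, 46, 47,
         48, 48, 51, 52, 54, 54, 55, 56, 57, 59]  # months, already sorted

n = len(times)
std_normal = NormalDist()  # mean 0, standard deviation 1

# Mean-rank plotting positions F = i / (n + 1), converted to z scores.
z = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Least-squares fit of t on z: slope estimates sigma, intercept estimates mu.
z_bar = sum(z) / n
t_bar = sum(times) / n
sigma_hat = (sum((zi - z_bar) * (ti - t_bar) for zi, ti in zip(z, times))
             / sum((zi - z_bar) ** 2 for zi in z))
mu_hat = t_bar - sigma_hat * z_bar

# Time by which 10% are expected to have failed under the fitted normal model.
t_10 = mu_hat + sigma_hat * std_normal.inv_cdf(0.10)

print(f"mu ≈ {mu_hat:.1f}, sigma ≈ {sigma_hat:.1f}, t(10%) ≈ {t_10:.1f} months")
```

A line drawn by eye on probability paper will give slightly different numbers; the fit is only one reasonable way to place the line.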
Normal probability plot
<Exercise> Diode failure data (unit: hours)
i     ti      F(ti)%
1     1500    6.7
2     2400    16.2
3     3000    25.9
4     3200    35.5
5     3700    45.2
6     4000    -
7     4000    -
8     4700    -
9     5100
10    5900
* use $F(x) = \frac{i - 0.3}{n + 0.4}$
(t = 10^3 hours)
• Normal?
• µ =
• σ =
• When will 20% of the diodes fail?