
6.

Quantitative Methods and Tools


A. Collecting and Summarizing Data
B. Quantitative Concepts
C. Probability Distributions
D. Statistical Decision-Making
E. Relationships Between Variables
F. Statistical Process Control (SPC)
G. Process and Performance Capability
H. Design and Analysis of Experiments
Qualitative vs Quantitative

Data
❖ Qualitative Data: Description
❖ Quantitative Data: Numbers
Continuous vs Discrete
❖ Continuous Data
❖ Measurements: length, height, time
❖ More information from fewer samples
❖ More sensitive
❖ More expensive to collect
❖ Discrete Data
❖ Count: number of students, number of heads
Measurement Scales

Data: Nominal, Ordinal, Interval, Ratio

❖ Nominal – Example: Color: Blue, Green, Red
❖ Ordinal – Example: Pass/Fail; Good, Bad, Worst
❖ Interval – Example: Temperature in Celsius
❖ Ratio – Example: Height, mass, volume
Measurement Scales

|                               | Nominal   | Ordinal          | Interval              | Ratio              |
|-------------------------------|-----------|------------------|-----------------------|--------------------|
| Ordered                       | N         | Y                | Y                     | Y                  |
| Difference                    | N         | N                | Y                     | Y                  |
| Absolute Zero                 | N         | N                | N                     | Y                  |
| Example                       | Red, Blue | Good, Bad, Worst | Temperature: Degree C | Length, Weight     |
| Central Tendency Measurement  | Mode      | Mode, Median     | Mode, Median, Mean    | Mode, Median, Mean |
Qualitative vs Quantitative

Data
❖ Qualitative Data: Nominal, Ordinal
❖ Quantitative Data: Interval, Ratio
Data Collection Methods
❖ Data Collection Plan
❖ Tally or Check sheets
❖ Data coding
❖ Data cleaning
❖ Automatic gaging
Data Collection Plan
❖ Why do you need to collect data?
❖ Goal and objective
❖ Operational definition
❖ How much? How? Where? When? etc.
❖ Type of data – NOIR (Nominal, Ordinal, Interval, Ratio)
❖ Manual or automatic
❖ Past data vs future data
❖ Is the data reliable?
Data Collection Plan
| Measurement | Operational Definition | How is it measured? | Type of Data | Sample size | Who? | Data Recording Form | Comments |
|---|---|---|---|---|---|---|---|
| Time to assemble | Time from picking up the first piece to placing the assembled item in tray | Using a stop watch | Continuous, Ratio | Every 10th piece | Operator | Assembly Record F-0156 | |
Check Sheet

Defects during water bottle manufacturing

Capacity    Scratch   Loose Cap   Label   Volume   Leakage   Frequency
300 ml.     ||        ||          ||||    |        |         11
500 ml.     |||       ||||        ||      |        |         12
1000 ml.    ||||      ||||        |       |        |         13
Sum         5         18          8       2        3         36
Data Coding
❖ Adding or subtracting
❖ Example: -95, -97, -98, -90
❖ Add 100 to each: 5, 3, 2, 10
❖ Coded mean: 5
❖ Un-coded mean: 5 - 100 = -95
❖ The standard deviation remains the same; it is not affected by addition or subtraction.
❖ s = 3.559
Data Coding
❖ Multiplying or dividing
❖ Example: 1.05, 1.03, 1.02, 1.10
❖ Multiply each by 100: 105, 103, 102, 110
❖ Coded mean: 105
❖ Un-coded mean: 105 / 100 = 1.05
❖ The standard deviation must be divided by the factor you multiplied by for coding.
❖ For coded data s = 3.559
❖ For original data s = 3.559 / 100 = 0.03559
Data Coding
❖ By truncation of repetitive terms
❖ Example: 0.555, 0.553, 0.552, 0.550
❖ Truncate 0.55 from all: 5, 3, 2, 0
❖ This is equivalent to multiplying by 1000 and subtracting 550
❖ Coded mean: 2.5
❖ Un-coded mean: (2.5 + 550) / 1000 = 0.5525
❖ The standard deviation must be divided by the factor you multiplied by; the subtraction has no effect.
❖ For coded data s = 2.0816
❖ For original data s = 2.0816 / 1000 = 0.0020816
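A minimal Python sketch (not part of the original deck; assumes NumPy is installed) to verify the truncation example above:

```python
import numpy as np

# Truncation coding: equivalent to multiplying by 1000 and subtracting 550
x = np.array([0.555, 0.553, 0.552, 0.550])
coded = x * 1000 - 550                  # -> 5, 3, 2, 0
print(coded.mean())                     # 2.5
print((coded.mean() + 550) / 1000)      # un-coded mean: 0.5525
print(coded.std(ddof=1))                # 2.0816 (sample standard deviation)
print(coded.std(ddof=1) / 1000)         # 0.0020816 (only the multiplier matters)
```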
Data Cleaning – Missing Data
❖ In statistics, imputation is the process of
replacing missing data with substituted
values.
❖ Missing data can introduce bias.
❖ Is the data missing at random?
❖ What is the reason the data is missing?
❖ Option: delete the row
❖ Option: replace the missing value with the average value (mean imputation)
Data Accuracy and Integrity
❖ Factors of Data Quality include:
❖ Integrity (reliability – current and relevant)
❖ Accuracy (correctness)
❖ Some factors which affect data accuracy:
❖ Bias
❖ Lack of knowledge
❖ Boredom – too much data being recorded
❖ Rounding off
❖ Intentional falsification
Data Accuracy and Integrity
❖ Actions to maintain data accuracy and
integrity:
❖ Avoid manual entry
❖ Checking and auditing
Benford’s Law
❖ The law states that in many naturally
occurring collections of numbers, the
leading significant digit is likely to be
small.
❖ For example, in sets that obey the law,
the number 1 appears as the most
significant digit about 30% of the time,
while 9 appears as the most significant
digit less than 5% of the time.
[Figure: Benford's law leading-digit distribution. By Gknor - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=4509760]
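The expected leading-digit frequencies follow directly from the law, P(d) = log10(1 + 1/d). A short Python sketch (my addition, not from the original slides):

```python
import math

# Benford's law: probability that the leading significant digit is d
for d in range(1, 10):
    print(d, round(math.log10(1 + 1 / d), 3))
# digit 1 -> 0.301 (about 30%), digit 9 -> 0.046 (under 5%)
```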
Descriptive Statistics

❖ Central Tendency: Mean, Mode, Median, Percentile
❖ Variability: Range, Standard Deviation, Interquartile Range
Mean
❖ Also known as the Average
❖ Affected by extreme values
❖ Example: 10, 11, 14, 9, 6
❖ Mean = (10+11+14+9+6)/5 = 50/5 = 10
Mode
❖ Most frequently occurring item
❖ Example: 10, 11, 14, 9, 6, 10
❖ Mode = 10
Median
❖ Middle value when the data is put in ascending or descending order.
❖ Example: 10, 11, 14, 9, 6
❖ In ascending order: 6, 9, 10, 11, 14
❖ Median = 10
❖ Example: 10, 11, 14, 9, 6, 11
❖ In order: 6, 9, 10, 11, 11, 14
❖ Median = (10+11)/2 = 10.5
Percentile
❖ The median divides the data into two equal parts when arranged in ascending or descending order
❖ Percentile divides the data into 100 parts
❖ Quartile divides the data into 4 parts
❖ Example: 6, 9, 10, 11, 11, 14
❖ Q1 = 9, Q2 = 10.5, Q3 = 11
Percentile/Quartile Steps
❖ Arrange the data in ascending or descending order
❖ Calculate the location i = (P × n)/100
❖ P = percentile, n = number of values in the data set
❖ If i is a whole number, the percentile is the average of the values at the i-th and (i+1)-th locations
❖ If i is not a whole number, the percentile is the value at the next whole-number location (round i up)
❖ Example: 6, 9, 10, 11, 11, 14
❖ Q1 = 9, Q2 = 10.5, Q3 = 11
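A small Python sketch of these steps (my addition; the function name is illustrative):

```python
def percentile(data, p):
    """Percentile by the rule above: i = P*n/100; average the values at
    positions i and i+1 if i is whole, otherwise round i up."""
    xs = sorted(data)
    i = p * len(xs) / 100.0
    if i == int(i):                       # whole number: average two positions
        i = int(i)
        return (xs[i - 1] + xs[i]) / 2.0  # 1-based positions -> 0-based index
    return xs[int(i)]                     # ceil(i)-th position (0-based int(i))

data = [6, 9, 10, 11, 11, 14]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
# -> 9, 10.5, 11 (Q1, Q2, Q3 as above)
```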
Descriptive Statistics

Variability
❖ Range
❖ Interquartile Range
❖ Standard Deviation
Range
❖ Difference between the lowest and the highest value.
❖ Example: 6, 9, 10, 11, 11, 14
❖ Range = 14 - 6 = 8
Interquartile Range
❖ Range of the middle 50% of the data
❖ IQR = Q3 - Q1
❖ Example: 6, 9, 10, 11, 11, 14
❖ Q1 = 9, Q2 = 10.5, Q3 = 11
❖ IQR = 11 - 9 = 2
❖ Shown on a Box-and-Whisker Plot
Standard Deviation
❖ Variance = average of the squared deviations about the arithmetic mean.
❖ The square root of the variance is the standard deviation.
Standard Deviation

| x        | x - x̄         | (x - x̄)²         |
|----------|---------------|------------------|
| 100      | 0             | 0                |
| 101      | 1             | 1                |
| 99       | -1            | 1                |
| 102      | 2             | 4                |
| 98       | -2            | 4                |
| 100      | 0             | 0                |
| x̄ = 100  | Σ(x - x̄) = 0  | Σ(x - x̄)² = 10   |

S² = Σ(x - x̄)² / (n - 1) = 10/5 = 2
S = √2 = 1.414
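The same computation using Python's standard library (my addition, not from the original slides):

```python
import statistics

x = [100, 101, 99, 102, 98, 100]
print(statistics.variance(x))   # 2.0   (sample variance, n-1 denominator)
print(statistics.stdev(x))      # 1.414 (square root of the variance)
```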
Graphical methods for depicting relationships

❖ Stem-and-leaf Plot
❖ Box-and-whisker Plots
❖ Scatter Plot
Stem-and-Leaf Plot
❖ 11, 22, 55, 13, 45, 14, 19, 10, 33, 52, 13

Stem Leaf
1 013349
2 2
3 3
4 5
5 25
Stem-and-Leaf Plot

21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9
21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
15.0 21.4
Box and Whisker Plots
❖ Also known as a Box Plot
❖ Shows the median
❖ Shows Q1, Q3 and IQR
[Figure: box plot of average number of orders per month, marking the median, 25th and 75th percentiles, mean, and outliers]
Scatter Diagram
❖ One of seven basic quality tools
❖ To see relationship between two
variables
❖ Relationship should make practical
sense
❖ Temperature(X) vs Ice cream sale (Y)
❖ Sometimes the relationship between two variables is because of a third variable (ice cream sales vs heat stroke cases)
❖ Correlation/Regression is covered in the
Analyze Phase
Histogram
❖ Graphical representation of the distribution of numerical data
❖ Values are assigned to "bins" and the frequency for each bin is plotted.
Graphical method for depicting distributions

❖ Probability plots
❖ Normal and non-normal
Graphical method for depicting distributions

❖ Histogram and Normal Probability Distribution
❖ Is this data normal?
[Figure: Histogram of Length with fitted normal curve — Mean 99.72, StDev 9.997, N 150]
Graphical method for depicting distributions

❖ Probability plots (Q-Q plots) are used to check this.
❖ The p value is less than 0.05, hence we conclude that the data is not normal.
[Figure: Histogram of Length with fitted normal curve — Mean 99.72, StDev 9.997, N 150]
Graphical method for depicting distributions

❖ Using different data to check normality
❖ The p value > 0.05, hence the data is normal at a 95% confidence level.
[Figure: Histogram of New Length with fitted normal curve — Mean 99.53, StDev 9.774, N 150]
Q-Q Plot

❖ Quantile-Quantile Plot
❖ Earlier we talked about Quartiles (Q1, Q2, Q3). There are Q0 and Q4 as well; these divide the data into four parts.
❖ Percentile divides the data into 100 parts. For example, Q2 is the 50th percentile.
❖ A quantile can divide data into any number of parts; quartiles and percentiles are examples of quantiles.
Q-Q Plot

❖ Quantile-Quantile Plot
[Figure: Q-Q plot — data quantiles plotted against theoretical quantiles]
Errors of Statistical Tests
| Conclusion \ True State of Nature | H0 is true         | Ha is true                  |
|-----------------------------------|--------------------|-----------------------------|
| Support H0 / Reject Ha            | Correct Conclusion | Type II Error               |
| Support Ha / Reject H0            | Type I Error       | Correct Conclusion (Power)  |
Errors of Statistical Tests
|                          | Type I error (alpha)                                   | Type II error (beta)                                   |
|--------------------------|--------------------------------------------------------|--------------------------------------------------------|
| Name                     | Producer's risk / Significance level                   | Consumer's risk                                        |
| 1 minus error is called  | Confidence level                                       | Power of the test                                      |
| Example of fire alarm    | False fire alarm leading to inconvenience              | Missed fire leading to disaster                        |
| Effects on process       | Unnecessary cost increase due to frequent changes      | Defects may be produced                                |
| Control method           | Usually fixed at a pre-determined level: 1%, 5% or 10% | Usually controlled to < 10% by appropriate sample size |
| Simple definition        | Innocent declared as guilty                            | Guilty declared as innocent                            |
Significance Level
Level of Confidence / Confidence Interval:
C = 0.90, 0.95, 0.99 (90%, 95%, 99%)

Level of Significance:
α = 1 – C (0.10, 0.05, 0.01)
Power
❖ Power = 1 – β (or 1 - type II error)
❖ Type II Error: Failing to reject null
hypothesis when null hypothesis is false.
❖ Power: Likelihood of rejecting null
hypothesis when null hypothesis is false.

❖ Or: Power is the ability of a test to


correctly reject the null hypothesis.
Alpha vs Beta
❖ A researcher cannot commit both a Type I and a Type II error on the same test; only one can be committed.
❖ As the value of α increases (say from 0.01 to 0.05), β goes down and the power of the test increases.
❖ To reduce both Type I and Type II errors, increase the sample size.
Hypothesis Testing
1. State the Alternate Hypothesis.
2. State the Null Hypothesis.
3. Select a probability of error level (alpha level), generally 0.05.
4. Select and compute the test statistic (e.g., a t or z score).
5. Find the critical test statistic.
6. Interpret the results.
Hypothesis Testing
❖ Lower Tail Tests
❖ H0: μ ≥ 150cc
❖ Ha: μ < 150cc

❖ Upper Tail Tests


❖ H0: μ ≤ 150cc
❖ Ha: μ > 150cc
Hypothesis Testing
❖ Two Tail Tests
❖ H0: μ = 150cc
❖ Ha: μ ≠ 150cc
Calculate Test Statistic
❖ Single sample
❖ z = (x - μ)/ σ

❖ Mean of Multiple samples


❖ z = (x̄ - μ) / (σ / √n)
Z Critical
❖ α = 0.05, Single Tail
❖ Z Critical = 1.645
Z Critical
❖ α = 0.01 Two Tails means 0.005 in each tail. Z Critical = 2.575
❖ α = 0.05 Two Tails means 0.025 in each tail. Z Critical = 1.96
❖ α = 0.10 Two Tails means 0.05 in each tail. Z Critical = 1.645

•90% – Z Score = 1.645


•95% – Z Score = 1.96
•99% – Z Score = 2.576
p Value
❖ The p value is the lowest value of alpha for which the null hypothesis can be rejected. (It is the probability of observing data at least this extreme if the null hypothesis is true.)
❖ If p = 0.01 you can reject the null hypothesis at α = 0.05
❖ "If p is low, the null must go; if p is high, the null will fly."
Sample Size
❖ n = (zα/2 · σ / ME)²
❖ n is the sample size
❖ zα/2 is the standard score
❖ α = 0.01: Z Critical = 2.575
❖ α = 0.05: Z Critical = 1.96
❖ α = 0.10: Z Critical = 1.645
❖ σ is the standard deviation
❖ ME is the Margin of Error (shift to be detected)
Sample Size
❖ A perfume bottle filling machine has a mean of 150cc and an s.d. of 2cc. What is the minimum sample size which, at 95% confidence, will confirm a mean shift greater than 0.5cc?
❖ n = (zα/2 · σ / ME)²
❖ zα/2 = 1.96, σ = 2cc, ME = 0.5cc
❖ n = 61.46, so round up to 62 samples (see the sketch below)
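The same calculation as a short Python sketch (my addition, not from the original slides):

```python
import math

z, sigma, me = 1.96, 2.0, 0.5    # 95% two-sided, sd = 2 cc, shift = 0.5 cc
n = (z * sigma / me) ** 2
print(n, math.ceil(n))           # 61.46 -> round up to 62 bottles
```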
Sample Size - Proportion
❖ n = (zα/2)² · p̂ · (1 - p̂) / (Δp)²
❖ n is the sample size
❖ zα/2 is the standard score
❖ α = 0.01: Z Critical = 2.575
❖ α = 0.05: Z Critical = 1.96
❖ α = 0.10: Z Critical = 1.645
❖ p̂ is the proportion rate
❖ Δp is the desired proportion interval
Drawing Statistical Conclusions
❖ Numeric vs Analytical Studies
❖ In the example of acceptance sampling:
❖ Numeric – to accept or reject the lot
❖ Analytical – to accept or reject the supplier
Numeric and Analytical Study
❖ A numeric study is an analysis of data collected from a limited group or frame. A decision will be made to accept or reject (to buy or not buy) or to act on the group or frame studied.
❖ Example: Acceptance sampling to accept
or reject the lot based on samples
drawn from the lot.
Numeric and Analytical Study
❖ An analytic study is an analysis aimed at
answering questions about future
material not yet made.
❖ The analytic study is not interested in
making a decision on the shipment but
on the supplier. In the analytic study, a
decision will be made on the cause
system generating the material.
❖ Example: Acceptance sampling of a lot
to accept or reject the supplier.
Robustness of Conclusions
❖ Numeric statistical tests are by themselves inadequate to answer analytic questions.
❖ Conducting an experiment on one set of data and extrapolating the results to another set can lead to wrong decisions.
❖ We need to be aware of the limitations of the test.
Probability
❖ Classic Model

P(event) = Number of outcomes in which the event occurs / Total number of possible outcomes of an experiment
Probability
❖ Relative Frequency of Occurrence

P(event) = Number of times an event occurred / Total number of opportunities for an event to occur
Probability
❖ Experiment/Trial: Something done with an expectation of a result.
❖ Event or Outcome: Result of an experiment
❖ Sample Space: The sample space of an experiment is the set of all possible results of that random experiment, e.g., for rolling a die: {1, 2, 3, 4, 5, 6}
Probability
❖ Union: Probability that events A or B
occur: P(A ∪ B)

❖ Intersection: Probability that events A


and B occur: P(A ∩ B)
Probability
❖ Mutually Exclusive Events: When two
events cannot occur at the same time

❖ Independent Events: The occurrence of


Event A does not change the probability
of Event B

❖ Complementary Events: The probability


that Event A will NOT occur is denoted
by P(A').
Probability
❖ Rule of Addition
The probability that Event A or Event B occurs
=
Probability that Event A occurs
+
Probability that Event B occurs
-
Probability that both Events A and B occur

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)


Probability
❖ Rule of Multiplication:
The probability that Events A and B both occur
=
Probability that Event A occurs
x
Probability that Event B occurs, given that A has
occurred

P(A ∩ B) = P(A) P(B|A)


Probability
❖ Independent Events: P(A ∩ B) = P(A) · P(B)
Probability
❖ Dependent Events: P(A ∩ B) = P(A) · P(B|A)
Factorial
❖ The factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n: n! = n × (n-1) × … × 2 × 1 (with 0! = 1)
Permutation/ Combination
❖ Permutation: A set of objects in which
position (or order) is important.
❖ e.g. Lock combination: 3376

❖ Combination: A set of objects in which


position (or order) is NOT important.
❖ e.g. Selecting 2 students out of 5
Normal Probability Distribution
❖ Symmetrically distributed

❖ Long Tails / Bell Shaped

❖ Mean, Mode and Median are the same


Normal Probability Distribution
❖ Two factors define the shape of the
curve:

❖ Mean

❖ Standard Deviation
Normal Probability Distribution
❖ About 68% of the area under the curve
falls within 1 standard deviation of the
mean.

❖ About 95% of the area under the curve


falls within 2 standard deviations of the
mean.

❖ About 99.7% of the area under the


curve falls within 3 standard deviations
of the mean.
Normal Probability Distribution
❖ The total area under the normal curve =
1.
❖ The probability of any particular value is
0.
❖ The probability that X is greater than or
less than a value = area under the
normal curve in that direction
Normal Probability Distribution
❖ The value of the random variable Y is:

Y = [1 / (σ · √(2π))] · e^(−(x − μ)² / (2σ²))

❖ where X is a normal random variable,
❖ μ = mean,
❖ σ = standard deviation,
❖ π is approximately 3.14159,
❖ e is approximately 2.71828.
Normal Probability Distribution
❖ Z Value / Standard Score

❖ How many standard deviations an


element is from the mean.
❖ z = (X - μ) / σ

❖ z is the z-score,
❖ X is the value of the element,
❖ μ is the population mean,
❖ σ is the standard deviation.
Z Table
Continuous Probability Distributions
❖ Normal probability distribution

❖ Student's t distribution
❖ Chi-square distribution
❖ F distribution
Continuous vs Discrete Variable
❖ If a variable can take on any value
between two specified values, it is called
a continuous variable; otherwise, it is
called a discrete variable.
Discrete Probability Distributions
❖ Binomial Probability Distribution

❖ Bernoulli Distribution
❖ Hypergeometric Probability Distribution

❖ Poisson Probability Distribution


Binomial Probability Distribution
❖ A binomial experiment has the
following properties:
❖ The experiment consists of n repeated
trials.
❖ Each trial can result in just two possible
outcomes. We call one of these outcomes a
success and the other, a failure.
❖ The probability of success, denoted by p, is
the same on every trial.
❖ The trials are independent; that is, the
outcome on one trial does not affect the
outcome on other trials.
Binomial Probability Distribution
❖ x: The number of successes that result from the binomial experiment.
❖ n: The number of trials in the binomial experiment.
❖ p: The probability of success on an individual trial.
❖ q: The probability of failure on an individual trial. (This is equal to 1 - p.)
❖ n!: The factorial of n (also known as n factorial).
❖ P(x): Binomial probability - the probability that an n-trial binomial experiment results in exactly x successes, when the probability of success on an individual trial is p.
❖ nCx: The number of combinations of n things, taken x at a time.

P(x) = nCx · pˣ · (1 − p)ⁿ⁻ˣ
Binomial Probability Distribution
❖ The binomial probability refers to the probability that a binomial experiment results in exactly x successes.
❖ Suppose a binomial experiment consists of n trials and results in x successes. If the probability of success on an individual trial is p, then the binomial probability is:

P(x) = nCx · pˣ · (1 − p)ⁿ⁻ˣ
or
P(x) = [n! / (x!(n − x)!)] · pˣ · (1 − p)ⁿ⁻ˣ
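A short Python sketch of this formula (my addition; the example values n = 10, p = 0.3, x = 2 are arbitrary):

```python
from math import comb
from scipy.stats import binom

def binom_pmf(x, n, p):
    # P(x) = nCx * p**x * (1 - p)**(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 10, 0.3))   # manual formula
print(binom.pmf(2, 10, 0.3))   # same value from SciPy
```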
Binomial Probability Distribution
❖ The mean of the distribution (μx) is n·p
❖ The variance (σ²x) is n·p·(1−p)
❖ The standard deviation (σx) is √(n·p·(1−p))
(n: the number of trials in the binomial experiment; p: the probability of success on an individual trial)
Five Conditions - Binomial
❖ 1. There is a fixed number, n, of identical trials.
❖ 2. For each trial, there are only two possible outcomes (success/failure).
❖ 3. The probability of success, p, remains the same for each trial.
❖ 4. The trials are independent of each other.
❖ 5. x = the number of successes observed for the n trials.

P(x) = nCx · pˣ · (1 − p)ⁿ⁻ˣ
Bernoulli Distribution
❖ Distribution of successes on a single trial.
❖ What is the probability of getting a head in a single toss of a coin?
Hypergeometric Distribution
❖ There is a fixed number, n, of trials, drawn from a finite, known population without replacement.
❖ For each trial, there are only two possible outcomes (success/failure).
❖ Because sampling is without replacement, the probability of success changes from trial to trial, and the trials are not independent (this is what distinguishes it from the binomial).
❖ The number of successes in the population is known.
❖ x = the number of successes observed for the n trials.
Hypergeometric Distribution
❖ N: size of the population
❖ A: number of successes in the population
❖ x: the number of successes that result from the experiment
❖ n: the number of trials (without replacement)
❖ p: the probability of success on an individual trial
❖ q: the probability of failure on an individual trial (equal to 1 - p)
❖ P(x): the probability that an n-trial experiment results in exactly x successes
❖ nCx: the number of combinations of n things, taken x at a time

P(x) = [ACx · (N−A)C(n−x)] / NCn
Hypergeometric Distribution
❖ Out of 10 people (6M, 4F), 3 people are selected without replacement. What is the probability that two of them are females?
❖ P(2) = [4C2 · (10−4)C(3−2)] / 10C3
❖ = (4C2 · 6C1) / 10C3 = (6 × 6)/120 = 0.3
❖ In Excel use the HYPGEOM.DIST function
❖ When the sample size is less than 5% of the population, the binomial distribution can be used as an approximation.
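The same example via SciPy (my addition, not from the original slides):

```python
from scipy.stats import hypergeom

# N = 10 people, A = 4 females in the population, n = 3 drawn without replacement
# SciPy argument order is pmf(x, N, A, n)
print(hypergeom.pmf(2, 10, 4, 3))   # 0.3, matching the manual calculation
```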
Poisson Distribution
❖ A Poisson experiment has the following
properties:
❖ The experiment results in outcomes that can
be classified as successes or failures.
❖ The average number of successes (μ) that
occurs in a specified region is known.
❖ Outcomes are random. Occurrence of one
outcome does not influence the chance of
another outcome of interest.
❖ The outcomes of interest are rare relative to
the possible outcomes.
Poisson Distribution
❖ e: a constant equal to approximately 2.71828 (e is the base of the natural logarithm system)
❖ μ: the mean number of successes that occur in a specified region
❖ x: the actual number of successes that occur in a specified region
❖ P(x; μ): the Poisson probability that exactly x successes occur in a Poisson experiment, when the mean number of successes is μ

P(x; μ) = e^(−μ) · μˣ / x!
Poisson Distribution
❖ The Poisson distribution has the
following properties:
❖ The mean of the distribution is equal to
μ.
❖ The variance is also equal to μ .
Poisson Distribution
❖ At a booking counter, on average 3.6 people arrive every 10 minutes on weekends. What is the probability of getting 7 people in 10 minutes?
❖ μ = 3.6, x = 7
❖ P(x; μ) = (e^(−μ))(μˣ) / x! = (e^(−3.6))(3.6⁷) / 7!
❖ = 0.02732 × 7836.41 / 5040 = 0.0424
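A quick Python check of this example (my addition, not from the original slides):

```python
from math import exp, factorial
from scipy.stats import poisson

mu, x = 3.6, 7
print(exp(-mu) * mu**x / factorial(x))   # 0.0424 (manual formula)
print(poisson.pmf(x, mu))                # same value from SciPy
```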
Point vs Interval Estimates
❖ Point estimate:
❖ Summarize the sample by a single number
that is an estimate of the population
parameter.

❖ Interval estimate:
❖ A range of values within which, we believe,
the true parameter lies with high
probability.
Point Estimates
❖ Point estimate:
❖ Summarize the sample by a single number
that is an estimate of the population
parameter.
❖ The sample mean x̄ is a point estimate of
the population mean μ. The sample
proportion p is a point estimate of the
population proportion P.
Point vs Interval Estimates
❖ Interval estimate:
❖ A range of values within which, we believe,
the true parameter lies with high
probability.
❖ For example, a < x̄ < b is an interval
estimate of the population mean μ. It
indicates that the population mean is
greater than a but less than b.
Confidence Interval
❖ Factors affecting the width of
confidence interval
❖ sample size
❖ standard deviation
❖ confidence level
Confidence Interval
❖ When the population standard deviation is known / sample size is ≥ 30:

CI = x̄ ± zα/2 · σ/√n

❖ zα/2 = z table value for the confidence level
❖ σ = standard deviation
❖ n = sample size
Confidence Interval
❖ The average income of 100 random residents of a city was found to be $42,000 per annum with a standard deviation of 5,000. Find the 95% confidence interval of the town income.

CI = x̄ ± zα/2 · σ/√n

❖ zα/2 = z table value for the confidence level = 1.96
❖ σ = standard deviation = 5,000
❖ n = sample size = 100

• 90% – Z Score = 1.645
• 95% – Z Score = 1.96
• 99% – Z Score = 2.576
Confidence Interval
❖ The average income of 100 random residents of a city was found to be $42,000 per annum with a standard deviation of 5,000. Find the 95% confidence interval of the town income.

CI = x̄ ± zα/2 · σ/√n
CI = 42,000 ± 1.96 · 5000/√100
CI = 42,000 ± 980
CI = 41,020 to 42,980
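The same interval in Python (my addition, not from the original slides):

```python
import math

xbar, sigma, n, z = 42000, 5000, 100, 1.96
half_width = z * sigma / math.sqrt(n)        # 980
print(xbar - half_width, xbar + half_width)  # 41020.0 42980.0
```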
Confidence Interval
❖ When the population standard deviation is unknown and the sample size is < 30:

CI = x̄ ± tα/2 · s/√n

❖ tα/2 = t distribution value for the confidence level and (n-1) degrees of freedom
❖ s = sample standard deviation
❖ n = sample size
Confidence Interval
❖ The average income of 25 random residents of a city was found to be $42,000 per annum with a standard deviation of 5,000. Find the 95% confidence interval of the town income.
❖ CI = x̄ ± (tα/2) · s/√n
❖ tα/2 = t distribution value for the confidence level and (n-1) degrees of freedom
❖ s = sample standard deviation
❖ n = sample size
Introducing t distribution
❖ Also known as Student's t distribution
❖ Used when the sample size is small and/or when the population variance is unknown
❖ Calculated value:

t = (x̄ − μ) / (s/√n)

❖ The form of the t distribution is determined by its degrees of freedom (n − 1)
Confidence Interval
❖ The average income of 25 random residents of a city was found to be $42,000 per annum with a standard deviation of 5,000. Find the 95% confidence interval of the town income.

CI = x̄ ± tα/2 · s/√n

❖ tα/2 = t value for the confidence level and (n-1) = 24 degrees of freedom = 2.064
❖ s = sample standard deviation = 5000
❖ n = sample size = 25

CI = 42,000 ± 2.064 · 5000/√25
CI = 42,000 ± 2,064 = 39,936 to 44,064
Confidence Interval - Proportion
CI = p̂ ± zα/2 · √(p̂(1 − p̂)/n)

❖ Conditions to satisfy for this:
❖ np ≥ 5 and
❖ n(1 − p) ≥ 5
❖ Proportions follow the Binomial Distribution.
❖ The np and n(1-p) conditions are used to approximate it to the Normal Distribution.
Confidence Interval - Proportion
❖ Out of a 100-piece sample inspected, 10 were found to be defective. What is the 95% confidence interval for the proportion?

CI = p̂ ± zα/2 · √(p̂(1 − p̂)/n)

❖ p̂ = 0.10, np̂ = 100 × 0.10 = 10, n(1 − p̂) = 90
❖ Conditions np ≥ 5 and n(1 − p) ≥ 5 are satisfied

CI = 0.10 ± 1.96 · √(0.10 × 0.90 / 100)
CI = 0.10 ± 1.96 × 0.03 = 0.10 ± 0.059 = 0.041 to 0.159
Confidence Interval - Variation
❖ Confidence interval for the variance:

(n − 1)s² / χ²α/2 ≤ σ² ≤ (n − 1)s² / χ²₁₋α/2

❖ Let's understand the Chi-square distribution first
Chi-Square Distribution
❖ Select a random sample of size n from a normal population having a standard deviation equal to σ.
❖ The standard deviation of the sample is equal to s.
❖ Chi-square for this can be calculated by:

χ² = (n − 1)s² / σ²
Chi-Square Distribution
χ² = (n − 1)s² / σ²

(n − 1)s² / χ²α/2 ≤ σ² ≤ (n − 1)s² / χ²₁₋α/2

Chi-Square Distribution
❖ Df = 24, χ²(0.05) = 36.42, χ²(0.95) = 13.848
Chi-Square Distribution
χ² = (n − 1)s² / σ²

(n − 1)s² / χ²α/2 ≤ σ² ≤ (n − 1)s² / χ²₁₋α/2

❖ Df = 24, χ²(0.05) = 36.42, χ²(0.95) = 13.848
❖ For 25 samples of perfume bottles, the variance was found to be 4. Find the CI of the population variance with 90% confidence.
❖ (25−1)(4)/36.42 and (25−1)(4)/13.848
❖ Between 2.636 and 6.93
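The same interval via SciPy (my addition, not from the original slides):

```python
from scipy.stats import chi2

n, s2, alpha = 25, 4, 0.10
lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)  # 96 / 36.42 = 2.64
upper = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)      # 96 / 13.85 = 6.93
print(lower, upper)
```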
Tests for Mean, Variance & Proportion
❖ One Sample: one sample z test, one sample t test, one sample p test
❖ Two Samples: two sample z test, two sample t test, paired t test, two sample p test, two sample standard deviation
❖ More than 2 samples: ANOVA
One Sample z Test
❖ Calculated value:
❖ z = (x̄ - μ) / (σ/√n)
❖ Example: A perfume bottle machine fills bottles at 150cc with an sd of 2cc. 100 bottles are randomly picked and the average volume was found to be 152cc. Has the mean volume changed? (95% confidence)
❖ z calculated = (152−150)/(2/√100) = 2/0.2 = 10
❖ z critical = ?
One Sample z Test
zcritical = 1.96
One Sample z Test
❖ Calculated value:
❖ z = (x̄ - μ) / (σ/√n)
❖ Example: A perfume bottle machine fills bottles at 150cc with an sd of 2cc. 100 bottles are randomly picked and the average volume was found to be 152cc. Has the mean volume changed? (95% confidence)
❖ z calculated = (152−150)/(2/√100) = 2/0.2 = 10
❖ z critical = 1.96; z calculated > z critical → Reject H0
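The same test in Python (my addition, not from the original slides):

```python
import math
from scipy.stats import norm

xbar, mu, sigma, n = 152, 150, 2, 100
z = (xbar - mu) / (sigma / math.sqrt(n))  # 10.0
z_crit = norm.ppf(0.975)                  # 1.96 for a two-tail test, alpha = 0.05
print(z, abs(z) > z_crit)                 # True -> reject H0
```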
One Sample t Test
❖ Calculated value:
❖ t = (x̄ - μ) / (s/√n)
❖ Example: A perfume bottle machine fills bottles at 150cc. 4 bottles are randomly picked; the average volume was found to be 151cc and the sd of the sample was 2cc. Has the mean volume changed? (95% confidence)
❖ t cal = (151−150)/(2/√4) = 1/1 = 1
❖ t critical = ?
One Sample t Test
tcritical = 3.182
One Sample t Test
❖ Calculated value:
❖ t = (x̄ - μ) / (s/√n)
❖ Example: A perfume bottle machine fills bottles at 150cc. 4 bottles are randomly picked; the average volume was found to be 151cc and the sd of the sample was 2cc. Has the mean volume changed? (95% confidence)
❖ t cal = (151−150)/(2/√4) = 1/1 = 1
❖ t critical = 3.182; t cal < t critical → Fail to reject H0
One Sample p Test
❖ H0: p = p0
❖ Calculated value (normal approximation): z = (p̂ − p0) / √(p0(1 − p0)/n)
❖ Example: The smoking rate in a town in the past was 21%. 100 people were sampled and 14 smokers were found. Has the smoking habit changed?
One Sample p Test
❖ Example: The smoking rate in a town in the past was 21%. 100 people were sampled and 14 smokers were found. Has the smoking habit changed at 95% confidence? (two tail)
❖ p0 = 0.21, p̂ = 0.14
❖ np0 = 0.21 × 100 = 21 and n(1 − p0) = 0.79 × 100 = 79
❖ Both > 5 means the sample size is sufficient.
❖ z = (0.14 − 0.21)/√(0.21 × 0.79/100)
❖ z = −0.07/0.0407 = −1.719
❖ z critical = ±1.96; |z| < 1.96 → fail to reject H0
One Sample p Test
❖ Example: The smoking rate in a town in the past was 21%. 100 people were sampled and 14 smokers were found. Has the smoking habit reduced at 95% confidence? (one tail)
❖ H0: p ≥ p0, Ha: p < p0
❖ p0 = 0.21, p̂ = 0.14
❖ z = (0.14 − 0.21)/√(0.21 × 0.79/100)
❖ z = −0.07/0.0407 = −1.719
❖ z critical = −1.645; z < −1.645 → reject H0
Tests for Mean, Variance & Proportion
❖ One Sample: one sample z test, one sample t test, one sample p test
❖ Two Samples: two sample z test, two sample t test, paired t test, two sample p test, two sample standard deviation
❖ More than 2 samples: ANOVA
Two Sample z Test
❖ Null hypothesis: H0: μ1 = μ2
❖ or H0: μ1 − μ2 = 0
❖ Alternative hypothesis: Ha: μ1 ≠ μ2
Two Sample z Test
❖ Example: From two machines, 100 samples each were drawn.
❖ Machine 1: Mean = 151.2 / sd = 2.1
❖ Machine 2: Mean = 151.9 / sd = 2.2
❖ Is there a difference between these two machines? Check at 95% confidence level.
Two Sample z Test
❖ Example: From two machines, 100 samples each were drawn.
❖ Machine 1: Mean = 151.2 / sd = 2.1
❖ Machine 2: Mean = 151.9 / sd = 2.2
❖ Is there a difference between these two machines? Check at 95% confidence level.
❖ Z cal = −0.7 / 0.304 = −2.30 (where 0.304 = √(2.1²/100 + 2.2²/100))
❖ Z critical = ±1.96
❖ |Z cal| > 1.96 → Reject the null hypothesis.
❖ There is a difference.
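The same calculation in Python (my addition, not from the original slides):

```python
import math

m1, s1, n1 = 151.2, 2.1, 100
m2, s2, n2 = 151.9, 2.2, 100
se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # 0.304
z = (m1 - m2) / se                       # -2.30
print(z, abs(z) > 1.96)                  # True -> reject H0
```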
Two Sample z Test
❖ Example: From two machines, 100 samples each were drawn.
❖ Machine 1: Mean = 151.9 / sd = 2.1
❖ Machine 2: Mean = 151.2 / sd = 2.2
❖ Is the difference between these two machines more than 0.2 cc? Check at 95% confidence level.
❖ H0: μ1 − μ2 = 0.2 (Ha: μ1 − μ2 > 0.2)
Two Sample z Test
❖ Example: From two machines, 100 samples each were drawn.
❖ Machine 1: Mean = 151.9 / sd = 2.1
❖ Machine 2: Mean = 151.2 / sd = 2.2
❖ Is the difference between these two machines more than 0.2 cc? Check at 95% confidence level.
❖ Z cal = (0.7 − 0.2)/0.304 = 0.5/0.304 = 1.645
❖ Z critical = 1.64 (one tail)
❖ Reject the null hypothesis (note this is borderline: Z cal barely exceeds Z critical).
Two Sample t Test
❖ Are the two sets of data independent or dependent?
❖ If the values in one sample reveal no information about those of the other sample, then the samples are independent.
❖ Example: Blood pressure of males/females → two sample t test
❖ If the values in one sample affect the values in the other sample, then the samples are dependent.
❖ Example: Blood pressure before and after a specific medicine → paired t test
Two Sample t Test
❖ Is the variance of the two samples equal?
❖ If yes: use the pooled standard deviation Sp to find t, where Sp² = [(n1−1)s1² + (n2−1)s2²] / (n1 + n2 − 2)
Two Sample t Test
❖ Example: Samples from two machines A and B have the following volumes in bottles. Is the mean different? Calculate with 95% confidence.
Two Sample t Test
❖ t critical = 2.306 (df = n1 + n2 − 2 = 8)
Two Sample t Test
❖ Assumptions: Normality, independent
random samples, population variances
are equal
Two Sample t Test
❖ What if the variance of the two samples is not equal?

| A   | C   |
|-----|-----|
| 150 | 144 |
| 152 | 162 |
| 154 | 177 |
| 152 | 150 |
| 151 | 140 |

2 Sample t-Test

Test Information
H0: Mean Difference = 0
Ha: Mean Difference Not Equal To 0
Assume Unequal Variance

Results:                A         C
Count                   5         5
Mean                    151.80    154.60
Standard Deviation      1.483     15.027

Mean Difference         -2.800
Std Error Difference    6.753
DF                      4.078
t                       -0.414644
P-Value (2-sided)       0.6997
Two Sample t Test
❖ Degrees of freedom are calculated by the Welch–Satterthwaite formula:

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1) ]

Results:                A         C
Count                   5         5
Mean                    151.80    154.60
Standard Deviation      1.483     15.027

❖ With these values, df = 4.078, as in the output above.
Two Sample t Test

tcritical = 2.776
Two Sample t Test
❖ Minitab 17 output:

Two-Sample T-Test and CI: A, C

Two-sample T for A vs C

    N    Mean   StDev   SE Mean
A   5   151.80    1.48      0.66
C   5    154.6    15.0       6.7

Difference = μ(A) - μ(C)
Estimate for difference: -2.80
95% CI for difference: (-21.55, 15.95)
T-Test of difference = 0 (vs ≠): T-Value = -0.41  P-Value = 0.700  DF = 4
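The same unequal-variance (Welch) test in Python (my addition, not from the original slides):

```python
from scipy.stats import ttest_ind

A = [150, 152, 154, 152, 151]
C = [144, 162, 177, 150, 140]
t, p = ttest_ind(A, C, equal_var=False)  # Welch's t-test, unequal variances
print(t, p)                              # t = -0.41, p = 0.700 -> fail to reject H0
```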
Paired t Test
❖ Where you have two samples in which
observations in one sample can be
paired with observations in the other
sample.
❖ Or
❖ If the values in one sample affect the
values in the other sample, (the samples
are dependent.)
❖ Example: Blood pressure before and after a
specific medicine
Paired t Test
❖ Find the differences between the two sets of readings as d1, d2, …, dn.
❖ Find the mean d̄ and standard deviation s of these differences; then t = d̄ / (s/√n).
Paired t Test
❖ Example: Before and after a medicine, BP was measured. Is there a difference at the 95% confidence level?

| Patient | Before | After |
|---------|--------|-------|
| 1       | 120    | 122   |
| 2       | 122    | 120   |
| 3       | 143    | 141   |
| 4       | 100    | 109   |
| 5       | 109    | 109   |
Paired t Test
❖ Example: Before and after a medicine, BP was measured. Is there a difference at the 95% confidence level?

| Patient | Before | After | Difference |
|---------|--------|-------|------------|
| 1       | 120    | 122   | 2          |
| 2       | 122    | 120   | -2         |
| 3       | 143    | 141   | -2         |
| 4       | 100    | 109   | 9          |
| 5       | 109    | 109   | 0          |

❖ d̄ = 1.4, s = 4.56, n = 5
❖ t cal = 1.4/(4.56/√5) = 1.4/2.04 = 0.69
Paired t Test
❖ Example: Before and after a medicine, BP was measured. Is there a difference at the 95% confidence level?
❖ t cal = 1.4/2.04 = 0.69
❖ t(0.025, 4) = 2.776
❖ t cal < t critical → Fail to reject the null hypothesis
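The same paired test in Python (my addition, not from the original slides):

```python
from scipy.stats import ttest_rel

before = [120, 122, 143, 100, 109]
after = [122, 120, 141, 109, 109]
t, p = ttest_rel(after, before)  # paired test on the differences
print(t, p)                      # t = 0.69, p > 0.05 -> fail to reject H0
```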
Tests for Mean, Variance & Proportion
❖ One Sample: one sample z test, one sample t test, one sample p test
❖ Two Samples: two sample z test, two sample t test, paired t test, two sample p test, two sample standard deviation
❖ More than 2 samples: ANOVA
Two Sample p Test
❖ Null hypothesis: H0: p1 = p2
❖ or H0: p1 − p2 = 0
❖ Alternative hypothesis: Ha: p1 ≠ p2
❖ Normal approximation – Pooled
❖ Normal approximation – Un-pooled
Two Sample p Test
❖ Normal approximation – Pooled: z = (p̂1 − p̂2) / √( p̄(1 − p̄)(1/n1 + 1/n2) ), where p̄ is the pooled proportion
Two Sample p Test
❖ Normal approximation – Pooled
❖ p̂1 = 30/200 = 0.15, p̂2 = 10/100 = 0.10
❖ Pooled p̄ = (30 + 10)/(200 + 100) = 0.1333
❖ Expected counts (≈13.33% of each sample size) are all ≥ 5
❖ Z = 0.0500 / √(0.1333 × 0.8667 × (1/200 + 1/100))
❖ Z = 0.0500/0.0416 = 1.20 < 1.96 → fail to reject H0
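The pooled two-proportion z calculation in Python (my addition, not from the original slides):

```python
import math

x1, n1, x2, n2 = 30, 200, 10, 100
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                             # 0.1333
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # 0.0416
z = (p1 - p2) / se
print(z)   # 1.20 < 1.96 -> fail to reject H0
```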
Tests for Variance
❖ F-test
❖ for testing the equality of two variances from different populations
❖ for testing the equality of several means with the technique of ANOVA
❖ Chi-square test
❖ for testing the population variance against a specified value
❖ for testing the goodness of fit of some probability distribution
❖ for testing the independence of two attributes
Two Sample Variance – F Test
❖ F-test
❖ for testing the equality of two variances from different populations
❖ H0: σ²1 = σ²2
❖ F calculated = s²1 / s²2
❖ Keep the higher variance in the numerator for a right-tail test.
❖ Remember: variance is the square of the standard deviation
Two Sample Variance – F Test
❖ F critical
❖ Use table with appropriate degrees of
freedom
❖ For two tail test use the table for α/2
Two Sample Variance – F Test
❖ Example: We took 8 samples from machine A and the standard deviation was 1.1. For machine B we took 5 samples and the variance was 11. Is there a difference in variance at a 90% confidence level?
❖ n1 = 8, s1 = 1.1, s²1 = 1.21, df = 7 (denominator)
❖ n2 = 5, s²2 = 11, df = 4 (numerator)
❖ F calculated = 11/1.21 = 9.09 (higher value on top)
Two Sample Variance – F Test

F critical = 4.1203
Two Sample Variance – F Test
❖ Example: We took 8 samples from machine A and the standard deviation was 1.1. For machine B we took 5 samples and the variance was 11. Is there a difference in variance at a 90% confidence level?
❖ n1 = 8, s1 = 1.1, s²1 = 1.21, df = 7 (denominator)
❖ n2 = 5, s²2 = 11, df = 4 (numerator)
❖ F calculated = 11/1.21 = 9.09 (higher value on top)
❖ F critical = 4.1203 (α/2 = 0.05, df 4 and 7)
❖ F calculated > F critical → Reject H0
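The critical value and decision via SciPy (my addition, not from the original slides):

```python
from scipy.stats import f

F = 11 / 1.21                       # larger variance in the numerator: 9.09
F_crit = f.ppf(0.95, dfn=4, dfd=7)  # alpha/2 = 0.05 for a 90% two-tail test
print(F, F_crit, F > F_crit)        # 9.09 > 4.12 -> reject H0
```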
Tests for Variance
❖ F-test
❖ for testing the equality of two variances from different populations
❖ for testing the equality of several means with the technique of ANOVA
❖ Chi-square test
❖ for testing the population variance against a specified value
❖ for testing the goodness of fit of some probability distribution
❖ for testing the independence of two attributes
One Sample Chi Square
❖ For testing the population variance against a specified value σ0²
One Sample Chi Square
❖ Example: A sample of 25 bottles was selected. The variance of these 25 bottles was 5. Has it increased from the established 4? 95% confidence level.
❖ H0: σ² ≤ 4 / Ha: σ² > 4
❖ χ² = (24 × 5)/4 = 30
❖ What is the critical value of Chi Square for 24 degrees of freedom?
One Sample Chi Square
❖ Example: A sample of 25 bottles was selected. The variance of these 25 bottles was 5. Has it increased from the established 4? 95% confidence level.
❖ H0: σ² ≤ 4 / Ha: σ² > 4
❖ χ² = (24 × 5)/4 = 30
❖ Critical value of Chi Square for 24 degrees of freedom = 36.42
❖ 30 < 36.42 → Fail to reject H0
One Sample Chi Square
❖ SigmaXL Output
ANOVA
❖ F-test
❖ for testing the equality of two variances from different populations
❖ for testing the equality of several means with the technique of ANOVA
❖ Chi-square test
❖ for testing the population variance against a specified value
❖ for testing the goodness of fit of some probability distribution
❖ for testing the independence of two attributes
ANOVA
❖ F-test
❖ for testing the equality of two variances from different populations
❖ H0: σ²1 = σ²2
❖ F calculated = s²1 / s²2
❖ Keep the higher variance in the numerator for a right-tail test.
❖ Remember: variance is the square of the standard deviation
ANOVA
❖ Why ANOVA?
❖ We used the t test to compare the means of two populations.
❖ What if we need to compare more than two populations? With ANOVA we can find out if one or more populations have a different mean, i.e., come from a different population.
❖ We could have conducted multiple t tests.
❖ How many t tests do we need to conduct to compare 4 samples? … 6
ANOVA
❖ Why ANOVA?
❖ How many t tests do we need to conduct to compare 4 samples? … 6
❖ The pairs are: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4
ANOVA
❖ Why ANOVA?
❖ How many t tests do we need to conduct to compare 4 samples? … 6
❖ Each test is done with alpha = 0.05, or 95% confidence.
❖ 6 tests will result in an overall confidence level of 0.95 × 0.95 × 0.95 × 0.95 × 0.95 × 0.95 = 0.735
ANOVA
❖ Comparing three machines:

| Machine 1   | Machine 2    | Machine 3    |
|-------------|--------------|--------------|
| 150         | 153          | 156          |
| 151         | 152          | 154          |
| 152         | 148          | 155          |
| 152         | 151          | 156          |
| 151         | 149          | 157          |
| 150         | 152          | 155          |
| x̄1 = 151.00 | x̄2 = 150.83 | x̄3 = 155.50 |
ANOVA
❖ Comparing three machines (same data and means as above):
[Figure: box plots of Machines 1–3 marking median, 25th/75th percentiles, and mean; Machine 3 sits visibly higher than Machines 1 and 2]
ANOVA
❖ Comparing six machines:

| Machine 1   | Machine 2    | Machine 3    | Machine 4    | Machine 5    | Machine 6    |
|-------------|--------------|--------------|--------------|--------------|--------------|
| 150         | 153          | 156          | 130          | 163          | 166          |
| 151         | 152          | 154          | 155          | 152          | 154          |
| 152         | 148          | 155          | 160          | 143          | 155          |
| 152         | 151          | 156          | 158          | 141          | 151          |
| 151         | 149          | 157          | 152          | 149          | 152          |
| 150         | 152          | 155          | 145          | 157          | 155          |
| x̄1 = 151.00 | x̄2 = 150.83 | x̄3 = 155.50 | x̄4 = 150.00 | x̄5 = 150.83 | x̄6 = 155.50 |

[Figure: box plots — Machines 4–6 have nearly the same means as Machines 1–3 but a much wider spread, including outliers]
ANOVA
❖ ANOVA is Analysis of Variance
❖ Variance = Σ(x − x̄)²/(n − 1); the numerator of this formula is the Sum of Squares
❖ Total Sum of Squares (SST) = SS between (or treatment) + SS within (or error)
ANOVA
❖ SST = SS between (or treatment) + SS within (or error)
❖ Ratio: SS between (or treatment) / SS within (or error)
❖ F = MS between (or treatment) / MS within (or error)
ANOVA
❖ SST = SS between (or treatment) + SS within (or error)

| Machine 1   | Machine 2    | Machine 3    |
|-------------|--------------|--------------|
| 150         | 153          | 156          |
| 151         | 152          | 154          |
| 152         | 148          | 155          |
| 152         | 151          | 156          |
| 151         | 149          | 157          |
| 150         | 152          | 155          |
| x̄1 = 151.00 | x̄2 = 150.83 | x̄3 = 155.50 |
ANOVA
❖ SST = SS between (or treatment) + SS within (or error)

| Machine 1 | x1 - x̄1 | (x1 - x̄1)² | Machine 2 | x2 - x̄2 | (x2 - x̄2)² | Machine 3 | x3 - x̄3 | (x3 - x̄3)² |
|-----------|---------|------------|-----------|---------|------------|-----------|---------|------------|
| 150       | -1.00   | 1.00       | 153       | 2.17    | 4.69       | 156       | 0.50    | 0.25       |
| 151       | 0.00    | 0.00       | 152       | 1.17    | 1.36       | 154       | -1.50   | 2.25       |
| 152       | 1.00    | 1.00       | 148       | -2.83   | 8.03       | 155       | -0.50   | 0.25       |
| 152       | 1.00    | 1.00       | 151       | 0.17    | 0.03       | 156       | 0.50    | 0.25       |
| 151       | 0.00    | 0.00       | 149       | -1.83   | 3.36       | 157       | 1.50    | 2.25       |
| 150       | -1.00   | 1.00       | 152       | 1.17    | 1.36       | 155       | -0.50   | 0.25       |
| x̄1 = 151.00 | sum:  | 4.00       | x̄2 = 150.83 | sum:  | 18.83      | x̄3 = 155.50 | sum:  | 5.50       |

Grand mean = 152.44

❖ SS within = 4.00 + 18.83 + 5.50 = 28.33
ANOVA
❖ SST = SS between (or treatment) + SS within (or error)
❖ Grand mean = 152.44
❖ Group-mean deviations from the grand mean, and their squares:
❖ Machine 1: 151.00 − 152.44 = −1.44, squared 2.07
❖ Machine 2: 150.83 − 152.44 = −1.61, squared 2.58
❖ Machine 3: 155.50 − 152.44 = +3.06, squared 9.36
❖ SS between = (2.07 + 2.58 + 9.36) × 6 = 84.06
ANOVA
❖ SST = SS between (or treatment) + SS within (or error)
❖ SST = 84.06 + 28.33 = 112.39
❖ Degrees of freedom:
❖ Total df = df treatment + df error
❖ (N − 1) = (C − 1) + (N − C)
❖ df treatment = 3 − 1 = 2, df error = 18 − 3 = 15
❖ df total = 17
ANOVA
❖ Mean Sum of Squares = SS / df
❖ MS between = SS between (or treatment) / df treatment = 84.06 / 2 = 42.03
❖ MS within = SS within (or error) / df within = 28.33 / 15 = 1.89
❖ F = MS between / MS within = 42.03/1.89 = 22.24
ANOVA
❖ F = MS between / MS within = 42.03/1.89 = 22.24
❖ Compare this with F critical
❖ F(2, 15, 0.95) = 3.68
❖ F calculated > F critical → Reject the Null Hypothesis
❖ DEMONSTRATE MS Excel
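The same one-way ANOVA in Python (my addition, not from the original slides):

```python
from scipy.stats import f_oneway

m1 = [150, 151, 152, 152, 151, 150]
m2 = [153, 152, 148, 151, 149, 152]
m3 = [156, 154, 155, 156, 157, 155]
F, p = f_oneway(m1, m2, m3)
print(F, p)   # F = 22.26, p < 0.05 -> reject H0
```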
ANOVA

F = 22.24

One-Way ANOVA & Means Matrix:
H0: Mean 1 = Mean 2 = ... = Mean k
Ha: At least one pair Mean i ≠ Mean j

Summary Information           Machine 1   Machine 2   Machine 3
Count                         6           6           6
Mean                          151         150.83      155.50
Standard Deviation            0.894427    1.941       1.048809
UC (2-sided, 95%, pooled)     152.20      152.03      156.70
LC (2-sided, 95%, pooled)     149.80      149.64      154.30

ANOVA Table
Source    SS       DF   MS       F        P-Value
Between   84.111   2    42.056   22.265   0.0000
Within    28.333   15   1.889
Total     112.44   17
ANOVA
❖ Practice Exercise: Fill in the values for ?A to ?E in this ANOVA Table:

ANOVA Table
Source    SS       DF   MS      F
Between   84.111   ?C   ?D      ?E
Within    ?A       15   1.889
Total     ?B       17

❖ STOP the video and try to find out the values. Once done, go ahead and start the video.
ANOVA
❖ ?A = 15 × 1.889 = 28.333
❖ ?B = 84.111 + 28.333 = 112.44
❖ ?C = 17 − 15 = 2
❖ ?D = 84.111/2 = 42.056
❖ ?E = 42.056/1.889 = 22.265

ANOVA Table
Source    SS            DF       MS            F
Between   84.111        ?C = 2   ?D = 42.056   ?E = 22.265
Within    ?A = 28.333   15       1.889
Total     ?B = 112.44   17
Goodness of Fit Test (Chi Square)
❖ To test if the sample is coming from a population with a specific distribution.
❖ Other goodness-of-fit tests are:
❖ Anderson-Darling
❖ Kolmogorov-Smirnov
❖ The Chi Square Goodness of Fit test can be used for any type of data: continuous or discrete.
Goodness of Fit Test (Chi Square)
❖ H0: The data follow a specified distribution.
❖ Ha: The data do not follow the specified distribution.
❖ Calculated statistic: χ² = Σ (O − E)²/E, where O is the observed and E the expected frequency
❖ Critical statistic: Chi square for k−1 degrees of freedom for a specific alpha.
Goodness of Fit Test (Chi Square)
❖ A coin is flipped 100 times on each of five occasions and the number of heads is noted each time. Is this coin biased?

| Expected | Observed |
|----------|----------|
| 50       | 51       |
| 50       | 52       |
| 50       | 56       |
| 50       | 82       |
| 50       | 65       |
Goodness of Fit Test (Chi Square)
❖ A coin is flipped 100 times on each of five occasions and the number of heads is noted each time. Is this coin biased?

| Expected | Observed | O-E | (O-E)² | (O-E)²/E |
|----------|----------|-----|--------|----------|
| 50       | 51       | 1   | 1      | 0.02     |
| 50       | 52       | 2   | 4      | 0.08     |
| 50       | 56       | 6   | 36     | 0.72     |
| 50       | 82       | 32  | 1024   | 20.48    |
| 50       | 65       | 15  | 225    | 4.5      |

χ² = 25.8
Goodness of Fit Test (Chi Square)
❖ A coin is flipped 100 times on each of five occasions and the number of heads is noted. Is this coin biased?
❖ χ² cal = 25.8
❖ χ²(4, 0.95) = 9.49
Goodness of Fit Test (Chi Square)
❖ A coin is flipped 100 times on each of five occasions and the number of heads is noted. Is this coin biased?
❖ χ² cal = 25.8
❖ χ²(4, 0.95) = 9.49
❖ χ² cal > χ² critical → Reject the Null Hypothesis
❖ The coin is biased
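The statistic and critical value in Python (my addition, not from the original slides):

```python
from scipy.stats import chi2

observed = [51, 52, 56, 82, 65]
expected = [50] * 5
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(stat, chi2.ppf(0.95, df=4))   # 25.8 > 9.49 -> reject H0, coin is biased
```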
Contingency Tables
❖ To find the relationship between two discrete variables.

|        | Smoker | Non Smoker | Total |
|--------|--------|------------|-------|
| Male   | 60     | 40         | 100   |
| Female | 35     | 40         | 75    |
| Total  | 95     | 80         | 175   |

|         | Operator 1 | Operator 2 | Operator 3 | Total |
|---------|------------|------------|------------|-------|
| Shift 1 | 22         | 26         | 23         | 71    |
| Shift 2 | 28         | 62         | 26         | 116   |
| Shift 3 | 72         | 22         | 66         | 160   |
| Total   | 122        | 112        | 115        | 347   |
Contingency Tables
❖ The null hypothesis is that there is no relationship between the row and column variables.
❖ The alternate hypothesis is that there is a relationship. The alternate hypothesis does not tell what type of relationship exists.

|         | Operator 1 | Operator 2 | Operator 3 | Total |
|---------|------------|------------|------------|-------|
| Shift 1 | 22         | 26         | 23         | 71    |
| Shift 2 | 28         | 62         | 26         | 116   |
| Shift 3 | 72         | 22         | 66         | 160   |
| Total   | 122        | 112        | 115        | 347   |
Contingency Tables
❖ Calculate the Chi square statistic. Expected count = (row total × column total) / grand total.

|         | Operator 1 | Operator 2 | Operator 3 | Total |
|---------|------------|------------|------------|-------|
| Shift 1 | 22         | 26         | 23         | 71    |
| Shift 2 | 28         | 62         | 26         | 116   |
| Shift 3 | 72         | 22         | 66         | 160   |
| Total   | 122        | 112        | 115        | 347   |
Contingency Tables
❖ Calculate the Chi square statistic.

OBSERVED
|         | Operator 1 | Operator 2 | Operator 3 | Total |
|---------|------------|------------|------------|-------|
| Shift 1 | 22         | 26         | 23         | 71    |
| Shift 2 | 28         | 62         | 26         | 116   |
| Shift 3 | 72         | 22         | 66         | 160   |
| Total   | 122        | 112        | 115        | 347   |

EXPECTED
|         | Operator 1  | Operator 2  | Operator 3  | Total |
|---------|-------------|-------------|-------------|-------|
| Shift 1 | 122×71/347  | 112×71/347  | 115×71/347  | 71    |
| Shift 2 | 122×116/347 | 112×116/347 | 115×116/347 | 116   |
| Shift 3 | 122×160/347 | 112×160/347 | 115×160/347 | 160   |
| Total   | 122         | 112         | 115         | 347   |
Contingency Tables
❖ Calculate the Chi square statistic.

EXPECTED
|         | Operator 1 | Operator 2 | Operator 3 | Total |
|---------|------------|------------|------------|-------|
| Shift 1 | 24.96      | 22.91      | 23.53      | 71    |
| Shift 2 | 40.78      | 37.44      | 38.44      | 116   |
| Shift 3 | 56.25      | 51.64      | 53.02      | 160   |
| Total   | 122        | 112        | 115        | 347   |
Contingency Tables

(O-E)²/E
|         | Operator 1               | Operator 2 | Operator 3 |
|---------|--------------------------|------------|------------|
| Shift 1 | (22-24.96)²/24.96 = 0.35 | 0.42       | 0.01       |
| Shift 2 | (28-40.78)²/40.78 = 4.00 | 16.11      | 4.03       |
| Shift 3 | (72-56.25)²/56.25 = 4.41 | 17.01      | 3.18       |

χ² = 49.52
Contingency Tables
❖ Chi square statistic = 49.52
❖ Degrees of freedom = (r−1)(c−1) = 4
❖ Chi square critical = 9.49
❖ 49.52 > 9.49 → Reject the null hypothesis
❖ There is a relationship between the shift and the operator.
Contingency Tables
❖ Practice Exercise:
❖ Calculate the expected value for Non Smoker Male.
❖ What will be the degrees of freedom in this example?

|        | Smoker | Non Smoker | Total |
|--------|--------|------------|-------|
| Male   | 60     | 40         | 100   |
| Female | 35     | 40         | 75    |
| Total  | 95     | 80         | 175   |
Contingency Tables
❖ Practice Exercise:
❖ Expected value for Non Smoker Male = 80 × 100/175 = 45.71
❖ Degrees of freedom = (2−1)(2−1) = 1

|        | Smoker | Non Smoker | Total |
|--------|--------|------------|-------|
| Male   | 60     | 40         | 100   |
| Female | 35     | 40         | 75    |
| Total  | 95     | 80         | 175   |
Correlation
❖ Y = f(X),
❖ where Y is the dependent variable or the result (output)
❖ X is the independent variable, input, or the controllable variable
❖ For example, the study of marks obtained by students in a subject (Y) vs hours of study (X)
Correlation
❖ Demonstration:
❖ Calculate Pearson's correlation coefficient using MS Excel

|          | Column 1    | Column 2 |
|----------|-------------|----------|
| Column 1 | 1           |          |
| Column 2 | 0.879350768 | 1        |
Correlation Coefficient
❖ Correlation
❖ Measures the strength of the linear relationship between Y and X
❖ Pearson Correlation Coefficient, r (r varies between -1 and +1)
❖ Perfect positive relationship: r = 1
❖ No relationship: r = 0
❖ Perfect negative relationship: r = -1
Correlation vs Causation
❖ Correlation does not imply causation
❖ a correlation between two variables does
not imply that one causes the other
Correlation – Confidence Interval
❖ Population correlation (ρ) – usually
unknown
❖ Sample correlation (r)
Correlation – Confidence Interval
❖ Since r is not normally distributed, there are three steps to find the confidence interval:
❖ Convert r to z' (Fisher's transformation)
❖ Calculate the confidence interval in terms of z'
❖ Convert the confidence interval back to r
❖ z' = 0.5[ln(1+r) − ln(1−r)]
❖ Variance of z' = 1/(N−3)
Correlation – Confidence Interval
❖ N = 10, r = 0.88; find the confidence interval
❖ Step 1. Convert r to z'
❖ z' = 0.5[ln(1+r) − ln(1−r)]
❖ z' = 0.5[ln(1.88) − ln(0.12)]
❖ z' = 0.5[0.63 − (−2.12)] = 1.375
Correlation – Confidence Interval
❖ N = 10, r = 0.88; find the confidence interval
❖ Step 2. Confidence interval for z'
❖ Variance = 1/(N−3) = 1/7 = 0.1428
❖ Standard error = √0.1428 = 0.378
❖ 95% confidence: Z = 1.96
❖ CI = 1.375 ± (1.96)(0.378)
❖ Lower limit = 0.635
❖ Upper limit = 2.11
Correlation – Confidence Interval
❖ N = 10, r = 0.88; find the confidence interval
❖ Step 3. Convert back to r by inverting z' = 0.5[ln(1+r) − ln(1−r)], i.e., r = (e^(2z') − 1)/(e^(2z') + 1)
❖ z' lower limit = 0.635 → r lower limit = 0.56
❖ z' upper limit = 2.11 → r upper limit = 0.97
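The three steps in Python (my addition; tanh is the inverse of Fisher's transformation):

```python
import math

r, n, z = 0.88, 10, 1.96
z_prime = 0.5 * (math.log(1 + r) - math.log(1 - r))  # Fisher's z' = 1.375
se = math.sqrt(1 / (n - 3))                          # 0.378
lo, hi = z_prime - z * se, z_prime + z * se          # 0.635, 2.11 (z' scale)
print(math.tanh(lo), math.tanh(hi))                  # back to r: 0.56, 0.97
```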
Coefficient of Determination
❖ Coefficient of Determination, r²
❖ The proportion of the variance in the dependent variable that is predictable from the independent variable
❖ (varies from 0.0 to 1.0, or zero to 100%)
❖ None of the variation in Y is explained by X: r² = 0.0
❖ All of the variation in Y is explained by X: r² = 1.0
❖ r = 0.88 → r² = 0.77
Regression Analysis
❖ Quantifies the relationship between Y
and X (Y = a + bX)
Regression Analysis
❖ Quantifies the relationship between Y and X (Y = a + bX)

| Hours Studied (X) | Test Score % (Y) | XY    | X²    | Y²    |
|-------------------|------------------|-------|-------|-------|
| 20                | 40               | 800   | 400   | 1600  |
| 24                | 55               | 1320  | 576   | 3025  |
| 46                | 69               | 3174  | 2116  | 4761  |
| 62                | 83               | 5146  | 3844  | 6889  |
| 22                | 27               | 594   | 484   | 729   |
| 37                | 44               | 1628  | 1369  | 1936  |
| 45                | 61               | 2745  | 2025  | 3721  |
| 27                | 33               | 891   | 729   | 1089  |
| 65                | 71               | 4615  | 4225  | 5041  |
| 23                | 37               | 851   | 529   | 1369  |
| SUM: 371          | 520              | 21764 | 16297 | 30160 |
Regression Analysis
❖ Quantifies the relationship between Y and X: Y = 15.79 + 0.97·X
❖ Using the sums above: slope b = (nΣXY − ΣX·ΣY)/(nΣX² − (ΣX)²) = 24720/25329 ≈ 0.97; intercept a = (ΣY − b·ΣX)/n ≈ 15.79
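A least-squares fit of the same data in Python (my addition, not from the original slides):

```python
import numpy as np

hours = np.array([20, 24, 46, 62, 22, 37, 45, 27, 65, 23])
score = np.array([40, 55, 69, 83, 27, 44, 61, 33, 71, 37])
b, a = np.polyfit(hours, score, 1)  # slope first, then intercept
print(a, b)                         # a = 15.79, b = 0.98
print(a + b * 50)                   # predicted score for 50 hours: about 64.6
```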
Regression Analysis
❖ For a student studying 50 hours, what is the expected test score %?
❖ Y = 15.79 + 0.97 × 50 ≈ 64.3%
Residual Analysis
❖ Y = 15.79 + 0.97·X; a residual is the observed Y minus the predicted Y
Residual Analysis – No pattern
[Figure: residual plot — residuals vs X (0 to 70) scatter randomly between about −15 and +20 with no visible pattern]
Time Series

• The terms Time Series and Run Charts are used interchangeably.
• Time series data means that the data is in a series of particular time periods or intervals.
Time Series Patterns

• Trend
• Seasonality
Time Series Prediction Techniques
❖ Moving Average Method
❖ Exponential Smoothing
❖ Vector Auto Regression
❖ ARIMA (Autoregressive Integrated Moving Average) Model
Statistical Process Control (SPC)
❖ SPC helps to monitor and control a
process.
❖ Monitoring and controlling the process
ensures that it operates at its full
potential.
❖ At its full potential, the process can make as much conforming product as possible with minimum waste
❖ Products conforming to specification are acceptable products
Statistical Process Control (SPC)
❖ Two phases of SPC:
❖ Understanding the process variation
❖ Monitoring and controlling
❖ Finding the trend — acting neither too early nor too late
❖ The key factor is the cause of variation: common cause or special cause
Statistical Process Control (SPC)
❖ Common cause or special cause?
❖ Helps us understand when and when not to take action.
Statistical Process Control (SPC)
| COMMON CAUSES                               | SPECIAL CAUSES                            |
|---------------------------------------------|-------------------------------------------|
| Many causes                                 | Few causes                                |
| Each having minimum impact                  | Each having significant impact            |
| Un-economical to eliminate                  | Economically viable to eliminate          |
| Also called: Random, Chance, Non-assignable | Also called: Signal, Systematic, Assignable |
Selection of Variables
❖ What are you interested in?
❖ Is this an important characteristic delivering value to the client?
❖ Is the client specifically asking to control a particular variable?
❖ Is this the most difficult to maintain?
Statistical Process Control (SPC)
❖ Univariate Control Charts
❖ Individual chart for each key characteristic
❖ Multivariate Control Charts
❖ Monitoring multiple parameters on a single control chart
❖ Hotelling's T² method is used to generate multivariate charts.
❖ Details of multivariate charts are not part of this course
Rational Subgrouping
❖ Subgroups
❖ In a control chart we select some number of units (say 5 units) each time, summarize the variable, and plot it on the control chart.
❖ For example, in the X-bar, R chart, we calculate the mean and range of each subgroup.
❖ A subgroup is a snapshot of the process at that time.
❖ Measurements within a subgroup must be taken close together in time but still be independent of each other.
Rational Subgrouping
❖ Two types of variation:
❖ Within subgroup: the variation within
subgroups is because of common cause
variation.
❖ Between subgroup: the variation
between subgroups is caused by special
causes.
Rational Subgrouping
❖ Control limits (UCL, LCL) are calculated based on the variation "within" subgroups.
❖ A subgroup should come from a single stable source. Too much variation within subgroups will lead to control limits that are too wide.
❖ Subgroups should be time based and not randomly selected from already produced items. A subgroup is a time snapshot of the process.
Rational Subgrouping
❖ Subgroup size:
❖ A subgroup size of 5 is most common
❖ Too small a subgroup size → meaningful process shifts may go undetected
❖ Too large a subgroup size → insignificant process shifts give false alarms
Control Chart Selection
Variables / Measurements
❖ I-MR or X-MR chart (Individual, Moving Range)
❖ X bar - R chart (Average – Range)
❖ X bar - s chart (Average – Standard deviation)
Attributes / Counts
❖ np Chart (Number of defectives)
❖ p Chart (Proportion defectives)
❖ c Chart (Number of defects)
❖ u Chart (Number of defects per unit)
Data Type
Attribute - Counts:
  # Defectives (# Pieces):   Constant n → np chart;  Variable n → p chart
  # Defects (# Occurrences): Constant n → c chart;   Variable n → u chart
Variable - Measurements:
  n > 9 → Xbar-s;  n = 2 to 9 → Xbar-R;  n = 1 → I-MR/X-MR
(n is the subgroup size)
Variable Measurements
  n > 9 → Xbar-s;  n = 2 to 9 → Xbar-R;  n = 1 → I-MR/X-MR

I-MR or X-MR Chart
[Formulas for the Individual control limits and the Moving Range control limits]
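The control-limit formulas on this slide were shown as images. As a sketch, the standard I-MR limits are built from the average moving range, using the constants 2.66 (= 3/d2 with d2 = 1.128 for moving ranges of size 2) and D4 = 3.267:

```python
# Sketch: I-MR control limits using the standard constants for
# moving ranges of size 2 (2.66 = 3/d2 with d2 = 1.128, and D4 = 3.267).
def imr_limits(x):
    mr = [abs(b - a) for a, b in zip(x, x[1:])]  # moving ranges
    x_bar = sum(x) / len(x)
    mr_bar = sum(mr) / len(mr)
    return {
        "I":  (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),  # LCL of the MR chart is 0
    }
```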
Xbar – R Chart
[Formulas for the Xbar control limits and the Range control limits]
Xbar – s Chart
[Formulas for the Xbar control limits and the Standard Deviation control limits]
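The X̄-R formulas were likewise images; as a sketch, the standard limits for subgroups of size 5 use the table constants A2 = 0.577, D3 = 0 and D4 = 2.114:

```python
# Sketch: X-bar / R control limits for subgroups of size 5
# (table constants: A2 = 0.577, D3 = 0, D4 = 2.114).
def xbar_r_limits(subgroups):
    means = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    xbarbar = sum(means) / len(means)   # grand average (center line)
    rbar = sum(ranges) / len(ranges)    # average range
    return {
        "Xbar": (xbarbar - 0.577 * rbar, xbarbar, xbarbar + 0.577 * rbar),
        "R":    (0.0, rbar, 2.114 * rbar),
    }
```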
np and p Chart
Total Defectives (np) and Percent Defective (p)
❖ Binomial Distribution
❖ Subgroup size is normally big compared to
variable charts
# Defectives (# Pieces): Constant n → np chart;  Variable n → p chart
np Chart
Equal Subgroup size
❖ Control limits are straight lines
[np chart example "NP - Defectives": UCL = 10.931, CL = 4.680, LCL = 0.000]
p Chart
Unequal Subgroup size
❖ Control limits change with the number of items
in the subgroup (subgroup size)
❖ Larger subgroup – narrower control limits
❖ Smaller subgroup – wider control limits
[p chart example "P - Unplanned Return": CL = 0.021, stepped limits around UCL ≈ 0.053 and LCL = 0.000]
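A sketch of the standard binomial-based limits behind both charts (lower limits clipped at zero, matching the 0.000 LCLs above):

```python
import math

# Sketch: binomial-based limits for the np chart (constant n, straight
# lines) and the p chart (varying n, stepped limits); LCLs clip at 0.
def np_limits(n, defectives):
    np_bar = sum(defectives) / len(defectives)  # average count per subgroup
    p_bar = np_bar / n
    half = 3 * math.sqrt(np_bar * (1 - p_bar))
    return (max(0.0, np_bar - half), np_bar, np_bar + half)

def p_limits(sizes, defectives):
    p_bar = sum(defectives) / sum(sizes)        # overall proportion
    return [(max(0.0, p_bar - 3 * math.sqrt(p_bar * (1 - p_bar) / n)),
             p_bar,
             p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n))
            for n in sizes]
```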
Attribute Measurements
# Defectives (# Pieces):   Constant n → np chart;  Variable n → p chart
# Defects (# Occurrences): Constant n → c chart;   Variable n → u chart
c and u Chart
Total Defects (c) / Defects per Unit (u)
❖ Poisson Distribution
❖ Subgroup size is normally big compared to
variable charts
# Defects (# Occurrences): Constant n → c chart;  Variable n → u chart
c Chart
Equal Subgroup size
[c chart example "C - Defects": UCL = 20.192, CL (average defects) = 10.480, LCL = 0.768]
u Chart
Defects per Unit – Unequal Subgroup size
❖ Control limits change with the number of items
in the subgroup (subgroup size)
❖ Larger subgroup – narrower control limits
❖ Smaller subgroup – wider control limits
[u chart example "U - Defects": CL = 0.105, stepped limits around UCL ≈ 0.206 and LCL ≈ 0.004]
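A matching sketch of the Poisson-based limits for the c and u charts:

```python
import math

# Sketch: Poisson-based limits for the c chart (constant inspection unit)
# and the u chart (defects per unit, varying subgroup size); LCLs clip at 0.
def c_limits(defect_counts):
    c_bar = sum(defect_counts) / len(defect_counts)
    half = 3 * math.sqrt(c_bar)
    return (max(0.0, c_bar - half), c_bar, c_bar + half)

def u_limits(sizes, defect_counts):
    u_bar = sum(defect_counts) / sum(sizes)
    return [(max(0.0, u_bar - 3 * math.sqrt(u_bar / n)),
             u_bar,
             u_bar + 3 * math.sqrt(u_bar / n))
            for n in sizes]
```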
Control Chart Analysis
What is the problem with this process?
[X-bar chart, Shot 1 - Shot 3: UCL = 107.45, CL = 100.91, LCL = 94.36]
Control Chart Rules
❖ Nelson Rules
Rule  Pattern                                                     Probable Cause
1     1 point more than 3 Stdev from CL                           New person, wrong setup
2     7 points in a row on same side of CL                        Setup change, process change
3     7 points in a row all increasing or all decreasing          Trend, tool wear
4     14 points in a row alternating up and down                  Over-control, tampering
5     2 out of 3 points more than 2 Stdev from CL (same side)     New person, wrong setup
6     4 out of 5 points more than 1 Stdev from CL (same side)     Small shift, similar to Rules 1, 5
7     14 points in a row within 1 Stdev from CL (either side)     Process change
8     8 points in a row more than 1 Stdev from CL (either side)   Process change
Rule 1: 1 point more than 3 Stdev from CL
[X-bar chart, Shot 1 - Shot 3 (UCL = 107.45, CL = 100.91, LCL = 94.36), zones A/B/C marked; points annotated with the rule numbers they trigger]
Rule 2: 7 points in a row on same side of CL
[Same Shot 1 - Shot 3 chart with the Rule 2 run highlighted]
Rule 3: 7 points in a row all increasing or all decreasing
[Same Shot 1 - Shot 3 chart with the Rule 3 trend highlighted]
Rule 4: 14 points in a row alternating up and down
[X-bar chart, Shot 4 - Shot 6 (UCL = 107.32, CL = 100.65, LCL = 93.98) with the alternating pattern highlighted]
Rule 5: 2 out of 3 points more than 2 Stdev from CL (same side)
[Same Shot 1 - Shot 3 chart with the Rule 5 points highlighted]
Rule 6: 4 out of 5 points more than 1 Stdev from CL (same side)
[Same Shot 1 - Shot 3 chart with the Rule 6 points highlighted]
Rule 7: 14 points in a row within 1 Stdev from CL (either side)
[X-bar chart, Shot 7 - Shot 9 (UCL = 106.80, CL = 100.45, LCL = 94.11) with the Rule 7 run highlighted]
Rule 8: 8 points in a row more than 1 Stdev from CL (either side)
[X-bar chart, Shot 11 - Shot 13 (UCL = 107.44, CL = 101.71, LCL = 95.98) with the Rule 8 run highlighted]
Rules 1, 2, 7, 8 – False Alarm Probabilities
❖ Probability of Rule 1
❖ (1 - 0.9973) = 0.0027
❖ Probability of Rule 2
❖ (0.5)^7 = 0.0078
❖ Probability of Rule 7
❖ (0.68)^14 = 0.0045
❖ Probability of Rule 8
❖ (1 - 0.68)^8 = 0.0001
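As an illustration, a sketch of checks for Rules 1 and 2 on a list of plotted points, given the chart's center line and the standard deviation of the plotted statistic:

```python
# Sketch: checks for Nelson Rules 1 and 2 on the plotted points,
# given the chart's center line and the sigma of the plotted statistic.
def rule_1(points, cl, sigma):
    """Indices of points more than 3 standard deviations from the CL."""
    return [i for i, x in enumerate(points) if abs(x - cl) > 3 * sigma]

def rule_2(points, cl, run=7):
    """Indices where `run` consecutive points sit on the same side of CL."""
    hits, streak, side = [], 0, 0
    for i, x in enumerate(points):
        s = 1 if x > cl else -1 if x < cl else 0
        streak = streak + 1 if (s == side and s != 0) else 1
        side = s
        if streak >= run:
            hits.append(i)
    return hits
```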
Pre-Control Charts
❖ Use specification limits instead of
statistically-derived control limits to
determine process capability over time.
❖ Used during the initial setup process.
❖ Easier to set up, implement and interpret
Pre-Control Charts
• Pre-Control Limits (LPCL and UPCL) span the
middle 50% of the tolerance.
• To establish process control, 5 consecutive items
should fall within the Pre-Control Limits.
• After that, 2 successive units are
periodically sampled.
• Continue if both fall in the green zone, or
one in green and one in yellow.
• Stop and adjust the process if both fall
in yellow, or one falls in the red
zone.
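A sketch of this zone logic (green = middle 50% of the tolerance, yellow = the rest of the tolerance, red = outside specification):

```python
# Sketch of the pre-control zone logic: green = middle 50% of the
# tolerance, yellow = the rest of the tolerance, red = out of spec.
def precontrol_zone(x, lsl, usl):
    quarter = (usl - lsl) / 4
    lpcl, upcl = lsl + quarter, usl - quarter  # pre-control limits
    if lpcl <= x <= upcl:
        return "green"
    if lsl <= x <= usl:
        return "yellow"
    return "red"

def decide(a, b, lsl, usl):
    """Decision rule for the two periodically sampled units."""
    zones = {precontrol_zone(a, lsl, usl), precontrol_zone(b, lsl, usl)}
    if "red" in zones or zones == {"yellow"}:
        return "stop and adjust"
    return "continue"
```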
Short-run SPC
❖ A typical Control Chart needs 20-25
samples with 4 to 5 items as the
subgroup size.
❖ You need roughly 100 measurements to
define control limits.
❖ What if there are a very few pieces
manufactured?
❖ Use Short-run Chart
Short-run SPC
❖ Short-run SPC focuses on the process
rather than the product.
❖ Example: Different diameter items
produced
❖ E.g. eight items each of 300 mm, 400 mm and
500 mm diameter
❖ Options:
❖ 100% inspection – Expensive
❖ First-off inspection – What about process
variation?
❖ Last-off inspection – Too little too late
❖ Separate control chart – limited data
Short-run SPC
302.634 Run A 504.188 Run B 400.548 Run C
300.558 Run A 506.879 Run B 403.193 Run C
301.604 Run A 506.189 Run B 392.790 Run C
298.130 Run A 517.210 Run B 399.538 Run C
298.824 Run A 479.511 Run B 392.192 Run C
301.384 Run A 495.170 Run B 403.812 Run C
302.373 Run A 506.851 Run B 393.457 Run C
298.685 Run A 489.671 Run B 401.051 Run C
Short-run – Difference Chart
❖ Plot the Difference = Measurement – Nominal
❖ Assumption: Each run has similar variance

Stamp Data   Run     Nominal   Difference
302.634      Run A   300         2.6338
300.558      Run A   300         0.5579
301.604      Run A   300         1.6043
298.130      Run A   300        -1.8704
298.824      Run A   300        -1.1757
301.384      Run A   300         1.3837
302.373      Run A   300         2.3729
298.685      Run A   300        -1.3154
504.188      Run B   500         4.1884
506.879      Run B   500         6.8792
506.189      Run B   500         6.1887
517.210      Run B   500        17.2103
479.511      Run B   500       -20.4891
495.170      Run B   500        -4.8304
506.851      Run B   500         6.8514
489.671      Run B   500       -10.3293
400.548      Run C   400         0.5483
403.193      Run C   400         3.1932
392.790      Run C   400        -7.2102
399.538      Run C   400        -0.4623
392.192      Run C   400        -7.8076
403.812      Run C   400         3.8119
393.457      Run C   400        -6.5428
401.051      Run C   400         1.0513

[I-MR chart of the differences: Individuals chart UCL = 21.16, X̄ = -0.15, LCL = -21.45; Moving Range chart UCL = 26.17, MR̄ = 8.01, LCL = 0]
Z-MR Chart
❖ Uses the same stamp data, standardized run by
run: Z = (Measurement – Nominal) / σ of the run,
so runs with different variances can share one chart.
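A sketch of how the two short-run statistics are computed, using part of the stamp data above (here each run's sigma is estimated from its own data; in practice it may come from history or a pooled estimate):

```python
# Sketch: difference-from-nominal and Z values for two of the runs above.
runs = {
    "Run A": (300, [302.634, 300.558, 301.604, 298.130]),
    "Run B": (500, [504.188, 506.879, 506.189, 517.210]),
}

def std_dev(values):
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

for name, (nominal, data) in runs.items():
    sigma = std_dev(data)
    for x in data:
        diff = x - nominal     # point on the difference chart
        z = diff / sigma       # point on the Z-MR chart
        print(name, round(diff, 4), round(z, 2))
```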
Process Capability Studies
❖ Select the process
❖ Data Collection Plan
❖ Measurement System Analysis
❖ Gather data
❖ Confirm normality of data
❖ Confirm that the process is in control
❖ Estimate the process capability
❖ Continually improve process
Process Performance Metrics
❖ Percent Defectives
❖ PPM
❖ DPMO
❖ DPU
❖ Rolled Throughput Yield
Percent Defectives
❖ Percent of parts having one or more
defects
❖ 2 percent – 2 pieces per 100 pieces
Parts per Million (PPM)
❖ Defective parts per million.
❖ 2 percent – 2 pieces per 100 pieces
❖ 0.02 x 1,000,000 = 20,000 PPM
Defect vs Defective
❖ A nonconforming unit is a defective
unit
❖ A defect is a nonconformance on one of
many possible quality characteristics of
a unit that causes customer
dissatisfaction.
Defect Opportunity
❖ A circumstance in which a CTQ characteristic
can fail to be met.
❖ The number of defect opportunities relates to
the complexity of the unit.
❖ Complex units – greater opportunity for
defects than simple units
❖ Example:
❖ A unit has 5 parts, and in each part there are 3
opportunities for defects – total defect
opportunities are 5 x 3 = 15
Defects Per Opportunity (DPO)
❖ Number of defects divided by number of
defect opportunities
❖ Example:
❖ In the previous case (15 defect opportunities per
unit), if 10 units have 2 defects:
❖ DPO = 2 / (15 x 10) = 0.0133333
Defects Per Million Opportunities (DPMO)
❖ DPO multiplied by one million
❖ Example:
❖ In the previous case (15 defect opportunities per
unit), if 10 units have 2 defects:
❖ DPO = 2 / (15 x 10) = 0.0133333
❖ DPMO = 0.0133333 x 1,000,000 = 13,333
❖ 13,333 DPMO is 3.7 Sigma
❖ Six Sigma performance is 3.4 DPMO
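The same arithmetic as a sketch:

```python
# Sketch: 10 units, 15 opportunities per unit, 2 defects found.
units, opportunities_per_unit, defects = 10, 15, 2

dpo = defects / (units * opportunities_per_unit)
dpmo = dpo * 1_000_000
print(dpo)   # 0.013333...
print(dpmo)  # 13333.3... -> about 3.7 sigma
```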
Defects Per Unit (DPU)
❖ Number of Defects / Number of Units
❖ In 3,000 welds the defects observed were:
❖ 10 Cracks
❖ 15 Porosity
❖ 5 Undercut
❖ DPU = (10+15+5)/3,000 = 30/3,000
= 1/100 = 0.01
Rolled Throughput Yield (RTY)
❖ Units entering a process = P
❖ Defective Units = D
❖ Yield = (P-D)/P
❖ Y1 = 0.99, Y2 = 0.95, Y3 = 0.98
❖ RTY = Y1 x Y2 x Y3 = 0.99 x 0.95 x 0.98
= 0.92169
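A sketch of both metrics using the slide's numbers:

```python
# Sketch: DPU and Rolled Throughput Yield with the slide's numbers.
weld_defects = {"cracks": 10, "porosity": 15, "undercut": 5}
welds = 3000
dpu = sum(weld_defects.values()) / welds  # 30 / 3000 = 0.01

yields = [0.99, 0.95, 0.98]
rty = 1.0
for y in yields:
    rty *= y                              # 0.99 * 0.95 * 0.98 = 0.92169
print(dpu, rty)
```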
Process Capability Indices
❖ Ratio of the spread of the process
specifications (USL – LSL) to the spread of
the process values (6 process standard
deviations).
Process Capability Indices
❖ LSL – Lower Specification Limit
❖ USL – Upper Specification Limit
❖ LCL – Lower Control Limit
❖ UCL – Upper Control Limit
Process Capability Indices
❖ Cp = (USL – LSL)/(6 σwithin)
❖ CpL = (Process Mean – LSL)/(3 σwithin)
❖ CpU = (USL – Process Mean)/(3 σwithin)
❖ Cpk = Min (CpU, CpL)
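A sketch of these four indices (the example spec and sigma values are made up):

```python
# Sketch of the four indices. sigma_within would normally be estimated
# from a control chart (e.g. R-bar/d2), not from the raw overall sample.
def cp_cpk(usl, lsl, mean, sigma_within):
    cp = (usl - lsl) / (6 * sigma_within)
    cpl = (mean - lsl) / (3 * sigma_within)
    cpu = (usl - mean) / (3 * sigma_within)
    return cp, min(cpu, cpl)

# Made-up example: spec 90 to 110, process centered at 102 with sigma 2.
print(cp_cpk(usl=110, lsl=90, mean=102, sigma_within=2))  # (1.67, 1.33)
```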
Capability Ratio Cr
❖ Capability ratio (Cr)
❖ This index is computed as 1/Cp (the inverse
of Cp)
❖ Cr = 6σ / (USL – LSL)
❖ Cr x 100 shows the percent of the
specification width that is being used by the
variation in the process.
Process Capability Indices
❖ Why do a Process Capability study?
❖ Understand the behavior of new/repaired/
adjusted equipment
❖ Review of tolerances
❖ Allocation of equipment
Process Capability Indices
❖ Conditions to be met:
❖ Sample to represent the population
❖ Normal distribution of data
❖ The process must be in statistical control
❖ Sample size must be sufficient
Process Capability Indices
❖ Process Capability vs Rejections

USL−LSL   6σ       8σ       10σ       12σ
Cp        1.00     1.33     1.66      2.00
Rejects   0.27 %   64 ppm   0.6 ppm   2 ppb
Process Performance Indices
❖ Conditions to be met:
❖ Sample to represent the population
❖ Normal distribution of data
❖ The process must be in statistical control
❖ Sample size must be sufficient
Process Performance Indices
❖ Pp = (USL – LSL)/(6 σoverall)
❖ PpL = (Process Mean – LSL)/(3 σoverall)
❖ PpU = (USL – Process Mean)/(3 σoverall)
❖ Ppk = Min (PpU, PpL)
Difference Between Cpk and Ppk
❖ Cpk is calculated using the “within” standard
deviation, while Ppk uses the “overall”
standard deviation.
❖ Cpk is for the short term; Ppk is for the long
term.
Taguchi Capability Index - Cpm
❖ Process Capability
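The formula on this slide was an image; the standard definition of the Taguchi index, where T is the target value and μ the process mean, is:

```latex
C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^{2} + (\mu - T)^{2}}}
```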
Design of Experiments
❖ We conduct experiments in our daily
life.
❖ Car:
❖ Does AC affect the car mileage?
❖ Does number of passengers affect the car
mileage?
❖ What about tire pressure, speed …..
❖ Course selling:
❖ Does the intro video affect the sale?
❖ What about course length, quizzes, closed
captions …..
Design of Experiments
❖ Y = f (X)
❖ Output (Y) is the function of inputs
(X)
Y X
Output Input
Dependent Variable Independent Variable
Response (or Outcome) Factor
Design of Experiments
❖ Response: The output(s) of a process.
Sometimes called dependent
variable(s).
❖ Factor: A factor of an experiment is a
controlled independent variable; a
variable whose levels are set by the
experimenter. These can be numeric
or categorical.
Design of Experiments
❖ In the coffee making process
❖ Response: taste of coffee
❖ Factors: milk and sugar
(Let's ignore other factors to keep it simple)
Design of Experiments
❖ In the coffee making process – 150 cc cup
❖ Milk: 40 cc (-) vs 80 cc (+)
❖ Sugar: 10 gms (-) vs 20 gms (+)
Design of Experiments
#   Sugar   Milk   Rating   Sequence
1   -       -      3        2
2   -       +      6        4
3   +       -      6        1
4   +       +      9        3
[Square plot: (Sugar-, Milk-) = 3, (Sugar+, Milk-) = 6, (Sugar-, Milk+) = 6, (Sugar+, Milk+) = 9]
Design of Experiments
[Interaction charts: Rating vs Sugar for each Milk level and Rating vs Milk for each Sugar level; the lines (3→6 and 6→9) are parallel, indicating no interaction]
Design of Experiments
[Contour plot of Rating over the Sugar–Milk design space, corners 3, 6, 6, 9]
Design of Experiments
❖ Y = B0 + B1 X1 + B2 X2 + …
❖ Y = B0 + Bs Xs + Bm Xm
❖ Y = 6 + 1.5 Xs + 1.5 Xm
Design of Experiments
Y = 6 + 1.5 Xs + 1.5 Xm reproduces the table exactly:
#   Sugar   Milk   Rating
1   -       -      3
2   -       +      6
3   +       -      6
4   +       +      9
Planning and Conducting a DoE
1. Define the objective
❖ To screen
❖ To optimize
❖ Robustness
2. Choice of factors and levels
3. Choose the response variable
4. Design the experiment
5. Conduct the experiment
6. Analyze results
7. Draw conclusion and make
recommendations
Objective - Screening Experiments
❖ When you have a large number of
factors to study, e.g. > 5
❖ The purpose is to select the important
factors from the large list.
❖ Used when the system is new and you
don't want to waste too many
resources on unimportant factors.
Objective - Optimization
❖ When we know the key factors
affecting the performance
❖ When we are looking for the optimum
settings of these selected factors.
Objective - Robustness
❖ To determine conditions under which
the output (response) is least affected
by variations in inputs.
Selecting Factors
❖ Knowledge of the process is important.
❖ Factors of interest are selected.
❖ Other factors are called nuisance factors.
❖ Address nuisance factors by:
❖ Randomization
❖ Blocking
❖ Analysis of covariance
Choosing the Response Variable
❖ Response variable should provide
important information about the
process.
❖ It should be measurable.
❖ Measurement error should be
considered.
Design the Experiment
❖ Full Factorial or Partial Factorial Design
❖ Various design options
❖ One Factor Designs
❖ Completely Randomized
❖ Randomized Block
❖ Latin Square and Graeco-Latin Square
Designs
❖ Fractional Factorial
❖ Plackett-Burman designs (for screening,
neglecting higher-order interactions)
Conduct the Experiment
❖ Follow the plan
❖ Mistakes in setting up the correct level
or recording the results might be costly.
Consider doing a pilot test before
running the full experiment.
Analyze Results
❖ Various software packages are available to do
the analysis:
❖ Minitab
❖ SigmaXL
❖ JMP
❖ Microsoft Excel
❖ R
Make Conclusion / Recommendations
❖ Draw conclusion from analysis and
present using simple charts.
❖ Results could lead to the next
experiment.
Design of Experiments
Standard Order   Sugar   Milk   Rating   Run Order
1                -       -      3.7      2
2                -       +      7.2      4
3                +       -      7.9      1
4                +       +      8.8      3
[Square plot: (Sugar-, Milk-) = 3.7, (Sugar+, Milk-) = 7.9, (Sugar-, Milk+) = 7.2, (Sugar+, Milk+) = 8.8]
Design of Experiments
[Interaction charts for the new data: the lines (3.7→7.9 at low Milk, 7.2→8.8 at high Milk) are not parallel, indicating a Sugar–Milk interaction]
Design of Experiments
[Contour plot of Rating over the Sugar–Milk design space, corners 3.7, 7.9, 7.2, 8.8]
Design of Experiments
❖ Y = B0 + B1 X1 + B2 X2 + …
❖ Y = B0 + Bs Xs + Bm Xm
❖ Y = 6.9 + 1.45 Xs + 1.1 Xm
❖ For low milk, low sugar:
❖ Y = 6.9 + 1.45 (-1) + 1.1 (-1) = 4.35 (against the observed 3.7)
❖ Hence something else is at play here … called an
interaction, Xs · Xm
Design of Experiments
❖ Y = 6.9 + 1.45 Xs + 1.1 Xm
❖ The interaction is half the difference between the
effect of sugar when milk is high and when milk is low:
[(8.8 - 7.2) - (7.9 - 3.7)] / 2 = -1.3
❖ Report half of this in the equation as the coefficient
of Xs · Xm: -1.3 / 2 = -0.65
❖ Y = 6.9 + 1.45 Xs + 1.1 Xm – 0.65 Xs · Xm
Design of Experiments
❖ Y = 6.9 + 1.45 Xs + 1.1 Xm – 0.65 Xs · Xm
❖ For high (+) sugar, high (+) milk:
❖ Y = 6.9 + 1.45 (+1) + 1.1 (+1) – 0.65 (+1)(+1)
❖ Y = 6.9 + 1.45 + 1.1 – 0.65 = 8.8
❖ For high (+) sugar, low (-) milk:
❖ Y = 6.9 + 1.45 (+1) + 1.1 (-1) – 0.65 (+1)(-1)
❖ Y = 6.9 + 1.45 – 1.1 + 0.65 = 7.9
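A sketch that recovers all four coefficients directly from the coded runs (each coefficient is the contrast column dotted with the ratings, divided by the number of runs):

```python
# Sketch: recover the 2^2 model coefficients from the four coded runs.
runs = [(-1, -1, 3.7), (-1, +1, 7.2), (+1, -1, 7.9), (+1, +1, 8.8)]

b0  = sum(y for _, _, y in runs) / 4               # 6.9
bs  = sum(xs * y for xs, _, y in runs) / 4         # 1.45
bm  = sum(xm * y for _, xm, y in runs) / 4         # 1.1
bsm = sum(xs * xm * y for xs, xm, y in runs) / 4   # -0.65

print(b0, bs, bm, bsm)  # Y = 6.9 + 1.45 Xs + 1.1 Xm - 0.65 Xs.Xm
```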
Design of Experiments
❖ Factor: A factor of an experiment is a
controlled independent variable; a
variable whose levels are set by the
experimenter.
❖ Level: Settings of each factor in the
study.
❖ Treatment: A treatment is a specific
combination of factor levels whose
effect is to be compared with other
treatments.
❖ Response: The output(s) of a process.
Sometimes called dependent
variable(s).
Design of Experiments
❖ Effect : How changing the settings of a
factor changes the response. The effect
of a single factor is also called a main
effect.
❖ Interaction: Occurs when the effect of
one factor on a response depends on
the level of another factor(s).
❖ Randomization: to eliminate bias. The
use of randomization in experiments is
common practice.
❖ Replication: Performing the same
treatment combination more than
once.
Design of Experiments
❖ 3 Factors 2 Levels
Standard Order   Sugar   Milk   Bean   Rating   Run Order
1                -       -      -
2                -       -      +
3                -       +      -
4                -       +      +
5                +       -      -
6                +       -      +
7                +       +      -
8                +       +      +
[Cube plot: the 8 runs at the corners of the Sugar–Milk–Bean cube]
Design of Experiments
❖ 2 Factors 2 Levels
❖ Y = B0 + Bs Xs + Bm Xm + Bsm Xs · Xm
❖ 3 Factors 2 Levels
❖ Y = B0 + Bs Xs + Bm Xm + Bb Xb
+ Bsm Xs · Xm + Bsb Xs · Xb + Bmb Xm · Xb
+ Bsmb Xs · Xm · Xb
Design of Experiments
❖ 4 Factors 2 Levels
❖ Y = B0 + B1 X1 + B2 X2 + B3 X3 + B4 X4
+ B12 X1 · X2 + B13 X1 · X3 + B14 X1 · X4
+ B23 X2 · X3 + B24 X2 · X4 + B34 X3 · X4
+ B123 X1 · X2 · X3 + B124 X1 · X2 · X4
+ B134 X1 · X3 · X4 + B234 X2 · X3 · X4
+ B1234 X1 · X2 · X3 · X4
Design of Experiments
❖ Number of experiments = (levels)^(factors)
❖ A 3 Factor 2 Level experiment requires
2^3 = 8 experiments (full factorial)
❖ A 5 Factor 2 Level experiment requires
2^5 = 32 experiments (full factorial)
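A sketch that generates a 2-level full factorial design matrix; itertools.product varies the last factor fastest, matching the standard-order tables in these slides:

```python
from itertools import product

# Sketch: generate a 2-level full factorial design in standard order.
def full_factorial(factors):
    for levels in product((-1, +1), repeat=len(factors)):
        yield dict(zip(factors, levels))

for run in full_factorial(["Sugar", "Milk", "Bean"]):
    print(run)  # 2**3 = 8 runs
```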
Half Factorial
❖ A 3 Factor 2 Level experiment requires
2^3 = 8 experiments (full factorial)
❖ A 3 Factor 2 Level experiment requires
2^(3-1) = 4 experiments (half factorial)
Standard Order   Sugar   Milk   Bean
1                -       -      -
2                -       -      +
3                -       +      -
4                -       +      +
5                +       -      -
6                +       -      +
7                +       +      -
8                +       +      +
[Cube plot: the half fraction selects 4 of the 8 corners]
Half Factorial
[Cube plots numbering the 8 full-factorial runs at the corners and highlighting the 4 corners kept in the half fraction]
Half Factorial
❖ The half fraction keeps the 4 runs where Bean = Sugar × Milk:
Standard Order   Sugar   Milk   Bean (C = A·B)   Run Order
1                -       -      (-)·(-) = +
2                -       +      (-)·(+) = -
3                +       -      (+)·(-) = -
4                +       +      (+)·(+) = +

+C = A·B gives this half; -C = A·B gives the other half.
Half Factorial
Sugar Milk Bean A.B B.C C.A A.B.C
(A) (B) (C)
1 - - + + - - +
2 - + - - - + +
3 + - - - + - +
4 + + + + + + +

A = B.C
B = A.C
C = A.B
Confounding/Aliased
❖ When effects cannot be estimated
separately from each other, they are
confounded (aliased).
A = B.C
B = A.C
C = A.B
❖ In the above, A is the alias of B.C, or B.C is the
alias of A. Their effects are mixed up.
❖ Confounding occurs in fractional
factorial designs. (It is the price we pay for
fewer experiments.)
Resolution
❖ Resolution III
❖ No main effects are aliased with any other
main effect,
❖ but main effects are aliased with 2-factor
interactions.
❖ Resolution IV
❖ No main effects are aliased with any other
main effect or 2-factor interactions,
❖ but some 2-factor interactions are aliased with
other 2-factor interactions and main effects are
aliased with 3-factor interactions.
❖ Resolution V
❖ No main effects or 2-factor interactions are
aliased with any other main effect or 2-factor
interactions,
❖ but 2-factor interactions are aliased with 3-
factor interactions and main effects are aliased
with 4-factor interactions.
Blocking
❖ To avoid the effect of nuisance factors
❖ In terms of agricultural experiments,
blocks are the pieces of land used for the
different treatments.
❖ These are dummy factors
Nuisance Factors
❖ A nuisance factor is a factor that has
some effect on the response, but is of
no interest to the experimenter.
❖ For example:
❖ Operator
❖ Environmental conditions
❖ etc
Nuisance Factors
❖ How to deal with nuisance factors?
❖ Known and Controllable
❖ Blocking (location, shift, land piece)
❖ Unknown and Uncontrollable
❖ Randomization (bias, order of treatment)
❖ Known and Uncontrollable (but
measurable)
❖ Analysis of Covariance ANCOVA
(combination of ANOVA and linear
regression)
❖ E.g. weight of patient, IQ
Balanced Design
Balanced Design           Unbalanced Design
Sugar  Milk  Bean         Sugar  Milk  Bean
-      -     -            -      -     -
-      -     +            -      -     +
-      +     -            -      +     -
-      +     +            -      +     +
+      -     -            +      -     -
+      -     +            +      -     +
+      +     -            +      +     -
+      +     +            (one run missing)
Balanced Design
❖ A balanced design has an equal number of
observations for all possible level
combinations.
❖ Sometimes loss of data leads to an
unbalanced design, e.g.
❖ Delay in getting one result
❖ Loss of one experiment
One Factor Experiments
❖ ANOVA – This was a One Factor 3 Level
Experiment
Machine 1     Machine 2     Machine 3
150           153           156
151           152           154
152           148           155
152           151           156
151           149           157
150           152           155
x̄1 = 151.00   x̄2 = 150.83   x̄3 = 155.50
[Box plots (median, 25th/75th percentiles, mean) for the three machines; Machine 3 sits visibly higher]
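A sketch of the corresponding one-way ANOVA, assuming SciPy is available:

```python
from scipy import stats

# Sketch: one-way ANOVA on the three machines' data.
machine1 = [150, 151, 152, 152, 151, 150]
machine2 = [153, 152, 148, 151, 149, 152]
machine3 = [156, 154, 155, 156, 157, 155]

f_stat, p_value = stats.f_oneway(machine1, machine2, machine3)
print(f_stat, p_value)  # a small p-value suggests the machine means differ
```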
One Factor Experiments
❖ How to deal with nuisance factors?
❖ Known and Controllable
❖ Blocking (shift, operator)
❖ Unknown and Uncontrollable
❖ Randomization (vibration)
❖ Three types of experiments
❖ Completely Randomized
❖ Randomized Block
❖ Latin Square Designs
Completely Randomized
❖ When nuisance factors are Unknown
and Uncontrollable
❖ Example: 30 patients
❖ 15 provided treatment A (Experimental
Group)
❖ 15 provided placebo (Control Group)
❖ Nuisance factors – age, medical history,
etc.
❖ A placebo is a fake medicine, which has no
medical effect but may have a
psychological effect. Patients do not
know which group (Experimental or
Control) they are in.
Completely Randomized
Patients are randomly assigned:
  Experimental Group → Treatment A
  Control Group → Placebo
Then compare the results of the two groups.
Randomized Block Design
❖ How to deal with nuisance factors?
❖ Known and Controllable
❖ Blocking (gender)
❖ Unknown and Uncontrollable
❖ Randomization (age, medical history)
❖ Example: 30 patients (16 male, 14
female)
Randomized Block Design
Patients are first blocked by gender:
  16 Males → Experimental Group (Treatment A) vs
             Control Group (Placebo) → Compare Results
  14 Females → Experimental Group (Treatment A) vs
               Control Group (Placebo) → Compare Results

Here we have one blocking factor (Male/Female).
Latin square designs allow for two blocking factors.
Latin Square Design
❖ Latin square designs allow for two
blocking factors. It is used to
simultaneously eliminate two sources of
nuisance variability.
❖ Let's take the example of a wheat crop
field. We are looking at the effect of 4
fertilizers on the yield.
Latin Square Design
❖ To study the effect of 4 types of
fertilizers, we can divide the field into 4
parts and apply the fertilizers:
  A  B
  C  D
Latin Square Design
❖ Latin square designs allow for two
blocking factors (rows, columns). They are
used to simultaneously eliminate two
sources of nuisance variability.
D A B C
B C D A
A B C D
C D A B
Graeco-Latin Square Designs
❖ Latin square designs allow for two
blocking factors. It is used to
simultaneously eliminate two sources of
nuisance variability.
❖ Graeco-Latin square designs allow for
three blocking factors.
[4×4 Graeco-Latin square: the Latin square above with a second square of Greek letters superimposed, each Latin–Greek pair occurring exactly once]
Two Level Factorial Experiments
❖ A 3 Factor Two Level Full Factorial
requires 8 experiments
Standard Order   Sugar   Milk   Bean   Rating   Run Order
1                -       -      -
2                -       -      +
3                -       +      -
4                -       +      +
5                +       -      -
6                +       -      +
7                +       +      -
8                +       +      +
[Cube plot of the 8 runs]
Two Level Factorial
❖ A 3 Factor Two Level Half Factorial
requires 4 experiments
❖ Fractional factorial experiments have
confounding.
Sugar Milk Bean A.B B.C C.A A.B.C
(A) (B) (C)
1 - - + + - - +
2 - + - - - + +
3 + - - - + - +
4 + + + + + + +
A = B.C
B = A.C
C = A.B
Resolution
❖ The half factorial above (A = B.C, B = A.C,
C = A.B) is Resolution III: main effects are
aliased with 2-factor interactions.
2^(k-p) Designs
❖ k factors at 2 levels require 2^k
experiments (full factorial)
❖ 5 factors at 2 levels require 2^5 = 32
experiments
❖ 10 factors at 2 levels require 2^10 = 1024
experiments
❖ p is the fraction:
❖ p = 1 for half factorial
❖ p = 2 for quarter factorial
❖ 10 factors at 2 levels require 2^(10-2) = 256
experiments (quarter factorial)
Plackett-Burman Designs
❖ 2^(k-p) designs require 4, 8, 16, 32, 64, 128,
… experiments.
❖ Plackett-Burman designs provide run
counts between these values:
❖ 8, 12, 16, 20, 24, 28, 32, 36, 40, ….., 64,
68, 72, ……..128
❖ The number of experiments is a multiple of
4.
Plackett-Burman Designs
❖ Very efficient for screening experiments.
❖ Can be used for up to N-1 factors using
N trials. For example, with 7 factors you
can run the experiment with 8 trials.
❖ A full factorial for 7 factors (2 levels) would
require 2^7 = 128 experiments.
❖ The number of experiments has to be a
multiple of 4.
Plackett-Burman Designs
❖ Disadvantage
❖ The effect of one main factor might depend
upon another factor (or an interaction of two
or more factors).
❖ For this reason, Plackett-Burman designs are
used as a starting point, and further
experiments may be required.