Correlation and Regression Analysis
When to use:
• X and Y are both continuous variables.
• Data on X and Y must be paired (for each value of X,
there must be a corresponding value of Y).
Three steps
• Scatter Plot.
• Correlation Analysis.
• Regression Analysis.
Scatter Plot
Construction of a scatter diagram:
Collect paired samples of data from two variables that you
think might be related and make a dataset.
Place the supposed independent variable, the one that
potentially “affects” a change in the other variable, on the
X axis.
Place the potential response or dependent variable on the
Y axis.
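Plotting software is the usual tool for this step. As a dependency-free sketch (an assumption, not a method taken from these slides), the Python function below first checks that the data are paired, then prints a rough text scatter plot with X (the supposed cause) on the horizontal axis and Y (the response) on the vertical axis. The function name `ascii_scatter` and the temperature/defect data are hypothetical.

```python
def ascii_scatter(x, y, width=40, height=12):
    """Return a text scatter plot: x on the horizontal axis, y on the vertical axis."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired: one y value for every x value")
    if not x:
        raise ValueError("need at least one (x, y) pair")
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    x_span = (x_max - x_min) or 1.0  # avoid division by zero when x is constant
    y_span = (y_max - y_min) or 1.0  # avoid division by zero when y is constant
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / x_span * (width - 1))
        row = round((yi - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is printed at the top, so flip y
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # the X axis
    return "\n".join(lines)


# Hypothetical paired data: oven temperature (X, supposed cause) vs. defect count (Y, effect)
temperature = [150, 155, 160, 165, 170, 175, 180, 185]
defects = [12, 11, 11, 9, 8, 6, 5, 3]
print(ascii_scatter(temperature, defects))
```

A downward-sloping cloud of points in such a plot would suggest a negative correlation, which the next steps then quantify.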
Scatter Plot
[Figure: example scatter plots (n = 30) with x = cause on the horizontal axis and y = effect on the vertical axis. Positive correlation: r = 0.9 and r = 0.6. Negative correlation: r = -0.9 and r = -0.6.]
There are many types of scatter patterns.
Correlation Analysis
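• Scatter diagrams or plots provide a graphical
representation of the relationship; correlation analysis quantifies its
strength and direction with the linear correlation coefficient r: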
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} $$
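A direct translation of the formula into Python computes S_xy, S_xx and S_yy from deviations about the means. As a worked check, for x = (1, 2, 3) and y = (1, 3, 2): x̄ = ȳ = 2, S_xy = (-1)(-1) + (0)(1) + (1)(0) = 1, S_xx = S_yy = 2, so r = 1/√(2·2) = 0.5. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3], [1, 3, 2]))  # → 0.5
```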
r = Linear Correlation Coefficient
Thumb rule:
A correlation exists (is statistically significant) only if p < 0.05, where p is the p-value of the significance test on r.
Once a correlation exists, its strength can be classified by the absolute value of the correlation coefficient, |r|:
• |r| > 0.9: Strong
• |r| between 0.7 and 0.9: Moderate
• |r| < 0.7: Weak
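The thumb rule needs a p-value. Statistical packages normally obtain it from a t-test with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom; the sketch below swaps in a permutation test instead, because it needs only the Python standard library. The helper names, the 999 shuffles, the sample data, and the choice to place the boundary values 0.7 and 0.9 in the "Moderate" class are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided p-value for H0 'no correlation', found by shuffling y relative to x."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= r_obs:
            extreme += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (extreme + 1) / (n_perm + 1)


def strength(r):
    """Classify |r| with the thumb-rule table (boundary values go to 'Moderate')."""
    a = abs(r)
    if a > 0.9:
        return "Strong"
    if a >= 0.7:
        return "Moderate"
    return "Weak"


# Hypothetical paired data
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
r = pearson_r(x, y)
p = permutation_p_value(x, y)
if p < 0.05:
    print(f"r = {r:.3f}, p = {p:.3f}: {strength(r)} correlation")
else:
    print(f"p = {p:.3f}: no significant correlation")
```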
Correlation vs. Causation
• It is important to keep in mind that a strong
mathematical (or graphical) relationship between two
variables does not confirm that one causes the other.
Two variables can be highly related to one another even though
neither causes the other.
• A root cause is validated only when both of the following
requirements are met:
– There is a statistically significant relationship between the
suspected root cause and the effect.
– Knowledge of the process corroborates this causal relationship.
Finding Relationships in Data
• One of the most important aspects of statistical analysis in Six
Sigma is the identification of a mathematical model (equation) that
explains relationships present in a dataset.
• If both the X and Y variables are continuous, the method used is regression,
sometimes also referred to as “curve fitting”.
Regression provides:
• A hypothesis test of whether each input variable (X) is significantly
correlated with the response (Y) under study.
• A quantitative estimate of the relationship of each input variable
with the response.
– A coefficient in a mathematical equation.
• An estimate of how much of the total variation in the response is
explained by each factor.
Regression Analysis
• Two types of linear regression:
– If there is one X and one Y, it is called simple linear
regression.
• The model is $Y = b_0 + b_1 X$,
• where $b_0$ is the Y intercept
• and $b_1$ is the slope, $\Delta y / \Delta x$.
[Figure: a straight line on X and Y axes, marking the intercept $b_0$ and the slope $b_1 = \Delta y / \Delta x$.]
– If there are multiple X's and one Y, it is called
multiple linear regression.
• The model is $Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$
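Both models can be fitted by least squares. The sketch below (hypothetical function name, standard library only) estimates b0, b1, ..., bk by solving the normal equations (AᵀA)b = Aᵀy, where A is the data matrix with a leading column of 1s for the intercept. Statistical software typically uses numerically more stable QR-based solvers, so treat this as an illustration of the idea rather than a production routine.

```python
def fit_linear_regression(rows, y):
    """Least-squares estimates [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk.

    rows: list of observations, each a list of k input values (X1..Xk).
    Solves the normal equations (A^T A) b = A^T y by Gaussian elimination.
    """
    a = [[1.0] + [float(v) for v in row] for row in rows]  # column of 1s gives b0
    p = len(a[0])
    # Normal equations: ata = A^T A (p x p), aty = A^T y (length p)
    ata = [[sum(r[i] * r[j] for r in a) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(a, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [ata | aty]
    m = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear or there are too few observations")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, p):
            factor = m[k][col] / m[col][col]
            for j in range(col, p + 1):
                m[k][j] -= factor * m[col][j]
    # Back substitution from the last coefficient to b0
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))) / m[i][i]
    return b


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print(fit_linear_regression(rows, y))  # approximately [1.0, 2.0, 3.0]
```

With a single input column, the same function fits the simple linear model Y = b0 + b1X.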
Simple Linear Regression
Analysis & Interpretation – Coefficient Estimates
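For simple linear regression, the least-squares coefficient estimates have a closed form that reuses the sums from the correlation formula: b1 = S_xy / S_xx and b0 = ȳ − b1·x̄, so the fitted line always passes through (x̄, ȳ). A minimal Python sketch, with a hypothetical function name and made-up data lying exactly on Y = 1 + 2X (x̄ = 3, ȳ = 7, S_xy = 20, S_xx = 10):

```python
def simple_linear_regression(x, y):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1*X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("slope is undefined when x is constant")
    b1 = s_xy / s_xx           # slope: change in Y per unit change in X
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


print(simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # → (1.0, 2.0)
```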
Analysis & Interpretation – Coefficient of Determination
Try to manually draw the scatter diagram for these situations and discuss.
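The coefficient of determination R² is the fraction of the total variation in Y explained by the fitted model: R² = 1 − SSE/SST, where SST = Σ(yᵢ − ȳ)² and SSE = Σ(yᵢ − ŷᵢ)². A minimal sketch for the simple linear model follows (hypothetical helper name). For simple linear regression with an intercept, R² equals r², so the data x = (1, 2, 3), y = (1, 3, 2), whose correlation coefficient is r = 0.5, give R² = 0.25.

```python
def r_squared(x, y):
    """R^2 of the least-squares line Y = b0 + b1*X fitted to paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                       # total variation in Y
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # variation left unexplained
    return 1.0 - ss_error / ss_total


print(r_squared([1, 2, 3], [1, 3, 2]))  # → 0.25
```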