
1/1/2007

Correlation
& Regression

Prof. Dhananjay M. Apte

Trainer-Six Sigma, Faculty-Statistics, Q.T. Operation Mgt. dhananjayapte@yahoo.com

Cell- 98231 90939

For private circulation only. All rights reserved Prof.D.M.Apte Monday, January 01, 2007

Example: data presented graphically (called a Scatter Diagram)

Correlation between Age & Growth is more linear, less scattered: good correlation.

Correlation between Age & Growth is less linear, more scattered: not so good correlation.


To quantify the Correlation

It's done by a value ranging from -1 to +1 (0 to 1 in magnitude; the sign gives the direction).

The value is called the Correlation Coefficient, termed r. It is generally determined by Karl Pearson's formula.
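For reference, the standard form of Pearson's formula in terms of the summary sums (n paired values) that a calculator reports, and that the exercises use, is:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$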

(Figure: scatter diagrams for r close to 0, r = 0.94, r = 0.99, and a negative correlation.)

How will the points lie if r = 1?
Example: calculation of a cell phone bill.
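A cell phone bill is a natural r = 1 case: if the bill is a fixed monthly rent plus a constant rate per minute, every (minutes, bill) point lies exactly on one straight line. The sketch below assumes a hypothetical rent of 199 and rate of 0.5 per minute (illustrative numbers, not from the slides) and computes Pearson's r from raw paired data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tariff: fixed rent 199 plus 0.5 per minute of calls.
RENT, RATE = 199.0, 0.5
minutes = [50, 120, 200, 310, 400, 520]
bills = [RENT + RATE * m for m in minutes]

r = pearson_r(minutes, bills)
print(round(r, 6))  # 1.0
```

Any other exactly linear rule (a different rent or rate) would also give r = 1; a negative rate would give r = -1.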


Karl Pearson Method

Example: Let's consider 2 parameters, Weight (x axis) and Blood Pressure (y axis), with some paired values.

Let's determine r.



r = 0.837. The value is positive, so this is a Positive Correlation: the Scatter Diagram will show the points rising from left to right.


Correlation

Correlation is a measure of association/relationship between two numerical variables. Correlation r measures the direction and the strength of the linear association between two numerical paired variables.

r can also be negative. Example: typically, in summer, as the temperature increases, the coldness decreases.

Linear and non-linear correlation: linear, $y = ax + b$; non-linear, e.g. $y = ax^2 + bx + c$.
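Pearson's r measures only the linear part of an association. A quick sketch (data chosen purely for illustration) compares a straight line $y = 2x + 1$ with a perfect quadratic $y = x^2$ over values of x symmetric about zero: the line gives r = 1, while the quadratic gives r = 0 even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y = ax + b with a = 2, b = 1
quadratic = [x ** 2 for x in xs]   # y = ax^2 + bx + c with a = 1, b = c = 0

print(round(pearson_r(xs, linear), 6))     # 1.0
print(round(pearson_r(xs, quadratic), 6))  # 0.0
```

So r close to 0 means "no linear relationship", not necessarily "no relationship"; the scatter diagram should always be checked.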


Coefficient of Determination = $r^2$ (the square of r). It describes the proportion of the variation in y that is explained by the linear relationship with x.
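As a worked figure, using r = 0.837 from the Weight vs Blood Pressure example:

$$r^2 = 0.837^2 = 0.700569 \approx 0.70$$

so roughly 70% of the variation in blood pressure in that sample is explained by its linear relationship with weight; the remaining 30% is left to other factors and scatter.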

Different Scatter plots


EXERCISES
1) Find r for the Tree Problem.

2) Determine r using a scientific calculator for the given data (calculator steps: Mode → REG → LIN; enter each pair as x, y then M+; read the sums with S-SUM).

x: 65 66 67 67 68 69 70 72
y: 67 68 65 68 72 72 69 71

Ans. 2) n = 8, sum x = 544, sum y = 552, sum x² = 37028, sum y² = 38132, sum xy = 37560, r = 0.603

3) Find r for the following data.

x: 100 200 300 400 500 600 700
y: 130 110 100 80 60 50 30

Ans. 3) n = 7, sum x = 2800, sum y = 560, sum x² = 1400000, sum y² = 52400, sum xy = 178000, r = -0.997 (≈ -0.99)
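The answers to exercises 2 and 3 can be checked from the summary sums alone. This sketch applies Pearson's formula in its sum form; the function name `r_from_sums` and its parameter names are illustrative, not from the slides.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from the summary sums a calculator reports (S-SUM)."""
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    s_xx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x
    s_yy = sum_y2 - sum_y ** 2 / n      # corrected sum of squares of y
    return s_xy / math.sqrt(s_xx * s_yy)


r2 = r_from_sums(n=8, sum_x=544, sum_y=552, sum_x2=37028, sum_y2=38132, sum_xy=37560)
r3 = r_from_sums(n=7, sum_x=2800, sum_y=560, sum_x2=1400000, sum_y2=52400, sum_xy=178000)
print(round(r2, 3), round(r3, 3))  # 0.603 -0.997
```

For exercise 2 the corrected sums are S_xy = 24, S_xx = 36 and S_yy = 44, so r = 24 / √1584.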


Regression


Scatter Diagram for a Data Set (|r| < 1)

When |r| < 1 the points do not all lie on one straight line, so we draw (fit) the line that best represents all the data points. Such a line is called the line of best fit, or the Regression line.
Thus regression calls for estimating the best-fitting line through the given data points. It is done using Least Squares Theory; hence the line is also called the least squares line.


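Least Squares Theory

Y is the dependent variable; X is the independent variable. The fitted line is the one that minimises the sum of the squared vertical deviations of the data points from the line.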
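A compact statement of the least squares criterion and its standard solution, in the $b_0$ (intercept), $b_1$ (slope) notation of the regression equation:

$$\min_{b_0,\,b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2$$

Setting the partial derivatives with respect to $b_0$ and $b_1$ to zero gives the normal equations

$$\sum y = n\,b_0 + b_1 \sum x, \qquad \sum xy = b_0 \sum x + b_1 \sum x^2$$

whose solution is

$$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where $S_{xx} = \sum (x - \bar{x})^2$ and $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$. The second result means the fitted line always passes through the point $(\bar{x}, \bar{y})$.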


A new baby is born that had gestated for 30 weeks. What's your best guess at the birth weight?
(Figure: extrapolated regression line of Y = birth weight (g) against X = gestation time (weeks), read off at X = 30 weeks, with 3000 g marked on the Y axis.)

How to fit the line


Slope: $b_1 = S_{xy}/S_{xx}$; intercept: $b_0 = \bar{y} - b_1\bar{x}$.

For the data (N = 10): $b_1 = 2.76$, $b_0 = -65$.

Regression Equation: $\hat{y} = b_0 + b_1 X = -65 + 2.76X$
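The N = 10 data behind $b_1 = 2.76$ is not reproduced in the slides, so this sketch fits the least squares line to the exercise 3 data instead, using slope = S_xy / S_xx and intercept = mean of y minus slope times mean of x. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least squares regression of y on x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope = Sxy / Sxx
    b0 = mean_y - b1 * mean_x    # line passes through (mean_x, mean_y)
    return b0, b1


# Data of exercise 3 (correlation exercises).
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [130, 110, 100, 80, 60, 50, 30]
b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.3f} + ({b1:.5f}) x")  # y_hat = 145.714 + (-0.16429) x
```

The negative slope agrees with the strongly negative r found for the same data.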

What's the Error?

It's the deviation between a given data point and the corresponding point on the Regression line.

$\hat{y}$ (y-cap) are the points on the Regression line; y are the given points in the data set. The difference between these, $y - \hat{y}$, is the error (also called the residual).
(Figure: residual plot.)
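A sketch of the residual calculation, again on the exercise 3 data (the slides' own N = 10 data is not reproduced). Each error is the observed y minus the fitted $\hat{y}$ on the least squares line.

```python
# Data of exercise 3 and its least squares line (slope = Sxy / Sxx).
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [130, 110, 100, 80, 60, 50, 30]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual (error) = observed y minus the point y_hat on the regression line.
y_hat = [b0 + b1 * x for x in xs]
residuals = [y - yh for y, yh in zip(ys, y_hat)]
for x, y, yh, e in zip(xs, ys, y_hat, residuals):
    print(f"x={x}  y={y}  y_hat={yh:.3f}  error={e:+.3f}")
```

For a least squares line with an intercept the residuals always sum to zero (up to rounding), so positive and negative errors balance out.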

Exercise
Find the error for all the y points given below


Exercise

1) From the Regression Equation, find the value of y at x = 75.
2) How do you find the value of x at y = 150?
Note: use the regression of x on y. Replace x by y in the denominator of the slope equation, i.e. use $S_{xy}/S_{yy}$ instead of $S_{xy}/S_{xx}$.

Regression y on x and Regression x on y
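A sketch of both directions. The first part simply evaluates the slides' equation $\hat{y} = -65 + 2.76X$ at x = 75. The data behind the x-on-y question (y = 150) is not in the slides, so the second part uses the exercise 3 data instead and estimates x at y = 100, a value inside that data's range. Function names are illustrative.

```python
import math


def y_on_x(x, b0=-65.0, b1=2.76):
    """Slides' regression equation of y on x."""
    return b0 + b1 * x


print(round(y_on_x(75), 2))  # 142.0

# Exercise 3 data: fit both regression lines.
xs = [100, 200, 300, 400, 500, 600, 700]
ys = [130, 110, 100, 80, 60, 50, 30]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b_yx = sxy / sxx  # slope of y on x
b_xy = sxy / syy  # slope of x on y: y replaces x in the denominator


def x_at(y):
    """Regression of x on y: x_hat = mean_x + b_xy * (y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


r = sxy / math.sqrt(sxx * syy)
print(round(x_at(100), 2))
print(round(b_yx * b_xy, 6), round(r * r, 6))  # the two slopes multiply to r^2
```

The two lines differ unless |r| = 1, which is why solving the y-on-x equation for x is not the same as using the x-on-y line.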


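Prediction quality (Uncertainty) of Regression line

To measure the prediction quality of the fitted line, we measure the standard deviation of the residuals, $S_e$:

$$S_e = \sqrt{\frac{S_{yy} - b_1 S_{xy}}{n - 2}}$$

We can then state the value of the slope ($b$, $b_1$) as an interval:

$$\text{Interval of slope} = b_1 \pm t_{\alpha/2,\,n-2}\,\frac{S_e}{\sqrt{S_{xx}}}$$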
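A sketch of $S_e$ and a 95% interval for the slope on the exercise 3 data. The critical value 2.571 is $t_{0.025,5}$ from a standard t table, an assumption that holds only for 95% confidence with n - 2 = 5 degrees of freedom; other data needs its own table value.

```python
import math

xs = [100, 200, 300, 400, 500, 600, 700]
ys = [130, 110, 100, 80, 60, 50, 30]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b1 = sxy / sxx

# Syy - b1*Sxy is the residual sum of squares; it has n - 2 degrees of freedom.
se = math.sqrt((syy - b1 * sxy) / (n - 2))

T_CRIT = 2.571  # t(0.025, 5) from a t table: 95% level, n - 2 = 5 (assumed)
half_width = T_CRIT * se / math.sqrt(sxx)
low, high = b1 - half_width, b1 + half_width
print(f"Se = {se:.4f}")
print(f"95% interval of slope: {low:.4f} to {high:.4f}")
```

A smaller $S_e$ (points close to the line) or a larger spread of x values ($S_{xx}$) both narrow the interval.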

The Multiple Regression Equation

When y depends on several factors, as the price of a home depends on its square footage and other features, each of these factors has a separate relationship with y. The equation that describes a multiple regression relationship is:

$$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_nx_n + e$$

where e is the error (residual) term. This equation separates each independent variable from the rest, allowing each to have its own coefficient describing its relationship to the dependent variable. If square footage is one of the independent variables and it has a coefficient of $50, then every additional square foot of space adds $50, on average, to the price of the home, holding the other variables constant.
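A minimal sketch of fitting the multiple regression equation by least squares: the normal equations $X^\top X\,\beta = X^\top y$ are solved by Gaussian elimination. The two predictors and the noise-free data are hypothetical, built from y = 10 + 2x1 + 3x2 so the recovered coefficients can be checked by eye; real data would also carry the error term e.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_multiple(columns, y):
    """Return [a, b1, ..., bn] minimising sum of (y - a - b1*x1 - ... - bn*xn)^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 10 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [10 + 2 * u + 3 * v for u, v in zip(x1, x2)]
coef = fit_multiple([x1, x2], y)
print([round(c, 6) for c in coef])  # [10.0, 2.0, 3.0]
```

With a single predictor, the same routine reduces to the simple regression line $b_0 + b_1 X$.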


4) Obtain the Regression Equations (both y on x and x on y) for the data in problems 1 and 3 above, and also find the Mean Error.