
GrowingKnowing.com © 2014
Section 11

Correlation and Regression
 Correlation is the relationship between two variables:
 If a salesperson doubles the number of sales calls, will that
affect their sales commissions?
 If a student attends only half the classes in a course,
will that affect their final grade?

 Regression is an equation or a model that:
 Helps predict what will happen,
 As the volume of sales calls increases, how much will
commissions increase?
 As class attendance increases, how much will a student's
grades increase?
Correlation
Relationships can be positive, negative.... or none at all!
 Positive Relationship:
 The more time spent studying, the better your grades !
 Both variables (studying & grades) increase together.
 Negative Relationship:
 The more time spent partying, the lower your grades!
 One variable goes up (partying) while the other goes down (grades).
 No Relationship:
 I call Lady Gaga and leave a message, she does not return my call.
 I keep calling her... 5, 10, then 15 times......and even send flowers!
… yet she still does not return my calls!
 One variable (my calls) is increasing,
 the other variable (her response) does not change!

Correlation and Regression
Simple Regression has two variables:
 Dependent variable: (y)
 Independent variable: (x)

 The Dependent Variable is:
 What you want to predict,
 What you need to study,
 More important.
The Independent Variable:
 Affects or influences the dependent variable.

The Variables:
 To identify the dependent versus independent variable,
 Ask these questions:
 Does variable 1 affect/influence variable 2 ???
 Or.....does variable 2 affect/influence variable 1 ?
Example 1:
Total beer sales at a Raptors game and the attendance at
the ACC for that game,
Which are the dependent and the independent variables?
 Dependent: beer sales
 Independent: attendance
 The level of attendance affects beer sales,
 So beer sales depend on attendance!
 Attendance does not depend on beer sales.

The Variables:
Example 2:
The sales volumes for a particular product and the
amount spent on advertising for that product.
 Dependent variable: sales volume
 Independent variable: advertising budget

HINT: in business,
 the dependent variable is almost always money:
 Sales, expenses, profit, price, etc.
 Business cares more about money than anything or
anyone else!
Coefficient of Correlation: r
 The coefficient of correlation (r) gives you the direction and strength of
the relationship between the variables:
 +1 is a perfectly positive relationship,
 Positive indicates the regression line points upward.
 1 indicates the data values form a perfect line.
 -1 is a perfectly negative relationship,
 Negative: the line points downward.
 -.5 or +.5 is a moderate relationship.
 Data values are a bit scattered but still form a rough band along the line.
 0 (zero) means there is no linear relationship.
 Data is too scattered; there is no pattern, so there is no line.
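
As a rough illustration of where r comes from, the sketch below applies the standard Pearson formula to the effort and grade figures used in the class exercise later in this section. The function name `pearson_r` is made up for this example; the slides themselves rely on Excel rather than on hand calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

effort = [1, 2, 3, 4, 5]       # x: effort level
grades = [9, 11, 15, 14, 20]   # y: grades
r = pearson_r(effort, grades)
print(f"r = {r:.4f}")
```

A value near +1 means the points sit close to an upward-sloping line; reversing the order of one list would flip the sign without changing the size.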

Weak or Strong?
 The more scattered the data, the closer the correlation
coefficient is to zero.
 The closer the correlation value is to zero,
 the weaker the relationship.
 The more concentrated the data around the regression
line (the graph line), the closer the correlation
coefficient is to either +1 or -1.
 The closer the correlation value is to +1 or -1,
 the stronger the relationship.

Strength of the Relationship:
1 (or -1): Perfect
.99 to .80: Strong
.79 to .50: Moderate
.49 to .10: Weak
0: No relationship

Remember: A relationship can have a positive value or a negative value!
The ranges above describe the size of r, ignoring its sign.
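
The cut-offs on this slide can be written as a small lookup. This is only a sketch: the function name `strength` is invented here, and the slide does not say how to label values between 0 and .10, so grouping them with "none" is an assumption.

```python
def strength(r):
    """Label the strength of a correlation using the slide's cut-offs."""
    size = abs(r)  # the sign gives the direction only, not the strength
    if size == 1:
        return "perfect"
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.1:
        return "weak"
    # Assumption: the slide lists only exactly 0 as "no relationship";
    # values under .10 are grouped with it here for convenience.
    return "none"

for value in (0.94, -0.70, 0.30, 0.0):
    print(value, strength(value))
```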
Coefficient of Determination: r²
 Is the coefficient of correlation squared (r², shown as R² in Excel):
Shows how much of the variation in a dependent
variable (y) is explained by an independent variable (x).
 Why did you have good grades? (dependent variable):
Attended each class, studied for each quiz, completed the
homework, had a good teacher (independent variables).
 R² tells you how much of the variation in your grades (y) is
accounted for by an independent variable (x).
 Comparing the R² value for each (x) variable will
tell you which one is most closely tied to grades:
 Attendance, studying, completion of the homework,
a good teacher, or any other independent variable!

Coefficient of Determination: r²
 This value is never negative.
 Can be expressed as a decimal or as a percentage.
 Cannot be greater than:
 1 (as a decimal) or
 100 (as a percentage)
 Coefficient of Determination
= (coefficient of correlation)²

 Coefficient of Correlation (r):
Defines the strength of the relationship between (x) and (y).
 Coefficient of Determination (r²):
Indicates how much of an effect (x) has on (y).
 DO NOT say (x) causes (y).
 More analysis is required to determine cause,
 Correlation and regression indicate whether there is a
relationship (or a connection) between (x) and (y).
 Cause is much more precise.
 Further detailed study is necessary to confirm a cause.

Class Exercise, Part 1:
 What is the relationship between the effort in studying
and course grades?

Effort Level: 1,2,3,4,5


Grades: 9,11,15,14,20

Which is the dependent and the independent variable?


Dependent (y): Grades
Independent (x): Effort Level

Computer Check....
To ensure your computer is set up properly,
 Open a spreadsheet,
 Click on “Data” in the menu,
 Does “Data Analysis” appear on the far right of the
menu bar?

Toolpak update....
 Click on “File” in the top left of the menu bar,
 Click on “Options” at the bottom left,
 Click on “Add-ins”
 Click on “Analysis ToolPak” (not the VBA version),
 Click on “Go”,
 Tick the check box next to
“Analysis ToolPak”,
 Click on “OK”
 Return to Excel menu; click on “Data”;
 Does “Data Analysis” now appear?

Class Exercise, Part 1:
 What is the relationship between the effort in studying
and course grades?
Effort Level: 1,2,3,4,5 (x variable)
Grades: 9,11,15,14,20 (y variable)
 Input the data in two columns,
 Click on “Data” at the top of the screen,
 Click on “Data Analysis” at the top right,
 Scroll down the menu and click on “Regression”, then “OK”,
 Click on the box next to “Input Y Range” and type in the cell range for
the Grades column, including the heading (or select the cells on the
spreadsheet). If the heading is included, also tick “Labels”.
 Repeat in the “Input X Range” box for the Effort Level column,
 Click on “OK”.
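
As a cross-check on the Excel steps above, the same regression can be worked out with ordinary least-squares arithmetic. This is a minimal sketch, not a replacement for the Excel tool; the function name `simple_regression` and the dictionary keys are assumptions made for this example.

```python
from math import sqrt

def simple_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns the main summary values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return {"intercept": intercept, "slope": slope, "r": r, "r_squared": r * r}

effort = [1, 2, 3, 4, 5]       # x variable
grades = [9, 11, 15, 14, 20]   # y variable
fit = simple_regression(effort, grades)
for name, value in fit.items():
    print(f"{name:>10}: {value:.4f}")
```

The intercept, slope, r and r squared printed here should line up with the Intercept, x variable 1, Multiple R and R Square entries in the Excel output.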

Excel Output:
** Note: focus on items highlighted in red
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.939557535
R Square 0.882768362
Adjusted R Square 0.843691149
Standard Error 1.663329993
Observations 5

ANOVA
             df   SS     MS            F             Significance F
Regression   1    62.5   62.5          22.59036145   0.01767543
Residual     3    8.3    2.766666667
Total        4    70.8

               Coefficients   Standard Error   t Stat        P-value
Intercept      6.3            1.744515214      3.611318461   0.036469725
x variable 1   2.5            0.525991128      4.752931879   0.01767543

Excel Output
Multiple R 0.93955754
R Square 0.88276836

               Coefficients   Standard Error   t Stat     P-value
Intercept      6.3            1.744515         3.611318   0.0364697
x variable 1   2.5            0.525991         4.75293    0.01767543

 Multiple R: The Coefficient of Correlation.
 Excel always shows Multiple R as a positive number; take the sign of r from the slope.
 R Square: The Coefficient of Determination.
 The lower case “r” in the text is the same as the upper case “R” in Excel.

Conclusions:
 The Coefficient of Correlation is .9396 or .94,
 Confirms that there is a relationship between study
effort and grades.
 Value is very close to 1 so it is a very strong relationship.
 The Coefficient of Determination is .8828 or 88.3%
 Indicates that study effort accounts for most of the
variation in grades in this sample.
 88.3% of the variation in grades is explained by
differences in study effort.

Class Exercise, Part 2:
The Seneca Student Federation is concerned about the
cost of student textbooks. They believe that there is a
relationship between the number of pages in the text
and the selling price of the book. A sample of 8
textbooks currently on sale at the bookstore was
selected.

 Identify the dependent and the independent variables.


 Dependent(y): Price
 Independent(x): Number of pages

Textbook Sample:
Book                 # pages   $ price
Intro to History     500       84
Basic Algebra        700       75
Intro to Psych       800       99
Intro to Sociology   600       72
Bus. Mgmt.           400       69
Intro to Biology     500       81
Fun with Stats       600       63
Intro to Nursing     800       93
 Prepare an Excel Regression Output.
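
For anyone without Excel to hand, the two headline figures for the textbook sample can be sketched directly from the eight rows above. The variable names are chosen for this example only.

```python
from math import sqrt

pages = [500, 700, 800, 600, 400, 500, 600, 800]   # x: number of pages
price = [84, 75, 99, 72, 69, 81, 63, 93]           # y: price in dollars

n = len(pages)
mean_pages = sum(pages) / n
mean_price = sum(price) / n

# Deviation sums used by the Pearson formula
sxx = sum((x - mean_pages) ** 2 for x in pages)
syy = sum((y - mean_price) ** 2 for y in price)
sxy = sum((x - mean_pages) * (y - mean_price) for x, y in zip(pages, price))

r = sxy / sqrt(sxx * syy)   # coefficient of correlation
r_squared = r ** 2          # coefficient of determination
print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```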
Excel Output:
SUMMARY OUTPUT
Regression Statistics

Multiple R 0.613878889
R Square 0.376847291
Adjusted R Square 0.272988506
Standard Error 10.41290408
Observations 8

Coefficient of Correlation: .6139 or .614
Coefficient of Determination: .3768 or 37.7%

Conclusions:
 At .614, the coefficient of correlation indicates that
there is a moderate relationship between the number
of pages and the price of a textbook.

 At 37.7%, the coefficient of determination indicates
that the number of pages explains only a small share of the
variation in textbook prices.
 37.7% of the variation in price is explained by
differences in the number of pages.

Typical test questions.
 Quiz or exam questions will show information that is
similar to an Excel Regression Output.
 You will then be asked to explain or to draw
conclusions from the given information .

Class Exercise
 Log in to growingknowing,
 Go to Section 11, Correlation and Regression,
 Complete practice questions up to Level 2,

Class Exercise:
The H.R. department of a large corporation wants to
determine if there is a relationship between the annual
bonus employees receive and their years of experience.
Identify the variables:
Experience (yrs), independent:    1  2  3  4  5   6
Bonus ($ thousands), dependent:   6  1  9  5  17  12

Prepare the regression output.
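
The intercept and slope for this exercise can also be sketched without Excel, using the usual least-squares formulas on the six data pairs above. Variable names are illustrative.

```python
experience = [1, 2, 3, 4, 5, 6]   # x: years of experience
bonus = [6, 1, 9, 5, 17, 12]      # y: bonus in $ thousands

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(bonus) / n

# Slope = co-variation of x and y divided by the variation of x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, bonus))
sxx = sum((x - mean_x) ** 2 for x in experience)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"bonus = {intercept:.4f} + {slope:.4f} * experience")
```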

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.700696
R Square 0.490974
Adjusted R Square 0.363718
Standard Error 4.502909
Observations 6

Coefficients
Intercept 0.933333
X Variable 1 2.114286

Analysis:
 Coefficient of Correlation:
 At .70, a moderate relationship exists between the bonus
amount and years of experience.
 Coefficient of Determination:
 At .49 (or 49%), years of experience has a weak effect on
the bonus.
 49% of the variation in bonuses is explained by
years of experience. Other variables
(individual performance, the corporation's financial
status, the state of the economy, etc.) account for the
remaining 51%.
Correlation vs. Regression
Correlation Analysis:
Determines whether or not a relationship exists
between two variables.

Regression Analysis:
Estimates the value of one variable, given the
value of the other.

Regression Analysis:
 An equation is used to calculate what could happen:
ŷ = a + bx

ŷ = the predicted value of the dependent variable for any value of x,
a = the intercept,
b = the slope,
x = the independent variable.
In a quiz or an exam, you will be given three of these
values and then asked to solve for the unknown.
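
The slides take a and b straight from the Excel output. For reference, the standard least-squares formulas behind those two numbers are shown below; the bar notation for the sample means is introduced here and does not appear in the slides.

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

Because a is defined from the means, the fitted line always passes through the point (x̄, ȳ).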

The Equation and the Slope:
If the slope (b) is positive, the Regression Equation is:

ŷ = a + bx

If the slope is negative, the minus sign carries into the equation,
which is then written as:

ŷ = a – bx (with b entered as a positive number)

Intercept:
Where the regression line
intersects the (y) axis.
Slope:
Indicates the angle of the
regression line.

Slope:
- high number = steep slope
- low number = gentle slope
- positive value = line slopes upward,
... a positive relationship.
- negative value = line slopes downward,
... a negative relationship.
These values are found at the bottom left-hand corner of the Excel
regression output.
SUMMARY OUTPUT: Bonus vs. Years of Experience

Regression Statistics
Multiple R 0.700696
R Square 0.490974
Adjusted R Square 0.363718
Standard Error 4.502909
Observations 6

Coefficients
Intercept 0.933333 ← Intercept
X Variable 1 2.114286 ← Slope

The Regression Equation: Analysis
Coefficients
Intercept 0.9333
x Variable 1 2.114
Slope:
Is 2.114, a positive value.
Result: The Coefficient of Correlation is positive.

 How do you complete the Regression Equation?
 ŷ = a + bx
 ŷ = 0.93 + 2.11(x)
 If the value for x (yrs. of experience) is 10, what would ŷ (annual bonus) be?
 ŷ = 0.93 + 2.11(10) = 22.03 or $22,030.00
 If an employee has 3 yrs of experience, what would the projected annual
bonus be?
 ŷ = 0.93 + 2.11(3) = 7.26 or $7,260.00
 For each additional year of experience, the projected annual bonus increases by $2,110.00.
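
The two projections above amount to plugging x into the equation. A minimal sketch, using the rounded coefficients from the slide (the function name `predict_bonus` is invented for this example):

```python
def predict_bonus(years, intercept=0.93, slope=2.11):
    """Projected annual bonus in $ thousands: y-hat = a + b * x."""
    return intercept + slope * years

for years in (3, 10):
    dollars = predict_bonus(years) * 1000
    print(f"{years} yrs of experience -> ${dollars:,.2f}")
```

The sample only covers 1 to 6 years of experience, so the 10-year figure is an extrapolation beyond the data and should be read with more caution than the 3-year figure.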

Class Exercise:
 It is believed that the monthly maintenance costs for a
particular model of automobile are related to its age.
 What are the dependent and independent variables?
Dependent: Maintenance Cost
Independent: Vehicle Age
 Using the data on the next slide, prepare a regression
output sheet.

Age (yrs)   Mthly Cost ($)
2           72
3           99
1           65
7           138
6           170
8           140
4           114
1           83
2           101
5           110
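
A sketch of the same exercise outside Excel: fit the line to the ten rows above, then round the coefficients to two decimals before projecting the cost for an 11-year-old vehicle. Rounding first is an assumption made here so that the arithmetic can be followed by hand.

```python
age = [2, 3, 1, 7, 6, 8, 4, 1, 2, 5]                      # x: vehicle age in years
cost = [72, 99, 65, 138, 170, 140, 114, 83, 101, 110]     # y: monthly cost in $

n = len(age)
mean_age = sum(age) / n
mean_cost = sum(cost) / n

slope = (sum((x - mean_age) * (y - mean_cost) for x, y in zip(age, cost))
         / sum((x - mean_age) ** 2 for x in age))
intercept = mean_cost - slope * mean_age

# Assumption: round to two decimals before predicting, for hand-checkable numbers
a, b = round(intercept, 2), round(slope, 2)
cost_at_11 = a + b * 11
print(f"y-hat = {a} + {b}(x); at 11 yrs: ${cost_at_11:.2f}")
```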

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.867405237
R Square 0.752391845
Standard Error 17.32128175
Observations 10

Coefficients
Intercept 65.0456942
X variable 1 11.32161687

Interpretations:
 What is the regression equation?
ŷ = 65.05 + 11.32(x)
 If a vehicle is 11 yrs old, what would be the anticipated
monthly maintenance cost?
ŷ = 65.05 + 11.32(11) = $189.57
 How would you interpret the slope?
Positive relationship between maintenance costs and the age of a
vehicle. Monthly maintenance costs are expected to increase by
$11.32 with each additional year of vehicle age.
 What does the coefficient of correlation tell you?
At .867 there is a strong correlation between the age of a vehicle and its
maintenance costs.
 What does the coefficient of determination tell you?
At 75.2%, the age of a vehicle has a moderately strong impact on
its maintenance costs. 75.2% of the variation in maintenance costs
is explained by the age of the vehicle.
Class Exercise:
 Return to Section 11 in growingknowing.com,
 Attempt the practice questions for the section!

Class Exercise:
 A college surveyed its graduates regarding the number
of statistics classes they missed and their starting
salaries. What would be the dependent and
independent variables?
 Dependent (y): salaries
 Independent (x): classes missed
 Data:
 Classes missed: 1, 2, 3, 4, 5, 6
 Salary (in thousands): 30, 28, 32, 25, 18, 24
Prepare a regression output in Excel.
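
This exercise has a downward-sloping line, which makes it a useful case for checking signs. The sketch below computes the slope, intercept and r from the six data pairs above; note that r computed this way carries a minus sign, whereas Excel's Multiple R is reported without one. Variable names are illustrative.

```python
from math import sqrt

missed = [1, 2, 3, 4, 5, 6]            # x: statistics classes missed
salary = [30, 28, 32, 25, 18, 24]      # y: starting salary in $ thousands

n = len(missed)
mean_x = sum(missed) / n
mean_y = sum(salary) / n

sxx = sum((x - mean_x) ** 2 for x in missed)
syy = sum((y - mean_y) ** 2 for y in salary)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(missed, salary))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)   # same sign as the slope

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```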

SUMMARY OUTPUT: Classes missed and salary
Multiple R       0.716738113
R Square         0.513713523
Standard Error   3.895663034
Coefficients
Intercept        32.86666667
X variable 1     -1.914285714

 What is the regression equation?
Starting Salary = 32.87 - 1.91*(classes missed)
 If a student missed 7 classes, what could be their starting salary?
Salary = 32.87 - 1.91*(7) = 19.50 or $19,500.00
 Interpret the slope.
The slope is negative, so the relationship is negative. For each class missed,
starting salary is projected to fall by $1,914.00.
 What is the coefficient of correlation? Interpret it.
The coefficient of correlation is -.717 (Excel shows Multiple R without its sign;
the negative slope supplies it), which indicates a moderately strong
relationship between classes missed and a student's starting salary.


 What is the coefficient of determination? Interpret it.
At 51.37%, missing classes has a moderate impact on a student's
starting salary. 51.37% of the variation in starting salaries is explained by
the number of classes missed.

Exercise 4:
Determine the relationship between average mortgage
rates and the number of new homes constructed
(housing starts). What would be the dependent and
independent variables?
 Dependent: Housing Starts
 Independent: Mortgage Rates

Using the data on the next slide, prepare a regression
output!

Housing Data:
Rate (%)   Housing Starts (000)
8.5        115
7.8        111
7.6        185
7.5        201
8          206
8.4        167
8.8        155
8.9        117
8.5        133
8          150
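
The ANOVA part of an Excel regression output can be sketched from the same ten rows. The code below splits the total variation in housing starts into a regression part and a residual part, then forms R Square and the F statistic from them. Variable names are made up for this example, and the formulas are the standard ones for a one-variable regression.

```python
rate = [8.5, 7.8, 7.6, 7.5, 8, 8.4, 8.8, 8.9, 8.5, 8]            # x: mortgage rate (%)
starts = [115, 111, 185, 201, 206, 167, 155, 117, 133, 150]      # y: housing starts (000)

n = len(rate)
mean_x = sum(rate) / n
mean_y = sum(starts) / n

sxx = sum((x - mean_x) ** 2 for x in rate)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rate, starts))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Split the total variation in y into explained and unexplained parts
ss_total = sum((y - mean_y) ** 2 for y in starts)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(rate, starts))
ss_regression = ss_total - ss_residual

r_squared = ss_regression / ss_total
f_stat = ss_regression / (ss_residual / (n - 2))   # 1 and n-2 degrees of freedom

print(f"SS regression = {ss_regression:.1f}, SS residual = {ss_residual:.1f}")
print(f"R Square = {r_squared:.4f}, F = {f_stat:.4f}")
```

Reading R Square as SS regression divided by SS total is what justifies describing it as the share of variation in y that the line accounts for.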
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.542950618
R Square 0.294795374
Adjusted R Square 0.206644795
Standard Error 31.47717427
Observations 10

ANOVA
             df   SS        MS         F          Significance F
Regression   1    3313.5    3313.5     3.344225   0.104842
Residual     8    7926.5    990.8125
Total        9    11240

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      475.1666667    175.9054         2.701263   0.027021   69.5281     880.8052    69.5281       880.8052
X variable 1   -39.16666667   21.4175          -1.82872   0.104842   -88.5555    10.22219    -88.5555      10.22219

Analysis:
 What is the regression equation?
ŷ = 475.17 – 39.17(x)
 If the average mortgage rate was 7.15%, what would be the number of
housing starts?
ŷ = 475.17 – 39.17(7.15) = 195.1 or 195,100
 How would you interpret the slope?
Negative relationship. With every one-percentage-point increase in the rate,
housing starts are projected to decrease by 39,170.
 How do you interpret the coefficient of correlation?
At .543 in size (negative in sign, because the slope is negative), the
relationship between mortgage rates and housing starts is moderate.
 How do you interpret the coefficient of determination?
At 29.5%, mortgage rates have a weak impact on
housing starts. 29.5% of the variation in housing starts
is explained by differences in mortgage rates.

Homework
Chapter 11: Correlation and Regression
 Read the text,
 View the video
 Complete the practice questions (levels 1-4),
 Attempt the additional exercises in the ppt.

Quiz 3
 Will be held on Monday April 10th ,
 Results could be worth 30% of your final grade,
 Primary focus:
 Proportion (Sections 8, 9, 10),
 Correlation & Regression (Section 11)
 Structure and protocol for Quiz 2 will be repeated.

Final Exam:
 Friday, April 21st @ 8.30am (sharp)
 Rm. C3030
 2 hour time limit,
 Worth 30% of your final grade,
 Covers material from the entire course.

