SIMPLE LINEAR REGRESSION

A college bookstore must order books two months before each semester starts. They
believe that the number of books that will ultimately be sold for any particular course is
related to the number of students registered for the course when the books are ordered.
They would like to develop a linear regression equation to help plan how many books to
order. From past records, the bookstore obtains the number of students registered, X,
and the number of books actually sold for a course, Y, for 12 different semesters. These
data are below.

A. Obtain a scatter plot of the number of books sold versus the number of registered
students.

B. At a .01 level of significance is there sufficient evidence to conclude that the number
of books sold is related to the number of registered students in a straight-line manner?

C. Carefully explain what the p-value found in part B means.

D. Fully interpret the strength of the straight-line relationship.

E. Give the regression equation, and interpret the coefficients in terms of this problem.

F. If appropriate, predict the number of books that would be sold in a semester when 30
students have registered. Use 95% confidence.

G. If appropriate, estimate the average number of books that would be sold in a semester
for all courses with 30 students registered. Use 95% confidence.

H. If appropriate, predict the number of books that would be sold in a semester when 5
students have registered. Use 95% confidence.

SOLUTION

A. The following scatterplot with the fitted line was obtained using StatCrunch.

As the number of students registered for the course increases, the number of books sold
by the bookstore appears to increase in a straight-line manner.

B. H0: The number of students registered and the number of books sold are not
correlated

Ha: The number of students registered and the number of books sold are
correlated

Decision Rule: Accept Ha if the calculated p-value < .01.

Test Statistic: r = the Pearson coefficient of correlation

Calculations from StatCrunch: r = 0.8997, p-value < 0.0001 < .01. Accept Ha.

Interpretation: At the .01 level of significance I conclude that as the number of students
registered increases, the number of books sold increases in a straight-line manner.
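
As a rough check on this conclusion, the test statistic behind the correlation test can be
recomputed from r and the sample size alone. The sketch below assumes the standard t-test
for a Pearson correlation with n - 2 degrees of freedom. The critical value 3.169 is a
two-tailed t-table value for a .01 level of significance and 10 degrees of freedom; it is
an assumption brought in for illustration, not part of the StatCrunch output, and the
variable names are illustrative only.

```python
import math

r_squared = 0.80946046   # R-sq from the regression output (r = 0.8997)
n = 12                   # sample size: 12 semesters
df = n - 2               # degrees of freedom for the correlation test

# t = r * sqrt(n - 2) / sqrt(1 - r^2), written here in terms of r^2
# (r is positive, so taking the positive square root is safe)
t_stat = math.sqrt(r_squared * df / (1.0 - r_squared))

# Assumed two-tailed critical value from a t table (alpha = .01, df = 10)
t_critical = 3.169

reject_h0 = t_stat > t_critical
print(f"t = {t_stat:.4f}, critical value = {t_critical}, reject H0: {reject_h0}")
```

The computed t should agree with the T-Stat reported for the slope (about 6.52), which is
well beyond the critical value.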

C. Since the p-value is less than 0.0001, this indicates that if the number of students
registered and the number of books sold are not correlated (if the null hypothesis is true),
then there is less than a 1 in 10,000 chance that sample points would exhibit a
straight-line pattern at least as strong as the one observed in the scatterplot.

D. r² = .809 (80.9%). 80.9% of the variability in the number of books sold is
explained by the straight-line relationship with the number of registered students. 19.1%
of this variability is unexplained, and due to error. This relationship is quite strong.
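
The split between explained and unexplained variability can be reproduced from the sums
of squares in the analysis of variance table. This is a minimal sketch using only the
values reported there; the variable names are illustrative.

```python
ss_model = 99.56364    # Model (regression) sum of squares
ss_error = 23.436363   # Error sum of squares
ss_total = 123.0       # Total sum of squares

r_squared = ss_model / ss_total    # share of variability explained by the line
unexplained = ss_error / ss_total  # share of variability left to error
r = r_squared ** 0.5               # positive root, because the slope is positive

print(f"explained: {r_squared:.1%}, unexplained: {unexplained:.1%}, r = {r:.4f}")
```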

E. Books = 9.30 + 0.673 (Students)

When no students have registered for a course, the predicted number of books sold is 9.30
(or about 9). This is the starting point of the straight line when x = 0. It is not
particularly meaningful in this problem since all the classes sampled had more than 25
students registered. For each additional student registered for a course, the number of
books sold increases by an average of 0.673.
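
To make the roles of the two coefficients concrete, the fitted equation can be written as
a small function. This is only a sketch: the function name `predicted_books` is
hypothetical, and the coefficients are the unrounded estimates from the regression output.

```python
INTERCEPT = 9.3      # predicted books when x = 0 (not meaningful on its own here)
SLOPE = 0.6727273    # average change in books sold per additional registered student

def predicted_books(students: float) -> float:
    """Point prediction from the fitted line; only sensible inside the sampled range."""
    return INTERCEPT + SLOPE * students

at_30 = predicted_books(30)
per_student = predicted_books(31) - predicted_books(30)  # equals the slope

print(f"predicted books at 30 students: {at_30:.2f}")
print(f"change per additional student: {per_student:.3f}")
```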

F. Since 30 students is within the range of the sampled number of students, it is
appropriate to make this prediction. From StatCrunch the calculated prediction interval is
(25.865078, 33.09856). I am 95% confident that for a course that has 30 students
registered the bookstore will sell between 25.9 and 33.1 books.
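
The prediction interval can be reconstructed from the summary values in the regression
output. The sketch below assumes the usual formula for a single new observation, which
combines the standard error of the fitted value with the error standard deviation. The
multiplier 2.228139 is a t-table value (0.975 quantile, 10 degrees of freedom) and is an
assumption supplied here, not part of the output.

```python
import math

y_hat = 29.481817    # Pred. Y at 30 students
se_fit = 0.5396101   # s.e.(Pred. y)
s = 1.5308939        # estimate of error standard deviation
t_crit = 2.228139    # assumed t-table value for 95% confidence, df = 10

# A single new course varies around the line, so its variance adds to that of the fit
se_pred = math.sqrt(s**2 + se_fit**2)
margin = t_crit * se_pred

pi_low, pi_high = y_hat - margin, y_hat + margin
print(f"95% P.I.: ({pi_low:.4f}, {pi_high:.4f})")
```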

G. Since 30 students is within the range of the sampled number of students, it is
appropriate to make this estimation. From StatCrunch the calculated confidence interval is
(28.279491, 30.684145). I am 95% confident that for all courses that have 30 students
registered the bookstore will sell an average of between 28.3 and 30.7 books per
semester.
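
The confidence interval for the mean is narrower than a prediction interval because it
involves only the uncertainty in the fitted line. A minimal sketch, assuming the usual
formula (fitted value plus or minus t times its standard error) and a t-table multiplier
of 2.228139 for 95% confidence with 10 degrees of freedom, which is an assumption supplied
here rather than a value from the output:

```python
y_hat = 29.481817    # Pred. Y at 30 students
se_fit = 0.5396101   # s.e.(Pred. y)
t_crit = 2.228139    # assumed t-table value for 95% confidence, df = 10

# Interval for the average number of books over all courses with 30 students
margin = t_crit * se_fit
ci_low, ci_high = y_hat - margin, y_hat + margin

print(f"95% C.I.: ({ci_low:.4f}, {ci_high:.4f})")
```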

H. Since 5 students is not within the range of the sampled number of students, it is not
appropriate to use the regression equation to make this prediction. We do not know if
the straight-line model would fit data at this point, and we should not extrapolate.

COMMENTS ABOUT THE SOLUTION

• This example contains the typical parts for a complete regression problem. All
such problems should be solved in a similar manner.

• It is valid to do a regression analysis when the error components have a normal
distribution with a mean of zero and a constant variance. All errors must be
independent of each other.

• The complete StatCrunch analyses are below.

Simple linear regression results:

Dependent Variable: Books
Independent Variable: Students
Books = 9.3 + 0.6727273 Students
Sample size: 12
R (correlation coefficient) = 0.8997
R-sq = 0.80946046
Estimate of error standard deviation: 1.5308939

Parameter estimates:

Parameter   Estimate    Std. Err.    DF   T-Stat      P-Value
Intercept   9.3         3.4345746    10   2.707759    0.022
Slope       0.6727273   0.10321285   10   6.5178633   <0.0001

Analysis of variance table for regression model:

Source   DF   SS          MS          F-stat      P-value
Model    1    99.56364    99.56364    42.482544   <0.0001
Error    10   23.436363   2.3436363
Total    11   123

Predicted values:

X value   Pred. Y     s.e.(Pred. y)   95% C.I.                 95% P.I.
30        29.481817   0.5396101       (28.279491, 30.684145)   (25.865078, 33.09856)
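
Several entries in this output are tied together by simple arithmetic, which gives a
quick way to check a printout for transcription errors. The sketch below uses only the
reported values; the variable names are illustrative, and the relationships shown (mean
square = SS / DF, F = MS Model / MS Error, t = estimate / standard error, and F = t² for
the slope in simple linear regression) are the standard ones.

```python
import math

ms_model = 99.56364 / 1     # Model SS / Model DF
ms_error = 23.436363 / 10   # Error SS / Error DF

f_stat = ms_model / ms_error        # F-stat in the ANOVA table
s = math.sqrt(ms_error)             # estimate of error standard deviation

t_slope = 0.6727273 / 0.10321285    # slope estimate / its standard error
t_intercept = 9.3 / 3.4345746       # intercept estimate / its standard error

print(f"F = {f_stat:.3f}, s = {s:.4f}")
print(f"t(slope) = {t_slope:.4f}, t(intercept) = {t_intercept:.4f}")
print(f"t(slope)^2 = {t_slope**2:.3f}")  # should match F up to rounding
```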
