Q1. Develop an estimated regression equation that relates the risk of a stroke to the person's age,
blood pressure, smoking habit, gender and ethnicity, where: dummy smoke is defined as 1 for
smoker and 0 for non-smoker, dummy gender is defined as 1 for female and 0 for male, and
dummy ethnicity is defined as 1 for white and 0 for non-white.
a. Write the model:
$\text{Risk} = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Blood Pressure} + \beta_3 \text{Smoking} + \beta_4 \text{Gender} + \beta_5 \text{Ethnicity}$
b. Are smoking and gender significant factors in the risk of a stroke? Use alpha = 0.05 and
explain the significance of the Smoking and Gender variables.
Smoking is a significant factor: its p-value (0.0153) is below alpha = 0.05. Gender is not
significant: its p-value (0.8884) is well above 0.05, so we cannot reject the hypothesis that
its coefficient is zero.
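The same conclusion can be checked from the t statistics instead of the p-values. The sketch below is only an illustration: the estimates and standard errors are copied from the regression output in this document, while the function names and the critical value 2.145 (two-tailed Student's t, alpha = 0.05, 14 residual degrees of freedom, taken from a standard t table) are assumptions added here.

```python
# Estimates and standard errors copied from the stroke-risk regression output
# (20 observations, 5 predictors, so 14 residual degrees of freedom).
COEFFICIENTS = {
    "Smoker": (8.87355, 3.21345),
    "Gender": (0.614361, 4.299),
}

# Assumed two-tailed critical value of Student's t for alpha = 0.05, df = 14.
T_CRITICAL = 2.145


def t_statistic(estimate, std_error):
    """Return the t statistic for the null hypothesis that the coefficient is 0."""
    return estimate / std_error


def is_significant(estimate, std_error, t_critical=T_CRITICAL):
    """Reject the null hypothesis when |t| exceeds the critical value."""
    return abs(t_statistic(estimate, std_error)) > t_critical


for name, (estimate, std_error) in COEFFICIENTS.items():
    t = t_statistic(estimate, std_error)
    verdict = "significant" if is_significant(estimate, std_error) else "not significant"
    print(f"{name}: t = {t:.3f} -> {verdict}")
```

Comparing |t| with the critical value is equivalent to comparing the p-value with alpha, so both routes should agree.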
c. What is the probability of a stroke over the next 10 years for Mrs. Smith, a 68-year-old
smoker who has a blood pressure of 175, is female, and is of white ethnicity?
The estimated regression equation is: Risk = -94.0283 +
1.08415*Age + 0.270265*Pressure + 8.87355*Smoker + 0.614361*Gender - 2.8532*Ethnicity
Substituting Age = 68, Pressure = 175, Smoker = 1, Gender = 1 and Ethnicity = 1 gives a
predicted Risk of about 33.62 (in the units of the Risk variable).
The physician should recommend that she quit smoking: smoking is the significant factor she can
change directly, while the rest of her predicted risk comes from her age, gender and blood pressure.
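The substitution into the estimated equation can be verified with a short sketch. The coefficients are copied from the regression output in this document; the function name `predict_risk` and the non-smoker comparison are illustrative additions, not part of the original answer.

```python
# Coefficients copied from the stroke-risk regression output in this document.
COEF = {
    "const": -94.0283,
    "age": 1.08415,
    "pressure": 0.270265,
    "smoker": 8.87355,
    "gender": 0.614361,
    "ethnicity": -2.8532,
}


def predict_risk(age, pressure, smoker, gender, ethnicity):
    """Evaluate the estimated regression equation for one person.

    smoker, gender and ethnicity are 0/1 dummies
    (1 = smoker, 1 = female, 1 = white).
    """
    return (
        COEF["const"]
        + COEF["age"] * age
        + COEF["pressure"] * pressure
        + COEF["smoker"] * smoker
        + COEF["gender"] * gender
        + COEF["ethnicity"] * ethnicity
    )


# Mrs. Smith: 68 years old, blood pressure 175, smoker, female, white.
smith = predict_risk(68, 175, 1, 1, 1)
# Hypothetical comparison: the same person as a non-smoker.
smith_nonsmoker = predict_risk(68, 175, 0, 1, 1)

print(f"Predicted risk (smoker):     {smith:.2f}")
print(f"Predicted risk (non-smoker): {smith_nonsmoker:.2f}")
print(f"Difference:                  {smith - smith_nonsmoker:.2f}")
```

The difference between the two predictions equals the Smoker coefficient, because a dummy variable shifts the prediction by its coefficient when it switches from 0 to 1.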
[5 points]
[Figure: Plot of Risk, observed (vertical axis) vs. predicted (horizontal axis)]
Parameter    Estimate     Standard Error    T Statistic    P-Value
CONSTANT     -94.0283     16.2594           -5.78301       0.0000
Age          1.08415      0.174021          6.23002        0.0000
Pressure     0.270265     0.0530679         5.09283        0.0002
Smoker       8.87355      3.21345           2.76137        0.0153
Gender       0.614361     4.299             0.142908       0.8884
Ethnicity    -2.8532      4.18612           -0.681586      0.5066
Analysis of Variance
Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model            3683.98           5     736.797        20.35      0.0000
Residual         506.967           14    36.2119
Total (Corr.)    4190.95           19
Q2. Run a multiple regression model using the following formulation and conduct a hypothesis
test. The model is $\text{Sales}_t = \beta_0 + \beta_1 \text{Sales}_{t-1} + \beta_2 \text{Advertise}_{t-2} + \varepsilon$
and the data set is PINKHAM. Answer the following questions.
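b. P-values and comparison at the $\alpha = 0.05$ significance level for $\beta_1$, $\beta_2$ and the model
The p-values for $\beta_1$ (0.0000), $\beta_2$ (0.0001) and the overall model (0.0000) are all less than
alpha = 0.05. Lagged sales and lagged advertising are therefore each significantly related to
sales, and the model as a whole is significant at the 5% level.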
The R-squared is about 89.9% (model sum of squares divided by total sum of squares), so roughly
90% of the variability in sales is explained by the model. This is considered a relatively good fit.
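The R-squared and the F-ratio can both be recovered from the Analysis of Variance table for the sales model. The sketch below uses the sums of squares and degrees of freedom printed in that table; the function names are illustrative assumptions.

```python
# Values copied from the Analysis of Variance table for the sales model.
SS_MODEL = 1.76913e7
SS_RESIDUAL = 1.9835e6
SS_TOTAL = 1.96748e7
DF_MODEL = 2
DF_RESIDUAL = 49


def r_squared(ss_model, ss_total):
    """Share of total variability explained by the model."""
    return ss_model / ss_total


def f_ratio(ss_model, df_model, ss_residual, df_residual):
    """Mean square of the model divided by mean square of the residuals."""
    return (ss_model / df_model) / (ss_residual / df_residual)


r2 = r_squared(SS_MODEL, SS_TOTAL)
f = f_ratio(SS_MODEL, DF_MODEL, SS_RESIDUAL, DF_RESIDUAL)

print(f"R-squared: {100 * r2:.1f}%")
print(f"F-ratio:   {f:.2f}")
```

A large F-ratio with a p-value below alpha indicates that the predictors, taken together, explain a significant share of the variation in sales.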
Lagged sales and lagged advertising are both significant in the hypothesis tests for this
model at the 5% significance level.
e. What will you state to your Sales Director after running the hypothesis test?
f.
The hypothesis tests show that the model is useful for predicting sales and for interpreting
the relationship between the variables. The intercept also gives the predicted sales (243.057)
when both lagged variables are equal to zero.
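To show the Sales Director how the fitted model produces a forecast, a minimal sketch follows. The coefficients are copied from the regression output for the sales model; the function names and the input figures are hypothetical and are not taken from the PINKHAM data set.

```python
# Coefficients copied from the multiple regression output for sales.
B0 = 243.057     # CONSTANT
B1 = 1.10492     # lag(sales, 1)
B2 = -0.454662   # lag(advertise, 2)


def forecast_sales(sales_lag1, advertise_lag2):
    """Predicted sales for period t from sales at t-1 and advertising at t-2."""
    return B0 + B1 * sales_lag1 + B2 * advertise_lag2


def one_step_forecast(sales, advertise):
    """Forecast the period after the last observation.

    Both lists are ordered oldest to newest and end at the same period,
    so the lag-1 sales value is sales[-1] and the lag-2 advertising value
    is advertise[-2].
    """
    if len(sales) < 1 or len(advertise) < 2:
        raise ValueError("need at least 1 sales and 2 advertising observations")
    return forecast_sales(sales[-1], advertise[-2])


# Hypothetical history, for illustration only.
sales_history = [1800.0, 1900.0, 2000.0]
advertise_history = [900.0, 1000.0, 950.0]

next_sales = one_step_forecast(sales_history, advertise_history)
print(f"Forecast for next period: {next_sales:.2f}")
print(f"Forecast with both lags at zero: {forecast_sales(0.0, 0.0):.3f}")
```

With both lagged variables set to zero, the forecast reduces to the intercept, which is why the constant is read as the predicted sales at zero rather than as a minimum.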
[Figure: Plot of sales, observed (vertical axis) vs. predicted (horizontal axis)]
Multiple Regression - sales
Dependent variable: sales
Independent variables:
lag (sales,1)
lag (advertise,2)
Number of observations: 52
Parameter            Estimate     Standard Error    T Statistic    P-Value
CONSTANT             243.057      89.4894           2.71604        0.0091
lag (sales,1)        1.10492      0.062965          17.5482        0.0000
lag (advertise,2)    -0.454662    0.106287          -4.27767       0.0001
Analysis of Variance
Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model            1.76913E7         2     8.84567E6      218.52     0.0000
Residual         1.9835E6          49    40479.6
Total (Corr.)    1.96748E7         51
[5 points]
Q3. In real estate, housing prices often depend on the cost of construction, location,
and various other attributes of the house such as number of bedrooms, bathrooms, size of lot, and
proximity to various locations (employment, freeways, and other amenities).
a. Use the housing data to develop a relationship among price, square footage, bedrooms,
bathrooms, three-car garage (dummy variable: 1 for a house with a three-car garage, 0 for a
smaller garage), and pool (dummy variable: 1 for a house with a pool, 0 without).
b. Is a three-car garage a significant factor in the estimation of house price? Explain.
Yes. After running the data in Statgraphics, a three-car garage has a significant positive impact
on price, since its p-value is below alpha = 0.05.
c. What is the expected price of a house if you are considering buying a house in this
neighborhood with the following characteristics: 2,500 square feet, 5 bedrooms, 3 baths,
a three-car garage and a pool?
[5 points]
Q4. Define the terms dummy variable and multicollinearity (MC). How will you use the dummy
variable? Describe an example. When does the problem of MC occur and how will you address
the problem?
Multicollinearity is the condition in which an explanatory variable provides little unique
information because it is highly correlated with other explanatory variables, so the information
is duplicated. The problem occurs when two or more explanatory variables are highly correlated,
and one way to address it is to remove one of the correlated variables from the model.
A dummy variable is used to incorporate a qualitative variable into the analysis by coding it
as 0 or 1. For example, a garage dummy could be defined as x = 1 if the house has a three-car
garage and x = 0 otherwise.
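Both ideas can be illustrated with a small sketch. All data values, labels and function names below are hypothetical. The variance inflation factor shown is the standard two-predictor form, 1 / (1 - r^2), where r is the correlation between the two explanatory variables.

```python
import math


def to_dummy(values, positive_label):
    """Code a qualitative variable as 1 for positive_label and 0 otherwise."""
    return [1 if value == positive_label else 0 for value in values]


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical housing records, for illustration only.
garage = ["three-car", "two-car", "three-car", "two-car", "two-car"]
square_feet = [1500, 1800, 2100, 2500, 3000]
bedrooms = [2, 3, 3, 4, 5]

garage_dummy = to_dummy(garage, "three-car")
r = pearson_r(square_feet, bedrooms)

print("Garage dummy:", garage_dummy)
print(f"Correlation between square feet and bedrooms: {r:.3f}")
print(f"VIF: {vif_two_predictors(r):.1f}")
```

A correlation close to 1 (and a correspondingly large VIF) signals that the two predictors carry nearly the same information, which is the situation in which dropping one of them is a reasonable remedy.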
[5 points]
*****************