
STAT 4432 - Regression Analysis

Assignment #1

Name:

SCORE: 50 points

Coverage: Warm-up review.


Due date: 06/10/2016 (Sect. 10)
          04/10/2016 (Sect. 20)

General Instructions
Although I do encourage you to work together both in and outside of class,
remember that collaboration on homework problems should be minimal and
everyone should create their own set of solutions.
For all assignments in this class, please remember that neatness matters! Except
for problems that require lots of hand calculations, you should generally make your
answers clear and readable. Please do the problems in order, and provide plots,
graphs, and equations as needed within the solution to each problem (DO NOT
include appendices or spam me with Minitab/R output; if you do either, you will
be given a 0 for that portion). Please use complete sentences and/or show work as
necessary. Answers that are not supported by appropriate reasoning will not be
graded.
Additional Important Notes
(a) Start each question on a separate page.
(b) Use only one side of a page.
(c) This cover sheet needs to be attached to your returned homework.
(d) This homework is one of several homework sets that will be given to you
    during the course and will be graded at the end of the semester.
(e) Two to three problems will be chosen randomly from each set for grading,
    but you still need to attempt all of the questions in each set.
(f) Homework is due during the class period on the due date.
(g) No late homework will be accepted; late homework will receive a grade of zero.
(h) Copy-paste practice is not allowed. Every student is fully responsible for
    his/her answers.
Acknowledgement

I acknowledge that I am the person whose name is listed above, that I did this HW
assignment, and that I
( ) received no additional help from other students.
( ) received additional help from the following students (list them
below):

Part One: Reading and writing questions:


For this part, read Sections 1.1 and 1.3 and highlight the main points that you
understood from the reading. Your write-up should not exceed one and a half pages.

Part Two: General application questions:


Question One: Use the software R to find the following critical values. You need
to know how to find these critical values; they will be useful later in the course.
You need to submit the R code as well.
i) $t_{(34),\,0.90}$
ii) $t_{(8),\,0.026}$
iii) $\chi^2_{(16),\,0.84}$
iv) $F_{(4,17),\,0.975}$
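A minimal R sketch for Question One. It assumes the second number in each subscript is the cumulative (lower-tail) probability P(X <= x); if the course convention is an upper-tail area instead, add lower.tail = FALSE to each call. qt(), qchisq(), and qf() are the standard quantile functions in R's stats package.

```r
# Critical values (quantiles) for Question One.
# Assumption: the second subscript is the lower-tail probability P(X <= x).
# For an upper-tail convention, add lower.tail = FALSE to each call.
t_34   <- qt(0.90, df = 34)              # i)   t(34), 0.90
t_8    <- qt(0.026, df = 8)              # ii)  t(8), 0.026
chi_16 <- qchisq(0.84, df = 16)          # iii) chi-square(16), 0.84
f_4_17 <- qf(0.975, df1 = 4, df2 = 17)   # iv)  F(4, 17), 0.975

round(c(t_34 = t_34, t_8 = t_8, chi_16 = chi_16, f_4_17 = f_4_17), 4)
```

Each quantile can be checked by feeding it back into the matching distribution function (pt(), pchisq(), pf()), which should return the original probability.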


Question Two: Assume that you have two normal populations with identical
variances. You draw independent random samples of size 14 and 16, respectively.
Sample means and sample standard deviations for the two samples are given in the
table below.

Sample   Size   Mean   Standard Deviation
1        14     17.3   9.1
2        16     26.8   10.1

i.   Estimate the difference between the group means.
ii.  Estimate the standard error for the difference you calculated in (i).
iii. Perform a hypothesis test to determine whether the Group 1 mean is less than
     the Group 2 mean. Estimate the p-value for your test.
iv.  What is the name of the statistical method that you used to perform the
     test in part (iii), and why do you think it is the suitable one?
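One way to carry out parts (i) to (iii) from the summary statistics alone is a pooled two-sample t computation, sketched below. It assumes equal population variances (as the question states) and the one-sided alternative mu1 < mu2. t.test() is not used because it needs the raw observations, which are not given.

```r
# Summary statistics from the table in Question Two
n1 <- 14; xbar1 <- 17.3; s1 <- 9.1
n2 <- 16; xbar2 <- 26.8; s2 <- 10.1

# (i) Point estimate of mu1 - mu2
diff_hat <- xbar1 - xbar2

# (ii) Pooled standard error (equal population variances assumed)
dof <- n1 + n2 - 2
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / dof   # pooled variance
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))

# (iii) H0: mu1 = mu2 versus Ha: mu1 < mu2, so the p-value is the lower tail
t_stat <- diff_hat / se_diff
p_value <- pt(t_stat, df = dof)

c(diff = diff_hat, se = se_diff, t = t_stat, df = dof, p = p_value)
```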

Question Three: Suppose a professor gave a 9-point quiz to a small class of five
students. The results of the quiz were 5, 4, 9, 6.5 and 8. For the sake of discussion,
assume that the five students constitute the population.
(a) Find the mean and the standard deviation of the population.
(b) Present the data graphically (use a bar graph with adjacent bars).
(c) Take samples of size 2 repeatedly (with replacement) and find the mean of each
    sample (for example, the samples (5,5), (5,4), ...).
(d) From part (c), how many sample means do you have?
(e) Using your sample means from part (c), find the mean and the standard
    deviation of these sample means.
(f) What theoretical conclusion can you draw from part (e)?
(g) Make a histogram using the sample means from part (c) and compare it with
    the graph from part (b).
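The enumeration in parts (c) to (g) can be sketched in R as follows. This assumes "samples of size 2" means all 25 ordered pairs drawn with replacement, as the example (5,5) suggests. Because the five scores are treated as the whole population, standard deviations here divide by N rather than N - 1, so R's sd() (which divides by N - 1) is not used.

```r
x <- c(5, 4, 9, 6.5, 8)   # quiz scores; treated as the whole population (N = 5)

# (a) Population mean and population SD (divide by N, not N - 1)
mu <- mean(x)
sigma <- sqrt(mean((x - mu)^2))

# (c) All ordered samples of size 2 with replacement: (5,5), (5,4), ...
samples <- expand.grid(first = x, second = x)
xbar <- rowMeans(samples)

# (d) Number of sample means
length(xbar)   # 5 * 5 = 25

# (e) Mean and population-style SD of the 25 sample means
mu_xbar <- mean(xbar)
sigma_xbar <- sqrt(mean((xbar - mu_xbar)^2))

# (f) Compare with theory: E(xbar) = mu and SD(xbar) = sigma / sqrt(n), n = 2
c(mu = mu, mu_xbar = mu_xbar,
  sigma_over_sqrt2 = sigma / sqrt(2), sigma_xbar = sigma_xbar)

# (g) Histogram of the sample means, to compare with the graph from (b)
hist(xbar, main = "Sample means (n = 2)", xlab = "Sample mean")
```

Enumerating every possible sample, rather than drawing a few at random, makes the sampling distribution exact, so the comparison in part (f) involves no simulation noise.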


Question Four: A sales firm receives, on average, 3 calls per hour on its toll-free
number. For any given hour, find the probability that it will receive the following.
a. At most 3 calls
b. At least 3 calls
c. 5 or more calls
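Question Four does not name a distribution. Assuming the number of calls in an hour follows a Poisson distribution with mean 3, the three probabilities can be checked in R with ppois(). Note that the "at least" cases use the upper tail with the boundary shifted down by one, because ppois(k, lambda, lower.tail = FALSE) returns P(X > k).

```r
lambda <- 3   # mean number of calls per hour (Poisson model assumed)

p_at_most_3  <- ppois(3, lambda)                       # a. P(X <= 3)
p_at_least_3 <- ppois(2, lambda, lower.tail = FALSE)   # b. P(X >= 3) = P(X > 2)
p_5_or_more  <- ppois(4, lambda, lower.tail = FALSE)   # c. P(X >= 5) = P(X > 4)

round(c(a = p_at_most_3, b = p_at_least_3, c = p_5_or_more), 4)
```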
Question Five: An officer from the Ministry of Manpower found that in a sample
of 68 retired men, the average number of jobs they had during their lifetimes was
6.4. The population standard deviation is 1.98.
(a) Find the best point estimate of the mean.
(b) Find the 90% confidence interval for the mean number of jobs.
(c) Find the 98% confidence interval for the mean number of jobs.
(d) Which interval is smaller? Explain why.
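Because the population standard deviation is given, a z-based interval xbar ± z * sigma / sqrt(n) applies. The helper z_interval() below is hypothetical, written only for this sketch; it shows that the only thing changing between parts (b) and (c) is the critical value z.

```r
n <- 68; xbar <- 6.4; sigma <- 1.98   # sigma known, so use a z-interval

# Hypothetical helper: two-sided z confidence interval for a mean
z_interval <- function(xbar, sigma, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  margin <- z * sigma / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

ci90 <- z_interval(xbar, sigma, n, 0.90)
ci98 <- z_interval(xbar, sigma, n, 0.98)
rbind(ci90, ci98)
```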

Question Six: A researcher wishes to determine whether there is a relationship
between the gender of an individual and the amount of alcohol consumed. A sample
of 68 people is selected, and the following data are obtained.

Can the researcher conclude that alcohol consumption is related to gender?
Use alpha = 0.05. Show all work.

Question Seven: Let $Y_1$ and $Y_2$ be two random variables with means $\mu_1$ and $\mu_2$ and
variances $\sigma_1^2$ and $\sigma_2^2$. Define $W = 3Y_1 + 2Y_2$ and $V = 3Y_1 - 2Y_2$.
(a) Find $E(W)$ and $E(V)$.
(b) Find $\mathrm{Var}(W)$ and $\mathrm{Var}(V)$.
(c) Show that $\mathrm{cov}(W, V) = 9\sigma_1^2 - 4\sigma_2^2$.
(d) Now define $G = \lambda(W - V)$, where $\lambda = 1/2$ is a constant. Show that the
standard deviation of $G$ is $2\sigma_2$.
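A quick Monte Carlo check can make the algebra in parts (a) to (d) concrete. This is a sketch under stated assumptions, not a substitute for the derivation: it takes W = 3Y1 + 2Y2, V = 3Y1 - 2Y2, and G = (W - V)/2; it draws Y1 and Y2 as independent normals (independence matters for Var(W) and Var(V), not for the covariance identity); and the values mu1 = 10, sigma1 = 3, mu2 = 5, sigma2 = 1 are hypothetical, chosen only for illustration.

```r
set.seed(4432)
n_sim <- 1e6

# Hypothetical parameter values, for illustration only
mu1 <- 10; sigma1 <- 3
mu2 <- 5;  sigma2 <- 1

# Assumption: Y1 and Y2 are independent normal random variables
y1 <- rnorm(n_sim, mean = mu1, sd = sigma1)
y2 <- rnorm(n_sim, mean = mu2, sd = sigma2)

w <- 3 * y1 + 2 * y2
v <- 3 * y1 - 2 * y2
lambda <- 1 / 2
g <- lambda * (w - v)   # equals 2 * y2

c(mean_w = mean(w), theory = 3 * mu1 + 2 * mu2)                    # E(W)
c(mean_v = mean(v), theory = 3 * mu1 - 2 * mu2)                    # E(V)
c(var_w = var(w), theory = 9 * sigma1^2 + 4 * sigma2^2)            # Var(W), independence
c(cov_wv = cov(w, v), theory = 9 * sigma1^2 - 4 * sigma2^2)        # cov(W, V)
c(sd_g = sd(g), theory = 2 * sigma2)                               # SD(G)
```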

Question Eight: In this small exercise, we asked each of ten students from a Stat1001
class to collect a random sample of the times it took students to get to class from
their homes. All the sample sizes were 16; that is, each student collected
16 observations. The data are summarized in the following table.

Student      1    2    3    4    5    6    7    8    9    10
Mean         21   26   23   29   14   24   27   17   24   29
Std. Dev.    2.3  1.8  2.7  2.4  3.1  2.2  1.9  2.8  2.2  2.1

(a) The students noticed that everyone had different answers. If you randomly
    sample over and over from any population, with the same sample size, will
    the results ever be the same? Explain.
(b) The students wondered whose results were right. How can they find out
    what the population mean and standard deviation are?
(c) Input the means into R and check whether the distribution is normal.
    (Draw a histogram.)
(d) Is the distribution of the means a sampling distribution?
(e) Check the sampling error for students 3, 7, and 10.
(f) Compute the standard deviation of the sample of 10 means. Is it equal
    to the standard deviation from student 3 divided by the square root of the
    sample size? How about for student 7, or 10?
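A sketch of parts (c), (e), and (f) in R. Two assumptions are worth flagging: sampling error is measured against the grand mean of the ten sample means, because the true population mean is unknown, and the normality check pairs the histogram with a normal Q-Q plot and shapiro.test(), which has little power with only ten values.

```r
# Sample means and SDs reported by the ten students (n = 16 each)
means <- c(21, 26, 23, 29, 14, 24, 27, 17, 24, 29)
sds   <- c(2.3, 1.8, 2.7, 2.4, 3.1, 2.2, 1.9, 2.8, 2.2, 2.1)
n <- 16

# (c) Histogram of the 10 means, plus a normal Q-Q plot as a second check
hist(means, main = "Ten sample means (n = 16 each)", xlab = "Mean travel time")
qqnorm(means); qqline(means)
shapiro.test(means)   # formal test; low power with only 10 values

# (e) Sampling error = sample mean - population mean. The population mean is
#     unknown, so (assumption) the grand mean of the 10 means stands in for it.
grand_mean <- mean(means)
sampling_error <- means[c(3, 7, 10)] - grand_mean

# (f) SD of the 10 means versus each student's estimated standard error s / sqrt(n)
sd_of_means <- sd(means)
se_hat <- sds[c(3, 7, 10)] / sqrt(n)

sd_of_means
se_hat
```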

Question Nine: Suppose you want to determine whether the brand of laundry
detergent used and the temperature affect the amount of dirt removed from your
laundry. To this end, you buy two different brands of detergent (Super and
Best) and choose three different temperature levels (cold, warm, and hot).
Then you divide your laundry randomly into 6n piles of equal size and assign n
piles to each of the six combinations of brand (Super, Best) and temperature
(cold, warm, hot).
The data are given below.

(a) Use R to draw well-labeled side-by-side box plots and clearly explain what you
see in the graph.
(b) Ignoring the detergent brand variable, use an appropriate statistical method
to test whether the mean amount of dirt removed is the same across the three
temperature levels. Be sure to show all of your work. Use alpha = 0.05.
(c) Now ignoring the temperature variable, use an appropriate statistical method
to test whether the mean amount of dirt removed is the same for the two
detergent brands. Be sure to show all of your work. Use alpha = 0.05.
(d) Explain why you chose the statistical methods used in the analyses in
parts (b) and (c).

Question Ten: Use the following data taken from a sample of 5 people.

minutes exercise/day:  15  80  85  45  50
percent body fat:      14   8  10  22  21

i)   Using the data set, identify the response and explanatory variables.
ii)  Calculate Pearson's correlation by hand and interpret the result. In
     doing so, show your calculations. Use Google to find the formula for the
     Pearson correlation.
iii) Use R to draw a scatterplot of the data (with Y on the vertical axis) AND
     to obtain the Pearson correlation. Turn in the resulting printouts.
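The hand calculation in part (ii) uses r = Sxy / sqrt(Sxx * Syy), where Sxy is the sum of (x - xbar)(y - ybar) and Sxx, Syy are the sums of squared deviations. The sketch below mirrors that formula in R and checks it against cor(). Treating exercise as the explanatory variable and body fat as the response is an assumption made for the illustration.

```r
x <- c(15, 80, 85, 45, 50)   # minutes of exercise per day (assumed explanatory)
y <- c(14, 8, 10, 22, 21)    # percent body fat (assumed response)

# Pearson's r "by hand": r = Sxy / sqrt(Sxx * Syy)
dx <- x - mean(x)
dy <- y - mean(y)
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Scatterplot with the response on the vertical axis, and R's built-in r
plot(x, y, xlab = "Minutes of exercise per day", ylab = "Percent body fat", pch = 19)
r_builtin <- cor(x, y)   # method = "pearson" is the default

c(by_hand = r_by_hand, cor = r_builtin)
```

Printing dx, dy, and dx * dy alongside the result reproduces the table of deviations that a hand calculation would show.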

Question Eleven: A cholesterol level above 200 mg/dl suggests that a person is at
higher risk of heart disease. Suppose that there is a test that measures a person's
cholesterol level and that the uncertainty in the test result is described by a Normal
distribution with mean equal to the subject's true cholesterol level and a
standard deviation of 6.0 mg/dl. What is the probability that
i.   a subject with a true cholesterol level of 188 will be classified as being
     at higher risk for heart disease?
ii.  a subject with a true cholesterol level of 210 will not be classified as
     being at higher risk for heart disease?
iii. a subject with a true cholesterol level of 197, who takes the test three
     times (assume the results are independent), will be classified by all three
     tests as not being at higher risk for heart disease?
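Under the stated model (test result ~ Normal(true level, 6)), each part reduces to a pnorm() call; part iii cubes the single-test probability because the three tests are assumed independent. "Higher risk" is taken to mean a result strictly above 200 mg/dl, and since the distribution is continuous, the boundary value itself has probability zero.

```r
threshold <- 200   # mg/dl; results above this are classified as "higher risk"
sd_test <- 6       # measurement SD in mg/dl

# i. True level 188: P(test result > 200)
p_i <- pnorm(threshold, mean = 188, sd = sd_test, lower.tail = FALSE)

# ii. True level 210: P(test result <= 200), i.e. not classified as higher risk
p_ii <- pnorm(threshold, mean = 210, sd = sd_test)

# iii. True level 197: all three independent results <= 200
p_iii <- pnorm(threshold, mean = 197, sd = sd_test)^3

round(c(i = p_i, ii = p_ii, iii = p_iii), 4)
```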
Question Twelve: Consider the following data sets, which are presented in the form of
matrices.

$$
Y = \begin{bmatrix} 14.9 \\ 15.8 \\ 17 \\ 18.1 \\ 17.1 \\ 19.8 \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & 15.3 \\ 1 & 16.4 \\ 1 & 19.5 \\ 1 & 21.9 \\ 1 & 18.8 \\ 1 & 23.4 \end{bmatrix}
$$

i. What are the dimensions of Y and of X?
ii. Find the following:
(a) $X^T$ (X-transpose)
(b) $X^T X$
(c) $X^T Y$
(d) $(X^T X)^{-1}$ (this is just the inverse of the 2 by 2 matrix found in part
(b))
(e) $(X^T X)^{-1} X^T Y$. Call this matrix the B matrix.
*Hint: Go back to your linear algebra course or use Google.
iii. Now use R to find the equation of the least-squares line (regression
equation). Hint: you need to enter the data in R as follows:
X=c(15.3, 16.4, 19.5, 21.9, 18.8, 23.4)
Y=c(14.9, 15.8, 17, 18.1, 17.1, 19.8)
Compare this result with the one in part ii(e). What do you notice?
Submit both the R code and the output.
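The matrix steps in part ii can be reproduced in R and compared with lm(), as a sketch. Xmat and Ymat are hypothetical names chosen here so that the design matrix does not clash with the predictor vector X from the hint; cbind(1, X) supplies the intercept column of 1s shown in the matrix X above.

```r
Y <- c(14.9, 15.8, 17, 18.1, 17.1, 19.8)
X <- c(15.3, 16.4, 19.5, 21.9, 18.8, 23.4)

# Design matrix: a column of 1s (intercept) next to the predictor
Xmat <- cbind(1, X)
Ymat <- matrix(Y, ncol = 1)

dim(Ymat); dim(Xmat)               # i. 6 x 1 and 6 x 2

Xt     <- t(Xmat)                  # (a) X^T
XtX    <- Xt %*% Xmat              # (b) X^T X, a 2 x 2 matrix
XtY    <- Xt %*% Ymat              # (c) X^T Y, a 2 x 1 matrix
XtXinv <- solve(XtX)               # (d) (X^T X)^{-1}
B      <- XtXinv %*% XtY           # (e) B = (X^T X)^{-1} X^T Y

# iii. Least-squares fit with lm(); its coefficients should match B
fit <- lm(Y ~ X)
cbind(B = as.vector(B), lm = coef(fit))
```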

The End
