
BerkeleyX: Stat 2.3x Introduction to Statistics: Inference

READING AND PRACTICE GUIDE FOR WEEK 1


All the material in Stat 2.3X is covered in the EdX video lecture segments under Courseware.
This guide assumes that you have watched Section 1 (video lecture segments Lec 1.1, Lec 1.2, Lec 1.3, and Lec 1.4). Please
don't attempt the problems without first going through the instruction; if you get stuck on a problem, go back to the lectures in
Courseware for help.
As stated in Expectations and Requirements in the REQUIRED READING section of Courseware, material in the text SticiGui does not
correspond exactly to the material of Stat 2.2X, though there is large overlap. The text is written for a class that has a calculus
prerequisite, so it assumes a fluency with notation and algebra that is not being assumed in Stat 2X.
Lec 1.1 Random samples
Text Chapter 24
http://www.stat.berkeley.edu/~stark/SticiGui/Text/sampling.htm
Skim this, with particular attention to Parameters and Statistics, and Simple Random Samples. There is almost no computation in
this chapter; there are several interesting examples of surveys and biases, and descriptions of a variety of methods of sampling.
Lec 1.2 Estimating population averages and percents
Text Chapter 25
http://www.stat.berkeley.edu/~stark/SticiGui/Text/estimation.htm
Start at Estimating Means and Percentages, and focus on the estimate and its SE. Play with Fig 25-1 and set the sample size to be at
least 100; we're dealing with large samples at the moment. The figure essentially demonstrates the CLT. Note the formula for the SE
of the sample mean: it has the correction factor in it, which I'm allowing you to assume to be close enough to 1 that it doesn't have
to be computed. Read the sections on the conservative and bootstrap estimates of the SE of the sample percent; you can ignore the
correction factor throughout. Important: You do not at this stage need the Sample Standard Deviation and Sample Variance, as
those are essentially the same as our usual SD and variance when samples are large.
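
The SE formulas referred to above can be sketched in Python as follows. This is only an illustration under the large-sample assumptions of this guide; the function names and the example numbers are made up for the sketch and do not come from the text or the lectures.

```python
import math

def se_sample_mean(sd, n):
    """Approximate SE of the sample mean: SD / sqrt(n), correction factor ignored."""
    return sd / math.sqrt(n)

def se_percent_bootstrap(sample_percent, n):
    """Bootstrap estimate: use the sample percent in place of the unknown population percent."""
    p = sample_percent / 100
    return 100 * math.sqrt(p * (1 - p)) / math.sqrt(n)

def se_percent_conservative(n):
    """Conservative estimate: the SD of a 0-1 population is at most 50%."""
    return 50 / math.sqrt(n)

def correction_factor(population_size, n):
    """Finite population correction for sampling without replacement."""
    return math.sqrt((population_size - n) / (population_size - 1))

# Hypothetical numbers, chosen only to show the calls.
print(round(se_sample_mean(10, 100), 2))          # SD 10, sample of 100
print(round(se_percent_bootstrap(20, 400), 2))    # sample percent 20%, sample of 400
print(round(se_percent_conservative(400), 2))     # sample of 400, no sample percent needed
print(round(correction_factor(500_000, 400), 4))  # very close to 1 for a huge population
```

The last line shows why the correction factor can be ignored here: when the population is far larger than the sample, the factor is so close to 1 that it barely changes the SE.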
Lec 1.3 Approximate confidence intervals
Text Chapter 26
http://www.stat.berkeley.edu/~stark/SticiGui/Text/confidenceIntervals.htm
Scroll down till you reach Approximate Confidence Intervals for Percentages, and start there. Play with Fig 26-2 to see how the
confidence level affects the proportion of good intervals; always keep the sample large, as the normal approximation methods
apply only to large samples. Do Exercises 26-2 through 26-5. Read the section Approximate Confidence Intervals for the Population
Mean. Do Exercise 26-6.
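
What Fig 26-2 demonstrates can also be tried in a small simulation. The sketch below is not the applet's code; it assumes a hypothetical population in which 30% of the units have the property of interest, draws with replacement (a reasonable stand-in for a simple random sample when the population is much larger than the sample), and uses the bootstrap SE. All names and numbers are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

def z_for_level(level):
    """z such that the area under the normal curve between -z and z equals `level`."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def approx_ci_percent(sample_percent, n, level):
    """Approximate confidence interval for a population percent (large n)."""
    p = sample_percent / 100
    se = 100 * math.sqrt(p * (1 - p) / n)  # bootstrap SE, in percent
    z = z_for_level(level)
    return sample_percent - z * se, sample_percent + z * se

def coverage(true_percent, n, level, trials=1000, seed=0):
    """Fraction of simulated intervals that contain the true percent."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        # Draws with replacement: an assumption that is harmless for a huge population.
        hits = sum(rng.random() < true_percent / 100 for _ in range(n))
        low, high = approx_ci_percent(100 * hits / n, n, level)
        if low <= true_percent <= high:
            good += 1
    return good / trials

for level in (0.68, 0.95):
    print(level, round(z_for_level(level), 2), coverage(30, 400, level))
```

The proportion of good intervals should come out near the confidence level each time. Note that the answers later in this guide use rounded values of z (for example 2 for 95% and 2.6 for 99%), so hand calculations will differ slightly from the unrounded z computed here.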
Lec 1.4 Interpreting confidence intervals
Try the problems below. Answers are at the bottom of the page.
Link to the normal curve applet: http://www.stat.berkeley.edu/~stark/Java/Html/NormHiLite.htm

ADDITIONAL PRACTICE PROBLEMS FOR EXERCISE SET 1


1. A simple random sample of size 300 is taken from a population of hundreds of thousands of adults. The average weight of the
sampled people is 150 pounds and the SD of their weights is 30 pounds.
a) The average weight of the population is estimated to be _______ pounds; the SE for this estimate is about ______ pounds.
b) An approximate 99%-confidence interval for the average weight in the population goes from ________ pounds to _________
pounds.

2. In a simple random sample of size 400 taken from over 500,000 workers, 21% of the sampled workers are in carpools.
a) In the population, the percent of workers in carpools is estimated to be _______%; the SE for this estimate is about _________%.
b) An approximate 95%-confidence interval for the percent of carpooling workers in the population goes from _______ % to _______
%.
3. A simple random sample of 150 undergraduates is taken at a large university. The average MSAT score of the sampled students is
528 with an SD of 90. Construct an approximate 90%-confidence interval for the average MSAT score of undergraduates at the
university.
4. In a simple random sample of 500 students taken at a large university, 180 have undeclared majors. Construct an approximate
85%-confidence interval for the percent of students at the university who have undeclared majors.
5. A simple random sample of 900 households is taken in a city. The average household size in the sample is 2.2 people, with an SD
of 2 people.
a) Pick one of the two options: The average household size in the sample is
(i) known to be 2.2.
(ii) estimated to be 2.2.
b) Pick one of the two options: The average household size in the city is
(i) known to be 2.2.
(ii) estimated to be 2.2.
c) Pick one of the two options (justify your answer carefully). The distribution of household sizes in the sample
(i) is approximately normal.
(ii) is not normal, not even approximately.
d) Do you think the distribution of household sizes in the city is approximately normal? Explain.
e) Pick one of the two options (justify your answer carefully). The normal curve
(i) can be used
(ii) cannot be used
to construct an approximate 95%-confidence interval for the average household size in the city. If you picked option (i), construct the
interval.
f) True or false (explain): Approximately 95% of the households had sizes in the range 2.07 to 2.33 people.

6. A survey organization took a simple random sample of 275 units out of all the rental units in a city. The average monthly rent of the
sampled units was $920 and the SD was $500. There were 964 people living in the sampled units, and there were 120 children
among these 964 people.
In parts (a)-(c) construct an approximate 68%-confidence interval for the given quantity, if possible. If this is not possible,
explain why not.
a) the average monthly rent of the sampled units
b) the average monthly rent of all the rental units in the city
c) the percent of children among all people living in rental units in the city
d) "About 68% of the sampled units had rents in the range $420 to $1420." Do you agree with the quoted statement? Why or
why not?

ANSWERS
1a) 150; 1.73. b) 145.5, 154.5. [z = 2.6.]
2a) 21; 2.04. b) 16.92, 25.08.
3. 515.87 to 540.13. [SE for average is approximately 7.35, z = 1.65.]
4. 32.88% to 39.12%. [SE for percent is approx. 2.15%, z = 1.45.]
5.
a) (i)
b) (ii)
c) (ii). If you draw the normal curve with average 2.2 and SD 2, then values that are more than 1.1 SDs below the average are
negative. But household size is a positive variable. So the distribution of the household sizes in the sample cannot be normal.
d) No. The distribution of a large simple random sample looks like the distribution of the population; that was the reason for
taking the sample in the first place. The sample is far from normal. So the best guess is that the population is far from normal
too.
e) (i). The confidence interval is constructed using the probability histogram of the sample average, that is, the curve that
shows the chances of all the possible ways the sample average could have turned out. The sample size is large. By the Central
Limit Theorem, this probability histogram will be approximately normal no matter what the shape of the distribution of
household sizes. The interval is 2.07 to 2.33.
f) False. This is total nonsense. No household contains between 2.07 and 2.33 people. A household may contain 2 people, or 3
people, but no values in between! The interval 2.07 to 2.33 only provides an estimate for the average household size.
Averages can be fractional even if the values of the variable are whole numbers.
6.
a) Not possible; in fact it's silly. The average rent of the sampled units is known to be $920. There is nothing to estimate.
b) Possible: $889.85 to $950.15.
c) Not possible. The 964 people are not a simple random sample of people in the city, so our formulas don't apply.
The 964 people form a cluster sample of people in rental units. Why is it a cluster sample? They took a simple random sample of
275 units. Each unit yielded a cluster of people. So they got a cluster sample of people.
Important clue: look at the sample size you are using in your SE calculation. Check that the problem contains an SRS of that size.
If it doesn't, it's likely that you are doing something wrong.
d) Disagree. The range "sample average ± sample SD" is a good start. But the distribution of the rents in the sample cannot be
normal, because 2 SDs below average takes you into negative numbers while of course rents must be positive. So you can't say
that "sample average ± sample SD" contains 68% of the sample.
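
The numerical answers to Problems 1-4, 5(e), and 6(b) can be reproduced with a short Python sketch. It uses the rounded z values quoted in the answers (2.6, 2, 1.65, 1.45, 2, and 1); the function names are made up for the illustration. Because the answers above round the SE before multiplying by z, the results here may differ from them by about 0.01.

```python
import math

def mean_interval(avg, sd, n, z):
    """Approximate confidence interval for a population average."""
    se = sd / math.sqrt(n)
    return avg - z * se, avg + z * se

def percent_interval(sample_percent, n, z):
    """Approximate confidence interval for a population percent (bootstrap SE)."""
    p = sample_percent / 100
    se = 100 * math.sqrt(p * (1 - p) / n)
    return sample_percent - z * se, sample_percent + z * se

def show(label, interval):
    low, high = interval
    print(f"{label}: {low:.2f} to {high:.2f}")

show("1b", mean_interval(150, 30, 300, z=2.6))          # weights, 99%
show("2b", percent_interval(21, 400, z=2))              # carpools, 95%
show("3 ", mean_interval(528, 90, 150, z=1.65))         # MSAT scores, 90%
show("4 ", percent_interval(100 * 180 / 500, 500, z=1.45))  # undeclared majors, 85%
show("5e", mean_interval(2.2, 2, 900, z=2))             # household size, 95%
show("6b", mean_interval(920, 500, 275, z=1))           # monthly rent, 68%
```

No interval is computed for 6(a) or 6(c): as the answers explain, there is nothing to estimate in (a), and the 964 people in (c) are not a simple random sample.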


All Rights Reserved


2015 edX Inc.
EdX, Open edX, and the edX and Open edX logos are registered trademarks or trademarks of edX Inc.
