
HATFIELD STAT 211 SAMPLE FINAL - Discussion
Please note that Forms A & B were the same.


1. Based on the summary data below (obtained from StatCrunch summary information on stream concentration data presented earlier this semester), what is $\sum_{i=1}^{n} (X_i - \bar{X})^2$?

Column         n   Mean   Variance   Std. Dev.  Std. Err.  Median  Range  Min   Max   Q1    Q3
concentration  50  58.54  270.84692  16.457428  2.327432   60.8    67.5   27.1  94.6  45.9  69.3

a. 806.41
   This answer is 49 times the standard deviation.
b. 114.04
   Made up.
c. 270.85
   This is the variance stated in the table.
d. 271.59
   Not sure, I think this was a made up number.
e. 13,271.50
   We went over this in class. There is a difference between the variance and the variation of X. StatCrunch provides you with the sample variance, which is the squared variation of X (which is what is being asked for here) divided by n-1. You can calculate the value being requested by substituting the values given and solving through algebra: 49 x 270.84692 = 13,271.50.

2. Which of the following is an example of independent events?
a. None of these answers
b. (c) and (d)
   See discussion below for c and d.
c. Sampling with replacement
   If you are sampling with replacement, the probability of an event occurring does not change from trial to trial, and what happened on prior trials has NO IMPACT on the probability of a specific event.
d. The results of successive outcomes when rolling a single die time after time.
   Much like c above, the probability of any specific number is unaffected by prior rolls.
e. Revealing a card in a 52-card deck, followed by discarding the card and repeating the process.
   For example, the probability of getting the Ace of Spades on the first draw is 1/52. If you don't get it then, discard the card and repeat the process; the subsequent probability is 1/51, then 1/50 and so on. These events are dependent.
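For question 1, the algebra amounts to undoing the division by n-1 in the sample variance. A minimal sketch, using only the n and variance values from the summary table (the variable names are illustrative):

```python
# Recover the sum of squared deviations from the reported sample variance.
n = 50                       # sample size from the StatCrunch summary
sample_variance = 270.84692  # s^2 as reported by StatCrunch

# s^2 = sum((x_i - xbar)^2) / (n - 1), so multiply back by (n - 1).
sum_sq_dev = (n - 1) * sample_variance
print(round(sum_sq_dev, 2))
```

The result matches answer e once it is rounded to two decimals.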


3. Kroger's analysis shows that, during mid-peak times, 6 people arrive for checkout every 10 minutes. What is the probability that, during any given 10-minute interval during a mid-peak time, the number of people arriving for checkout is less than 2?
a. 0.06197
   P(X<=2)
b. 0.01735
   This is a Poisson problem (varying number of occurrences within a fixed interval). What you're trying to identify is P(X<=1), which is P(X=0)+P(X=1). You'd have to calculate the individual probabilities and add them together. Finally, remember that since I'm asking for probabilities for the stated 10-minute period, lambda = 6, not .6.
c. 0.99752
   P(X>=1)
d. 0.01487
   P(X=1)
e. 0.87810
   You assumed that lambda = .6 instead of 6.

4. What is the result of the process of standard normal transformation?
a. The Z distribution
   This process takes any normally distributed variable (with any mean and any standard deviation) and knocks it down through the transformation Z = (x-mu)/sigma into a normal with mean = 0 and std dev = 1.
b. One of a family of Z curves, dependent on sample size.
   You're confusing Z with T. There is only one Z distribution; there is a host of T distributions.
c. The distribution T, with mean = 0 and standard deviation of 1.
   See b above.
d. A continuous distribution that can be used as a reference tool for comparing any two parent distributions.
   It can only be used as a reference tool to compare any two normal distributions.
e. It's an ancient struggle that erupts on Earth between two extraterrestrial clans.
   I believe this is referred to in Alien Nation. In any event, not Statistics.
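For question 3, the two Poisson terms can be added directly. A short sketch, assuming lambda = 6 arrivals per 10-minute interval as stated in the problem; the helper name `poisson_pmf` is illustrative:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 6.0  # expected arrivals per 10-minute interval

# "Less than 2" means X = 0 or X = 1.
p_less_than_2 = poisson_pmf(0, lam) + poisson_pmf(1, lam)
print(round(p_less_than_2, 5))
```

Swapping in `lam = 0.6` reproduces the wrong answer e, which is a quick way to see the effect of using the wrong interval rate.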


5. NBA teams are reviewing the most recent season scoring for Josh Carter. They think those stats are representative of what he'll be able to do in an NBA game. Assuming his scoring is normally distributed, with a mean of 15 points and a standard deviation of 3 points, what's the rough probability that he'll score between 9 and 18 points in any random NBA game?
a. 2.5%
   That's the area to the left of 9 on this curve.
b. 13.5%
   That's the area between 9 and 12 on this curve.
c. 34.0%
   That's the area within 1 std dev right (or 1 std dev left) of the mean on this curve.
d. 81.5%
   This is a problem dealing with the normal distribution. We don't have hard and fast values for probabilities, but we are given the standard deviation, so we can calculate rough values with the empirical rule. 9 is two standard deviations left of the mean and 18 is one standard deviation right of the mean. The area between those points is approximately 13.5% + 34% + 34% = 81.5%.
e. 16.0%
   Made up.
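For question 5, the empirical-rule estimate can be set next to the exact normal area. This is a sketch only; it relies on Python's standard-library `statistics.NormalDist`, and the variable names are illustrative:

```python
from statistics import NormalDist

# Empirical-rule pieces (in percent): 9 to 12, 12 to 15, 15 to 18.
empirical = 13.5 + 34.0 + 34.0

# Exact area under a Normal(15, 3) curve between 9 and 18.
scores = NormalDist(mu=15, sigma=3)
exact = scores.cdf(18) - scores.cdf(9)

print(empirical, round(100 * exact, 1))
```

The exact area is slightly larger than the rule-of-thumb figure, which is why the question asks for a "rough" probability.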

6. Given the following table, what is the procedure for developing E(X+Y)?

X\Y   0      1      2
0     1/2    1/16   1/32
1     1/16   1/32   1/32
2     1/32   1/32   1/32
3     1/32   1/32   1/32
4     1/32   1/32   1/32

a. It's the summation of the marginal values.
   Always 1.
b. It's the expected value of X times the expected value of Y.
   This would be true if it was E(X) PLUS E(Y), but I put "times" in there.
c. It cannot be calculated unless X and Y are independent.
   False.
d. It's the marginals for X times the marginals for Y.
   That's what the individual intersections of X and Y would be IFF they were independent.
e. It's the summation of the specific joint outcomes (there are 15 of them), times their corresponding probabilities of occurrence.
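The procedure in answer e of question 6 can be sketched with exact fractions. The layout below assumes, as the table header suggests, that rows are X = 0..4 and columns are Y = 0..2:

```python
from fractions import Fraction as F

# Joint probabilities P(X = x, Y = y); keys are x, list positions are y.
joint = {
    0: [F(1, 2), F(1, 16), F(1, 32)],
    1: [F(1, 16), F(1, 32), F(1, 32)],
    2: [F(1, 32), F(1, 32), F(1, 32)],
    3: [F(1, 32), F(1, 32), F(1, 32)],
    4: [F(1, 32), F(1, 32), F(1, 32)],
}

# Sanity check: the 15 joint probabilities must sum to 1.
total = sum(p for row in joint.values() for p in row)

# E(X+Y): each joint outcome (x + y) times its probability, summed.
e_sum = sum((x + y) * p for x, row in joint.items() for y, p in enumerate(row))

# Same value from the marginals, E(X) + E(Y); no independence needed.
e_x = sum(x * sum(row) for x, row in joint.items())
e_y = sum(y * p for row in joint.values() for y, p in enumerate(row))

print(total, e_sum, e_x + e_y)
```

Both routes give the same expected value, which also illustrates why answer c is false.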


7. Given the following 95% confidence interval on the acid rain example discussed in class, if someone made the claim that the mean pH level was 4.518, what would your statement be and why?

Variable   n    Sample Mean   Std. Err.   L. Limit    U. Limit
pH_level   90   4.577889      not given   4.5181427   4.6376348

a. Based on the large sample size in this problem, the CLT supports your statement.
   A large sample alone could not support the statement. It's the CI that we're going for here, not the CLT. The CLT only supports the use of Z, regardless of the parent population.
b. Your claim is not supported by my sample information, because it doesn't fall within the limits of my confidence interval.
   Any value that falls within the bounds set up by our CI is deemed reasonable for the true value of the population parameter; 4.518 sits just below the lower limit of 4.5181427.
c. If we follow this procedure many times, we would expect our CI to contain the true population parameter about 95% of the time.
   That's a proper CI interpretation, but not an exact and proper use of the CI for a hypothesis.
d. I cannot support or reject your claim, as I don't have an indication of the shape of the parent population.
   Don't need to know that, because the CLT allows us to use Z.
e. Since the standard error is not given, the CI boundaries listed cannot be correct.
   Not necessarily. If we needed to, we could have backed into the standard error.

8. Based on the 95% confidence interval shown above, what are the upper and lower bounds if we had constructed a 98% confidence interval, using the same sample data?
a. (4.518 ; 4.638)
   95% CI already given.
b. (4.559 ; 4.717)
   4.638 +/- 2.5758(.0305)
c. (4.559 ; 5.717)
   Not centered on the point estimator.
d. (4.507 ; 4.649)
   The upper and lower bounds are (1.96 * std err) to the right and left of the mean. Therefore, you can calculate the std err. All you have to do then is bump it out, using 2.326 instead of 1.96, to get the correct values.
e. (4.757 ; 4.899)
   Correct, but off by 0.25.
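The "back into the standard error, then bump it out" step from question 8 can be sketched as follows. It uses the Z values quoted in the discussion (1.96 and 2.326); the variable names are illustrative:

```python
lower_95, upper_95 = 4.5181427, 4.6376348  # limits from the table
z_95, z_98 = 1.96, 2.326                   # Z values used in the discussion

# The interval is symmetric, so the mean is the midpoint and the
# half-width equals z_95 * std_err.
mean = (lower_95 + upper_95) / 2
std_err = (upper_95 - lower_95) / 2 / z_95

# Rebuild the interval with the 98% multiplier.
lower_98 = mean - z_98 * std_err
upper_98 = mean + z_98 * std_err

print(round(std_err, 4), round(lower_98, 3), round(upper_98, 3))
```

The recovered standard error of roughly .0305 is the same figure that appears in the commentary on answer b.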

9. What is r^2 and what does it mean in a regression equation?
a. It is the correlation coefficient and it tells whether the association is linear or curvilinear.
   r is the correlation coefficient.
b. It is the sum of the squared vertical differences between our estimate of y and the actual y values.
   That's a description of the method of least squares.
c. It is the coefficient of determination and it indicates the percentage of the variation in the dependent variable that is explained by the linear relationship to the independent variable.
d. It is the correlation coefficient and it gives an indication of the central tendency and spread of the data.
   r is the correlation coefficient, and it's not used to measure the central tendency or spread of the distribution.
e. It refers to the distribution of error terms in the regression analysis.
   That would be residual analysis.


10. We manufacture golf balls. Our major competitor has made the claim that, for the average golfer, their ball results in drives of 248 yards. Our Research & Development team has been working on a new ball that they claim will be longer than the competition's for the average golfer. Subsequent trials tested 100 of the new balls and found an average of 255 yards, with a standard deviation of 5 yards. Test the hypothesis at the 5% alpha level. What is your alternative hypothesis, test statistic, p-value and conclusion?
a. Ha: $\mu_{new} = \mu_{competition}$, Z test = 140, p-value ~0.000 and reject the null.
   For starters, that's a statement of the null hypothesis.
b. Ha: $p_{new} > p_{competition}$, T test = -14, p-value 0.010 and reject the null.
   For starters, not a test of proportions.
c. Ha: $\mu_{new} < \mu_{competition}$, Z test = -14, p-value ~1.000 and reject the null.
   Hope you understand that, since we're trying to prove Ha in most cases, it wouldn't make sense to prove that our new product goes SHORTER than the competition. Think of the ad campaign.
d. Ha: $\sigma^2_{new} > \sigma^2_{competition}$, F test = 22.7, p-value ~0.000 and reject the null.
   Nothing stated about the spread of distances for the new ball being wider than the competition. Again, think of the ad campaign: "Our new ball has a wider spread of distances if you hit it on the sweet spot every time. This ball could go ANYWHERE!!"
e. Ha: $\mu_{new} > \mu_{competition}$, Z test = 14, p-value ~0.000 and reject the null.
   The Z-test value is calculated as (255-248)/(5/100^.5) = 14. Obviously, a Z-test value that extreme has a really low p-value (much less than the 5% alpha) and we reject the null hypothesis.
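The test statistic in question 10 is a one-sample Z calculation. A minimal sketch, assuming an upper-tailed test as in answer e and using the standard-library `statistics.NormalDist` for the tail area:

```python
import math
from statistics import NormalDist

mu_0 = 248    # competitor's claimed mean distance (yards)
x_bar = 255   # sample mean for the new ball
s = 5         # sample standard deviation
n = 100       # number of balls tested
alpha = 0.05

# Z = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail p-value, since Ha says the new ball is longer.
p_value = 1 - NormalDist().cdf(z)

print(z, p_value, "reject" if p_value < alpha else "fail to reject")
```

In floating point the tail area beyond Z = 14 is too small to distinguish from zero, which is consistent with the "p-value ~0.000" wording in the answer.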


11. In the class discussion on pooled versus paired tests of the differences in population means, violation of a key assumption makes the paired test favorable compared to the pooled test. What is this key assumption?
a. Normality
   You can and should have normality of the parent distributions. Remember that even if you're running a paired test, the parent distributions HAVE to be normal, or you can't use the T test.
b. Squared terms
   There's no assumption of squared terms.
c. Independence
   The independence assumption allows for a straight-up comparison of one distribution vs. the other (or overlay). Without independence (such as in the car price example), values that should be compared (such as the $$ from Dealership A between the two people) could be lost, with one sitting slightly in the left side of one distribution and the other slightly in the right side of the other distribution. If you overlay the distributions, the underlying differences might not be apparent.
d. Pre/Post sample
   This is not an assumption, but something that differentiates criticals from test stats or alpha from p-values.
e. Having a known parent population
   This is a separate assumption that does not differ between the paired and pooled tests. Rather, it's used in the determination of Z vs. T.

12. What term relates to the inappropriate usage of regression results outside the bounds of the independent variable used to develop the regression?
a. Intermediation
   This is a 3rd party involvement in the resolution of a dispute between parties.
b. Interpolation
   Opposite idea of extrapolation. If you know the values at two extreme points, you can estimate a value between them, either through linear interpolation or some other advanced form of interpolation.
c. Alien Nation
   I believe this was a bogus answer.
d. Extrapolation
   We went over this in depth in class, and it's bad. Remember, you're assuming that the relationship that holds true between Xmin and Xmax in your sample also holds true outside those levels, and it may very well not be true, especially to the degree you've identified.
e. Correlation

13. Given the altered polio data presented below:

Treatment   Total Patients   # with Polio
Vaccine     200,745          44
Placebo     201,229          73

If you're trying to prove that the vaccine is effective (lower incidence of polio with the vaccine), what's your test statistic and your conclusion at the 1% level of significance?
a. T test = -2.364 and fail to reject the null
b. F test = -7.44 and reject the null
c. Z test = -2.364 and reject the null
d. T test = 2.364 and fail to reject the null
e. Z test = -2.67 and reject the null
   If this thing is effective, you're hoping that the percent with the vaccine should be less than with the placebo. Therefore, the test stat should be negative. In this case, the numerator is .000219184 less .000362771 and the denominator is the square root of the quantity .000291064*.999708936*(1/200745+1/201229), where .000291064 is the common p for both groups. The Z critical value if you were doing a 1-tailed test was -2.326; if you did a two-tailed test (inequality), it was -2.5758. In either case, your test stat was more extreme and you would have rejected the null hypothesis.
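The arithmetic described for question 13 is a pooled two-proportion Z test. A sketch under that reading, with the counts taken from the table and illustrative variable names:

```python
import math

x_vaccine, n_vaccine = 44, 200_745
x_placebo, n_placebo = 73, 201_229

p_vaccine = x_vaccine / n_vaccine   # about .000219184
p_placebo = x_placebo / n_placebo   # about .000362771

# Common (pooled) p for both groups under the null hypothesis.
p_pool = (x_vaccine + x_placebo) / (n_vaccine + n_placebo)

# Standard error of the difference using the pooled proportion.
std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_vaccine + 1 / n_placebo))

z = (p_vaccine - p_placebo) / std_err
z_critical = -2.326                 # 1-tailed critical value at alpha = 1%

print(round(z, 2), "reject" if z < z_critical else "fail to reject")
```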


14. Morgan Pools has developed a new electronic stick pool tester that is said to provide similar results when compared to the old strip method, and wants to perform a quick and dirty test to prove that there's no difference between the two for determining pH levels (they assume that if they can state that there's no difference between the two, speed of application will create sales for the new product over the old). Tests were run at 8 separate pools, with the strip tested in the north end and the stick in the south end. The summary results are below;

The questions are: What's the appropriate degrees of freedom, is the test statistic weak or strong, and what's the minimal level of alpha that would have resulted in a rejection of the null hypothesis in this case?
a. Degrees of freedom = 7, the test statistic is weak and the minimal alpha level to reject based on this sample was about 22.76%.
   The DF = n-1. The test statistic is weak here only because of its related p-value, which is poor. And of course, the alpha level that would have needed to be set here (that establishes the goal line, per se) is just a bit larger than the p-value. The value listed here is .0001 larger.
b. Degrees of freedom = 7, the test statistic was strong and the minimal alpha level to reject based on this sample was about 10%.
c. Degrees of freedom = 8, the test statistic was strong and the minimal alpha level to reject based on this sample was about 0.5%.
d. Degrees of freedom = 7, the test statistic was weak and the minimal alpha level to reject based on this sample was about 0.5%.
e. Degrees of freedom = 8, the test statistic was strong and the minimal alpha level to reject based on this sample was about 10%.

15. What term refers to the various groups under study in an ANOVA study?
a. Response variable
   This is the variable under consideration in the study, the observed values.
b. Treatments or factors
c. Observations
   Individual results relating to the response variable.
d. Variables
   There is only one variable.
e. Confounding effects
   Bogus.


16. Assume that you are conducting an experiment where you impose a change on various treatments (hoping to cause a change in the observed responses) and analyzing the results via an ANOVA study. In that case, why is variation between treatment means good and variation within each treatment bad?
a. There is no good or bad variation. Variation is, by its nature, always bad.
   That's not necessarily a true statement from a statistical standpoint.
b. Good variants are helpful, while bad variants are a detriment to society.
   While that may be a true statement, it's more a statement of sociology, not statistics.
c. Variation between treatment means is good, because it is small. Variation within each treatment is bad, because it's big.
   Not necessarily a true statement. It depends on how things play out in your study.
d. Variation between treatment means is due to some change we imposed and that is good. Variation within each treatment is due to pure random effects, which is bad.
e. That statement is backward. Variation between treatment means is bad because all means are supposed to be equal in ANOVA. Variation within each treatment is good, because it provides differences.
   That's just not true.

17. I ran an ANOVA test for fiber content across the 3 shelves from the cereal data discussed earlier this year. The partial output is shown below:

Source       df   SS         MS          P-value
Treatments   2    72.09163
Error        74              4.8597374
Total        76

What is the F-Statistic for this test?
a. F=36.04582
   This is MStrt.
b. F=359.62057
   This is SSerr.
c. F<0.0001
   This is the p-value.
d. F=7.417
   It's the ratio of MStrt to MSerr. You calculate MStrt as SStrt/DFtrt.
e. F=2.0000
   This is just made up.
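The missing cells in the question 17 table can be filled in from the values that are given. A short sketch, assuming (as the answer commentary implies) that 4.8597374 is the error mean square:

```python
ss_trt, df_trt = 72.09163, 2    # treatment sum of squares and df
ms_err, df_err = 4.8597374, 74  # error mean square and df

ms_trt = ss_trt / df_trt        # mean square for treatments (answer a)
ss_err = ms_err * df_err        # error sum of squares (answer b)
f_stat = ms_trt / ms_err        # F = MStrt / MSerr (answer d)

print(ms_trt, round(ss_err, 5), round(f_stat, 3))
```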


18. What are the null and alternate hypotheses for the cereal fiber example shown on the prior page?
a. Ho: The mean cereal fiber content is the same for at least two of the shelves. Ha: The mean cereal fiber content is the same for all of the shelves.
   Backwards.
b. Ho: The shelves are all the same. Ha: At least two of the shelves are different.
   The shelves themselves are not the response variable.
c. Ho: The differences between the cereal fiber contents are the same for all shelves. Ha: Two of the cereal fiber contents are the same.
   We're not running a comparison of difference between means here.
d. Ho: The average cereal fiber content is the same for all three shelves. Ha: At least two of the shelves have different average cereal fiber contents.
e. Ho: At least 3 of the cereals have the same mean fiber content. Ha: At least 2 of the cereals have different mean fiber contents.
   The test was for means by shelf.

19. In the ANOVA table below, what proportion of the total variation in the response variable is due to the various treatment means?

Source       df   SS           MS            F-Stat     P-value
Treatments   2    5.756057     2.8780285     64.97913   <0.0001
Error        6    0.26574945   0.044291575
Total        8    6.0218062

a. 0.9848
   MStrt/MStot
b. 0.9559
   SStrt/SStot
c. 0.0443
   % due to error within treatments
d. 64.979
   The F stat
e. 0.0154
   Made up?


20. One of the first steps in developing a regression model might be to graph the variables in question to get an idea of the strength and possible direction of the relationship. What is the name of this graph?
a. QQ plot
   Gives an indication of normality of a dataset.
b. Boxplot
   Basic graphical comparison of 2 or more subsets.
c. Histogram
   Graphic representation of frequency or relative frequency of occurrences.
d. Scatterplot
e. Relative frequency distribution
   See (c).

21. Given the following regression output from StatCrunch, relating to predicting housing prices based on square footage:

Simple linear regression results:
Dependent Variable: PRICE
Independent Variable: SQFT
PRICE = 47.819305 + 0.61366683 SQFT
Sample size: 117
R (correlation coefficient) = 0.8448
R-sq = 0.7136788
Estimate of error standard deviation: 204.45117

Parameter estimates:
Parameter   Estimate     Std. Err.    DF    T-Stat      P-Value
Intercept   47.819305    62.85482     115   0.7607898   0.4483
Slope       0.61366683   0.03624592   115   16.930645   <0.0001

and the variety of T values listed below:
T.01,115 = 2.35921, T.02,115 = 2.07731, T.01,116 = 2.35892, T.02,116 = 2.07710, T.01,117 = 2.35864

What is the 98% confidence interval on the slope?
a. (-52.649 ; 148.288)
   This is more a CI on the intercept than anything.
b. (0.5282 ; 0.6992)
   It's .6137 +/- 2.35921(.036246).
c. (0.5384 ; 0.6890)
   .6137 +/- .07529 (this uses T.02,115 = 2.07731 rather than T.01,115).
d. (0.5582 ; 0.7292)
   Correct but offset by .03.
e. (0.0000 ; 1.0000)
   Bogus.
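The interval in question 21 is the slope estimate plus or minus the critical T times its standard error. A sketch using the T.01,115 value from the list above (1% in each tail for a 98% interval, 115 degrees of freedom from the output):

```python
slope = 0.61366683      # slope estimate from the StatCrunch output
se_slope = 0.03624592   # standard error of the slope
t_crit = 2.35921        # T.01,115 from the list provided

margin = t_crit * se_slope
ci_98 = (slope - margin, slope + margin)

print(round(ci_98[0], 4), round(ci_98[1], 4))
```

Substituting 2.07731 for `t_crit` reproduces the narrower interval in answer c, which would be a 96% interval instead.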


22. A regression was performed on the StatCrunch dataset "body measurements of sparrows" to determine the relationship between weight of the humerus (independent variable) and the weight of the beak/head (dependent variable). The results are shown below.

Simple linear regression results:
Dependent Variable: beak/head
Independent Variable: humerus
beak/head = 11.665302 + 1.0718235 humerus
Sample size: 49
R (correlation coefficient) = 0.761
R-sq = 0.5791743
Estimate of error standard deviation: 0.5210033

Parameter estimates:
Parameter   Estimate    Std. Err.    DF   T-Stat     P-Value
Intercept   11.665302   2.4624748    47   4.737227   <0.0001
Slope       1.0718235   0.13326645   47   8.04271    <0.0001

Analysis of variance table for regression model:
Source   DF   SS          MS           F-stat     P-value
Model    1    17.558435   17.558435    64.68519   <0.0001
Error    47   12.75789    0.27144447
Total    48   30.316326

Based on the regression results, estimate the beak/head weight if the weight of the humerus is 0.250. Also, is the slope shown for this regression significant at a 0.001 level of significance?
a. Beak/head = 3.988 and yes, because the p-value for the slope is greater than the level of significance.
   This is 1.0718235 + 11.665302(.25); you confused slope and intercept in the calculation.
b. Beak/head = 0.25 and no, because the T-stat is extreme.
   Beak/head = humerus?
c. Beak/head = 8.04271 and yes, because the F-stat is high.
   Made up.
d. Beak/head = 11.665302 + 1.0718235(.25) = 11.9333 and yes, because the p-value for the slope is less than the level of significance.
e. Beak/head = 11.665, but I wouldn't use the model to estimate this value, since the 0.25 is outside the reasonable range for x.
   Good idea, but we don't know what the range of X values used in the regression equation was.

23. What is the definition of confounding effects?
a. When you have multiple factors in a study and you can't tell with reasonable certainty which caused the change in the variable under consideration.
b. When you have results and you can't remember how you got them.
   This would be being confounded by how you got the results.
c. When the results of the hypothesis test are not supported through the development of a confidence interval.
   It is almost always true that both WILL come to the same result.
d. When the results you get are based on univariate data.
   Univariate means that you're dealing with a single variable. It is possible to get solid results studying a single variable.
e. When your p-value and your critical value don't relate to the same item.
   Your p-value and your critical value are not supposed to relate to the same item. However, I could understand that if your one-tail p-value was .025 and your critical was 1.96, you'd have a hard time telling what the results of your study were. Technically, your test stat exactly REACHED the goal line.

24. What is the purpose of combinations in the calculation of a Binomial probability?
a. It provides the probability that a specific order of successes and failures will occur.
   That's p^x times q^(n-x).
b. It allows for the use of the Central Limit Theorem in the calculation of large sample probabilities on proportions.
c. It supports the use of np>5 and nq>5 for the use of Z.
d. It reduces variance and error in the determination of Binomial probabilities.
e. It accounts for the number of ways that a specific number of successes or failures can be obtained from a given number of trials.
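The split described in question 24, between the probability of one specific order and the number of orders, can be sketched as below. The values n = 5, x = 2, and p = 0.5 are hypothetical and chosen only for illustration; they do not come from the exam:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials)."""
    q = 1 - p
    one_order = p**x * q**(n - x)  # probability of ONE specific ordering
    orderings = comb(n, x)         # number of distinct orderings (combinations)
    return orderings * one_order

# Hypothetical example: 2 successes in 5 fair trials.
ways = comb(5, 2)
prob = binomial_pmf(2, 5, 0.5)
print(ways, prob)
```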


25. Suppose the average weight of a balloon passenger is 175 lbs, with a standard deviation of 25 lbs. The balloon has a total lifting capacity of 7,150 lbs (don't worry about the combined weights of the pilot and crew). If we have 40 potential passengers in line for a ride, how many passengers can we allow on the balloon, if we are willing to accept at most a 1% chance that the combined weight will exceed the carrying capacity (the balloon won't rise)?
a. 41
b. 40
c. 38
   This works much like the airplane problem we went through in class. The mean weight of the 40-passenger balloon is 40 x 175, or 7000 lbs. The variance is 40 x 25^2 = 25,000, so the std dev is its square root, or 158.114. Where does the NO LIFT point of 7150 fall on that curve? Its Z value is (7150-7000)/158.114, or about .95, and roughly 17% of the curve falls outside that point, which is unacceptable. You have to drop down to 39 and redo the problem. The first time you reach an acceptable number of passengers is at 38: your Z value is 3.24 and the probability of exceeding it is about .0006.
d. 39
e. 33

26. What does the level of significance define in a test of hypotheses?
a. The test statistic.
   At best, the p-value defines the test statistic.
b. The probability of rejecting the null hypothesis when it is actually true.
c. The area of the curve associated with values that are more extreme than our test statistic.
   P-value.
d. The failure to reject region.
   Opposite of what we have here.
e. The p-value.
   Alpha might have been a right answer.

27. When two population variances are close to equal, it is acceptable to develop a common variance, which is calculated as the weighted average of the variances obtained from the samples. What statistical parameter is defined by this weighted average?
a. E(X)
   Weighted average of a discrete distribution.
b. V
   Degrees of freedom if you have unequal population variances.
c. Common p estimator
   I guess a common p estimator would be if your two populations' proportions were similar, but we've never talked about this.
d. Pooled variance estimator
e. E(X+Y)
   The weighted average of X plus Y.
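The "drop down and redo the problem" search in question 25 can be sketched as a loop. This assumes passenger weights are independent, so the total for n passengers is treated as normal with mean 175n and standard deviation 25*sqrt(n); the function names are illustrative:

```python
import math
from statistics import NormalDist

MEAN_WT, SD_WT = 175, 25   # per-passenger mean and std dev (lbs)
CAPACITY = 7150            # lifting capacity (lbs)
MAX_RISK = 0.01            # at most a 1% chance of exceeding capacity

def overload_probability(n: int) -> float:
    """P(total weight of n passengers exceeds CAPACITY)."""
    total = NormalDist(mu=n * MEAN_WT, sigma=SD_WT * math.sqrt(n))
    return 1 - total.cdf(CAPACITY)

def max_passengers(in_line: int = 40) -> int:
    """Largest n (counting down from in_line) with acceptable risk."""
    for n in range(in_line, 0, -1):
        if overload_probability(n) <= MAX_RISK:
            return n
    return 0

best = max_passengers()
print(best, overload_probability(40), overload_probability(39))
```

Counting down from 40 mirrors the order of the discussion: 40 and 39 both fail the 1% requirement before 38 passes.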


28. At the local Quicki Mart Gas Station, two employees review the breakdown of last month's sales data, as follows:
30% of all sales were made to trucks
70% of all sales were for regular gas
20% of all sales were for mid-grade gas
0% of the sales of premium gas were for trucks. Note that there are only 3 types of gas: regular, mid-grade and premium.
75% of the sales of mid-grade gas were not for trucks
Trucks made up 35.71% of the sales of regular gas
Assuming that this distribution is true for future purchases, if a customer pulls into the station and they are driving a truck, what's the probability that they will buy regular gas?
a. 70%
b. 30%
c. 83.33%
   You needed to set up the tables. Rows were Trucks and All Other, and the columns were Regular, Mid Grade and Premium. Based on the information given in the problem, the marginals for vehicle type were .3 and .7 respectively, and the marginals for gas type were .7, .2 and .1.
d. 25%
e. 100%
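The table setup for question 28 can be sketched by filling in the Trucks row of the joint table and then conditioning on it. The premium marginal of .10 is inferred from the statement that there are only three gas types; the variable names are illustrative:

```python
# Marginals for gas type.
p_regular, p_mid = 0.70, 0.20
p_premium = 1 - p_regular - p_mid        # only three types, so .10

# Joint probabilities for the Trucks row.
truck_and_regular = 0.3571 * p_regular   # trucks were 35.71% of regular sales
truck_and_mid = (1 - 0.75) * p_mid       # 75% of mid-grade was NOT trucks
truck_and_premium = 0.0 * p_premium      # no premium sales went to trucks

# The row total should land on the stated 30% truck marginal.
p_truck = truck_and_regular + truck_and_mid + truck_and_premium

# Conditional probability: P(regular | truck).
p_regular_given_truck = truck_and_regular / p_truck

print(round(p_truck, 2), round(100 * p_regular_given_truck, 2))
```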
