
Brynn Lawrence

Math 1040- Fall 2016

Skittles Term Project


Candy Color: Qualitative, because it describes a characteristic
Individual: each candy

Level of Measurement: Nominal

# Of Candies per Bag: Quantitative, because it is a numerical measure

Individual: each bag

Level of Measurement: Ratio

Class Skittles (sample size: 3551 skittles)


Color     Count   Proportion
Red       716     .202
Orange    698     .197
Yellow    726     .204
Green     710     .200
Purple    701     .197
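The proportions can be re-derived from the counts. The short Python sketch below is only an illustration (the variable names are my own, not part of the project); it divides each color count by the class total and rounds to three decimal places.

```python
# Color counts from the class data set.
counts = {"Red": 716, "Orange": 698, "Yellow": 726, "Green": 710, "Purple": 701}

total = sum(counts.values())  # total number of candies in the class data

# Proportion of each color, rounded to three decimal places.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:<7} {counts[color]:>4}  {p:.3f}")
print("Total  ", total)
```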

Summary statistics:
Column         n    Mean        Std. dev.   Median  Range  Min  Max  Q1  Q3    Mode  IQR
# of Candies   60   59.183333   3.1108976   59      14     50   64   58  61.5  59    3.5

Outliers:
Lower Fence: 58-1.5(3.5) = 52.75
Upper Fence: 61.5+1.5(3.5) = 66.75
Class Data Set Outliers: 50 & 52
Is the bag you purchased an outlier? No
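
The fence calculation can be checked with a few lines of Python. This is a minimal sketch using the Q1 and Q3 values reported in the summary statistics; the helper name is_outlier is my own.

```python
# Quartiles reported in the summary statistics.
q1, q3 = 58, 61.5

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are outliers
upper_fence = q3 + 1.5 * iqr   # values above this are outliers

def is_outlier(x):
    """Return True if x falls outside the 1.5 * IQR fences."""
    return x < lower_fence or x > upper_fence

print(lower_fence, upper_fence)
print(is_outlier(50), is_outlier(52), is_outlier(59))
```

A bag with 59 candies sits inside both fences, which agrees with the answer that the purchased bag is not an outlier.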

Distribution:
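Is it appropriate to discuss the shape of the distribution for each variable? No. Shape is
meaningful only for a quantitative variable; candy color is nominal, so the order of its
categories is arbitrary.
Candy Color: The graph looks pulled out barely to the right, but because the colors have no
natural order this is not a true skew.
# Of Candies per Bag: Skewed Left, because it is pulled out to the left, towards the smaller
numbers.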

Can height be used to predict the number of candies that will be in a bag of Skittles you
purchase?
What do you think your results will be? I believe that there will be no linear relationship
between height and the number of skittles per bag, because height will not determine how many
skittles are in each bag.
Which is the explanatory variable? Height
Which is the response variable? Number of skittles per bag

Simple linear regression results:


Dependent Variable: Total per Bag
Independent Variable: Height (inches)
Total per Bag = 50.713668 + 0.1287705 Height (inches)
Sample size: 60
R (correlation coefficient) = 0.17042887
R-sq = 0.029046
Estimate of error standard deviation: 3.0916979

Parameter estimates:

Parameter   Estimate    Std. Err.     Alternative  DF  T-Stat     P-value
Intercept   50.713668   6.442338      ≠ 0          58  7.8719354  <0.0001
Slope       0.1287705   0.097759403   ≠ 0          58  1.3172185  0.1929

Analysis of variance table for regression model:

Source  DF  SS         MS         F-stat     P-value
Model   1   16.584782  16.584782  1.7350647  0.1929
Error   58  554.39855  9.5585957
Total   59  570.98333

Is there a significant relationship between the two variables? No, because the absolute value of
the correlation coefficient is less than the critical value.
Correlation Coefficient: r=0.17042887

Critical Value= 0.361
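
As a cross-check on the regression output, the t statistic for the slope can be recovered from r and n alone, using t = r·√(n−2)/√(1−r²). The sketch below is only an illustration with my own variable names; it also squares t, which for simple linear regression should reproduce the F statistic in the ANOVA table.

```python
import math

r = 0.17042887   # correlation coefficient from the regression output
n = 60           # number of bags

df = n - 2       # degrees of freedom for the slope test

# t statistic for testing whether the slope (equivalently r) differs from 0.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# In simple linear regression the F statistic equals t squared.
f_stat = t_stat ** 2

print(round(t_stat, 4), round(f_stat, 4))
```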

Is this what you expected when you thought about what the results would be before analyzing the
data? Explain. Yes, because it does not make sense that someone's height would determine how
many candies per bag they would receive.
How many candies would be expected to be in a bag purchased by someone who is 63.5 inches
tall? y = 50.714 + .1288(63.5), so the predicted total # of candies = 58.89
Was it appropriate to use the regression equation to make this prediction? Why or why not? No,
because there is no significant linear relationship.
R-sq meaning: 2.9% of the variation in the number of candies per bag can be explained by the
linear relationship with height.
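
The prediction and the R-sq value can be reproduced from the regression output. The snippet below is a small sketch (the function name predict_candies is hypothetical) that plugs a height into the fitted line and squares r.

```python
# Coefficients from the simple linear regression output.
intercept = 50.713668
slope = 0.1287705

def predict_candies(height_in):
    """Predicted candies per bag for a buyer of the given height in inches."""
    return intercept + slope * height_in

predicted = predict_candies(63.5)

# R-sq is the square of the correlation coefficient.
r_squared = 0.17042887 ** 2

print(round(predicted, 2), round(r_squared, 4))
```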
Assume there is a significant relationship between height and number of candies per bag. Would
it be appropriate to predict the number of candies in a bag purchased by retired Houston Rockets
player Yao Ming, who is 90 inches tall? Why or why not? No, it would not be appropriate to
predict the number of candies for Yao Ming, because his height is far outside the range of
heights observed in the sample.
Systematic Sample: (2nd row, every 10th after)
Correlation Coefficient: r=.1457
Regression Equation: y=52.962+.0769x
Critical Value: 0.811
Is there a significant linear relationship between X and Y? No, there is no significant linear
relationship due to the absolute value of the correlation coefficient being less than the critical
value.

Probability
Problem 1: Suppose you are going to randomly select two Skittles from the bag YOU
purchased.
Red: 10

Orange: 18

Yellow: 12

Green: 7

Purple: 12

Total Number of Candies: 59

(a) What is the probability that both Skittles are purple if you select them with replacement?
Give your answer correct to four decimal places.
First draw: 12/59 = .2034
Second draw: 12/59 = .2034

P(both purple) = .2034 × .2034 = .0414

(b) What is the probability that both Skittles are purple if you select them without
replacement? Give your answer correct to four decimal places.
First draw: 12/59 = .2034
Second draw: 11/58 = .1897

P(both purple) = .2034 × .1897 = .0386

(c) What is the probability that at least one Skittle is purple if you select them with
replacement?
P(at least one is purple) = 1 - P(none are purple)

P(purple) = 12/59 = .2034

P(not purple) = 1 - .2034 = .7966

P(at least one is purple) = 1 - (.7966)^2 = .3654
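
The three answers for Problem 1 can be verified with exact fractions. This is a minimal sketch using the purple count and bag total listed above; the variable names are my own.

```python
from fractions import Fraction

purple = 12   # purple candies in the purchased bag
total = 59    # all candies in the purchased bag

# (a) Both purple, drawing with replacement: the two draws are independent.
both_with = Fraction(purple, total) ** 2

# (b) Both purple, drawing without replacement: one fewer purple and one
#     fewer candy overall on the second draw.
both_without = Fraction(purple, total) * Fraction(purple - 1, total - 1)

# (c) At least one purple, with replacement: complement of "no purple twice".
at_least_one = 1 - Fraction(total - purple, total) ** 2

for label, p in [("a", both_with), ("b", both_without), ("c", at_least_one)]:
    print(label, round(float(p), 4))
```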


Problem 2: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select one Skittle.
Red: 716

Orange: 698

Yellow: 726

Green: 710

Purple: 701

Total Number of Candies: 3551


(a) What is the probability that you select a green Skittle?
710/3551= .1999
(b) What is the probability that you select a Skittle that is NOT green?
1-.1999= .8001
(c) What is the probability that you select a Skittle that is red OR yellow?
(716+726)/3551 = .4061
(d) What is the probability that you select a Skittle that is orange GIVEN that it is a
secondary color (secondary colors are green, orange and purple)?
Total of Secondary Colors = 710+698+701 = 2109
698/2109 = .3310
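
The Problem 2 answers follow directly from the class counts. The sketch below is illustrative only, with variable names of my own choosing; it applies the complement rule, the addition rule for disjoint colors, and the conditional probability restricted to secondary colors.

```python
from fractions import Fraction

counts = {"Red": 716, "Orange": 698, "Yellow": 726, "Green": 710, "Purple": 701}
total = sum(counts.values())

# (a) P(green)
p_green = Fraction(counts["Green"], total)

# (b) P(not green), by the complement rule
p_not_green = 1 - p_green

# (c) P(red or yellow); the colors are disjoint, so the counts simply add
p_red_or_yellow = Fraction(counts["Red"] + counts["Yellow"], total)

# (d) P(orange | secondary color); the sample space shrinks to
#     green, orange and purple candies only
secondary = counts["Green"] + counts["Orange"] + counts["Purple"]
p_orange_given_secondary = Fraction(counts["Orange"], secondary)

for p in (p_green, p_not_green, p_red_or_yellow, p_orange_given_secondary):
    print(round(float(p), 4))
```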
Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select ten Skittles with replacement and count how many are
yellow.
(a) Show that this meets the requirements of the binomial probability distribution and
identify n and p.
This is a binomial probability distribution because the experiment is performed a fixed
number of times and each trial is independent. The trials are independent because the
selection is made with replacement. Each trial has 2 disjoint outcomes: getting a yellow
and not getting a yellow. The probability of success is the same for each trial.
n = 10
p = 726/3551 = .2044

(b) What is the probability that exactly 4 of the 10 Skittles are yellow?
10C4 (.2044)^4 (1-.2044)^(10-4) = .0930
(c) For samples of size 10, what is the expected value and standard deviation for the number
of yellow skittles that will be included?
Mean: 10(.2044) = 2.044
Standard Deviation: √(2.044(1-.2044)) = 1.275
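
The binomial calculations can be reproduced with Python's math module. This is a sketch rather than the StatCrunch or calculator procedure; the helper binom_pmf is my own name, and it uses the unrounded p = 726/3551, so the last decimal may differ slightly from hand calculations that round p to .2044 first.

```python
import math

n = 10             # number of draws
p = 726 / 3551     # probability a single draw is yellow

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

prob_exactly_4 = binom_pmf(4, n, p)

mean = n * p                        # expected number of yellow candies
std_dev = math.sqrt(n * p * (1 - p))  # standard deviation of that count

print(round(prob_exactly_4, 4), round(mean, 3), round(std_dev, 3))
```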
Problem 4: For this problem, treat a 2.17 ounce bag of Skittles as an individual. Suppose the
values for our class data are the parameter values for all 2.17 ounce bags of Skittles. In other
words, assume μ = mean number of candies per bag in our class data set and σ = standard
deviation of number of candies per bag in our class data set (you computed these values in
Part 2).
σ = 3.1108976

μ = 59.183333

(a) Describe the sampling distribution for the mean number of candies per bag for samples of
32 bags. Include center, spread and shape. Note: The shape of the SAMPLING
DISTRIBUTION is different from the shape of the population, which you determined in
Part 2 of the project.
Center: stays the same as the population mean, 59.183333
Spread: gets smaller; standard error = σ/√n = 3.1108976/√32 = .5499
Shape: approximately normal, because n = 32 ≥ 30

(b) What is the probability that the mean number of candies per bag for a sample of 32 bags
is greater than 58.5?
z = (58.5 - 59.183333)/.5499 = -1.24, and P(z < -1.24) = .1075

P(mean > 58.5) = 1 - .1075 = .8925
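
The sampling-distribution calculation can be sketched with statistics.NormalDist from the Python standard library. The variable names are my own. Because the code keeps the unrounded z-score instead of rounding to -1.24 for a table lookup, its probability can differ from the table-based answer in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

mu = 59.183333     # class mean, treated as the population mean
sigma = 3.1108976  # class standard deviation, treated as the population value
n = 32             # bags per sample

# Standard error of the sample mean.
standard_error = sigma / math.sqrt(n)

# Sampling distribution of the mean: approximately normal since n >= 30.
sampling_dist = NormalDist(mu, standard_error)

z = (58.5 - mu) / standard_error
p_greater = 1 - sampling_dist.cdf(58.5)

print(round(standard_error, 4), round(z, 2), round(p_greater, 4))
```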

Confidence Intervals
Explain in general the purpose and meaning of a confidence interval.
A confidence interval is an interval of numbers, built around a point estimate, that gives a
range of likely values for an unknown parameter.
Identify the requirements for computing confidence intervals. List the requirements separately for a
confidence interval for a population proportion and for a population mean.
Population Proportion:
SRS
An approximately normal sampling distribution of p̂: np̂(1-p̂) ≥ 10
Independent trials: n ≤ 0.05N (sample is no more than 5% of the population)

Population Mean:
SRS or Randomized Experiment
Sample size is small relative to the population size (n ≤ 0.05N)
The data come from a population that is normally distributed, or the sample size is
large (n ≥ 30, so that the sampling distribution of the mean is approximately normal).
Using values for the class data that you computed in Part 2 of the project, construct a 99% confidence
interval estimate for the true proportion of yellow candies using the class data as your sample.
Remember that for this computation, n is the number of CANDIES for the entire class data. Include all
your work, showing the formula used and appropriate values inserted (neatly written and scanned or
typed).
Check Requirements:
SRS
np̂(1-p̂) ≥ 10: 3551(.2044)(1-.2044) ≈ 577.5 ≥ 10
n ≤ 0.05N: 3551 is less than 5% of all Skittles
Confidence Interval:
x = 726
n = 3551
p̂ = 726/3551 = .2044
α/2 = .01/2 = .005
z(α/2) = 2.575
1-Prop Z Interval: x = 726, n = 3551, c-level = .99, interval = (.18702, .22188)
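
The 1-Prop Z Interval can be approximated by hand with the formula p̂ ± z·√(p̂(1−p̂)/n). The Python sketch below is an illustration, not the calculator's internal routine; it takes the critical value from statistics.NormalDist rather than a z table, and the variable names are my own.

```python
import math
from statistics import NormalDist

x = 726        # yellow candies in the class data
n = 3551       # all candies in the class data
c_level = 0.99

p_hat = x / n

# Critical value z(alpha/2) from the standard normal distribution.
alpha = 1 - c_level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and the interval endpoints.
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(round(p_hat, 4), round(z_crit, 3), round(lower, 5), round(upper, 5))
```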
Give an appropriate interpretation of your interval.
We are 99% confident that the true proportion of yellow Skittles is between .18702
and .22188.
Based on your interval for the true proportion of yellow candies, was the proportion of yellow candies
in the single bag of candy you purchased a likely value for the true population proportion? Explain
how you know using actual values from your data and computations.
Yes, the proportion of yellow candies in my single bag was 12/59 = .2034, which falls within
the class confidence interval of (.18702, .22188).
Using values you computed in Part 2 of the project, construct a 95% confidence interval estimate for
the true mean number of candies per bag using the class data as your sample, but for this
computation, n is the number of BAGS. Include all your work, showing the formula used and
appropriate values inserted (neatly written and scanned or typed).
Mean: 59.183   Std Dev: 3.111   df = 60-1 = 59
α = 1-.95 = .05
α/2 = .025
t(α/2) = 2.000
mean ± t(α/2)(s/√n)
59.183 ± 2.000(3.111/√60)
(58.38, 59.99)
T-interval: mean = 59.183, s = 3.111, n = 60, c-level = .95, interval = (58.38, 59.987)
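
The hand calculation of the interval can be checked in a few lines. This sketch takes the critical value t = 2.000 as given above rather than computing it, since the Python standard library has no t-distribution; the variable names are my own.

```python
import math

x_bar = 59.183   # sample mean number of candies per bag
s = 3.111        # sample standard deviation
n = 60           # number of bags
t_crit = 2.000   # critical value taken from the text above (assumed, not computed)

# Margin of error: t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 2), round(upper, 2))
```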

Give an appropriate interpretation of your interval.


With 95% confidence, the true mean number of Skittles per bag is between 58.38 and
59.987.

Based on your interval for the true mean number of candies per bag, was the total number of candies
in the single bag you purchased a likely value for the population mean? Explain how you know using
actual values from your data and computations.

In the single bag I purchased there was a total of 59 skittles which lies between 58.38 and
59.987 making it a likely value for the population mean.

Hypothesis Tests
Explain in general the purpose and meaning of a hypothesis test.
Hypothesis testing is a procedure, based on sample evidence and probability, used to test
statements regarding a characteristic of one or more populations.
Using values for the class data that you computed in Part 2 of the project and a 0.05
significance level, test the claim that 20% of all Skittles candies are red. Show all the steps
(neatly written and scanned, typed, or copied from Stat Crunch) including:
1. The hypotheses with correct notation
H0: p = .20
H1: p ≠ .20
2. The conditions for performing the hypothesis test, along with checking that they are
met (hint: they are not all met!)
Requirements/Met:
(NOT MET): The sample is a simple random sample.
(This sample is a convenience sample.)
(MET): np0(1-p0) ≥ 10
(3551(.20)(1-.20) = 568.16 ≥ 10)
(MET): The sample values are independent of each other (n ≤ .05N)
(3551 is less than 5% of the population of all Skittles)
3. The test statistic
x = 716 (red skittles)
n = 3551 (total # of skittles in class)
Stat>tests>1-prop Z test> calculate
z = .2433
p̂ = .2016
4. The p-value
p=.8078
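
The calculator output can be cross-checked by computing the one-proportion z statistic directly, z = (p̂ − p0)/√(p0(1−p0)/n), and doubling the upper-tail area for the two-sided alternative. The sketch below is an illustration with my own variable names, using statistics.NormalDist for the normal tail area.

```python
import math
from statistics import NormalDist

x = 716      # red candies in the class data
n = 3551     # all candies in the class data
p0 = 0.20    # claimed proportion under the null hypothesis

p_hat = x / n

# The standard error uses p0 because the test assumes H0 is true.
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Two-sided p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_hat, 4), round(z, 4), round(p_value, 4))
```

Since the p-value is far above the 0.05 significance level, the sketch agrees with the decision not to reject the null hypothesis.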

5. The appropriate decision about the null hypothesis and an appropriate conclusion
Fail to reject H0. There is insufficient evidence to conclude that the proportion of red
Skittles differs from 20%.
6. Also describe the Type I and Type II errors for this test.
Type I Error: Reject the claim that 20% of Skittles are red when the proportion really is 20%.
Type II Error: Fail to reject the claim that 20% of Skittles are red when the proportion
really is not 20%.
Using values for the class data that you computed in Part 2 of the project and a 0.01
significance level, test the claim that the mean number of candies in a bag of Skittles is more
than 58. Show all the steps (neatly written and scanned, typed, or copied from Stat Crunch)
including:
1. The hypotheses with correct notation
H0: μ = 58
H1: μ > 58
2. The conditions for performing the hypothesis test, along with checking that they are
met (hint: they are not all met!)
Requirements:
(NOT MET) The sample is obtained using simple random sampling or
from a randomized experiment. (This is a convenience sample.)
(MET) The sample has no outliers and comes from a normal population,
or the sample size is n ≥ 30 (60 ≥ 30)
(MET) The sample values are independent of each other. (Sample values
are independent of each other because they do not rely on each other for the
outcome.)
3. The test statistic
L1 (total per bag)> stat>test>t-test>data>calculate:
t = 2.946
p = .0023
mean = 59.183
s = 3.111
n = 60
4. The p-value
p=.0023
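
The t-test output can be approximated without a statistics package. The sketch below is an assumption-laden illustration, not the calculator's method: it computes t = (mean − μ0)/(s/√n) and then estimates the right-tail p-value by numerically integrating the Student t density with Simpson's rule. The function names, the integration cutoff of 60, and the step count are my own choices.

```python
import math

x_bar, s, n = 59.183, 3.111, 60   # sample mean, standard deviation, bags
mu0 = 58                          # mean claimed under the null hypothesis
df = n - 1

t_stat = (x_bar - mu0) / (s / math.sqrt(n))

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(t, df, cutoff=60.0, steps=20000):
    """Approximate P(T > t) with composite Simpson's rule on [t, cutoff].

    The area beyond the cutoff is treated as negligible (an assumption that
    is reasonable for moderately large df); steps must be even.
    """
    h = (cutoff - t) / steps
    total = t_pdf(t, df) + t_pdf(cutoff, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(t + i * h, df)
    return total * h / 3

p_value = t_upper_tail(t_stat, df)

print(round(t_stat, 3), round(p_value, 4))
```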
5. The appropriate decision about the null hypothesis and an appropriate conclusion
Reject HO, there is sufficient evidence to conclude that the true mean is greater than 58.

6. Also interpret the p-value for this test.


If the true mean is really 58, then the probability of getting a sample mean of 59.183 or
more is .0023.
