
Lucero Vargas

Math 1040
MWF 11:00-11:50
Skittles Term Project
The Skittles term project is based on a study in which each student bought one 2.17-ounce
bag of Original Skittles and recorded how many candies of each color were in the bag. After
all the data were collected, the instructor gave us a file with the whole class's data combined.
With this information we were required to find the proportion of each color within the overall
sample and display it in a pie chart and a Pareto chart, which show categorical data. With the
knowledge we've gained in this class we were also required to calculate the mean, the
standard deviation, and the 5-number summary, and to create a frequency histogram and a
boxplot, which show quantitative data. After finishing both types of data, we explain the
difference between categorical and quantitative data. We also explain the purpose and
meaning of a confidence interval and a hypothesis test. My goals for this project are to do
all the procedures correctly and to summarize what I have learned in elementary

[Figures (pie chart and Pareto chart): Proportion of each color in a Skittles bag, whole classroom data]

When I made the Pareto and pie charts, I realized that the data from the whole class was not
what I expected. Red, which I believed was the most common color, did not have the highest
proportion; purple did. The overall class data was not very similar to my own bag, where red
and green were among the most frequent colors.

[Figure: Portions of candies by color from class data]

[Figure: Portions of candies by color from my own bag]

The Number of Candies per Bag

Mean: 1496/25 = 59.84 (about 59.8)
Standard Deviation: 1.82
5-Number Summary:
Quartile 1 = 59
Quartile 2 (Median) = 59
Quartile 3 = 61
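
The summary statistics above can be reproduced with a short script. This is a minimal sketch, not the calculation actually used for the project: the class's bag-by-bag counts are not listed in this report, so `hypothetical_counts` is made-up illustration data, and the quartile method ("inclusive") is an assumption that may differ slightly from what a graphing calculator or Excel reports.

```python
import statistics


def summarize(counts):
    """Return the mean, sample standard deviation, and 5-number summary."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "mean": statistics.mean(counts),
        "stdev": statistics.stdev(counts),  # sample standard deviation (n - 1)
        "five_number": (min(counts), q1, median, q3, max(counts)),
    }


# Hypothetical candies-per-bag counts, for illustration only (not the class data).
hypothetical_counts = [57, 58, 59, 59, 59, 60, 61, 61, 62, 64]
summary = summarize(hypothetical_counts)
print(summary)
```

With the real class file, the list would hold the 25 bag totals instead of the illustration values.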

[Figure: Frequency of number of candies per Skittles bag (horizontal axis: number of Skittles)]


The frequency histogram and the boxplot do not show a normal distribution; the distribution
appears to be highly skewed to the right. The graphs are not what I expected, because when I
looked at the whole class data it seemed that everyone had about the same number of
Skittles in each bag, so I expected a normal distribution. The whole class data appears to
match mine since I had 61 candies and that many candies in a Skittles bag did not appear to be
The number of candies from my own personal bag: 61
The number of bags in the sample: 25
The difference between categorical and quantitative data is that categorical data has no
numerical meaning, for example the numbers on a Social Security card. Quantitative data does
have a numerical value, for example counting how many eggs there are in the fridge. A bar graph
is an appropriate graph for categorical data because it lets you compare the sizes of the
categories. A pie chart is also a good chart for categorical data because it shows the
percentage of the whole that each category constitutes. A stem-and-leaf plot would not be
used with categorical data; it should be used with quantitative data, since it displays the
shape of the distribution and organizes the numbers so they are easier to read. The chi-square
test is a calculation suited to categorical data, since the data has been counted and divided
into categories, so counts and proportions make sense, but the categories don't have a
meaningful mean. With quantitative data you can use almost any type of calculation except for
the chi-square
Confidence Interval Estimates:

A confidence interval is a range (or an interval) of values used to estimate the true value
of a population parameter. Its purpose is to estimate a population value, such as the true
proportion or mean, from a sample.

I am 99% confident that the true proportion of yellow candies is in the interval (.1707,
.2237). p-hat = .197
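
The proportion interval above can be checked with a short calculation. This is a sketch under two assumptions that the report does not state outright: the sample proportion of yellow candies is 0.197, and the sample size is the 1496 candies in the combined class data. It uses the usual z interval, p-hat ± z·sqrt(p-hat(1 − p-hat)/n).

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """z confidence interval for a population proportion."""
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed inputs: p-hat = 0.197 (yellow), n = 1496 candies, 99% confidence.
low, high = proportion_ci(0.197, 1496, 0.99)
print(f"({low:.4f}, {high:.4f})")
```

Because p-hat is rounded to three decimals here, the endpoints can differ from the reported interval in the fourth decimal place.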

I am 95% confident that the true mean number of candies per bag is in the interval
(59.709, 59.891). Sx=1.80

I am 98% confident that the true standard deviation of the number of candies per bag is in
the interval (1.36, 2.71)
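
The standard deviation interval can be sketched the same way. The example assumes n = 25 bags and s = 1.82, and it takes the chi-square critical values for 24 degrees of freedom (10.856 and 42.980 for 98% confidence) from a standard chi-square table rather than computing them; those two constants are assumptions typed in by hand.

```python
from math import sqrt

# Chi-square critical values for df = 24 at 98% confidence,
# copied from a standard table (assumed, not computed here).
CHI2_LEFT = 10.856
CHI2_RIGHT = 42.980


def stdev_ci(s, n, chi2_left, chi2_right):
    """Confidence interval for a population standard deviation."""
    df = n - 1
    lower = sqrt(df * s**2 / chi2_right)  # larger critical value gives the lower bound
    upper = sqrt(df * s**2 / chi2_left)   # smaller critical value gives the upper bound
    return lower, upper


low_sd, high_sd = stdev_ci(1.82, 25, CHI2_LEFT, CHI2_RIGHT)
print(f"({low_sd:.2f}, {high_sd:.2f})")  # prints (1.36, 2.71)
```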

Hypothesis Test:

A hypothesis test is a test of significance used when we want evidence for or against a
specific claim. That is, it is a procedure for testing a claim about a property of a population.

The p-value, 0, is less than the significance level .05, so we reject Ho. There is convincing
evidence that the proportion of red Skittles candies is not 20%.

The p-value 1.085 is greater than the significance level .01, so we fail to reject Ho. We do
not have convincing evidence against the claim that the mean number of candies in a bag of
Skittles is 55.
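
A test of a claimed proportion can be sketched as follows. The report does not list the count of red candies, so `hypothetical_red` is an invented number used only to show the mechanics; the function name and the two-sided alternative are also assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of Ho: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses the claimed p0, not p-hat
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical count of red candies out of 1496, for illustration only.
hypothetical_red = 350
z, p_value = one_proportion_z_test(hypothetical_red, 1496, 0.20)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < 0.05 else "fail to reject Ho")
```

A p-value is a probability, so it always falls between 0 and 1; rejecting Ho means the data give evidence against the claimed value p0.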

The conditions for a confidence interval or hypothesis test about a population proportion
are:
-the sample is a simple random sample
-the conditions for the binomial distribution are satisfied: there is a fixed number of trials, the
trials are independent, there are two categories of outcomes, and the probabilities remain
constant for each trial
-the sample is large enough for z procedures: np > 10 and n(1-p) > 10
-the population is at least 10 times the sample size (multiply the sample size by 10 to check)
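
The two numerical conditions in the list above can be checked directly. This sketch assumes n = 1496 candies and a claimed proportion of 0.20; the function names are made up for illustration, and the threshold of 10 follows the formulas given in the list.

```python
def large_enough_for_z(n, p, threshold=10):
    """Check np > threshold and n(1 - p) > threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def population_at_least_ten_times(population_size, n):
    """Check that the population is at least 10 times the sample size."""
    return population_size >= 10 * n


n = 1496   # assumed sample size: all candies in the class data
p = 0.20   # assumed claimed proportion
minimum_population = 10 * n

print(large_enough_for_z(n, p))  # np is about 299.2 and n(1 - p) about 1196.8
print(minimum_population)        # the population must hold at least this many candies
```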

The conditions for a confidence interval or hypothesis test about a population mean are:
- the sample is a random sample
- n>30 or the population is normal
- You multiply the sample size by 10 to make sure it's not greater than the amount of the
My samples did meet these conditions, since the sample is a random sample, there are more
than 14,960 Skittles in the population, and the sample is large enough for z procedures. One
error that could have been made with this data is that someone counted their candies
incorrectly and reported wrong data. The sampling method could be improved by having a
second person count each bag's candies. The conclusion I have drawn from my statistical
research is that I was able to reject Ho when my p-value was smaller than .05, meaning that
we have convincing evidence that the proportion of red Skittles candies is not 20%.

In finishing this project I have overcome many challenges and have figured out
how to do procedures I completely didn't understand before, for example how
to construct a confidence interval for the standard deviation. Statistics has been a
challenging class, since there is so much more thinking involved compared to just putting
numbers in a formula. It has helped me gain knowledge that I will use in my life. One
of the most important things I have learned in this class is what a normal frequency
distribution is and how it should look. I know that what I have learned about frequency
distributions will have an impact in other classes that I take in the future. Why
is that? Since I plan on going into the medical field, I am sure that I will be seeing
graphs of normal frequency distributions. Already, in the psychology class I was taking at
the same time as statistics, we had to learn about frequency distributions of IQ scores, and
it was easier for me to understand than for other students since I already knew this
information from statistics. The project has helped me develop more Excel skills, since
I made multiple graphs that I didn't know how to make before, such as the Pareto chart; I
didn't even know what it was or what it was used for. The project has also helped me with my
problem-solving skills, since I had to learn how to use a graphing calculator or Excel for
special formulas like finding the standard deviation of the number of candies per bag, because
the data set was too large and it would have taken a very long time to find the solution by
hand. This project has changed the way I think about real-world math applications, since
before I took this class I never gave much thought to statistics or what it is used for. Now,
with what I know, I have come to understand how easily a survey can become biased, and
that there are different types of studies: some are done in the moment, while others
take years to complete.
