Math 1040-004
Skittles Project
6th November, 2014
Introduction
Skittles are arguably one of the most recognizable candies ever to grace the shelves of grocery
stores from California to New York. From the easy-to-recognize packaging to the delectable
multicolored candies themselves, Skittles get people's attention. Have you ever wondered,
however, just how consistent the color distribution is from package to package? Like me, have
you ever pondered how those giant machines seem to put the perfect amount of each color in one
package? These may seem like silly questions, yet for the makers of Skittles they are important
questions to ask. Quality control is an important aspect of a business, and if Skittles Corp wants
to maintain its market share in the candy world, consistency and efficiency in distribution are
vital.
The following information analyzes the statistics behind a sample of 25 individual 2.17
ounce bags of Skittles. We will analyze the color distribution, the number of candies, and many
other variables to gain a better understanding of the statistics behind a bag of Skittles. So grab
some sunglasses and get ready to taste the rainbow!
chart; then you will see the quantitative data represented using a frequency histogram and a box
plot. My individual sample data has been included for comparison.
Pie Chart (class data): Red 21.24%, Green 20.91%, Yellow 20.25%, Orange 19.32%, Purple 18.27%
Pie Chart (individual bag): Green 32.20%, Purple 20.34%, Red 16.95%, Orange 16.95%, Yellow 13.56%
Pareto Chart (class data), frequency by candy color: Red 321, Green 316, Yellow 306, Orange 292, Purple 276
Pareto Chart (individual bag), frequency by candy color: Green 19, Purple 12, Red 10, Orange 10
The Pie Chart and the Pareto Chart do an excellent job of visually displaying the data. The
purpose of such graphs is to enable viewers to quickly gain an idea of the layout or spread of
the data. By comparing the charts for my individual bag to those for the total class data, we can
see how an increase in sample size yields greater consistency with less variation. My personal bag
of Skittles had greater variation (as shown in the graph) than the total class data.
Variation among small samples is to be expected and lessens as the sample grows.
The important thing to remember here, however, is how graphic representations of data (like
Pareto charts and Pie charts) make categorical information much easier to understand. Now you
should have a better understanding of the color distribution of a typical bag of Skittles. The
preceding information shows us that the colors in the Skittles rainbow are pretty even: in the
class data, each color makes up between about 18% and 21% of the candies.
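As a quick check on the percentages in the class pie chart, here is a minimal Python sketch that turns the class color counts from the Pareto chart into proportions. The function and variable names are my own and are only for illustration.

```python
# Class color counts as read from the Pareto chart in this report.
class_counts = {"Red": 321, "Green": 316, "Yellow": 306, "Orange": 292, "Purple": 276}


def color_proportions(counts):
    """Return each color's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


proportions = color_proportions(class_counts)
for color, share in proportions.items():
    # Format each share as a percentage with two decimals.
    print(f"{color}: {share:.2%}")
```

Dividing each count by the class total of 1511 candies gives the same percentages shown in the class pie chart.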
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Next we'll look at some quantitative data. As previously mentioned, quantitative data are data
that consist of measurements or counts of something. For this project, the quantitative
data we are interested in include the total number of candies in each bag and the mean number
of candies per bag. The totals were counted; then some basic statistical calculations were
used to obtain the mean, the standard deviation, and the 5-number summary. Each of the
aforementioned figures is seen below. Following the 5-number summary you will see a brief
description of a frequency histogram and a boxplot, followed by a graphical representation of the
data in that form.
Class Sample (total number of bags) = 25 Bags
Mean (average # of candies per bag in the sample) = 60.4 Candies
Standard Deviation (typical distance of a bag's count from the mean) = 4.36 Candies
5-number Summary: Min = 53, Q1 = 59, Median = 60, Q3 = 62, Max = 77. Each number
represents a total number of candies in a bag. For example, the bag with the fewest candies had
53 and the bag with the most candies had 77.
Frequency Histogram: A graph that depicts frequencies for a set of data. In our case each bar
shows how many of the 25 bags (the frequency) contained a given number of candies.
Boxplot: Using the 5-number summary we can construct a boxplot that shows us the spread of
our data. Using our sample, the boxplot will enable you to see where the majority of our data
lies, and whether or not there were any outliers. See the following pages for both graphs and
keep in mind that my individual Skittles bag contained 59 candies.
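One common way to decide whether a value counts as an outlier on a boxplot is the 1.5 x IQR rule. The report does not say which rule was used, so the sketch below is an assumption on my part. It applies that rule to the 5-number summary above, and the names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences using the k * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 5-number summary reported for the 25 bags.
minimum, q1, median, q3, maximum = 53, 59, 60, 62, 77

lower_fence, upper_fence = iqr_fences(q1, q3)

# Only the extremes are known from the summary, so only they can be checked.
flagged = [x for x in (minimum, maximum) if x < lower_fence or x > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Flagged extremes:", flagged)
```

With Q1 = 59 and Q3 = 62 the fences are 54.5 and 66.5, so under this convention both the maximum of 77 and the minimum of 53 fall outside them.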
Frequency Histogram
Box Plot
The preceding quantitative data is shown in a way that makes it easy to see where the majority of
the data lies. The histogram has a distribution that is skewed right; that is, although most of
our data lies on the left side of the graph, the tail is pulled to the right by our outlier of
77. The boxplot shows the same right skew; it is merely another
representation of the spread of our data.
After analyzing both graphs I would say that the distribution of the data is what one would
expect from a consistent filling process: the middle half of the bags hold between 59 and 62
candies. My individual bag of Skittles with 59 candies lies very
near the sample mean of 60.4. It would appear that Skittles Corp. does a decent job of keeping
the number of candies in each individual bag relatively equal. It is this consistency that
is so vital to the success of Skittles Corp.
Reflection:
As the preceding information shows, there are two different types of data that are important
pieces of the puzzle here. The first is categorical data, which consists of labels (Red, Green,
Yellow, Orange, Purple). Categorical data is crucial to Skittles because it allows the candy
maker to check that the various colors are distributed evenly. Categorical data is best expressed
in Pie Charts and Pareto charts (as shown); these enable viewers to quickly see the different
color distributions.
The second type of data is quantitative data, which consists of numbers representing counts or
measurements of something. Quantitative data is another important tool that allows Skittles Corp
to ensure quality control, and that the number of candies in each bag (of the same size) is nearly
the same. As we have seen, quantitative data is best represented in Histograms and Boxplots.
Additionally, calculations of the mean, standard deviation, and the 5-number summary provide an
easy way to describe the center and spread of the data.
Part II: Confidence Intervals and Hypothesis Tests
Introduction:
Confidence intervals and hypothesis tests are critical in statistics: confidence intervals allow
statisticians to estimate the value of a population parameter (proportion, mean, standard
deviation), and hypothesis tests allow them to test claims about a population. The following
sections will explain in detail the purpose of both confidence intervals and hypothesis tests with
examples of each. Before we begin, however, one thing to understand about Confidence Intervals and
Hypothesis Tests is that there are some conditions that must be met in order to be able to
calculate them. Let's discuss those quickly.
Conditions for Confidence Intervals:
The conditions for constructing a confidence interval differ for the mean, the proportion, and the
standard deviation; we'll look at each.
Mean: 1.) The sample is a simple random sample. 2.) Either or both of these conditions are
satisfied: the population is normally distributed or n > 30.
Proportion: 1.) The sample is a simple random sample. 2.) There is a fixed number of trials, the
trials are independent, there are two categories of outcomes, and the probabilities remain
constant for each trial.
Standard Deviation: 1.) The sample is a simple random sample. 2.) The population MUST be
normally distributed.
Confidence Intervals
Let's begin with Confidence Interval Estimates. Confidence intervals allow us to take our
sample Skittles data and construct a range of values for a population parameter, such as the
mean, standard deviation, or proportion. Remember, the objective of statistics is to draw
conclusions about a population using a sample; confidence intervals allow us to do this. By
constructing a confidence interval using our sample data, we will be able to make inferences
about the Skittles population as a whole; for example, the range of values within which a
population parameter is likely to lie. The following page contains confidence intervals using
the sample Skittles data.
Construct a 99% confidence interval estimate for the true proportion of yellow candies.
In this particular confidence interval, the proportion of yellow candies is being estimated. Based
on the results, we are 99% confident that the true proportion of yellow candies among all
Skittles packed in 2.17 oz bags lies between 17.6% and 23%. This range describes the population
as a whole; an individual bag can fall outside it.
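The interval for the yellow proportion can be reproduced with the usual normal-approximation formula, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n). The sketch below assumes the interval was built from the class totals in the Pareto chart (306 yellow out of 1511 candies); the report does not show its working, so treat that as my assumption. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed inputs: 306 yellow candies out of 1511 in the class sample.
low, high = proportion_ci(306, 1511, 0.99)
print(f"99% CI for yellow proportion: {low:.3f} to {high:.3f}")
```

Under these assumptions the limits come out at roughly 0.176 and 0.229, in line with the interval reported above.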
Construct a 95% confidence interval estimate for the true mean number of candies per bag.
In this confidence interval, the mean number of candies per bag is being estimated. Based on the
results, we can conclude (with 95% confidence) from our sample that among the entire
population of Skittles bags, the mean number of candies lies between 58.6
candies and 62.2 candies (about 59 and 62 when rounded to whole candies).
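A minimal sketch of how the interval for the mean could be computed from the summary statistics (mean 60.4, standard deviation 4.36, n = 25), using the t-interval x-bar plus or minus t times s/sqrt(n). The critical value 2.064 is the standard t-table entry for 24 degrees of freedom at 95% confidence; hardcoding it, and the function name, are my own choices for illustration.

```python
from math import sqrt

# t-table critical value for 24 degrees of freedom, 95% confidence (two-tailed).
T_CRIT_95_DF24 = 2.064


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t-based confidence interval for a population mean."""
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = mean_ci(60.4, 4.36, 25, T_CRIT_95_DF24)
print(f"95% CI for mean candies per bag: {low:.1f} to {high:.1f}")
```

The margin of error works out to about 1.8 candies, which reproduces the limits of 58.6 and 62.2.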
Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag.
This last confidence interval deals with the standard deviation of candies per bag in the
population. From our sample we can state with 98% confidence that the population standard
deviation of candies per bag falls between 3.258 and 6.483 candies.
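The limits for the standard deviation follow from the chi-square interval sqrt((n - 1)s^2 / chi-square). Below is a minimal sketch using the sample values s = 4.36 and n = 25. The two critical values are the standard chi-square table entries for 24 degrees of freedom at 98% confidence; I hardcode them to avoid assuming any extra library, and the names are hypothetical.

```python
from math import sqrt

# Chi-square table values for 24 degrees of freedom, 98% confidence.
CHI2_RIGHT = 42.980  # area of 0.01 to the right
CHI2_LEFT = 10.856   # area of 0.99 to the right


def sd_ci(sample_sd, n, chi2_left, chi2_right):
    """Chi-square confidence interval for a population standard deviation."""
    df = n - 1
    low = sqrt(df * sample_sd ** 2 / chi2_right)   # larger divisor gives lower limit
    high = sqrt(df * sample_sd ** 2 / chi2_left)   # smaller divisor gives upper limit
    return low, high


sd_low, sd_high = sd_ci(4.36, 25, CHI2_LEFT, CHI2_RIGHT)
print(f"98% CI for standard deviation: {sd_low:.3f} to {sd_high:.3f}")
```

Note that the interval is not symmetric around 4.36, because the chi-square distribution itself is not symmetric.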
Hypothesis Tests
Hypothesis testing is a procedure for testing a claim (hypothesis) about a property of a
population. The property often refers to the mean, proportion, or standard deviation. With
hypothesis tests, a claim is stated about one of the population parameters; it is then the job
of the statistician to test the claim with some
calculations and draw a conclusion about the validity of the claim. Hypothesis tests allow us to
reject or fail to reject the null hypothesis. It is this decision that then allows us to draw a
conclusion about the claim. With hypothesis testing it is important to understand the idea of
a null hypothesis and the alternative hypothesis. The following definitions were taken from
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.
This particular hypothesis test examines the claim that the mean number of candies in a bag is 55.
Because the claim is a statement of equality, it is our null hypothesis; the alternative is that
the mean differs from 55. Our test statistic of t = 6.193 fell inside the rejection region, so we
reject the null. Because we rejected the null, there is sufficient evidence to warrant rejection
of the claim that the mean number of candies per bag is 55. Since our sample mean of 60.4 is
above 55, the data suggest that the true mean is higher than the claimed 55 candies.
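The test statistic can be reproduced from the summary statistics with t = (x-bar - mu0) / (s / sqrt(n)). The sketch below assumes a two-tailed test with 24 degrees of freedom, for which the t-table critical value at the 0.01 significance level is 2.797. The report does not state its critical value, so that number and the names used are my assumptions.

```python
from math import sqrt

# t-table critical value: two-tailed test, significance level 0.01, 24 df.
T_CRIT = 2.797


def one_sample_t(sample_mean, claimed_mean, sample_sd, n):
    """Test statistic for a one-sample t test of a claimed mean."""
    return (sample_mean - claimed_mean) / (sample_sd / sqrt(n))


t_stat = one_sample_t(60.4, 55, 4.36, 25)

# Reject the null when the statistic lies beyond either critical value.
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, reject null: {reject_null}")
```

The statistic of about 6.193 is well beyond 2.797, which is why the null hypothesis is rejected.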