
Caylor Woods

Math 1040-004
Skittles Project
6th November, 2014

Skittles Project
Introduction
Skittles are arguably one of the most recognizable candies ever to grace the shelves of grocery
stores from California to New York. From the easy-to-recognize packaging to the delectable
multicolored candies themselves, Skittles get people's attention. Have you ever wondered,
however, just how consistent the color distribution is from package to package? Like me, have
you ever pondered how those giant machines seem to put the perfect amount of each color in one
package? These may seem like silly questions, yet for the makers of Skittles they are important
questions to ask. Quality control is an important aspect of a business, and if Skittles Corp wants
to maintain its market share in the candy world, consistency and efficiency in distribution are
vital.
The following report analyzes the statistics behind a sample of 25 individual 2.17
ounce bags of Skittles. We will analyze the color distribution, the number of candies, and several
other variables to gain a better understanding of the statistics behind a bag of Skittles. So grab
some sunglasses and get ready to taste the rainbow!

Part 1: Categorical Data vs. Quantitative Data


Before we get into the heart of Skittles statistics, there are a few things we must understand about
data that will make our journey over the rainbow a little easier. As stated in Elementary Statistics,
12th edition, by Mario Triola, statistics involves collecting data about a sample (like our sample
of 25 Skittles bags) and then analyzing the data to make inferences about an entire population.
The purpose of the following data is to let us make some generalizations about the
world population of Skittles by analyzing our sample of 25 bags.
In statistics there are two primary ways to classify data: as categorical data or as quantitative
data. Categorical data consist of names or labels that do not represent counts or measurements.
They serve merely to identify classes or categories of
information, such as political party (Democrat or Republican) or, in the case of Skittles Corp,
candy color (red, orange, green, etc.). Unlike categorical data, quantitative data consist of
numbers that represent counts or measurements. Quantitative data include things like the number of
Democrats and Republicans, or the number of red candies in a bag of Skittles.
Now back to the project. The following pages show graphs depicting both our categorical and
quantitative data. You will see the categorical data represented using a pie chart and a Pareto
chart; then you will see the quantitative data represented using a frequency histogram and a box
plot. My individual sample data has been included for comparison.

Organizing and Displaying Categorical Data: Colors


Pie Chart (In Percentages)

Skittles Project: Class Total Candies by Color
(All Samples: Total 1,511 Skittles)
Red 21.24%, Green 20.91%, Yellow 20.25%, Orange 19.32%, Purple 18.27%

Skittles Project: Individual Total Candies by Color
(Single Sample: Total 59 Skittles)
Green 32.20%, Purple 20.34%, Red 16.95%, Orange 16.95%, Yellow 13.56%

Pareto Chart

Skittles Project: Class Total Candies by Color
(All Samples: Total 1,511 Skittles)
Frequency by candy color: Red 321, Green 316, Yellow 306, Orange 292, Purple 276

Skittles Project: Individual Total Candies by Color
(Single Sample: Total 59 Skittles)
Frequency by candy color: Green 19, Purple 12, Red 10, Orange 10, Yellow 8

The pie chart and the Pareto chart do an excellent job of visually displaying the data. The
purpose of such graphs is to let viewers quickly get a sense of how the data are distributed.
By comparing the charts for an individual bag to those of the total class data, we can see
how an increase in sample size yields greater consistency with less variation. My personal bag
of Skittles had greater variation (as shown in the graph) than the total class data.
Variation among small samples is to be expected, and it lessens as more samples are added.
The important thing to remember here, however, is how graphic representations of data (like
Pareto charts and pie charts) make categorical information much easier to understand. Now you
should have a better understanding of the color distribution of a typical bag of Skittles. The
preceding charts show us that the colors in the Skittles rainbow are pretty even: in the class
data each color makes up roughly 18% to 21% of the candies.
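The comparison between one bag and the whole class can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the class counts read off the Pareto chart labels (Red 321, Green 316, Yellow 306, Orange 292, Purple 276) and single-bag counts inferred from the individual pie chart percentages. The function names are made up for this example.

```python
# Class counts as read from the Pareto chart labels (assumed color mapping).
class_counts = {"Red": 321, "Green": 316, "Yellow": 306, "Orange": 292, "Purple": 276}
# Single-bag counts inferred from the individual pie chart percentages (59 candies).
bag_counts = {"Red": 10, "Green": 19, "Yellow": 8, "Orange": 10, "Purple": 12}


def color_shares(counts):
    """Return each color's share of the total, as a percentage rounded to 2 places."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 2) for color, n in counts.items()}


def share_range(counts):
    """Gap, in percentage points, between the most and least common color."""
    shares = color_shares(counts).values()
    return round(max(shares) - min(shares), 2)


print(color_shares(class_counts))
print("class spread:", share_range(class_counts))
print("single-bag spread:", share_range(bag_counts))
```

The gap between the most and least common color is much wider for the single bag than for the pooled class data, which is the "less variation with more samples" effect described above.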
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Next we'll look at some quantitative data. As previously mentioned, quantitative data are data
that consist of measurements or counts of something. For this project the quantitative
data we are interested in are the total number of candies in each bag and the mean number of
candies per bag. The totals were counted for each bag; then some basic statistical calculations were
used to obtain the mean, the standard deviation, and the 5-number summary. Each of these
figures is listed below. Following the 5-number summary you will see a brief
description of a frequency histogram and a boxplot, followed by a graphical representation of the
data in that form.
Class Sample (total number of bags) = 25 Bags
Mean (average # of candies per bag in the sample) = 60.4 Candies
Standard Deviation (typical distance of a bag's count from the mean) = 4.36 Candies
5-number Summary: Min = 53, Q1 = 59, Median = 60, Q3 = 62, Max = 77. Each number
represents the total number of candies in a bag. For example, the bag with the fewest candies had 53,
the bag with the most candies had 77, and so on.
Frequency Histogram: A graph that depicts frequencies for a set of data. In our case each bar
shows how many of the 25 bags (the frequency) contained a given number of candies.
Boxplot: Using the 5-number summary we can construct a boxplot that shows us the spread of
our data. Using our sample, the box plot will enable you to see where the majority of our data
lies, and whether or not there were any outliers. See the following pages for both graphs and
keep in mind that my individual Skittles bag contained 59 candies.
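One common way to decide whether a boxplot has outliers is the 1.5 x IQR rule. The report does not say which rule was used, so the sketch below is an assumption; it simply applies that rule to the 5-number summary above. The helper name iqr_fences is made up for this example.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 5-number summary reported for the 25 bags.
five_number = {"min": 53, "q1": 59, "median": 60, "q3": 62, "max": 77}

low_fence, high_fence = iqr_fences(five_number["q1"], five_number["q3"])
print("fences:", low_fence, high_fence)
print("max is an outlier:", five_number["max"] > high_fence)
print("min is an outlier:", five_number["min"] < low_fence)
```

With Q1 = 59 and Q3 = 62 the fences are 54.5 and 66.5, so the bag with 77 candies is flagged. Under this particular rule the bag with 53 candies would be flagged as well.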

Frequency Histogram

Box Plot

The preceding quantitative data is shown in a way that makes it easy to see where the majority of
the data lies. The histogram has a distribution that is skewed right; that means that, although most of
our data lies on the left side of the graph, the tail is being pulled to the right by our outlier of
77. The box plot shows the same right skew; it is simply another
representation of the spread of our data.
After analyzing both graphs I would say that the distribution is about what one would
expect: the middle half of the bags hold between 59 and 62 candies. My individual bag of Skittles,
with 59 candies, lies very near the sample mean of 60.4. It would appear that Skittles Corp does a
decent job of keeping the number of candies in each individual bag relatively equal. It is this
consistency that is so vital to the success of Skittles Corp.
Reflection:
As the preceding information shows, there are two different types of data that are important
pieces of the puzzle here. The first is categorical data, which consists of labels (Red, Green,
Yellow, Orange, Purple). Categorical data is crucial to Skittles because it allows the candy
maker to check that the various colors of candies are distributed equally. Categorical data is well
expressed in pie charts and Pareto charts (as shown); these let viewers quickly see the different
color distributions.
The second type of data is quantitative data, which consists of numbers representing counts or
measurements. Quantitative data is another important tool that allows Skittles Corp to ensure
quality control, and that the number of candies in each bag (of the same size) is nearly the same.
As we have seen, quantitative data is well represented in histograms and boxplots. Additionally,
the mean, the standard deviation, and the 5-number summary all provide an easy
way to summarize the center and spread of the data.
Part II: Confidence Intervals and Hypothesis Tests
Introduction:
Confidence intervals and hypothesis tests are critical in statistics; they allow statisticians to
test claims about a population or to estimate the value of a
population parameter (proportion, mean, standard deviation). The following sections will
explain in detail the purpose of both confidence intervals and hypothesis tests, with examples of
each. Before we begin, however, one thing to understand about confidence intervals and
hypothesis tests is that some conditions must be met before we can
calculate them. Let's discuss those quickly.
Conditions for Confidence Intervals:
The conditions to construct a confidence interval differ for the mean, the proportion, and the
standard deviation. We'll look at each.
Mean: 1.) The sample is a simple random sample. 2.) Either or both of these conditions is
satisfied: the population is normally distributed or n > 30.
Proportion: 1.) The sample is a simple random sample. 2.) There is a fixed number of trials, the
trials are independent, there are two categories of outcomes, and the probabilities remain
constant for each trial. 3.) There are at least 5 successes and at least 5 failures.
Standard Deviation: 1.) The sample is a simple random sample. 2.) The population MUST be
normally distributed.
Confidence Intervals
Let's begin with confidence interval estimates. Confidence intervals allow us to take our
sample Skittles data and construct a range of values for a population parameter, such as the
mean, standard deviation, or proportion. Remember, the objective of statistics is to draw
conclusions about a population using a sample; confidence intervals allow us to do this. By
constructing a confidence interval using our sample data, we will be able to make inferences
about the Skittles population as a whole; for example, we can state a range of values that is
likely to contain the true population parameter. The following pages contain confidence intervals
using the sample Skittles data.
Construct a 99% confidence interval estimate for the true proportion of yellow candies.
In this particular confidence interval, the proportion of yellow candies is being estimated. Based
on the results of this confidence interval, we are 99% confident that, among the entire
population of 2.17 oz bags of Skittles, the true proportion of yellow candies lies between 17.6%
and 23%.
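The interval can be reproduced with the usual normal-approximation formula for a proportion. This is a minimal sketch and not necessarily the exact work shown on the original page; it assumes 306 yellow candies out of 1,511, which matches the yellow share of 20.25% in the class charts. The function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical z value, e.g. about 2.576 for 99% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed counts: 306 yellow candies among 1,511 in the class sample.
low, high = proportion_ci(306, 1511, 0.99)
print(f"99% CI for yellow proportion: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval comes out to roughly 0.176 to 0.229, in line with the 17.6% to 23% reported above.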

Construct a 95% confidence interval estimate for the true mean number of candies per bag.
In this confidence interval, the mean number of candies per bag is being estimated. Based on the
results we can conclude (with 95% confidence) from our sample that, among the entire
population of Skittles bags, the mean number of candies lies between 58.6
candies and 62.2 candies (roughly 59 to 62 candies).
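Because the population standard deviation is unknown and n = 25, the interval for the mean uses the t distribution with 24 degrees of freedom. The sketch below is an assumed reconstruction of that calculation; the critical value 2.064 is taken from a standard t table and is hard-coded, since Python's standard library has no t distribution.

```python
from math import sqrt


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t-based confidence interval for a population mean."""
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Table value (assumption): t critical value for 95% confidence, df = 24.
T_CRIT_95_DF24 = 2.064

low, high = mean_ci(60.4, 4.36, 25, T_CRIT_95_DF24)
print(f"95% CI for mean candies per bag: {low:.1f} to {high:.1f}")
```

The margin of error is about 1.8 candies, which gives the 58.6 to 62.2 range reported above.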

See the following page for the final confidence interval.

Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag.
This last confidence interval deals with the standard deviation of the number of candies per bag
in the population. From our sample we can state with 98% confidence that the population standard
deviation of candies per bag falls between 3.258 and 6.483 candies.
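An interval for a standard deviation is built from the chi-square distribution with n - 1 degrees of freedom. The sketch below is an assumed reconstruction; the two critical values (10.856 and 42.980 for 98% confidence with 24 degrees of freedom) are hard-coded from a standard chi-square table rather than computed.

```python
from math import sqrt


def sd_ci(sample_sd, n, chi2_left, chi2_right):
    """Chi-square confidence interval for a population standard deviation."""
    df = n - 1
    lower = sqrt(df * sample_sd ** 2 / chi2_right)
    upper = sqrt(df * sample_sd ** 2 / chi2_left)
    return lower, upper


# Table values (assumption): chi-square critical values for 98% confidence, df = 24.
CHI2_LEFT = 10.856
CHI2_RIGHT = 42.980

low, high = sd_ci(4.36, 25, CHI2_LEFT, CHI2_RIGHT)
print(f"98% CI for standard deviation: {low:.3f} to {high:.3f}")
```

Note that the larger critical value produces the lower limit, because it sits in the denominator.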

Hypothesis Tests
Hypothesis testing is a procedure for testing a claim (hypothesis) about a property of a
population. The property is often the mean, proportion, or standard deviation. A hypothesis
test therefore starts with a claim about one of the population parameters; it is then the job of
the statistician to test the claim with some calculations and draw a conclusion about its
validity. Hypothesis tests allow us to reject or fail to reject a given claim. It is this decision
that then allows us to draw a conclusion about the population. With hypothesis testing it is
important to understand the idea of the null hypothesis and the alternative hypothesis. The
following definitions were taken from Elementary Statistics by Mario F. Triola. Additionally, as
with confidence intervals, hypothesis tests have some conditions that must be met in order to be
carried out.
Null Hypothesis: Statement that the value of the population parameter is equal to some value.
Alternative Hypothesis: Statement that the population parameter has a value that somehow
differs from the null (whether it be <, >, or not equal).
The goal of the hypothesis test is to either reject or fail to reject the null hypothesis.
The hypothesis tests on the following pages relate directly to our sample Skittles data. They
allow us to analyze some claims made about our Skittles population.
Conditions for Hypothesis Tests:
Mean: 1.) The sample is a simple random sample. 2.) Either or both of these conditions is
satisfied: the population is normally distributed or n > 30.
Proportion: 1.) The sample observations are a simple random sample. 2.) The conditions for a
binomial distribution are satisfied. 3.) There must be at least 5 successes and 5 failures.
Standard Deviation: 1.) The sample is a simple random sample. 2.) The population has a normal
distribution.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
This particular hypothesis test examines the claim that 20% of all Skittles candies are red. In this
case our null hypothesis is the claim. After performing the hypothesis test we did in fact fail to
reject the claim. To quickly explain the work shown: in this hypothesis test we found a test
statistic, which in this case was a z score of 1.21. After finding the test statistic, I found the
critical values that correspond to an area of 0.025 in each tail. The critical values are z scores of
-1.96 and 1.96 (the value 1.92 shown in the work is incorrect). Because my test statistic fell
outside the shaded region (known as the rejection region), I failed to reject the null. There is
therefore not sufficient evidence to warrant rejection of the claim that 20 percent of all Skittles
are red.
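The test statistic and critical values can be checked with a short calculation. This sketch assumes 321 red candies out of 1,511 in the class sample, a count that reproduces the z score of 1.21; the function name and return layout are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z test of the null hypothesis that the proportion equals p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = abs(z) > z_crit
    return z, z_crit, p_value, reject_null


# Assumed counts: 321 red candies among 1,511 in the class sample.
z, z_crit, p_value, reject_null = proportion_z_test(321, 1511, 0.20, 0.05)
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject null: {reject_null}")
```

Since the test statistic is smaller in size than the critical value of about 1.96, it falls outside the rejection region and the null hypothesis is not rejected.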

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.
This particular hypothesis test examines the claim that the mean number of candies in a bag is 55.
In this case our null hypothesis is the claim. After performing the hypothesis test we reject the
null. Our test statistic of t = 6.193 fell inside the rejection region, leading to rejection of the null.
Because we rejected the null, there is sufficient evidence to warrant rejection of the claim that
the mean number of candies per bag equals 55. The two-tailed test by itself only tells us that the
mean differs from 55, although our sample mean of 60.4 suggests that the true mean is above the
claimed 55 candies.
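The t statistic follows directly from the sample mean, standard deviation, and sample size reported earlier. In the sketch below the critical value 2.797 (two-tailed, significance level 0.01, 24 degrees of freedom) is an assumption taken from a standard t table and hard-coded.

```python
from math import sqrt


def one_sample_t(sample_mean, claimed_mean, sample_sd, n):
    """t test statistic for a claim about a population mean."""
    return (sample_mean - claimed_mean) / (sample_sd / sqrt(n))


# Table value (assumption): two-tailed critical t for alpha = 0.01, df = 24.
T_CRIT_01_DF24 = 2.797

t_stat = one_sample_t(60.4, 55, 4.36, 25)
reject_null = abs(t_stat) > T_CRIT_01_DF24
print(f"t = {t_stat:.3f}, reject null: {reject_null}")
```

The statistic lands well beyond the critical value, which is why the claim of a mean of 55 candies is rejected.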

Reflection: Confidence Intervals and Hypothesis Tests


Now let's wrap this up. As we have just seen, confidence intervals and hypothesis tests are a great
way of using a sample to draw conclusions about a population. With our hypothesis tests we
were given a claim, and after doing some calculations we were able to make a judgment about the
entire Skittles population. We did the same thing with our confidence intervals. After working
some calculations based on our sample, we were able to draw some conclusions about the
proportion of yellow candies for the entire Skittles population, as well as the mean number of
Skittles per bag for the entire Skittles population.
While we did try our hardest to construct a sample in a way that was suitable for accurate
calculations, there were some factors that could lead to errors. The key to a well-designed
statistical experiment is obtaining data through a simple random sample, which means that
every sample of the same size has the same chance of being selected. We cannot consider our
sample a truly simple random sample because all of
the Skittles in our sample were obtained in the same region (Northern Utah). In order for our
sample to have been truly random, we would need packages of Skittles from New York to L.A.,
and even from as far away as China or India. It is this lack of true randomness that makes our
inferences potentially incorrect. The fact remains, however, that we tried our darndest to be
random, and with our sample we constructed our data so you could see the statistical process
anyway. Hopefully, now that you have completed the journey over the rainbow, you will think of
just how statistical that bag of Skittles you bought truly is!
