
MATH 1040: Final Skittles Project Osiris Garcia

Prof. Elizabeth Jones

Statistics is the science of collecting, organizing, summarizing, and analyzing
information to draw conclusions or answer questions. In Math 1040, we are
introduced to the basic principles and tests used to gather data and make inferences about a population.
For our semester project we conducted a study to see if we could estimate certain quantities
for the candies in an average bag of Skittles. Throughout the course we have gained the
skills needed to carry out this study successfully.

The Skittle Bag Project.

In order to conduct our experiment, we need to complete the first step: obtaining a
sample. A sample is a set of data collected from, and forming a subset of, the population
being studied. To collect the data, our professor asked each student to purchase
a regular 2.17 oz bag of Skittles. We then compiled the data from all of the bags, which became our
sample. But how do we know this sample is okay to use? Samples are at their best when the
group is selected at random. When we were asked to purchase a bag of Skittles, the
randomization took care of itself. Every student purchased their own bag; however, not every
student went to the same store and not every student lives in the same area. Therefore, the
sample can be viewed as random. Below are the proportions I expected for each color alongside the proportions observed in the class sample:
                 Red     Orange  Yellow  Green   Purple
Expected Prop.   0.2     0.2     0.2     0.2     0.2
Observed Prop.   0.204   0.191   0.213   0.195   0.198

After collecting our own bag of Skittles, we were asked what proportion or percentage of
each color we expected to see in it. I answered the following: “I expect that
there should be an equal proportion of all skittle colors. I believe that there should be an equal
amount of skittles for each color available. A rainbow should have equal representation of colors
and as their slogan says “Taste the rainbow” I believe there should be equal representation for
the skittles of each color”. This statement will be considered our hypothesis: it is the
outcome we expect to see once we have gathered enough data to test whether our guess is correct and Skittles are produced
in equal amounts of each color.
Now that we have collected our personal data we must complete the next step: organizing
and compiling the data along with that of our fellow classmates. To make the data easier to
interpret, we will use graphs to display it. Graphs allow
us to more easily see and understand the general trends within the sample
data. Below are graphical summaries of the Skittle color proportions gathered by the entire class.
(Total: 2291 Skittles)

My Bag Proportions and Count vs. The Class Proportions and Count
                  Red     Orange  Yellow  Green   Purple  Total
My Bag (prop.)    0.203   0.186   0.254   0.153   0.203   1.00
My Bag (count)    12      11      15      9       12      59
Class (prop.)     0.204   0.191   0.213   0.195   0.198   1.00
Class (count)     467     437     487     447     453     2291
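As a quick check on the table above, the proportions can be recomputed from the raw counts. This is a minimal sketch: the counts are copied from the table, while the names `my_bag`, `class_counts`, and `proportions` are made up for illustration.

```python
# Color counts copied from the table above.
my_bag = {"Red": 12, "Orange": 11, "Yellow": 15, "Green": 9, "Purple": 12}
class_counts = {"Red": 467, "Orange": 437, "Yellow": 487, "Green": 447, "Purple": 453}


def proportions(counts):
    """Return each color's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_counts)

# Print the two sets of proportions side by side for comparison.
for color in my_bag:
    print(f"{color:<7} my bag: {my_props[color]:.3f}   class: {class_props[color]:.3f}")
```

Dividing each count by its own total (59 for the single bag, 2291 for the class) is what makes the two rows comparable even though the sample sizes differ so much.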

Once again, how do we know this sample is random, and what population exactly is being
studied? The data the class collected can be treated as random. Each student had the task of
going to a store and purchasing a bag of Skittles. Given this task, it is very likely that not all
students purchased their bag in the same store. Likewise, it is also likely that not all
students live within the same city or county. When arriving at the store and picking out their bag,
the students may have chosen in different ways. One student may
have taken the one in the front, another may have grabbed one in the middle, and another may have
taken the bag all the way in the back. These bags of Skittles were therefore randomly selected for
this sampling project. The population is all of
the bags of Skittles that are out there in the world, because we are studying the
color proportions contained in Skittles bags in general.
Now, as I view the graphs, they seem to represent the data close to what I had expected.
I had expected equal representation of all the colors within a
single bag of Skittles. When compiling the data I noticed that the proportions are, in fact, very
close to what I had expected. Although the graphs match my expectations, we
need to make sure there are no outliers in the data collected. Outliers are extreme
values that may affect the study's outcome. On average there seems to be a roughly equal
proportion of each color throughout the data, and no color's proportion stands far apart from the others.
It can therefore be concluded that there do not appear to be any
outliers among the color proportions. Lastly, we must compare the compiled data with that of our own bag. The
distribution of colors in the class sample does not fully match that of my own bag; however, they
are very similar. The most noticeable differences are the proportions of yellow (0.254 versus 0.213 for the class)
and green (0.153 versus 0.195) Skittles in my own
bag. This can be explained: when one collects data on a single individual, the data may
appear skewed or uneven, yet when collecting data on many individuals the proportions
observed will be closer to the true proportions.
Once we have compiled and organized our data it is time for step three: data
summarization and analysis. The data we have compiled can be summarized in 5 simple
numbers: Min, Q1, Median, Q3, and Max. The Min is the smallest value in our results, also
known as the minimum. Q1, the first quartile, is the middle value between the minimum and the median. The
median is the middle value of all our data once it is sorted. Q3, the third quartile, is the middle value between the median and the
maximum. The Max is the greatest value in our results, also known as the maximum.
Two other important summary statistics are the mean, which is the average of all our data, and the standard deviation, which measures the
variation within our sample.
Using the total number of candies in each bag in our class sample, I will compute
the following: the mean number of candies per bag, the standard deviation of the number of candies per
bag, and the 5-number summary (all rounded to one decimal place). Summarizing the number of
Skittles per 2.17 oz bag gave the following summary statistics.
To compute the mean, we add all of the bag totals and divide by the number of bags:
(59+61+57+61+59+59+60+60+60+59+59+59+63+59+57+59+57+37+63+57+58+61+53+58+60+58+60+56+59+60+59+58+93+61+64+78+72+58) / 38
= 2291 / 38 ≈ 60.3

The mean number of candies per bag in our class sample is 60.3.
To calculate the standard deviation we will use technology.
The resulting standard deviation is 7.8.
Finally, we will continue to use technology to get our 5-number summary.
- Min: 37, Q₁: 58, Med: 59, Q₃: 61, Max: 93
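The "technology" step above could be reproduced with a short script. This is a minimal sketch, not the calculator procedure used in class. It assumes the median-of-halves rule for quartiles (the convention in many introductory courses), and the helper name `five_number_summary` is made up for illustration.

```python
import statistics

# Total candies in each of the 38 bags, copied from the sum above.
bag_totals = [
    59, 61, 57, 61, 59, 59, 60, 60, 60, 59, 59, 59, 63, 59, 57, 59, 57, 37, 63,
    57, 58, 61, 53, 58, 60, 58, 60, 56, 59, 60, 59, 58, 93, 61, 64, 78, 72, 58,
]


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile rule.

    Assumes at least two values. For an odd count the middle value is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


mean_candies = statistics.mean(bag_totals)
sd_candies = statistics.stdev(bag_totals)  # sample standard deviation (n - 1)
summary = five_number_summary(bag_totals)

print(f"mean = {mean_candies:.1f}, sd = {sd_candies:.1f}")
print("min, Q1, median, Q3, max =", summary)
```

Other quartile conventions (for example, interpolation-based ones) can give slightly different Q1 and Q3 values on the same data, which is why the rule is stated explicitly here.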

Once we have our numerical summary statistics, it is important to be able to see
the trends within our sample visually. However, summary statistics call for a
different sort of graph. A boxplot is used to display this type of data. This
type of graph sits above a number line and stretches out horizontally. The vertical lines
on the plot mark our summary statistics and make it easy to visualize the spread or
variation within our data.
Finally, we are able to interpret what we see in our graphical summaries. Observing
the frequency histogram, we can conclude that the distribution is slightly skewed to the right. Most of the data
lie between 50 and 59. We do have some outliers in our data, unlike what I had inferred
before; outliers are values that fall outside the lower and upper fences. To calculate the lower fence we
take Q₁ - 1.5 × IQR = 58 - 1.5(3) = 53.5; likewise,
the upper fence is Q₃ + 1.5 × IQR = 61 + 1.5(3) = 65.5. The values in our data that lie outside of
these fences are 37 and 53 on the low end and 72, 78, and 93 on the high end. These graphs did not surprise me.
While entering the counts (total Skittles per bag) I knew there
would be outliers: having seen the most common values being entered, pinpointing those that “did
not belong” was easy. I assumed that the data would be skewed to the right because most bags
would contain about the same number of Skittles, without enough variation to
create a bell-shaped graph. Overall, the class data does in fact reflect what I observed
in my own bag. Since the total for my bag is among the “common” values in the data,
the information conformed to what I believed the results were going to be. However, the two
most extreme outliers did catch me off guard. Yet, overall, the data match my expectations
of what it would look like.
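The fence calculation described above can be sketched in a few lines. The quartiles (58 and 61) and the bag totals come from the report; the function name `outlier_fences` is made up for illustration.

```python
# Total candies in each of the 38 bags, as listed in the mean calculation.
bag_totals = [
    59, 61, 57, 61, 59, 59, 60, 60, 60, 59, 59, 59, 63, 59, 57, 59, 57, 37, 63,
    57, 58, 61, 53, 58, 60, 58, 60, 56, 59, 60, 59, 58, 93, 61, 64, 78, 72, 58,
]


def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower_fence, upper_fence = outlier_fences(58, 61)

# Any bag total strictly outside the fences is flagged as an outlier.
outliers = sorted(x for x in bag_totals if x < lower_fence or x > upper_fence)

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```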
Taking all of this information into consideration, we must now take on the question of whether
this data is categorical or quantitative. In statistics we encounter two types of
data: categorical and quantitative. Categorical data sorts individuals into categories according to
their qualities. It does not make sense to perform mathematical operations (e.g., adding or
subtracting) on this data because the values are labels rather than measurements. Some examples include gender, zip
codes, nationalities, and colors. Because of the type of data collected, only
certain graphs can be used to summarize it: bar charts, Pareto charts, and pie charts.
What these graphs have in common is that the data presented in them can be organized into
categories. Graphs that would not be appropriate for categorical data are
histograms, dot plots, boxplots, and stem-and-leaf plots. These graphical summaries all require
numerical data, something that categorical data cannot provide. Quantitative data, on the other hand,
is numerical; in other words, it is
countable or measurable. This data can be
processed mathematically by addition and subtraction, so
certain graphs suit it: histograms, dot plots, stem-and-leaf plots, and boxplots.
Examples of quantitative data include temperatures,
heights, and ages. The graphs that would not fit quantitative data are
bar charts, pie charts, and Pareto charts, because the information cannot be put into categories.
We can therefore conclude that our data on the number of Skittles per bag is quantitative, while the color of each candy is categorical.
Knowing everything we now know, we are able to construct a confidence interval.
A confidence interval is just what it sounds like: an interval, computed from a sample, that is likely to contain the true
value for the population (for example, the true proportion of yellow Skittles). The confidence level describes how often
intervals built this way from other samples would contain that true value. The main purpose of a confidence interval is to give a range of plausible
values for the quantity being estimated. It allows us to estimate what to expect
when opening a bag of Skittles. This leads us into step four: testing and drawing a conclusion.
For this project we were asked to construct intervals for two different numerical values:
the proportion of yellow candies in a Skittles bag and the average number of Skittles
per 2.17 oz bag. Below is a step-by-step process for both confidence intervals.
We will use the following statistics and proportions:

- Statistics
o Min = 37
o Q1 = 58
o Med = 59
o Q3 = 61
o Max = 93
o Mean = 60.3
o Standard Dev. = 7.8
- Proportions
o Yellow = 20.4%
o Red = 19.1%
o Purple = 21.2%
o Green = 19.5%
o Orange = 19.8%
Construct a 95% Confidence Interval Estimate for the population proportion of yellow
candies.
o The formula used to compute a confidence interval for a proportion is the
following:
    p̂ ± z · √( p̂(1 - p̂) / n )
o Here p̂ is the observed proportion, n is the sample size, and z is the critical value from the
standard normal distribution. To find the critical value we will use the inverse normal function on our calculator.
    - invNorm(0.975, 0, 1)
      1.959963987 or 1.96
o Now we'll insert all of our information into our formula.
    - When calculating the second part of the formula (the part after the ±), it results in: 0.065
o We will now both subtract it from and add it to the observed proportion:
    - 0.204 - 0.065 = 0.139
    - 0.204 + 0.065 = 0.269
o The interval estimate is (0.139, 0.269)
- Now we must interpret the results:
o I am 95% confident that the true proportion of yellow candies within a bag of
Skittles lies between 0.139 and 0.269.
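The steps above follow the usual normal-approximation interval for a proportion, which can be sketched as below. This is a minimal illustration under stated assumptions: `proportion_ci` is a made-up helper name, and the sample size n = 2291 (the class total of candies) is an assumption, since the substitution step of the report is not shown. The observed proportion 0.204 is the value used in the report.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Returns (lower bound, upper bound, margin of error).
    """
    # Critical value z such that the central `confidence` area lies within +/- z.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Assumed inputs: p_hat from the report, n = 2291 candies (an assumption).
low, high, margin = proportion_ci(0.204, 2291)
print(f"margin = {margin:.4f}, interval = ({low:.3f}, {high:.3f})")
```

With these assumed inputs the margin is about 0.0165. The margin shrinks as n grows, so the interval depends heavily on which sample size is substituted, and a smaller n gives a wider interval.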
Construct a 95% Confidence Interval Estimate for the population mean number of
candies.
o The formula to compute a confidence interval for a population mean is the
following:
    x̄ ± t · s / √n
o Here x̄ is the sample mean, s is the sample standard deviation, n is the number of bags, and t is the
critical value from a t-distribution with n - 1 = 37 degrees of freedom. To find this value
we will use the inverse t function on our calculator.
    - invT(0.975, 37)
      2.026192447 or 2.026
o Now we will insert our information into our formula.
    - When calculating the second part of the formula we get: 2.026 · 7.8 / √38 ≈ 2.56
o We will now subtract the result from and add it to our sample mean:
    - 60.3 - 2.56 = 57.74
    - 60.3 + 2.56 = 62.86
o The interval estimate is (57.74, 62.86)
- Now we will interpret the results:
o I am 95% confident that the true mean number of Skittles lies between
57.74 and 62.86 candies per bag.
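The same arithmetic can be sketched in code. The critical value 2.026 is taken from the calculator output above and passed in as a parameter, because the Python standard library has no inverse t function. The helper name `mean_ci` is made up for illustration.

```python
import math


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t-based confidence interval for a population mean.

    t_crit is supplied by the caller (here, the invT result from the text).
    Returns (lower bound, upper bound, margin of error).
    """
    margin = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Values from the report: mean 60.3, standard deviation 7.8, 38 bags, t = 2.026.
low, high, margin = mean_ci(60.3, 7.8, 38, 2.026)
print(f"margin = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
```

Passing the critical value in explicitly keeps the sketch dependency-free. With a statistics library available, the same value could instead be looked up from a t-distribution with 37 degrees of freedom.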
We can expect the proportion of yellow Skittles to be between 13.9% and 26.9%
of all Skittles, and we can expect the average bag to contain between about 57.7 and 62.9 candies. Confidence
intervals, however, are only reliable to an extent. Since we are only 95% confident in our
calculations, there is a possibility that the true values fall outside of our intervals because of that
5% uncertainty.
All in all, this project has summarized some of the basic concepts and tests needed to begin
a small path into statistics. Everywhere we go and everywhere we look, we are
surrounded by statistics: the billboards about drinking and driving, the news about certain foods
and diabetes, and much more. Statistics are the foundation of major research projects going on
each and every day. It is possible to conduct a statistical study on pretty much
anything you want. Just go out there and look for your next statistical encounter or curiosity.
