Zeph Smith
Fall 2014
Math 1040
appropriate. I have learned how to test a sample or population, determine its confidence level, and evaluate claims made about that data.
Term Project #1
Color    Count
Red      11
Orange   13
Yellow   12
Green    10
Purple   15
Total    61
Term Project #2
The graphs below show the results of a term project in my Statistics class. We were given the task of buying a 2.17 oz. bag of Skittles, counting the candies, and submitting the counts to our teacher. The first two graphs represent categorical data. They can be read at a glance and show that the bags contained roughly what you would expect when purchasing a bag of Skittles. The last two charts (a box plot and a histogram) represent quantitative data and show how consistent the results were from bag to bag. Each graph is discussed below. My counts were: Red = 11, Orange = 13, Yellow = 12, Green = 10, Purple = 15, for a total of 61 Skittles.
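Each slice of a pie chart is a relative frequency: a color's count divided by the bag total. Here is a minimal Python sketch of that arithmetic for my own bag. It uses only the counts listed above, and the function name `proportions` is illustrative.

```python
# Counts from my 2.17 oz bag (from the Term Project #1 table)
my_counts = {"Red": 11, "Orange": 13, "Yellow": 12, "Green": 10, "Purple": 15}

def proportions(counts):
    """Return each category's share of the total (its relative frequency)."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

total = sum(my_counts.values())
for color, share in proportions(my_counts).items():
    # Print color, raw count, and proportion rounded to three decimals
    print(f"{color:<7}{my_counts[color]:>3}  {share:.3f}")
print(f"Total  {total:>3}")
```

The class-wide pie chart uses the same calculation, applied to the class totals instead of a single bag.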
[Figure: Pie chart of class proportions by color. Recoverable slice labels: Red .205, Green .207, Orange .183, Yellow .195.]
[Figure: Bar chart, "TOTALS for each Color," with a linear trendline. Recoverable values: 2435 candies in total; individual color bars of 446, 500, 474, 503, and 512.]
Histogram of candies per bag: this distribution is right skewed. The red line indicates what a normal distribution of candy counts would look like.
Looking at this graph and the box plot, I could tell that one or two people must have bought a bag of Skittles larger than requested. The graphs still show that the quantity of candy varies from bag to bag. The red line in the histogram shows what a normal distribution would look like. Without the three outliers above 90 Skittles, the distribution would be approximately normal. I realize that this is only a sample of 38 bags and that a larger sample could very well look closer to a normal distribution. Even so, after reviewing all the data, it is still pretty close.
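Box plots commonly flag outliers with the 1.5 × IQR rule. Under this rule, a value is an outlier if it lies more than 1.5 interquartile ranges below Q1 or above Q3. The sketch below assumes that rule and uses Python's `statistics.quantiles`. Its default quartile method may differ slightly from a graphing calculator's. The list of counts is made up purely for illustration and is not the class data.

```python
import statistics

def iqr_outliers(counts):
    """Return values outside the 1.5 * IQR fences used by many box plots."""
    q1, _median, q3 = statistics.quantiles(counts, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in sorted(counts) if x < low_fence or x > high_fence]

# HYPOTHETICAL candies-per-bag counts for illustration only (NOT the class data)
example_counts = [58, 60, 61, 62, 63, 64, 65, 66, 67, 94]
print(iqr_outliers(example_counts))
```

A single oversized bag lands far above the upper fence, which is how the box plot makes a wrong-size bag stand out.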
In summary, categorical graphs (pie and Pareto charts) and quantitative graphs (box plots and histograms) show two different aspects of the data. Categorical graphs show how the overall total breaks down by category. Quantitative graphs look at all the data as a whole and can reveal erroneous values that might distort the categorical picture. After looking at all the data available from the graphs, we determined that the outliers did not affect the results much.
Term Project #3
Confidence Intervals: A confidence interval is a range of values used to estimate the true value of a population parameter. It is useful because it provides lower and upper bounds that can be used to check whether a claimed value is consistent with the estimate.
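To illustrate how such an interval is built, here is a sketch of the usual normal-approximation interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n). It rests on two assumptions. First, it uses 474 yellow candies out of 2435, a count inferred from the bar chart values because 474/2435 ≈ .195, the yellow slice. Second, it uses a 99% confidence level (z ≈ 2.576); the level for the yellow interval is an assumption here.

```python
import math

def proportion_ci(successes, n, z):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error E
    return p_hat - margin, p_hat + margin

# ASSUMPTIONS: 474 yellow of 2435 candies (inferred from the charts); 99% level, z = 2.576
low, high = proportion_ci(474, 2435, 2.576)
print(f"{low:.4f} < p < {high:.4f}")
```

The interval comes out to roughly 17.4% to 21.5% yellow, centered on the 19.5% point estimate.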
Based on these three confidence intervals, a typical bag of Skittles contains about 64 candies, about 19.5% of which are yellow. The standard deviation between bags is roughly 13 Skittles. This means it is not uncommon for a bag to contain about 13 candies more or fewer than average. I disagree with this result. Thirteen candies is about 1/5 of a 64-candy bag, and if you tried to cram 1/5 more candies into a standard 2.17 oz bag, the packaging would fail, causing further waste and cost to the company. This suggests that some of the class data was miscalculated or that some students sampled a different size of bag.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
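We fail to reject the null hypothesis H0: p = 0.20 because the P-value of 0.5092 is greater than the 0.05 significance level (test statistic z = 0.659). There is not sufficient evidence to reject the claim that 20% of all Skittles are red.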
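The test above can be reproduced with a one-proportion z test, z = (p̂ − p0)/√(p0(1 − p0)/n). This Python sketch assumes 500 red candies out of 2435. That count is inferred from the .205 red slice, since 500/2435 ≈ .205.

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# ASSUMPTION: 500 red of 2435 candies, inferred from the .205 red slice
z, p_value = one_proportion_z_test(500, 2435, 0.20)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, P-value = {p_value:.4f} -> {decision}")
```

With that count, z comes out near the report's 0.659 and the P-value near 0.51. Small differences from the reported 0.5092 can come from rounding. The decision rests on comparing the P-value with α, not with z.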
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.
This claim is not supported. We reject the null hypothesis H0: μ = 55 because 55 does not fall within the 99% confidence interval 58.286 < μ < 69.914 (sample mean 64.1).
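The confidence-interval approach used above can be sketched as follows: reject H0 at α = 0.01 when μ0 = 55 falls outside the 99% interval. The sketch makes two assumptions. It uses s = 13, the report's rounded standard deviation, and t ≈ 2.715, an approximate t-table value for 99% confidence with 37 degrees of freedom.

```python
import math

def mean_ci(xbar, s, n, t_crit):
    """t-based confidence interval for a population mean (sigma unknown)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# ASSUMPTIONS: s = 13 (rounded, from the report); t_crit = 2.715 (99%, df = 37, from a t table)
n, xbar, s, t_crit = 38, 64.1, 13, 2.715
low, high = mean_ci(xbar, s, n, t_crit)
mu0 = 55
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # test statistic for H0: mu = 55
inside = low < mu0 < high
print(f"{low:.3f} < mu < {high:.3f}; t = {t_stat:.2f}; "
      f"{'fail to reject' if inside else 'reject'} H0: mu = {mu0}")
```

Because s is rounded, this interval comes out slightly narrower than the reported 58.286 < μ < 69.914. Either interval excludes 55, and the t statistic (about 4.3) is well beyond the critical value, so the decision is the same.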
Our two hypothesis tests lead us to conclude that the data are consistent with red Skittles making up about 20% of a bag, and that the number of candies in a bag does not vary by as much as ±13 candies. The claim of 55 Skittles per bag is rejected because our data point to a higher mean. This leads us to believe that some students used a larger bag, contrary to the instructions. Further evidence supporting our decision is that our sample mean was 64 candies per bag.
Reflection
Conditions for using confidence intervals and hypothesis tests:
1. The data must come from a simple random sample.
2. The sample must be sufficiently large (n > 30).
3. The population must be approximately normally distributed (this condition can be relaxed when n > 30).
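Condition 2 can be checked with simple arithmetic. This Python sketch assumes the common textbook rules of thumb: n > 30 for a mean, and at least 5 expected successes and 5 expected failures (np ≥ 5 and nq ≥ 5) for a proportion. It uses the class figures of 38 bags and 2435 candies, with p0 = 0.20 from the red-candy claim. The function name is illustrative.

```python
def sample_size_checks(n_bags, n_candies, p0):
    """Rule-of-thumb sample-size checks for a mean test and a proportion test."""
    mean_ok = n_bags > 30                          # n > 30 for the mean
    expected_successes = n_candies * p0            # np
    expected_failures = n_candies * (1 - p0)       # nq
    proportion_ok = expected_successes >= 5 and expected_failures >= 5
    return mean_ok, proportion_ok

# Class figures: 38 bags, 2435 candies; p0 = 0.20 from the red-candy claim
print(sample_size_checks(38, 2435, 0.20))
```

Both checks pass comfortably for the class data. The simple-random-sample condition cannot be checked this way and depends on how the bags were chosen.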
My sample definitely met these conditions. My bag had 61 candies, which falls within the 99% confidence interval. I also made sure the bag was labeled 2.17 oz, and my counts per color were in line with the other bags sampled in the class, apart from a few outliers.
Possible sources of error in our data include candies eaten before counting and bags of the wrong size being sampled.
We could have improved sampling in two ways. Everyone who purchased the candies could have brought the bag into class for verification, or we could have gathered other metadata, such as a photograph of each bag before it was sampled. We could then identify outliers and either discard that data or have the student purchase another (correctly labeled) bag to sample.
The main conclusion I have drawn from this research is that it is hilarious to hear statements from all kinds of sources (news, products, reviews) presenting so-called facts without giving the audience a sample size or explaining how the data was collected. As if that weren't misleading enough, they go on to claim benefits or insist that the opinion they are giving is correct and unbiased.