
Math 1040

Zeph Smith
Fall 2014

Statistics Term Project Reflection


By
Christopher Mabey
When I first came to statistics class, I knew it was going to be different from all the other math I was used to. The first thing we were asked to do was buy a 2.17 oz bag of Skittles, count the candies of each color, and submit the results to our teacher. Fast forward to the end of the semester, and now I know why this project was done the way it was, and why knowing about sample sizes, types of samples, and types of data, as well as testing the data to see if it is correct, is so important.
The benefits of what I have learned in statistics will go far, as I am going to school to become an Information Systems Manager. Knowing how to use graphs such as box plots and histograms will give me a better tool for presenting numerical data, unless I need a pretty categorical graph such as a Pareto or pie chart.
Hypothesis testing of a claim is something I do not think I will use often, but being able to test a claim at a 90-99% confidence level is an awesome skill. This skill is also useful for checking whether what others claim is accurate. If my test statistic falls beyond the critical value at the same confidence level, then I have evidence to dispute what somebody else states. This is something I feel a manager needs, and I now understand why statistics is required for my major.
In conclusion, I know the requirements for a good sample. I understand what different charts show about the data collected and in what situations each chart is appropriate. I have learned how to test a sample or population, build confidence intervals for it, and test claims made about that data.

Term Project #1

Color     Count
Red       11
Orange    13
Yellow    12
Green     10
Purple    15
Total Candies in Bag: 61
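The pie chart in Term Project #2 reports each color's share as its count divided by the total count. A minimal Python sketch of that arithmetic, applied to the counts from my own bag above; the function name `color_shares` is hypothetical and only for illustration. The last line applies the same ratio to the class purple total (512 of 2435 candies).

```python
# Counts from my own 2.17 oz bag (Term Project #1)
my_bag = {"Red": 11, "Orange": 13, "Yellow": 12, "Green": 10, "Purple": 15}


def color_shares(counts):
    """Return the total count and each color's share of that total (hypothetical helper)."""
    total = sum(counts.values())
    shares = {color: count / total for color, count in counts.items()}
    return total, shares


total, shares = color_shares(my_bag)
print("Total candies:", total)
for color, share in shares.items():
    print(f"{color:<7}{share:.3f}")

# The same ratio gives the class pie-chart proportion for purple: 512 of 2435
print(f"Class purple share: {512 / 2435:.3f}")
```

The shares always sum to 1, which is a quick check that no color was left out of the total.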

Term Project #2
The graphs below show the results of a term project in my Statistics class. We were given the task of buying a 2.17 oz bag of Skittles, counting the candies of each color, and submitting the counts to our teacher. The first two graphs show categorical data; at a glance they suggest that the bags contained roughly what you would expect when purchasing a bag of Skittles. The last two charts (Box-Plot, Histogram) show quantitative data and indicate how fair the results were for the categorical data. Each graph is discussed below it. My counts were: Red = 11, Orange = 13, Yellow = 12, Green = 10, Purple = 15, for a total of 61 Skittles.

[Pie chart: Class Skittle Totals. Share of all candies by color: Purple 512 (.210), Red (.205), Green (.207), Orange (.183), Yellow (.195)]

Here we see the totals of each color of Skittles in a pie chart.


[Pareto chart: Skittle Color Comparison, candy totals for 38 bags. X-axis: totals for each color; y-axis: candy total (0 to 3000). Values shown: 2435 in total, and 446, 500, 474, 503, and 512 for the individual colors, with a linear trendline.]

Here we see an example of a Pareto chart.


Looking at the two categorical charts, I thought everything was going to be pretty even, because the factory has sophisticated machinery that makes sure each bag is filled correctly. While putting together the charts, I could see there would be some outliers, but I would have to wait and see what the quantitative data would show me. My own data fell right in line with these numbers.


Here we see a quantitative graph called a Box-Plot.

To build this graph you need the 5-number summary (Min, Q1, Median, Q3, Max); I also calculated the sample size and standard deviation.
Sample Size = 38, Std. Dev. = 13.20, Min = 45, Max = 114, Q1 = 59.3, Median = 61, Q3 = 62
This result shows 6 outliers that affect the overall data given in the categorical charts above. That said, the color distribution still appears to be on par with the smaller quantities of Skittles in most of the other bags.
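Box plots commonly flag outliers with fences placed 1.5 times the interquartile range (IQR) beyond the quartiles. The report does not say which rule the class used, so this is an assumption. A minimal Python sketch using the reported quartiles (quartile conventions differ between calculators, so the reported Q1 and Q3 are used as given); `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the common 1.5 x IQR rule (assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for the 38 class bags
q1, q3 = 59.3, 62
lower, upper = tukey_fences(q1, q3)
print(f"IQR = {q3 - q1:.1f}, fences = ({lower:.2f}, {upper:.2f})")

# Any bag count outside the fences is flagged as an outlier
for count in (45, 61, 114):  # reported Min, Median, Max
    flag = "outlier" if count < lower or count > upper else "ok"
    print(count, flag)
```

With these quartiles the IQR is only 2.7 candies, so the fences sit near 55.25 and 66.05, and both the minimum (45) and the maximum (114) fall outside them.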


[Histogram: Sample Size (38 Bags)]

Here is an example of a histogram. This example is right skewed. The red line indicates what a normal distribution of candies would look like.
When looking at this graph and the Box-Plot, I could tell that one or two people must have bought a bag of Skittles larger than requested. The graph still shows that there is variation in the quantity of candy per bag. Compared with the red normal curve in the histogram, if the three outliers above 90 Skittles were not there, the data would follow a normal distribution. I realize that this is only a sample of 38 bags and that a larger number of samples could very well produce a normal distribution, but after reviewing all the data, it is still pretty close.
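A mean (64.1, from Term Project #3) above the median (61) is consistent with the right skew in the histogram. One rough numeric check, not used in the report, is Pearson's second skewness coefficient, 3(mean - median)/s; some introductory texts treat an absolute value of 1 or more as significant skew. A minimal sketch under that rule of thumb; the function name is hypothetical.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


# Summary statistics reported for the 38 class bags
skew = pearson_median_skewness(mean=64.1, median=61, std_dev=13.20)

# Rule of thumb (assumed): |coefficient| >= 1 counts as significantly skewed
strongly_skewed = abs(skew) >= 1
print(f"Skewness coefficient: {skew:.2f}")
print("significantly skewed" if strongly_skewed else "mild skew by this rule of thumb")
```

The coefficient is positive (right skew) but below 1, so by this rough rule the skew is mild.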
In summary, comparing categorical graphs (Pie and Pareto Charts) with quantitative graphs (Box-Plot and Histogram) shows two different aspects of the data. Categorical graphs focus on the breakdown of the overall picture, while quantitative graphs look at all the data as a whole and can reveal erroneous data that would otherwise distort the categorical insights. After looking at all the data available from the graphs, we determined that the outliers did not impact the results too badly.

Term Project #3
Confidence Intervals - A confidence interval is a range of values used to estimate the true value of a population parameter. It is useful because it provides lower and upper bounds that show whether a value is consistent with the estimate.

99% confidence interval for the true proportion of yellow Skittles:
17.4% < p < 21.5% (sample proportion: 19.5%)
95% confidence interval for the true mean number of Skittles per bag:
59.76 < μ < 68.44 (sample mean: 64.1)
98% confidence interval for the true standard deviation of Skittles per bag:
10.375 < σ < 17.972 (sample standard deviation: 13.2)
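These three intervals can be recomputed from the class summary statistics. A minimal Python sketch, assuming a yellow total of 474 (the Pareto chart value whose share of 2435 matches the pie chart's .195; the report does not label the bars), with t and chi-square critical values copied from standard tables for 37 degrees of freedom. The function names are hypothetical. With these inputs it reproduces the reported proportion and standard deviation intervals, and for the mean it gives about 59.76 < μ < 68.44.

```python
import math
from statistics import NormalDist

# Class summary statistics from the report
n = 38          # bags sampled
x_bar = 64.1    # mean candies per bag
s = 13.20       # standard deviation of candies per bag
total = 2435    # candies counted in all bags
yellow = 474    # ASSUMPTION: yellow total inferred from the Pareto chart

# Critical values from standard tables, df = n - 1 = 37
T_95 = 2.026            # t with 0.025 in the right tail
CHI2_RIGHT_98 = 59.893  # chi-square with 0.01 in the right tail
CHI2_LEFT_98 = 19.960   # chi-square with 0.99 in the right tail


def proportion_ci(successes, trials, confidence):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin


def mean_ci(mean, std_dev, size, t_crit):
    """t interval: x_bar +/- t * s / sqrt(n)."""
    margin = t_crit * std_dev / math.sqrt(size)
    return mean - margin, mean + margin


def std_dev_ci(std_dev, size, chi2_right, chi2_left):
    """Chi-square interval for sigma: sqrt((n - 1) s^2 / chi2)."""
    ss = (size - 1) * std_dev ** 2
    return math.sqrt(ss / chi2_right), math.sqrt(ss / chi2_left)


p_lo, p_hi = proportion_ci(yellow, total, 0.99)
m_lo, m_hi = mean_ci(x_bar, s, n, T_95)
s_lo, s_hi = std_dev_ci(s, n, CHI2_RIGHT_98, CHI2_LEFT_98)
print(f"99% CI for p:     {p_lo:.1%} < p < {p_hi:.1%}")
print(f"95% CI for mu:    {m_lo:.2f} < mu < {m_hi:.2f}")
print(f"98% CI for sigma: {s_lo:.3f} < sigma < {s_hi:.3f}")
```

A 95% interval for the mean should be a little narrower than the 99% interval (58.286 to 69.914) used in the hypothesis test, which this result satisfies.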

Observing the results of these three confidence intervals, we can see that a typical bag of Skittles contains about 64 candies, about 19.5% of which are yellow. The standard deviation between bags is roughly 13 Skittles, which means it is not uncommon for one bag to have 13 more candies than another. I disagree with this data, because if you tried to cram about one-fifth more candies into a standard 2.17 oz bag, the packaging would fail, causing further waste and cost to the company. This suggests that some of the class data was miscalculated or that some students sampled a different size bag.

Hypothesis Testing - A hypothesis test is a procedure for testing a claim about a property of a population. First an assumption (the null hypothesis) is made, and then we calculate whether the sample provides enough evidence to reject that assumption.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
We fail to reject the claim because the P-value of 0.5092 is greater than the 0.05 significance level (test statistic z = 0.659), so there is not sufficient evidence against it.
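The decision compares the P-value with the significance level, not with the test statistic. A minimal sketch of a two-tailed one-proportion z test, assuming a red total of 500 (the Pareto chart value whose share of 2435 matches the pie chart's .205; the bars are not labeled in the report). It gives z of about 0.659 and a P-value of roughly 0.51; the report's 0.5092 is what a standard normal table gives after rounding z to 0.66.

```python
import math
from statistics import NormalDist

total = 2435   # candies counted in all 38 bags
red = 500      # ASSUMPTION: red total inferred from the Pareto chart
p0 = 0.20      # claimed proportion of red candies (H0: p = 0.20)
alpha = 0.05   # significance level

p_hat = red / total
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / total)  # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed P-value

print(f"z = {z:.3f}, P-value = {p_value:.4f}")
if p_value <= alpha:
    print("Reject H0: evidence against the 20% claim")
else:
    print("Fail to reject H0: not enough evidence against the 20% claim")
```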

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55.
We reject the null hypothesis μ = 55 because 55 does not fall within the 99% confidence interval 58.286 < μ < 69.914, so the claim is not supported.
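For a two-tailed test, checking whether the claimed mean lies inside the 99% confidence interval gives the same decision as comparing the t statistic with the critical value at the 0.01 level. A minimal sketch of both checks, using the reported summary statistics and a critical value of 2.715 taken from a standard t table for 37 degrees of freedom.

```python
import math

n = 38          # bags sampled
x_bar = 64.1    # sample mean candies per bag
s = 13.20       # sample standard deviation
mu0 = 55        # claimed mean (H0: mu = 55)
T_CRIT = 2.715  # t table, df = 37, 0.005 in each tail (alpha = 0.01, two-tailed)

se = s / math.sqrt(n)               # standard error of the mean
t = (x_bar - mu0) / se              # test statistic
ci = (x_bar - T_CRIT * se, x_bar + T_CRIT * se)  # 99% confidence interval

print(f"t = {t:.2f}, critical value = +/-{T_CRIT}")
print(f"99% CI: {ci[0]:.3f} < mu < {ci[1]:.3f}")
reject = abs(t) > T_CRIT
print("Reject H0: mu = 55" if reject else "Fail to reject H0")
```

The interval printed here matches the 58.286 to 69.914 used in the report, and t is about 4.25, well beyond 2.715.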
Our two hypothesis tests lead us to conclude that red Skittles make up about 20% of a bag and that the number of candies in a bag does not vary by as much as ±13 candies. We do not accept the claim of 55 Skittles per bag because 55 is a low estimate of candies per bag, which leads us to believe that some students used a larger bag contrary to the instructions. Further evidence supporting our decision not to accept the claim is that our average was 64 candies per bag.

Reflection
Conditions for using confidence intervals and hypothesis tests:
1. The data must be a simple random sample.
2. The sample must be large enough (at least 30).
3. The data should be approximately normally distributed.
My sample met these conditions: it had 61 candies, which falls within the 99% confidence interval. I also made sure the bag was labeled 2.17 oz, and my counts for each color were in line with the rest of the class's samples, apart from a few outliers.
Possible sources of error in the data we used include eaten candies or the wrong size of bag being sampled.
We could have improved sampling by having everyone who purchased the candies bring the bag to class for verification, or by gathering other evidence, such as a photograph of the bag to be sampled. Then we could identify outliers and either discard the data or have that person purchase another (correctly labeled) bag to be sampled.
One conclusion I have drawn from this research is that it is hilarious to hear statements from all sources (news, products, reviews) presenting so-called facts without even giving the audience a sample size or explaining how the data was gathered. If that weren't erroneous enough, they also make claims about benefits, or about how correct or unbiased the opinion they are giving is.
