
Cameron Jackson

12/4/17
Skittles Research Project

In this project I aim to look at the differences between categorical and quantitative data.
I will do this using data from the class on how many Skittles of each color were in a
regular-sized bag. I will then find the mean, standard deviation, and five-number summary, and
make charts to analyze the data so that I can interpret the variation among bags of Skittles.
The data, with my own bag highlighted, are in the relative frequency table below.

 #  Student               Red  Orange  Yellow  Green  Purple  Total  Relative Frequency
 1  Aylin Ayala S          12      11      12     13      11     59  0.033
 2  Sonia C                13      12       8     12      16     61  0.034
 3  Aaron C                10      12      15     13       8     58  0.032
 4  Brittany C             12      16      12     16       5     61  0.034
 5  Alicia D               18       9      10     14       7     58  0.032
 6  Alex E                  9      16       9     14      14     62  0.034
 7  Erika G                 5      23      15     10      10     63  0.035
 8  Cameron J*              9      12      13     14      15     63  0.035
 9  Brianna M              11       8      17     11      12     59  0.033
10  Maria M                 6      15      13     13      16     63  0.035
11  James N                15      15      10      5      14     59  0.033
12  Justin N               13      14       4     17      12     60  0.033
13  Thomas P               12       9      11     10      12     54  0.030
14  Kharki R               13       8      11     15      10     57  0.032
15  Jennifer R             17      10      15     10      10     62  0.034
16  Stevie S                9      12      13     11      13     58  0.032
17  Kristin S              18      13       3     20       8     62  0.034
18  Bradley T              12       5      14     14       6     51  0.028
19  Joshua V               16      12      10     11       9     58  0.032
20  Sandra Z               12      11      12     13      11     59  0.033
21  Elia F                 14      12       6      9      23     64  0.035
22  Natoshia I              9      20       7     12      11     59  0.033
23  Hannah M               17      14      10     13       9     63  0.035
24  Maria R                13      10      12     15      15     65  0.036
25  Mustapha S             10      15      12     13      15     65  0.036
26  Jennifer S             10      11      14     16      10     61  0.034
27  Laura S                14       9      15     12      11     61  0.034
28  Sabrina S               9       9      10     13      18     59  0.033
29  Guillermo V             6       9      15     14      14     58  0.032
30  Natalie W              16       7      15      9      15     62  0.034
    Total                 360     359     343    382     360   1804  1
    Relative Frequency  0.200   0.199   0.190  0.212   0.200      1  -

* My bag (highlighted in the original table).

First I focused on the categorical data, the Skittles colors, and made a pie chart and a Pareto
chart from the data, both shown above. The graphs turned out about how I expected, with the pie
chart split almost evenly and the Pareto chart showing only a little variation between colors.
Each color made up about a fifth of the candies, which is to be expected. However, the single
bag I started with showed much more variation, with only 9 red Skittles out of 63. This goes
to show that samples must be large to accurately reflect a whole population, and with thirty
bags of Skittles in the data I trust we have a fairly accurate representation of the
population of Skittles.
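
The proportions behind the pie and Pareto charts can be reproduced with a short sketch like the one below. It assumes Python with only the standard library, and the variable names are my own illustration, not part of the class materials. It uses the color totals from the class table and compares the pooled share of red with the share in my single bag.

```python
# Color totals taken from the class table (30 bags).
color_totals = {"Red": 360, "Orange": 359, "Yellow": 343, "Green": 382, "Purple": 360}

grand_total = sum(color_totals.values())  # all candies counted by the class

# Relative frequency of a color = count for that color / total count.
proportions = {color: count / grand_total for color, count in color_totals.items()}

# A Pareto chart orders the categories from most to least frequent.
pareto_order = sorted(color_totals, key=color_totals.get, reverse=True)

# Share of red in my single bag: 9 red out of 63 candies.
my_bag_red_share = 9 / 63

for color in pareto_order:
    print(f"{color:<7}{color_totals[color]:>5}{proportions[color]:>8.3f}")
print(f"Red share in my bag: {my_bag_red_share:.3f}")
```

The pooled share of red comes out close to one fifth, while my single bag is noticeably lower, which is the small-sample variation described above.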

Average Skittles per bag       60.1
Standard Deviation             3.07
Minimum (excluding outlier)    54
Q1                             58
Median                         60.5
Q3                             62
Maximum                        65
Outlier                        51
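
As a rough check on these summary numbers, here is a minimal sketch in Python using only the standard library. The function name `median_of_halves` is my own, and it assumes the quartiles are found as the medians of the lower and upper halves of the sorted data. It also assumes the usual 1.5 x IQR fences for flagging outliers.

```python
import statistics

# Total candies per bag for the 30 bags, in table order.
totals = [59, 61, 58, 61, 58, 62, 63, 63, 59, 63,
          59, 60, 54, 57, 62, 58, 62, 51, 58, 59,
          64, 59, 63, 65, 65, 61, 61, 59, 58, 62]

mean = statistics.mean(totals)
sd = statistics.stdev(totals)      # sample standard deviation (divides by n - 1)
median = statistics.median(totals)

def median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return statistics.median(lower), statistics.median(upper)

q1, q3 = median_of_halves(totals)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in totals if x < low_fence or x > high_fence]
non_outliers = [x for x in totals if low_fence <= x <= high_fence]

print(f"mean={mean:.1f} sd={sd:.2f} median={median}")
print(f"Q1={q1} Q3={q3} min={min(non_outliers)} max={max(non_outliers)}")
print(f"outliers={outliers}")
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different values for Q1 and Q3 on the same data.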

Next, I shifted my focus to the quantitative data, the number of candies per bag, using the
histogram and boxplot shown above. Out of the thirty bags of Skittles, there was one outlier at
51 Skittles, which pulled the mean down a little. The distribution surprised me, with the
histogram skewed to the left and an outlier on the boxplot. I had assumed the number of candies
per bag would follow a standard bell-shaped distribution, but I was wrong. My expectation was
probably off because my own bag, with sixty-three Skittles, was about three above the average.
Overall this part of the project helped me see a real-world example of how to use
statistics and interpret them. I now know categorical data and quantitative data must be treated
differently, with pie charts and bar charts for categorical data and boxplots and histograms for
quantitative data, since quantitative data supports more kinds of calculation and comparison.
This is because the color data is nominal, with categories that have no numeric value or order,
while the candy counts are ratio data, the highest level of measurement. Nominal data can
therefore only be compared with simpler tools like counts, percentages, and pie charts, but
those aren't enough to describe ratio data effectively. There is quite a difference between
the two and in how you can use them to interpret data.

Confidence intervals and Hypothesis Testing

A confidence interval is a range of values, estimated from a sample, that is likely to contain
the true value of a population parameter such as a mean or standard deviation. This lets us
estimate what the data would look like for the whole population and not just for our sample. We
can see this in our Skittles data with a few intervals I have compiled for three different
parameters.
First is the 99% confidence interval for the true proportion of yellow candies, which is
(0.166, 0.214). This comes from a sample proportion of .19 and a margin of error of .0237. This
means that we are 99% confident the proportion of yellow candies in the whole population of
Skittles is between .166 and .214. We're not sure exactly where it falls, but we estimate it is
within those bounds.
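
The interval for the yellow proportion can be sketched as follows. This assumes the usual normal-approximation interval for a proportion, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n), and uses Python's standard library, with variable names of my own choosing. Because it does not round the intermediate values, the margin of error it prints may differ from the reported one in the last digit.

```python
from math import sqrt
from statistics import NormalDist

yellow = 343        # yellow candies in the class data
n = 1804            # all candies in the class data
confidence = 0.99

p_hat = yellow / n

# Two-tailed critical value from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error for a proportion (normal approximation).
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"p_hat={p_hat:.3f} margin={margin:.4f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```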
Second is the 95% confidence interval estimate for the true mean number of candies
per bag, which came out to be (58.95, 61.25). This is calculated from a sample mean of 60.1
candies per bag, a standard deviation of 3.07, and a sample size of 30. Therefore we are 95%
confident the population mean number of Skittles per bag is between 58.95 and 61.25.
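
A minimal sketch of this calculation, assuming a t interval with n - 1 = 29 degrees of freedom. Python's standard library has no t distribution, so the critical value 2.045 is hard-coded from a standard t table, which is my assumption about how the interval was built. The inputs are the rounded sample statistics reported above.

```python
from math import sqrt

n = 30              # bags in the sample
sample_mean = 60.1  # rounded sample mean reported above
sample_sd = 3.07    # rounded sample standard deviation reported above

# Assumed critical value: t table, 29 degrees of freedom, 95% two-tailed.
t_crit = 2.045

margin = t_crit * sample_sd / sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

print(f"margin={margin:.2f} CI=({ci[0]:.2f}, {ci[1]:.2f})")
```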
Last is the 98% confidence interval estimate for the true standard deviation of candies
per bag, which came out to be (2.35, 4.38). Calculated using the sample standard deviation and
sample size stated above, it means we are 98% confident that the population standard deviation
of candies per bag is between 2.35 and 4.38.
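
The standard deviation interval can be sketched the same way, assuming the chi-square method: sqrt((n - 1)s^2 / chi-square right) for the lower bound and sqrt((n - 1)s^2 / chi-square left) for the upper bound. The two critical values below are hard-coded from a standard chi-square table for 29 degrees of freedom, which is an assumption on my part since the standard library has no chi-square distribution.

```python
from math import sqrt

n = 30
sample_sd = 3.07    # rounded sample standard deviation reported above

# Assumed critical values: chi-square table, 29 degrees of freedom, 98% confidence
# (0.01 in each tail).
chi2_right = 49.588
chi2_left = 14.256

sum_sq = (n - 1) * sample_sd ** 2   # (n - 1) * s^2

lower = sqrt(sum_sq / chi2_right)
upper = sqrt(sum_sq / chi2_left)

print(f"CI for sigma = ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric around 3.07 because the chi-square distribution itself is not symmetric.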

Hypothesis tests are used to test a claim about a property of a population. For our Skittles
data we will look at two claims. The first is that 20% of all Skittles are red, which we
tested at a .05 significance level. With a test statistic of -.047, which falls far outside the
rejection region (beyond ±1.96 for a two-tailed test at this level), we don't have enough
evidence to reject the null hypothesis that 20% of all Skittles are red. We therefore fail to
reject the claim, which is not the same as proving it true.
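
The test statistic for the red claim can be reproduced with the sketch below. It assumes a two-tailed one-proportion z test, z = (p-hat - p0) / sqrt(p0(1 - p0)/n), and uses only Python's standard library. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

red = 360       # red candies in the class data
n = 1804        # all candies in the class data
p0 = 0.20       # claimed proportion of red candies
alpha = 0.05    # significance level

p_hat = red / n

# The standard error uses the claimed proportion p0, not p_hat.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed critical value and p-value from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = abs(z) > z_crit

print(f"z={z:.3f} critical=±{z_crit:.2f} p-value={p_value:.3f} reject={reject_null}")
```

A p-value this close to 1 says the sample proportion is almost exactly what the claim predicts, so there is nothing here to contradict it.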

For the second test we looked at whether the mean number of candies per bag was 55. This
test rejected the null hypothesis. We found a test statistic of 9.099, which falls in the
rejection region at the .01 significance level. That gives us enough evidence to reject the
null hypothesis and supports the alternative that the mean is not 55.
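
Here is a minimal sketch of the second test, assuming a two-tailed one-sample t test, t = (x-bar - mu0) / (s / sqrt(n)), with the rounded sample statistics reported earlier. The critical value 2.756 is hard-coded from a standard t table for 29 degrees of freedom at the .01 level, which is my assumption because the standard library has no t distribution.

```python
from math import sqrt

n = 30
sample_mean = 60.1  # rounded sample mean reported earlier
sample_sd = 3.07    # rounded sample standard deviation reported earlier
mu0 = 55            # claimed mean number of candies per bag

t_stat = (sample_mean - mu0) / (sample_sd / sqrt(n))

# Assumed critical value: t table, 29 degrees of freedom, .01 two-tailed.
t_crit = 2.756

reject_null = abs(t_stat) > t_crit

print(f"t={t_stat:.3f} critical=±{t_crit} reject={reject_null}")
```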

Now, we can't completely accept that these tests and intervals are accurate for the whole
population. The methods rely on conditions that may not hold here: our class might not be a
completely random sample of the whole population, and our sample size might not be large enough
to reflect all bags of Skittles. We are also assuming that our data is approximately normal
with no extreme outliers affecting it. So if I were to do this research again, I might do it
on a larger scale with more bags from random places to reduce bias. A larger random sample
would make the normal approximation more reliable and give a more accurate representation of
the whole population of Skittles bags. I've concluded that, if done correctly, confidence
intervals and hypothesis tests are a great way to estimate what the whole population may look
like from just a sample. There's no practical way to measure an entire population, so I'm glad
there are ways to look at it on a smaller scale.

Reflection

As I look back on this project, the main thing I remember is how much I loved having a
real-world application of math to try out. Seeing how these formulas and skills can be used
for much more than textbook story problems was very interesting, especially since I am
planning on going into a field of engineering, which will require a lot of real-world
applications of math and science. The gathering of real data, the use of math on those numbers,
and the interpretation of the results are all skills I know I will use in my future. Most
likely my profession will revolve around a process similar to that, and I know many classes I
take to get there will involve these techniques. As I grow as an amateur engineer, these skills
will be needed more and more as the problems become more challenging. I am excited to see where
that road takes me. Overall I really enjoyed the project and what was required to complete it.
I have always liked problem solving, and math in particular, so putting that to the test with a
real-world question was very intriguing. Even though the problem was just looking at the
statistics of a candy, and our sample size probably could have been larger for a more accurate
reflection of the population, I found that the project simulated what it would be like to do a
study like this for real. Using real data to reach a real conclusion would be very interesting,
and I would love to try it some time if the opportunity arises. I know I will continue to use
the skills I have learned in this class and build upon them. This experience with a real-world
application of math grew my passion for problem solving, and I'm glad I got the chance to try
it out. Hopefully I can use this experience to build upon my previous knowledge and move
toward a career doing things like this.
