

SKITTLES Term Project


Calculated Proportions of Each Candy Color in Class Sample (sample
size = 1378 total candies):
Color     Count   Proportion
Red       296     0.215
Orange    251     0.182
Yellow    261     0.189
Green     283     0.205
Purple    287     0.208
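
The proportions in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (count divided by the total of 1378); the variable names are chosen for illustration and are not part of the original project.

```python
# Color counts taken from the class sample table.
counts = {"Red": 296, "Orange": 251, "Yellow": 261, "Green": 283, "Purple": 287}

# Total number of candies in the class sample.
total = sum(counts.values())

# Proportion of each color = count / total, rounded to three decimals.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:<8}{counts[color]:>5}{p:>8.3f}")
```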



Pie Chart:




Pareto Chart:

The class data represents a random sample of 23 bags of Skittles, each with varying numbers of
colored candies. It is true that each 2.17 ounce bag of Skittles in a population of all the Skittles
bags in the world did not have an equal chance of being selected by one of the 23 students in our
class; we also do not know exactly how Skittles are made or where they come from. However, it
is doubtful that candies are counted by color when the bags are filled at a large Skittles
manufacturing plant. With that said, it is reasonable to assume that every 2.17 ounce bag of
Skittles has the same chance of being filled with a random variety of colors.
In this instance, the population would be a collection of every 2.17 ounce bag of Skittles. Our
class data is a sample of 23 bags of Skittles, which is a sub-collection of the entire population of
2.17 ounce bags of Skittles that have been randomly packaged and distributed.

Color     Individual Bag Count   Total Class Count
Red       11                     296
Orange    10                     251
Yellow    14                     261
Green     16                     283
Purple    10                     287

Since I have the philosophy that the combined 23 bags is a random sample and that each 2.17
ounce Skittles bag (out of a population of all the 2.17 ounce Skittles bags in the world) has an
equal chance of being filled with randomly selected colors, the graphs of this data did reflect
what I expected to see. The graph of my individual data varies from the graph of the class data
because each bag of Skittles has a random assortment of the rainbow colored candies. This is
evident because the most common color in my individual bag was green, while the most common
color in the total class sample was red.
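
The comparison between the individual bag and the class sample can be checked with a short sketch like the one below. The helper name most_common is hypothetical and used only for this illustration; the counts come from the table above.

```python
# Counts from the individual bag and from the whole class sample.
individual = {"Red": 11, "Orange": 10, "Yellow": 14, "Green": 16, "Purple": 10}
class_total = {"Red": 296, "Orange": 251, "Yellow": 261, "Green": 283, "Purple": 287}


def most_common(counts):
    """Return (color, share) for the color with the largest count."""
    color = max(counts, key=counts.get)
    return color, counts[color] / sum(counts.values())


bag_color, bag_share = most_common(individual)
class_color, class_share = most_common(class_total)

print(f"Individual bag: {bag_color} ({bag_share:.1%} of {sum(individual.values())} candies)")
print(f"Class sample:   {class_color} ({class_share:.1%} of {sum(class_total.values())} candies)")
```

Neither color makes up more than half of its sample, so each is the most common color rather than a strict majority.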

Using the total number of candies in each bag in our class sample,
compute the following measures for the variable Total candies in each
bag.
(a) mean number of candies per bag: 59.9
(b) standard deviation of the number of candies per bag: 3.9
(c) 5-number summary for the number of candies per bag: Min 54, Q1 58, Median 60, Q3 61,
Max 75




Create a frequency histogram for the variable Total candies in each
bag.


Create a box plot for the variable Total candies in each bag.




Write a paragraph about your findings about the variable Total candies
in each bag.
Based on the frequency histogram and box plot that we created from the data for the variable
total candies in each 2.17 ounce bag of Skittles from the class sample of 23 bags of Skittles, the
shape of the distribution is positively skewed, or skewed to the right. The graphs reflect what I
expected to see since 21 of the 23 bags of Skittles contained between 55 and 65 candies, one bag
contained 54 candies, and one bag contained 75 candies. The overall data collected by the class
agrees with my own data from a single bag of Skittles. This is because my single bag contained
61 candies and the majority of the class sample bags contained between 55 and 65 candies. There
were a total of 1378 candies in the class sample and 1249 of the candies came from bags that
contained between 55 and 65 candies.

In a half page, explain the difference between categorical and
quantitative data.
According to our textbook, categorical (or qualitative or attribute) data consist of names or
labels that are not numbers representing counts or measurements, and quantitative (or
numerical) data consist of numbers representing counts or measurements (Triola, 2014, p. 17).
Graphs such as pie charts and Pareto charts make sense for categorical data because each slice
or bar stands for a name or label and shows how often that label occurs. The pie chart and
Pareto chart that we created in the term project 2 group portion assignment are good examples
of how categorical data may be represented in graph form. The pie chart showed the percentages
of colors of candies in the total class sample and the Pareto chart showed the frequencies of
each color of candies from the class sample. Graphs such as histograms and box plots are not
useful for categorical data because they place values along a numeric scale, and names or
labels have no numeric scale. On the other hand, histograms and box plots make sense for
quantitative data because the data are numbers representing counts or measurements. The
histogram and box plot our group created to represent the variable total number of candies in
each bag for the class sample show this well. The histogram showed the frequencies of the
number of candies from each bag and the box plot showed specific values such as the minimum,
Q1, median, Q3, and maximum that were calculated from the total number of candies for each bag
taken from the class sample. Pie charts and Pareto charts do not make sense for quantitative
data because they are organized around names or labels, rather than numbers.
As mentioned above, calculations involving the frequency or percentage of each name or label
make sense for categorical data because they only require counting how often each label
occurs. It is not possible to calculate a 5-number summary for categorical data since it
consists of names or labels rather than numeric values. On the other hand, calculations
involving frequencies of counts or 5-number summaries make sense for quantitative data.
Calculating a percentage of a label or name doesn't make sense for quantitative data because
the data consist of numbers rather than labels.


Construct a 99% confidence interval estimate for the proportion of
yellow candies.



Requirements are that 1) the sample is a simple random sample; 2) the conditions for the
binomial distribution are met, meaning there are a fixed number of trials, the trials are
independent, there are two categories of outcomes, and the probabilities remain constant for each
trial; and 3) there are at least five successes and five failures. Since the population
proportions p and q are unknown, their values are estimated using the sample proportion. This
allows us to verify that np is greater than or equal to 5 and nq is greater than or equal to 5.
Thus, the normal distribution is a suitable approximation to the binomial distribution
(Triola, 2014, p. 330).
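
As a rough check of this interval, the sketch below applies the normal-approximation formula (sample proportion plus or minus z times the standard error) to the class counts of 261 yellow candies out of 1378. It uses Python's standard library for the critical value; the variable names are illustrative only, and the original calculation may have been done differently.

```python
from math import sqrt
from statistics import NormalDist

n = 1378            # total candies in the class sample
x = 261             # yellow candies in the class sample
confidence = 0.99

p_hat = x / n       # sample proportion of yellow candies
q_hat = 1 - p_hat

# Requirement check: at least five successes and five failures.
assert n * p_hat >= 5 and n * q_hat >= 5

# Critical value z_(alpha/2) from the standard normal distribution.
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval limits.
margin = z_crit * sqrt(p_hat * q_hat / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, z = {z_crit:.3f}, E = {margin:.4f}")
print(f"{confidence:.0%} CI for p: ({lower:.4f}, {upper:.4f})")
```

With unrounded inputs the limits come out near 0.162 and 0.217. A third-decimal difference from a hand calculation can arise when the sample proportion is rounded to 0.189 before the margin of error is applied.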


Construct a 95% confidence interval estimate for the population mean
number of candies per bag.



Requirements are that 1) the sample is a simple random sample, and 2) either or both of these
conditions is satisfied: the population is normally distributed or n > 30 (Triola, 2014, p. 344).
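
The interval can be approximated from the rounded summary statistics reported earlier (mean 59.9, standard deviation 3.9, n = 23). This is a sketch under two assumptions: the critical value 2.074 is the standard t table value for 22 degrees of freedom and an area of 0.025 in each tail, and the rounded mean and standard deviation stand in for the unrounded class values.

```python
from math import sqrt

n = 23          # number of bags in the class sample
x_bar = 59.9    # sample mean number of candies per bag (rounded)
s = 3.9         # sample standard deviation (rounded)

# Assumed t table value: df = n - 1 = 22, area 0.025 in each tail (95% confidence).
t_crit = 2.074

# Margin of error and interval limits for the population mean.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.3f}")
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Because the inputs are rounded, the result (roughly 58.2 to 61.6) agrees with an interval computed from the raw bag totals only to about one decimal place.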


Construct a 98% confidence interval estimate for the population
standard deviation of the number of candies per bag.



Requirements are 1) the sample is a simple random sample, and 2) the population must have
normally distributed values (Triola, 2014, p. 364).
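
A sketch of the chi-square interval for the standard deviation follows. It assumes the rounded value s = 3.9 reported earlier and the standard chi-square table values for 22 degrees of freedom (9.542 and 40.289, cutting off 1% in each tail); the variable names are illustrative only.

```python
from math import sqrt

n = 23      # number of bags in the class sample
s = 3.9     # sample standard deviation (rounded)
df = n - 1  # degrees of freedom

# Assumed chi-square table values for df = 22 and 98% confidence.
chi2_left = 9.542    # area 0.99 to the right
chi2_right = 40.289  # area 0.01 to the right

# Interval limits: sqrt((n - 1) s^2 / chi2_right) < sigma < sqrt((n - 1) s^2 / chi2_left)
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"98% CI for sigma: ({lower:.3f}, {upper:.3f})")
```

With the rounded s the limits are roughly 2.88 and 5.92. An interval computed from an unrounded standard deviation (about 3.94 would be consistent with the other results in this project) is slightly wider.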


Discuss and interpret the results of each of your three interval estimates.
Confidence interval for population proportion of yellow candies: the results indicate that we
can be 99% confident that the proportion of yellow candies in the Skittles population lies
between 0.162 and 0.216.
Confidence interval for population mean number of candies per bag: the results indicate that
we can be 95% confident that the mean number of candies per bag in the Skittles population
lies between 58.1961 and 61.6039.
Confidence interval for population standard deviation of the number of candies per bag: the
results indicate that we can be 98% confident that the standard deviation of the number of
candies per bag in the Skittles population lies between 2.911 and 5.982.
The best point estimate of a population proportion is the sample proportion. While this is
considered to be a good estimate, it is only a single value so it does not tell us how good the
estimate is. The confidence interval (CI) is a range of values used to estimate the true value
of a population parameter such as a proportion, mean, or standard deviation. The CI is also
associated with a confidence level (e.g., 90%, 95%, etc.), which tells us how often the method
used to build the interval would capture the true parameter value if the sampling were repeated
many times (Triola, 2014, pp. 325-326).

In a paragraph, explain in general the purpose and meaning of a
hypothesis test.
The course textbook defines a hypothesis used in statistics as a claim or statement about a
property of a population and a hypothesis test, also known as a test of significance, as a
procedure for testing a claim about a property of a population (Triola, 2014, p. 382). The
properties include population parameters such as proportion, mean, and standard deviation. The
steps involved in creating hypothesis tests give us a better understanding of their meaning and
purpose. The first step involves identifying a null hypothesis and alternative hypothesis. A null
hypothesis is formed in step two by equating the value of the population parameter to the
claimed value. An alternative hypothesis is formed in step three by stating that the value of the
population parameter differs from the value in the null hypothesis; that is, the parameter is
less than, greater than, or not equal to that value. In step four, the significance level must
be identified so we can distinguish between results that are likely to occur by chance and those
that are unlikely to occur by chance. Step five involves identifying the test statistic and sampling
distribution that are relevant to the test. In step six, the value of the test statistic is found, which
allows us to determine the P-value or critical value(s). In step seven, we decide to either reject
the null hypothesis or fail to reject the null hypothesis by using either the P-value method or
critical value method. Lastly, in step eight, the decision is restated in simple terms (e.g., there is
sufficient evidence to support the claim/warrant rejection of the claim or there is not sufficient
evidence to support the claim/warrant rejection of the claim) that are understood by people who
do not have a knowledge of statistical terms and procedures (Triola, 2014).


Use a 0.05 significance level to test the claim that 20% of all Skittles
candies are red, using the entire class data set as your sample.


Because we fail to reject the null hypothesis, we fail to reject the claim that 20% of all Skittles
candies are red. There is not sufficient evidence to warrant rejection of the claim that 20% of
Skittles candies are red.
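
The test can be sketched as a two-tailed z test for a proportion using the class counts (296 red candies out of 1378). The code below is an illustration using Python's standard library, not a record of how the original calculation was carried out; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

n = 1378       # total candies in the class sample
x = 296        # red candies in the class sample
p0 = 0.20      # claimed proportion (null hypothesis: p = 0.20)
alpha = 0.05   # significance level

p_hat = x / n

# Test statistic: z = (p_hat - p0) / sqrt(p0 * q0 / n)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value, since the alternative hypothesis is p != 0.20.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value < alpha
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Using the unrounded sample proportion gives z of about 1.37 and a P-value of about 0.17. Rounding the sample proportion to 0.215 first gives z of about 1.39 and a slightly smaller P-value. Either way the P-value exceeds 0.05, so the null hypothesis is not rejected.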


Use a 0.01 significance level to test the claim that the mean number of
candies in a bag of Skittles is 55, using the entire class data set as your
sample.


For this hypothesis test, I used my TI-83 Plus to calculate the P-value:


As seen above, the P-value was calculated to be 0.000005 (I rounded to six decimal places for
the purpose of showing that the value is equal to the above calculation). Since the P-value is less
than the significance level of 0.01, we reject the null hypothesis. There is sufficient evidence to
warrant rejection of the claim that the mean number of candies in a bag of Skittles is 55.
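
The decision can be cross-checked with the critical value method instead of the calculator's P-value. The sketch below assumes the rounded summary statistics reported earlier (mean 59.9, standard deviation 3.9, n = 23) and the standard t table value 2.819 for 22 degrees of freedom with an area of 0.005 in each tail.

```python
from math import sqrt

n = 23          # number of bags in the class sample
x_bar = 59.9    # sample mean number of candies per bag (rounded)
s = 3.9         # sample standard deviation (rounded)
mu0 = 55        # claimed mean (null hypothesis: mu = 55)

# Assumed t table value: df = 22, two-tailed test at the 0.01 significance level.
t_crit = 2.819

# Test statistic: t = (x_bar - mu0) / (s / sqrt(n))
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Reject H0 when the test statistic falls in either tail beyond the critical value.
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical values = +/-{t_crit}, reject H0: {reject}")
```

The test statistic of about 6.0 lies far beyond the critical value, which agrees with the very small P-value and the decision to reject the null hypothesis.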


In detail, discuss how your samples meet (or fail to meet) the
requirements for performing these hypothesis tests.
The requirements for testing a claim about a population proportion include 1) the sample
observations are taken from a simple random sample, 2) the conditions for a binomial
distribution are satisfied, meaning there are a fixed number of independent trials, each with
success and failure outcomes, and 3) the conditions np is greater than or equal to 5 and nq is
greater than or equal to 5 are satisfied (Triola, 2014, p. 400). The class sample meets the first
requirement in that it is a simple random sample; every 2.17 ounce bag of Skittles has the same
chance of being filled with a variety of colors and the 1378 candies were taken from a random
sample of 23 bags of Skittles. The conditions for a binomial distribution are satisfied because
there are a fixed number of independent trials (in this case, n = total number of Skittles candies
in class sample = 1378) and each trial has two outcome categories which include success or
failure (in this case, successes = 296 red candies and failures = 1082 candies in colors other than
red). The third condition is met because np is greater than or equal to 5 (in this case np = (1378)
(0.20) = 275.6, which is greater than 5) and nq is greater than or equal to 5 (in this case nq =
(1378)(0.80) = 1102.4, which is greater than 5).
The requirements for testing a claim about a population mean include 1) the sample is a simple
random sample and 2) either or both of the two following conditions is satisfied: the population
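is normally distributed or n > 30 (Triola, 2014, p. 412). The class sample met the first
requirement because it is a simple random sample; each class member purchased a random 2.17
ounce bag of Skittles. The sample size (n = 23) was less than 30, so the second requirement
depends on the population being normally distributed. The class data were treated as
approximately normal, although the histogram and box plot show some right skew, with one bag
containing 75 candies. Only one of the two conditions had to be satisfied to meet the
requirements.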

Discuss and interpret the results of each of your two hypothesis
tests.
In the first hypothesis test, I used a 0.05 significance level to test the claim that 20% of all
Skittles candies are red. The P-value of 0.1646 was greater than the significance level so I failed
to reject the null hypothesis that p = 0.20. Since this original claim includes an equality, the
conclusion is that there is not sufficient evidence to warrant rejection of the claim that 20% of
Skittles candies are red. The class sample data is consistent with the hypothesis.
In the second hypothesis test, I used a 0.01 significance level to test the claim that the mean
number of candies in a bag of Skittles is 55. The P-value of 0.000005 was less than the
significance level so I rejected the null hypothesis that the mean = 55. Since the original claim
includes an equality, the conclusion is that there is sufficient evidence to warrant rejection of the
claim that the mean number of candies in a bag of Skittles is 55. The class sample data is not
consistent with the hypothesis.


Reflective Writing
The Skittles term project has taught me how to conduct a statistical study. It has shown me that
real-life data can be analyzed and found to be statistically and/or practically significant. The
group aspect of the term project was beneficial in that it reinforced the importance of working
with other people to analyze data, confirm results, and share perspectives to improve overall
understanding. I have applied the mathematics and statistics skills learned in this course to the
term project and I will continue to apply them to other courses and areas of my life.
I am currently completing the online RN to BSN program through the University of Utah
College of Nursing while working as a full-time registered nurse. A huge emphasis is placed on
critical thinking and evidence-based practice within the field of nursing and within my RN to
BSN coursework. This course has taught me about different types of data, sampling methods,
and studies that are conducted. According to Triola, statistical thinking involves critical thinking
and the ability to make sense of the results (2014, p. 11). The Skittles term project has guided
me through the process of effectively preparing data, analyzing data, and drawing conclusions
based on the study results. These newly refined skills have proven to be useful in my nursing
coursework and will continue to be invaluable in my current and future nursing career.
Throughout the course of this project, I have had the opportunity to improve upon my
problem-solving skills. I have learned how to create and analyze different graphs/charts, generate
confidence intervals, and perform hypothesis tests. As a result, I feel like I now have a better
ability to determine whether study results are reliable and valid. I am grateful for the skills I have
acquired in this course and from this term project.

Reference

Triola, M.F. (2014). Elementary Statistics. 12th edition. Boston, MA: Pearson Education Inc.
