
Fartun Issaq

Tiffany Hilton
Math 1040
December 5, 2018

The purpose of this project was to take the statistical concepts we learned throughout the semester and use them together in several different applications. The students in the Math 1040 class were instructed to buy a 2.17 oz bag of regular Skittles. From there we applied the aspects of statistics we learned in class: we collected the data, created graphs to illustrate the dataset visually, examined the distribution to determine whether it is normal, worked with confidence intervals, and used hypothesis tests to determine whether a claim could be rejected.

I didn't feel I learned much from this statistics class, because statistics isn't a concept I apply in my life right now. Throughout the semester it may have seemed like I was doing well, but when it came to drawing conclusions about real-life situations, it was very hard to apply the math. With the linear regression equation in particular, I had a hard time interpreting the intercept and slope. It wasn't only regression that I had difficulty with: in hypothesis testing it was very hard to state the Type I and Type II errors. Solving the equations was simple, and I knew how to apply the formulas, which I had learned in my previous math classes. This class did, however, help me apply concepts to solve story problems; in my previous math classes I found story problems very hard because I wasn't reading them carefully. In this course I learned that probability means the likelihood that something will occur. I may not realize it now, but I may apply these ideas in my future courses and in the development of my career.

Project Part 2--Descriptive Statistics


Qualitative Data

Total number of Skittles: 2098

Every student in Tiffany Hilton's Math 1040 class purchased a 2.17 oz bag of original Skittles and counted the candies of each color in their bag. The graphs above display qualitative data because color is a categorical variable. Color is a characteristic of the candy rather than a numerical value, so it is not appropriate to discuss the shape of the distribution.

Quantitative Data
Quantitative data are numerical measurements or counts, such as weight, height, or the number of people in a class. The quantitative variable in this sample is the number of candies in a bag. Appropriate graphs include histograms, stem plots, boxplots, and dot plots; a Pareto chart is not appropriate here, because it displays categorical data. Below are a boxplot and a dot plot of the number of candies per 2.17 oz bag. They make sense for numerical data because they show the distribution of the data.

Summary statistics of the skittles

Summary statistics:
Column  n   Mean       Variance   Std. dev.  Std. err.   Median  Range  Min  Max  Q1  Q3
Total   35  59.942857  5.2907563  2.3001644  0.38879875  60      11     54   65   59  61

Mode: 60

Upper fence: 64

Lower fence: 56

The IQR is Q3 - Q1 = 61 - 59 = 2. The upper fence is Q3 + 1.5(IQR) = 61 + 3 = 64, and the lower fence is Q1 - 1.5(IQR) = 59 - 3 = 56, which means the outliers are 54, 55, and 65.
The distribution of this data is skewed to the left. My bag of Skittles had 60 candies, which is right at the mean, so my bag definitely wasn't an outlier. The smallest bags had 54 and 55 candies; both are below the lower fence of 56, which makes them outliers. The maximum was 65, which is above the upper fence of 64, so it is also an outlier. I thought my bag would be an outlier, but I wasn't surprised when it landed at the mean.
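
The fence calculation can be checked with a short Python sketch. The quartiles come from the summary statistics; the function name `tukey_fences` and the list of values being checked are only illustrative assumptions, since the full class data set is not listed here.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the summary statistics table.
lower, upper = tukey_fences(59, 61)

# Illustrative values only: the extremes named in the text plus my own bag.
candidates = [54, 55, 60, 65]
outliers = [x for x in candidates if x < lower or x > upper]

print(lower, upper, outliers)
```

Any value strictly below the lower fence or strictly above the upper fence is flagged, so a bag of exactly 56 or 64 candies would not count as an outlier.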

Project Part 3--Correlation and Regression

1. Can height be used to predict the number of candies that will be in a bag of Skittles you purchase?
Height cannot be used to predict the number of Skittles in a bag. Bags at the store are picked up at random by anyone; someone shorter than me could have grabbed the same bag I did. So height can't predict the number of Skittles in the bag.

Response variable: number of candies per bag


Explanatory variable: height of the person
Is there a significant relationship between the two variables?
r (correlation coefficient) = -0.1209
Critical value = 0.361
ŷ = -0.0670x + 64.3
The absolute value of r is 0.1209, which is less than the critical value of 0.361, so there is no significant linear relationship between the two variables.

My hypothesis was correct: height cannot be used to predict the number of Skittles in the bag.
ŷ = -0.0670(63.5) + 64.3 = 60

For someone who is 63.5 inches tall, the regression equation gives an expected count of about 60 Skittles in the purchased bag. Since there is no significant linear relationship between height and the number of Skittles in the bag, it is not appropriate to use the regression equation to make predictions.

R-sq = 0.0146
1.46% of the variation in the number of Skittles is explained by the linear relationship with height.
Even if a relationship existed, it would not be appropriate to use it to predict the number of candies in a bag purchased by Yao Ming, who measures 90 inches, since 90 inches is outside the scope of our data.
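
The three checks used for the class data (significance, R-sq, and the prediction at 63.5 inches) can be written out as a small Python sketch. All of the numbers are the ones quoted in this part; the variable names are just assumptions for illustration.

```python
r = -0.1209              # correlation coefficient from the class data
critical_value = 0.361   # critical value quoted in the text
slope, intercept = -0.0670, 64.3

# The relationship is significant only if |r| exceeds the critical value.
significant = abs(r) > critical_value

# Coefficient of determination: share of variation explained by the line.
r_squared = r ** 2

# Value the regression equation gives for a height of 63.5 inches.
predicted = slope * 63.5 + intercept

print(significant, round(r_squared, 4), round(predicted, 1))
```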
TOTAL  HEIGHT
56     69
59     64
60     63
60     74
61     67
62     60
62     70
Correlation coefficient (r) = -0.2191
Regression equation: ŷ = -0.0960x + 66.4
Critical value: 0.754
There is no significant linear relationship between x and y, since the absolute value of the correlation coefficient (0.2191) is smaller than the critical value (0.754).
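
As a cross-check of the calculator output, the correlation coefficient and the least-squares line for the seven (height, total) pairs in the table can be computed by hand in Python. This is a minimal sketch using the textbook formulas, with height as x and total candies as y.

```python
import math

heights = [69, 64, 63, 74, 67, 60, 70]   # x, inches
totals = [56, 59, 60, 60, 61, 62, 62]    # y, candies per bag

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(totals) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in totals)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, totals))

r = sxy / math.sqrt(sxx * syy)        # correlation coefficient
slope = sxy / sxx                     # least-squares slope
intercept = mean_y - slope * mean_x   # least-squares intercept

print(round(r, 4), round(slope, 4), round(intercept, 1))
```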

Part 4: Probability

Red  Orange  Yellow  Green  Purple  Total
416  457     406     412    407     2098

Problem 1
a). P(green Skittle) = 412/2098 = 0.196
b). P(not green Skittle) = 1 - (412/2098) = 0.804
c). P(red or yellow) = P(red) + P(yellow) - P(red and yellow)
= (416/2098) + (406/2098) - 0
= 0.198 + 0.194 = 0.392
d). Secondary colors (green, orange, and purple): 412 + 457 + 407 = 1276
P(orange given that it's a secondary color) = 457/1276 = 0.358
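
The Problem 1 probabilities follow directly from the class color counts, which a few lines of Python can confirm. This is only a sketch; the dictionary layout and variable names are assumptions, and the counts are the ones in the table.

```python
counts = {"red": 416, "orange": 457, "yellow": 406, "green": 412, "purple": 407}
total = sum(counts.values())

p_green = counts["green"] / total
p_not_green = 1 - p_green

# Red and yellow are mutually exclusive, so P(red and yellow) = 0.
p_red_or_yellow = (counts["red"] + counts["yellow"]) / total

# Conditional probability: restrict the sample space to secondary colors.
secondary = ["green", "orange", "purple"]
secondary_total = sum(counts[c] for c in secondary)
p_orange_given_secondary = counts["orange"] / secondary_total

print(total, round(p_green, 3), round(p_not_green, 3),
      round(p_red_or_yellow, 3), round(p_orange_given_secondary, 3))
```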

Problem 2
Skittles from my purchased bag:
Red  Orange  Yellow  Green  Purple  Total
10   8       16      14     12      60

a). P(both Skittles are purple, with replacement) = (12/60)^2 = 0.04
b). P(both Skittles are purple, without replacement) = (12/60)*(11/59) = 0.0373
c). P(first one purple and second not purple) = (12/60)*(1 - 12/60) = 0.16
d). P(at least one Skittle is purple) = 1 - (48/60)^2 = 0.36
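
The two-draw probabilities for my bag can be worked with exact fractions in Python. In this sketch parts (c) and (d) are treated as draws with replacement, which is an assumption read off the formulas (12/60 and 48/60 are used for both draws).

```python
from fractions import Fraction

purple, total = 12, 60
p = Fraction(purple, total)   # P(purple) on a single draw

both_with = p * p                                    # (a) with replacement
both_without = p * Fraction(purple - 1, total - 1)   # (b) without replacement
first_only = p * (1 - p)                             # (c) purple, then not purple
at_least_one = 1 - (1 - p) ** 2                      # (d) complement of "no purple"

for value in (both_with, both_without, first_only, at_least_one):
    print(value, round(float(value), 4))
```

Part (d) uses the complement rule: "at least one purple" is everything except "neither draw is purple".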
Problem 3:
a). p constant --> 406/2098 = 0.1935
Independent trials --> randomly selected
n fixed --> n = 10
Two outcomes --> yellow or not yellow
b). 2nd > VARS > binompdf(10, 0.1935, 4) = 0.081
c). 2nd > VARS > binomcdf(10, 0.1935, 2) = 0.6973
d). Mean = 10(0.1935) = 1.935
Standard deviation = sqrt(10(0.1935)(1 - 0.1935)) = sqrt(1.5606) = 1.249
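
The calculator commands binompdf and binomcdf can be reproduced from the binomial formula. The sketch below uses n = 10 and p = 0.1935 from part (a); the helper names `binom_pmf` and `binom_cdf` are made up for this example and simply mirror the calculator functions.

```python
import math

def binom_pmf(n, p, k):
    """P(X = k) for a binomial(n, p) variable, like binompdf(n, p, k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    """P(X <= k) for a binomial(n, p) variable, like binomcdf(n, p, k)."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))

n, p = 10, 0.1935

p_exactly_4 = binom_pmf(n, p, 4)   # exactly 4 yellow out of 10
p_at_most_2 = binom_cdf(n, p, 2)   # 2 or fewer yellow out of 10
mean = n * p
sd = math.sqrt(n * p * (1 - p))

print(round(p_exactly_4, 3), round(p_at_most_2, 4), round(mean, 3), round(sd, 3))
```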

Project Part 5--Sampling Distributions and Confidence Intervals

Assume p = the proportion of yellow candies for all Skittles = 0.2. Describe the sampling
distribution for the proportion of yellow candies for samples of 85 candies, including center,
spread, and shape
Mean = 0.2
Standard deviation = sqrt((0.2)(1-0.2)/85) = 0.043
Shape: (85)(0.2)(1-0.2) = 13.6 ≥ 10, so approximately normal
n = 85 is less than 5% of all the candies
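
The center, spread, and shape check for the sampling distribution of the sample proportion can be sketched in Python as follows, using the assumed p = 0.2 and n = 85 from the question.

```python
import math

p, n = 0.2, 85

center = p                              # mean of the sample proportions
spread = math.sqrt(p * (1 - p) / n)     # standard deviation of the sample proportions
normal_ok = n * p * (1 - p) >= 10       # condition for an approximately normal shape

print(center, round(spread, 3), normal_ok)
```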

Explain in general the purpose and meaning of a confidence interval.

A confidence interval is an interval of numbers, built around a point estimate, that gives a range of likely values for an unknown population parameter. The confidence level is the percentage of such intervals that would contain the parameter if the sampling were repeated many times.

99% confidence interval


1-PropZInt
x = 406
n = 2098
C-Level: .99
(.1713, .2157)
We are 99% confident that the proportion of all Skittles that are yellow is between .171 and .216.
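
For comparison with the calculator, the same one-proportion z-interval can be built in Python from the standard formula p-hat ± z*·sqrt(p-hat(1 - p-hat)/n). This is a sketch with only the standard library; the function name `one_prop_z_interval` is an assumption chosen to echo the calculator menu.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = x / n
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(406, 2098, 0.99)
print(round(low, 4), round(high, 4))
```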

My bag
Total yellow: 16
Total number of Skittles in the bag purchased: 60
16/60 = 0.27
The proportion of yellow in the bag I purchased is not consistent with this interval, since 0.27 is above the upper limit of .216.

Sampling distribution of the mean number of candies per bag for samples of 32 bags
μ = 60
σ = 2.5
Standard deviation of the sample mean = 2.5/sqrt(32) = .442
Shape: approximately normal since 32 > 30

What is the probability that a sample of 32 bags will have a mean of less than 59 candies per bag?
normalcdf(-1E99, 59, 60, 2.5/sqrt(32)) = .0118
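
The normalcdf step can be reproduced with Python's standard library, as a rough check. The sketch assumes the sample mean is normally distributed with mean 60 and standard deviation 2.5/sqrt(32), as described above.

```python
import math
from statistics import NormalDist

mu, sigma, n = 60, 2.5, 32

standard_error = sigma / math.sqrt(n)              # spread of the sample mean
sample_mean_dist = NormalDist(mu, standard_error)

# Area to the left of 59 under the sampling distribution.
p_below_59 = sample_mean_dist.cdf(59)

print(round(standard_error, 3), round(p_below_59, 4))
```

Using `cdf(59)` directly gives the whole left tail, which is what the very large negative lower bound does on the calculator.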
Construct a 95% confidence interval for the mean number of candies per bag
STAT > TESTS > TInterval (8)
x̄ = 59.94
Sx = 2.300
n: 35
C-Level: .95
(59.15, 60.73)
Interpretation:
- The number of candies in my bag is 60. This value falls inside the 95% confidence interval, since it is between 59.15 and 60.73, so my bag is consistent with the estimated population mean.
- We are 95% confident that the true mean number of candies per bag for ALL bags of Skittles is between 59.15 and 60.73.
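
The t-interval can be checked by hand with x̄ ± t*·s/sqrt(n). Python's standard library has no t-distribution, so in this sketch the critical value t* = 2.032 is an assumption taken from a t-table (95% confidence, 34 degrees of freedom) rather than computed.

```python
import math

x_bar, s, n = 59.94, 2.300, 35

t_star = 2.032   # assumed t-table value for 95% confidence, df = n - 1 = 34
margin = t_star * s / math.sqrt(n)

low, high = x_bar - margin, x_bar + margin
my_bag_inside = low <= 60 <= high   # my bag had 60 candies

print(round(low, 2), round(high, 2), my_bag_inside)
```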
Project Part 6--Hypothesis Test

Meaning of a hypothesis test.


A hypothesis test is a procedure for testing a claim about a property of a population.
It starts with a null hypothesis and an alternative hypothesis. Then you choose the
sampling distribution that is relevant and either find a p-value or identify the critical
values and make a conclusion about the claim.

Test the claim that 20% of all Skittles candies are red.
Total red: 416
Total Skittles: 2098
416/2098 = .1983
A. Hypotheses with correct notation
Null hypothesis H0: p = .20
Alternative hypothesis H1: p ≠ .20

B. Conditions for performing the hypothesis test

1. Simple random sample: not met; this sample is a convenience sample.
2. The conditions for a binomial distribution are satisfied.
3. np0(1-p0) ≥ 10: 2098(0.20)(1-0.20) = 335.68 ≥ 10
4. n ≤ 0.05N: 2098 is less than 5% of all Skittles candies.
C. The test statistic and supporting work
1-PropZTest:
p0: 0.20
x: 416
n: 2098
prop ≠ p0
z = -.196
P-value = 0.844

D. The appropriate decision about the null hypothesis and an appropriate conclusion
Significance level: 0.05
● The p-value of .844 is greater than 0.05, so we do not reject the null hypothesis. There is insufficient evidence to conclude that the proportion of red Skittles is not 20%.
P-value= 1-0.1431= 0.8569 Right tailed

P-value interpretation
If the true proportion of red Skittles is .20, there is a probability of about .84 of obtaining a sample proportion at least as far from .20 as the observed 416/2098 = .1983.
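
The test statistic and two-tailed p-value for the red Skittles claim can be recomputed in Python as a check on the 1-PropZTest output. This sketch uses the standard one-proportion z formula with the standard error based on p0 = 0.20.

```python
import math
from statistics import NormalDist

x, n, p0 = 416, 2098, 0.20
alpha = 0.05

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed test: double the area beyond |z| in one tail.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = p_value < alpha

print(round(z, 3), round(p_value, 3), reject_h0)
```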
E. Using values for the class data that you computed in Part 2 of the project and a 0.01
significance level, test the claim that the mean number of candies in a bag of Skittles is
more than 58.5.
Hypothesis test for the mean number of candies in a bag of Skittles
H0: μ = 58.5
H1: μ > 58.5
Conditions:
1. Simple random sample: not met; this is a convenience sample.
2. n ≤ .05N: met; n = 35 is less than 5% of all the bags of Skittles in the world.
3. n ≥ 30: met, since 35 ≥ 30.

Test statistic
T-Test:
μ0 = 58.5
x̄ = 59.94
Sx = 2.300
n = 35
t = 3.7
P-value = .0004

The p-value (.0004) is less than alpha (0.01), so we reject H0. There is sufficient evidence to conclude that the mean number of candies per bag is above 58.5.
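
The t statistic can be verified with t = (x̄ - μ0)/(s/sqrt(n)). Because Python's standard library has no t-distribution, this sketch does not compute the p-value; instead it compares t with a critical value of 2.441, which is an assumption taken from a t-table (right tail, alpha = 0.01, 34 degrees of freedom).

```python
import math

mu0, x_bar, s, n = 58.5, 59.94, 2.300, 35

t = (x_bar - mu0) / (s / math.sqrt(n))

t_critical = 2.441   # assumed t-table value: right tail, alpha = 0.01, df = 34
reject_h0 = t > t_critical

print(round(t, 2), reject_h0)
```

The critical-value method and the p-value method lead to the same decision for this test.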

The Type I and Type II errors for this test


● Type I: Reject H0 when it's true = Conclude that the mean is greater than 58.5 when it actually equals 58.5
● Type II: Fail to reject H0 when it's false = Fail to conclude that the mean is greater than 58.5 when it actually is greater than 58.5