2. Descriptive Statistics
Variable classes:
> class(gender)
Displaying categorical data:
> table(gender)
> table(gender, hand)
> prop.table(table(gender, hand), 1)   # proportions within each gender (rows sum to 1)
> barplot(table(gender), col = "lightgreen", main = "Barplot of Gender")
> barplot(table(gender, hand), beside = TRUE, main = "Barplot of Gender")    # side-by-side bars
> barplot(table(gender, hand), beside = FALSE, main = "Barplot of Gender")   # stacked bars
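The margin argument of prop.table controls which totals the proportions are taken over: 1 for rows, 2 for columns. A minimal sketch on made-up data, where gender_demo and hand_demo are hypothetical stand-ins for the survey's gender and hand variables, not the actual lab data:

```r
# Hypothetical stand-ins for the survey variables (not the real lab data)
gender_demo <- factor(c("F", "F", "M", "M", "M", "F", "M", "F"))
hand_demo   <- factor(c("Right", "Left", "Right", "Right",
                        "Right", "Right", "Right", "Left"))

tab <- table(gender_demo, hand_demo)   # counts: rows = gender, columns = hand
row_props <- prop.table(tab, 1)        # margin 1: each row sums to 1
col_props <- prop.table(tab, 2)        # margin 2: each column sums to 1

print(tab)
print(round(row_props, 3))
print(round(col_props, 3))
```

Row proportions answer "within each gender, what fraction is left-handed?", while column proportions answer "among lefties, what fraction is female?".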
Question 1: What is the proportion of lefties in the population we are examining?
Question 2: Now suppose I hypothesize that the true proportion of lefties in our population is 0.10. Write this hypothesis formally:
H0:
Ha:
Question 3: We will now take a single random sample of size 400. Record the proportion of lefties in this sample:
> samp = sample(hand, 400, replace = FALSE)
> prop.table(table(samp))
Question 4: Use your sample size (n = 400) and your sample proportion of lefties to construct a 95% confidence interval. Hint: use z* = 1.96 as the critical value.
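The interval in Question 4 has the form p̂ ± z* · sqrt(p̂(1 − p̂)/n). A minimal sketch, assuming a hypothetical sample proportion of 0.12; substitute the value you recorded in Question 3:

```r
n      <- 400
p_hat  <- 0.12     # hypothetical sample proportion; use your own from Question 3
z_star <- 1.96     # critical value for 95% confidence

se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
ci <- c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
print(round(ci, 4))
```

If the hypothesized 0.10 falls inside the interval, the data are roughly consistent with H0 at the 5% level; if it falls outside, they are not.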
Question 5: Based on this confidence interval, what do you conclude about my claim that the true proportion of lefties in the whole population is 0.10?
Question 6: Again using your sample results, calculate the z statistic and find the P-value. What do you conclude about the hypothesis? Did you reach the same conclusion as when you used the confidence interval to run this test?
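The z statistic for a single proportion is z = (p̂ − p0) / sqrt(p0(1 − p0)/n); note that the standard error uses the hypothesized p0, not p̂. A minimal sketch, again assuming a hypothetical sample proportion of 0.12 and a two-sided alternative:

```r
n      <- 400
p0     <- 0.10     # hypothesized proportion under H0
p_hat  <- 0.12     # hypothetical sample proportion; use your own

se0     <- sqrt(p0 * (1 - p0) / n)    # standard error under H0 uses p0
z       <- (p_hat - p0) / se0
p_value <- 2 * pnorm(-abs(z))         # two-sided alternative Ha: p != 0.10
cat("z =", round(z, 3), " P-value =", round(p_value, 4), "\n")
```

As a cross-check, prop.test(48, 400, p = 0.10, correct = FALSE) (48 lefties out of 400 gives p̂ = 0.12) reports an X-squared value equal to z squared and the same P-value.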
Question 1: What was the average height of the students who filled out the survey?
Question 2: Now suppose I hypothesize that the true average height equals your answer to Question 1. Write this hypothesis formally:
H0:
Ha:
Question 3: We will now take a single random sample of size 60. Record the mean and standard deviation of this sample:
> samp = sample(height, 60, replace = FALSE)
> mean(samp)
> sd(samp)
Question 4: a) Use your sample size (n = 60), sample mean, and sample standard deviation to construct a 95% confidence interval. Hint: use qt(0.975, df = n - 1) to find the critical value.
> qt(0.975, df = 59)
b) Use your sample size (n = 60), sample mean, and sample standard deviation to construct a 90% confidence interval.
c) Compare the two confidence intervals. Why do you think they differ?
Question 5: Based on your confidence intervals, what do you conclude about the claim that the true average is your answer to Question 1?
Question 6: Again using your sample results, calculate the t statistic. State the significance level alpha and find the P-value. What do you conclude about the hypothesis? Did you reach the same conclusion as when you used the confidence interval to run this test?
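Both intervals have the form x̄ ± t* · s/√n, and the t statistic is (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom. A minimal sketch, assuming hypothetical values (mean 67.5, SD 4.0, and an H0 value of 68.0); replace them with mean(samp), sd(samp), and your answer to Question 1:

```r
# Hypothetical summary values; replace with your own sample results
n     <- 60
x_bar <- 67.5     # hypothetical sample mean
s     <- 4.0      # hypothetical sample standard deviation
mu0   <- 68.0     # hypothetical H0 value for the true average height

se  <- s / sqrt(n)                        # standard error of the mean
t95 <- qt(0.975, df = n - 1)              # about 2.001 for 59 df
t90 <- qt(0.95,  df = n - 1)              # 90% leaves 5% in each tail
ci95 <- x_bar + c(-1, 1) * t95 * se
ci90 <- x_bar + c(-1, 1) * t90 * se

t_stat  <- (x_bar - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided P-value
alpha   <- 0.05                               # pairs with the 95% interval

cat("95% CI:", round(ci95, 3), "\n")
cat("90% CI:", round(ci90, 3), "\n")
cat("t =", round(t_stat, 3), " P-value =", round(p_value, 4),
    " reject H0 at alpha = 0.05:", p_value < alpha, "\n")
```

The 90% interval is narrower because it uses a smaller critical value. With the raw sample, t.test(samp, mu = mu0) reports the same t statistic and P-value, and adding conf.level = 0.90 gives the 90% interval.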
Compare the two distributions above. Now let's perform the goodness-of-fit test:
> chisq.test(swine, p=expected/619)
or:
> null.probs <- expected/619
> chisq.test(swine, p = null.probs)
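With a p argument, chisq.test computes Σ(O − E)²/E with k − 1 degrees of freedom, where E is the total count times each null probability; p must sum to 1, which is why the lab divides the expected counts by 619. A minimal sketch that reproduces the test by hand, assuming hypothetical observed counts and null probabilities (the lab's swine and expected vectors are not repeated here):

```r
observed   <- c(40, 35, 25)        # hypothetical observed counts
null.probs <- c(0.5, 0.3, 0.2)     # hypothetical null probabilities (sum to 1)

expected <- sum(observed) * null.probs             # expected counts under H0
stat     <- sum((observed - expected)^2 / expected)
dof      <- length(observed) - 1                   # k - 1 categories
p_value  <- pchisq(stat, df = dof, lower.tail = FALSE)

res <- chisq.test(observed, p = null.probs)        # built-in test for comparison
cat("by hand:   ", round(stat, 4), round(p_value, 4), "\n")
cat("chisq.test:", round(unname(res$statistic), 4), round(res$p.value, 4), "\n")
```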
b) Test for Independence: Here we test whether the dominant hand is independent of gender. To do so, we perform a chi-square test:
Question 9:
> lab2.chi = chisq.test(hand, gender)
> names(lab2.chi)
To see the expected counts, compute them with the formula (row total × column total / grand total) or use:
> lab2.chi$expected
To see the residuals, compute them with the formula or use:
> lab2.chi$residuals
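The expected counts and Pearson residuals can be computed directly from a table of counts. A minimal sketch, assuming a hypothetical 2 × 2 table of gender-by-hand counts (obs, expected_counts, and pearson_res are illustrative names, not the lab data), checked against the components chisq.test returns:

```r
# Hypothetical gender-by-hand counts, for illustration only (not the lab data)
obs <- matrix(c(30, 70, 20, 80), nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("F", "M"), hand = c("Left", "Right")))

# Expected count in each cell: row total * column total / grand total
expected_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res <- (obs - expected_counts) / sqrt(expected_counts)

res <- chisq.test(obs)   # built-in components for comparison
print(expected_counts)
print(round(pearson_res, 3))
print(round(res$residuals, 3))
```

Positive residuals mark cells with more observations than independence predicts, negative ones fewer.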
The residuals reported are Pearson residuals, i.e. (observed - expected) / sqrt(expected). Examining them lets you pick out which cells contribute most to the association, and in which direction. You do not need to type the full component name: after the $ sign, any abbreviation that is unique is matched automatically, e.g.:
> lab2.chi$obs
> lab2.chi$exp
> lab2.chi$res
Question 10: Do you think the hand and gender variables are independent? Is your conclusion consistent with what you guessed from the stacked bar chart in Part 2?
> barplot(table(gender, hand), beside = FALSE, main = "Barplot of Gender")
Test for independence using a summary table: the Political Affiliation and Music Preference example from your PowerPoint slides:

              Pop   Classic Rock   Other
Democrat       70        34          21
Republican     52        57          16
> data <- matrix(c(70, 52, 34, 57, 21, 16), ncol = 2, byrow = TRUE)   # rows = genre, columns = party
> data
> barplot(data)
> chisq.test(data)
Code for graphing a chi-squared distribution (degrees of freedom = 5):
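> x = seq(0, 20, length = 200)
> y = dchisq(x, df = 5)
> y = dchisq(x, 5)   # same thing, df passed by position
> plot(x, y, type = "l", lwd = 2, col = "red")
> x = seq(0, 20, length = 200)
> y = dchisq(x, 5)
> plot(x, y, type = "l", lwd = 2, col = "blue")
> x = seq(0, 1.54, length = 200)
> y = dchisq(x, 5)
> polygon(c(0, x, 1.54), c(0, y, 0), col = "gray")   # shade the area to the left of 1.54
> pchisq(1.54, df = 5)         # that area, about 0.0916
> qchisq(0.09159182, df = 5)   # inverse: returns about 1.54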
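The shading above marks a lower-tail area, whereas a chi-squared test's P-value is the upper-tail area beyond the observed statistic. A minimal sketch using the political affiliation and music preference counts from the summary table (the names music, E, and dof are illustrative): it computes the statistic and the (r − 1)(c − 1) degrees of freedom by hand, checks them against chisq.test, and shades the P-value region:

```r
# Counts from the summary table: rows = music genre, columns = party
music <- matrix(c(70, 52, 34, 57, 21, 16), ncol = 2, byrow = TRUE,
                dimnames = list(genre = c("Pop", "Classic Rock", "Other"),
                                party = c("Democrat", "Republican")))

E    <- outer(rowSums(music), colSums(music)) / sum(music)   # expected counts
stat <- sum((music - E)^2 / E)                               # Pearson statistic
dof  <- (nrow(music) - 1) * (ncol(music) - 1)                # (r - 1)(c - 1) = 2
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)        # upper-tail area

res <- chisq.test(music)   # no continuity correction for tables larger than 2 x 2
cat("X-squared =", round(stat, 3), " df =", dof, " P-value =", signif(p_value, 3), "\n")

# Shade the upper tail beyond the observed statistic
x <- seq(0, 20, length.out = 200)
plot(x, dchisq(x, dof), type = "l", lwd = 2, col = "blue")
xs <- seq(stat, 20, length.out = 200)
polygon(c(stat, xs, 20), c(0, dchisq(xs, dof), 0), col = "gray")
```

For these counts the statistic is about 9.14 on 2 degrees of freedom, giving a P-value near 0.010.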
Good Luck