
Statistics II For Dummies

Visit www.dummies.com/cheatsheet/statistics2 to
view this book's cheat sheet.

Table of Contents

Introduction

About This Book


Conventions Used in This Book
What You're Not to Read
Foolish Assumptions
How This Book Is Organized

Part I: Tackling Data Analysis and Model-Building Basics
Part II: Using Different Types of Regression to
Make Predictions
Part III: Analyzing Variance with ANOVA
Part IV: Building Strong Connections with Chi-Square Tests
Part V: Nonparametric Statistics: Rebels
without a Distribution
Part VI: The Part of Tens
Icons Used in This Book
Where to Go from Here

Part I: Tackling Data Analysis and Model-Building Basics

Chapter 1: Beyond Number Crunching: The Art and Science of Data Analysis

Data Analysis: Looking before You Crunch

Nothing (not even a straight line) lasts forever
Data snooping isn't cool
No (data) fishing allowed

Getting the Big Picture: An Overview of Stats II

Population parameter
Sample statistic
Confidence interval
Hypothesis test
Analysis of variance (ANOVA)
Multiple comparisons
Interaction effects
Correlation
Linear regression
Chi-square tests
Nonparametrics

Chapter 2: Finding the Right Analysis for the Job

Categorical versus Quantitative Variables


Statistics for Categorical Variables

Estimating a proportion
Comparing proportions
Looking for relationships between
categorical variables
Building models to make predictions

Statistics for Quantitative Variables

Making estimates
Making comparisons
Exploring relationships
Predicting y using x

Avoiding Bias
Measuring Precision with Margin of Error
Knowing Your Limitations

Chapter 3: Reviewing Confidence Intervals and Hypothesis Tests
Estimating Parameters by Using Confidence
Intervals

Getting the basics: The general form of a confidence interval
Finding the confidence interval for a
population mean
What changes the margin of error?
Interpreting a confidence interval

What's the Hype about Hypothesis Tests?

What Ho and Ha really represent


Gathering your evidence into a test
statistic
Determining strength of evidence with a
p-value
False alarms and missed opportunities:
Type I and II errors
The power of a hypothesis test

Part II: Using Different Types of Regression to Make Predictions

Chapter 4: Getting in Line with Simple Linear Regression

Exploring Relationships with Scatterplots and Correlations

Using scatterplots to explore relationships


Collating the information by using the
correlation coefficient

Building a Simple Linear Regression Model

Finding the best-fitting line to model your data
The y-intercept of the regression line
The slope of the regression line
Making point estimates by using the
regression line

No Conclusion Left Behind: Tests and Confidence Intervals for Regression

Scrutinizing the slope


Inspecting the y-intercept
Building confidence intervals for the
average response
Making the band with prediction intervals

Checking the Model's Fit (The Data, Not the Clothes!)

Defining the conditions


Finding and exploring the residuals
Using r² to measure model fit
Scoping for outliers

Knowing the Limitations of Your Regression Analysis

Avoiding slipping into cause-and-effect mode
Extrapolation: The ultimate no-no
Sometimes you need more than one
variable

Chapter 5: Multiple Regression with Two X Variables

Getting to Know the Multiple Regression Model

Discovering the uses of multiple regression
Looking at the general form of the
multiple regression model
Stepping through the analysis

Looking at x's and y's
Collecting the Data
Pinpointing Possible Relationships
Making scatterplots
Correlations: Examining the bond

Checking for Multicolinearity


Finding the Best-Fitting Model for Two x
Variables

Getting the multiple regression coefficients
Interpreting the coefficients
Testing the coefficients

Predicting y by Using the x Variables


Checking the Fit of the Multiple Regression
Model

Noting the conditions


Plotting a plan to check the conditions
Checking the three conditions

Chapter 6: How Can I Miss You If You Won't Leave? Regression Model Selection

Getting a Kick out of Estimating Punt Distance

Brainstorming variables and collecting data
Examining scatterplots and correlations
Just Like Buying Shoes: The Model Looks
Nice, But Does It Fit?

Assessing the fit of multiple regression models
Model selection procedures

Chapter 7: Getting Ahead of the Learning Curve with Nonlinear Regression

Anticipating Nonlinear Regression


Starting Out with Scatterplots
Handling Curves in the Road with Polynomials

Bringing back polynomials


Searching for the best polynomial model
Using a second-degree polynomial to pass
the quiz
Assessing the fit of a polynomial model
Making predictions

Going Up? Going Down? Go Exponential!

Recollecting exponential models


Searching for the best exponential model
Spreading secrets at an exponential rate

Chapter 8: Yes, No, Maybe So: Making Predictions by Using Logistic Regression

Understanding a Logistic Regression Model

How is logistic regression different from other regressions?
Using an S-curve to estimate probabilities
Interpreting the coefficients of the logistic
regression model
The logistic regression model in action

Carrying Out a Logistic Regression Analysis

Running the analysis in Minitab


Finding the coefficients and making the
model
Estimating p
Checking the fit of the model
Fitting the movie model

Part III: Analyzing Variance with ANOVA

Chapter 9: Testing Lots of Means? Come On Over to ANOVA!

Comparing Two Means with a t-Test


Evaluating More Means with ANOVA
Spitting seeds: A situation just waiting for
ANOVA
Walking through the steps of ANOVA

Checking the Conditions

Verifying independence
Looking for what's normal
Taking note of spread

Setting Up the Hypotheses


Doing the F-Test

Running ANOVA in Minitab


Breaking down the variance into sums of
squares
Locating those mean sums of squares
Figuring the F-statistic
Making conclusions from ANOVA
What's next?

Checking the Fit of the ANOVA Model

Chapter 10: Sorting Out the Means with Multiple Comparisons

Following Up after ANOVA


Comparing cellphone minutes: An
example
Setting the stage for multiple comparison
procedures

Pinpointing Differing Means with Fisher and Tukey

Fishing for differences with Fisher's LSD
Using Fisher's new and improved LSD
Separating the turkeys with Tukey's test

Examining the Output to Determine the Analysis
So Many Other Procedures, So Little Time!

Controlling for baloney with the Bonferroni adjustment
Comparing combinations by using Scheffe's method
Finding out whodunit with Dunnett's test
Staying cool with Student Newman-Keuls
Duncan's multiple range test
Going nonparametric with the Kruskal-Wallis test

Chapter 11: Finding Your Way through Two-Way ANOVA

Setting Up the Two-Way ANOVA Model

Determining the treatments


Stepping through the sums of squares

Understanding Interaction Effects

What is interaction, anyway?


Interacting with interaction plots

Testing the Terms in Two-Way ANOVA


Running the Two-Way ANOVA Table

Interpreting the results: Numbers and graphs

Are Whites Whiter in Hot Water? Two-Way ANOVA Investigates

Chapter 12: Regression and ANOVA: Surprise Relatives!

Seeing Regression through the Eyes of Variation

Spotting variability and finding an x-planation
Getting results with regression
Assessing the fit of the regression model

Regression and ANOVA: A Meeting of the Models

Comparing sums of squares


Dividing up the degrees of freedom
Bringing regression to the ANOVA table
Relating the F- and t-statistics: The final
frontier

Part IV: Building Strong Connections with Chi-Square Tests

Chapter 13: Forming Associations with Two-Way Tables

Breaking Down a Two-Way Table

Organizing data into a two-way table


Filling in the cell counts
Making marginal totals

Breaking Down the Probabilities

Marginal probabilities
Joint probabilities
Conditional probabilities

Trying To Be Independent

Checking for independence between two categories
Checking for independence between two
variables

Demystifying Simpson's Paradox

Experiencing Simpson's Paradox
Figuring out why Simpson's Paradox occurs
Keeping one eye open for Simpson's Paradox

Chapter 14: Being Independent Enough for the Chi-Square Test

The Chi-square Test for Independence

Collecting and organizing the data


Determining the hypotheses
Figuring expected cell counts
Checking the conditions for the test
Calculating the Chi-square test statistic
Finding your results on the Chi-square
table
Drawing your conclusions
Putting the Chi-square to the test

Comparing Two Tests for Comparing Two Proportions

Getting reacquainted with the Z-test for two population proportions
Equating Chi-square tests and Z-tests for a
two-by-two table

Chapter 15: Using Chi-Square Tests for Goodness-of-Fit (Your Data, Not Your Jeans)

Finding the Goodness-of-Fit Statistic

What's observed versus what's expected


Calculating the goodness-of-fit statistic

Interpreting the Goodness-of-Fit Statistic Using a Chi-Square

Checking the conditions before you start


The steps of the Chi-square goodness-of-fit test
Part V: Nonparametric Statistics: Rebels without a
Distribution

Chapter 16: Going Nonparametric

Arguing for Nonparametric Statistics

No need to fret if conditions aren't met
The median's in the spotlight for a change
So, what's the catch?

Mastering the Basics of Nonparametric Statistics

Sign
Rank
Signed rank
Rank sum

Chapter 17: All Signs Point to the Sign Test and Signed Rank Test

Reading the Signs: The Sign Test

Testing the median


Estimating the median
Testing matched pairs
Going a Step Further with the Signed Rank
Test

A limitation of the sign test


Stepping through the signed rank test
Losing weight with signed ranks

Chapter 18: Pulling Rank with the Rank Sum Test

Conducting the Rank Sum Test

Checking the conditions


Stepping through the test
Stepping up the sample size

Performing a Rank Sum Test: Which Real Estate Agent Sells Homes Faster?

Checking the conditions for this test


Testing the hypotheses

Chapter 19: Do the Kruskal-Wallis and Rank the Sums with the Wilcoxon

Doing the Kruskal-Wallis Test to Compare More than Two Populations

Checking the conditions


Setting up the test
Conducting the test step by step

Pinpointing the Differences: The Wilcoxon Rank Sum Test

Pairing off with pairwise comparisons


Carrying out comparison tests to see who's different
Examining the medians to see how they're different

Chapter 20: Pointing Out Correlations with Spearman's Rank

Pickin' On Pearson and His Precious Conditions
Scoring with Spearman's Rank Correlation

Figuring Spearman's rank correlation


Watching Spearman at work: Relating
aptitude to performance

Part VI: The Part of Tens

Chapter 21: Ten Common Errors in Statistical Conclusions
Claiming These Statistics Prove . . .
It's Not Technically Statistically Significant,
But . . .
Concluding That x Causes y
Assuming the Data Was Normal
Only Reporting Important Results
Assuming a Bigger Sample Is Always Better
It's Not Technically Random, But . . .
Assuming That 1,000 Responses Is 1,000
Responses
Of Course the Results Apply to the General
Population
Deciding Just to Leave It Out

Chapter 22: Ten Ways to Get Ahead by Knowing Statistics

Asking the Right Questions


Being Skeptical
Collecting and Analyzing Data Correctly
Calling for Help
Retracing Someone Else's Steps
Putting the Pieces Together
Checking Your Answers
Explaining the Output
Making Convincing Recommendations
Establishing Yourself as the Statistics Go-To
Guy or Gal

Chapter 23: Ten Cool Jobs That Use Statistics

Pollster
Ornithologist (Bird Watcher)
Sportscaster or Sportswriter
Journalist
Crime Fighter
Medical Professional
Marketing Executive
Lawyer
Stock Broker

Appendix: Reference Tables

Cheat Sheet
End User License Agreement
Dedication

About the Author

Author's Acknowledgments
About This Book
Conventions Used in This Book

monofont

What You're Not to Read

Foolish Assumptions
How This Book Is Organized

Part I: Tackling Data Analysis and Model-Building Basics
Part II: Using Different Types of Regression
to Make Predictions

Part III: Analyzing Variance with ANOVA

Part IV: Building Strong Connections with Chi-Square Tests
Part V: Nonparametric Statistics: Rebels
without a Distribution

Part VI: The Part of Tens

Icons Used in This Book


Where to Go from Here
Data Analysis: Looking before You Crunch
Nothing (not even a straight line) lasts
forever
Data snooping isn't cool
No (data) fishing allowed
Getting the Big Picture: An Overview of Stats II

Population parameter
Sample statistic
Confidence interval

Hypothesis test
Analysis of variance (ANOVA)
Multiple comparisons
Interaction effects

Correlation
Linear regression
Chi-square tests
Nonparametrics
Categorical versus Quantitative Variables
Statistics for Categorical Variables

Estimating a proportion
Comparing proportions
Looking for relationships between
categorical variables
Building models to make predictions
Statistics for Quantitative Variables

Making estimates
Making comparisons
Exploring relationships
Predicting y using x
Avoiding Bias
Measuring Precision with Margin of Error
Knowing Your Limitations
Estimating Parameters by Using Confidence
Intervals

Getting the basics: The general form of a confidence interval
Finding the confidence interval for a
population mean


What changes the margin of error?

Population standard deviation

Sample size
Confidence level
Large confidence, narrow intervals: just the right size
Interpreting a confidence interval


What's the Hype about Hypothesis Tests?

What Ho and Ha really represent


Gathering your evidence into a test statistic

Determining strength of evidence with a p-value



False alarms and missed opportunities:
Type I and II errors

Making false alarms with Type I errors

Missing an opportunity with a Type II error


The power of a hypothesis test


Throwing a power curve

Controlling the sample size



Exploring Relationships with Scatterplots and
Correlations
Using scatterplots to explore relationships
Collating the information by using the
correlation coefficient
Building a Simple Linear Regression Model
Finding the best-fitting line to model your
data


The y-intercept of the regression line
The slope of the regression line

Making point estimates by using the regression line
No Conclusion Left Behind: Tests and Confidence
Intervals for Regression
Scrutinizing the slope

A confidence interval for slope


A hypothesis test for slope

Inspecting the y-intercept



Building confidence intervals for the
average response


Making the band with prediction intervals

Predicting textbook weight using student weight


Comparing prediction and confidence intervals
Checking the Model's Fit (The Data, Not the Clothes!)

Defining the conditions

Normal y's for every x


Same spread for every x

Finding and exploring the residuals


Finding the residuals


Standardizing the residuals

Making residual plots


Checking normality
Checking the spread of the y's for each x
Using r² to measure model fit
Scoping for outliers
Knowing the Limitations of Your Regression
Analysis
Avoiding slipping into cause-and-effect
mode
Extrapolation: The ultimate no-no
Sometimes you need more than one variable
Getting to Know the Multiple Regression Model
Discovering the uses of multiple regression

Looking at the general form of the multiple regression model
Stepping through the analysis
Looking at x's and y's
Collecting the Data
Pinpointing Possible Relationships
Making scatterplots
Correlations: Examining the bond
Finding and interpreting correlations

Testing correlations for significance



Checking for Multicolinearity


Finding the Best-Fitting Model for Two x Variables
Getting the multiple regression coefficients
Interpreting the coefficients
Testing the coefficients

Predicting y by Using the x Variables
Checking the Fit of the Multiple Regression Model

Noting the conditions


Plotting a plan to check the conditions
Checking the three conditions

Meeting the first condition: Normal distribution with mean zero
Satisfying the second condition: Variance

Checking the third condition


Getting a Kick out of Estimating Punt Distance
Brainstorming variables and collecting data
Examining scatterplots and correlations

Seeing relationships through scatterplots


Looking for connections by using correlations

Just Like Buying Shoes: The Model Looks Nice, But
Does It Fit?

Assessing the fit of multiple regression models
Model selection procedures
Going with the forward selection procedure

Opting for the backward selection procedure


Using the best subsets procedure
The secret to a punters success: An example
Anticipating Nonlinear Regression
Starting Out with Scatterplots
Handling Curves in the Road with Polynomials
Bringing back polynomials



Searching for the best polynomial model
Using a second-degree polynomial to pass
the quiz
Assessing the fit of a polynomial model
Examining R² and R² adjusted
Checking the residuals

Making predictions
Going Up? Going Down? Go Exponential!
Recollecting exponential models



Searching for the best exponential model


Spreading secrets at an exponential rate
Step one: Check the scatterplot

Step two: Let Minitab do its thing to log(y)

Step three: Go exponential


Step four: Make predictions

Step five: Assess the fit of your exponential model


Understanding a Logistic Regression Model
How is logistic regression different from
other regressions?
Using an S-curve to estimate probabilities


Interpreting the coefficients of the logistic
regression model

The logistic regression model in action


Carrying Out a Logistic Regression Analysis

Running the analysis in Minitab


Finding the coefficients and making
the model
Estimating p
Checking the fit of the model
Fitting the movie model
Step one: p-value for Chi-squared

Step two: p-value for the x variable

Step three: Concordant pairs


Comparing Two Means with a t-Test


Evaluating More Means with ANOVA

Spitting seeds: A situation just waiting for ANOVA
Walking through the steps of ANOVA

Checking the Conditions


Verifying independence

Looking for what's normal


Taking note of spread
Setting Up the Hypotheses
Doing the F-Test
Running ANOVA in Minitab
Breaking down the variance into sums of
squares
Locating those mean sums of squares
Figuring the F-statistic
Making conclusions from ANOVA

Using the p-value approach


Using critical values



What's next?

Checking the Fit of the ANOVA Model


Following Up after ANOVA

Comparing cellphone minutes: An example


Setting the stage for multiple comparison
procedures
Pinpointing Differing Means with Fisher and Tukey

Fishing for differences with Fisher's LSD
The original LSD procedure
Using Fisher's new and improved LSD
Separating the turkeys with Tukey's test



Examining the Output to Determine the Analysis
So Many Other Procedures, So Little Time!
Controlling for baloney with the Bonferroni
adjustment


Comparing combinations by using Scheffe's method
Finding out whodunit with Dunnett's test
Staying cool with Student Newman-Keuls
Duncan's multiple range test
Going nonparametric with the Kruskal-Wallis test
Setting Up the Two-Way ANOVA Model

Determining the treatments


Stepping through the sums of squares
Understanding Interaction Effects

What is interaction, anyway?


Interacting with interaction plots

Factors A and B are significant


Factor A is significant but not Factor B
Factor B is significant but not Factor A
Neither factor is significant

Interaction term AB is significant


Testing the Terms in Two-Way ANOVA
Running the Two-Way ANOVA Table

Interpreting the results: Numbers and graphs
Assessing the fit
Multiple comparisons

Are Whites Whiter in Hot Water? Two-Way ANOVA Investigates
Seeing Regression through the Eyes of Variation
Spotting variability and finding an x-planation
Getting results with regression
Assessing the fit of the regression model

Using a scatterplot and correlation


Using R²
Regression and ANOVA: A Meeting of the Models

Comparing sums of squares


Partitioning variability by using SSTO, SSE, and
SST for ANOVA

Finding sums of squares for regression


Dividing up the degrees of freedom

Degrees of freedom in ANOVA


Degrees of freedom in regression
Bringing regression to the ANOVA table
Relating the F- and t-statistics: The final
frontier
Breaking Down a Two-Way Table

Organizing data into a two-way table


Filling in the cell counts
Making marginal totals
Breaking Down the Probabilities
Marginal probabilities
Joint probabilities
Conditional probabilities
Figuring conditional probabilities
Notation for conditional probabilities
Comparing two groups with conditional
probabilities
Using graphs to display conditional probabilities
Trying To Be Independent

Checking for independence between two categories
Checking for independence between two
variables
Demystifying Simpson's Paradox
Experiencing Simpson's Paradox
Simpson's Paradox in action: Video games and the gender gap
Factoring in difficulty level
Comparing success rates with conditional probabilities
Figuring out why Simpson's Paradox occurs
Keeping one eye open for Simpson's Paradox
The Chi-square Test for Independence

Collecting and organizing the data


Determining the hypotheses
Figuring expected cell counts
Checking the conditions for the test

Calculating the Chi-square test statistic

Working out the formula


Calculating the test statistic
Picking through the output
Finding your results on the Chi-square table
Determining degrees of freedom
Discovering how Chi-square distributions behave
Using the Chi-square table


Drawing your conclusions

Approximating p-value from the table


Extracting the p-value from computer output

Putting the Chi-square to the test


Comparing Two Tests for Comparing Two
Proportions

Getting reacquainted with the Z-test for two population proportions

Equating Chi-square tests and Z-tests for a two-by-two table


Finding the Goodness-of-Fit Statistic
What's observed versus what's expected
Calculating the goodness-of-fit statistic
Interpreting the Goodness-of-Fit Statistic Using a
Chi-Square
Checking the conditions before you start
The steps of the Chi-square goodness-of-fit
test


Arguing for Nonparametric Statistics
No need to fret if conditions aren't met
The median's in the spotlight for a change
So, what's the catch?
Mastering the Basics of Nonparametric Statistics

Sign
Testing the median
Doing a matched-pairs experiment
Rank
Signed rank
Rank sum
Reading the Signs: The Sign Test

Testing the median



Estimating the median

Testing matched pairs


Going a Step Further with the Signed Rank Test

A limitation of the sign test


Stepping through the signed rank test

Losing weight with signed ranks

Conducting the Rank Sum Test

Checking the conditions


Stepping through the test

Stepping up the sample size



Performing a Rank Sum Test: Which Real Estate
Agent Sells Homes Faster?
Checking the conditions for this test
Testing the hypotheses



Combining and ranking

Finding the test statistic


Determining whether you can reject Ho


Doing the Kruskal-Wallis Test to Compare More
than Two Populations
Checking the conditions
Setting up the test
Conducting the test step by step


Pinpointing the Differences: The Wilcoxon Rank
Sum Test

Pairing off with pairwise comparisons


Carrying out comparison tests to see who's
different

Examining the medians to see how they're
different
Pickin' On Pearson and His Precious Conditions
Scoring with Spearman's Rank Correlation
Figuring Spearman's rank correlation
Watching Spearman at work: Relating
aptitude to performance
Claiming These Statistics Prove . . .
It's Not Technically Statistically Significant, But . . .
Concluding That x Causes y
Assuming the Data Was Normal
Only Reporting Important Results


Assuming a Bigger Sample Is Always Better

It's Not Technically Random, But . . .
Assuming That 1,000 Responses Is 1,000 Responses
Of Course the Results Apply to the General
Population
Deciding Just to Leave It Out
Asking the Right Questions
Being Skeptical
Collecting and Analyzing Data Correctly
Calling for Help
Retracing Someone Else's Steps
Putting the Pieces Together

Checking Your Answers


Explaining the Output
Making Convincing Recommendations
Establishing Yourself as the Statistics Go-To Guy or
Gal
Pollster
www.pollster.com

Ornithologist (Bird Watcher)


Sportscaster or Sportswriter
Journalist
Crime Fighter
Medical Professional
Marketing Executive
Lawyer
Stock Broker
t-Table
Binomial Table
Chi-Square Table
Rank Sum Table


F-Table




WILEY END USER LICENSE
AGREEMENT
