T-test and Analysis of Variance (ANOVA)

10/02/10 11:52

This page illustrates how to compare group means using the t-test, various forms of ANOVA (analysis of
variance) including repeated measures ANOVA, ANCOVA (analysis of covariance), and MANOVA
(multivariate analysis of variance).

Intro (PDF) | Data Structure | ANOVA | T-test | One-way ANOVA | Two-way ANOVA | Factorial | Latin Square | Split-Plot | Repeated | ANCOVA | MANOVA | References
INTRODUCTION
The t-test and ANOVA examine whether group means differ from one another. The t-test
compares two groups, while ANOVA can compare more than two groups.
The t-test and ANOVA rest on three assumptions: independence (the elements of one sample
are not related to those of the other sample), normality (samples are randomly drawn
from normally distributed populations with unknown population means; otherwise the means
are no longer the best measures of central tendency and the test will not be valid), and equal
variance (the population variances of the groups are equal).
ANCOVA (analysis of covariance) includes covariates, interval independent variables, on the
right-hand side to control for their effects. MANOVA (multivariate analysis of variance) has more than
one left-hand side variable.
Analysis   LHS (interval)   RHS (categorical)   Notes
T-test     Single           Single (binary)
One-way    Single           Single
Two-way    Single           Two (multiple)
ANCOVA     Single           Multiple            Covariates
MANOVA     Multiple         Multiple

The following diagram summarizes the t-test and one-way ANOVA.


http://www.masil.org/method/anova.html


SAS has the UNIVARIATE, MEANS, and TTEST procedures for the t-test, while the SAS ANOVA,
GLM, and MIXED procedures conduct ANOVA.
The ANOVA procedure is designed for balanced data, whereas the GLM and MIXED procedures
can deal with both balanced and unbalanced data. For the t-test and one-way ANOVA, it does not
matter whether the data are balanced.
Stata has the .ttest and .ttesti commands for the t-test, and the .anova and .manova commands
for ANOVA. Note that the Stata .glm command is not used for ANOVA.

DATA STRUCTURE
It is useful to read multiple observations from one data line. Note that the trailing @@ in SAS holds the data line so that the INPUT statement keeps reading observations from it.
LIBNAME js 'c:\data\sas';
DATA js.data1;
INPUT group block $ response @@;
DATALINES;
1 A 34.5 1 B 54.5 1 B 25.8 3 C 54.8
2 B 54.8 3 A 15.8 2 C 14.5 2 A 15.1
...
RUN;
/* Data read ******************
1 1 A 34.5
2 1 B 54.5
3 1 B 25.8
...
*******************************/
The DO statement makes it possible to read more complicated layouts. You may list particular values in the DO
statement (e.g., DO treatment=1,5;) rather than set a range of values (e.g., DO treatment=1 TO 2;). The trailing @
must not be omitted; it holds the current data line so that the next INPUT continues reading from it.
This tip is especially useful when you type in data for the randomized complete block design (RCB)
and the Latin square design (LSD).
DATA js.data2;
DO block=1 TO 3;
DO treatment=1,5;
INPUT response @;
OUTPUT;
END;
END;
DATALINES;
4.91 4.63 4.76 5.04 5.38 6.21
5.60 5.08 4.91 4.63 4.76 5.04
...
RUN;
/* Data read *********************
1 1 1 4.91
2 1 5 4.63
3 2 1 4.76
4 2 5 5.04
5 3 1 5.38
...
**********************************/
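The pairing of values with block and treatment in the DATA step above can be sketched outside SAS. The Python below is only an illustration, not part of the original page: the function name read_nested is hypothetical, and it walks the first data line in the same nested-loop order (block varying slowest).

```python
from itertools import product

def read_nested(values, blocks, treatments):
    """Pair each value with (block, treatment) in nested-loop order,
    block varying slowest, as the nested DO loops do."""
    design = product(blocks, treatments)
    return [(b, t, y) for (b, t), y in zip(design, values)]

# First data line of the SAS example; treatments are listed as 1 and 5.
first_line = [4.91, 4.63, 4.76, 5.04, 5.38, 6.21]
rows = read_nested(first_line, blocks=[1, 2, 3], treatments=[1, 5])
for obs, (block, treatment, response) in enumerate(rows, start=1):
    print(obs, block, treatment, response)
```

The printed rows should line up with the "Data read" listing above: observation number, block, treatment, response.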
If data are arranged in the long format, you need to rearrange them into the wide format.
DATA js.wide1;
SET js.long;
IF period=1;
RENAME response=response1;
PROC SORT DATA=js.wide1;
BY id;
RUN;
...
DATA js.wide;
MERGE js.wide1 js.wide2 ...;
BY id;
RUN;
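As a rough illustration of what the long-to-wide rearrangement does, the following Python sketch collects one row per id with one response column per period. The data values and the function name long_to_wide are made up for the example; only the variable names id, period, and response come from the SAS code above.

```python
from collections import defaultdict

def long_to_wide(records):
    """Turn (id, period, response) rows into one dict per id with
    keys response1, response2, ... (one column per period)."""
    wide = defaultdict(dict)
    for subject, period, response in records:
        wide[subject][f"response{period}"] = response
    # Sort by id, mirroring PROC SORT ... BY id before the MERGE.
    return {subject: wide[subject] for subject in sorted(wide)}

# Made-up long-format data: two subjects, two periods each.
long_rows = [(2, 1, 5.1), (1, 1, 4.2), (1, 2, 4.8), (2, 2, 5.6)]
wide_rows = long_to_wide(long_rows)
for subject, columns in wide_rows.items():
    print(subject, columns)
```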
Stata has the .pkshape command to transform a data set in the Latin square form into the corresponding
data set for analysis.
. list, noobs
+---------------+
|id row c1 c2 c3|
|---------------|
|100 1 74 97 54 |
|101 2 54 84 25 |
|102 3 15 57 64 |
+---------------+

. pkshape id r c1-c3, order(abc cab bca) outcome(y) sequence(row) treat(treat) period(col)

T-TEST
One Sample T-Test
The MU0 option specifies the value of the mean under the null hypothesis. The ALPHA option specifies the
significance level. The T option in the MEANS procedure runs the t-test.
PROC UNIVARIATE MU0=0 ALPHA=.01;
VAR response;
RUN;
. ttest response=0, level(99)
PROC UNIVARIATE MU0=10 VARDEF=DF NORMAL ALPHA=.05;
VAR response;
RUN;
. ttest response=10
PROC MEANS T PROBT;
VAR response;
RUN;
. ttest response=0
PROC MEANS MEAN STD STDERR T VARDEF=DF PROBT CLM ALPHA=.01;
VAR response;
RUN;
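The statistic these procedures report can be checked by hand. The sketch below is a minimal Python illustration with made-up data (the function name one_sample_t is hypothetical); it computes t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom and does not compute the p-value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # n - 1 denominator, as with VARDEF=DF
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Made-up responses tested against a hypothesized mean of 10.
responses = [9.0, 10.0, 11.0, 12.0, 13.0]
t_stat, df = one_sample_t(responses, mu0=10)
print(f"t = {t_stat:.4f}, df = {df}")
```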
Paired T-Test
PROC TTEST;
PAIRED pre*post;
RUN;
. ttest pre=post,level(95)
Note that the Stata .ttest command does not have a "paired" option; two variables are treated as paired
unless the unpaired option is specified. The SAS PAIRED statement is able to compare multiple pairs.
PROC TTEST;
PAIRED (a b)*(c d);
RUN;
Two Independent Samples T-Test
The TTEST procedure reports two t statistics: one under the equal variance assumption and the other for
unequal variances. Check the test of equal variances (F test) first. If the null hypothesis of equal variances
is not rejected, read the t statistic and p-value of the pooled analysis. If it is rejected, read the t statistic
and p-value of the Satterthwaite or Cochran/Cox approximation.
PROC TTEST COCHRAN;
CLASS male;
VAR response;
RUN;
. ttest response, by(male)
. ttest response, by(male) unequal
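The difference between the pooled and the unequal variance analyses can be sketched as follows. This Python example uses made-up data and a hypothetical function name two_sample_t; it computes only the two t statistics and their degrees of freedom (Satterthwaite for the unequal case), not the p-values or the Cochran/Cox approximation.

```python
import math
import statistics

def two_sample_t(x, y):
    """Return (t, df) for the pooled test and for the
    unequal-variance test with Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)

    # Pooled analysis: one common variance estimate.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Unequal variances: separate variance estimates.
    a, b = v1 / n1, v2 / n2
    t_unequal = (m1 - m2) / math.sqrt(a + b)
    df_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (t_pooled, df_pooled), (t_unequal, df_satt)

# Made-up samples with clearly different spreads.
group0 = [1.0, 2.0, 3.0, 4.0, 5.0]
group1 = [2.0, 4.0, 6.0, 8.0, 10.0]
pooled, unequal = two_sample_t(group0, group1)
print("pooled :", pooled)
print("unequal:", unequal)
```

With equal sample sizes, as here, the two t statistics coincide and only the degrees of freedom differ.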
Stata can also conduct the t-test for two independent samples when the data are arranged in two
variables without a group variable. The unpaired option indicates that the two variables are independent,
and the welch option asks Stata to use Welch's approximation for the degrees of freedom. Note that Stata
does not provide the Cochran/Cox approximation.
. ttest response1=response2, unpaired level(99)
. ttest response1=response2, unpaired unequal welch
T-Test on Aggregate Data
The FREQ statement in the TTEST procedure can handle aggregate data.
PROC TTEST H0=5 ALPHA=.01;
CLASS smoke;
VAR lung;
FREQ count;
RUN;
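The Stata .ttesti command conducts a t-test from aggregated descriptive statistics. The numbers listed
are the number of observations, mean, and standard deviation of the first sample, followed either by the
hypothesized mean (one-sample test) or by the same three statistics of the second sample (two-sample test).
. ttesti 30 4.5 0.54 5 // One sample T-test against a hypothesized mean of 5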
. ttesti 30 4.5 0.54 30 5.0 1.44 // Two sample T-test

ONE-WAY ANOVA
This experimental design is often called the completely randomized design (CRD). SAS has the ANOVA,
GLM (General Linear Model), and MIXED procedures for one-way ANOVA. Their basic usage is the same.
PROC ANOVA;
CLASS treatment;
MODEL response=treatment;
RUN;
Stata has the .anova and .oneway commands for one-way ANOVA.
. anova response treatment
. oneway response treatment, tabulate
You may add the MEANS statement in both ANOVA and GLM procedures to compute means of groups
and perform multiple comparison tests such as DUNCAN, TUKEY, DUNNETT, and BON.
PROC GLM;
CLASS treatment;
MODEL response=treatment;
MEANS treatment /T DUNCAN;
RUN;
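The F statistic behind a one-way ANOVA table can be sketched in a few lines. The Python below is an illustration with made-up data (the function name one_way_anova is hypothetical); it computes the between-group and within-group sums of squares and their ratio of mean squares, not the p-value or any multiple comparison test.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up responses for three treatments.
treatments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_stat, df_b, df_w = one_way_anova(treatments)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```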

TWO-WAY ANOVA
Randomized Complete Block (RCB): Treatments are assigned at random within blocks of adjacent
subjects, each treatment once per block. The number of blocks is the number of replications. Any
treatment can be adjacent to any other treatment, but not to the same treatment within the block.
Again, the ANOVA, GLM, and MIXED procedures conduct the two-way ANOVA with the same basic usage.
PROC GLM;
CLASS treat1 treat2;
MODEL response=treat1 treat2;
RUN;
In a randomized complete block design, you may have only one observation in each cell. In that case an
interaction term cannot be separated from the error, so including it leaves no degrees of freedom for error
and produces awkward results. Note that the sum of squares due to error (SSE) of the additive model equals
what would be the sum of squares of interaction (SSI).
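The point about SSE and the interaction can be seen numerically. The Python sketch below uses a made-up 2 x 3 table with one observation per cell (the function name rcb_sums_of_squares is hypothetical). The residual of the additive model, y - row mean - column mean + grand mean, is exactly the interaction contrast, so its sum of squares is all that is left after the two main effects.

```python
def rcb_sums_of_squares(table):
    """table[i][j] is the single response for treatment i in block j."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_treat = c * sum((m - grand) ** 2 for m in row_means)
    ss_block = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    # Residual of the additive model, which is the interaction contrast.
    ss_resid = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    return ss_treat, ss_block, ss_resid, ss_total

# Made-up data: 2 treatments (rows) by 3 blocks (columns).
table = [[4.0, 6.0, 8.0], [5.0, 9.0, 7.0]]
ss_treat, ss_block, ss_resid, ss_total = rcb_sums_of_squares(table)
print(ss_treat, ss_block, ss_resid, ss_total)
```

The three components add up to the total sum of squares, so nothing remains for a separate error term if the interaction is also fitted.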
You may compare group means using the MEANS or the LSMEANS (least squares means) statement.
The LSMEANS statement is not available in the ANOVA procedure.
PROC ANOVA;
CLASS treatment block;
MODEL response=treatment block;
MEANS treatment block /TUKEY;
RUN;
PROC GLM;
CLASS treatment block;
MODEL response=treatment block;
LSMEANS treatment block /ADJUST=TUKEY;
RUN;
If there are subsamples, you need to use a nested scheme as follows, where sub(treatment) denotes
subsamples nested within treatments.
PROC GLM;
CLASS treatment sub;
MODEL response=treatment sub(treatment);
RUN;
. anova response treatment / sub|treatment /

FACTORIAL DESIGN
If there are subsamples (more than one observation in each cell) in a two-way ANOVA, you may
consider the interaction effects. This is the two-way factorial design on CRD.
         Block1               Block2               Block3
Treat1   54, 67, 87           57, 67               31, 54, 87, 95
Treat2   35, 67               54, 87, 15, 75, 55   68, 17, 16, 68
Treat3   98, 45, 12, 57, 87   31, 14, 54           24, 87

An interaction is expressed by an asterisk (*). The bar (|) expands to all main effects and their
interactions, so treatment | block is equivalent to treatment block treatment*block. Thus, the following
procedures return the same result.
PROC ANOVA;
CLASS treatment block;
MODEL response=treatment | block;
RUN;
PROC GLM;
CLASS treatment block;
MODEL response=treatment block treatment*block;
RUN;
You may compare group means using the MEANS or the LSMEANS (least squares means) statement.
The LSMEANS statement is not available in the ANOVA procedure.
PROC ANOVA;
CLASS treatment block;
MODEL response=treatment | block;
MEANS treatment block treatment*block/TUKEY;
RUN;
PROC GLM;
CLASS treatment block;
MODEL response=treatment | block;
LSMEANS treatment | block /ADJUST=TUKEY;
RUN;
Two-Way Factorial Design on RCB
PROC GLM;
CLASS treat1 treat2 block;
MODEL response=treat1 treat2 block treat1*treat2;
RUN;
. anova response treatment block treatment*block
Three-Way Factorial Design on RCB
PROC GLM;
CLASS treat1 treat2 treat3 block;
MODEL response=treat1 treat2 treat3 block treat1*treat2 treat1*treat3 treat2*treat3 treat1*treat2*treat3;
RUN;

SPLIT-PLOT DESIGN
Split-Plot Design on CRD
PROC GLM;
CLASS treat repeat sub;
MODEL response=treat repeat(treat) sub treat*sub;
RUN;
Split-Plot Design on RCB
PROC GLM;
CLASS treat block sub;
MODEL response=treat block sub treat*block treat*sub;
RUN;
LATIN SQUARE DESIGN
The Latin square design (LSD) has an equal number of rows, columns, and treatments. Treatments are
assigned at random within rows and columns, with each treatment appearing once per row and once per column.
Each cell of the square table has only one observation. The LSD is useful for controlling variation in two
directions, row and column.
PROC GLM;
CLASS row column treatment;
MODEL response=row column treatment;
RUN;
. anova response row column treat
The degrees of freedom of each main effect (row, column, and treatment) are r-1, where r is the number of
rows or columns. The degrees of freedom of SSE are (r-1)(r-2). Finally, the degrees of freedom of SST are N-1 = r*r-1.
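The degrees of freedom of an r x r Latin square can be checked with a short derivation. This is a worked sketch that assumes each of the three main effects (row, column, treatment) has r levels and therefore r - 1 degrees of freedom:

```latex
\begin{aligned}
df_{\text{total}} &= N - 1 = r^2 - 1 \\
df_{\text{row}} = df_{\text{column}} = df_{\text{treatment}} &= r - 1 \\
df_{\text{error}} &= (r^2 - 1) - 3(r - 1) = (r - 1)(r + 1 - 3) = (r - 1)(r - 2)
\end{aligned}
```

For example, with r = 4 the total of 15 degrees of freedom splits into 3 + 3 + 3 for the main effects and 6 for error.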

REPEATED MEASURE ANOVA


The REPEATED statement in SAS and the repeated() option of the Stata .anova command indicate a
repeated measures analysis.
PROC GLM;
CLASS treat block;
MODEL resp1 resp2 resp3=treat block;
REPEATED response;
RUN;
PROC GLM;
CLASS treat block;
MODEL output1 - output5 = treat block;
REPEATED id 5 (0 1 2 3 4) / POLYNOMIAL SUMMARY PRINTE;
RUN;
. anova response treat time, repeated(time)

RANDOM EFFECT MODELS


The following are examples of random effects models using MIXED and GLM.
PROC MIXED;
CLASS treat block;
MODEL response = treat /SOLUTION;
RANDOM block /SOLUTION;
RUN;
PROC MIXED COVTEST METHOD=TYPE3;
CLASS subject type; /* type is a characteristic of subject */
MODEL response = type /SOLUTION;
RANDOM subject(type) /SOLUTION;
LSMEANS type /DIFF;
RUN;
PROC GLM;
CLASS subject type; /* type is a characteristic of subject */
MODEL response = type subject(type);
RANDOM subject(type) /TEST;
RUN;
PROC MIXED COVTEST;
CLASS area block plant treat;
MODEL response = treat /SOLUTION;
RANDOM area plant area*plant block(area) /SOLUTION;
RUN;

ANCOVA
ANCOVA controls variation in an experiment by measuring a covariate (an interval independent variable) on
each experimental subject and adjusting the response for it.
PROC GLM;
CLASS treat;
MODEL response=covariate treat /SOLUTION;
LSMEANS treat /STDERR;
RUN;
. anova response treat covariate, continuous(covariate)
MANOVA
The MANOVA statement requests the multivariate analysis of variance; its H= option names the effects to be tested.
PROC GLM;
CLASS treat1-treat5;
MODEL response1-response3= treat1-treat5/NOUNI;
MANOVA H=_ALL_;
RUN;
. manova response1-response3 = treat1-treat5

REFERENCES
Littell, Ramon C., Walter W. Stroup, and Rudolf J. Freund. 2002. SAS for Linear Models. 4th ed.
Cary, NC: SAS Institute.
Littell, Ramon C., George A. Milliken, Walter W. Stroup, and Russell D. Wolfinger. 2006. SAS
System for Mixed Models. 2nd ed. Cary, NC: SAS Institute.
Stata Press. 2003. Stata Base Reference Manual Release 8. College Station, TX: Stata Press.
http://www.tfrec.wsu.edu/ANOVA/
http://www.masil.org/method/anova.html. Last modified on 02/07/2010
Copyright 1999-2010, Jeeshim and KUCC625
