Chapter 225
Repeated Measures ANOVA
Introduction
This chapter describes how to obtain false discovery rate or experiment-wise error rate
(Bonferroni) adjusted P-values (Probability Levels) for a repeated measures (within-subject)
experiment using the GESS: Repeated Measures ANOVA procedure. The general linear models
approach for repeated measures is used in this procedure. Up to three between factor variables
and three within factor variables, as well as interactions, may be specified in the Repeated
Measures ANOVA procedure. Geisser-Greenhouse, Box, and Huynh-Feldt corrections on the
within-subject F tests are available in this procedure. A detailed discussion of Repeated Measures
Analysis of Variance is found in the NCSS: Repeated Measures Analysis of Variance chapter.
Before running this procedure, output (.ges) files containing a single expression value for each
gene on each array must be obtained using the appropriate pre-processing procedure in GESS.
                                                 Declared          Declared
                                                 Not Different     Different     Total
A true difference in expression does not exist   U                 V             m0
There exists a true difference in expression     T                 S             m – m0
Total                                            m – R             R             m
In the table, the m is the total number of hypotheses tested (or total number of genes) and is
assumed to be known in advance. Of the m null hypotheses tested, m0 is the number of tests for
which there is no difference in expression, R is the number of tests for which a difference is
declared, and U, V, T, and S are defined by the combination of the declaration of the test and
whether or not a difference exists, in truth. The random variables U, V, T, and S are unobservable.
(i.e., maximizes S), but it will also include a very large number of genes which are falsely
declared to have a true difference in expression (i.e., does not appropriately minimize V).
Controlling the PCER should be viewed as overly weak control of Type I error.
To obtain P-values (Probability Levels) that control the PCER, no adjustment is made to the P-
value. To determine significance, the P-value is simply compared to the designated alpha.
$$\tilde{p}_j = \min(m\,p_j,\ 1).$$
That is, each P-value (Probability Level) is multiplied by the number of tests, and if the result is
greater than one, it is set to the maximum possible P-value of one.
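The Bonferroni rule above is simple enough to state directly in code. The following is a minimal Python sketch of the formula, not the GESS implementation; the function name bonferroni_adjust is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P-values: p~_j = min(m * p_j, 1)."""
    m = len(p_values)  # total number of tests (genes)
    # Multiply each raw P-value by the number of tests and cap the result at 1.
    return [min(m * p, 1.0) for p in p_values]


print(bonferroni_adjust([0.001, 0.02, 0.4]))
```

With three tests, the last P-value (0.4) becomes 1.2 after multiplication and is therefore set to the maximum of one.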
$$\tilde{p}_{r_i} = \min_{k = i, \ldots, m} \left\{ \min\left( \frac{m}{k}\, p_{r_k},\ 1 \right) \right\},$$
where $p_{r_1} \le p_{r_2} \le \cdots \le p_{r_m}$ are the observed ordered unadjusted P-values. The procedure is
defined in Benjamini and Hochberg (1995). The corresponding adjusted P-value definition given
here is found in Dudoit, Shaffer, and Boldrick (2003).
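The running minimum over k = i, ..., m in the false discovery rate formula is easiest to see by walking the sorted P-values from largest to smallest. The Python sketch below follows the adjusted P-value definition given above; it is an illustration, not GESS code, the function name fdr_adjust is hypothetical, and ties or missing P-values receive no special handling.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest: r_1, ..., r_m.
    order = sorted(range(m), key=lambda j: p_values[j])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from rank m down to rank 1 so that each adjusted value is the
    # minimum of min(m / k * p_(r_k), 1) over k = i, ..., m.
    for k in range(m, 0, -1):
        j = order[k - 1]
        running_min = min(running_min, min(m / k * p_values[j], 1.0))
        adjusted[j] = running_min
    return adjusted


print(fdr_adjust([0.01, 0.04, 0.03, 0.2]))
```

Because of the running minimum, the adjusted values keep the same ordering as the raw P-values, which a simple rank-wise multiplication would not guarantee.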
Analysis Steps
Following are the recommended steps for running a Repeated Measures ANOVA on microarray
data.
Step 1 – Pre-Processing
Run the appropriate pre-processing procedure (e.g., GenePix Pre-processing or Affymetrix Pre-
processing) to prepare data (.ges) files for statistical analysis. The .ges files are created when a
variable name is entered in the Output File Names Variable box on the variables tab of the pre-
processing procedure window.
Procedure Options
This section describes the options available in this procedure.
Variables Tab
These options specify the variables that will be used in the analysis.
Factor Specification
These variables are used to identify all the factors to be used in the model.
Between Variables (1-3)
These three variables specify between-subject factors. A between-subject factor specifies groups
into which the subjects are divided. For example, gender, age group, and treatment group are all
between factors.
The values of these variables indicate which group the subject belongs in. Values may be text or
numeric.
Subject Variable
This variable contains the subject identification number or phrase.
Note that this variable is treated as a 'nested' factor.
Within Variables (1-3)
These three variables define within-subject factors. A within-subject factor is one whose levels
represent different points in time or space. It is the 'repeated measurement'.
Examples of within-subject factors are time, pre-post, and body organ.
Type
This option specifies the type of each factor. The options are
• Fixed
A fixed factor includes all possible values across the range of interest. Usually, hypotheses
are tested about fixed factors. Gender, dose level, and treatment group are examples of fixed
factors.
• Random
A random factor includes a sample from the population of possible values. Examples of
random factors are hospitals, cities, and randomly-selected sites.
Model Specification
These options determine the model that will be analyzed.
Which Model Terms
A design in which all main effect and interaction terms are included is called a saturated model.
Occasionally, it is useful to omit various interaction terms from the model, usually because some
data values are missing. This option lets you specify which interactions to keep.
The options included here are:
For complicated designs, it is usually easier to check the option 'Write Only', and run the
procedure. A model containing the listed factors will be generated and placed in this box. You
can then edit it as you desire.
The model is entered using letters (in alphabetical order) separated by the plus sign. For example,
a three-factor factorial in which only two-way interactions are needed would be entered as
follows:
A+B+AB+C+AC+BC
A simple repeated-measures design would look like this:
A+B(A)+C+AC+BC(A)
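For larger crossed designs, the model string can be generated rather than typed. The following Python sketch is a hypothetical helper (factorial_model is not part of GESS) that lists main effects and crossed interactions up to a chosen order, in the same order as the three-factor example above. It assumes single-letter factor names and does not produce nested terms such as B(A), which still have to be written by hand.

```python
from itertools import combinations


def factorial_model(factors, max_order):
    """Build a '+'-separated model with all crossed terms of up to max_order factors."""
    terms = []
    for i, factor in enumerate(factors):
        earlier = factors[:i]
        # Each new factor enters alone, then crossed with subsets of the earlier factors.
        for size in range(0, max_order):
            for combo in combinations(earlier, size):
                terms.append("".join(combo) + factor)
    return "+".join(terms)


print(factorial_model("ABC", 2))  # A+B+AB+C+AC+BC
```

Passing max_order equal to the number of factors gives the saturated model, for example A+B+AB+C+AC+BC+ABC for three factors.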
Write Model in ‘Custom Model’ Field. Do Not Process Data.
When this option is checked, no data analysis is performed when the procedure is run. Instead, a
copy of the full model is stored in the Custom Model box. You can then edit the model as desired.
Correction Option
Geisser-Greenhouse Correction
In a repeated measures ANOVA, the regular F tests of the within factors may not meet all of the
necessary assumptions. Geisser and Greenhouse proposed an adjustment to make the probability
levels more accurate. Box made a popular refinement. Huynh and Feldt made a further refinement
that makes the probability levels even more accurate.
Select here the type of adjustment you want to use.
RECOMMENDATION:
We recommend the Huynh-Feldt adjustment.
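These corrections multiply both degrees of freedom of a within-subject F test by an estimate of epsilon, which ranges from 1/(k - 1) for k repeated measures up to 1 when the sphericity assumption holds. As a minimal sketch, assuming the textbook Greenhouse-Geisser estimate computed from a pooled covariance matrix of the repeated measures (not necessarily GESS's exact computation), epsilon can be obtained from the double-centered covariance matrix. The function name is hypothetical, and the Huynh-Feldt modification of this estimate is not shown.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    # Row means; for a symmetric matrix these are also the column means.
    means = [sum(row) / k for row in cov]
    grand = sum(means) / k
    # Double-center: S*_ij = S_ij - mean_i - mean_j + grand mean.
    centered = [[cov[i][j] - means[i] - means[j] + grand for j in range(k)]
                for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    # For a symmetric matrix, trace(S* S*) equals the sum of squared entries.
    trace_sq = sum(x * x for row in centered for x in row)
    return trace * trace / ((k - 1) * trace_sq)


print(greenhouse_geisser_epsilon([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]))
```

A covariance matrix with compound symmetry, such as the one in the usage line, satisfies sphericity and gives an epsilon of 1, so no correction is applied.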
• None
No correction is done.
• Bonferroni
The Bonferroni correction preserves the experiment-wise error rate.
• Recommendation
If you will be doing follow-up testing, False Discovery Rate Control should be used. If not,
the Bonferroni correction should be used.
Reports Tab
The options on this panel control which reports and plots are generated.
Select Reports
The following options are used to determine the reports that will be displayed.
Expected Mean Square Report
Check this box to obtain the expected mean square for each model term.
Test Detail Sorted by Prob Level
Check this box to obtain a list of the most significant F tests, sorted by the probability level.
Associated names or IDs, unadjusted probability levels, standard deviations of means, standard
errors, degrees of freedom, and test statistics are also shown.
A separate report is produced for each term in the model.
Prob Level Cutoff
Specify the cutoff for the multiple test corrected probability levels. When the Test Detail Sorted
by Prob Level box is checked, all adjusted probability levels below this value will be reported.
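Selecting the genes for this report amounts to a filter followed by a sort. A minimal Python sketch, assuming the results are available as (gene name, adjusted probability level) pairs; the function name and the 0.05 default cutoff are illustrative assumptions, not GESS settings.

```python
def significant_genes(results, cutoff=0.05):
    """Return (gene, adjusted_p) pairs below the cutoff, most significant first."""
    # Keep only the genes whose corrected probability level is below the cutoff.
    hits = [(gene, p) for gene, p in results if p < cutoff]
    # Sort by the corrected probability level, smallest first.
    return sorted(hits, key=lambda item: item[1])


print(significant_genes([("g1", 0.2), ("g2", 0.001), ("g3", 0.04)]))
```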
Test Detail Sorted by Gene Within Subset
Check this box to obtain a list of all genes that are in subset lists. Associated probability levels,
standard deviations of means, standard errors, degrees of freedom, and test statistics are also
shown.
A separate list is produced for each subset, sorted alphabetically. A separate report is produced
for each term in the model.
Report Options
These options determine the format of the reports.
Precision
Specifies whether unformatted numbers are displayed as single (7-digit) or double (13-digit)
precision numbers.
• Single
Unformatted numbers are displayed with 7 digits. This is the default setting. All reports have
been formatted for single precision.
• Double
Unformatted numbers are displayed with 13 digits. This option is most often used when
extremely accurate results are needed for further calculation. Double precision numbers will
require more space than allotted, potentially resulting in unaligned output. This option is
provided for those instances when accuracy is more important than format alignment.
COMMENTS:
This option does not affect formatted numbers such as probability levels.
This option only influences the format of the numbers as they are output. All calculations are
performed in double precision regardless of selection.
Prob Decimals
Specify the number of decimal places to be used for displaying probability levels on the reports.
The number chosen here does not affect the internal precision of the data.
sqrt(MS) Decimals
Specify the number of decimal places to be used for displaying square root transformed mean
squares on the reports. The number chosen here does not affect the internal precision of the data.
F Value Decimals
Specify the number of decimal places to be used for displaying F-statistics on the reports. The
number chosen here does not affect the internal precision of the data.
Select Histograms
The following options are used to determine which histograms will be displayed.
Histogram of Prob Level
Check this box to obtain a histogram of the unadjusted (raw) probability levels.
Histogram of Corrected Prob Level
Check this box to obtain a histogram of all corrected probability levels.
Histogram of Log10(Prob Level)
Check this box to obtain a histogram of the Log(base 10) transformed, unadjusted (raw)
probability levels. When the mean square denominator is zero, the Log10(Prob Level) is put in
the bin at -5.
Histogram of Log10(Corrected Prob Level)
Check this box to obtain a histogram of all Log(base 10) transformed, corrected probability
levels. Occasionally, a mean square denominator of zero occurs, producing an undefined Prob
Level. When the mean square denominator is zero, the Log10(Corrected Prob Level) is put in the
bin at -5.
Histogram of Z(Prob Level)
Check this box to obtain a histogram of Z-transformed unadjusted (raw) probability levels. The
Z-transformation converts the probability level into the corresponding standard normal
distribution value using the probability integral transform. Values less than -9 are binned at -9.
Values greater than 9 are binned at 9.
Histogram of Z(Corrected Prob Level)
Check this box to obtain a histogram of Z-transformed corrected probability levels. The Z-
transformation converts the probability level into the corresponding standard normal distribution
value using the probability integral transform. Values less than -9 are binned at -9. Values greater
than 9 are binned at 9.
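The Log10 and Z transformations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not GESS code: it assumes an undefined probability level (from a zero mean square denominator) is passed as None, and that Z(Prob Level) is the standard normal quantile of the probability level itself, so small P-values map to large negative values. Probability levels of exactly 0 or 1, where the quantile is undefined, are assumed to fall in the end bins. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def log10_prob(p):
    """Log10 of a probability level; an undefined level (None) goes to the -5 bin."""
    if p is None:
        return -5.0
    return math.log10(p)


def z_prob(p, limit=9.0):
    """Probability integral transform z = Phi^-1(p), binned into [-limit, limit]."""
    # inv_cdf is undefined at 0 and 1, so those values go to the end bins (assumption).
    if p <= 0.0:
        return -limit
    if p >= 1.0:
        return limit
    z = _STD_NORMAL.inv_cdf(p)
    return max(-limit, min(limit, z))


print(log10_prob(0.001), z_prob(0.025), z_prob(1e-30))
```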
Histogram of SQRT(Mean Square Numerator)
Check this box to obtain a histogram of all square root transformed numerator mean squares for
each factor.
Histogram of SQRT(Mean Square Denominator)
Check this box to obtain a histogram of all square root transformed denominator mean squares for
each factor.
Histogram of F Value
Check this box to obtain a histogram of all F values for each factor. When the mean square
denominator is zero, the value 100 is used in the histogram.
Computational Option
Genes Per Batch
To optimize the use of computer memory, the genes are processed in groups or batches. This
parameter specifies the number of genes processed per batch.
The basic rule is that the number of genes per batch times the number of arrays should be less
than 500,000.
If you choose 'Automatic', the program will select a reasonable value.
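The batch-size rule is plain arithmetic: genes per batch times arrays must stay below 500,000. The following Python sketch computes the largest batch that satisfies the stated rule. It is not the program's 'Automatic' selection, whose logic is not documented here, and the function name is hypothetical.

```python
def max_genes_per_batch(n_arrays, limit=500_000):
    """Largest batch size whose genes-times-arrays product stays below limit."""
    # genes * n_arrays < limit  is equivalent to  genes <= (limit - 1) // n_arrays
    return (limit - 1) // n_arrays


print(max_genes_per_batch(27))  # 18518
```

For the 27-array example later in this chapter, this gives 18,518 genes per batch (18,518 × 27 = 499,986).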
Histograms Tab
The options on this panel control the appearance of the histograms.
Major Ticks
Specify the number of large tickmarks and optional grid lines along this axis. A set of minor
tickmarks will be generated between each pair of major tickmarks. A reference number is
displayed adjacent to each major tickmark.
Minor Ticks
Select the number of small tickmarks to be displayed between each pair of major (large)
tickmarks along this axis.
Show Grid Lines
Check this option to display grid lines at the major tickmarks along this axis.
NOTE: Since the grid lines are drawn out from the tickmarks, they appear perpendicular to the
axis. Thus, checking the Y Grid Lines will actually cause horizontal grid lines to appear.
Histogram Settings
These options are used to specify the appearance of the histograms.
Style File
Designate a histogram style file. This file sets all histogram options that are not set directly on
this panel. Unless you choose otherwise, the HistoBox style file is used. Histogram style files are
created in the Histograms procedure.
Number of Bars
Specify the number of bars (bins) to be displayed. Select '0 - Automatic' to direct the program to
select an appropriate number based on the number of values.
Interior Color
Specify the histogram interior color.
Background Color
Specify the histogram background color.
Bar Fill Color
Specify the color of the inside of the bars.
Bar Border Color
Specify the color of the lines around the bars.
Histogram Title
Title
Enter text here for the histogram title.
REPLACEMENT CODES:
The following code is replaced by the appropriate name when the plot is generated.
{X} is replaced by the statistic that is reported in the histogram.
Storage Tab
The options on this panel control the storage of pre-processed data values on the spreadsheet for
further analysis.
Subsets 1 - 9 Tabs
The options on this panel control the names and lists of subsets.
Subset 1 – 9
Name
The name of the gene subset is entered here.
Separate reports may be generated to show all genes of a subset (see Reports tab). This may be
useful for examining probability levels of specific genes you are interested in that do not make
the cutoff.
Genes in this Subset
Enter a list of genes that are to be in this subset. The genes may be entered directly, or the *
character may be used to specify all genes with a particular beginning. The gene names or IDs
entered in this list must be in the column specified in Gene Name From box on the Variables tab.
EXAMPLES:
Blank
spike1
spike3
spike5
spike* (all names beginning with spike)
AA44719
NM_00582
NM_04762
NM_27564
cntrl* (all names beginning with cntrl)
file(C:\Microarray\genelist.txt) (all names in the genelist.txt file)
var(OutputGenes) (all names in the spreadsheet variable with the variable name OutputGenes)
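The matching rule for subset lists can be sketched in Python as below. This is a hypothetical helper, not the GESS parser: it handles exact names and the trailing * wildcard only, assumes case-sensitive matching, and leaves out the file(...) and var(...) forms.

```python
def expand_subset(spec, gene_names):
    """Select genes matching exact names or 'prefix*' entries, in data order."""
    # Entries without a trailing * must match a gene name exactly.
    exact = {s for s in spec if not s.endswith("*")}
    # Entries ending in * match every gene name with that beginning.
    prefixes = tuple(s[:-1] for s in spec if s.endswith("*"))
    return [g for g in gene_names if g in exact or g.startswith(prefixes)]


genes = ["spike1", "spike2", "cntrl7", "NM_00582", "AA44719"]
print(expand_subset(["spike*", "NM_00582"], genes))
```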
These Genes are
Specify here whether the genes of this subset are to be included or excluded from the list of genes
that are analyzed. Probability levels will not be calculated for the genes of this subset when
'Excluded' is entered here.
Template Tab
The options on this panel allow various sets of options to be loaded (File menu: Load Template)
or stored (File menu: Save Template). A template file contains all the settings for this procedure.
Template Id’s
A list of the Template Id’s of the corresponding files. This id value is loaded in the box at the
bottom of the panel.
The result is 27 samples. Each sample is processed and exposed to a single microarray, resulting in a
single expression value for each gene for each rat of each treatment group at each time period.
The goal is to determine for each gene whether there is evidence that the expression is different
across treatment, time, and/or if there is a treatment by time interaction.
In the pre-processing procedure, 27 files are created. The format of the spreadsheet is shown
below.
RM1_RM dataset
Rat Time Treatment OutputFile
1 0 C %p%\data\gess\rm\rm\RM1_RM_1.ges
1 12 C %p%\data\gess\rm\rm\RM1_RM_2.ges
1 24 C %p%\data\gess\rm\rm\RM1_RM_3.ges
2 0 C %p%\data\gess\rm\rm\RM1_RM_4.ges
2 12 C %p%\data\gess\rm\rm\RM1_RM_5.ges
2 24 C %p%\data\gess\rm\rm\RM1_RM_6.ges
3 0 C %p%\data\gess\rm\rm\RM1_RM_7.ges
3 12 C %p%\data\gess\rm\rm\RM1_RM_8.ges
3 24 C %p%\data\gess\rm\rm\RM1_RM_9.ges
4 0 Trt1 %p%\data\gess\rm\rm\RM1_RM_10.ges
4 12 Trt1 %p%\data\gess\rm\rm\RM1_RM_11.ges
4 24 Trt1 %p%\data\gess\rm\rm\RM1_RM_12.ges
5 0 Trt1 %p%\data\gess\rm\rm\RM1_RM_13.ges
5 12 Trt1 %p%\data\gess\rm\rm\RM1_RM_14.ges
5 24 Trt1 %p%\data\gess\rm\rm\RM1_RM_15.ges
6 0 Trt1 %p%\data\gess\rm\rm\RM1_RM_16.ges
6 12 Trt1 %p%\data\gess\rm\rm\RM1_RM_17.ges
6 24 Trt1 %p%\data\gess\rm\rm\RM1_RM_18.ges
7 0 Trt2 %p%\data\gess\rm\rm\RM1_RM_19.ges
7 12 Trt2 %p%\data\gess\rm\rm\RM1_RM_20.ges
7 24 Trt2 %p%\data\gess\rm\rm\RM1_RM_21.ges
8 0 Trt2 %p%\data\gess\rm\rm\RM1_RM_22.ges
8 12 Trt2 %p%\data\gess\rm\rm\RM1_RM_23.ges
8 24 Trt2 %p%\data\gess\rm\rm\RM1_RM_24.ges
9 0 Trt2 %p%\data\gess\rm\rm\RM1_RM_25.ges
9 12 Trt2 %p%\data\gess\rm\rm\RM1_RM_26.ges
9 24 Trt2 %p%\data\gess\rm\rm\RM1_RM_27.ges
This report displays the expected mean squares for each term in the model.
Source Term
The source of variation or term in the model.
DF
The degrees of freedom associated with this term.
Term Fixed?
Indicates whether the term is fixed or random.
Denominator Term
Indicates the term used as the denominator in the F-ratio.
Expected Mean Square
This expression represents the expected value of the corresponding mean square if the design
were completely balanced. S represents the expected value of the mean square error (sigma). The
uppercase letters represent either the adjusted sum of squared treatment means if the factor is
fixed, or the variance component if the factor is random. The lowercase letter represents the
number of levels for that factor, and s represents the number of replications of the experimental
layout.
These EMS expressions are provided to determine the appropriate error term for each factor. The
correct error term for a factor is that term whose EMS is identical except for the factor being
tested.
In this example, the appropriate error term for treatment is B(A).
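To make the expected-mean-square rule concrete, here is a minimal Python sketch of the uncorrected F tests for a single gene in this example's layout: treatment is the between factor A, rats nested in treatment form B(A), and time is the within factor C. It assumes a balanced design and subject IDs that are unique across treatment groups (as in the RM1_RM dataset), and it uses the classical sums-of-squares formulas. The function name split_plot_f_tests is hypothetical, and this is not a description of how GESS computes its results. Treatment is tested against B(A), as stated above; time and the treatment-by-time interaction are tested against BC(A), the usual choice for this design.

```python
from statistics import fmean


def split_plot_f_tests(rows):
    """Uncorrected F tests for one gene in a balanced A + B(A) + C + AC + BC(A) design.

    rows: iterable of (subject, time, group, expression) tuples.
    Returns ({term: (F, df_num, df_den)}, {term: sum_of_squares}).
    """
    rows = list(rows)
    group_of = {s: g for s, _, g, _ in rows}
    groups = sorted(set(group_of.values()))
    times = sorted({t for _, t, _, _ in rows})
    a, c = len(groups), len(times)
    n = len(group_of) // a  # subjects per group (balanced design assumed)

    grand = fmean(y for *_, y in rows)
    g_mean = {g: fmean(y for _, _, gg, y in rows if gg == g) for g in groups}
    t_mean = {t: fmean(y for _, tt, _, y in rows if tt == t) for t in times}
    s_mean = {s: fmean(y for ss, _, _, y in rows if ss == s) for s in group_of}
    gt_mean = {(g, t): fmean(y for _, tt, gg, y in rows if gg == g and tt == t)
               for g in groups for t in times}

    ss = {
        "A": n * c * sum((g_mean[g] - grand) ** 2 for g in groups),
        "B(A)": c * sum((s_mean[s] - g_mean[group_of[s]]) ** 2 for s in group_of),
        "C": a * n * sum((t_mean[t] - grand) ** 2 for t in times),
        "AC": n * sum((gt_mean[g, t] - g_mean[g] - t_mean[t] + grand) ** 2
                      for g in groups for t in times),
        "BC(A)": sum((y - s_mean[s] - gt_mean[g, t] + g_mean[g]) ** 2
                     for s, t, g, y in rows),
    }
    df = {"A": a - 1, "B(A)": a * (n - 1), "C": c - 1,
          "AC": (a - 1) * (c - 1), "BC(A)": a * (n - 1) * (c - 1)}

    def f_test(term, error):
        ms_den = ss[error] / df[error]
        # A zero denominator mean square leaves F undefined (NaN here).
        f = (ss[term] / df[term]) / ms_den if ms_den > 0 else float("nan")
        return f, df[term], df[error]

    # Error terms chosen by the EMS rule for this design.
    tests = {"A": f_test("A", "B(A)"),
             "C": f_test("C", "BC(A)"),
             "AC": f_test("AC", "BC(A)")}
    return tests, ss
```

Each F value would then be converted to an unadjusted probability level from the upper tail of the F distribution (for example with scipy.stats.f.sf(F, df_num, df_den)), optionally with the degrees of freedom of C and AC scaled by an epsilon correction, before the multiple testing adjustment across genes is applied.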
This report displays the genes for which the Bonferroni adjusted Prob Level is less than 0.05.
Gene Name
This is the name or ID of the genes for which the Bonferroni adjusted Prob Level is less than
0.05.
Subset Name
This is the name of the specified subset to which this gene belongs. If the gene is not a member
of a subset list, the default subset name is Other.
Bonferroni Adjusted Multiple Tests Prob Level
This is the Prob Level for the specified hypothesis test following a Bonferroni correction.
Single Test Prob Level
This is the Prob Level of the individual test, before multiple test correction is done.
F Value
This is the value of the F Statistic used to conduct the hypothesis test of interest.
DF1/DF2
DF1 is the number of degrees of freedom for the numerator. DF2 is the number of degrees of
freedom for the denominator.
SQRT MS Numerator
This is the square root of the numerator of the F Statistic. It gives an idea of the variation among means.
SQRT MS Denominator
This is the square root of the denominator of the F Statistic.
This report displays the genes for which the Bonferroni adjusted Prob Level is less than 0.05.
This report displays the gene for which the Bonferroni adjusted Prob Level is less than 0.05.
[Figure: histograms of Prob Level, Corrected Prob Level, Log10(Prob Level), Log10(Corrected Prob Level), Z(Prob Level), and Z(Corrected Prob Level) for Term = A; vertical axis: Count]
These six plots are used to examine the distribution of the P-Values (Prob Levels) of all genes in
the experiment, before and after the multiple testing correction. The Log (Base 10) and Z
(Normal) transformations aid in examining the distribution of the P-Values (Prob Levels) that are
extremely close to zero.
[Figure: Histogram of SQRT(MS Numerator) for Term = A and Histogram of SQRT(MS Denominator) for Term = A; vertical axis: Count]
The distributions of the sqrt(mean square numerator) and sqrt(mean square denominator) give a feel
for the components of the calculated F Values. Often these plots will be omitted.
[Figure: histogram of F Values; vertical axis: Count]
The distribution of the F Statistics can show the position of extreme F Values. Often this plot will
be omitted.
Step 1 – Pre-Processing
The 24 arrays used in the example have already been pre-processed using one of the
pre-processing procedures. The spreadsheet containing the file paths for these files is the
RM2_Split dataset. To open the RM2_Split dataset, use the following steps.
Random numbers may be entered into a vacant column to verify that the setup is correct. The
column may be named Random. The spreadsheet should now look like the following.
RM2_Split_a dataset
Subject Gender Treatment OutputFile Random
1 M Trt1 %p%\data\gess\rm\split\RM2_Split_1.ges 6
1 M Trt2 %p%\data\gess\rm\split\RM2_Split_2.ges 5
2 M Trt1 %p%\data\gess\rm\split\RM2_Split_3.ges 8
2 M Trt2 %p%\data\gess\rm\split\RM2_Split_4.ges 9
3 M Trt1 %p%\data\gess\rm\split\RM2_Split_5.ges 7
3 M Trt2 %p%\data\gess\rm\split\RM2_Split_6.ges 6
4 M Trt1 %p%\data\gess\rm\split\RM2_Split_7.ges 4
4 M Trt2 %p%\data\gess\rm\split\RM2_Split_8.ges 6
5 M Trt1 %p%\data\gess\rm\split\RM2_Split_9.ges 5
5 M Trt2 %p%\data\gess\rm\split\RM2_Split_10.ges 3
6 M Trt1 %p%\data\gess\rm\split\RM2_Split_11.ges 8
6 M Trt2 %p%\data\gess\rm\split\RM2_Split_12.ges 9
7 F Trt1 %p%\data\gess\rm\split\RM2_Split_13.ges 9
7 F Trt2 %p%\data\gess\rm\split\RM2_Split_14.ges 4
8 F Trt1 %p%\data\gess\rm\split\RM2_Split_15.ges 5
8 F Trt2 %p%\data\gess\rm\split\RM2_Split_16.ges 4
9 F Trt1 %p%\data\gess\rm\split\RM2_Split_17.ges 6
9 F Trt2 %p%\data\gess\rm\split\RM2_Split_18.ges 8
10 F Trt1 %p%\data\gess\rm\split\RM2_Split_19.ges 5
10 F Trt2 %p%\data\gess\rm\split\RM2_Split_20.ges 6
11 F Trt1 %p%\data\gess\rm\split\RM2_Split_21.ges 7
11 F Trt2 %p%\data\gess\rm\split\RM2_Split_22.ges 3
12 F Trt1 %p%\data\gess\rm\split\RM2_Split_23.ges 2
12 F Trt2 %p%\data\gess\rm\split\RM2_Split_24.ges 5
To analyze the random column using the NCSS: Repeated Measures Analysis of Variance
procedure, take the following steps.
The Denominator Term and Expected Mean Squares are correct. The three F-Tests are those
desired in the gene expression analysis. The appropriateness of the setup has been verified.
This report displays the expected mean squares for each term in the model. The columns are
described in Example 1.
This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for Gender. The columns are described in Example 1.
This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for Treatment. The columns are described in Example 1.
This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for the interaction. The columns are described in Example 1.
Storage Data
The pre-processed data for the 2 most significant genes are output into the spreadsheet.
RM2_Split dataset after data storage
Factor1 Factor2 Treatment OutputFile Random X31962_at X94766_at
1 M Trt1 ...1.ges 6 4.347998793 3.60229846
1 M Trt2 ...2.ges 5 4.871123991 2.547631461
2 M Trt1 ...3.ges 8 4.111716766 5.001783623
2 M Trt2 ...4.ges 9 5.771551162 2.84931956
3 M Trt1 ...5.ges 7 4.206340193 4.490210641
3 M Trt2 ...6.ges 6 5.639026008 2.438519502
4 M Trt1 ...7.ges 4 3.75557322 4.313683288
4 M Trt2 ...8.ges 6 5.018803914 4.540379736
5 M Trt1 ...9.ges 5 4.788529126 4.302717763
5 M Trt2 ...10.ges 3 6.887367824 2.716635127
6 M Trt1 ...11.ges 8 3.738071745 4.61868161
6 M Trt2 ...12.ges 9 5.237574131 2.834071854
7 F Trt1 ...13.ges 9 4.395934876 4.179028364
7 F Trt2 ...14.ges 4 5.157022636 2.884850643
8 F Trt1 ...15.ges 5 4.212181686 4.963690929
An X is added at the beginning of the variable names to avoid a variable name beginning with a
number. This data can be analyzed further using the NCSS: Repeated Measures Analysis of
Variance procedure. However, when hypothesis tests are run using the NCSS: Repeated Measures
Analysis of Variance procedure, adjustments for multiplicity of tests across genes are no longer
made.
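The renaming rule for stored variables can be sketched in one Python function. It assumes, from the description above, that the X is added only when a gene ID begins with a digit; the function name is hypothetical.

```python
def spreadsheet_name(gene_id):
    """Prefix 'X' to a gene ID that begins with a digit, so the variable name starts with a letter."""
    return "X" + gene_id if gene_id[:1].isdigit() else gene_id


print(spreadsheet_name("31962_at"))  # X31962_at
```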
Note: Mauchly's statistic actually tests the more restrictive assumption that the pooled covariance matrix
has compound symmetry.
Plots Section
[Figure: plots of X31962_at means by Gender (F, M) and by Treatment (Trt1, Trt2)]
The full analysis of variance and means tables, tests of assumptions, and graphics can be used to
further study the results of each gene that is found to be statistically significant. Notice, however,
that no correction is made for multiple testing across genes in NCSS. Details of the NCSS:
Repeated Measures Analysis of Variance procedure are in Chapter 214 of the NCSS manuals.