Repeated Measures ANOVA PDF

Chapter 225
Repeated Measures ANOVA
Introduction
This chapter describes how to obtain false discovery rate or experiment-wise error rate
(Bonferroni) adjusted P-values (Probability Levels) for a repeated measures (within-subject)
experiment using the GESS: Repeated Measures ANOVA procedure. The general linear models
approach for repeated measures is used in this procedure. Up to three between factor variables
and three within factor variables, as well as interactions, may be specified in the Repeated
Measures ANOVA procedure. Geisser-Greenhouse, Box, and Huynh-Feldt corrections on the
within-subject F tests are available in this procedure. A detailed discussion of Repeated Measures
Analysis of Variance is found in the NCSS: Repeated Measures Analysis of Variance chapter.
Before running this procedure, output (.ges) files containing a single expression value for each
gene on each array must be obtained using the appropriate pre-processing procedure in GESS.
Repeated Measures ANOVA

The Repeated Measures ANOVA procedure produces an F-Test for each gene for each term of
the model used. Geisser-Greenhouse, Box, or Huynh-Feldt corrections for the unadjusted P-values
(Probability Levels) may be made prior to multiple testing correction. A discussion of
repeated measures, disadvantages of within-subjects designs, assumptions, and P-Value
corrections is found in Chapter 214, Repeated Measures Analysis of Variance, of the NCSS
Manual.
Multiple Testing Adjustment

When a repeated measures analysis of variance is run, the result is a P-value (Probability Level)
for each fixed factor that reflects the evidence of difference in expression for at least one level of
the factor. When hundreds or thousands of genes are investigated at the same time, many ‘small’
P-values will occur by chance, due to the natural variability of the process. It is therefore necessary
to make an appropriate adjustment to the P-value (Probability Level), such that the likelihood of a
false conclusion is controlled.
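The effect of chance alone can be illustrated with a small simulation. This is a minimal sketch, not part of GESS: it assumes every null hypothesis is true and that a null P-value is uniformly distributed on [0, 1]. The function name `count_small_p_values` and the seed are hypothetical choices made for illustration.

```python
import random


def count_small_p_values(m, alpha, seed=0):
    """Count how many of m simulated null P-values fall below alpha.

    Assumption: under a true null hypothesis the P-value is uniform
    on [0, 1], so each one is drawn with a uniform random number.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(m))


# 1,000 genes, none of which is truly differentially expressed.
n_small = count_small_p_values(m=1000, alpha=0.05)

# The exact count depends on the seed; the expected count is 1000 * 0.05 = 50.
print(n_small)
```

Even though no gene differs in this simulation, dozens of raw P-values fall below 0.05, which is the behavior the adjustment is meant to control.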
Benjamini and Hochberg’s (1995) False Discovery Rate Table

The following table (adapted to the subject of microarray data) is found in Benjamini and
Hochberg’s (1995) false discovery rate article. In the table, m is the total number of tests, m0 is
the number of tests for which there is no difference in expression, R is the number of tests for
which a difference is declared, and U, V, T, and S are defined by the combination of the
declaration of the test and whether or not a difference exists, in truth.
                                    Declared         Declared
                                    Not Different    Different    Total
A true difference in expression
does not exist                      U                V            m0
There exists a true difference
in expression                       T                S            m – m0
Total                               m – R            R            m
In the table, m is the total number of hypotheses tested (or total number of genes) and is
assumed to be known in advance. Of the m null hypotheses tested, m0 is the number of tests for
which there is no difference in expression, R is the number of tests for which a difference is
declared, and U, V, T, and S are defined by the combination of the declaration of the test and
whether or not a difference exists, in truth. The random variables U, V, T, and S are unobservable.
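The row and column totals of the table imply the following identities, written out here as a reading aid (they follow directly from the table and are not additional assumptions):

```latex
\begin{aligned}
U + V &= m_0, & T + S &= m - m_0, \\
U + T &= m - R, & V + S &= R.
\end{aligned}
```

In particular, the number of declared differences R is the sum of the false declarations V and the correct declarations S.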
Need for Multiple Testing Adjustment

Following the calculation of a raw P-value for each test, P-value adjustments need to be made to
account in some way for multiplicity of tests. It is desirable that these adjustments minimize the
number of genes for which factors are falsely declared different (V) while maximizing the number
of genes that are correctly declared different (S). To address this issue the researcher must weigh
the value of finding a gene against the cost of a false positive. If a false positive is very
expensive, a method that focuses on minimizing V should be employed. If the value of finding a
gene is much higher than the cost of additional false positives, a method that focuses on
maximizing S should be used.
Error Rates – P-Value Adjustment Techniques

Below is a brief description of three common error rates that are used for control of false positive
declarations. The commonly used P-value adjustment technique for controlling each error rate is
also described.
Per-Comparison Error Rate (PCER) – No Multiple Testing Adjustment

The per-comparison error rate (PCER) is defined as
$$\mathrm{PCER} = E(V) / m,$$
where E(V) is the expected number of genes that are falsely declared different, and m is the total
number of tests. Preserving the PCER is tantamount to ignoring multiple testing altogether. If a
method is used which controls a PCER of 0.05 for 1,000 tests, up to about 50 of the 1,000
tests are expected to be falsely declared significant (about 50 when no gene is truly differentially
expressed). Using a method that controls the PCER will produce a
list of genes that includes most of the genes for which there exists a true difference in expression
(i.e., maximizes S), but it will also include a very large number of genes which are falsely
declared to have a true difference in expression (i.e., does not appropriately minimize V).
Controlling the PCER should be viewed as overly weak control of Type I error.
To obtain P-values (Probability Levels) that control the PCER, no adjustment is made to the P-
value. To determine significance, the P-value is simply compared to the designated alpha.
Family-Wise Error Rate (FWER) – Bonferroni Adjustment

The family-wise error rate (FWER) is defined as
$$\mathrm{FWER} = \Pr(V > 0),$$
where V is the number of genes that are falsely declared different. Controlling FWER is
controlling the probability that at least one null hypothesis is falsely rejected. If a method is used
which controls a FWER of 0.05 for 1,000 tests, the probability that any of the 1,000 tests
(collectively) is falsely rejected is at most 0.05. Using a method that controls the FWER will produce a
list of genes that includes a small (depending also on sample size) number of the genes for which
there exists a true difference in expression (i.e., limits S, unless the sample size is very large).
However, the list of genes will include very few or no genes that are falsely declared to have a
true difference in expression (i.e., stringently minimizes V). Controlling the FWER should be
considered very strong control of Type I error.
Assuming the tests are independent, the well-known Bonferroni P-value adjustment produces
adjusted P-values (Probability Levels) for which the FWER is controlled. The Bonferroni
adjustment is applied to all m unadjusted P-values ($p_j$) as
$$\tilde{p}_j = \min(m p_j, 1).$$
That is, each P-value (Probability Level) is multiplied by the number of tests, and if the result is
greater than one, it is set to the maximum possible P-value of one.
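The Bonferroni rule described above can be sketched in a few lines. This is an illustrative implementation only, not the GESS source code; the function name `bonferroni_adjust` and the sample P-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P-values: min(m * p, 1) for each p."""
    m = len(p_values)  # m is the total number of tests
    return [min(m * p, 1.0) for p in p_values]


# Four hypothetical raw P-values (so m = 4).
raw = [0.0000077, 0.0004, 0.02, 0.6]
adjusted = bonferroni_adjust(raw)

for p, p_adj in zip(raw, adjusted):
    print(f"{p:.7f} -> {p_adj:.7f}")
```

The last value shows the cap: 4 × 0.6 exceeds one, so the adjusted P-value is set to one.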
False Discovery Rate (FDR) – Benjamini and Hochberg Adjustment

The false discovery rate (FDR) (Benjamini and Hochberg, 1995) is defined as
$$\mathrm{FDR} = E\left(\frac{V}{R}\,\mathbf{1}_{\{R>0\}}\right) = E\left(\frac{V}{R} \mid R > 0\right)\Pr(R > 0),$$
where R is the number of genes that are declared significantly different, and V is the number of
genes that are falsely declared different. Controlling FDR is controlling the expected proportion
of falsely declared differences (false discoveries) to declared differences (true and false
discoveries, together). If a method is used which controls a FDR of 0.05 for 1,000 tests, and 40
genes are declared different, it is expected that 40*0.05 = 2 of the 40 declarations are false
declarations (false discoveries). Using a method that controls the FDR will produce a list of genes
that includes an intermediate (depending also on sample size) number of genes for which there
exists a true difference in expression (i.e., moderate to large S). However, the list of genes will
include a small number of genes that are falsely declared to have a true difference in expression
(i.e., moderately minimizes V). Controlling the FDR should be considered intermediate control of
Type I error.
Assuming the tests are independent, the Benjamini and Hochberg P-value adjustment produces
adjusted P-values (Probability Levels) for which the FDR is controlled. These adjusted P-values
are found as
$$\tilde{p}_{r_i} = \min_{k=i,\ldots,m}\left\{\min\left(\frac{m}{k}\,p_{r_k},\,1\right)\right\},$$
where $p_{r_1} \le p_{r_2} \le \cdots \le p_{r_m}$ are the observed ordered unadjusted P-values. The procedure is
defined in Benjamini and Hochberg (1995). The corresponding adjusted P-value definition given
here is found in Dudoit, Shaffer, and Boldrick (2003).
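The adjusted P-value definition above can be sketched as follows. This is a minimal illustration of the formula, not the GESS implementation; the function name `benjamini_hochberg_adjust` and the sample values are hypothetical.

```python
def benjamini_hochberg_adjust(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    For the P-value with rank i (1 = smallest), the adjusted value is the
    minimum over k = i, ..., m of min(m / k * p_(k), 1).
    """
    m = len(p_values)
    # Indices of the P-values, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda j: p_values[j])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value (rank m) down to the smallest (rank 1),
    # carrying the running minimum required by the outer min over k >= i.
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        candidate = min(m / rank * p_values[j], 1.0)
        running_min = min(running_min, candidate)
        adjusted[j] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
bh = benjamini_hochberg_adjust(raw)
print(bh)
```

Note that the second-smallest raw value (0.03) receives the same adjusted value as the third-smallest (0.04): the running minimum keeps the adjusted P-values monotone in the raw P-values.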
Multiple Testing Adjustment Comparison

The following table gives a summary of the multiple testing adjustment procedures and error rate
control. The power to detect differences also depends heavily on sample size.
Adjustment                Error Rate    Control of      Power to
Technique                 Controlled    Type I Error    Detect Differences
None                      PCER          Minimal         High
Bonferroni                FWER          Strict          Low
Benjamini and Hochberg    FDR           Moderate        Moderate/High
Type I Error: Rejection of a null hypothesis that is true.
Analysis Steps
Following are the recommended steps for running a Repeated Measures ANOVA on microarray
data.
Step 1 – Pre-Processing
Run the appropriate pre-processing procedure (e.g., GenePix Pre-processing or Affymetrix Pre-
processing) to prepare data (.ges) files for statistical analysis. The .ges files are created when a
variable name is entered in the Output File Names Variable box on the variables tab of the pre-
processing procedure window.
Step 2 – Spreadsheet Setup

Because the analysis for hundreds or thousands of genes may be time-consuming, it is
recommended that an initial run be made on fictitious data to ensure the spreadsheet is set up
properly. Perhaps the most important part of this initial run is careful specification of the model.
Correct specification may be verified by confirming that the expected mean squares are shown
in the output for the run on fictitious data. The importance of this step increases as the complexity
of the statistical analysis increases. This step is also useful for getting ideas for follow-up
statistical analyses of specific genes.
Step 3 – Run the Analysis

Carefully specify the model and the Prob Level Cutoff. If follow-up experiments are to be run,
the False Discovery Rate Control adjustment is recommended. If there will be no follow-up
experiments, the Bonferroni adjustment is recommended. The pre-processed data for the most
significant genes should be stored in the spreadsheet for detailed follow-up analysis.
Examine the output to determine if the number of hypothesis tests conducted is as expected, and
to see if the appropriate number of replicates was used. It may also help to look at the Prob Level
histogram to understand the distribution of statistics across the entire experiment.
Step 4 – Follow-Up Analysis

Run individual follow-up statistical analyses on the genes for which pre-processed data was
stored using the Repeated Measures Analysis of Variance procedure in NCSS. These individual
analyses are useful for examining test assumptions and specific trends in greater detail. Note,
however, that statistical tests are not adjusted for multiple testing across genes in the NCSS
procedures.
Procedure Options
This section describes the options available in this procedure.
Variables Tab
These options specify the variables that will be used in the analysis.
GES Files Specifications

These variables are used to identify the .ges files for the Repeated Measures ANOVA.
Response GES Files Variable
Specify the variable containing the column of input files on the spreadsheet. These input files will
usually be those files that were output as a result of a pre-processing procedure. The files of this
column contain the intensity summaries that will be the responses in the model.
Factor Specification
These variables are used to identify all the factors to be used in the model.
Between Variables (1-3)
These three variables specify between-subject factors. A between-subject factor specifies groups
into which the subjects are divided. For example, gender, age group, and treatment group are all
between factors.
The values of these variables indicate which group the subject belongs in. Values may be text or
numeric.
Subject Variable
Specify the variable containing the subject identification number or phrase.
Note that this variable is treated as a 'nested' factor.
Within Variables (1-3)
These three variables define within-subject factors. A within-subject factor is one whose levels
represent different points in time or space. It is the 'repeated measurement'.
Examples of within-subject factors are time, pre-post, and body organ.
Type
This option specifies the type of each factor. The options are
• Fixed
A fixed factor includes all possible values across the range of interest. Usually, hypotheses
are tested about fixed factors. For example, gender, dose-level, and treatment-group are
examples of fixed factors.
• Random
A random factor includes a sample from the population of possible values. Examples of
random factors are hospitals, cities, and randomly-selected sites.
Model Specification
These options determine the model that will be analyzed.
Which Model Terms
A design in which all main effect and interaction terms are included is called a saturated model.
Occasionally, it is useful to omit various interaction terms from the model, usually because some
data values are missing. This option lets you specify which interactions to keep.
The options included here are:
• Full Model. Use all terms.

The complete, saturated model is analyzed. All reports will be generated when this option is
selected.
• Full model except subject interactions combined with error.

Some authors recommend pooling the interactions involving the subject factor into one error
term to achieve more error degrees of freedom and thus more power in the F-tests. This
option lets you do this. Note that the Geisser-Greenhouse corrections are not made in this
case.
• Use the Custom Model given below.

This option indicates that you want the Custom Model (given in the next box) to be used.
Custom Model
When 'Custom Model' is selected in the Which Model Terms above, the actual analysis of
variance model is entered here.
For complicated designs, it is usually easier to check the option 'Write Only', and run the
procedure. A model containing the listed factors will be generated and placed in this box. You
can then edit it as you desire.
The model is entered using letters (in alphabetical order) separated by the plus sign. For example,
a three-factor factorial in which only two-way interactions are needed would be entered as
follows:
A+B+AB+C+AC+BC
A simple repeated-measures design would look like this:
A+B(A)+C+AC+BC(A)
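As an illustration of the Custom Model syntax, the sketch below builds the two-way-interaction model string for a set of crossed factors and splits a model string back into its terms. Both helpers (`two_way_model`, `model_terms`) are hypothetical and are not part of GESS; the first handles only crossed factors, not nested terms such as B(A).

```python
def two_way_model(factors):
    """Build a model with all main effects and two-way interactions.

    Factors are single letters given in alphabetical order. Each factor is
    followed by its interactions with the factors that precede it.
    """
    terms = []
    for i, factor in enumerate(factors):
        terms.append(factor)
        for earlier in factors[:i]:
            terms.append(earlier + factor)
    return "+".join(terms)


def model_terms(model):
    """Split a model string into its individual terms."""
    return [term.strip() for term in model.split("+")]


print(two_way_model("ABC"))              # A+B+AB+C+AC+BC
print(model_terms("A+B(A)+C+AC+BC(A)"))  # ['A', 'B(A)', 'C', 'AC', 'BC(A)']
```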
Write Model in ‘Custom Model’ Field. Do Not Process Data.
When this option is checked, no data analysis is performed when the procedure is run. Instead, a
copy of the full model is stored in the Custom Model box. You can then edit the model as desired.
Correction Option
Geisser-Greenhouse Correction
In a repeated measures ANOVA, the regular F-Tests of the within factors may not meet all of the
necessary assumptions. Geisser-Greenhouse proposed an adjustment to make the probability
levels more accurate. Box made a popular refinement. Huynh-Feldt made a further refinement
that made the probability level even more accurate.
Select here the type of adjustment you want to use.
RECOMMENDATION:
We recommend the Huynh-Feldt adjustment.
Adjustment for Multiple Testing

Multiple Test Correction
When several tests are performed on the same set of data, the probability levels of the individual
tests should be corrected. This option lets you specify the type of multiple test correction.
• None
No correction is done.
• Bonferroni
The Bonferroni correction preserves the experiment-wise error rate.
• False Discovery Rate Control

False Discovery Rate Control controls the proportion of falsely declared significant
differences.
• Recommendation
If you will be doing follow-up testing, False Discovery Rate Control should be used. If not,
the Bonferroni correction should be used.
Reports Tab
The options on this panel control which reports and plots are generated.
Select Reports
The following options are used to determine the reports that will be displayed.
Expected Mean Square Report
Check this box to obtain the expected mean square for each model term.
Test Detail Sorted by Prob Level
Check this box to obtain a list of the most significant F tests, sorted by the probability level.
Associated names or IDs, unadjusted probability levels, standard deviations of means, standard
errors, degrees of freedom, and test statistics are also shown.
A separate report is produced for each term in the model.
Prob Level Cutoff
Specify the cutoff for the multiple test corrected probability levels. When the Test Detail Sorted
by Prob Level box is checked, all adjusted probability levels below this value will be reported.
Test Detail Sorted by Gene Within Subset
Check this box to obtain a list of all genes that are in subset lists. Associated probability levels,
standard deviations of means, standard errors, degrees of freedom, and test statistics are also
shown.
A separate list is produced for each subset, sorted alphabetically. A separate report is produced
for each term in the model.
Report Options
These options determine the format of the reports.
Precision
Specifies whether unformatted numbers are displayed as single (7-digit) or double (13-digit)
precision numbers.
• Single
Unformatted numbers are displayed with 7-digits. This is the default setting. All reports have
been formatted for single precision.
• Double
Unformatted numbers are displayed with 13-digits. This option is most often used when
extremely accurate results are needed for further calculation. Double precision numbers will
require more space than allotted, potentially resulting in unaligned output. This option is
provided for those instances when accuracy is more important than format alignment.
COMMENTS:
This option does not affect formatted numbers such as probability levels.
This option only influences the format of the numbers as they are output. All calculations are
performed in double precision regardless of selection.
Prob Decimals
Specify the number of decimal places to be used for displaying probability levels on the reports.
The number chosen here does not affect the internal precision of the data.
sqrt(MS) Decimals
Specify the number of decimal places to be used for displaying square root transformed mean
squares on the reports. The number chosen here does not affect the internal precision of the data.
F Value Decimals
Specify the number of decimal places to be used for displaying F-statistics on the reports. The
number chosen here does not affect the internal precision of the data.
Select Histograms
The following options are used to determine which histograms will be displayed.
Histogram of Prob Level
Check this box to obtain a histogram of the unadjusted (raw) probability levels.
Histogram of Corrected Prob Level
Check this box to obtain a histogram of all corrected probability levels.
Histogram of Log10(Prob Level)
Check this box to obtain a histogram of the Log(base 10) transformed, unadjusted (raw)
probability levels. When the mean square denominator is zero, the Log10(Prob Level) is put in
the bin at -5.
Histogram of Log10(Corrected Prob Level)
Check this box to obtain a histogram of all Log(base 10) transformed, corrected probability
levels. Occasionally, a mean square denominator of zero occurs, producing an undefined Prob
Level. When the mean square denominator is zero, the Log10(Corrected Prob Level) is put in the
bin at -5.
Histogram of Z(Prob Level)
Check this box to obtain a histogram of Z-transformed unadjusted (raw) probability levels. The
Z-transformation converts the probability level into the corresponding standard normal
distribution value using the probability integral transform. Values less than -9 are binned at -9.
Values greater than 9 are binned at 9.
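A rough sketch of such a Z-transformation is given below. It assumes the transformation is the inverse standard normal cumulative distribution function applied to the probability level, so that small probability levels map to large negative Z values; the chapter does not state the sign convention, so this direction is an assumption. The function name `z_transform` is hypothetical.

```python
from statistics import NormalDist


def z_transform(prob_level, limit=9.0):
    """Map a probability level to a standard normal value, binned at +/- limit.

    Assumption: Z is the inverse standard normal CDF of the probability level.
    """
    # Probability levels of exactly 0 or 1 have no finite normal quantile,
    # so they go straight to the outermost bins.
    if prob_level <= 0.0:
        return -limit
    if prob_level >= 1.0:
        return limit
    z = NormalDist().inv_cdf(prob_level)
    # Values beyond the limits are placed in the outermost bins.
    return max(-limit, min(limit, z))


for p in (0.5, 0.025, 1e-30):
    print(p, z_transform(p))
```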
Histogram of Z(Corrected Prob Level)
Check this box to obtain a histogram of Z-transformed corrected probability levels. The Z-
transformation converts the probability level into the corresponding standard normal distribution
value using the probability integral transform. Values less than -9 are binned at -9. Values greater
than 9 are binned at 9.
Histogram of SQRT(Mean Square Numerator)
Check this box to obtain a histogram of all square root transformed numerator mean squares for
each factor.
Histogram of SQRT(Mean Square Denominator)
Check this box to obtain a histogram of all square root transformed denominator mean squares for
each factor.
Histogram of F Value
Check this box to obtain a histogram of all F values for each factor. When the mean square
denominator is zero, the value 100 is used in the histogram.
Computational Option
Genes Per Batch
To optimize the use of computer memory, the genes are processed in groups or batches. This
parameter specifies the number of genes processed per batch.
The basic rule is that the number of genes per batch times the number of arrays should be less
than 500,000.
If you choose 'Automatic', the program will select a reasonable value.
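The basic rule can be turned into a quick calculation of the largest batch size that satisfies it. This sketch only applies the stated rule; it does not reproduce whatever value the 'Automatic' setting chooses, and the function name `max_genes_per_batch` is hypothetical.

```python
def max_genes_per_batch(n_arrays, limit=500_000):
    """Largest batch size with genes_per_batch * n_arrays strictly below limit."""
    if n_arrays <= 0:
        raise ValueError("n_arrays must be positive")
    return (limit - 1) // n_arrays


# Example: an experiment with 27 arrays.
print(max_genes_per_batch(27))  # 18518
```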
Histograms Tab
The options on this panel control the appearance of the histograms.
Vertical and Horizontal Axes

These options are used to format the histogram axes.
Label
Enter text here for the designated label.
REPLACEMENT CODES:
The following code is replaced by the appropriate name when the plot is generated.
{X} is replaced by the statistic that is reported in the histogram.
Minimum
Specify the value to be displayed as the minimum on this axis. Data values less than this amount
will be ignored.
If this value is left blank, the minimum will be determined from the data.
Maximum
Specify the value to be displayed as the maximum on this axis. Data values greater than this
amount will be ignored.
If this value is left blank, the maximum will be determined from the data.
Tick Label Settings…
This option specifies the characteristics of the reference numbers. It displays a window that edits
the font size and color of the reference numbers that appear next to the text along the axis of the
plot. It also allows you to set the number of digits in the reference numbers as well as their
vertical/horizontal orientation.
Note that in some cases, the format specified here is overridden by the variable's format as
specified on the database in the Variable Info Sheet.
Major Ticks
Specify the number of large tickmarks and optional grid lines along this axis. A set of minor
tickmarks will be generated between each pair of major tickmarks. A reference number is
displayed adjacent to each major tickmark.
Minor Ticks
Select the number of small tickmarks to be displayed between each pair of major (large)
tickmarks along this axis.
Show Grid Lines
Check this option to display grid lines at the major tickmarks along this axis.
NOTE: Since the grid lines are drawn out from the tickmarks, they appear perpendicular to the
axis. Thus, checking the Y Grid Lines will actually cause horizontal grid lines to appear.
Histogram Settings
These options are used to specify the appearance of the histograms.
Style File
Designate a histogram style file. This file sets all histogram options that are not set directly on
this panel. Unless you choose otherwise, the HistoBox style file is used. Histogram style files are
created in the Histograms procedure.
Number of Bars
Specify the number of bars (bins) to be displayed. Select '0 - Automatic' to direct the program to
select an appropriate number based on the number of values.
Interior Color
Specify the histogram interior color.
Background Color
Specify the histogram background color.
Bar Fill Color
Specify the color of the inside of the bars.
Bar Border Color
Specify the color of the lines around the bars.
Horizontal Axis Minimums and Maximums
Horizontal Axis Maximum
Horizontal Axis Minimum

Histogram Title
Title
Enter text here for the histogram title.
REPLACEMENT CODES:
The following code is replaced by the appropriate name when the plot is generated.
{X} is replaced by the statistic that is reported in the histogram.
Storage Tab
The options on this panel control the storage of pre-processed data values on the spreadsheet for
further analysis.
Model Term Used for Storage

Term
Specify the term for which the pre-processed gene data are stored on the database. The pre-
processed values for all rows are stored, beginning with the most significant gene
(smallest probability level) for this term.
Spreadsheet Storage of the NAMES of Significant Genes
These options determine whether the names of significant genes will be stored and where.
Store the names of the most significant genes on the spreadsheet
Check this box to store a list of names of the most significant genes into the variable (column)
specified under Store Gene Names in Variable.
Store Gene Names in Variable
If the box immediately below is checked, the names of the most significant genes will be stored in
the column associated with this variable.
Any data that is already in this variable will be overwritten.
Spreadsheet Storage of the EXPRESSION VALUES of Significant Genes
These options determine whether the expression values of significant genes will be stored and
where.
Store the data values of the most significant genes on the spreadsheet
Check this box to store the pre-processed data values of all genes for which the corrected
probability level is below the cutoff value.
This allows the user to utilize other procedures to obtain follow-up analyses and graphics for the
significant genes.
Store Expression Values Beginning with Variable
The values of the most significant gene will be stored in this variable. The values for each
additional significant gene are stored in the variables immediately to the right of this variable.
Leave this value blank if you want the data storage to begin in the first blank column on the right-
hand side of the data.
WARNING: Use caution when selecting this variable, since existing data is automatically
replaced when the storage variables are created.
Maximum Storage Variables Used
Specify the maximum number of variables (columns) for which you want the gene intensity data
stored on the spreadsheet. This choice may be particularly important when the number of
significant genes is large.
Note that NCSS spreadsheets are limited to 255 variables, so if you want to store more values,
you will have to add more sheets.
Subsets 1 - 9 Tabs
The options on this panel control the names and lists of subsets.
Subset 1 – 9
Name
The name of the gene subset is entered here.
Separate reports may be generated to show all genes of a subset (see Reports tab). This may be
useful for examining probability levels of specific genes you are interested in that do not make
the cutoff.
Genes in this Subset
Enter a list of genes that are to be in this subset. The genes may be entered directly, or the *
character may be used to specify all genes with a particular beginning. The gene names or IDs
entered in this list must be in the column specified in the Gene Name From box on the Variables tab.
EXAMPLES:
Blank
spike1
spike3
spike5
spike* (all names beginning with spike)
AA44719
NM_00582
NM_04762
NM_27564
cntrl* (all names beginning with cntrl)
file(C:\Microarray\genelist.txt) (all names in the genelist.txt file)
var(OutputGenes) (all names in the spreadsheet variable with the variable name OutputGenes)
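The direct-entry and `*` forms in the examples above can be mimicked with a short matching routine. This is only a sketch of the matching idea: it assumes case-sensitive comparison, it does not handle the `file(...)` or `var(...)` forms, and the function name `matches_subset` is hypothetical.

```python
def matches_subset(gene_name, entries):
    """Return True if gene_name matches any entry of a subset list.

    An entry ending in '*' matches every name that begins with the text
    before the '*'; any other entry must match the name exactly.
    """
    for entry in entries:
        if entry.endswith("*"):
            if gene_name.startswith(entry[:-1]):
                return True
        elif gene_name == entry:
            return True
    return False


entries = ["spike*", "AA44719", "cntrl*"]
for name in ("spike3", "AA44719", "AA447190", "NM_00582"):
    print(name, matches_subset(name, entries))
```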
These Genes are
Specify here whether the genes of this subset are to be included or excluded from the list of genes
that are analyzed. Probability levels will not be calculated for the genes of this subset when
'Excluded' is entered here.
Non-Subset (Ungrouped) Genes

Name of Ungrouped Set
Enter the subset name to be used for all genes that are not included in any of the nine subsets.
Ungrouped Genes are
Specify here whether the genes not listed in any other subset are to be included or excluded from
the list of genes that are analyzed. Probability levels will not be calculated for these genes when
'Excluded' is entered here.
Excluding the genes of the ungrouped subset may be useful when analyzing only a small subset
of the genes of the array is desired.
Template Tab
The options on this panel allow various sets of options to be loaded (File menu: Load Template)
or stored (File menu: Save Template). A template file contains all the settings for this procedure.
Specify the Template File Name

File Name
Designate the name of the template file either to be loaded or stored.
Select a Template to Load or Save

Template Files
A list of previously stored template files for this procedure.
Template Id’s
A list of the Template Id’s of the corresponding files. This id value is loaded in the box at the
bottom of the panel.
Example 1 – Repeated Measures ANOVA

The effects on gene expression of a control and two treatments are to be monitored over time. The
gene expression measurement times of interest are 0 hours, 12 hours, and 24 hours. Nine rats are
randomly assigned to the three treatment groups such that 3 rats are in each group. Blood samples
are taken from each rat immediately after treatment, 12 hours after treatment, and 24 hours after
treatment.
                             Time
Treatment      Rat    0 Hours       12 Hours      24 Hours
Control        1      Sample 1,1    Sample 1,2    Sample 1,3
Control        2      Sample 2,1    Sample 2,2    Sample 2,3
Treatment 1    5      Sample 5,1    Sample 5,2    Sample 5,3
Treatment 2    8      Sample 8,1    Sample 8,2    Sample 8,3
The result is 27 samples. Each sample is processed and exposed to a single microarray, resulting in a
single expression value for each gene for each rat of each treatment group at each time period.
The goal is to determine for each gene whether there is evidence that the expression is different
across treatment, time, and/or if there is a treatment by time interaction.
In the pre-processing procedure, 27 files are created. The format of the spreadsheet is shown
below.
RM1_RM dataset
Rat    Time    Treatment    OutputFile
1      0       C            %p%\data\gess\rm\rm\RM1_RM_1.ges
4      0       Trt1         %p%\data\gess\rm\rm\RM1_RM_10.ges
The spreadsheet data used are recorded in the RM1_RM dataset.
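The layout of the 27 spreadsheet rows can be sketched by enumerating the design. This is an illustration under stated assumptions: rats are numbered consecutively within treatment groups, the rows are ordered by rat and then by time, and the label `Trt2` for the second treatment is a guess by analogy with `Trt1`. The OutputFile column is omitted because only two of the file names are shown above. The function name `build_design` is hypothetical.

```python
def build_design(treatments=("C", "Trt1", "Trt2"), rats_per_group=3,
                 times=(0, 12, 24)):
    """Enumerate one row per rat per time point for a balanced design."""
    rows = []
    rat = 0
    for treatment in treatments:
        for _ in range(rats_per_group):
            rat += 1  # rats are numbered consecutively across groups
            for time in times:
                rows.append({"Rat": rat, "Time": time, "Treatment": treatment})
    return rows


design = build_design()
print(len(design))  # 27
print(design[0])    # first row: rat 1 at time 0 in the control group
print(design[9])    # tenth row: rat 4 at time 0 in the first treatment group
```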

You may follow along here by making the appropriate entries or load the completed template
Example 1 from the Template tab of the GESS Repeated Measures ANOVA window.
1 Open the RM1_RM dataset.

• From the File menu of the NCSS Data window, select Open.
• Select the Data subdirectory of your NCSS directory.
• Open the GESS folder.
• Click on the file RM1_RM.S0.
• Click Open.
2 Open the GESS Repeated Measures ANOVA window.

• On the menus, select GESS, then Analysis of Variance Routines, then Repeated
Measures ANOVA. The GESS Repeated Measures ANOVA procedure will be
displayed.
• On the menus, select File, then New Template. This will fill the procedure with the
default template. Alternatively, load the Example 1 Template, which generates the
specifications described below.
3 Specify the variables and hypothesis test details.

• On the GESS Repeated Measures ANOVA window, select the Variables tab.
• Set the Response GES Files Variable to OutputFile.
• Set the first variable box beneath Between Variables to Treatment.
• Set the Type for Treatment to Fixed.
• Set the Subject Variable to Rat.
• Set the first variable box beneath Within Variables to Time.
• Set the Type for Time to Fixed.
• Set Which Model Terms to Full model. Use all terms.
• Set Geisser-Greenhouse Correction to Huynh-Feldt.
• Set the Multiple Test Correction to Bonferroni.
4 Specify the reports.

• Select the Reports tab.
• Check the box next to Expected Mean Square Report.
• Check the box next to Test Detail Sorted by Prob Level.
• Set the Prob Level Cutoff to 0.05.
• Check all other boxes except Test Detail Sorted by Gene Within Subset.
5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the Run button (the
left-most button on the button bar at the top).
Expected Mean Squares Section

Source                 Term      Denominator    Expected
Term             DF    Fixed?    Term           Mean Square
A: Treatment     2     Yes       B(A)           S+csB+bcsA
B(A): Rat        6     No        S(ABC)         S+csB
C: Time          2     Yes       BC(A)          S+sBC+absC
AC               4     Yes       BC(A)          S+sBC+bsAC
BC(A)            12    No        S(ABC)         S+sBC
S(ABC)           0     No                       S
Note: Expected Mean Squares are for the balanced cell-frequency case.
This report displays the expected mean squares for each term in the model.
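The DF column follows the usual degrees-of-freedom rules for a balanced design with one between factor (A), subjects nested within A (B), and one within factor (C). The sketch below is only an illustration: the function name is hypothetical, and the level counts used for Example 1 (three treatments, three rats per treatment, three time points) are inferred from the DF column rather than stated in the report.

```python
def split_plot_df(a, b, c):
    """Degrees of freedom for a balanced design with
    a between-factor levels, b subjects per level, c within-factor levels."""
    return {
        "A": a - 1,                      # between factor
        "B(A)": a * (b - 1),             # subjects nested within A
        "C": c - 1,                      # within factor
        "AC": (a - 1) * (c - 1),         # between-by-within interaction
        "BC(A)": a * (b - 1) * (c - 1),  # subject-by-within interaction
    }

# Level counts assumed for Example 1 (inferred from the DF column).
example1_df = split_plot_df(3, 3, 3)
print(example1_df)
```

With two genders, six subjects per gender, and two treatments, the same rules give the degrees of freedom 1, 10, 1, 1, and 10 reported in Example 2.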
Source Term
The source of variation or term in the model.
DF
The degrees of freedom, which is the number of observations used by this term.
Term Fixed?
Indicates whether the term is fixed or random.
Denominator Term
Indicates the term used as the denominator in the F-ratio.
Expected Mean Square
This expression represents the expected value of the corresponding mean square if the design
were completely balanced. S represents the expected value of the mean square error (sigma
squared). The uppercase letters represent either the adjusted sum of squared treatment means if
the factor is fixed, or the variance component if the factor is random. Each lowercase letter
represents the number of levels of the corresponding factor, and s represents the number of
replications of the experimental layout.
These EMS expressions are provided to determine the appropriate error term for each factor. The
correct error term for a factor is that term whose EMS is identical except for the factor being
tested.
In this example, the appropriate error term for treatment is B(A).
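The rule for choosing the error term can be checked mechanically. The sketch below is a simplified illustration, not the procedure's own algorithm: it writes each EMS from the report as a set of components (ignoring the coefficients) and looks for the term whose EMS matches once the tested term's own component is removed. The names `EMS`, `OWN`, and `denominator_term` are hypothetical.

```python
# Expected mean squares from the report, as sets of components
# (coefficients such as cs or bcs are ignored in this sketch).
EMS = {
    "A": {"S", "B", "A"},
    "B(A)": {"S", "B"},
    "C": {"S", "BC", "C"},
    "AC": {"S", "BC", "AC"},
    "BC(A)": {"S", "BC"},
    "S(ABC)": {"S"},
}

# The component that each term itself contributes to its EMS.
OWN = {"A": "A", "B(A)": "B", "C": "C", "AC": "AC", "BC(A)": "BC", "S(ABC)": "S"}

def denominator_term(term):
    """Return the term whose EMS equals this term's EMS without its own component."""
    target = EMS[term] - {OWN[term]}
    for other, components in EMS.items():
        if other != term and components == target:
            return other
    return None  # no valid error term (the case for S(ABC))

for term in EMS:
    print(term, "->", denominator_term(term))
```

The result reproduces the Denominator Term column of the report: B(A) for Treatment, BC(A) for Time and AC, and S(ABC) for the two random terms.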
F-Test Detail for A: Treatment Sorted in Probability Level Order

                                   Bonferroni
                                   Adjusted
                                   Multiple     Single                         SQRT MS   SQRT MS
Gene                      Subset   Tests        Test                   DF1/    Num-      Denom-
Name                      Name     Prob Level   Prob Level   F Value   DF2     erator    inator
93822_at                  Other    0.0026532    0.0000077    148.986   2/6     3.1713    0.2598
AFFX-Ss_Angioten_3_s_at   Other    0.0080476    0.0000233    101.996   2/6     1.1045    0.1094
37029_at                  Other    0.0402469    0.0001167    58.397    2/6     3.3276    0.4355
Total number of hypothesis tests conducted = 345

Geisser-Greenhouse Correction: Huynh-Feldt
This report displays the genes for which the Bonferroni adjusted Prob Level is less than 0.05.
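The relationship between the two Prob Level columns can be illustrated with the standard Bonferroni adjustment, which multiplies each single-test Prob Level by the number of tests (345 in this report) and caps the result at 1. This is a minimal sketch with a hypothetical function name; because the single-test values printed in the report are rounded, the products agree with the reported adjusted values only approximately.

```python
import math

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted probability level, capped at 1."""
    return min(1.0, p_value * n_tests)

# Single Test Prob Levels as printed (rounded) in the Treatment report.
single = {
    "93822_at": 0.0000077,
    "AFFX-Ss_Angioten_3_s_at": 0.0000233,
    "37029_at": 0.0001167,
}
adjusted = {gene: bonferroni(p, 345) for gene, p in single.items()}

# Keep genes below the Prob Level Cutoff, sorted by adjusted value.
significant = [g for g, p in sorted(adjusted.items(), key=lambda kv: kv[1]) if p < 0.05]

for gene in significant:
    print(gene, round(adjusted[gene], 7))
```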
Gene Name
This is the name or ID of the genes for which the Bonferroni adjusted Prob Level is less than
0.05.
Subset Name
This is the name of the specified subset to which this gene belongs. If the gene is not a member
of a subset list, the default subset name is Other.
Bonferroni Adjusted Multiple Tests Prob Level
This is the Prob Level for the specified hypothesis test following a Bonferroni correction.
Single Test Prob Level
This is the Prob Level of the individual test, before multiple test correction is done.
F Value
This is the value of the F Statistic used to conduct the hypothesis test of interest.
DF1/DF2
DF1 is the number of degrees of freedom for the numerator. DF2 is the number of degrees of
freedom for the denominator.
SQRT MS Numerator
This is the square root of the numerator mean square of the F Statistic. It gives an idea of the variation among means.
SQRT MS Denominator
This is the square root of the denominator mean square of the F Statistic.
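Since both SQRT MS columns are square roots of mean squares, the F Value can be recovered by squaring their ratio. The following sketch (hypothetical function name) checks this against the first row of the Treatment report; small differences are expected because the printed values are rounded.

```python
def f_from_sqrt_ms(sqrt_ms_numerator, sqrt_ms_denominator):
    """F value as the ratio of mean squares, given their square roots."""
    return (sqrt_ms_numerator / sqrt_ms_denominator) ** 2

# Row for 93822_at: SQRT MS Numerator 3.1713, SQRT MS Denominator 0.2598.
f_93822 = f_from_sqrt_ms(3.1713, 0.2598)
print(round(f_93822, 2))  # close to the reported F Value of 148.986
```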
F-Test Detail for C: Time Sorted in Probability Level Order

                     Bonferroni
                     Adjusted
                     Multiple     Single                         SQRT MS   SQRT MS
Gene        Subset   Tests        Test                   DF1/    Num-      Denom-
Name        Name     Prob Level   Prob Level   F Value   DF2     erator    inator
37189_at    Other    0.0000369    0.0000001    81.071    2/12    2.9567    0.3284
37725_at    Other    0.0030582    0.0000089    35.707    2/12    1.8401    0.3079
100084_at   Other    0.0035796    0.0000104    34.627    2/12    3.1345    0.5327
37001_at    Other    0.0343659    0.0000996    21.868    2/12    2.5077    0.5363

This report displays the genes for which the Bonferroni adjusted Prob Level is less than 0.05.
F-Test Detail for AC Sorted in Probability Level Order

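                     Bonferroni
                     Adjusted
                     Multiple     Single                         SQRT MS   SQRT MS
Gene        Subset   Tests        Test                   DF1/    Num-      Denom-
Name        Name     Prob Level   Prob Level   F Value   DF2     erator    inator
101482_at   Other    0.0000030    0.0000000    88.070    4/12    3.4083    0.3632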

This report displays the gene for which the Bonferroni adjusted Prob Level is less than 0.05.
Histograms and Plots Section (for Treatment)

[Plots: Histogram of Prob Level for Term = A; Histogram of Corrected Prob Level for Term = A (vertical axis: Count)]
[Plots: Histogram of Log10(Prob Level) for Term = A; Histogram of Log10(Corrected Prob Level) for Term = A (vertical axis: Count)]
[Plots: Histogram of Z(Prob Level) for Term = A; Histogram of Z(Corrected Prob Level) for Term = A (vertical axis: Count)]
These six plots are used to examine the distribution of the P-Values (Prob Levels) of all genes in
the experiment, before and after the multiple testing correction. The Log (Base 10) and Z
(Normal) transformations aid in examining the distribution of the P-Values (Prob Levels) that are
extremely close to zero.
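The two transformations can be sketched as follows. The exact definition of Z used by the procedure is not given here, so this example assumes it is the inverse of the standard normal cumulative distribution function applied to the Prob Level. The clipping constants and function names are choices made for this illustration only; they keep the transforms finite when a Prob Level is exactly 0 or 1.

```python
import math
from statistics import NormalDist

def log10_prob(p, floor=1e-300):
    """Base-10 log of a probability level; the floor avoids log10(0)."""
    return math.log10(max(p, floor))

def z_prob(p, eps=1e-12):
    """Assumed Z transform: standard normal quantile of the probability level."""
    p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
    return NormalDist().inv_cdf(p)

for p in (0.0000077, 0.05, 0.5, 1.0):
    print(p, round(log10_prob(p), 3), round(z_prob(p), 3))
```

Both transformations stretch the region near zero, so Prob Levels such as 0.0000077 and 0.0001167 fall in clearly separate histogram bins instead of sharing the first bin of the untransformed plot.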
[Plots: Histogram of SQRT(MS Numerator) for Term = A; Histogram of SQRT(MS Denominator) for Term = A (vertical axis: Count)]
The distributions of the sqrt(mean square numerator) and sqrt(mean square denominator) give a feel
for the components of the calculated F Values. Often these plots will be omitted.
[Plot: Histogram of F Value for Term = A (horizontal axis: F Value, 0.0 to 150.0; vertical axis: Count)]
The distribution of the F Statistics can show the position of extreme F Values. Often this plot will
be omitted.
Example 2 – Split Plot Design – Analysis Steps

In a study, two factors are expected to influence gene expression in humans: gender and a
treatment factor. Blood samples are taken from 6 males and 6 females. Each sample is divided
into two parts. One part receives Treatment 1, while the other part receives Treatment 2. A single
cDNA sample is obtained from each part following treatment, resulting in a total of 24 samples.
Each sample is exposed to a single microarray. The goal is to determine for each gene whether
there is evidence that the expression is different between males and females, across treatments,
and/or if there are interactive effects of gender and treatment on gene expression.
Step 1 – Pre-Processing
The 24 arrays used in the example have already been pre-processed using one of the
pre-processing procedures. The spreadsheet containing the file paths of the resulting output
files is the RM2_Split dataset. To open the RM2_Split dataset, use the following steps.
1 Open the RM2_Split dataset.

• Click on the file RM2_Split.S0.
• Click Open.
Step 2 – Spreadsheet Setup

The RM2_Split dataset should appear as
RM2_Split dataset
Subject Gender Treatment OutputFile
1 M Trt1 %p%\data\gess\rm\split\RM2_Split_1.ges
7 F Trt1 %p%\data\gess\rm\split\RM2_Split_13.ges
Random numbers may be entered into a vacant column to verify that the setup is correct. The
column may be titled Random. The spreadsheet should now look like the following.
RM2_Split_a dataset
Subject Gender Treatment OutputFile Random
1 M Trt1 %p%\data\gess\rm\split\RM2_Split_1.ges 6
7 F Trt1 %p%\data\gess\rm\split\RM2_Split_13.ges 9
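The layout of the full 24-row spreadsheet can be sketched in code. This is an illustration only: the function name is hypothetical, the row order and file numbering follow the rows shown in this example (subject 1 uses files 1 and 2, subject 7 uses files 13 and 14), and the range of the random integers (1 to 9) is an assumption.

```python
import random

def build_split_plot_rows(seed=0):
    """Build the 24 rows: 12 subjects (6 M, 6 F), each with Trt1 and Trt2."""
    rng = random.Random(seed)
    rows = []
    for subject in range(1, 13):
        gender = "M" if subject <= 6 else "F"
        for t_index, treatment in enumerate(("Trt1", "Trt2")):
            file_number = 2 * (subject - 1) + t_index + 1
            rows.append({
                "Subject": subject,
                "Gender": gender,
                "Treatment": treatment,
                "OutputFile": rf"%p%\data\gess\rm\split\RM2_Split_{file_number}.ges",
                "Random": rng.randint(1, 9),  # placeholder response for the setup check
            })
    return rows

rows = build_split_plot_rows()
print(len(rows), rows[0], rows[12])
```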
Alternatively, open the RM2_Split_a dataset.
1 Open the RM2_Split_a dataset.

• Click on the file RM2_Split_a.S0.
• Click Open.
To analyze the random column using the NCSS: Repeated Measures Analysis of Variance
procedure, take the following steps.
2 Open the NCSS: Repeated Measures Analysis of Variance window.

• On the menus, select Analysis, then Analysis of Variance (ANOVA), then Repeated
Measures Analysis of Variance. The NCSS: Repeated Measures Analysis of Variance
procedure will be displayed.
• On the menus, select File, then New Template. This will fill the procedure with the
default template.
3 Specify the variables.

• Select the Variables tab.
• Set Response Variable(s) to Random.
• Set the Between Factor 1 to Gender.
• Set the Subject Variable to Subject.
• Set the Within Factor 1 to Treatment.
4 Specify the model.

• Select the Model tab.
• Under Which Model Terms, select Full Model. Use all terms.

Repeated Measures ANOVA Output

The Expected Mean Squares Section and Analysis of Variance Table should appear as follows.

A: Gender 1 Yes B(A) S+csB+bcsA
B(A): Subject 10 No S(ABC) S+csB
C: Treatment 1 Yes BC(A) S+sBC+absC
S(ABC) 0 No S
Analysis of Variance Table

Source Sum of Mean Prob Power
Term DF Squares Square F-Ratio Level (Alpha=0.05)
A: Gender 1 6 6 1.17 0.305024 0.165173
B(A): Subject 10 51.33333 5.133333
C: Treatment 1 0.6666667 0.6666667 0.20 0.661086 0.069474
AC 1 0.6666667 0.6666667 0.20 0.661086 0.069474
BC(A) 10 32.66667 3.266667
S 0
Total (Adjusted) 23 91.33334
Total 24
* Term significant at alpha = 0.05
The Denominator Term and Expected Mean Squares are correct. The three F-Tests are those
desired in the gene expression analysis. The appropriateness of the setup has been verified.
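The F-Ratios in the table above can be reproduced from the DF and Sum of Squares columns together with the denominator terms. The sketch below is a minimal check using the printed values; the variable and function names are hypothetical.

```python
# (DF, Sum of Squares) for each term, as printed in the Analysis of Variance Table.
anova = {
    "A": (1, 6.0),
    "B(A)": (10, 51.33333),
    "C": (1, 0.6666667),
    "AC": (1, 0.6666667),
    "BC(A)": (10, 32.66667),
}

# Denominator term for each tested term, from the Expected Mean Squares Section.
denominators = {"A": "B(A)", "C": "BC(A)", "AC": "BC(A)"}

def mean_square(term):
    """Mean square = sum of squares divided by degrees of freedom."""
    df, ss = anova[term]
    return ss / df

f_ratios = {t: mean_square(t) / mean_square(d) for t, d in denominators.items()}
for term, f in f_ratios.items():
    print(term, round(f, 2))
```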
Step 3 – Run the Analysis

The following steps should be taken to run the analysis. You may follow along here by making
the appropriate entries or load the completed template Example 2 from the Template tab of the
GESS Repeated Measures ANOVA window.
1 Open the GESS Repeated Measures ANOVA window.

• On the menus, select GESS, then Analysis of Variance Routines, then Repeated
Measures ANOVA. The GESS Repeated Measures ANOVA procedure will be
displayed.
• On the menus, select File, then New Template. This will fill the procedure with the
default template. Alternatively, load the Example 2 Template, which generates the
specifications described below.
2 Specify the variables and hypothesis test details.

• On the GESS Repeated Measures ANOVA window, select the Variables tab.
• Set the Response GES Files Variable to OutputFile.
• Set the first variable box beneath Between Variables to Gender.
• Set the Type for Gender to Fixed.
• Set the first variable box beneath Within Variables to Treatment.
• Set the Type for Treatment to Fixed.
• Set Which Model Terms to Full model. Use all terms.
• Set Geisser-Greenhouse Correction to None (Regular F).
• Set the Multiple Test Correction to False Discovery Rate Control.
3 Specify the storage options.

• Select the Storage tab.
• Set Term used for Determining Storage to C (corresponding to Treatment).
• Check the box next to Store the data values of the most significant genes on the
spreadsheet.
• Set Store Expression Values Beginning with Variable to C6.
• Set Maximum Storage Variables used to 2.
4 Specify the reports.

• Select the Reports tab.
• Check the box next to Expected Mean Square Report.
• Check the box next to Test Detail Sorted by Prob Level.
• Set the Prob Level Cutoff to 0.05.
• Uncheck all other boxes.


S(ABC) 0 No S
This report displays the expected mean squares for each term in the model. The columns are
described in Example 1.
F-Test Detail for A: Gender Sorted in Probability Level Order

                     FDR
                     Adjusted
                     Multiple     Single                         SQRT MS   SQRT MS
Gene        Subset   Tests        Test                   DF1/    Num-      Denom-
Name        Name     Prob Level   Prob Level   F Value   DF2     erator    inator
37046_at    Other    0.0002119    0.0000006    122.859   1/10    3.4262    0.3091
37189_at    Other    0.0004410    0.0000026    90.109    1/10    3.5115    0.3699
37029_at    Other    0.0020054    0.0000174    58.482    1/10    3.0374    0.3972

Geisser-Greenhouse Correction: None (Regular F)
This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for Gender. The columns are described in Example 1.
F-Test Detail for C: Treatment Sorted in Probability Level Order

                     FDR
                     Adjusted
                     Multiple     Single                         SQRT MS   SQRT MS
Gene        Subset   Tests        Test                   DF1/    Num-      Denom-
Name        Name     Prob Level   Prob Level   F Value   DF2     erator    inator
31962_at    Other    0.0004585    0.0000013    103.977   1/10    3.2343    0.3172
94766_at    Other    0.0090688    0.0000526    45.108    1/10    3.7749    0.5620
101482_at   Other    0.0228121    0.0002036    32.273    1/10    2.6180    0.4608
93822_at    Other    0.0228121    0.0002645    30.170    1/10    2.8094    0.5115
40515_at    Other    0.0255393    0.0003701    27.626    1/10    2.5226    0.4799

This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for Treatment. The columns are described in Example 1.

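F-Test Detail for AC Sorted in Probability Level Order

                     FDR
                     Adjusted
                     Multiple     Single                         SQRT MS   SQRT MS
Gene        Subset   Tests        Test                   DF1/    Num-      Denom-
Name        Name     Prob Level   Prob Level   F Value   DF2     erator    inator
38437_at    Other    0.0000191    0.0000001    204.432   1/10    5.3333    0.3730
41237_at    Other    0.0001335    0.0000008    116.895   1/10    4.3791    0.4050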

This report displays the genes for which the False Discovery Rate Adjusted Prob Level is less
than 0.05 for the interaction. The columns are described in Example 1.
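The FDR-adjusted values in these reports are consistent with the Benjamini and Hochberg step-up adjustment. The sketch below is a generic implementation of that adjustment, not the procedure's own code, and the function name is hypothetical. Each sorted Prob Level is multiplied by the number of tests divided by its rank, and a running minimum taken from the largest rank downward keeps the adjusted values monotone. That running minimum is why two genes with different single-test Prob Levels (101482_at and 93822_at in the Treatment report) can share the same adjusted value.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted probability levels, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Small made-up input; reproducing the report would need all of its Prob Levels.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(example)
```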
Storage Data
The pre-processed data for the 2 most significant genes are output into the spreadsheet.
RM2_Split dataset after data storage
Subject Gender Treatment OutputFile Random X31962_at X94766_at
1 M Trt1 ...1.ges 6 4.347998793 3.60229846
1 M Trt2 ...2.ges 5 4.871123991 2.547631461
2 M Trt1 ...3.ges 8 4.111716766 5.001783623
2 M Trt2 ...4.ges 9 5.771551162 2.84931956
3 M Trt1 ...5.ges 7 4.206340193 4.490210641
3 M Trt2 ...6.ges 6 5.639026008 2.438519502
4 M Trt1 ...7.ges 4 3.75557322 4.313683288
4 M Trt2 ...8.ges 6 5.018803914 4.540379736
5 M Trt1 ...9.ges 5 4.788529126 4.302717763
5 M Trt2 ...10.ges 3 6.887367824 2.716635127
6 M Trt1 ...11.ges 8 3.738071745 4.61868161
6 M Trt2 ...12.ges 9 5.237574131 2.834071854
7 F Trt1 ...13.ges 9 4.395934876 4.179028364
7 F Trt2 ...14.ges 4 5.157022636 2.884850643
8 F Trt1 ...15.ges 5 4.212181686 4.963690929
8 F Trt2 ...16.ges 4 5.528088451 2.665368144

9 F Trt1 ...17.ges 6 3.917985941 4.061706345
9 F Trt2 ...18.ges 8 5.014277601 2.794416622
10 F Trt1 ...19.ges 5 3.815579149 3.463388104
10 F Trt2 ...20.ges 6 4.94345766 2.770238937
11 F Trt1 ...21.ges 7 4.384939058 5.074048944
11 F Trt2 ...22.ges 3 5.586884161 2.732024731
12 F Trt1 ...23.ges 2 4.085517525 5.117822837
12 F Trt2 ...24.ges 5 5.949754288 2.922561482
An X is added at the beginning of the variable names to avoid a variable name beginning with a
number. This data can be analyzed further using the NCSS: Repeated Measures Analysis of
Variance procedure. However, when hypothesis tests are run using the NCSS: Repeated Measures
Analysis of Variance procedure, adjustments for multiplicity of tests across genes are no longer
made.
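The naming rule for the stored variables can be sketched as below. The function name is hypothetical, and leaving names that do not begin with a digit unchanged is an assumption; only the prefixing of an X to names that begin with a number is described above.

```python
def storage_variable_name(gene_name):
    """Prefix an X when the gene name starts with a digit."""
    if gene_name[:1].isdigit():
        return "X" + gene_name
    return gene_name  # assumed: other names are stored unchanged

print(storage_variable_name("31962_at"))
print(storage_variable_name("94766_at"))
```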
Step 4 – Follow-Up Analysis

Twenty-four pre-processed values should have been saved for 2 genes, X31962_at and
X94766_at.
More specific analyses of X31962_at, for example, may be obtained using the NCSS: Repeated
Measures Analysis of Variance procedure.
1 Open the NCSS: Repeated Measures Analysis of Variance window.

• On the menus, select Analysis, then Analysis of Variance (ANOVA), then Repeated
Measures Analysis of Variance. The NCSS: Repeated Measures Analysis of Variance
procedure will be displayed.
• On the menus, select File, then New Template. This will fill the procedure with the
default template.
2 Specify the variables.

• Select the Variables tab.
• Set Response Variable(s) to X31962_at.
• Set the Between Factor 1 to Gender.
• Set the Within Factor 1 to Treatment.
3 Specify the model.

• Select the Model tab.
• Under Which Model Terms, select Full Model. Use all terms.

NCSS Repeated Measures Analysis of Variance Output

S(ABC) 0 No S
Analysis of Variance Table

Source Sum of Mean Prob Power
Term DF Squares Square F-Ratio Level (Alpha=0.05)
A: Gender 1 7.958636E-02 7.958636E-02 0.23 0.643893 0.071698
B(A): Subject 10 3.503788 0.3503788
C: Treatment 1 10.46043 10.46043 103.98 0.000001* 1.000000
AC 1 5.132553E-02 5.132553E-02 0.51 0.491398 0.099319
BC(A) 10 1.006031 0.1006031
S 0
Total (Adjusted) 23 15.10116
Total 24
* Term significant at alpha = 0.05
Probability Levels for F-Tests with Geisser-Greenhouse Adjustments

Lower Geisser Huynh
Bound Greenhouse Feldt
Regular Epsilon Epsilon Epsilon
Source Prob Prob Prob Prob
Term DF F-Ratio Level Level Level Level
A: Gender 1 0.23 0.643893
B(A): Subject 10
C: Treatment 1 103.98 0.000001* 0.000001* 0.000001* 0.000001*
AC 1 0.51 0.491398 0.491398 0.491398 0.491398
BC(A) 10
S 0
Power Values for F-Tests with Geisser-Greenhouse Adjustments Section

Lower Geisser Huynh
Bound Greenhouse Feldt
Regular Epsilon Epsilon Epsilon
Source Power Power Power Power
Term DF F-Ratio (Alpha=0.05) (Alpha=0.05) (Alpha=0.05) (Alpha=0.05)
A: Gender 1 0.23 0.071698
B(A): Subject 10
C: Treatment 1 103.98 1.000000 1.000000 1.000000 1.000000
AC 1 0.51 0.099319 0.099319 0.099319 0.099319
BC(A) 10
S 0
Box's M Test for Equality of Between-Group Covariance Matrices Section

Covariance
Source F Prob Chi2 Prob Matrices
Term Box's M DF1 DF2 Value Level Value Level Equal?
BC(A) 2.27 3.0 18000.0 0.59 0.620328 1.77 0.620413 Okay
Covariance Matrix Circularity Section

Lower Geisser Huynh Mauchly Covariance
Source Bound Greenhouse Feldt Test Chi2 Prob Matrix
Term Epsilon Epsilon Epsilon Statistic Value DF Level Circularity?
BC(A) 1.000000 1.000000 1.000000 1.000000 0.0 0.0 1.000000 Okay
Note: Mauchly's statistic actually tests the more restrictive assumption that the pooled covariance matrix
has compound symmetry.
Means and Standard Error Section

Standard
Term Count Mean Error
All 24 4.806888
A: Gender
F 12 4.749302 0.1708749
M 12 4.864473 0.1708749
C: Treatment
Trt1 12 4.146698 9.156196E-02
Trt2 12 5.467078 9.156196E-02
AC: Gender,Treatment
F,Trt1 6 4.135356 0.1294882
F,Trt2 6 5.363247 0.1294882
M,Trt1 6 4.158038 0.1294882
M,Trt2 6 5.570908 0.1294882
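The standard errors in this table can be reproduced as the square root of a term's denominator mean square divided by the count. This is an observation based on the printed values (see Chapter 214 for the formal definition); the sketch uses the Mean Square values of B(A) and BC(A) from the Analysis of Variance Table, and the function name is hypothetical.

```python
import math

def standard_error(ms_denominator, count):
    """Standard error of a mean from the denominator mean square and the count."""
    return math.sqrt(ms_denominator / count)

se_gender = standard_error(0.3503788, 12)     # Gender means use B(A)
se_treatment = standard_error(0.1006031, 12)  # Treatment means use BC(A)
se_cell = standard_error(0.1006031, 6)        # Gender-by-Treatment means use BC(A)

print(se_gender, se_treatment, se_cell)
```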
Plots Section
[Plot: Means of X31962_at by Gender (F, M)]
[Plot: Means of X31962_at by Treatment (Trt1, Trt2)]
[Plot: Means of X31962_at by Gender, with a separate line for each Treatment (Trt1, Trt2)]
[Plot: Means of X31962_at by Treatment, with a separate line for each Gender (F, M)]
[Plot: X31962_at vs Treatment by Subject]
The full analysis of variance and means tables, tests of assumptions, and graphics can be used to
further study the results of each gene that is found to be statistically significant. Notice, however,
that no correction is made for multiple testing across genes in NCSS. Details of the NCSS:
Repeated Measures Analysis of Variance procedure are in Chapter 214 of the NCSS manuals.
