

Shelf Life Determination Based On Equivalence Assessment

Article  in  Journal of Biopharmaceutical Statistics · September 2003


DOI: 10.1081/BIP-120022765 · Source: PubMed



MARCEL DEKKER, INC. • 270 MADISON AVENUE • NEW YORK, NY 10016
©2003 Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc.

JOURNAL OF BIOPHARMACEUTICAL STATISTICS


Vol. 13, No. 3, pp. 431–449, 2003

Shelf Life Determination Based on Equivalence Assessment#

Yi Tsong,1,* Wen-Jen Chen,1 Tsae-Yun Daphne Lin,1 and Chi Wan Chen2

1 Office of Biostatistics, Office of Pharmacoepidemiology and Statistical Sciences, and
2 Office of New Drug Chemistry, Office of Pharmaceutical Science, Center for Drug
Evaluation and Research, FDA, Rockville, Maryland, USA

ABSTRACT

In a regular analysis of covariance (ANCOVA) approach to stability analysis, the decision to pool data from different batches plays a key role in determining the shelf life of the drug product. Conventionally, the decision to pool data for estimating the slope and intercept of a common regression line, rather than individual lines, is based on "no evidence to reject the null hypothesis of no difference." Because observations are typically limited, a significance level much higher than 0.05 has been recommended for the pooling tests in order to avoid inflating the type-I error rate of the shelf life testing. This decision logic discourages the use of replicates to improve the power of testing and the precision of estimation. The concept of pooling by an equivalence test was originally proposed by Ruberg and Hsu in their 1990 article "Multiple comparison procedures for pooling batches in stability studies." The concept evolved into pooling batches based on a shelf life equivalence test by Yoshioka et al. in their 1996 article "Power of analysis of variance for assessing batch-variation of stability data of pharmaceuticals." In this article, an approximate test of shelf life equivalence and a test of chemical value equivalence for the data pooling decision are proposed as alternatives to the conventional ANCOVA approach.

# The views expressed in this paper are the authors' professional opinions. They do not represent the official positions of the U.S. Food and Drug Administration.
*Correspondence: Yi Tsong, HFD-705, Quantitative Methods and Research Staff, Office
of Biostatistics, Center for Drug Evaluation and Research, FDA, 5600 Fishers Lane, Rockville,
MD 20875, USA; E-mail: tsong@cder.fda.gov.

431

DOI: 10.1081/BIP-120022765 1054-3406 (Print); 1520-5711 (Online)


Copyright © 2003 by Marcel Dekker, Inc. www.dekker.com

432 Tsong et al.

Key Words: Stability pooling test; Linear regression; Shelf life testing; Equivalence test.

I. INTRODUCTION

In general, a single shelf life is used for all batches of a drug product of a given strength and package size. Often the shelf life is determined by the shortest shelf life among all the batches examined. However, when the manufacturing process produces quality batches with batch-to-batch differences so small that they cannot be shown to be statistically significant, a common shelf life based on pooled data can be used. For example, with the analysis of covariance (ANCOVA) approach described in the Food and Drug Administration (FDA) Draft Guidelines (1987) and the International Conference on Harmonization (ICH) Guidance (2001a,b; 2003), batches may be pooled to estimate the slope and/or intercept of a common regression line when the data support it. The decision to pool is made by testing the following hypotheses (Chow and Liu, 1995; Lin et al., 1993):
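$$H_0: \beta_j = \beta \text{ for all } j \quad \text{versus} \quad H_a: \beta_j \neq \beta \text{ for some } j = 1, \ldots, J \tag{1}$$

and

$$H_0: \alpha_j = \alpha \text{ for all } j \quad \text{versus} \quad H_a: \alpha_j \neq \alpha \text{ for some } j = 1, \ldots, J \tag{2}$$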
where $\alpha_j$ and $\beta_j$ are, respectively, the intercept and slope of the regression line of the chemical measurement of batch $j$ of the product. The slopes and intercepts can be pooled if the null hypotheses (1) and (2), respectively, are not rejected. To protect against a high rate of false pooling, a large significance level of 0.25 is conventionally used (Asano, 1960; Bancroft, 1944; Bancroft, 1964; Chow and Liu, 1995; Draft ICH Consensus Guideline, 2001; FDA, 1987; Guidance for Industry: ICH, 2001; Guidance for Industry: ICH, 2003; Johnson et al., 1977; Larson and Bancroft, 1963; Lin et al., 1993), since the number of batches is often small at the premarketing stage. This approach is often criticized because not rejecting $H_0$ reflects only a "failure to show a difference" and provides no evidence to support pooling. In addition, using a large significance level of 0.25 merely inflates the type-I error rate of testing hypotheses (1) and (2) without properly assessing power. Two recent simulation studies support the need for a large significance level in the pooling tests in order to control the type-I error rate of falsely claiming a longer shelf life (Chen and Tsong, 2003; Chen et al., 1995).
As early as 1990 and 1991, Ruberg and Hsu (1990) and Ruberg and Stegemen (1991) proposed testing for equivalence of slopes or intercepts as an alternative to the batch pooling test. Their approach tests the following hypotheses:
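$$H_0: |\beta_j - \beta_{j'}| \ge \delta \text{ for some } j \neq j' \quad \text{versus} \quad H_a: |\beta_j - \beta_{j'}| < \delta \text{ for all } j, j' \tag{3}$$

and

$$H_0: |\alpha_j - \alpha_{j'}| \ge \delta \text{ for some } j \neq j' \quad \text{versus} \quad H_a: |\alpha_j - \alpha_{j'}| < \delta \text{ for all } j, j' \tag{4}$$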

Pooling Batch by Equivalence Test 433

However, no equivalence limit was proposed for this approach. A simulation study showed that, with an appropriately chosen equivalence limit $\delta$, the slope equivalence test gives results comparable to those of the ANCOVA approach cited in the FDA Guidelines. A fixed, predetermined equivalence limit is problematic, however. Lin and Tsong (1991) showed through a simulation study that the same equivalence limit on the slope has a different impact on shelf life equivalence depending on the value of the slope.
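A small arithmetic sketch illustrates why. It assumes, purely for illustration, an intercept of 100% of label claim, a lower acceptance limit of 95%, and a fixed slope equivalence limit of 0.05% per month; it uses only the fitted mean line, with no confidence bounds. The helper name `mean_shelf_life` is hypothetical.

```python
def mean_shelf_life(alpha: float, beta: float, spec: float = 95.0) -> float:
    """Time at which the mean line alpha + beta * t reaches the lower limit spec."""
    if beta >= 0:
        raise ValueError("expects a decreasing assay (beta < 0)")
    return (spec - alpha) / beta


DELTA = 0.05  # assumed fixed equivalence limit on the slope (% per month)

for beta in (-0.10, -0.50):
    t_ref = mean_shelf_life(100.0, beta)           # reference batch
    t_alt = mean_shelf_life(100.0, beta - DELTA)   # slope differs by exactly DELTA
    print(f"slope {beta:+.2f}: {t_ref:.1f} vs {t_alt:.1f} months "
          f"(gap {t_ref - t_alt:.1f})")
```

Under these assumptions the same slope difference of 0.05 shortens the mean shelf life from 50.0 to about 33.3 months when the slope is -0.10, but only from 10.0 to about 9.1 months when the slope is -0.50.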
Yoshioka et al. (1996a,b) revisited the equivalence approach and proposed pooling the batches if the difference in shelf life between any two batches is within an equivalence limit set at a prespecified percentage of the longest sample shelf life. Yoshioka et al.'s (1996a,b) range-based test addresses the following hypotheses:

$$H_0: |T_j - T_{j'}| \ge \gamma \max_j T_j \text{ for some } j \neq j' \quad \text{vs.} \quad H_a: |T_j - T_{j'}| < \gamma \max_j T_j \text{ for all } j, j' \tag{5}$$

where $T_j$ and $T_{j'}$ are the true shelf lives of batches $j$ and $j'$, respectively, and $0 \le \gamma \le 1$ is a constant. Yoshioka et al. (1996a) compared the range-based equivalence test using $\gamma = 0.15$ with the ANCOVA approach through Monte Carlo simulation and found the proposed method more powerful in rejecting pooling when the batches have different shelf lives. However, no formal statistical procedure was proposed for implementing the test.
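The alternative in (5) is a statement about true shelf lives. The hypothetical helper below only evaluates that condition for given values; feeding it point estimates, as in the usage line, ignores their sampling error, which is exactly the missing inferential step noted above. Because the largest pairwise gap $|T_j - T_{j'}|$ equals $\max_j T_j - \min_j T_j$, the check reduces to a single comparison.

```python
def range_equivalent(shelf_lives, gamma: float = 0.15) -> bool:
    """True if every pairwise gap is below gamma * max(shelf_lives), as in H_a of (5)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    longest, shortest = max(shelf_lives), min(shelf_lives)
    # the largest pairwise |T_j - T_j'| is simply longest - shortest
    return (longest - shortest) < gamma * longest


# Point estimates from (10), used here only to exercise the check:
# 37 - 25 = 12 months is not below 0.15 * 37 = 5.55 months.
print(range_equivalent([36, 37, 25]))
```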
In this manuscript, we propose two pooling tests based on equivalence assessment. The first is inspired by Yoshioka et al. (1996a,b; 1997) but is set up as a formal equivalence test. The second is inspired by the hypothesis-testing formulation of Chen and Tsong (2003) and Tsong et al. (2003), in which the shelf life is determined by comparing the chemical measurement against its acceptance criteria, $S_L$ and $S_U$, at a prespecified proposed shelf life $T_0$, with the following hypotheses:

$$H_0: Y_j(T_0) \le S_L \text{ or } Y_j(T_0) \ge S_U \text{ for some } j \quad \text{versus} \quad H_a: S_L < Y_j(T_0) < S_U \text{ for all } j \tag{6}$$

where $Y_j(T_0)$ is the true chemical measurement of batch $j$ at $T_0$. For a one-sided acceptance criterion, use either $S_L$ or $S_U$ in hypothesis (6).
For simplicity but without loss of generality, a fixed set of acceptance criteria $(S_L, S_U) = (95\%, 105\%)$ of label claim is used in all examples.

II. BATCH POOLING BASED ON SHELF LIFE EQUIVALENCE TEST

In determining the shelf life of a drug product, the 1987 FDA Guidelines and the ICH Guidance require that stability data from at least three batches be used to account for batch-to-batch variability. This variability is handled statistically through a classical analysis of covariance model (Chow and Liu, 1995; Draft ICH Consensus Guideline, 2001; FDA, 1987; Guidance for Industry: ICH, 2001; 2003; Lin et al., 1993):

$$Y_{jk} = \alpha_j + \beta_j t_{jk} + \varepsilon_{jk} \tag{7}$$

where
$Y_{jk}$ = the assay value, as a percentage of label claim, of the $j$-th batch at the $k$-th time point,
$\alpha_j$ = the intercept (value at the manufacture date) of the $j$-th batch,
$\beta_j$ = the slope (rate of change) of the $j$-th batch,
$t_{jk}$ = the time of the $k$-th time point of the $j$-th batch,
$\varepsilon_{jk}$ = the random error at the $k$-th time point of the $j$-th batch, independent and identically distributed (iid) as $N(0, \sigma^2)$.

For simplicity, we assume that the assays were performed at the same time points for all batches.
As recommended in the 1987 FDA Guidelines, the shelf life of the multiple batches of
the product is estimated based on the final ANCOVA model resulting from the pooling
test. The hypotheses of the intercept and slope pooling tests are given below.

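$$\text{Slope test: } H_{01}: \beta_j = \beta \text{ for all } j \quad \text{vs.} \quad H_{a1}: \beta_j \neq \beta \text{ for some } j \tag{8}$$

and

$$\text{Intercept test: } H_{02}: \alpha_j = \alpha \text{ for all } j \quad \text{vs.} \quad H_{a2}: \alpha_j \neq \alpha \text{ for some } j \tag{9}$$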

If neither null hypothesis is rejected, the batches are considered poolable, and the combined data are used to estimate a single shelf life common to all batches. If $H_{02}$ is rejected but $H_{01}$ is not, a common slope estimated from the pooled data is used to estimate the shelf life, but the intercepts are estimated individually. Otherwise, the shortest shelf life among the batches is the shelf life of all batches of the drug product. Because the sample size is not determined through a power calculation, some protection against falsely pooling batches with different slopes or intercepts is needed. The FDA addressed this by raising the significance level of the pooling tests from 0.05 to 0.25, based on recommendations in the literature (Asano, 1960; Bancroft, 1944; 1964; Johnson et al., 1977; Larson and Bancroft, 1963). The individual regression lines, with separate intercepts and slopes, and their 95% confidence bands can be estimated using the ANCOVA model.
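The decision logic just described can be written as a short function. This is a sketch of the rule as stated above, assuming the p-values of the F-tests for (8) and (9) have already been computed (for example, from an ANCOVA table such as Table 4); the function name and the returned labels are hypothetical.

```python
def ancova_pooling_decision(p_slope: float, p_intercept: float,
                            alpha: float = 0.25) -> str:
    """Model used for shelf life estimation after the pooling tests (8) and (9)."""
    if p_slope < alpha:
        # H01 rejected: separate lines; the shortest batch shelf life applies
        return "separate slopes and intercepts"
    if p_intercept < alpha:
        # H02 rejected, H01 not: common slope, batch-specific intercepts
        return "common slope, separate intercepts"
    # neither rejected: one pooled regression line for all batches
    return "common slope and intercept"


print(ancova_pooling_decision(p_slope=0.1687, p_intercept=0.5))
```

With the slope-test p-value of 0.1687 from Example 2, the slope test already rejects at the 0.25 level, so the intercept p-value (0.5 above is only a placeholder) does not affect the outcome.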
For example, consider the data of the three batches in Table 1. The mean estimates and 95% confidence intervals (CI) of the shelf life of each batch, based on the ANCOVA model, are shown in Fig. 1. The point estimates and 95% confidence intervals of the shelf life (in months) of the three batches are:

Batch #1: 95% CI = (26, 63) with mean = 36;
Batch #2: 95% CI = (26, 78) with mean = 37;
Batch #3: 95% CI = (19, 34) with mean = 25.   (10)



Table 1. Stability data of three batches (assay, % of label claim).

                 Time (months)
Batch     0     3     6     9    12    15    18
1       100   101   101   100    99    98    98
2        99   100   100    99    99    98    97
3       101   100    99    99    98    97    97

Using the ANCOVA pooling tests recommended in the ICH Guidance, the null hypothesis of equal intercepts is rejected but that of equal slopes is not. The lower 95% confidence bounds of the three individual regression lines with a common slope intersect the lower acceptance criterion (i.e., 95% of label claim) at 29, 25, and 25 months, respectively (Fig. 2). Hence, a target shelf life of 24 months for the product is supported by the ANCOVA approach.
Rather than treating the batch difference either as negligible or as a fixed amount (Chen and Tsong, 2003; Chen et al., 1995; Johnson et al., 1977; Ruberg and Stegemen, 1991), a random-effects model has been proposed that incorporates batch-to-batch variation into the model. This approach is most suitable for a drug product with a large number of batches at the manufacturing stage rather than at the time of premarketing approval.
The practice of using a large significance level such as 0.25 in the ANCOVA pooling tests has often been criticized because it protects against false pooling only by increasing the chance of falsely rejecting the null hypotheses, without any assurance about the power of the tests (Asano, 1960; Bancroft, 1944; 1964; Larson and Bancroft, 1963).

Figure 1. Estimate of shelf life of three batches.

Figure 2. Regression line and shelf life estimate of three batches.

A. The Shelf Life Estimates and the 95% Confidence Limits of the Shelf Life

Consider a single batch whose assay value is modeled by

$$Y_k = \alpha + \beta t_k + \varepsilon_k \tag{11}$$

where
$\alpha$ = the intercept at the manufacture date,
$\beta$ = the slope (rate of change),
$t_k$ = the time of the $k$-th time point,
$\varepsilon_k$ = the random error at the $k$-th time point,

and the $\varepsilon_k$ are assumed to be independent and identically distributed across all time points, normal with mean zero and variance $\sigma^2$. The 95% confidence band of the regression line is given by the set of solutions $Y$, at each time $t_k$, of the equation

$$\left[Y - (\hat\alpha + \hat\beta t_k)\right]^2 = \left(t_{0.975,\,n-2}\right)^2 s^2 \left(\frac{1}{n} + \frac{(t_k - \bar t)^2}{S_{tt}}\right) \tag{12}$$

where $\hat\beta = S_{yt}/S_{tt}$ and $\hat\alpha = \bar y - \hat\beta \bar t$ are the estimates of $\beta$ and $\alpha$, respectively; $t_{0.975,\,n-2}$ is the 97.5th percentile of the $t$-distribution with $n - 2$ degrees of freedom; $S_{yt} = \sum_k (y_k - \bar y)(t_k - \bar t)$, $S_{tt} = \sum_k (t_k - \bar t)^2$, and $S_{yy} = \sum_k (y_k - \bar y)^2$; and $s^2 = (S_{yy} - S_{yt}^2/S_{tt})/(n - 2)$ is the residual mean square.
Let $Y_L$ and $Y_U$ denote the lower and upper 95% confidence bounds over time, respectively. The intersection of the estimated regression line with the upper or lower acceptance criterion gives the estimate $\hat T$ of the mean shelf life. Furthermore, the intersections $(T_L, T_U)$ of the lower and upper 95% confidence bounds with the acceptance criterion are the 95% confidence limits of the mean shelf life. Note that when two-sided acceptance limits are applied, the shorter of the two 95% confidence limits is the lower limit of the mean shelf life and is the regulatory-defined shelf life of the batch. When a one-sided acceptance criterion is used, a one-sided 95% confidence interval is used in place of the two-sided 95% confidence interval. The equation used to determine the 95% confidence limits of the mean shelf life is given in Appendix A.
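The sketch below is our own illustration, not the equation in Appendix A: it squares (12), sets $Y$ equal to the acceptance limit, and solves the resulting quadratic in $t$ (the classical inverse-prediction, or Fieller-type, solution) for $T_L$ and $T_U$. It assumes a single batch with its own residual variance and $n - 2$ degrees of freedom; the function names are hypothetical, and the critical value $t_{0.975,5} \approx 2.5706$ is taken from a $t$ table rather than computed.

```python
import math


def fit_line(t, y):
    """Least squares fit of y = alpha + beta * t for a single batch, as in (11)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_yt = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    beta = s_yt / s_tt
    alpha = y_bar - beta * t_bar
    s2 = (s_yy - s_yt ** 2 / s_tt) / (n - 2)  # residual mean square
    return alpha, beta, s2, t_bar, s_tt, n


def shelf_life_with_limits(t, y, spec, t_crit):
    """Return (T_hat, T_L, T_U) where the line and its band from (12) meet spec.

    With u = t - t_bar, (12) at Y = spec becomes
    (d + beta*u)^2 = t_crit^2 * s2 * (1/n + u^2 / S_tt),  d = y_bar - spec,
    a quadratic whose two roots are the confidence limits.
    """
    alpha, beta, s2, t_bar, s_tt, n = fit_line(t, y)
    t_hat = (spec - alpha) / beta          # fitted line meets the limit
    d = alpha + beta * t_bar - spec        # equals y_bar - spec
    k = t_crit ** 2 * s2
    qa = beta ** 2 - k / s_tt
    qb = 2.0 * d * beta
    qc = d ** 2 - k / n
    if qa <= 0:
        # slope not significantly different from zero: the band never closes
        raise ValueError("confidence limits of the shelf life are unbounded")
    root = math.sqrt(qb ** 2 - 4.0 * qa * qc)  # non-negative whenever qa > 0
    u_lo, u_hi = sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))
    return t_hat, t_bar + u_lo, t_bar + u_hi


# Batch #1 of Table 1 (n = 7, so 5 degrees of freedom)
months = [0, 3, 6, 9, 12, 15, 18]
batch1 = [100, 101, 101, 100, 99, 98, 98]
print(shelf_life_with_limits(months, batch1, spec=95.0, t_crit=2.5706))
```

The point estimate is about 36.4 months, consistent with the 36 months reported for batch #1, and the upper limit lies much farther from it than the lower limit, which is the asymmetry described in the text. Because this sketch uses the single-batch variance with 5 degrees of freedom rather than the ANCOVA model behind (10), its limits are wider than the reported (26, 63).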
The sampling distribution of $\hat T$ is unknown; the complexity of the problem is well documented in the literature on the statistical calibration problem. For the shelf life regression, the 95% confidence limits $T_L$ and $T_U$ are, understandably, asymmetric about $\hat T$. For example, when $\beta < 0$, the distance between $\hat T$ and $T_U$ is longer than the distance between $\hat T$ and $T_L$.

Example 1. For illustration, only the stability data of batch #1 in Table 1 are considered in this example. The 95% confidence interval of the mean shelf life is skewed to the right and is not a symmetric normal interval (see Fig. 3). Note that in Fig. 2, a 95% confidence band for the mean regression line is used to determine the confidence interval of the shelf life, in contrast to the single 95% confidence band in Fig. 1 used for shelf life determination with the ANCOVA approach. Because the sampling distribution of $\hat T$ is unknown, a log, log(log), or even more complicated transformation may be needed for the interval to be approximated by a symmetric normal interval. For example, a log transformation may be suitable if $\hat T^2 \approx T_L \cdot T_U$. As shown in Fig. 3, the estimate of the mean shelf life is 36 months, and the lower and upper 95% limits are 26 months and 63 months, respectively. Because the interval is skewed to the right, a log transformation is needed to make it approximately symmetric. The log-transformed confidence interval has 3.258, 3.584, and 4.143 as the lower limit, mean, and upper limit, respectively.
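The log-scale summaries can be computed directly from the reported limits. A minimal sketch, assuming the 95% interval is approximately symmetric on the natural-log scale so that its half-width is about 1.96 standard errors; the function name is hypothetical.

```python
import math


def log_shelf_life_summary(t_low, t_hat, t_high, z=1.96):
    """Natural-log limits, log mean, and an approximate log-scale standard error.

    Assumes the 95% CI is roughly symmetric after the log transform, so that
    se ~ [ln(T_U) - ln(T_L)] / (2 * z).
    """
    lo, mid, hi = math.log(t_low), math.log(t_hat), math.log(t_high)
    se = (hi - lo) / (2.0 * z)
    return lo, mid, hi, se


lo, mid, hi, se = log_shelf_life_summary(26, 36, 63)
print(f"{lo:.3f} {mid:.3f} {hi:.3f} se={se:.4f}")
```

This reproduces 3.258, 3.584, and 4.143 for batch #1. The resulting standard error is about 0.226; the value 0.221 used for batch #1 later in the text agrees more closely with a divisor of 4 (that is, `z=2.0`).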

Figure 3. Estimate and 95% confidence limits of shelf life.

B. Shelf Life Equivalence Limits

Assume that a natural log transformation is needed to make the confidence interval approximately symmetric, and consider two batches with expected shelf lives $T_1$ and $T_2$. The equivalence ratio of 1.15 originally proposed by Yoshioka et al. (1996b) implies that two shelf lives are equivalent, or of no practical difference, if the ratio of the two shelf lives lies between 0.8696 (= 1/1.15) and 1.15. In the examples below it is applied as follows: for a given batch with true shelf life $T_0$, a second batch is equivalent if its true shelf life lies between $0.8696T_0$ and $1.15T_0$. This appears to be a reasonable choice, as shown in Table 2. With the equivalence limits of 0.8696 and 1.15, a batch whose shelf life is equivalent to that of a given batch with a true shelf life of 12 months has a true shelf life between 10.44 and 13.8 months. On the log scale, the lower and upper equivalence limits are $-0.1398$ [i.e., $\ln(0.8696)$] and $0.1398$ [i.e., $\ln(1.15)$], and the equivalent batch differs from the given batch by no more than 1.8 months. The equivalence limit in shelf life increases with the true shelf life of the given batch. For example, if the true shelf life of a given batch is 36 months, a batch with an equivalent shelf life has a true shelf life of no more than 41.4 months and no less than 31.3 months, that is, at most 5.4 months longer or 4.7 months shorter than the given batch. Thus, for a batch with a 12-month shelf life, the equivalence limit for a second batch is less than two months, which is shorter than three months, the shortest interval between two measured time points. For a batch with a shelf life of 24 or 36 months, the equivalence limits are all less than six months, which is shorter than the interval between the last two time points suggested for some cases by the FDA Guidelines (1987) and the ICH Q1A Guidance (2001). The equivalence limit exceeds six months only if the target shelf life is four years or longer, at a time when the interval between measurements is one year. The use of these equivalence limits in the regulatory setting may require further investigation.

Table 2. The expected shelf life of the equivalent batch.

                      Expected shelf life of a given batch
Equivalence limit   12 months   24 months   36 months   48 months
1.15                   13.8        27.6        41.4        55.2
0.8696                 10.44       20.9        31.3        41.8
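The entries of Table 2 follow from multiplying the given shelf life by the two ratios. A minimal sketch (names hypothetical), taking the lower ratio as exactly 1/1.15, which is why its logarithm is exactly the negative of ln(1.15):

```python
UPPER_RATIO = 1.15
LOWER_RATIO = 1.0 / UPPER_RATIO  # about 0.8696, so ln limits are -/+ ln(1.15)


def equivalent_range(t_given: float) -> tuple:
    """Range of true shelf lives (months) treated as equivalent to t_given."""
    return LOWER_RATIO * t_given, UPPER_RATIO * t_given


for months in (12, 24, 36, 48):
    low, high = equivalent_range(months)
    print(f"{months} months: {low:.2f} to {high:.2f} "
          f"(at most {months - low:.1f} shorter, {high - months:.1f} longer)")
```

The two gaps are unequal (for 36 months, 4.7 months shorter versus 5.4 months longer) because equivalence is defined on the ratio, that is, symmetrically on the log scale rather than on the month scale.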
Under this definition of the shelf life equivalence limit, two shelf lives are equivalent if the following null hypothesis $H_0$ is rejected:

$$H_0: T_j/T_{j'} \le 0.8696 \text{ or } T_j/T_{j'} \ge 1.15 \quad \text{vs.} \quad H_a: 0.8696 < T_j/T_{j'} < 1.15 \tag{13}$$

or, equivalently, on the log scale:

$$H_0: \ln(T_j) - \ln(T_{j'}) \le -0.1398 \text{ or } \ln(T_j) - \ln(T_{j'}) \ge 0.1398 \quad \text{vs.} \quad H_a: -0.1398 < \ln(T_j) - \ln(T_{j'}) < 0.1398$$

C. Shelf Life Equivalence Assessment Between Two Batches

Assume that the 95% confidence interval $(\ln(T_{jL}), \ln(T_{jU}))$, $j = 1, \ldots, J$, is symmetric about $\ln(\hat T_j)$. Then the standard error of $\ln(\hat T_j)$ is $\hat\sigma_j \approx [\ln(T_{jU}) - \ln(T_{jL})]/(2 \cdot 1.96)$. The difference between two batches, $\ln(T_j) - \ln(T_{j'})$, is estimated by $d = \ln(\hat T_j) - \ln(\hat T_{j'})$, with standard error $\sqrt{\hat\sigma_j^2 + \hat\sigma_{j'}^2}$.
Hypotheses (13) can be tested either with two one-sided tests or with a confidence interval decision rule. With the two one-sided tests approach, we consider the following two sets of one-sided hypotheses:

$$H_{01}: \ln(T_j) - \ln(T_{j'}) \le -0.1398 \quad \text{vs.} \quad H_{a1}: \ln(T_j) - \ln(T_{j'}) > -0.1398$$

and

$$H_{02}: \ln(T_j) - \ln(T_{j'}) \ge 0.1398 \quad \text{vs.} \quad H_{a2}: \ln(T_j) - \ln(T_{j'}) < 0.1398$$

Schuirmann (1987) showed that when each one-sided hypothesis is tested at the 5% significance level, the two one-sided tests together form a test with an overall type-I error rate of at most 5%. Equivalently, the 90% confidence interval of $\ln(T_j) - \ln(T_{j'})$ can be compared with $(-0.1398, 0.1398)$. The two batches are equivalent if both $H_{01}$ and $H_{02}$ are rejected at the 5% significance level each, or, equivalently, if the 90% confidence interval of $\ln(T_j) - \ln(T_{j'})$ is contained within $(-0.1398, 0.1398)$.
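The two one-sided tests and the 90% confidence interval rule can be sketched with a normal approximation, which is what the interval half-widths in the examples (about 1.645 standard errors) correspond to. `statistics.NormalDist` is from the Python standard library; the function name and the layout of the returned dictionary are hypothetical.

```python
import math
from statistics import NormalDist

MARGIN = math.log(1.15)  # equivalence margin on the log scale, about 0.1398


def tost_log_ratio(diff: float, se: float, alpha: float = 0.05,
                   margin: float = MARGIN) -> dict:
    """TOST for ln(T_j) - ln(T_j') using a normal approximation."""
    std = NormalDist()
    p_01 = 1.0 - std.cdf((diff + margin) / se)  # H01: diff <= -margin
    p_02 = 1.0 - std.cdf((margin - diff) / se)  # H02: diff >= +margin
    q = std.inv_cdf(1.0 - alpha)                # about 1.645 for alpha = 0.05
    ci90 = (diff - q * se, diff + q * se)
    return {
        "p_01": p_01,
        "p_02": p_02,
        "ci90": ci90,
        # equivalent when both one-sided null hypotheses are rejected
        "equivalent": p_01 < alpha and p_02 < alpha,
    }


# Batch #3 vs batch #1 of Example 2: difference -0.0645, s.e. from 0.0231 and 0.0254
result = tost_log_ratio(-0.0645, math.hypot(0.0231, 0.0254))
print(result["ci90"], result["equivalent"])
```

The interval comes out at about (-0.121, -0.008), inside (-0.1398, 0.1398). The two rules always agree: rejecting both one-sided nulls at level alpha is the same event as the 1 - 2*alpha interval falling inside the margin.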

The shelf life of multiple batches of the same strength and package configuration can be determined by comparing each batch's shelf life with the target shelf life. If the shelf life of each batch determined by the ANCOVA model (7) is greater than the target shelf life, the target shelf life should be accepted as the shelf life of all batches. Otherwise, a statistical shelf life equivalence assessment may be performed between every pair of batches to determine whether a pooled shelf life should be used. The procedure is illustrated with the following examples.

Example 1 (continued).

Batch #1: ln-transformed 95% CI = (3.258, 4.143) with mean 3.584; $\hat\sigma_1 \approx 0.221$
Batch #2: ln-transformed 95% CI = (3.258, 4.357) with mean 3.611; $\hat\sigma_2 \approx 0.275$
Batch #3: ln-transformed 95% CI = (2.944, 3.526) with mean 3.219; $\hat\sigma_3 \approx 0.145$

Batches #1 and #2: difference = 0.027; $\hat\sigma_{12} \approx 0.353$; 90% CI = (-0.554, 0.608).
Batches #1 and #3: difference = -0.365; $\hat\sigma_{13} \approx 0.265$; 90% CI = (-0.800, 0.071).
Batches #2 and #3: difference = -0.392; $\hat\sigma_{23} \approx 0.311$; 90% CI = (-0.903, 0.119).

None of the 90% CIs lies within (-0.1398, 0.1398) [i.e., (ln(0.8696), ln(1.15))], indicating that the shelf lives of the three batches are not equivalent. The shortest lower 95% confidence limit of the mean shelf life among the three batches is 19 months. Hence, a shelf life of 18 months would be derived for this product based on the proposed shelf life equivalence test.
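The pairwise comparison and the fallback to the shortest lower confidence limit can be sketched as below, using the rounded summaries reported for Example 1 (point estimates and 95% limits in months, and the log-scale standard errors as given). The data layout and function name are hypothetical, and the standard error of each difference assumes independent batch estimates, as in the text.

```python
import math
from itertools import combinations
from statistics import NormalDist

MARGIN = math.log(1.15)
Z90 = NormalDist().inv_cdf(0.95)  # two-sided 90% interval

# batch: ((lower 95% limit, mean, upper 95% limit) in months, log-scale s.e.)
EXAMPLE1 = {
    1: ((26, 36, 63), 0.221),
    2: ((26, 37, 78), 0.275),
    3: ((19, 25, 34), 0.145),
}


def shelf_life_by_equivalence(batches):
    """Return (all_pairs_equivalent, pairwise 90% CIs, shortest lower limit)."""
    intervals = {}
    for (j, (ci_j, se_j)), (k, (ci_k, se_k)) in combinations(batches.items(), 2):
        diff = math.log(ci_k[1]) - math.log(ci_j[1])   # ln(T_k) - ln(T_j)
        half = Z90 * math.hypot(se_j, se_k)
        intervals[(j, k)] = (diff - half, diff + half)
    all_equivalent = all(-MARGIN < lo and hi < MARGIN for lo, hi in intervals.values())
    shortest_lower = min(ci[0] for ci, _ in batches.values())
    return all_equivalent, intervals, shortest_lower


pooled, cis, shortest = shelf_life_by_equivalence(EXAMPLE1)
print(pooled, shortest)
for pair, (lo, hi) in cis.items():
    print(pair, round(lo, 3), round(hi, 3))
```

Every interval extends beyond ±0.1398, so pooling is not supported and the shortest lower limit, 19 months, governs, as in the text. Had all pairs been equivalent, the text would instead estimate a common shelf life from the pooled data, which this sketch does not attempt.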
As pointed out earlier, using a large significance level for the slope and intercept pooling tests may control the type-I error rate of testing against the target shelf life. However, it does so by inflating the type-I error rate of the pooling tests, and it penalizes stability studies with replicate observations. When the sample size of each batch increases, through either more observed time points or replicates at each time point, the standard error of the estimates is reduced and the power to reject the pooling null hypotheses increases. As a result, pooling is more likely to be rejected as the sample size increases. In contrast, under the equivalence approach, the reduced standard error increases the power to establish shelf life equivalence. This is illustrated in the following Example 2.

Example 2. The stability data in Table 3 represent one study with three batches and three replicate measurements at each time point. With the ANCOVA approach recommended in regulatory guidance, the null hypothesis of equal slopes is rejected at the 0.25 significance level (F-test p-value = 0.1687; Table 4). The three regression lines are

Batch #1: $Y_t = 100.540 - 0.1697t$
Batch #2: $Y_t = 100.780 - 0.1818t$
Batch #3: $Y_t = 100.421 - 0.1812t$

The lower 95% confidence bands of the regression lines intersect the acceptance limit (i.e., 95% of label claim) at 31, 30, and 27 months, respectively. Based on these results, a 24-month shelf life would be used for the drug product.
On the other hand, with the shelf life equivalence test proposed above, the estimates of the shelf life of the three batches are

Batch #1: mean = 32 months; 95% CI = (31, 34);
Batch #2: mean = 31 months; 95% CI = (30, 33);
Batch #3: mean = 30 months; 95% CI = (28, 31).

The log-transformed means, CIs, and standard errors are

Batch #1: mean = 3.466; 95% CI = (3.434, 3.526); $\hat\sigma = 0.0231$;
Batch #2: mean = 3.434; 95% CI = (3.401, 3.497); $\hat\sigma = 0.0238$;
Batch #3: mean = 3.401; 95% CI = (3.332, 3.434); $\hat\sigma = 0.0254$.

Table 3. Stability data with replicates (assay, % of label claim).

                          Time (months)
Batch  Replicate      0      3      6      9     12     18     24
1      1          100.6  100.3   99.5   99.1   98.7   97.2   96.6
1      2          100.7   99.9   99.4   98.9   98.7   97.2   96.5
1      3          100.3  100.1   99.4   99.1   98.5   97.3   96.7
2      1          101.0  100.3   99.6   99.2   98.7   97.3   96.7
2      2          101.0  100.3   99.7   99.3   99.0   97.4   96.5
2      3          100.4  100.1   99.5   99.2   98.4   97.1   96.4
3      1          100.5   99.8   99.4   98.7   98.3   97.1   96.0
3      2          100.6   99.7   99.2   98.8   98.3   97.4   96.2
3      3          100.3   99.8   99.5   99.1   98.0   97.0   96.0
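The three regression lines quoted in Example 2 can be checked against Table 3. Fitting each batch by least squares to all of its replicate observations gives the same coefficients as the separate-slopes ANCOVA fit, which differs only in pooling the residual variance across batches. A minimal sketch (names hypothetical):

```python
TIMES = [0, 3, 6, 9, 12, 18, 24]  # months
# Table 3: three replicate rows (% of label claim) per batch
TABLE3 = {
    1: [[100.6, 100.3, 99.5, 99.1, 98.7, 97.2, 96.6],
        [100.7, 99.9, 99.4, 98.9, 98.7, 97.2, 96.5],
        [100.3, 100.1, 99.4, 99.1, 98.5, 97.3, 96.7]],
    2: [[101.0, 100.3, 99.6, 99.2, 98.7, 97.3, 96.7],
        [101.0, 100.3, 99.7, 99.3, 99.0, 97.4, 96.5],
        [100.4, 100.1, 99.5, 99.2, 98.4, 97.1, 96.4]],
    3: [[100.5, 99.8, 99.4, 98.7, 98.3, 97.1, 96.0],
        [100.6, 99.7, 99.2, 98.8, 98.3, 97.4, 96.2],
        [100.3, 99.8, 99.5, 99.1, 98.0, 97.0, 96.0]],
}


def fit_batch(replicates):
    """OLS intercept and slope using every replicate observation of one batch."""
    t = TIMES * len(replicates)                      # time for each observation
    y = [value for row in replicates for value in row]
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_yt = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = s_yt / s_tt
    return y_bar - slope * t_bar, slope


for batch, rows in TABLE3.items():
    intercept, slope = fit_batch(rows)
    print(f"Batch #{batch}: Y_t = {intercept:.3f} - {-slope:.4f} t")
```

The printed lines match the three regression lines of Example 2 to the displayed precision.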

Table 4. ANCOVA results of Example 2.

Source            DF   Sum of squares   Mean square   F value   Pr > F
Model              5          123.338        24.668    748.39   <.0001
Error             57            1.879         0.033
Corrected total   62          125.217

Source            DF      Type I SS     Mean square   F value   Pr > F
Time               1        121.859       121.859     3697.08   <.0001
Batch              2          1.357         0.679       20.59   <.0001
Time*Batch         2          0.121         0.061        1.84   0.1687

The shelf life differences in log-transformed values are then

Batch #2 - Batch #1: difference = -0.0318; 90% CI = (-0.0863, 0.0228);
Batch #3 - Batch #1: difference = -0.0645; 90% CI = (-0.1210, -0.0080);
Batch #3 - Batch #2: difference = -0.0328; 90% CI = (-0.0901, 0.02456).

Since all three 90% confidence intervals lie within (-0.1398, 0.1398), the three batches are equivalent in shelf life. The estimate of the log-transformed common shelf life is 3.434, with 95% CI = (3.3863, 3.4810). After anti-log transformation, the shelf life estimate is 31 months with 95% CI = (30, 32). Hence, based on the shelf life equivalence test, the common shelf life is 30 months.
Note that under the ANCOVA model, the shelf life estimates of the three batches are not independent, and deriving their sampling distributions or joint sampling distribution is complex. The shelf life equivalence test described in this section is therefore an approximate test. However, given the iid normal assumption on the error term of ANCOVA model (7), the estimated Y value at any time point is normally distributed. Hence, an equivalence test for pooling may be derived based on the Y values at the target shelf life, as described in the following section.

III. POOLING BATCHES BASED ON EQUIVALENCE TEST OF ASSAY VALUE

As pointed out by Chen and Tsong (2003) and Tsong et al. (2003), the regulatory requirement for a shelf life $T_0$ rests on evidence that the chemical characteristic of each batch is within the acceptance criterion or criteria at $T_0$. Hence, the hypotheses of interest are

$$H_0: Y_j(T_0) \le S \text{ for some } j \quad \text{versus} \quad H_a: Y_j(T_0) > S \text{ for all } j = 1, \ldots, J \tag{14}$$

for a one-sided (lower) acceptance criterion $S$, and

$$H_0: Y_j(T_0) \le S_L \text{ or } Y_j(T_0) \ge S_U \text{ for some } j \quad \text{vs.} \quad H_a: S_L < Y_j(T_0) < S_U \text{ for all } j = 1, \ldots, J \tag{15}$$

for two-sided acceptance criteria, where $S_L$ and $S_U$ are the lower and upper acceptance criteria, respectively. The proposed shelf life $T_0$ is established if $H_0$ is rejected.
Let $Y_j(T_0)$ be the mean chemical characteristic of batch $j$ at $T_0$, and let

$$\bar Y_P(T_0) = \frac{1}{J} \sum_{j=1}^{J} Y_j(T_0)$$

be the mean value of all batches.


We propose to test $H_0$ of hypothesis (14) or (15) using $\bar Y_P(T_0)$ if $|Y_j(T_0) - \bar Y_P(T_0)| < \delta_{T_0}$ for all $j$, where $\delta_{T_0}$ is a prespecified equivalence limit. This leads to testing the hypotheses

$$H_0: |Y_j(T_0) - \bar Y_P(T_0)| \ge \delta_{T_0} \text{ for some } j = 1, \ldots, J \quad \text{vs.} \quad H_a: |Y_j(T_0) - \bar Y_P(T_0)| < \delta_{T_0} \text{ for all } j = 1, \ldots, J \tag{16}$$
Let the chemical characteristic of the $j$-th batch at time $t_k$ be represented by the individual regression

$$Y_{jk} = \alpha_j + \beta_j t_k + \varepsilon_{jk}, \qquad j = 1, \ldots, J, \quad k = 1, \ldots, K,$$

where $J$ is the total number of batches, $K$ is the number of observation time points of each batch, and $t_1 = 0$. The regression models can be written in matrix form as

$$\mathbf{Y}_I = \mathbf{X}_I \mathbf{B}_I + \boldsymbol{\varepsilon} \tag{17}$$

where $\mathbf{Y}_I$ is the observed data vector with $N = JK$ elements formed from the data of $J$ batches at $K$ time points, $\mathbf{B}_I = (\alpha_1, \alpha_2, \ldots, \alpha_J, \beta_1, \beta_2, \ldots, \beta_J)'$ is the parameter vector with $2J$ parameters, and $\mathbf{X}_I$ is the $JK \times 2J$ design matrix of the individual regression model, with elements 0, 1, or $t_k$. Then $\mathbf{Y}_I$ follows a normal distribution with mean $\mathbf{X}_I \mathbf{B}_I$ and covariance matrix $\sigma^2 \mathbf{I}$, i.e., $N(\mathbf{X}_I \mathbf{B}_I, \sigma^2 \mathbf{I})$.
Let $\hat{\mathbf{B}}_I$ be the least squares estimate of $\mathbf{B}_I$. Then

$$\hat{\mathbf{B}}_I = (\mathbf{X}_I' \mathbf{X}_I)^{-1} \mathbf{X}_I' \mathbf{Y}_I.$$

For a given batch, $(\hat\alpha_j, \hat\beta_j)' \sim N((\alpha_j, \beta_j)', \boldsymbol{\Sigma}_I)$, where $\boldsymbol{\Sigma}_I$ is the $2 \times 2$ matrix consisting of the $(j, j)$-th, $(j, j+J)$-th, $(j+J, j)$-th, and $(j+J, j+J)$-th entries of the matrix $(\mathbf{X}_I' \mathbf{X}_I)^{-1} \sigma^2$. In turn,

$$\hat Y_j(T_0) = \mathbf{X}_j(T_0) \hat{\mathbf{B}}_I$$

where $\hat Y_j(T_0)$ is the estimate of $Y_j$ at $T_0$, and $\mathbf{X}_j(T_0)$ is the $1 \times 2J$ row vector whose $j$-th entry is 1, whose $(j+J)$-th entry is $T_0$, and whose remaining entries are 0, under the individual regression lines.
By general linear model theorem, the following results are developed:


Yj ðT0 Þ , NðXj ðT0 Þ‘ BI ; Xj ðT0 ÞðX0I XI Þ21 X0j ðT0 Þs2 Þ;

and
X

YP ðT0 Þ ¼ ð1=JÞ j¼1 to J Xj ðT0 Þ‘ BI
X X
, Nðð1=JÞ X ðT Þ‘ B1 ; ½ð1=JÞ
j¼1 to J j 0 j ¼1 to J
Xj ðT0 ÞðX0I XI Þ21
X
½ð1=JÞ j¼1 to J X0j ðT0 Þs2 Þ
X
ð‘ Yj ðT0 Þ 2‘ YP ðT0 ÞÞ ¼ ð1=JÞ½JXj ðT0 Þ 2 j¼1 to J
Xj ðT0 Þ ðX0I XI Þ21 X0I Y

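and

$$s^2 = Y_I'\big[I - X_I(X_I'X_I)^{-1}X_I'\big]Y_I \,/\, (N - 2J),$$

where $N - 2J = \operatorname{rank}\big(I - X_I(X_I'X_I)^{-1}X_I'\big)$. Since

$$\frac{1}{J}\Big[J X_j(T_0) - \sum_{j'=1}^{J} X_{j'}(T_0)\Big](X_I'X_I)^{-1}X_I'\big[I - X_I(X_I'X_I)^{-1}X_I'\big] = 0,$$

the estimated difference $\hat{Y}_j(T_0) - \hat{Y}_P(T_0)$ is independent of $s^2$, and it follows that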

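$$T^* = \frac{\big(\hat{Y}_j(T_0) - \hat{Y}_P(T_0)\big) - \Big[X_j(T_0) - \frac{1}{J}\sum_{j'=1}^{J} X_{j'}(T_0)\Big] B_I}{\sqrt{\frac{1}{J}\Big[J X_j(T_0) - \sum_{j'=1}^{J} X_{j'}(T_0)\Big](X_I'X_I)^{-1}\frac{1}{J}\Big[J X_j(T_0) - \sum_{j'=1}^{J} X_{j'}(T_0)\Big]'\, s^2}} \sim t_f,$$

where $f = \operatorname{rank}\big(I - X_I(X_I'X_I)^{-1}X_I'\big)$.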
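The orthogonality condition used above can be checked numerically. The sketch below (hypothetical helper `check_difference_properties`, assuming identical sampling times for all batches) verifies that the contrast for $\hat{Y}_j(T_0) - \hat{Y}_P(T_0)$ is orthogonal to the residual projection $I - X_I(X_I'X_I)^{-1}X_I'$. As an extra check not stated in the paper, it also confirms that with identical designs the variance of the difference equals $(J-1)/J$ times the variance of a single batch prediction (both per unit $\sigma^2$).

```python
import numpy as np


def check_difference_properties(times, J, T0, j):
    """Return (max |c (X'X)^{-1} X'(I - H)|, Var(diff)/sigma^2, Var(Y_j)/sigma^2)
    for batch j (0-based), where c = X_j(T0) - mean_k X_k(T0)."""
    times = np.asarray(times, dtype=float)
    K = times.size
    X = np.zeros((J * K, 2 * J))
    for k in range(J):
        X[k * K:(k + 1) * K, k] = 1.0
        X[k * K:(k + 1) * K, J + k] = times
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T                    # hat matrix X (X'X)^{-1} X'
    rows = np.zeros((J, 2 * J))              # row k is X_k(T0)
    for k in range(J):
        rows[k, k], rows[k, J + k] = 1.0, T0
    c = rows[j] - rows.mean(axis=0)          # (1/J)[J X_j(T0) - sum_k X_k(T0)]
    ortho = c @ XtX_inv @ X.T @ (np.eye(J * K) - H)
    var_diff = c @ XtX_inv @ c
    var_single = rows[j] @ XtX_inv @ rows[j]
    return np.max(np.abs(ortho)), var_diff, var_single
```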

Then, the 90% confidence interval of $Y_j(T_0) - Y_P(T_0)$ is

$$\big(\hat{Y}_j(T_0) - \hat{Y}_P(T_0)\big) \pm t_f(0.05)\sqrt{\frac{1}{J}\Big[J X_j(T_0) - \sum_{j'=1}^{J} X_{j'}(T_0)\Big](X_I'X_I)^{-1}\frac{1}{J}\Big[J X_j(T_0) - \sum_{j'=1}^{J} X_{j'}(T_0)\Big]'\, s^2},$$

where $t_f(0.05)$ is the upper 5th percentile of the $t$ distribution with $f$ degrees of freedom.
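Putting the pieces together, a minimal sketch of the interval computation from raw stability data might look as follows. It assumes all batches share the same sampling times, stacks responses batch by batch, and takes the critical value (the upper 5% point of $t_f$, which gives a two-sided 90% interval) as an input rather than computing it; with SciPy one could use `scipy.stats.t.ppf(0.95, f)`. The function name `pooled_difference_ci` is hypothetical.

```python
import numpy as np


def pooled_difference_ci(times, y, J, T0, j, t_crit):
    """Estimate and two-sided interval for Y_j(T0) - Y_P(T0), individual-regression model.

    times  : sampling times shared by all J batches (length K)
    y      : responses stacked batch by batch (length J*K)
    j      : batch index (0-based)
    t_crit : upper 5% point of t with f = N - 2J df (gives a 90% interval)
    Returns (estimate, lower, upper, f).
    """
    times = np.asarray(times, dtype=float)
    y = np.asarray(y, dtype=float)
    K = times.size
    N = J * K
    X = np.zeros((N, 2 * J))
    for k in range(J):
        X[k * K:(k + 1) * K, k] = 1.0
        X[k * K:(k + 1) * K, J + k] = times
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ y

    # Row k of `rows` is X_k(T0): 1 in position k, T0 in position k + J.
    rows = np.zeros((J, 2 * J))
    for k in range(J):
        rows[k, k], rows[k, J + k] = 1.0, T0
    c = rows[j] - rows.mean(axis=0)  # (1/J)[J X_j(T0) - sum_k X_k(T0)]

    resid = y - X @ B_hat
    f = N - 2 * J                    # rank of I - H when X has full column rank
    s2 = resid @ resid / f           # residual mean square s^2
    est = c @ B_hat
    se = np.sqrt(c @ XtX_inv @ c * s2)
    return est, est - t_crit * se, est + t_crit * se, f
```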

Hence hypotheses (16) can be tested by comparing the confidence interval of $a_i + b_i T_0 - (1/J)\sum_{j=1}^{J}(a_j + b_j T_0)$ with $(-\delta_T, \delta_T)$. The null hypothesis of (16) is rejected if the 90% confidence interval of $a_i + b_i T_0 - (1/J)\sum_{j=1}^{J}(a_j + b_j T_0)$ is contained within $(-\delta_T, \delta_T)$.
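The decision rule can be expressed as a small predicate. The sketch below (hypothetical names) treats the limits as an open interval, as in the text, and adds the pooling rule implied by the examples: every batch-minus-mean interval must pass.

```python
def reject_nonequivalence(ci_lower, ci_upper, delta_T):
    """True if the 90% CI lies inside the open interval (-delta_T, delta_T),
    i.e. the null hypothesis of non-equivalence is rejected for this batch."""
    return -delta_T < ci_lower and ci_upper < delta_T


def can_pool(cis, delta_T):
    """Pool at T0 only if every batch-minus-mean interval passes."""
    return all(reject_nonequivalence(lo, hi, delta_T) for lo, hi in cis)


# Intervals reported in Examples 1 and 2, with delta_T = 1.5
print(can_pool([(-0.26, 1.55), (-0.44, 1.37), (-2.01, -0.20)], 1.5))  # False
print(can_pool([(0.051, 0.34), (-0.08, 0.22), (-0.42, -0.12)], 1.5))  # True
```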
For illustration purposes, $\delta_T = 1.5\%$ of the label claim is used in Examples 1 and 2. This implies that, for batches that are each equivalent to the common (mean) assay value at the proposed shelf life $T_0$, the assay values of any two batches can differ by no more than 3% of the label claim. The use of these equivalence limits in a regulatory setting may require further investigation.
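The 3% figure is a triangle-inequality bound: if batches $i$ and $k$ are each within $\delta_T$ of the mean $Y_P(T_0)$, then

```latex
|Y_i(T_0) - Y_k(T_0)| \le |Y_i(T_0) - Y_P(T_0)| + |Y_P(T_0) - Y_k(T_0)| < 2\delta_T = 3\%.
```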

Example 1 (continued).
The estimates of $Y(24)$ and the 95% CIs of the three batches in Table 1 can be obtained using the ANCOVA model. They are

Batch #1: $\hat{Y}_{I1}(24) = 97.071$, 95% CI = (95.941, 98.202)
Batch #2: $\hat{Y}_{I2}(24) = 96.893$, 95% CI = (95.762, 98.024)
Batch #3: $\hat{Y}_{I3}(24) = 95.321$, 95% CI = (94.191, 96.452)

and $\hat{Y}_P(24) = 96.428$. Then,

$$\hat{Y}_{I1}(24) \sim N(97.07, 0.40), \quad \hat{Y}_{I2}(24) \sim N(96.89, 0.40), \quad \hat{Y}_{I3}(24) \sim N(95.32, 0.40),$$

and

$$\hat{Y}_P(24) \sim N(96.43, 0.13).$$

Finally, the 95% CI of $Y_P(24)$ is (95.65, 97.21), and the 90% CIs of $Y_{I1}(24) - Y_P(24)$, $Y_{I2}(24) - Y_P(24)$, and $Y_{I3}(24) - Y_P(24)$ are, respectively,

(-0.26, 1.55), (-0.44, 1.37), and (-2.01, -0.20).
The 90% confidence intervals of the differences $Y_{I1}(24) - Y_P(24)$ and $Y_{I3}(24) - Y_P(24)$ are not bounded within $(-\delta_T, \delta_T)$ [i.e., (-1.5, 1.5)]; hence the assay values of the three batches at 24 months cannot be pooled for shelf life testing, and the shortest of the three individual shelf lives, 18 months, is used as the shelf life of the three batches.
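The reported intervals can be reproduced from the summary values alone, under two assumptions that are ours rather than the paper's. First, the second parameter of $N(\cdot,\cdot)$ is a variance $v$ (consistent with $0.13 \approx 0.40/3$). Second, the batch predictions are independent with a common variance, which holds under the individual-regression model when all batches share the same sampling times. Then $\mathrm{Var}(\hat{Y}_P) = v/J$ and $\mathrm{Var}(\hat{Y}_{Ij} - \hat{Y}_P) = v(J-1)/J$. The critical value 1.753 is the upper 5% point of $t_{15}$; the residual degrees of freedom for Example 1 are not given in this section, so 15 is an assumption that happens to reproduce the reported intervals. The helper name is hypothetical.

```python
import numpy as np


def batch_vs_mean_cis(estimates, var_each, crit):
    """Intervals for Y_Ij(T0) - Y_P(T0) from per-batch summary values.

    Assumes independent batch predictions with common variance var_each, so
    Var(Y_P) = var_each / J and Var(Y_Ij - Y_P) = var_each * (J - 1) / J.
    """
    est = np.asarray(estimates, dtype=float)
    J = est.size
    y_p = est.mean()
    se_diff = np.sqrt(var_each * (J - 1) / J)
    return y_p, [(d - crit * se_diff, d + crit * se_diff) for d in est - y_p]


# Example 1 summaries; 1.753 is the upper 5% point of t with 15 df (assumed df)
y_p, cis = batch_vs_mean_cis([97.071, 96.893, 95.321], 0.40, 1.753)
print(round(y_p, 3))  # 96.428
print([(round(float(lo), 2), round(float(hi), 2)) for lo, hi in cis])
# [(-0.26, 1.55), (-0.44, 1.37), (-2.01, -0.2)]
```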

Example 2 (continued). The estimates of $Y(30)$ and the 95% CIs of the three batches in Table 2 are

Batch #1: $\hat{Y}_{I1}(30) = 95.450$, 95% CI = (95.236, 95.665)
Batch #2: $\hat{Y}_{I2}(30) = 95.323$, 95% CI = (95.110, 95.540)
Batch #3: $\hat{Y}_{I3}(30) = 94.985$, 95% CI = (94.770, 95.200)

and $\hat{Y}_P(30) = 95.253$. Then,

$$\hat{Y}_{I1}(30) \sim N(95.45, 0.012), \quad \hat{Y}_{I2}(30) \sim N(95.32, 0.012), \quad \hat{Y}_{I3}(30) \sim N(94.98, 0.012),$$

and

$$\hat{Y}_P(30) \sim N(95.25, 0.004).$$

Finally, the 95% CI of $Y_P(30)$ is (95.13, 95.38), and the 90% CIs of $Y_{I1}(30) - Y_P(30)$, $Y_{I2}(30) - Y_P(30)$, and $Y_{I3}(30) - Y_P(30)$ are, respectively,

(0.051, 0.34), (-0.08, 0.22), and (-0.42, -0.12).

All three intervals lie within (-1.5, 1.5), which indicates that the assay values of the three batches at the 30th month can be pooled. The lower 95% confidence limit of the assay value averaged over the three batches is

$$95.25 - 1.645\sqrt{0.004} \approx 95.15.$$

Hence a 30-month shelf life may be granted based on the proposed assay value equivalence at the thirtieth month with an equivalence limit of 1.5%.
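A similar sketch for Example 2 uses the normal critical value 1.645, as the text does for the lower limit (this presumes large residual degrees of freedom), together with the same independence and equal-variance assumptions as above. It computes the one-sided lower 95% limit of the pooled mean using the standard error $\sqrt{0.004}$, i.e. the square root of the reported variance. The helper name `example2_summary` is hypothetical.

```python
import math


def example2_summary(estimates, var_each, z=1.645):
    """Batch-minus-mean intervals and the one-sided lower 95% limit of the
    pooled mean, assuming independent, equal-variance batch predictions and
    a normal critical value z."""
    J = len(estimates)
    y_p = sum(estimates) / J
    var_p = var_each / J                         # Var(Y_P)
    se_diff = math.sqrt(var_each * (J - 1) / J)  # SE of Y_Ij - Y_P
    cis = [(e - y_p - z * se_diff, e - y_p + z * se_diff) for e in estimates]
    lower_95 = y_p - z * math.sqrt(var_p)        # uses the SE, not the variance
    return y_p, cis, lower_95


y_p, cis, lower_95 = example2_summary([95.450, 95.323, 94.985], 0.012)
print(all(-1.5 < lo and hi < 1.5 for lo, hi in cis))  # True
print(round(lower_95, 3))                             # 95.149
```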
Note that the equivalence limit applied in these examples is selected only to demonstrate the method. One justification of the choice is that it ensures that any two batches equivalent to the mean differ by no more than 3%, for the examples discussed here, in which the acceptance criterion is restricted to no more than 5% from the label claim.

IV. DISCUSSION AND CONCLUSION

The issue of the significance level used in the pooling test of slope and intercept in the conventional ANCOVA approach to stability studies has been documented in the literature.
The concept of equivalence testing for slope and/or intercept pooling was introduced early, and its difficulties were recognized. Articles by Yoshioka et al. (1996a,b; 1997) reintroduced the concept of pooling by equivalence in stability studies. Recent statistical developments in equivalence testing make the concept of pooling batches based on equivalence much more acceptable.
When pooling batches based on equivalence of shelf life, one encounters the difficulty of deriving the sampling distribution of the shelf life estimate. In the example of shelf life equivalence, a natural log transformation is applied to make the confidence interval symmetric. However, the choice of transformation in this case can be data dependent. Furthermore, the estimates of shelf lives are not independent, and the derivation of the joint distribution of two shelf lives is complicated. The proposed approximation does not take the covariance of the two shelf lives into consideration. On the other hand, the sampling distribution of the estimated difference between the assay values of two batches can be derived under the general ANCOVA model assumptions, so the equivalence test based on assay values is more statistically sound.
The justifications of the equivalence limits selected in the two approaches are discussed only briefly; the choice remains debatable and requires further investigation.
When applying the ANCOVA model to a multiple-factor stability design, we recommend testing the interaction terms of the ANCOVA model using the procedure proposed by Tsong et al. (2003). After eliminating as many interaction terms as possible, one can perform equivalence testing across all product combinations based on the interaction-reduced model. If the combinations fail the equivalence test, one may test equivalence of the products within a given level of a factor to see whether a common chemical characteristic value can be used to support the proposed shelf life, $T_0$. The generalization is simple but tedious.

APPENDIX A. THE CONFIDENCE INTERVAL OF MEAN SHELF LIFE (SEBER, G.A.F., 1977, LINEAR REGRESSION ANALYSIS, JOHN WILEY AND SONS, NEW YORK, NY)

The estimate of the mean shelf life can be obtained by inverting the fitted regression line $\hat{y} = \hat{a} + \hat{b}t$ at the acceptance criterion, such that

$$\hat{T} = (y^* - \hat{a})/\hat{b} \qquad (A.1)$$

where $y^*$ is the acceptance criterion. $\hat{T}$ is a biased estimate; an unbiased estimate may be obtained using an inverse regression model (Krutchoff, 1967). However, many statisticians have also provided theoretical analyses supporting the use of $\hat{T}$. The confidence limits of $\hat{T}$ can be obtained by solving the following equation for $t$:

$$\big[y^* - \bar{y} - \hat{b}(t - \bar{t})\big]^2 = F_{1,n-2}(\alpha)\, S^2 \left\{\frac{1}{n} + \frac{(t - \bar{t})^2}{\sum_j (t_j - \bar{t})^2}\right\},$$

where $S^2 = \frac{1}{n-2}\Big\{\sum_j (y_j - \bar{y})^2 - \hat{b}^2 \sum_j (t_j - \bar{t})^2\Big\}$ and $F_{1,n-2}(\alpha)$ is the upper $\alpha$ percentage point of the $F$ distribution with 1 and $n-2$ degrees of freedom.
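The limits solve a quadratic in $t$. Writing $u = t - \bar{t}$, $d = y^* - \bar{y}$, and $S_{tt} = \sum_j (t_j - \bar{t})^2$, the equation becomes $(\hat{b}^2 - F S^2/S_{tt})u^2 - 2\hat{b}d\,u + (d^2 - F S^2/n) = 0$. A bounded interval exists only when the leading coefficient is positive, i.e. when the slope is clearly nonzero. Below is a minimal single-batch sketch; the $F$ critical value is passed in (e.g. `scipy.stats.f.ppf(1 - alpha, 1, n - 2)` if SciPy is available), and the function name is hypothetical.

```python
import numpy as np


def shelf_life_with_limits(t, y, y_star, f_crit):
    """Point estimate (A.1) and confidence limits from the quadratic above.

    Returns (T_hat, lower, upper); the limits are None when the leading
    coefficient is not positive, so no bounded interval exists.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    t_bar, y_bar = t.mean(), y.mean()
    Stt = np.sum((t - t_bar) ** 2)
    b = np.sum((t - t_bar) * (y - y_bar)) / Stt
    a = y_bar - b * t_bar
    S2 = (np.sum((y - y_bar) ** 2) - b ** 2 * Stt) / (n - 2)
    T_hat = (y_star - a) / b  # Eq. (A.1)

    # Coefficients of A u^2 + B u + C = 0 with u = t - t_bar.
    d = y_star - y_bar
    A = b ** 2 - f_crit * S2 / Stt
    B = -2.0 * b * d
    C = d ** 2 - f_crit * S2 / n
    disc = B ** 2 - 4.0 * A * C
    if A <= 0 or disc < 0:
        return T_hat, None, None
    r = np.sqrt(disc)
    lo, hi = sorted((t_bar + (-B - r) / (2 * A), t_bar + (-B + r) / (2 * A)))
    return T_hat, lo, hi
```

When the interval is bounded, $\hat{T}$ always lies between the two roots, because the left-hand side of the equation is zero at $t = \hat{T}$.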
Another alternative approach would be to first generate simulated regression lines $y_j = a' + b' t_j$, with $(a', b')$ drawn from the sampling distribution of $(\hat{a}, \hat{b})$, and then obtain the 5th and 95th percentiles of the shelf lives of the simulated regression lines using Eq. (A.1).
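A minimal sketch of this simulation approach follows. It assumes $(a', b')$ are drawn from a bivariate normal with mean $(\hat{a}, \hat{b})$ and covariance equal to the estimated $\Sigma_I$ (with $s^2$ in place of $\sigma^2$). Since Eq. (A.1) needs only the intercept and slope, the sketch applies it directly to each draw rather than generating the $y_j$ values. The function name, number of draws, and seed are illustrative choices. The approach presumes the slope is well away from zero; otherwise draws with $b' \approx 0$ produce extreme shelf lives.

```python
import numpy as np


def simulated_shelf_life_percentiles(a_hat, b_hat, cov_ab, y_star,
                                     n_sim=10_000, seed=0):
    """5th and 95th percentiles of T = (y* - a')/b' over simulated lines,
    with (a', b') ~ N((a_hat, b_hat), cov_ab)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal([a_hat, b_hat], cov_ab, size=n_sim)
    a_sim, b_sim = draws[:, 0], draws[:, 1]
    T_sim = (y_star - a_sim) / b_sim  # Eq. (A.1) for each simulated line
    return np.percentile(T_sim, [5, 95])


p5, p95 = simulated_shelf_life_percentiles(100.0, -0.2,
                                           [[0.01, 0.0], [0.0, 1e-4]], 95.0)
```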

V. ACKNOWLEDGMENTS

The authors thank the referee for the careful review of the manuscript and the important comments. This manuscript was prepared with the support of Regulatory Science Research Grant RSR02-015 of the Center for Drug Evaluation and Research, FDA. The authors thank the members of the FDA CDER Office of Biostatistics Stability Working Group for their support and for discussions on the development of the proposed approaches.

REFERENCES

Asano, C. (1960). Tests due to pooling data through preliminary test on biological direct assay. Bull. Math. Stat. 9:25–39.
Bancroft, T. A. (1944). On biases in estimation due to the use of preliminary tests of significance. Ann. Math. Stat. 15:190–204.
Bancroft, T. A. (1964). Analysis and inference for incompletely specified models involving the use of preliminary tests of significance. Biometrics 20(3):427–442.
Chen, W. J., Tsong, Y. (2003). Significance level for stability pooling test: a simulation study. J. Biopharm. Stat. 13(3):355–374.
Chen, J. J., Hwang, J.-S., Tsong, Y. (1995). Estimation of the shelf-life of drugs with mixed effects models. J. Biopharm. Stat. 5(1):131–140.
Chow, S. C., Liu, J. P. (1995). Statistical Design and Analysis in Pharmaceutical Sciences. New York, NY, USA: Marcel Dekker.
Draft ICH Consensus Guideline. (2001a). Q1E Stability Data Evaluation. Food and Drug Administration, Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, www.fda.gov/cder/guidance/4983dft.pdf.
FDA. (1987). Guidelines for Submitting Documentation for the Stability of Human Drugs and Biologics. Rockville, MD, USA: Food and Drug Administration, Center for Drugs and Biologics.
Guidance for Industry: ICH. (2001b). Q1A(R) Stability Testing of New Drug Substances and Products. Food and Drug Administration, Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, www.fda.gov/cder/guidance/4282fnl.pdf.
Guidance for Industry: ICH. (2003). Q1D Bracketing and Matrixing Designs for Stability Testing of New Drug Substances and Products. Food and Drug Administration, Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, www.fda.gov/cder/guidance/4985fnl.pdf.
Johnson, J. P., Bancroft, T. A., Han, C. P. (1977). A pooling methodology for regressions in prediction. Biometrics 33:57–67.
Krutchoff, R. G. (1967). Classical and inverse regression methods of calibration. Technometrics 9:425–439.
Larson, H. J., Bancroft, T. A. (1963). Sequential model building for prediction in regression analysis I. Ann. Math. Stat. 34:231–242.
Lin, T.-Y. D., Tsong, Y. (1991). Determination of significance level for pooling data in stability studies. Proc. Biopharm. Sec. Joint Stat. Meeting, Am. Stat. Assoc. 195–201.
Lin, K. K., Lin, T. D., Kelly, R. E. (1993). Stability of drugs. In: Buncher, C. R., Tsay, J. Y., eds. Statistics in the Pharmaceutical Industry. 2nd ed. New York, NY, USA: Marcel Dekker, pp. 419–444.
Ruberg, S. J., Hsu, J. C. (1990). Multiple comparison procedures for pooling batches in stability studies. Proc. Biopharm. Sec. Joint Stat. Meeting, Am. Stat. Assoc. 204–209.
Ruberg, S. J., Stegeman, J. W. (1991). Pooling data from stability studies: testing the equality of batch degradation slopes. Biometrics 47:1059–1069.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokin. Biopharm. 15:657–680.
Tsong, Y., Chen, W. J., Chen, C. W. (2003). Statistical analysis of multiple factor design of stability study. J. Biopharm. Stat. 13(3):376–393.
Yoshioka, S., Aso, Y., Kojima, S. (1996a). Statistical evaluation of shelf-life of pharmaceutical products estimated by matrixing. Drug Stab. 1:147–151.
Yoshioka, S., Aso, Y., Kojima, S., Po, A. L. W. (1996b). Power of analysis of variance for assessing batch-variation of stability data of pharmaceuticals. Chem. Pharm. Bull. 44(10):1948–1950.
Yoshioka, S., Aso, Y., Kojima, S. (1997). Assessment of shelf-life equivalence of pharmaceutical products. Chem. Pharm. Bull. 45(9):1482–1484.
