
Method validation

With Confidence
Performance specifications, experimental protocols, statistical interpretation, EXCEL files

Dietmar Stöckl
Dietmar@stt-consulting.com

STT Consulting
Dietmar Stöckl, PhD
Abraham Hansstraat 11
B-9667 Horebeke, Belgium
e-mail: dietmar@stt-consulting.com
Tel + FAX: +32/5549 8671

Copyright: STT Consulting 2007

Content
Introduction
Materials
Validation protocols
  Imprecision
  Limit of detection (LoD)
  Working range
  Linearity model 1
  Linearity model 2, accuracy protocol (= accuracy of calibration curve)
  Recovery model 1 (paired sample protocol: spike and control)
  Recovery model 2 (accuracy protocol: sample with target value)
  Interference
  Method comparison
Annex
  Summary of protocols, statistics & graphics
  System stability, ruggedness and multifactor protocols
  Glossary of terms

Introduction
WHAT is validation?
Validation is the confirmation, through the provision of objective evidence, that requirements for a specific intended use or application have been fulfilled (ISO 9000). From this definition, we see that we have to specify the intended use of a method, define performance requirements, provide data from validation experiments (objective evidence), and interpret the validation data (confirmation that the requirements have been fulfilled).

WHICH types of performance requirements (specifications) exist?
Performance requirements can be statistical, analytical, or application-driven/regulatory. Statistical and analytical specifications are most useful for method evaluation; application-driven/regulatory specifications are used for validation. Some examples are given in the table below.

Performance requirements (specifications)
Statistical: t-test: P ≥ 0.05; F-test: P ≥ 0.05
Analytical: Bias ≤ calibration tolerance; CV ≤ stable CV
Application-driven#: Bias ≤ 3%; CV ≤ 3%

#Cholesterol (National Cholesterol Education Program)

WHICH performance characteristics exist?
We have seen that we have to specify performance requirements for a validation. These requirements refer to the following performance characteristics of an analytical method:
- Imprecision
- Limit of detection
- Working range
- Linearity
- Recovery
- Interference/specificity
- Total error (method comparison)
- [Robustness/ruggedness]: will not be addressed in this book

WHICH experiments do we have to perform?
The experiments we have to perform depend on the performance characteristic we want to validate. To estimate method imprecision, for example, we need to perform repeated measurements on a stable sample. However, there is no agreement across the various application fields of analytical methods on the design of such experiments. In this book, we mainly refer to the experimental protocols of the Clinical and Laboratory Standards Institute (CLSI). The table below gives an overview of typical experiments to be performed during a method validation study.

Performance characteristic: samples; measurements
- Imprecision: IQC samples, no target; n = 20 (repetition over several days)
- LoD/LoQ: blank, low sample; n = 20 (repetition over several days)
- Linearity: 5 related samples/calibrators (mix), no target; n = 4 (repetition within day)
- Working range: see Imprecision/Linearity
- Interference: samples with interferent spike and control (no target); n = 4 (repetition within day)
- Recovery (accuracy/trueness): samples with known analyte spike and control, or certified reference materials (CRM); n = 4-5 (repetition over several days)
- Total error (method comparison): 40 samples (target by reference method); n = 1 or 2 (measurement in one or several days)

IQC: internal quality control; LoD: limit of detection; LoQ: limit of quantitation.
These experiments are described in detail in the following chapters of the book.

HOW do we make decisions?
Once we have generated data, we have to decide whether they fulfill the requirements that have been selected for the application of the method "for a specific intended use". Currently, it is common practice to make decisions without considering confidence intervals or statistical significance testing. Modern interpretation of analytical data, however, requires the use of confidence intervals/statistical significance testing. The two approaches are compared below for a recovery experiment.

Decision-making approaches
Old: experimental recovery: 90%; limit: 85-115%; decision: passed.
Modern: experimental recovery: 90%; confidence interval: ±11% (with n = 4 and CV = 7%); limit: 85-115%; decision: fail (90 - 11 = 79%, below the 85% limit); action: increase n or reduce the CV.

In the old approach, we compare one naked number with the specification. This approach ignores the number of measurements that have been performed and the imprecision of the method: if we repeated the validation, we could easily obtain a recovery estimate of, for example, 80%. Therefore, decision-making should be statistics-based, either by applying a formal statistical test or by interpreting the confidence interval of an experimental estimate.

Statistics-based decisions: importance of the test value (= requirement, specification)
When we make statistics-based decisions, the selection of the test value depends on the type of requirement we apply (statistical, analytical, validation):
- Statistical: statistical test versus the null hypothesis (F-test, t-test, 95% confidence intervals, etc.): bias = 0; slope = 1; intercept = 0; etc.
- Analytical: statistical test versus an estimate of stable performance (F-test, t-test, 95% confidence intervals, etc.): bias ≤ calibration tolerance; etc.
- Validation case (application-driven; specific intended use): statistical test versus the validation limit (F-test, t-test, 95% confidence intervals, etc.): $CV_{exp} \le CV_{max}$; $Bias_{exp} \le Bias_{max}$; etc.
Nevertheless, in all three situations, we apply the same types of statistical tests.
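The recovery example can be reproduced with a short Python sketch. It assumes that the 7% CV applies to individual recovery results and uses the two-sided 95% Student t-value for df = 3 (3.182, a standard table value), which gives the ±11% quoted above; the function name `recovery_decision` is hypothetical.

```python
import math

# Two-sided 95% Student t critical values for small df (standard table values).
T_95_TWO_SIDED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def recovery_decision(recovery_pct, cv_pct, n, low=85.0, high=115.0):
    """Return (CI half-width, old decision, modern decision) for a mean recovery.

    Assumes cv_pct is the CV of a single recovery result, so the 95% CI
    half-width of the mean is t(0.05, n-1) * CV / sqrt(n).
    """
    half_width = T_95_TWO_SIDED[n - 1] * cv_pct / math.sqrt(n)
    # Old approach: only the point estimate is compared with the limits.
    old = "passed" if low <= recovery_pct <= high else "fail"
    # Modern approach: the whole confidence interval must lie inside the limits.
    inside = (recovery_pct - half_width >= low) and (recovery_pct + half_width <= high)
    modern = "passed" if inside else "fail"
    return half_width, old, modern

half_width, old, modern = recovery_decision(90.0, 7.0, 4)
print(f"CI: +/-{half_width:.1f}%  old: {old}  modern: {modern}")
```

The same 90% estimate passes the old check but fails the modern one, because 90 - 11 falls below 85%.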

Interpretation of 95% confidence limits
Confidence limits and quality specifications
The figure below shows a graphical interpretation of 95% confidence limits versus a predefined quality specification ("10").
Note: when comparing an estimate with a specification, the confidence limits are usually constructed 1-sided.

[Figure: estimates A-D with their 95% confidence limits plotted against the specification "10"; panel 1: specification as a limit; panel 2: specification as typical performance.]

1. Interpretation of cases A-D when the specification is a limit
A: "In": the specification is satisfied with 95% probability.
B: Not "In" with 95% probability; more data may help.
C: Not "In" with 95% probability, but also not "Out" with 95% probability.
D: "Out".
2. Interpretation when the number characterizes a stable process
If the "number" is the typical performance of a stable process, situation C can still be accepted.
C: Look at the lower limit: not "Out" with 95% probability. This interpretation is applied in the EP5 protocol to investigate whether the user CV differs from the typical manufacturer CV.
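The A-D scheme can be written as a small decision function. This is a sketch under an assumed reading of the (lost) figure: B lies below the specification but its upper CL crosses it, and C lies above the specification but its lower CL crosses it. The function name `interpret_estimate` is hypothetical.

```python
def interpret_estimate(estimate, lower_cl, upper_cl, spec, spec_is_limit=True):
    """Classify an estimate and its 95% CLs against an upper specification (cases A-D).

    Returns (case, accepted). When spec_is_limit is False, the specification is
    treated as the typical performance of a stable process.
    """
    if upper_cl <= spec:
        case = "A"            # "In" with 95% probability
    elif lower_cl > spec:
        case = "D"            # "Out" with 95% probability
    elif estimate <= spec:
        case = "B"            # not "In"; more data may help
    else:
        case = "C"            # neither "In" nor "Out"
    if spec_is_limit:
        accepted = case == "A"
    else:
        # Stable process: reject only when the lower CL exceeds the specification.
        accepted = case != "D"
    return case, accepted

print(interpret_estimate(11.0, 9.0, 13.0, 10.0))                       # limit
print(interpret_estimate(11.0, 9.0, 13.0, 10.0, spec_is_limit=False))  # typical value
```

The same case C is rejected against a limit but accepted against a typical value, as in the EP5 comparison of user and manufacturer CVs.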

SUMMARY
For a successful validation, we need performance specifications, experimental protocols, and statistical interpretation of the data. The whole exercise, however, should be carefully planned, including the samples needed, the foreseen internal quality control, and the documentation of the results. A validation plan should consider at least the following elements.

Validation plan
- Define the application, purpose and scope of the method
- Define performance characteristics and acceptance criteria
- Develop a validation protocol or operating procedure for the validation
- Qualify materials, e.g. standards, reagents, and samples
- Perform validation experiments
- Document validation experiments and results in the validation report
- Interpret the validation data and make statistics-based decisions

In this book, the following validation example is used.
Measurand: amount-of-substance concentration of glucose in serum; S-glucose, mmol/L (adult reference interval: 3.9-5.8 mmol/L).
Specific intended use: for in vitro diagnostic purposes.

Performance specifications (performance characteristic: specification)
- Imprecision: within-run 1.5%#; total 3%#
- LoD: 0.1 mmol/L
- Working range: 0.1-42 mmol/L
- Linearity: 0.1-42 mmol/L; limit: 5%
- Recovery: limit: 5%
- Interference: limit: 10%
- Total error (method comparison): limits: bias 3%; total error 10%
#Note: typical values for a stable process; not meant as limits!

Data simulation: most data are simulated with an assumed method CV of 1-2% (within-run) and 3% (total).

Materials
Instrument: XYZ; Standard, Lot#; Reagent, Lot#
Imprecision (CLSI EP5) and IQC during experiments
  Low IQC material: 3.9 mmol/L
  High-normal IQC material: 5.9 mmol/L
  High IQC material: 8.5 mmol/L
LoD, dilutions, "adaptation of control" (CLSI EP17)
  Isotonic saline solution (= blank): 0 mmol/L
Linearity, experiment 1 (CLSI EP6)
  Low sample 1: 3.0 mmol/L
  High sample 1: 7.0 mmol/L
Linearity, experiment 2 ("manufacturer protocol": accuracy)
  Spiked blank: 45.0 mmol/L
Recovery and Interference (CLSI EP7)
  Low sample 2: 3.5 mmol/L
  Normal sample: 4.8 mmol/L
  High sample 2: 6.5 mmol/L
  Glucose solution in isotonic NaCl: 30.0 mmol/L
  Bilirubin solution in isotonic NaCl: 600 mg/dL
  Low sample 2 spiked with bilirubin: 60 mg/dL
Recovery (Accuracy)
  Standard 1: 4.5 mmol/L
  Standard 2: 5.0 mmol/L
  Standard 3: 5.5 mmol/L
Method comparison (CLSI EP9)
  40 native samples: various

Imprecision
Graphics: dot plot; histogram
Statistics: descriptive statistics (dispersion); Gaussian ("normal") distribution; outliers; sampling statistics and confidence intervals of SDs; significance tests for SD and variance (Chi2, F-test); ANOVA model II

The CLSI protocol (EP5)
- 2 different samples (e.g., low and high)
- 1 or 2 runs per day
- Duplicates
- 20 days
- IQC! with 1 or 2 samples

Specific calculations for a single run per day
- Within-run standard deviation: $s_{wr} = \sqrt{\sum_{i=1}^{20} D_{dupl,i}^2 / (2 \cdot 20)}$, where $D_{dupl}$ = difference between the within-run duplicates
- Standard deviation of the daily means ($s_{means}$ = "B" in EP5): $s_{means} = \sqrt{\sum_{i=1}^{20} D_{means,i}^2 / (20 - 1)}$, where $D_{means}$ = daily mean - overall mean of the 20 days
- Between-day standard deviation: $s_{dd} = \sqrt{s_{means}^2 - s_{wr}^2/2}$. CAVE: set $s_{dd} = 0$ when $s_{means}^2 < s_{wr}^2/2$ (negative argument of the square root!)
- Total standard deviation: $s_T = \sqrt{s_{means}^2 + s_{wr}^2/2}$. CAVE: set $s_T = s_{wr}$ when $s_{means}^2 < s_{wr}^2/2$

Degrees of freedom (EP5)
- $s_{wr}^2$: number of duplicates measured (20)
- $s_T^2$: complex; precalculated in the EXCEL template

Comparing an SD estimate with a claim
Test the overlap of the 1-sided confidence limit (CL) of the SD with the claim, or use a 1-sided 1-sample F-test ("Chi2-test") (EXCEL template). Statistics for imprecision can also be treated with Model II ANOVA!

Importance of imprecision: limit of detection; working range; number of analytical replicates; troubleshooting

Imprecision EXCEL file


Day  Replicate 1 (mmol/L)  Replicate 2 (mmol/L)
1  5.95  5.82
2  5.64  5.81
3  5.92  5.98
4  5.85  5.85
5  5.98  5.92
6  5.77  5.53
7  5.91  5.92
8  5.94  5.91
9  6.16  6.14
10  5.83  5.79
11  5.79  5.80
12  6.04  6.06
13  6.18  6.21
14  6.03  6.17
15  6.02  6.03
16  6.14  6.16
17  5.95  5.90
18  6.07  6.17
19  5.78  5.84
20  6.31  6.40

Graphics
The distribution of the mean values does not indicate an outlier. The distribution of the differences indicates that day 6 may be an outlier (-0.24). According to the CLSI protocol it is not (4 SD outlier criterion); according to the Grubbs test, it is.

Calculations
The worksheet uses the CLSI EP5 calculations and EXCEL ANOVA (Tools>Data Analysis). If ANOVA is used, the formulae for $s_{wr}$, $s_{dd}$, and $s_T$ must be calculated in EXCEL (see the example in the worksheet). Note: because $s_{dd}$ is calculated as the square root of a difference, it is set to zero when MS-Between groups ≤ MS-Within groups.
We calculate: $s_{wr}$ = 0.063 mmol/L; CV_wr = 1.1%; $s_{dd}$ = 0.170 mmol/L; $s_T$ = 0.181 mmol/L; CV_T = 3.1%
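The EP5 single-run formulas can be sketched in Python on the 20-day data set from the table. This is a minimal illustration of the calculations above, not the EXCEL template itself; `ep5_single_run` is a hypothetical name.

```python
import math

# EP5 example data (mmol/L): 20 days, one run per day, duplicates.
rep1 = [5.95, 5.64, 5.92, 5.85, 5.98, 5.77, 5.91, 5.94, 6.16, 5.83,
        5.79, 6.04, 6.18, 6.03, 6.02, 6.14, 5.95, 6.07, 5.78, 6.31]
rep2 = [5.82, 5.81, 5.98, 5.85, 5.92, 5.53, 5.92, 5.91, 6.14, 5.79,
        5.80, 6.06, 6.21, 6.17, 6.03, 6.16, 5.90, 6.17, 5.84, 6.40]

def ep5_single_run(r1, r2):
    """Return (s_wr, s_dd, s_T) for one run per day measured in duplicate."""
    days = len(r1)
    # Within-run SD from the duplicate differences: sqrt(sum(D^2) / (2 * days)).
    s_wr = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)) / (2 * days))
    means = [(a + b) / 2 for a, b in zip(r1, r2)]
    grand = sum(means) / days
    # Variance of the daily means around the overall mean (df = days - 1).
    var_means = sum((m - grand) ** 2 for m in means) / (days - 1)
    if var_means < s_wr ** 2 / 2:
        # Negative variance component: s_dd = 0 and s_T = s_wr (CAVE rules).
        return s_wr, 0.0, s_wr
    s_dd = math.sqrt(var_means - s_wr ** 2 / 2)
    s_t = math.sqrt(var_means + s_wr ** 2 / 2)
    return s_wr, s_dd, s_t

s_wr, s_dd, s_t = ep5_single_run(rep1, rep2)
print(f"s_wr = {s_wr:.3f}  s_dd = {s_dd:.3f}  s_T = {s_t:.3f} mmol/L")
```

Run on the table data, this prints values that round to the worksheet results: s_wr = 0.063, s_dd = 0.170 and s_T = 0.181 mmol/L.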



Interpretation
The calculated values for imprecision are: CV_wr(exp) = 1.1%; CV_T(exp) = 3.1%.
The specifications are: CV_wr(stable) = 1.5%; CV_T(stable) = 3.0%.
We compare them by use of the Chi2 statistic: we test whether the lower, 1-sided 95% confidence limits of the experimental estimates are equal to or smaller than the preset specifications. Both values pass this statistical test, even though the experimental total CV_T (3.1%) is higher than the specification (3%). The reason is that its lower confidence limit (2.51%) is < 3%.

Calculations
$\chi^2_{exp} = \mathrm{SD}_{exp}^2 \cdot df / \mathrm{SD}_{claim}^2$ (df = degrees of freedom, here 20)
Lower CL of SD $= \mathrm{SD} \cdot \sqrt{df / \chi^2_{0.05,\,df}}$, where $\chi^2_{0.05,\,df}$ is the upper 5% point of the chi-square distribution
Conclusion The validation data demonstrate that the method passes the pre-set specifications for within and total imprecision.
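The Chi2 comparison can be sketched with the Python standard library only. Because the standard library has no chi-square quantile function, this sketch swaps in the Wilson-Hilferty approximation (accurate to about 0.1% for df around 20); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation of the chi-square quantile (stdlib only)."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3

def sd_versus_claim(sd_exp, sd_claim, df, alpha=0.05):
    """1-sided check of an SD (or CV) estimate against a claimed/stable value.

    Returns (chi2_exp, lower 1-sided CL of the SD, passed).
    """
    chi2_exp = sd_exp ** 2 * df / sd_claim ** 2
    chi2_upper = chi2_quantile(1.0 - alpha, df)   # upper alpha point
    lower_cl = sd_exp * math.sqrt(df / chi2_upper)
    # "lower CL <= claim" is equivalent to "chi2_exp <= chi2_upper".
    return chi2_exp, lower_cl, lower_cl <= sd_claim

for label, cv_exp, cv_claim in [("within-run", 1.1, 1.5), ("total", 3.1, 3.0)]:
    chi2_exp, lcl, ok = sd_versus_claim(cv_exp, cv_claim, df=20)
    print(f"{label}: chi2 = {chi2_exp:.2f}, lower CL = {lcl:.2f}%, pass = {ok}")
```

With df = 20 for both estimates, the lower CL of the total CV comes out near 2.47%, slightly below the 2.51% quoted above, presumably because the EXCEL template uses its own (larger) df for $s_T$. The decision is the same.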

DETAILED STATISTICAL BACKGROUND
Statistics: descriptive statistics (dispersion); Gaussian ("normal") distribution; outliers; sampling statistics and confidence intervals of SDs; significance tests for SD and variance (Chi2, F-test); ANOVA model II

Limit of detection (LoD)


Concepts
The LoD can be calculated from:
- the standard deviation of a blank
- the signal-to-noise ratio of a chromatogram of a low sample
- a calibration line, by means of regression

Graphics: dot plot; scatter plot
Statistics (from blank): outliers; mean; confidence interval of centiles; SD_total (experiments on different days)

Consideration of α-errors and β-errors: power concept
Model 1: $\mathrm{LoD} = \mathrm{Mean} + 1.65\,s_0$ ($s_0$ = SD at zero)
- 5% false positives when the analyte is not present (α-error)
- 50% false negatives (β-error) when the analyte is present at the level Mean + 1.65 $s_0$
Model 2: $\mathrm{LoD} = \mathrm{Mean} + 2 \cdot 1.65\,s = \mathrm{Mean} + 3.3\,s$
- Mean and s are obtained from the zero standard; 3.3 s is often simplified to 3 s
- Result: 5% false positives (α-error) and 5% false negatives (β-error)

Model applied in this book and in the EXCEL file
Simplified Model 2: $\mathrm{LoD} = \mathrm{Mean} + 3\,s$

Limit of detection (LoD): Other concepts


Chromatographic (S/N = 3)
Statistics: outliers; mean; SD_total (experiments on different days)

[Figure: chromatographic LoD (S/N = 3) compared with the LoD from the blank (mean noise + 3.3 SD); two panels of response versus time, labelled "LoD = S/N = 3" and "LoD = Mean noise + 3.3 SD", with annotations "Signal 6 SD" and "Noise 2 SD".]

From calibration
Calculation of the LoD from calibration data with regression:
- $Y_b$ = "signal of blank" via regression = intercept $a$
- $S_b$ = "standard deviation of blank" = $S_{y/x}$
- $b$ = slope
"Signal" at the LoD: $Y_{LoD} = a + 3\,S_{y/x}$
Transformation to concentration via the regression equation $y = a + b\,x$: $C_{LoD} = (a + 3\,S_{y/x} - a)/b = 3\,S_{y/x}/b$
When the calibration curve passes through zero (e.g., in case of an automatic blank), the mean term (intercept) is omitted.
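The regression route can be sketched as follows, assuming an ordinary least-squares calibration line. The calibration data in the usage line are hypothetical and only illustrate the arithmetic; `lod_from_calibration` is likewise a hypothetical name.

```python
import math

def lod_from_calibration(x, y, k=3.0):
    """Return (a, b, s_y/x, C_LoD) with C_LoD = k * s_y/x / b from an OLS line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept = "signal of blank"
    # Residual SD around the line (n - 2 degrees of freedom) = s_y/x.
    s_yx = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    signal_lod = a + k * s_yx          # signal at the LoD
    c_lod = (signal_lod - a) / b       # back-transformed: k * s_y/x / b
    return a, b, s_yx, c_lod

# Hypothetical calibration: concentrations 0-4 (arbitrary units) and responses.
a, b, s_yx, c_lod = lod_from_calibration([0, 1, 2, 3, 4], [0.10, 1.10, 1.90, 3.10, 3.90])
print(f"a = {a:.3f}, b = {b:.3f}, s_y/x = {s_yx:.4f}, C_LoD = {c_lod:.3f}")
```

In practice, the calibration curves from several days (for example 5) would each give a $C_{LoD}$ estimate.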



Samples
Usually, the LoD is derived from the test variation at zero analyte, which requires suitable "blank" samples. For exogenous compounds, such as drugs, these are easy to obtain; for endogenous compounds, suitable blank samples are more difficult to obtain. Note that "stripped" samples or blank solutions often give an overoptimistic LoD because of their "clean" matrix.

Ideally, the LoD of a method should be assessed with several native samples containing concentrations near the detection limit, as determined by a reference method.
Alternatively, the LoD is derived from measurements of calibrators.

Protocols
- Blank ("common"; applied in this book and in the EXCEL file): 20 measurements of the zero standard/blank over 20 days, for example combined with EP5
- Chromatographic: 20 measurements of a sample that gives a signal-to-noise ratio of 3, over 20 days, for example combined with EP5
- Calibration: from calibration curves on several different days (for example 5)
- CLSI protocol EP17: Determination of Limits of Quantitation

Limit of detection (LoD) EXCEL file


Day  Blank (mmol/L)
1  0.01
2  -0.01
3  0.02
4  0.04
5  0.02
6  -0.03
7  -0.01
8  0.00
9  -0.01
10  0.01
11  0.02
12  -0.03
13  0.03
14  -0.03
15  0.02
16  0.01
17  0.01
18  -0.04
19  0.01
20  0.00

[Figure: dot plot of the 20 blank results, y-axis -0.05 to 0.05 mmol/L.]

Graphic
The graphic gives no indication of an outlier.

Calculations (3 s model)
Mean: 0.0020 mmol/L
SD: 0.0219 mmol/L
Confidence interval of the 3 SD centile (1-sided, 95%): 0.02 mmol/L, calculated as $t_{(0.1,\,19)} \cdot \sqrt{\mathrm{SD}^2/20 + 3^2\,\mathrm{SD}^2/(2 \cdot 20)}$
LoD: 0.068 mmol/L; UCL#: 0.088 mmol/L
LoD (blanked): 0.066 mmol/L; UCL#: 0.086 mmol/L
#UCL: upper confidence limit

Interpretation
We compare the UCL of the LoD (0.088 or 0.086 mmol/L) with the specification of 0.1 mmol/L.

Conclusion
The validation data demonstrate that the method passes the preset specification for the LoD.
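The 3 s model and its upper confidence limit can be sketched in Python with the 20 blank results from the table. The t-value 1.729 is the standard table value for t(0.1, 19), i.e. 1-sided 95% with df = 19; `lod_from_blank` is a hypothetical name.

```python
import math
import statistics

# Blank results (mmol/L) for days 1-20.
blanks = [0.01, -0.01, 0.02, 0.04, 0.02, -0.03, -0.01, 0.00, -0.01, 0.01,
          0.02, -0.03, 0.03, -0.03, 0.02, 0.01, 0.01, -0.04, 0.01, 0.00]

T_0_10_19 = 1.729  # t(0.1, 19): 1-sided 95% Student t for df = 19 (table value)

def lod_from_blank(values, k=3.0, t=T_0_10_19, blanked=False):
    """LoD = Mean + k*SD (or k*SD when blanked) and its upper 1-sided 95% CL.

    The t value must match df = len(values) - 1; the default suits n = 20.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lod = (0.0 if blanked else mean) + k * sd
    # CI of the Mean + k*SD centile: t * sqrt(SD^2/n + k^2 * SD^2 / (2n)).
    ci = t * math.sqrt(sd ** 2 / n + k ** 2 * sd ** 2 / (2 * n))
    return lod, lod + ci

for blanked in (False, True):
    lod, ucl = lod_from_blank(blanks, blanked=blanked)
    print(f"blanked={blanked}: LoD = {lod:.3f} mmol/L, UCL = {ucl:.3f} mmol/L")
```

This reproduces the worksheet values (LoD 0.068/0.066 mmol/L, UCL 0.088/0.086 mmol/L), both below the 0.1 mmol/L specification.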

Working range: 2 models


1. A fixed value of the precision profile (figure), or

[Figure: precision profile, CV (%) versus analyte (arbitrary units), marking the limit of detection and the working range.]

2. The linear part of the calibration function.
In this book and in the EXCEL file, the working range is defined by the linearity of the calibration curve.
Protocol
The protocol is presented in the chapter Linearity (manufacturer protocol). In fact, this is a protocol that assesses accuracy with a number of related (mixed) samples.
Statistics & Graphics
The statistics and graphics are presented in the chapters Linearity and Accuracy/Recovery.

Linearity
Graphics: scatter plot; residual plot (preferred); for the "accuracy model": difference plot (preferred)
Statistics
Model 1: based on linear regression and ANOVA
- F-test of the variance around the line versus the variance within sample sets (lack of fit: old EP6 model)
- Comparison of the linear model with 2nd- or 3rd-order models (new EP6 model)
- Interpretation: use CBstat (Statistics>Method evaluation>Linearity)
Model 2 ("common", accuracy)
- Often used by manufacturers to define the working range ("accuracy-based" = true x-values, e.g., weighed in)
- Investigate the deviation from the line of equality with confidence limits or a t-test
- Interpretation: use the EXCEL template
Note: in some fields, the correlation coefficient is used to assess linearity.

Linearity model 1
CLSI EP6 protocol
5 interrelated samples
Mixing protocol (parts low + parts high):
1: low
2: low (3) + high (1)
3: low (2) + high (2)
4: low (1) + high (3)
5: high
Alternative mixing:
1: low
2: low-medium: mix medium and low (1:1)
3: medium: mix low and high (1:1)
4: high-medium: mix medium and high (1:1)
5: high
Measurement design: measure all samples 4 times (random order), within run or in "closely related runs": SD_wr.
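Both mixing schemes follow from the mixing formula $C = (C_1 V_1 + C_5 V_5)/(V_1 + V_5)$. A minimal Python sketch (hypothetical function names) shows that, for the 3 and 7 mmol/L pools used in this book, both schemes give the same five equidistant levels.

```python
def mix_concentration(c_low, v_low, c_high, v_high):
    """C = (C1*V1 + C5*V5) / (V1 + V5) for a mixture of a low and a high pool."""
    return (c_low * v_low + c_high * v_high) / (v_low + v_high)

# EP6 mixing protocol as (parts low, parts high) for samples 1-5.
EP6_PARTS = [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]

def ep6_levels(c_low, c_high, parts=EP6_PARTS):
    return [mix_concentration(c_low, vl, c_high, vh) for vl, vh in parts]

def alternative_levels(c_low, c_high):
    """Alternative mixing: medium = low + high (1:1), then 1:1 mixes with medium."""
    medium = mix_concentration(c_low, 1, c_high, 1)
    return [c_low,
            mix_concentration(c_low, 1, medium, 1),
            medium,
            mix_concentration(medium, 1, c_high, 1),
            c_high]

print(ep6_levels(3.0, 7.0))          # [3.0, 4.0, 5.0, 6.0, 7.0]
print(alternative_levels(3.0, 7.0))  # [3.0, 4.0, 5.0, 6.0, 7.0]
```

The same function also covers the 11-level blank/spike series of linearity model 2 (e.g., 7 parts blank + 3 parts of a 45 mmol/L spike gives 13.5 mmol/L).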

Linearity model 1 EXCEL file (worksheet Linearity)


Samples
Low sample: 3 mmol/L
High sample: 7 mmol/L
EP6 mix protocol: concentrations (C) of samples 2-4 (V = volume): $C = (C_1 V_1 + C_5 V_5)/(V_1 + V_5)$

Sample#  Concentration (mmol/L)  y1  y2  y3  y4
1  3.0  2.99  2.94  3.01  3.06
2  4.0  3.93  4.02  4.01  4.03
3  5.0  4.97  5.02  4.95  4.92
4  6.0  5.74  5.90  5.97  5.93
5  7.0  6.78  6.69  6.82  6.65

Graphic
The graphic may indicate outliers at the levels 4 and 6 mmol/L; the Grubbs test, however, does not confirm the presence of an outlier. The residual plot indicates non-linearity.



Calculations
The data are investigated for linearity with specialized software (here: CBstat). The models used are the "lack-of-fit" method and the evaluation by a second-order polynomial fit (new CLSI EP6 model).
"Lack-of-fit" F-test for linearity: F = 2.5125; P = 0.0980: no significant deviation from linearity.
Second-order polynomial fit, t-test of the last coefficient against zero: SE of last coefficient = 0.0085; t = -2.8816; P = 0.0104.
%-difference between the second-order and the linear fit at each level:
x-level (mmol/L)  %-difference
3  -1.6
4  0.6
5  1.0
6  0.4
7  -0.7
Significant deviation from linearity, but none of the levels deviates by more than 5% (chosen limit).

Interpretation
The statistical results show that the second-order polynomial fit method is more sensitive than the lack-of-fit method: the former shows that the data set is nonlinear. However, the 5% limit is not exceeded.

Conclusion
The validation data demonstrate that the method passes the preset specification for linearity.
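Both linearity tests can be reproduced without CBstat. The following stdlib-only Python sketch computes the lack-of-fit F-ratio and the t-value of the second-order coefficient for the EP6 data from the table; it is an illustration, not the CBstat implementation, and all function names are hypothetical.

```python
import math

# EP6 example: 5 levels (mmol/L), 4 replicates each.
levels = [3.0, 4.0, 5.0, 6.0, 7.0]
reps = [[2.99, 2.94, 3.01, 3.06],
        [3.93, 4.02, 4.01, 4.03],
        [4.97, 5.02, 4.95, 4.92],
        [5.74, 5.90, 5.97, 5.93],
        [6.78, 6.69, 6.82, 6.65]]

def flatten(levels, reps):
    xs = [x for x, r in zip(levels, reps) for _ in r]
    ys = [y for r in reps for y in r]
    return xs, ys

def lack_of_fit_f(levels, reps):
    """F = MS(lack of fit) / MS(pure error) for a straight line (df: k-2, n-k)."""
    xs, ys = flatten(levels, reps)
    n, k = len(xs), len(levels)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_pe = sum((y - sum(r) / len(r)) ** 2 for r in reps for y in r)
    ss_lof = sum(len(r) * (sum(r) / len(r) - (a + b * x)) ** 2 for x, r in zip(levels, reps))
    return (ss_lof / (k - 2)) / (ss_pe / (n - k))

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    size = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def quadratic_t(levels, reps):
    """Fit y = b0 + b1*u + b2*u^2 (u = centred x); return (b2, SE(b2), t = b2/SE)."""
    xs, ys = flatten(levels, reps)
    n = len(xs)
    centre = sum(xs) / n          # centring improves conditioning; b2 is unchanged
    design = [[1.0, x - centre, (x - centre) ** 2] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    resid = [y - sum(b * v for b, v in zip(beta, row)) for row, y in zip(design, ys)]
    mse = sum(e * e for e in resid) / (n - 3)
    se = math.sqrt(mse * inv[2][2])
    return beta[2], se, beta[2] / se

F = lack_of_fit_f(levels, reps)
b2, se_b2, t_b2 = quadratic_t(levels, reps)
print(f"lack-of-fit F = {F:.4f} (df 3, 15)")
print(f"b2 = {b2:.4f}, SE = {se_b2:.4f}, t = {t_b2:.4f} (df 17)")
```

The sketch gives F ≈ 2.51, below the tabulated F(0.05; 3, 15) ≈ 3.29, and t ≈ -2.88 with |t| above the tabulated two-sided t(0.05; 17) ≈ 2.11, consistent with the P-values reported above.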

Linearity model 2 EXCEL file (worksheet Lin-Manuf)


Accuracy protocol ("Working Range protocol")
This model is called the "Working Range protocol" because it is often applied by manufacturers to establish the working range.
Samples
11 (for example) interrelated samples, prepared by mixing a blank sample and a blank sample spiked with a known amount of analyte:
1: blank
2: 9 blank + 1 high (spiked) sample
3: 8 blank + 2 high (spiked) sample
4: 7 blank + 3 high (spiked) sample
5: 6 blank + 4 high (spiked) sample
6: 5 blank + 5 high (spiked) sample
7: 4 blank + 6 high (spiked) sample
8: 3 blank + 7 high (spiked) sample
9: 2 blank + 8 high (spiked) sample
10: 1 blank + 9 high (spiked) sample
11: high (spiked, known concentration) sample
Measurement design: measure all samples 4 times (random order), within run: SD_wr.

Sample (mmol/L)  y1  y2  y3  y4
0.0  0.03  0.00  0.00  -0.03
4.5  4.47  4.47  4.59  4.59
9.0  9.06  8.97  8.85  8.91
13.5  13.77  14.22  13.41  13.71
18.0  18.45  17.94  18.09  17.85
22.5  22.62  22.59  22.35  22.47
27.0  26.70  27.24  27.30  26.76
31.5  30.75  32.25  31.59  31.59
36.0  35.67  34.47  35.07  34.02
40.5  39.42  38.13  38.34  38.31
45  42.42  41.79  41.10  42.09



Graphic
The graphic shows an (expected) increase in the scatter of the data around their mean values (constant measurement CV). Otherwise, there is no apparent irregularity.

Calculations
The 1-sided 95% confidence interval of the mean is calculated as $CI = t_{(0.1,\,3)} \cdot \mathrm{SD}/\sqrt{4}$.
Interpretation
The data are interpreted by use of the difference plot. The plot indicates that the CLs overlap with the 5% specification at concentrations > 31.5 mmol/L. More replicates might demonstrate that the concentration of 36 mmol/L is within the specified linearity limit of 5%.

Conclusions
The validation data do not support a working range up to 45 mmol/L. The range should be reduced to 31.5 mmol/L.
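The difference-plot check can be sketched per level in Python: the %-deviation of the mean from the known concentration, widened by the 1-sided 95% CI, must stay within ±5%. The t-value 2.353 is the standard table value for t(0.1, 3); the function names are hypothetical.

```python
import math
import statistics

T_0_10_3 = 2.353  # t(0.1, 3): 1-sided 95% Student t for df = 3 (table value)

def deviation_with_cl(target, values, t=T_0_10_3):
    """Return (%-deviation of the mean from target, 1-sided 95% CI half-width in %)."""
    mean = statistics.mean(values)
    ci = t * statistics.stdev(values) / math.sqrt(len(values))
    return 100 * (mean - target) / target, 100 * ci / target

def within_limit(target, values, limit_pct=5.0):
    """'In' if the confidence limit on the side away from zero stays within +/- limit."""
    dev, ci = deviation_with_cl(target, values)
    return abs(dev) + ci <= limit_pct

# Two levels from the Lin-Manuf table (mmol/L).
for target, values in [(31.5, [30.75, 32.25, 31.59, 31.59]),
                       (36.0, [35.67, 34.47, 35.07, 34.02])]:
    dev, ci = deviation_with_cl(target, values)
    print(f"{target}: {dev:+.2f}% +/- {ci:.2f}%, within 5%: {within_limit(target, values)}")
```

The 31.5 mmol/L level stays inside the limit, whereas the lower CL of the 36 mmol/L level crosses -5%, matching the conclusion above.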

Recovery
Graphics: ratio plot (%); difference plot (%)
Statistics: descriptive statistics: location (mean, median & mode); t-distribution; central limit theorem; confidence intervals; t-tests; ANOVA model I; power and sample size

Recovery experiments
Protocols
Model 1 ("paired sample"; see also CLSI EP7)
Samples
"Paired-sample" experiment: 2 portions of native samples; spike one with a known amount of analyte (= Test) and the other with the same volume of saline solution (= Control).
- 3-5 samples at relevant concentrations
- Test: add x mL of analyte standard (preferably in blank solution) to y mL of sample; the volume added should be less than 5-10% (requires a concentrated analyte standard)
- Added concentration: e.g., ½-1 times that of a "normal" sample
- Control: add the same volume of blank solution to the same volume of sample
Measurement design: measure Control & Test alternately (n = 2-4). Note: may need repetition with other lots of calibrators/reagents.
Calculations
Concentration added = concentration of standard × x/(x + y)
Concentration recovered = Test - Control
Recovery (%) = 100 × (recovered conc./added conc.) ± 95% CL

Model 2 (accuracy: "trueness" based; "common" protocol)
Samples
Experimental design: "recovery of target values" with reference materials that have target values:
- Certified reference materials
- IQC materials
- Standards
Measurement design: measure the samples 5 times on different days. Note: may need repetition with other lots of calibrators/reagents.
Calculations
Recovery (%) = 100 × (measured value/target value) ± 95% CL

Recovery Model 1 (paired sample), EXCEL file


Samples/Materials
Low sample: 3.5 mmol/L
Normal sample: 4.8 mmol/L
High sample 2: 6.5 mmol/L
Glucose solution in isotonic NaCl: 30 mmol/L (add 10% volume)
Isotonic NaCl solution
Test: add 0.1 mL (= x) of analyte standard to 0.9 mL (= y) of sample.
Control: add the same volume of NaCl solution to the same volume of sample.

Calculations (see EXCEL worksheet)
Tests: $C = (C_{sample} V_{sample} + C_{standard} V_{standard})/(V_{sample} + V_{standard})$
Controls: $C = (C_{sample} V_{sample} + C_{saline} V_{saline})/(V_{sample} + V_{saline})$
Added concentration = concentration of standard × x mL standard/(x mL standard + y mL sample)
Recovered concentration = Test - Control
Recovery (%) = 100 × (recovered conc./added conc.) ± CL

Results (mmol/L)
Control (calculated)  y1  y2  y3  y4
3.15  3.11  3.14  3.13  3.16
4.32  4.35  4.39  4.26  4.22
5.85  5.82  5.79  5.90  5.77
Test (calculated)  y1  y2  y3  y4
6.15  6.14  6.20  6.25  6.12
7.32  7.27  7.30  7.18  7.42
8.85  8.82  8.72  8.88  8.98



Graphics
The graphic shows the distribution of the results around their mean values and the individual recoveries. It shows no irregularities.

Calculations
The 1-sided 95% confidence interval of the mean difference between Test and Control is calculated with the z-value as $CI = z \cdot \mathrm{SD}_{pr}/\sqrt{4}$, with z = 1.65 (1-sided 95%). The results are interpreted with the confidence limits calculated from the z-value and the predicted SD (SD_pr) from the EP5 imprecision data (CLSI EP7 approach). Note that the imprecision of the %-recoveries depends on the Test and Control levels and on the magnitude of the spike (see EXCEL file).

CAVE: if one uses t, the propagated SD has to be calculated from the actual data (the SDs of Test and Control differ because of their different levels!). The degrees of freedom must then be calculated with the Satterthwaite formula (different concentrations!), and the respective test is a t-test.
CAVE: the SD of the %-recovery will be high when little is spiked!
Interpretation The interpretation of the data is done by use of the % ratio plot. The plot shows that none of the CLs overlap with the 5% specification.

Conclusions The validation demonstrates that the method passes the preset 5% limit for recovery.
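The paired-sample recovery with a z-based confidence limit can be sketched in Python. As an assumption for illustration, SD_pr is predicted from the EP5 within-run CV of 1.1% applied at the Test and Control levels and propagated for the difference of two means; `paired_recovery` is a hypothetical name.

```python
import math
import statistics

Z_1SIDED_95 = 1.65

def paired_recovery(control, test, added, cv_wr_pct, z=Z_1SIDED_95):
    """Recovery (%) and its 1-sided 95% CL half-width (%), EP7-style z approach.

    SD_pr is predicted from a within-run CV (an assumed input) at the Test and
    Control levels and propagated for a difference: sqrt(SD_t^2 + SD_c^2).
    """
    m_c, m_t = statistics.mean(control), statistics.mean(test)
    sd_pr = math.hypot(cv_wr_pct / 100 * m_t, cv_wr_pct / 100 * m_c)
    half_width = z * sd_pr / math.sqrt(len(test))
    return 100 * (m_t - m_c) / added, 100 * half_width / added

added = 30.0 * 0.1 / (0.1 + 0.9)   # concentration of standard * x / (x + y)
samples = {  # (Control, Test) results in mmol/L
    "low":    ([3.11, 3.14, 3.13, 3.16], [6.14, 6.20, 6.25, 6.12]),
    "normal": ([4.35, 4.39, 4.26, 4.22], [7.27, 7.30, 7.18, 7.42]),
    "high":   ([5.82, 5.79, 5.90, 5.77], [8.82, 8.72, 8.88, 8.98]),
}
for name, (control, test) in samples.items():
    rec, hw = paired_recovery(control, test, added, cv_wr_pct=1.1)
    ok = abs(rec - 100) + hw <= 5.0
    print(f"{name}: recovery = {rec:.1f}% +/- {hw:.1f}%, within 95-105%: {ok}")
```

Under this assumed CV, all three recoveries stay within 95-105% including their confidence limits. The half-width in % scales inversely with the spike, which is why the CAVE above warns against small spikes.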

Recovery Model 2 (accuracy/trueness), EXCEL file


Samples (target values)
Low IQC material: 3.9 mmol/L
High-normal IQC material: 5.9 mmol/L
High IQC material: 8.5 mmol/L
Standard 1: 4.5 mmol/L
Standard 2: 5.0 mmol/L
Standard 3: 5.5 mmol/L

Measurement
Measure the samples 5 times on different days. Note: may need repetition with other lots of calibrators/reagents.
Calculations (see EXCEL worksheet)
Recovery (%) = 100 × (measured value/target value) ± CL

Results (mmol/L)
Sample (target)  y1  y2  y3  y4  y5
3.9  3.93  3.90  3.88  3.92  3.91
5.9  5.83  5.70  5.79  5.63  5.84
8.5  7.92  8.64  8.31  8.79  8.66
4.5  4.63  4.48  4.40  4.50  4.60
5.0  4.92  4.97  5.29  4.95  5.14
5.5  5.59  5.60  5.68  5.93  5.28

Graphics

The graphic shows the distribution of the results around their mean values. It shows no irregularities.



Calculations
The 1-sided 95% confidence interval of the mean is calculated as $CI = t_{(0.1,\,4)} \cdot \mathrm{SD}/\sqrt{5}$.
Interpretation
The data are interpreted by use of the % ratio plot. The plot shows that only the CL of Standard 3 overlaps with the 5% specification. This standard should be repeated.

Conclusions The validation demonstrates that the method passes the preset 5% limit for recovery (given that the repetition of Standard 3 is within the specification).
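The ratio-plot decision can be sketched in Python with the six materials from the table. The t-value 2.132 is the standard table value for t(0.1, 4); `recovery_vs_target` is a hypothetical name.

```python
import math
import statistics

T_0_10_4 = 2.132  # t(0.1, 4): 1-sided 95% Student t for df = 4 (table value)

def recovery_vs_target(target, values, t=T_0_10_4, limit_pct=5.0):
    """Recovery (%) = 100 * mean/target with a 1-sided 95% CL.

    Passes when the CL on both sides stays within 100 +/- limit_pct.
    """
    rec = 100 * statistics.mean(values) / target
    ci = 100 * t * statistics.stdev(values) / math.sqrt(len(values)) / target
    passed = (rec - ci >= 100 - limit_pct) and (rec + ci <= 100 + limit_pct)
    return rec, ci, passed

results = {  # target (mmol/L): five results on different days
    3.9: [3.93, 3.90, 3.88, 3.92, 3.91],
    5.9: [5.83, 5.70, 5.79, 5.63, 5.84],
    8.5: [7.92, 8.64, 8.31, 8.79, 8.66],
    4.5: [4.63, 4.48, 4.40, 4.50, 4.60],
    5.0: [4.92, 4.97, 5.29, 4.95, 5.14],
    5.5: [5.59, 5.60, 5.68, 5.93, 5.28],
}
for target, values in results.items():
    rec, ci, passed = recovery_vs_target(target, values)
    print(f"{target}: {rec:.1f}% +/- {ci:.1f}% -> {'pass' if passed else 'repeat'}")
```

Only Standard 3 (5.5 mmol/L) fails, because its upper CL (about 106%) exceeds 105%, in line with the interpretation above.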

Interference testing (CLSI EP7)


Graphics: see Recovery: paired sample
Statistics: see Recovery: paired sample
Protocols (CLSI EP7, 2 approaches)
Approach 1: "Paired-difference method"
Applies an experimental design and calculations similar to the paired-sample recovery experiment (3-5 samples). Instead of an analyte standard, an interferent standard has to be prepared.
- Test: add x mL of interferent solution (preferably in blank solution) to y mL of sample; the volume added should be less than 5-10%
- Control: add the same volume of blank solution to the same volume of sample
- Measure Control & Test alternately (n = 2-4)
- Interference (%) = 100 × (Test - Control)/Control ± 95% CL
Approach 2: "Dose-response method" (used in the EXCEL file)
For each of 3-5 samples:
- Low pool (low or no interferent added; if none, add blank!)
- High pool (interferent at maximum concentration)
- Note: always add the same volumes of blank/interferent solution
- Create 5 levels with the "alternative mixing protocol" (see Linearity)!
- Measure all levels "up", then "down", or in random order (n = 2-4)
- Interference (%) = 100 × (Test - Control)/Control ± CL
Note: CLSI EP7 applies regression analysis for this protocol!

Interference EXCEL file


Samples/Materials
Low sample 2: 3.5 mmol/L
Interferent solution in NaCl: 600 mg/dL
Isotonic saline solution
- Make the "low pool" (add 0.1 mL saline to 0.9 mL sample)
- Make the "high pool" (add 0.1 mL interferent solution to 0.9 mL sample)
- Note: always add the same volumes of saline/interferent solution
- Create 5 levels by the "alternative mixing protocol"

Measurement
Measure within run: all levels "up", then "down", or in random order (n = 4).
Interference (%) = 100 × (Test - Control)/Control ± CL

Results (glucose, mmol/L)
Bilirubin (mg/dL)  y1  y2  y3  y4
0  3.17  3.12  3.13  3.15
15  3.24  3.18  3.03  3.22
30  3.15  3.12  3.20  3.13
45  3.33  3.36  3.34  3.40
60  3.55  3.68  3.40  3.53

Graphics

The graphic shows the distribution of the results around their mean values. It shows no irregularities.



Calculations
The 1-sided 95% confidence interval of the mean difference between Test and Control (0 mg/dL bilirubin) is calculated with the z-value as $CI = z \cdot \mathrm{SD}_{pr}/\sqrt{4}$, with z = 1.65 (1-sided 95%). The results are interpreted with the confidence limits calculated from the z-value and the within-run imprecision from the EP5 protocol (CLSI EP7 approach). Note that the imprecision of the interference results (SD_pr) is $\sqrt{2}$ times the measurement imprecision because each interference result is the difference between 2 measurements (Test and Control).

Interpretation
The data are interpreted by use of the % difference plot. The plot shows that only the CL of the sample with 60 mg/dL bilirubin overlaps with the 10% specification. The test is valid up to a bilirubin concentration of 45 mg/dL.

Conclusions The validation data show that the test is valid up to a bilirubin concentration of 45 mg/dL.
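The dose-response evaluation can be sketched in Python. As an assumption for illustration, the EP5 within-run SD of 0.063 mmol/L is reused as the measurement SD at the 3.1 mmol/L level, and SD_pr = sqrt(2) × SD_wr as described above; `interference` is a hypothetical name.

```python
import math
import statistics

def interference(control, test, sd_wr, z=1.65):
    """Interference (%) = 100*(Test - Control)/Control with a z-based 1-sided 95% CL.

    SD_pr = sqrt(2) * sd_wr because each result is a difference of two measurements.
    """
    m_c, m_t = statistics.mean(control), statistics.mean(test)
    half_width = z * math.sqrt(2) * sd_wr / math.sqrt(len(test))
    return 100 * (m_t - m_c) / m_c, 100 * half_width / m_c

bili = {  # bilirubin (mg/dL): glucose results (mmol/L)
    0: [3.17, 3.12, 3.13, 3.15],
    15: [3.24, 3.18, 3.03, 3.22],
    30: [3.15, 3.12, 3.20, 3.13],
    45: [3.33, 3.36, 3.34, 3.40],
    60: [3.55, 3.68, 3.40, 3.53],
}
SD_WR = 0.063  # assumption: EP5 within-run SD (mmol/L) reused at this level
for level in (15, 30, 45, 60):
    pct, hw = interference(bili[0], bili[level], SD_WR)
    ok = abs(pct) + hw <= 10.0
    print(f"{level} mg/dL: {pct:+.1f}% +/- {hw:.1f}% -> within 10%: {ok}")
```

With this assumed SD, the levels up to 45 mg/dL stay within the 10% criterion and 60 mg/dL does not, consistent with the conclusion. The exact width of the CL at 60 mg/dL depends on which SD is used.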

Method comparison
Graphics: scatter plot; difference plot; residual plot; Krouwer plot; Bland and Altman plot
Statistics: correlation; regression; Bland and Altman approach; general (F-test, t-test, confidence intervals)

General remarks
Method comparison presupposes:
- Appropriate performance of the test and comparison methods
  - Internal quality control (verify the actual imprecision against the expected imprecision by an F-test; verify the calibration with targeted control samples by a t-test or confidence intervals)
- Appropriate presentation of the paired observations (x_i, y_i)
- Appropriate interpretation
Interpretation of a method comparison makes integrated use of:
- Graphical and statistical techniques
- Analytical quality specifications

Sample size
Usually, general recommendations are given for the sample size (e.g., EP9: n ≥ 40). However, to assure given type I and type II errors, i.e. sufficient power in a method comparison study, a minimum sample size is needed, depending on:
- The slope or intercept deviation to be detected
- The measurement range
- The assumption of constant or proportional analytical error
- The magnitude of the SD or CV of the methods
Tables are available: see Linnet K. Clin Chem 1999;45:882-894.

Method comparison protocols


The CLSI EP-9 protocol Experimental design: At least 40 samples Spread analysis over 5 days, randomize concentrations Measure duplicates in 1 run, 1st series "upwards", second series "downwards" Apply adequate internal quality control!

Data presentation and calculations: outlier test on the duplicates (flag a pair if |Diff_dupl| > 4 × mean |Diff_dupl|; if so, repeat the test with the % differences); scatter plots of singlicates and of the mean of duplicates; bias plots of singlicates and of the mean of duplicates; inspection for linearity, dispersion, and range (adequate range if r ≥ 0.975); linear regression (ordinary or Deming).
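The duplicate outlier screen can be sketched in Python under stated assumptions: the helper names are hypothetical, the % differences are taken relative to the mean of each duplicate pair, and "if yes, perform the same with % data" is read as: a pair is reported only if it exceeds the limit in both the absolute and the % test. Check the current EP 9 edition for the exact rule.

```python
from statistics import mean


def _flag(diffs, factor):
    """Indices of differences that exceed factor x the mean difference."""
    limit = factor * mean(diffs)
    return {i for i, d in enumerate(diffs) if d > limit}


def ep9_duplicate_outliers(rep1, rep2, factor=4.0):
    """Screen duplicate pairs for outliers (hypothetical helper).

    Step 1: absolute differences |rep1 - rep2| against 4 x their mean.
    Step 2 (only if step 1 flags something): the same test on % differences
    relative to the pair mean. A pair is reported only if it fails both steps.
    """
    abs_diffs = [abs(a - b) for a, b in zip(rep1, rep2)]
    flagged = _flag(abs_diffs, factor)
    if not flagged:
        return []
    pct_diffs = [abs(a - b) / ((a + b) / 2) * 100 for a, b in zip(rep1, rep2)]
    return sorted(flagged & _flag(pct_diffs, factor))


# Hypothetical duplicates: 19 pairs differ by 0.1, the last pair by 2.0
first = [10.0] * 20
second = [10.1] * 19 + [12.0]
print(ep9_duplicate_outliers(first, second))  # [19]
```

Note that with 4 × the mean as limit, a single deviating pair can only be detected when enough pairs are available, which is one more reason for the minimum of 40 samples.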
Interpretation: depends on the criteria of the laboratory and on whether a reference method or a "comparative" method was used. Note: distinguish between purely statistical, analytical, and clinical interpretation!

The Valtech protocol
Experiments: at least 50 samples (better: 80 to 100). Carry out the analyses in singlicate, spread over 10 measurement series, and take the samples in random order. Apply adequate internal quality control!
Vassault A, Grafmeyer D, Naudin Cl, Dumont G, Bailly M, Henny J, Gerhardt MF, Georges P. Société Française de Biologie Clinique. Protocole de validation de techniques. Ann Biol Clin 1986;44:686-719 (English version: 720-45).
See also: Vassault A, Grafmeyer D, de Graeve J, Cohen R, Beaudonnet A, Bienvenu J. Société Française de Biologie Clinique. Analyses de biologie médicale : spécifications et normes d'acceptabilité à l'usage de la validation de techniques. Ann Biol Clin 1999;57:685-95.



The UG protocol
If possible, use a true reference method for comparison.
Experiments: Start from a reliable calibration basis and verify it with the manufacturer's IQC samples (= stable basis). Adapt the number and type of samples to the problem (e.g., 50). Measure duplicates in 1 series, with random sampling (note: for the reference method, adapt the number of measurements to the problem). Apply intensive IQC.
Dewitte K, Stöckl D, Van de Velde M, Thienpont LM. Evaluation of intrinsic and routine quality of serum total magnesium measurement. Clin Chim Acta 2000;292:55-68.
The stable basis: Was the method performed adequately? Inspect the internal quality control (IQC) data and evaluate precision and traceability to the manufacturer.
The stable basis, statistics: F-test, t-test, confidence intervals.
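For the stable basis, the F-test on imprecision can be replaced by confidence limits of the SD (the "CL of SD" alternative listed in the annex). The sketch below assumes the usual chi-square based limits and, because the Python standard library has no chi-square quantile function, uses the Wilson-Hilferty approximation; with SciPy available, `scipy.stats.chi2.ppf` gives exact quantiles. Function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile (approximate, not exact)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def sd_confidence_limits(sd: float, n: int, level: float = 0.95) -> tuple:
    """Two-sided confidence limits of an SD estimated from n results (df = n - 1)."""
    df = n - 1
    alpha = 1.0 - level
    lower = sd * math.sqrt(df / chi2_quantile(1.0 - alpha / 2, df))
    upper = sd * math.sqrt(df / chi2_quantile(alpha / 2, df))
    return lower, upper


def imprecision_as_expected(observed_sd: float, n: int, expected_sd: float,
                            level: float = 0.95) -> bool:
    """True if the expected (e.g., manufacturer-claimed) SD lies within the CL of the observed SD."""
    lower, upper = sd_confidence_limits(observed_sd, n, level)
    return lower <= expected_sd <= upper


# Hypothetical IQC data: observed SD 0.10 from 20 results, claimed SD 0.12
print(imprecision_as_expected(0.10, 20, 0.12))  # True
```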


Method comparison EXCEL file


Results Ref.
3.79 3.84 3.86 3.88 3.92 3.99 4.08 4.11 4.13 4.13 4.23 4.27 4.38 4.39 4.42 4.58 4.70 4.70 4.85 4.85

Yours
3.80 3.88 3.65 3.86 3.93 4.09 4.16 4.11 4.05 4.07 4.38 4.21 4.28 4.28 4.31 4.63 4.65 4.48 5.01 4.62

Ref.
4.89 4.91 4.91 4.95 5.01 5.02 5.03 5.16 5.17 5.17 5.18 5.25 5.39 5.44 5.49 5.53 5.55 5.58 5.58 5.65

Yours
4.64 4.62 4.90 4.88 4.86 4.89 5.17 4.90 5.12 5.01 5.26 5.28 5.37 5.49 5.43 5.34 4.99 5.45 5.53 5.27

Ref.
5.65 5.73 5.79 5.83 5.84 5.86 5.92 5.93 5.94 5.97 5.97 6.06 6.11 6.12 6.30 6.49 6.50 6.59 6.61 6.66

Yours
5.55 5.58 6.08 5.65 6.05 5.76 5.76 5.57 6.10 5.80 5.88 6.11 6.08 5.90 6.03 6.48 6.77 6.58 6.22 6.28

Ref.
6.66 6.71 6.78 6.87 6.94 7.10 7.12 7.13 7.14 7.15 7.15 7.36 7.43 7.47 7.51 7.56 7.90 8.02 8.07 8.19

Yours
6.87 6.80 6.90 7.11 7.17 7.07 7.00 7.02 6.90 7.23 7.38 7.19 7.11 7.11 7.15 7.39 7.81 7.83 7.82 7.72


Method comparison EXCEL file


Calculations
Bland & Altman approach: the calculations comprise the mean difference and 1.96 times the SD of the individual % differences (i.e., a CV), each with its confidence limits (CLs):
$CI_{mean} = t_{(0.1,\,79)} \cdot SD_{diff}/\sqrt{80}$
$CI_{1.96s\ centile} = t_{(0.1,\,79)} \cdot \sqrt{SD_{diff}^2/80 + 1.96^2 \cdot SD_{diff}^2/(2 \cdot 80)} = \sqrt{1 + 1.96^2/2} \cdot CI_{mean} = 1.71 \cdot CI_{mean}$
See also Worksheet Meth-Comp3 for the calculations.

Graphics and interpretation
The graph (% differences) reveals no outliers. The CLs of the mean difference and of the 1.96s centiles of the differences ("limits of agreement") do not overlap with the respective specifications of 3% (SE or bias limit) and 10% (TE limit).
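The Bland & Altman calculation can be reproduced from the 80 paired results listed above with the following Python sketch. Assumptions: the % difference is computed as (Yours - Ref.)/Ref. x 100 (Worksheet Meth-Comp3 may use another denominator, e.g., the mean of both methods), and t(0.1, 79) is hard-coded as the tabulated value 1.664 (`scipy.stats.t.ppf(0.95, 79)` returns the exact value). No results are quoted here; compare the printed values with the worksheet.

```python
import math
from statistics import mean, stdev

# Paired results from the table above (Ref. = comparison method, Yours = test method)
REF = [
    3.79, 3.84, 3.86, 3.88, 3.92, 3.99, 4.08, 4.11, 4.13, 4.13,
    4.23, 4.27, 4.38, 4.39, 4.42, 4.58, 4.70, 4.70, 4.85, 4.85,
    4.89, 4.91, 4.91, 4.95, 5.01, 5.02, 5.03, 5.16, 5.17, 5.17,
    5.18, 5.25, 5.39, 5.44, 5.49, 5.53, 5.55, 5.58, 5.58, 5.65,
    5.65, 5.73, 5.79, 5.83, 5.84, 5.86, 5.92, 5.93, 5.94, 5.97,
    5.97, 6.06, 6.11, 6.12, 6.30, 6.49, 6.50, 6.59, 6.61, 6.66,
    6.66, 6.71, 6.78, 6.87, 6.94, 7.10, 7.12, 7.13, 7.14, 7.15,
    7.15, 7.36, 7.43, 7.47, 7.51, 7.56, 7.90, 8.02, 8.07, 8.19,
]
YOURS = [
    3.80, 3.88, 3.65, 3.86, 3.93, 4.09, 4.16, 4.11, 4.05, 4.07,
    4.38, 4.21, 4.28, 4.28, 4.31, 4.63, 4.65, 4.48, 5.01, 4.62,
    4.64, 4.62, 4.90, 4.88, 4.86, 4.89, 5.17, 4.90, 5.12, 5.01,
    5.26, 5.28, 5.37, 5.49, 5.43, 5.34, 4.99, 5.45, 5.53, 5.27,
    5.55, 5.58, 6.08, 5.65, 6.05, 5.76, 5.76, 5.57, 6.10, 5.80,
    5.88, 6.11, 6.08, 5.90, 6.03, 6.48, 6.77, 6.58, 6.22, 6.28,
    6.87, 6.80, 6.90, 7.11, 7.17, 7.07, 7.00, 7.02, 6.90, 7.23,
    7.38, 7.19, 7.11, 7.11, 7.15, 7.39, 7.81, 7.83, 7.82, 7.72,
]

# t(0.1, 79): two-sided 90 % (= one-sided 95 %) t-value, df = 79; tabulated, please verify
T_0_1_DF79 = 1.664


def bland_altman_pct(ref, test, t=T_0_1_DF79):
    """Bland & Altman statistics on % differences, (test - ref) / ref x 100 (assumed definition)."""
    diffs = [(y - x) / x * 100 for x, y in zip(ref, test)]
    n = len(diffs)
    m = mean(diffs)
    sd = stdev(diffs)
    ci_mean = t * sd / math.sqrt(n)
    ci_centile = t * math.sqrt(sd**2 / n + 1.96**2 * sd**2 / (2 * n))  # = 1.71 x ci_mean
    return {"n": n, "mean": m, "sd": sd,
            "loa_low": m - 1.96 * sd, "loa_high": m + 1.96 * sd,
            "ci_mean": ci_mean, "ci_centile": ci_centile}


def passes_specs(res, bias_limit=3.0, te_limit=10.0):
    """True if the CLs of the mean and of both limits of agreement stay inside the specifications."""
    bias_ok = abs(res["mean"]) + res["ci_mean"] < bias_limit
    te_ok = max(abs(res["loa_low"]), abs(res["loa_high"])) + res["ci_centile"] < te_limit
    return bias_ok and te_ok


res = bland_altman_pct(REF, YOURS)
print({k: round(v, 2) for k, v in res.items()})
print("passes 3 % / 10 %:", passes_specs(res))
```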

Conclusions The validation data show that the test passes the preset limits for systematic (3%) and total error (10%).


Method comparison EXCEL file


Calculations
Regression: see Worksheet Meth-Comp4 for the detailed calculation of the ordinary linear regression and correlation estimates.
$CI_{line} = t \cdot S_{y/x} \cdot \sqrt{1/n + (X_c - \bar{X})^2 / \sum_i (X_i - \bar{X})^2}$, with df of t = n - 2; $X_c$ is the concentration at which the bias is investigated.
$CI_{points} = t \cdot S_{y/x} \cdot \sqrt{1 + 1/n + (X_c - \bar{X})^2 / \sum_i (X_i - \bar{X})^2}$, with df of t = n - 2; $X_c$ is the concentration at which the total error is investigated.

Graphics
The results are presented in a scatter plot and a residuals plot.

Interpretation
The confidence limits of bias and total error at the minimum and maximum values of x (respectively y) are compared with the specifications. They are smaller than the specifications at both concentrations (see Worksheet Meth-Comp4).

Conclusions
The validation data show that the test passes the preset limits for systematic (3%) and total error (10%).
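The regression confidence intervals and the comparison at the minimum and maximum x values can be sketched in Python. This is an illustration under assumptions, not Worksheet Meth-Comp4: the one-sided 95% t-value for df = 78 is hard-coded (about 1.665), the bias at $X_c$ is taken as the predicted y minus $X_c$, the total-error limit is |bias| + CI(points), and both are expressed in % of $X_c$.

```python
import math
from statistics import mean

# Paired results from the method comparison table (Ref. = x, Yours = y)
REF = [
    3.79, 3.84, 3.86, 3.88, 3.92, 3.99, 4.08, 4.11, 4.13, 4.13,
    4.23, 4.27, 4.38, 4.39, 4.42, 4.58, 4.70, 4.70, 4.85, 4.85,
    4.89, 4.91, 4.91, 4.95, 5.01, 5.02, 5.03, 5.16, 5.17, 5.17,
    5.18, 5.25, 5.39, 5.44, 5.49, 5.53, 5.55, 5.58, 5.58, 5.65,
    5.65, 5.73, 5.79, 5.83, 5.84, 5.86, 5.92, 5.93, 5.94, 5.97,
    5.97, 6.06, 6.11, 6.12, 6.30, 6.49, 6.50, 6.59, 6.61, 6.66,
    6.66, 6.71, 6.78, 6.87, 6.94, 7.10, 7.12, 7.13, 7.14, 7.15,
    7.15, 7.36, 7.43, 7.47, 7.51, 7.56, 7.90, 8.02, 8.07, 8.19,
]
YOURS = [
    3.80, 3.88, 3.65, 3.86, 3.93, 4.09, 4.16, 4.11, 4.05, 4.07,
    4.38, 4.21, 4.28, 4.28, 4.31, 4.63, 4.65, 4.48, 5.01, 4.62,
    4.64, 4.62, 4.90, 4.88, 4.86, 4.89, 5.17, 4.90, 5.12, 5.01,
    5.26, 5.28, 5.37, 5.49, 5.43, 5.34, 4.99, 5.45, 5.53, 5.27,
    5.55, 5.58, 6.08, 5.65, 6.05, 5.76, 5.76, 5.57, 6.10, 5.80,
    5.88, 6.11, 6.08, 5.90, 6.03, 6.48, 6.77, 6.58, 6.22, 6.28,
    6.87, 6.80, 6.90, 7.11, 7.17, 7.07, 7.00, 7.02, 6.90, 7.23,
    7.38, 7.19, 7.11, 7.11, 7.15, 7.39, 7.81, 7.83, 7.82, 7.72,
]

# One-sided 95 % t-value for df = n - 2 = 78 (tabulated, about 1.665); an assumption
T_DF78 = 1.665


def ols(x, y):
    """Ordinary least-squares fit; returns slope, intercept, Sy/x, mean x, Sxx, n."""
    n = len(x)
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    syx = math.sqrt(ss_res / (n - 2))
    return slope, intercept, syx, xm, sxx, n


def ci_half_width(xc, syx, xm, sxx, n, t, points=False):
    """CI of the regression line at xc (points=False) or of a single new point (points=True)."""
    extra = 1.0 if points else 0.0
    return t * syx * math.sqrt(extra + 1.0 / n + (xc - xm) ** 2 / sxx)


slope, intercept, syx, xm, sxx, n = ols(REF, YOURS)
for xc in (min(REF), max(REF)):
    bias = intercept + (slope - 1.0) * xc  # predicted y minus x at xc
    bias_cl = abs(bias) + ci_half_width(xc, syx, xm, sxx, n, T_DF78)
    te_cl = abs(bias) + ci_half_width(xc, syx, xm, sxx, n, T_DF78, points=True)
    print(f"Xc = {xc}: bias CL {bias_cl / xc * 100:.2f} % (spec 3 %), "
          f"total error CL {te_cl / xc * 100:.2f} % (spec 10 %)")
```

The CI of the line is narrowest at the mean of x and widens towards the ends of the range, which is why the extremes of the measurement range are the critical concentrations to check.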


Annex

Content
Summary of protocols, statistics & graphics
System stability, ruggedness and multifactor protocols
Glossary of terms


Protocols & statistics


Experimental protocols
Imprecision : EP 5
Limit of detection : EP 17 or "Common"
Working range : see linearity, or define by imprecision
Linearity : EP 6
Linearity by recovery : "Common" (accuracy protocol)
Recovery, reference material : "Common" (accuracy protocol)
Recovery, added analyte : see interference/specificity
Interference/Specificity : EP 7
Total error (method comparison) : EP 9, UG
EP = CLSI Evaluation Protocols; UG = University Ghent.
Other CLSI protocols: EP 10 Preliminary evaluation; EP 12 Qualitative tests; EP 14 Matrix effects; EP 15 User demonstration of precision & accuracy; EP 21 Total error.

Statistics (see the Statistics book)
Analytical problem : Associated statistics
General : Basic statistics; outlier tests (e.g., Grubbs)
Imprecision : F-test; CHI2-test (#); ANOVA
Limit of detection : Probability & power
Working range : see linearity, or define by imprecision
Linearity : Regression; ANOVA
Recovery : t-test (#)
Interference/Specificity : t-test (#)
Total error (method comparison) : Regression & correlation; Bland & Altman plot
Trouble-shooting : Power (sample size calculations)
#Alternative: confidence intervals


Graphics
[Figure: example graphics. Univariate data: dot plot, histogram (frequency vs. value bin), box plot. Bivariate data: scatter plot, difference plot, ratio plot (%) (recovery), residuals plot.]


Overview of experiments, statistics, and graphics


For each performance characteristic: samples; measurements#; relevant SD; graphics; statistical test vs. specification; CLSI document.

Imprecision: IQC samples, no target; n = 20; within-run and total; dot plot; ANOVA and 1-sample F-test or CL of SD; EP 5, EP 15
LoD/LoQ: blank and low sample; n = 20; total; dot plot; 1-sample F-test or CL of SD; EP 17
Linearity: 5 related samples/calibrators (mix), no target; n = 4; within-run; scatter/residual plot; lack-of-fit or polynomial regression; EP 6
Working range: see imprecision/linearity; graphics, statistical test and CLSI document: --
Interference: interferent spike and control samples (no target); n = 4; within-run; difference/ratio plot; CL of mean difference (or t-test); EP 7
Trueness (accuracy): known-analyte spike and control samples, or CRM; n = 5; total; difference/ratio plot; CL of mean difference or CL of mean (or t-tests); EP 7, EP 15
Total error: 40 samples (RMP target); n = 1 or 2; total or within-run (UG protocol); scatter/bias plot; correlation, regression/Bland & Altman; EP 9, EP 21, UG

#Numbers do not always correspond to the respective CLSI document.
Abbreviations: SD: standard deviation; IQC: internal quality control; CLSI: Clinical and Laboratory Standards Institute; CRM: certified reference material; RMP: reference measurement procedure; UG: University Ghent; CL: confidence limit.


Overview of experiments, statistics, and graphics


Sensitivity of statistical parameters to different types of errors
(From: Westgard JO, Hunt MR. Clin Chem 1973;19:49-57)
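Statistics: least-squares regression (slope, intercept, Sy/x); paired t-test (bias, SDdiff); correlation (r). "Yes" means that the statistic is sensitive to the type of error.
Random error: slope No; intercept No; Sy/x Yes; bias No; SDdiff Yes; r Yes
Constant error: slope No; intercept Yes; Sy/x No; bias Yes; SDdiff No; r No
Proportional error: slope Yes; intercept No; Sy/x No; bias Yes; SDdiff Yes; r No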
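The pattern in this table can be illustrated with a small deterministic example in Python: x = 1 to 8 is compared with y values carrying a random-like, a constant, or a proportional error. The error patterns are hypothetical and chosen so that each case isolates one type of error (the random pattern is constructed to be uncorrelated with x and to average zero); they are not data from Westgard and Hunt.

```python
import math
from statistics import mean, stdev


def comparison_stats(x, y):
    """Least-squares slope/intercept/Sy/x, paired bias/SDdiff, and correlation r."""
    n = len(x)
    xm, ym = mean(x), mean(y)
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    syx = math.sqrt(sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y)) / (n - 2))
    diffs = [b - a for a, b in zip(x, y)]
    return {"slope": slope, "intercept": intercept, "syx": syx,
            "bias": mean(diffs), "sd_diff": stdev(diffs), "r": sxy / math.sqrt(sxx * syy)}


x = [float(v) for v in range(1, 9)]
# Hypothetical error patterns; the random-like pattern sums to zero and is uncorrelated with x
noise = [0.3, -0.3, -0.3, 0.3, 0.3, -0.3, -0.3, 0.3]
cases = {
    "random": [a + e for a, e in zip(x, noise)],
    "constant": [a + 0.5 for a in x],
    "proportional": [1.1 * a for a in x],
}
for name, y in cases.items():
    stats = comparison_stats(x, y)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

With these patterns, the constant error moves only the intercept and the bias; the proportional error moves the slope, the bias and SDdiff; the random pattern inflates Sy/x and SDdiff and lowers r.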


System stability
Trueness is also related to system (in)stability: drift and shift. System instability is tackled by internal quality control.

Carryover
Carryover is related to the quality of the instrument and the test procedure (e.g., washing). See CLSI protocol EP-10.

Ruggedness
Ruggedness = ability to reproduce the method in different laboratories or in different circumstances. It is related to the method principle and the test conditions, and to the instrument with which the method is performed.
Assessment of ruggedness: between-laboratory performance data obtained through EQA; ease of operation within the laboratory; effort needed for internal quality control; productivity of the method (down time, calibration and service intervals, etc.).


Multifactor protocols
Classically, single effects are investigated each in their own experimental design (e.g., imprecision, linearity, carryover). Multifactor evaluation designs investigate several effects with one experimental design. Advantage: less time-consuming!
Example, EP 10: allows evaluation of imprecision, linearity, bias, carryover, and drift; applies multiple linear regression analysis; needs special software for interpretation.
The EP-10 protocol design: 3 interrelated samples (low, mid, high); prescribed measurement sequence: mid, mid, high, low, mid, mid, low, low, high, high, mid; 5 days, always 1 run.


Glossary
Metrology [1]: field of knowledge concerned with measurement
Measurand [1]: quantity intended to be measured
Quantity [1]: property of a phenomenon, body, or substance, to which a number can be assigned with respect to a reference
Measurement [1]: process of experimentally obtaining one or more quantity values that can reasonably be attributed to a quantity
Notes: Quantities are length, mass, amount-of-substance, time, temperature, etc. The value of a quantity is expressed by both a number and a unit. The full specification of the quantities measured in the medical laboratory comprises three elements: system (e.g., blood plasma), component (also called analyte; e.g., glucose), and kind-of-quantity (e.g., amount-of-substance concentration). The full report of a glucose measurement would read: "the amount-of-substance concentration of glucose in blood plasma was 5.2 mmol/L".
Measurement unit [1]: scalar quantity, defined and adopted by convention, with which any other quantity of the same kind can be compared to express the ratio of the two quantities as a number
Value of a quantity [1]: number and reference together expressing magnitude of a quantity. EXAMPLE: length of a given rod: 5.34 m
Measurement standard [1]: realization of the definition of a given quantity, with stated quantity value and measurement uncertainty, used as a reference. EXAMPLE: 1 kg mass standard.

Error [1]: difference of measured quantity value and reference quantity value
Systematic error [1]: component of measurement error that in replicate measurements remains constant or varies in a predictable manner
Bias [1]: systematic measurement error or its estimate, with respect to a reference quantity value
Random error [1]: component of measurement error that in replicate measurements varies in an unpredictable manner
Trueness [1]: closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value
Accuracy [1]: closeness of agreement between a measured quantity value and a true quantity value of the measurand
Precision [1]: closeness of agreement between indications obtained by replicate measurements on the same or similar objects under specified conditions
Repeatability condition [1]: condition of measurement in a set of conditions that includes the same measurement procedure, same operators, same measuring system, same operating conditions and same location, and replicate measurements on the same or similar objects over a short period of time
Reproducibility condition [1]: condition of measurement in a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects
Uncertainty [1]: parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used
[Metrological] Traceability [1]: property of a measurement result whereby the result can be related to a stated reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty

Commutability [of a reference material] [1]: property of a reference material, demonstrated by the closeness of agreement between the relation among the measurement results for a stated quantity in this material, obtained according to two given measurement procedures, and the relation obtained among the measurement results for other specified materials
Matrix effect [2]: influence of a property of the sample, other than the measurand, on the measurement of the measurand according to a specified measurement procedure and thereby on its measured value
Influence quantity [1]: quantity that, in a direct measurement, does not affect the quantity that is actually measured, but affects the relation between the indication and the measurement result
Note: Specificity and interference are not yet unequivocally defined by ISO.
Selectivity [1]: capability of a measuring system, using a specified measurement procedure, to provide measurement results, for one or more measurands, that do not depend on each other nor on any other quantity in the system undergoing measurement (= specificity in chemistry)
Interference [in analysis]: a systematic error in the measure of a signal caused by the presence of concomitants in a sample (http://goldbook.iupac.org)
Specific [in analysis]: a term which expresses qualitatively the extent to which other substances interfere with the determination of a substance according to a given procedure. Specific is considered to be the ultimate of selective, meaning that no interferences are supposed to occur (http://goldbook.iupac.org).
Calibration [1]: operation that, under specified conditions, in a first step establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication
Sensitivity [1]: quotient of the change in the indication and the corresponding change in the value of the quantity being measured

Linear range: concentration range over which the intensity of the signal obtained is directly proportional to the concentration of the species producing the signal (http://goldbook.iupac.org).
Linearity (generic): ability of an analytical procedure to produce test results which are proportional to the concentration (amount) of an analyte, either directly or by means of a well-defined mathematical transformation.

Working interval [1] set of values of the quantities of the same kind that can be measured by a given measuring instrument or measuring system with specified instrumental uncertainty, under defined conditions
Limit of detection (in analysis): The limit of detection, expressed as the concentration, $c_L$, or the quantity, $q_L$, is derived from the smallest measure, $x_L$, that can be detected with reasonable certainty for a given analytical procedure. The value of $x_L$ is given by the equation $x_L = \bar{x}_{bi} + k \cdot s_{bi}$, where $\bar{x}_{bi}$ is the mean of the blank measures, $s_{bi}$ is the standard deviation of the blank measures, and k is a numerical factor chosen according to the confidence level desired (http://goldbook.iupac.org).
Limit of detection [1]: measured quantity value, obtained by a given measurement procedure, for which the probability of falsely claiming the absence of a component in a material is β, given a probability α of falsely claiming its presence
Ruggedness (generic): ability to reproduce the method in different laboratories or in different circumstances.
Ruggedness (USP): degree of reproducibility of the results obtained under a variety of conditions, expressed as %RSD. These conditions include different laboratories, analysts, instruments, reagents, days, etc.
Robustness (ICH Q2A 1995): The robustness of an analytical procedure is a measure of its capacity to remain unaffected by small, but deliberate variations in method parameters and provides an indication of its reliability during normal usage.

[1] BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML. Vocabulaire International des Termes Fondamentaux et Généraux de Métrologie. 3rd ed. Geneva: ISO, 2007.
[2] EN/ISO 17511:2003. In vitro diagnostic medical devices - Measurement of quantities in biological samples - Metrological traceability of values assigned to calibrators and control materials.
[3] See also: www.clsi.org > Harmonized Terminology Database
