Drug Saf (2016) 39:469–490
DOI 10.1007/s40264-016-0405-1
SPECIAL ARTICLE
Good Signal Detection Practices: Evidence from IMI PROTECT

Antoni F. Z. Wisniewski (1) • Andrew Bate (2) • Cedric Bousquet (3,4) • Andreas Brueckner (5) •
Gianmario Candore (6) • Kristina Juhlin (7) • Miguel A. Macia-Martinez (8) •
Katrin Manlik (9) • Naashika Quarcoo (10) • Suzie Seabroke (11) • Jim Slattery (6) •
Harry Southworth (12) • Bharat Thakrar (13) • Phil Tregunno (11) • Lionel Van Holle (14) •
Michael Kayser (15) • G. Niklas Norén (7)
Published online: 7 March 2016

© The Author(s) 2016. This article is published with open access at Springerlink.com
Abstract Over a period of 5 years, the Innovative Medicines Initiative PROTECT (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium) project has addressed key research questions relevant to the science of safety signal detection. The results of studies conducted into quantitative signal detection in spontaneous reporting, clinical trial and electronic health records databases are summarised and 39 recommendations have been formulated, many based on comparative analyses across a range of databases (e.g. regulatory, pharmaceutical company). The recommendations point to pragmatic steps that those working in the pharmacovigilance community can take to improve signal detection practices, whether in a national or international agency or in a pharmaceutical company setting. PROTECT has also pointed to areas of potentially fruitful future research and some areas where further effort is likely to yield less.

Corresponding author: Antoni F. Z. Wisniewski, antoni.wisniewski@astrazeneca.com

1 AstraZeneca, Macclesfield, UK
2 Pfizer, Walton-on-the-Hill, Surrey, UK
3 INSERM, UMR_S1142, LIMICS, Paris, France
4 Department of Public Health and Medical Informatics, CHU University Hospital of Saint Etienne, Saint-Étienne, France
5 Novartis, Basel, Switzerland
6 European Medicines Agency, London, UK
7 Uppsala Monitoring Centre, Uppsala, Sweden
8 Agencia Española de Medicamentos y Productos Sanitarios, Madrid, Spain
9 Bayer Pharma AG, Berlin, Germany
10 GlaxoSmithKline, London, UK
11 Medicines and Healthcare Products Regulatory Agency, London, UK
12 Data Clarity Consulting, Stockport, UK
13 F. Hoffmann-La Roche, Basel, Switzerland
14 GlaxoSmithKline, Wavre, Belgium
15 Bayer Pharma AG, Wuppertal, Germany

1 Introduction

The opportunities for effective signal detection in large databases have improved substantially since the early days of pharmacovigilance. In those early days, much effort necessarily focussed on manual clinical review of incoming reports, often in the form of handwritten or typed reports of suspected adverse drug reactions (ADRs), by pharmacovigilance experts. Adverse event reports, or sets of reports, in some way triggering the suspicion of a clinical reviewer would be further investigated by the reviewer and in some instances go on to become signals with some of these signals leading to further actions. While sophisticated approaches to statistical signal detection had been proposed and even tested in a limited methodological or theoretical context [1–5], routine prospective screening using tools and automated systems was a mere pipe dream.

Several decades later, systematic screening of adverse event reports is not just a reality; it is today the de facto standard in large datasets [6]. However, as adverse event reports are exchanged electronically around the world, in the tens of thousands on a daily basis, it is well accepted that our capabilities for signal detection are far from perfect and should be improved. In addition, we are seeing a significant shift of focus, beyond adverse event reports and prescription event monitoring systems, on to the use of
longitudinal observational databases and clinical trials for signal detection. Although the former have been used extensively in the past for epidemiological purposes, their potential use for signal detection is novel and new approaches for identifying safety signals in clinical study data beyond basic comparisons of adverse event frequencies have been slow to gain widespread use.

CIOMS VIII [7] was convened in part to address the widespread and growing interest in quantitative analysis of spontaneous reports and particularly the desire for guidelines around the appropriate use of such quantitative approaches. There was also a need to contextualise the role and use of such approaches, including disproportionality, within a holistic signal management perspective. CIOMS VIII was able to achieve these ambitions but also highlighted several areas of signal detection in spontaneous report and observational data and in the use of terminologies that needed more research. PROTECT was set up to address these, and a number of other important topics, in signal detection.

The following recommendations (footnote 1) relate to research conducted into quantitative signal detection in spontaneous reporting, clinical trial and electronic health records databases conducted under the auspices of the Innovative Medicines Initiative (IMI) PROTECT Work Package 3 (one of seven work packages) between September 2009 and February 2015. The PROTECT consortium (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium; http://www.imi-protect.eu) was a public–private partnership co-ordinated by the European Medicines Agency (EMA) and received support from the Innovative Medicines Initiative Joint Undertaking (http://www.imi.europa.eu) under Grant Agreement No. 115004. Full details of IMI PROTECT, including specific information about Work Package 3 and a detailed technical report supporting the recommendations, can be found at http://www.imi-protect.eu/about.shtml.

In attempting to advance the science of quantitative safety signal detection and with the overall goal of increasing the efficiency of quantitative signal detection practices, the scope of IMI PROTECT Work Package 3 has been ambitious, with 12 separate work streams covering a wide range of research questions and undertaken between September 2009 and February 2015. To convert the insights gained from these efforts into meaningful and executable outputs, the following recommendations have been developed: in total, 39 separate recommendations have been formulated, many based on comparative analyses across a range of databases (e.g. regulatory, pharmaceutical company) and in several data sources (spontaneous reporting, clinical trials and electronic patient records). A further 25 recommendations for future research are also offered. It is acknowledged that signal detection relies on quantitative and qualitative elements and it should be noted that the focus of Work Package 3 was primarily on the former. The majority of recommendations are based on the outcomes of the signal detection research conducted under the auspices of IMI PROTECT Work Package 3. However, in attempting to provide holistic and unambiguous messages of how the work can impact good signal detection practices, some reference to recommendations beyond the strict scope of PROTECT is necessary.

Footnote 1: The non-binding recommendations presented in this report represent the views of the authors and do not represent the views or policies of the authors' respective affiliations (unless by coincidence), even if employees of those organisations at the time of preparing this paper.

2 Précis of the Research and Recommendations

The studies undertaken through IMI PROTECT, which provide the evidence base for each set of recommendations that follow, are summarised here. For a full description of the studies, refer to the cited original publication or technical report.

2.1 Timeliness of Quantitative Signal Detection Using MedDRA® Terms and Groupings

Different terms in standard medical terminologies can be used to describe the same suspected ADR. Many organisations rely on disproportionality analysis for first-pass screening of large collections of individual case safety reports, in which the observed rate of a drug and adverse reaction reported together is compared with an expected value based on their relative frequencies reported individually in the spontaneous reporting database. Confidence intervals or statistical significance tests are used to provide some protection against spurious associations and a certain number of reports are generally required before an association can be detected. By grouping together related medical terms for the purpose of analysis, the observed count will increase, but so too may the corresponding expected value. It is not known whether lumping or splitting is preferable for timely quantitative signal detection. A previous study reported lower sensitivity but higher positive predictive value for MedDRA® groupings than for preferred terms (PTs) using cumulative data, but did not evaluate the timeliness of statistical signalling [8].

Footnote 2: MedDRA is Medical Dictionary for Regulatory Activities.

The study of Hill et al. [9] sought to determine to what extent the use of standard MedDRA® (footnote 2) groupings could expedite the detection of disproportionate reporting patterns for historical safety signals, relative to analysis by
individual MedDRA® PTs, separately, as is common practice. Analyses were performed in the World Health Organization (WHO) Global Individual Case Safety Reports Database, VigiBase®, as of 5 February 2010. The scope of the study was restricted to 13 medical concepts identified as having medium to high probability of being drug related [10]. The ADRs consisted of 43 historical labelling changes by the EMA related to one of the 13 medical concepts, derived from a previously published reference set [11]. Each labelling change had an associated index date indicating when the EMA first became aware of the potential signal and initiated their investigation, which was used for reference. For each medical concept, related high-level terms (HLTs), narrow standardised MedDRA® queries (SMQs) and PTs were manually identified. The latter were selected among the PTs associated with either the HLT or the SMQ for each concept. For each medical concept, separate analysis of each PT was conducted and also a joint analysis of all related PTs together as a custom group. Disproportionality analysis was based on the information component (IC), adjusted for country of origin and year of submission to VigiBase (simultaneously), through a Mantel–Haenszel-type stratification, as described previously [12]. Lower 95% two-sided credibility interval limits of the IC (IC025) were computed retrospectively for each quarter of a year from 1995 to 2010. For each level of the hierarchy, the quarter in which IC025 first exceeded zero was determined. For the analysis using individual PTs, this was the first quarter that an IC025 value related to any single PT in the group exceeded zero.

The study found no overall benefit in conducting signal detection using MedDRA® HLTs or SMQs compared with using PTs. Some relatively minor gain in time to signalling was seen when closely related (in a clinical sense) ADR terms were grouped together and this should be explored further in future studies.

Recommendation: For overall timeliness in quantitative signal detection, analysis can be performed at the MedDRA® PT level.
Rationale: The PROTECT study found no advantage in conducting signal detection at levels of MedDRA® above the PT level and indeed observed a net loss in timeliness of quantitative signal detection from replacing an analysis at the PT level with one at a higher level of the hierarchy [9].

Recommendation: Future research should evaluate the false-positive burden for signal detection at each level of the terminology.
Rationale: The false-positive burden was out of scope for the PROTECT study [9].

Recommendation: Future research should evaluate tighter custom-made groupings of MedDRA® PTs for signal detection.
Rationale: Neither PTs nor HLTs are universally ideal for quantitative signal detection. Gains in time by aggregating PTs were observed in the PROTECT study when the terms were very similar, in a clinical sense [9].

Recommendation: Future research should evaluate simultaneous analysis at different levels of the terminology.
Rationale: Parallel analyses at different terminological levels could improve timeliness but have resource implications [9].

Recommendation: Future research should explore a broader range of ADRs.
Rationale: The PROTECT study was restricted to a selection of 13 ADR categories [9].

2.2 Use of Novel Term Groupings Generated by Knowledge Engineering Techniques

New methods based on knowledge engineering techniques have been used to support the development of new groupings of terms or new terminologies, respectively. For example, the French Common Classification of Clinical Procedures (CCAM) was built using artificial intelligence tools from the European GALEN (Generalized Architecture for Languages, Encyclopaedias and Nomenclatures) project [13]. Version 11 of the International Classification of Diseases (ICD), which includes reference to multiple factors such as body systems, symptoms and causal agents, was developed using the experience of international experts in medical informatics [14]. Thus, knowledge engineering techniques can be used to derive novel groupings of adverse event terms based on semantic definitions of each term [15] and these groupings may provide an alternative to the standard groupings available in an adverse event terminology, such as MedDRA® HLTs or SMQs. In particular, knowledge engineering may allow for a more flexible approach to defining groups, based on the relevant dimensions for a specific topic of interest.

Two PROTECT studies [16, 17] employed a bespoke ontology (OntoADR [17, 18]) created using formal definitions of MedDRA® PTs. The formal definitions were either inherited from mapped SNOMED clinical terms or defined in semi-automatic or manual processes [18]. The semantic definition for the MedDRA® PT, 'Upper gastrointestinal haemorrhage', is illustrated as an example:
The concordance between groupings derived with knowledge engineering techniques and standardised or manually derived event groupings was studied. The studies found fair concordance between the terms included in the standard and proposed groupings [16, 17, 19], as well as high concordance between the corresponding measures of disproportionality with randomly selected drugs in the FAERS (footnote 3) database [16, 17].

Footnote 3: The Food and Drug Administration Adverse Event Reporting System (FAERS) is a database that contains information on adverse event and medication error reports submitted to the US Food and Drug Administration.

Recommendation: Knowledge engineering techniques may be considered as an adjunct to the creation of custom groupings and SMQs designed for the selection and extraction of case reports in pharmacovigilance databases.
Rationale: The PROTECT studies show it is possible to propose relevant novel groupings when no predefined grouping is available in MedDRA® for a given safety topic (e.g. anaphylactic shock or upper gastric hemorrhages) [16, 17].

Recommendation: Additional research would be necessary to validate if novel groupings generated by knowledge engineering techniques can help in the design of appropriate groupings of MedDRA® PTs for use in signal detection or evaluation.
Rationale: Given the current state of research, the clinical accuracy of groupings generated by knowledge engineering is such that manual clinical review is still required and this still needs to be validated against existing signal detection methods [16, 17, 19].

Recommendation: Consideration should be given to piloting the use of knowledge engineering in developing groupings in other ontologies for application to other vocabularies and their possible linkage.
Rationale: Given that it has been shown to be possible to generate relevant novel groupings in MedDRA®, it is reasonable to expect that it would also be possible in other ontologies, e.g. ICD10, SNOMED.

2.3 Development of a Structured Database of SPC §4.8 as Reference Dataset

The information available from the Summary of Product Characteristics (SPC) is increasingly used by computer applications. For a computer program to make use of this information, it must be coded according to a well-defined and exhaustive dictionary. Probably the most commonly used dictionary for ADRs is MedDRA®. It is recommended [20] that ADRs in the SPC be listed using exact (usually preferred) terms from MedDRA® but this does not always occur in practice. Sometimes this is because the ADR could be described using multiple MedDRA® PTs and would thus be difficult to communicate efficiently to clinical staff in this format. Of course, no such problem arises with a computer that can handle multiple terms efficiently. Thus, it is worth investing the quite considerable effort needed to convert the ADR information in the SPC for use in computer applications. PROTECT took on the task of creating a structured database of the ADR information in section 4.8 of the SPC for all European centrally authorised products using MedDRA® as the coding dictionary.
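A minimal sketch of what a dictionary-coded SPC entry could look like inside a program is given here. It is not the PROTECT database schema: the product name, the verbatim label text, the field names and the term lists are hypothetical and serve only to show one labelled concept mapped to several MedDRA® PTs, which a program can then look up directly.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class LabelledAdr:
    """One ADR concept from SPC section 4.8, coded to one or more PTs."""
    product: str
    verbatim: str                    # wording as it appears in the SPC
    preferred_terms: FrozenSet[str]  # PT names that together cover the concept


# Hypothetical rows; the product and the mappings are illustrative only.
ROWS = [
    LabelledAdr(
        "ExampleDrug",
        "gastrointestinal bleeding",
        frozenset({"Gastrointestinal haemorrhage",
                   "Upper gastrointestinal haemorrhage"}),
    ),
    LabelledAdr("ExampleDrug", "headache", frozenset({"Headache"})),
]


def build_index(rows: Iterable[LabelledAdr]) -> Set[Tuple[str, str]]:
    """Flatten the rows into (product, PT) pairs for set-membership lookups."""
    return {(row.product.lower(), pt.lower())
            for row in rows
            for pt in row.preferred_terms}


def is_labelled(index: Set[Tuple[str, str]], product: str, pt: str) -> bool:
    """True if the product-PT pair is already listed in the coded SPC data."""
    return (product.lower(), pt.lower()) in index


INDEX = build_index(ROWS)
print(is_labelled(INDEX, "ExampleDrug", "Upper gastrointestinal haemorrhage"))
print(is_labelled(INDEX, "ExampleDrug", "Nausea"))
```

Keeping the verbatim text next to the coded terms is one possible design choice; it lets a reviewer trace each coded term back to the wording in the label.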
The coding of the SPC was a three-stage process. First, strings of text corresponding to discrete concepts were extracted manually and an exact match to the MedDRA® hierarchical dictionary at some level was sought using simple procedures in SAS. If this first matching failed, the EMA sent the list of unmatched codes to the Uppsala Monitoring Centre (UMC) where a fuzzy matching algorithm was run to identify potential alternative matches to the strings [21]. The final step was to subject remaining codes to expert evaluation at the EMA and Bayer Pharmaceuticals.

An obvious application of machine-readable ADR information arises in pharmacovigilance signal detection systems. Such systems use ADR reports that are themselves coded in MedDRA® and hence it is easy to use the ADR dataset to determine immediately if a new signal of disproportionate reporting (SDR) [22, 23] corresponds to a known ADR. This eliminates the need for manual inspection of the SPC where the focus of the monitoring is the detection of new risks. This use has already been tested successfully at the EMA and the UMC and implemented into their corresponding signal detection process. Another potential application would be to alert doctors to a possible known ADR using the information on medications and clinical events contained in electronic health records. A further use is in research on ADRs; for example, it could be used to identify products with similar ADR profiles or to help construct reference datasets for evaluation of quantitative signal detection algorithms [21, 24]. If similar databases are constructed in other regulatory systems, they could also be used to investigate the consistency of regulatory decisions across these systems.

A difficulty with mapping textual SPCs to MedDRA® is that a given medical concept may map to several PTs. When HLTs from MedDRA® are used in the SPC this is handled easily but non-MedDRA® terms require considerable thought. No systematic process exists to build up groups of MedDRA® PTs corresponding to broader medical concepts. It is currently an incremental process that improves over time as non-standard terms are discovered and the mappings refined through repeated scrutiny, and this area needs additional work. It would also be desirable to extend the database to products authorised through national procedures.

Recommendation: The structured database of ADRs for centrally authorised products may be used as a reference to enhance pharmacovigilance for these products. The database is available here: http://www.imi-protect.eu/adverseDrugReactions.shtml [25].
Rationale: The PROTECT database has been used to provide a reference in evaluating signal detection methods and also to identify known ADRs emerging from routine signal detection activity, hence reducing unnecessary investigation. Other potential uses have been identified but not yet tested.

Recommendation: Structured databases of ADRs and their synonyms mapped to MedDRA® should be set up to cover other products.
Rationale: The current database does not address the majority of products authorised under mutual recognition or national processes. Further work is required if similar benefits are to be realised in pharmacovigilance systems covering these products.

Recommendation: A standard minimum structure should be established for all SPC ADR databases. The PROTECT database provides a useful template for this structure.
Rationale: To maintain the utility of databases and allow combinations across databases, a standardised core structure will be essential although the appropriate structure will depend on the intended functions of the database. Thus, a coordinated approach with wide consultation of intended users would be needed. Co-ordination of such an effort could be undertaken by a large regulatory agency or a cross-industry organisation. For a description of the database structure see: http://www.imi-protect.eu/documents/Databasestructure.pdf [26].

Recommendation: To facilitate signal detection, exact MedDRA® terms should be used to identify ADRs in SPC section 4.8 where feasible. When an ADR involves very large numbers of terms and requires an ad hoc name, mapping from this name to the relevant MedDRA® terms should be maintained.
Rationale: This is essential to facilitate the construction of machine-readable data sources that have a number of potential uses including the facilitation of signal detection. See Eudralex Vol. 2: http://ec.europa.eu/health/documents/eudralex/vol-2/index_en.htm [27].
Recommendation: Computerised text processing to help in mapping non-standard descriptions of ADRs to MedDRA® codes should be considered both for efficiency and consistency of coding practice.
Rationale: The approximate matching system was used to find appropriate MedDRA® terms when non-standard terminology was used in the SPC. This was usually successful and also much more efficient than human intervention alone [28].

Recommendation: In setting up a database of ADRs, a programme of maintenance should be established to reflect changes to the SPC from emerging safety issues or MedDRA® version changes.
Rationale: Around half the ADRs listed in SPCs are added as a result of post-authorisation activities and hence the database will require continuous attention to keep it up to date.

Recommendation: Consideration should be given to establishing the value and feasibility of having direct links between databases of SPC and other product information sources to prevent the need for duplicate data sources, or avoid repetition in the types of data collected in the different sources.
Rationale: Lists of product ADRs are currently maintained by regulators and by MAHs. These data may conflict either in detail or in coding conventions. Even when they agree, it is not efficient to maintain independent sources of identical data.

2.4 Comparison of Disproportionality Analysis Methods Within and Across Spontaneous Report Databases

Most pharmacovigilance departments maintain a system to identify ADRs through analysis of spontaneous reports. The majority of statistical methods used employ a disproportionality statistic calculated for each drug-event combination in the dataset and a signal detection algorithm that consists of a set of conditions that the disproportionality statistic and, possibly, other statistics calculated for the drug-event combination must satisfy for an SDR to be identified. The nature of the spontaneous report databases in terms of size and drug diversity varies between operators and it is unclear whether any signal detection algorithm can be expected to provide good performance in a wide range of environments. A number of different disproportionality statistics are in use [12, 29–31], but they are conceptually very similar [32–34].

The study of Candore et al. [35] compared the performance of a number of commonly used signal detection algorithms across a range of spontaneous report databases at national and international pharmacovigilance organisations and individual pharmaceutical companies. A set of 220 products was chosen and a reference set of ADRs was compiled based on SPC and company core data sheets. Among four companies, one national agency and two international spontaneous report databases, 15 quantitative signal detection algorithms based on five disproportionality statistics were tested using a subset of products that fell within the pharmacovigilance responsibilities of the respective database owners. Signals of disproportionate reporting were identified at monthly intervals and classified as true positives if they corresponded to an entry in the reference set. To measure performance, these results were summarised as sensitivity and precision (specifically, the positive predictive value) for each algorithm in each database. Time to signalling was also investigated, as early detection is an important contributory factor to effective pharmacovigilance.

Different algorithms gave very different levels of signal detection performance across all spontaneous report databases tested. However, increases in sensitivity were generally associated with a decrease in precision and no method clearly dominated all others. The performance is strongly dependent on the thresholds and other rules based on the disproportionality statistics that define a statistical signal. However, the different disproportionality statistics did not themselves influence the achievable performance: the choice of signal detection algorithm was much more important than the choice of disproportionality statistic. Absolute performance of the same algorithm might be very different between one spontaneous report database and another but the relative performance of two algorithms was generally similar in different databases. Over the lifetime of a product, there is a reduction in precision of any quantitative signal detection algorithm.

The changes in sensitivity and precision obtainable by replacing one quantitative signal detection algorithm with another are predictable. However, the absolute performance of a method is specific to the spontaneous report database and is best assessed directly on that database. The limits of performance of the current disproportionality statistics are similar and new methods, involving substantially different approaches, may be required to gain appreciable improvements using spontaneous reporting data.

Recommendation: Choice of a disproportionality statistic for signal detection should be primarily based on ease of implementation, interpretation and optimisation of resources.
Rationale: Several disproportionality statistics are currently used in data mining spontaneous report databases. All these can achieve similar overall performance by choice of appropriate signal detection algorithm. Thus, choice should be based on criteria other than signal detection performance. Factors that might be considered include the computing requirements to run the system, the ease of maintaining and adapting the system and whether the operation of the system can be easily communicated to non-statisticians [35].
Recommendation: Consideration should be given to the choice of signal detection algorithm used with disproportionality statistics because these can have important effects on quantitative signal detection performance.
Rationale: In contrast to the choice of disproportionality statistic, the choice of signal detection algorithm to define an SDR can provide very different levels of quantitative signal detection performance in terms of sensitivity, precision and time to signal. Hence, these criteria must be carefully selected on the basis of empirical evidence [35].

Recommendation: For moderate to large spontaneous report databases, the relative performance of a quantitative signal detection algorithm in one database can be predicted from research in other databases.
Rationale: In the PROTECT study, signal detection algorithms with good signalling properties (in terms of sensitivity and positive predictive value) compared to other signal detection algorithms in one spontaneous report database also had relatively good signalling properties in other spontaneous report databases. The databases were both regulatory and company based and ranged in size from about 500,000 to 5,000,000 reports. Hence, relative performance in moderately large databases can be reliably inferred from evaluations in other settings [35].

Recommendation: Absolute performance of the selected quantitative signal detection algorithm must be validated in the target spontaneous report database.
Rationale: Although the relative performance of signal detection algorithms is similar in different spontaneous report databases, the absolute performance characteristics may vary substantially. Hence, it is advisable to test the chosen disproportionality statistic with a range of signal detection algorithms within the target database [35].

Recommendation: Consideration should be given to the effect of reduced positive predictive value with time on the market.
Rationale: There appears to be a reduction in precision with time and hence it may be more productive to put additional effort into the evaluation of signals from newer products. This finding has been validated excluding ADRs identified prior to authorisation from the reference database but further work is ongoing to characterise this effect [35].

Recommendation: Consideration should be given to carrying out comparisons of quantitative signal detection methods across spontaneous report databases matching at the drug-event combination level rather than averaging over all drug-event combinations.
Rationale: It is possible that some ADRs may be more easily found in some databases. This was not investigated in PROTECT.

Recommendation: It would be useful to conduct research to establish empirically the best method for quantitative signal detection in combination products.
Rationale: Combination products and single substances are often treated as unrelated in signal detection systems; a question remains whether combining data from these products will provide more or less accurate detection of signals.

Recommendation: Consideration should be given to establishing a framework for selecting the best quantitative signal detection algorithm to suit the organisational goals and resource available within a pharmacovigilance group.
Rationale: Our research has shown a predictable trade-off between sensitivity and precision as far as purely quantitative signal detection algorithms are concerned. However, the means of striking the correct balance between sensitivity and the concomitant burden of false positives for a given organisation requires careful consideration.

2.5 Use of Subgrouping and Stratification in Disproportionality Analysis

Spontaneous report databases cover a range of products aimed at diverse medical conditions and used across a broad range of patient populations. This diversity is important as, for example, vaccines are given to healthy subjects, often children, who are likely to have fewer underlying medical conditions and consequently different reported background adverse events than the main population of patients that use other medicines. Many quantitative signal detection algorithms disregard this diversity and give equal weight to information from all products and all patients when computing the expected number of reports for a particular drug-event pair, which may result in either signals being masked or false associations being flagged as potential signals. Stratification and subgroup analyses are generally used in epidemiology to reduce confounding and highlight effect modification. Both of these approaches may also have advantages in quantitative signal detection.
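The masking effect of pooling all products when computing the expected count can be illustrated with a toy calculation. The sketch below is not one of the algorithms evaluated in PROTECT: it uses a plain observed-to-expected ratio as a stand-in for a disproportionality statistic, the subgroup names and counts are invented, and the function names are hypothetical. The expected count assumes the drug and the event are reported independently, that is, the number of reports on the drug multiplied by the number of reports of the event, divided by the total number of reports.

```python
from typing import Dict, Tuple

# 2x2 counts for one drug-event pair within one subgroup:
# (drug & event, drug & no event, other drugs & event, other drugs & no event)
Table = Tuple[int, int, int, int]


def observed_expected(table: Table) -> Tuple[int, float]:
    """Return the observed count and the count expected under independence,
    n_drug * n_event / n_total."""
    a, b, c, d = table
    total = a + b + c + d
    expected = (a + b) * (a + c) / total
    return a, expected


def pool(tables: Dict[str, Table]) -> Table:
    """Add the subgroup tables cell by cell to obtain the crude table."""
    a, b, c, d = (sum(cells) for cells in zip(*tables.values()))
    return a, b, c, d


# Invented counts: the event is rare among vaccine reports but very common
# among reports for other medicines, and the drug of interest is a vaccine.
TABLES: Dict[str, Table] = {
    "vaccines": (10, 90, 10, 890),
    "other medicines": (0, 0, 4480, 4520),
}

crude_obs, crude_exp = observed_expected(pool(TABLES))
sub_obs, sub_exp = observed_expected(TABLES["vaccines"])

print(f"crude:    observed={crude_obs}, expected={crude_exp:.1f}, "
      f"O/E={crude_obs / crude_exp:.2f}")
print(f"vaccines: observed={sub_obs}, expected={sub_exp:.1f}, "
      f"O/E={sub_obs / sub_exp:.2f}")
```

With these invented counts the pooled ratio falls below 1 while the ratio within the vaccine subgroup is well above 1, because the pooled expected count is dominated by reports for other medicines. A practical algorithm would additionally apply confidence or credibility limits and a minimum number of reports before flagging an SDR.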
Other published studies have suggested some benefits of stratified and subgroup analyses but often the analyses included only a few key covariates or study products, and were conducted in single databases [36–44]. It is not clear how generalisable these results are to other spontaneous report datasets of different sizes and characteristics. Additionally, to our knowledge a head-to-head comparison of stratified and subgroup analyses against a reference standard has not been conducted.

The study of Seabroke et al. [45] investigated the impact of stratification and subgrouping in signal detection algorithms in spontaneous report databases of different sizes and characteristics using a range of key covariates (age, gender, calendar time period, country of origin, vaccines/non-vaccines, event seriousness, reporter type and report source). Signal detection performance was measured against a reference standard. Disproportionality analyses were conducted using either stratified or subgroup approaches and compared with an unstratified crude analysis. Stratified and subgroup analyses calculated disproportionality statistics within each individual stratum separately. For the stratified analyses, these were combined into a single value using a Mantel–Haenszel approach, whereas for subgroup analyses, a positive signal was counted if any of the individual strata met the signal criteria. The results were presented as sensitivity and precision (positive predictive value) for each approach calculated using a reference set of ADRs compiled from the SPC and company core data sheets as a proxy for true positives. Additional analyses included investigating the benefit of combined subgroup/stratified variables and also investigating the impact of a permutation analysis that used randomly split strata of equal size to a real variable of interest and compared the results with those for the real variable.

Whilst the spontaneous report databases employed in this study included large international, national and industry datasets, the results may not be generalisable to all spontaneous report databases, particularly those with a small volume of reports. The results from this study showed that subgroup analyses consistently performed better than stratified analyses in all databases. Subgroup analyses were also shown to provide clear benefits over crude analyses for some databases whilst stratified analyses were not found to increase either sensitivity or precision beyond that associated with analytical artefacts of the stratified analysis.

Recommendation: Subgroup analyses may be beneficial in routine first-pass signal detection and should be considered. Stratified/adjusted analyses are unlikely to provide added value
Rationale: In spontaneous report databases with over 0.5 million reports with broad diversity of products, subgroup analyses tended to perform better than stratified/adjusted analyses in all spontaneous report databases. Stratified/adjusted analyses were not found to increase either sensitivity or precision beyond random variation [45]

Recommendation: Subgroup analyses can be considered beneficial in large international spontaneous report databases with over 2 million reports. Smaller datasets especially those with reports from only one country may need to consider a likely tradeoff between increased precision with some loss of sensitivity if subgroup analysis was to replace a crude or adjusted analysis
Rationale: Subgroup analyses within the larger international datasets consistently showed benefits in both precision and sensitivity over crude analyses for two disproportionality methods/thresholds with differing performance characteristics. For the smaller spontaneous report databases, a gain in precision tended to result in some loss of sensitivity particularly for the stricter disproportionality method/threshold and for the regulatory dataset with reports from only one country [45]

Recommendation: Subgrouping by seriousness of ADR or routinely excluding legal cases is unlikely to provide benefits in signal detection in terms of increased sensitivity or precision
Rationale: Subgrouping by seriousness of the ADR defined using the IME list^a had little effect on sensitivity or precision in any spontaneous report database. An analysis excluding cases submitted by lawyers also had little effect in all spontaneous report databases apart from the largest international database, which showed an increase in sensitivity and precision when legal cases are excluded [45]

Recommendation: Subgrouping by gender, reporter type and 5-yearly time points may provide modest improvement in precision in all, and sensitivity in some, spontaneous report databases
Rationale: Subgrouping by gender, reporter type and 5-yearly time points showed a modest improvement in precision for all spontaneous report databases and improved sensitivity for larger and international databases. Implementation of these subgroup analyses into routine signal detection may provide some benefit [45]

Recommendation: Subgrouping by age, country or continent of origin, or a combination of these variables, may confer improved precision in all and enhanced sensitivity in some spontaneous report databases
Rationale: Subgrouping by age, country of origin, continent of origin and a combination of these variables showed the highest improvement of precision in all spontaneous report databases and sensitivity in the larger databases. Implementation of these subgroup analyses may be beneficial in optimising quantitative signal detection [45]

Recommendation: Subgrouping by vaccines/non-vaccines should not be implemented without careful consideration of the desired effect
Rationale: Subgrouping by vaccines/non-vaccines resulted in a decrease in both precision and sensitivity in all spontaneous report databases. This was almost exclusively driven by the vaccines subgroup. These effects were owing to the suppression of listed vaccine ADRs as a result of comparing vaccines to each other. This may be desirable for certain reactions e.g. injection-site reactions but undesirable for other more serious reactions e.g. Guillain–Barré syndrome [45]

Recommendation: Where subgrouping by variables with considerable missing data (e.g. age, gender) is undertaken, consideration should be given to including a stratum for unknown rather than excluding these cases
Rationale: Including missing data in the subgroup analyses for age and gender increased sensitivity in all spontaneous report databases but tended to also decrease precision. In the spontaneous report databases with higher levels of missing data (20+ %) the increase in sensitivity was greater than the decrease in precision [45]

Recommendation: Subgrouping with a threshold based on number of reports may benefit from basing the threshold on the entire drug-event combination rather than within each individual stratum
Rationale: Results for subgroup analyses that used an overall threshold of n applied to the whole drug event combination showed large increases in sensitivity but with loss of precision. Further validation would be needed within each organisation to ascertain whether this approach is sustainable in respect of resources available to evaluate an increased number of false positives [45]

Recommendation: Future research should evaluate the use of subgroup analysis in parallel with crude and/or adjusted analysis
Rationale: Results for subgroup analyses that used an overall threshold of n applied to the whole drug event combination showed large increases in sensitivity but with loss of precision. Further validation would be needed within each organisation to ascertain whether this approach is sustainable in respect of resources available to evaluate an increased number of false positives [39]

a The EMA Important Medical Event Terms (IME) list (https://eudravigilance.ema.europa.eu/human/textforIME.asp)

2.6 Influence of Masking on Disproportionality Analysis

Disproportionality analysis uses the spontaneous report database itself as the basis for computing an expected number of reports on a particular drug and adverse event. This is based on the assumption that true causal relationships between drugs and events do not influence the overall reporting rates and that the degree of under- (or over-) reporting for a given event is approximately the same for all drugs. In practice, this assumption may be invalid for some drugs and events in a spontaneous reporting database. For example, attention to a real or perceived safety issue in the medical community or in public media may increase the reporting rate for that drug-event pair to such an extent that the overall reporting of that event is affected and potentially masks alerts from other drugs [46–51].

The two studies that made up this PROTECT work package [49, 50] looked at the impact of masking on disproportionality analysis. An algorithm was used to generate a masking ratio whereby the masking effect of one drug on another can be estimated for a specific event and suggests a simplified version valid under certain conditions. A simulation study was performed that focussed specifically on comparing differences between the simplified and exact masking ratio [49]. The impact of different approaches to treating the reports was also explored, including both the suspected masking drug and the suspected masked drug as
concomitant medicines. In a follow-on study, the extent and impact of masking was studied in two spontaneous reporting data sets: EudraVigilance and Pfizer’s spontaneous report database [50]. The latter evaluated the impact of removing from the analysis for each ADR the drug with the highest masking ratio. It was found that the drugs inducing the highest masking effects tended to be those that are known to cause the ADR in question. Under the conditions of the study (assuming that each ADR is masked by exactly one drug), it was rare that the unmasking analysis affected whether a drug-event pair was considered to be disproportionally reported or not. However, it is important to note that the drug-event pairs affected in this way primarily involved rarely reported ADRs.

Recommendation: Quantification of the masking effect of drugs on adverse reactions or adverse reactions on drugs could be used as a diagnostic tool of the extent of masking at two levels: for determining, based on the general characteristics of a spontaneous report database, if the application of an unmasking algorithm would be worthwhile; and at the product-event pair level, if a specific concern is raised about a potential masking effect driven by another product or another group of products
Rationale: Results indicate that many drugs and adverse reactions are not affected by masking. Avoid complicating the analysis of data by adding an unmasking procedure when masking is not an issue. Formulas for assessing the effect of masking can be found in papers by Maignen et al. [49, 50]. As the characteristics of the spontaneous report database change over time, it is still interesting to monitor the extent of masking periodically. During signal evaluation, some evaluation of the masking effect could be performed at the level of the product-event pair [49, 50]

Recommendation: If the masking effect of drugs on adverse reactions or adverse reactions on drugs is substantial, applying an unmasking algorithm should be considered
Rationale: Reducing the effect of masking can increase the sensitivity of quantitative signal detection and, in principle, result in earlier identification of new drug-event associations [49, 50]

Recommendation: If false negatives are a major concern, unmasking of drugs and/or adverse reactions can be used in parallel with standard disproportionality analysis to improve sensitivity and timeliness but this benefit must be balanced against the cost in increased evaluation of false positives
Rationale: If unmasking and standard disproportionality analyses are used in parallel, sensitivity will be equal to or higher than that of standard disproportionality analysis alone, but parallel analyses of the data also increase the false-positive rate, from spurious associations [49, 50]

Recommendation: Future research should explore the effectiveness of unmasking in terms of true/false positives revealed by an algorithm
Rationale: In the absence of public health evidence from prospective studies on the benefits of removing the masking (or situations in which unmasking could be beneficial), the use of a particular algorithm should be directed by the rate of true signals/false positives revealed by the removal of the unmasking effect [49, 50]

Recommendation: Future research should compare disproportionality-based approaches for unmasking to other statistical approaches (e.g. logistic regression models) that could also be used to account for masking effects
Rationale: This was outside of the scope of the PROTECT studies and there appears to be no published research on this topic

Recommendation: The use of simple unmasking algorithms as a means of reducing computation complexity and improving transparency should be explored in a future study
Rationale: Results indicate that the performance of the simplified methods is comparable to that of more complex methods while the computational complexity is reduced and transparency improved, but further research is needed to fully explore this on datasets with different properties [49, 50]

2.7 Drug–Drug Interaction Detection

Adverse drug–drug interactions (DDIs) harm large numbers of patients every year. Not all DDIs are known when new medicines are made available to the general public, but individual case reports may enable post-marketing detection. Earlier studies have indicated that statistical measures for DDIs that use additive baseline models perform better than those that use multiplicative baseline models [52, 53] but no broad evaluation has been reported in the literature. A study was conducted to compare the sensitivity and specificity of different statistical measures for ADR detection against established and emerging adverse DDIs, respectively.

Analyses were performed in VigiBase where four statistical measures for DDI detection were evaluated: one based on regression with a multiplicative baseline model [53], one based on regression with an additive baseline model [53], one shrinkage disproportionality statistic with a multiplicative baseline model [52] and one shrinkage disproportionality statistic with an additive baseline model [52]. The reference set for known interactions consisted of 74 established DDIs and 29 pairs of drugs with an ADR for which there was no empirical support for a DDI. The reference set for emerging DDIs included 324 adverse drug interactions added to Stockley’s Interaction Alerts between 2007 and 2009, and 324 × 20 combinations with two drugs that were not listed together as known to interact in the same reference, in 2009. The majority of the ADRs were investigated at the MedDRA PT level.

The study was limited to statistical interaction measures, whereas recent research has suggested that predictive models accounting for multiple aspects of strength of evidence may perform even better [54]. The algebra of the
statistical interaction measures is such that additive models impart superior sensitivity regarding the detection of DDIs compared with multiplicative measures, at any given value for the threshold. However, the additive models used in this study also demonstrated better specificity compared with the multiplicative models. Thus, statistical interaction measures with additive baseline models outperformed those with multiplicative baseline models for both established and emerging adverse DDIs. However, given the small number of cases attributed to the interacting drugs together, both models gave low sensitivity (<20 %) for emerging adverse DDIs, at conventional signalling criteria.

Recommendation: Statistical interaction measures with additive baseline models should be preferred over those with multiplicative baseline models for detecting signals of DDIs in spontaneous report databases
Rationale: Statistical interaction measures with additive baseline models provided better sensitivity and equal or better specificity for both established and emerging DDIs [55]

Recommendation: Future research should explore how statistical interaction measures with additive baseline models can best be incorporated in broader predictive models of adverse drug interactions, and in routine signal detection
Rationale: This was out of scope for the PROTECT study, but recent research has found that predictive models accounting for multiple aspects of strength of evidence perform better than statistical measure of interaction alone [54]

2.8 Duplicate Detection

Individual case reports of suspected harm from medicines are fundamental for signal detection in post-marketing surveillance. Their effective analysis requires reliable data and one challenge is report duplication. This occurs when multiple unlinked records are recorded describing the same suspected ADR in a particular patient. Report duplication is known to occur for a diverse range of reasons, including reporting of an ADR from multiple primary sources, requirements for marketing authorisation holder reporting of literature cases (and subsequent retransmission), and technical or administrative issues [56]. De-duplication of data is also widely understood to be a time-consuming process; however, little research had been undertaken prior to PROTECT on the efficiency of different de-duplication methodologies. Duplicate ICSRs are known to distort statistical screening [56], increasing the numbers of both false-positive and false-negative signals. The net effect of duplicates (and de-duplication techniques) is unknown, and may not be consistent across all products in the database; however, where there is a significant impact there is potential for the cases to mislead clinical assessment.

Many organisations rely on rule-based duplicate detection methods, which rely on exact matching of a number of elements within individual case safety reports and displaying the putative duplicate cases for review. Rule-based approaches can vary significantly in their complexity and in the fields used, and may consider patient, reporter, drug or reaction details, or a combination of them. The study compared methods [57] used in the UK, Denmark and Spain. The UMC had previously published a probabilistic record matching algorithm that indicates the likelihood of cases being duplicates (vigiMatch [58]) as an alternative to rule-based approaches. The PROTECT study attempted to quantify the benefits (or disbenefits) of the different approaches used by PROTECT consortium members.

A first phase of the study aimed to evaluate probabilistic record matching for duplicate detection compared with rule-based approaches. Studies considered positive predictive value and numbers of false positives of different approaches, and in addition attempted to characterise the main causes of duplication. Initial research was undertaken using the WHO Global Individual Case Safety Reports Database, VigiBase, for reports submitted between 2000 and 2010. Suspected duplicates for the UK, Denmark, and Spain were reviewed and classified by the respective national centre. This included evaluation to determine whether confirmed duplicates had already been identified by in-house rule-based screening. A second phase of the study directly compared results from the MHRA’s rule-based approach to the UMC’s probabilistic record matching approach using data received in the MHRA’s Sentinel database during 2013.

Probabilistic record matching performed positively when compared with rule-based approaches. Specifically, probabilistic record matching demonstrated a high predictive value above that of rule-based methods and is expected to improve efficiency and accuracy of duplicate management. The study showed very few false positives, suggesting it may be possible to increase sensitivity while ensuring false-positive rates are kept at a reasonable level. The study [57] highlighted that case management system changes or upgrades can occasionally result in large numbers of duplicates, either in the source system, or those external to the organisation. It was also noted that proliferation of duplicated cases within databases occurred as a result of rapid submission and re-transmission of cases to multiple stakeholders made possible by electronic systems. This emphasises the need for swift and robust duplicate detection procedures at each organisation. Data privacy requirements were noted as a barrier to the most effective
duplicate detection and it was considered that approaches to deidentifying data (for example, scrambling dates and patient initials) in a way that permits duplicate detection should be pursued to allow for effective duplicate detection in databases that pool reports from different sources.

Although beyond the scope of the PROTECT study, further evaluation should be undertaken to understand the feasibility and impact of automatic exclusion of potential duplicates from quantitative signal detection algorithms. Such an approach may help ensure independence of reports, which is a fundamental assumption underlying the computation of confidence intervals for all disproportionality statistics; however, it equally may remove clusters of reports from the same reporter that may be important for patient safety. A better understanding of the reasons for related cases that are not considered duplicates, and their scientific implications for signal detection will help determine if this is a viable approach.

Recommendation: Probabilistic record matching should be considered as an alternative to rule-based methods for duplicate detection in pharmacovigilance
Rationale: Probabilistic record matching demonstrated a high predictive value above that of rule-based methods in our study, and is expected to improve efficiency and accuracy of duplicate management [57]

Recommendation: Care should be taken to avoid case duplication during system changes/upgrades, considering both internal aspects and case transmission to external organisations
Rationale: Our study showed that such changes on occasion resulted in very large numbers of duplicates [57]

Recommendation: Rapid electronic re-transmission of spontaneous adverse drug reaction reports between databases can increase the number of duplicates to the extent that disproportionality statistics are significantly affected, emphasising the need for swift and robust duplicate detection and management processes in databases that employ electronic data exchange
Rationale: There are a large number of duplicates in spontaneous reporting databases, which are shown to affect quantitative signal detection scores. Rapid transmission of cases by electronic systems exacerbates this issue, meaning that accurate (and ideally, non-burdensome) duplicate detection processes are required to mitigate this unwanted impact on disproportionality statistics [57]

Recommendation: Further work should be undertaken to explore lowering the threshold for the tested probabilistic record matching method and methods in general to evaluate the balance of false positives and negatives
Rationale: Our study showed very few false positives, so it should be possible to increase sensitivity while ensuring false positive rates are kept at a reasonable level [57]

Recommendation: Further evaluation should be done to understand the impact of automatic exclusion of potential duplicates from quantitative signal detection algorithms
Rationale: This was beyond the scope of the PROTECT study. If this approach proved successful manual duplicate detection activities could be eliminated resulting in time/resource savings [57]

Recommendation: Approaches to deidentifying data (for example, scrambling dates and patient initials) in a way that permits duplicate detection should be pursued to allow for effective duplicate detection in databases that pool reports from different sources
Rationale: This will reduce the negative impact of data privacy laws, for duplicate detection in international databases [57]

2.9 Relationship Between Disproportionality Measures (i.e. PRR) and Risk Estimates

Spontaneous reports are only submitted when a patient exposed to a medicine experiences a suspected ADR and the reporter decides to report the case. It is difficult to determine the extent to which the numbers of these reports may help to reflect a measure of association of adverse outcomes in the drug-exposed population. Thus, the quantitative data in spontaneous reporting systems, while being useful in detecting new signals of drug-event associations, are not easily interpretable in terms of clinical impact. Nevertheless, quantitative signal detection systems consider threshold levels for disproportionality measures and therefore higher values of disproportionality statistics are one of the factors that influence the decision to investigate particular drug-event combinations. Hence, it makes sense to ask whether there is a direct relationship between the magnitude of disproportionality statistics and the magnitude of the association between a product and an adverse effect from pharmacoepidemiological studies.

A study was conducted to determine the proportional reporting ratios (PRRs) for a set of known ADRs and compare them with estimates of association from formal epidemiological studies [59]. A set of 15 confirmed ADRs were selected from the initially identified dataset of pharmacovigilance driven European Union regulatory actions and for which relative risk estimates from formal studies were available and were considered to provide
well-established evidence supporting the respective regulatory actions. Prior to any calculation of PRR, the studies were collated and a best estimate of the risk ratio selected on the basis of pre-specified rules. When available, this estimate was that determined during the European Union regulatory assessment. At the same time, an estimate was made of the date at which each ADR was first publicly recognised.

Only after the risk ratio was decided upon was the PRR calculated by reconstruction of the spontaneous reporting data system at the predetermined date of first recognition in the medical community of the ADR. The primary analysis used the EudraVigilance spontaneous report database and an additional analysis was carried out in FEDRA, the Spanish national spontaneous report database. Case definitions for the ADR of interest were developed for PRR calculations for each drug-event pair, aiming to reproduce the case definition as used in the epidemiological studies providing relative risks. An initial dataset of 78 drug-event pairs was obtained following the initial inclusion criteria. Following the exclusion criteria, 15 drug-event pairs were finally selected. The pairs include 13 different ADRs and 14 different drugs/classes. Four of the ADRs represented class effects. Four topics were related to outcomes of safety referrals concluding in a CHMP Opinion and 11 were recommendations from the Pharmacovigilance Working Party to CHMP or to National Competent Authorities. Eleven of 15 drugs were non-centrally authorised medicines and four were centrally authorised in the EU.

An orthogonal regression model showed a significant association between relative risks and PRRs. This suggests that, in some cases, there is a relationship between the PRR calculated immediately prior to first awareness of the safety topic and estimates of relative risk taken from published epidemiological studies, where the signal turned out to be confirmed. It is noted that no validation of the model has been performed and that despite the good correlation shown between RRs and PRRs in this exercise, it is emphasised that PRR cannot replace RR. Thus, calculation of PRRs from spontaneous reporting databases should not replace nor delay the performance of formal epidemiological studies but could however be an indicator of the likely clinical importance of the adverse reaction, should the signal be confirmed subsequently [60].

Recommendation: It may be possible to use the PRR at the early phase of the analysis of a new safety signal as an indicator of the likely strength of the association, should the signal be confirmed
Rationale: The PRR observed before general awareness of an ADR shows a good correlation with the strength of the association in terms of relative risk or odds ratios later established by controlled studies. However, the PRR is not a direct estimator of the risk ratio and should be considered only in the absence of any more reliable evidence. The caveat ‘should the signal turn out to be confirmed’ must be observed. The study analysis does not compare the distribution of PRR values for ‘true’ and ‘false’ signals of disproportionate reporting and no inference can be made about whether the initial magnitude of PRR gives information about the nature of the association (causal or otherwise) [59]

Recommendation: Following the initial detection of a signal of a specific drug-event association, PRR values based on clinical definitions of the adverse event may serve to provide an estimate of the likely size of clinical effect and be included among the criteria for initial prioritisation of its assessment
Rationale: This study shows that, at least in this selected set of study cases, the underlying relative risk seemed to influence both the direction and magnitude of the PRR calculated with a similar case definition of the adverse event. Because the study sample comprises drug-event associations confirmed following assessments of diverse data sources and signal detection systems, the results may be applicable to PRRs calculated following both quantitative and traditional signal detection approaches [59, 60]

Recommendation: Consideration should be given to repeating these analyses in other ADR datasets to see whether they can be replicated and, if they can be, to establish the relevant scale factor
Rationale: PRR values generated in different ADR datasets are unlikely to be the same. Other IMI PROTECT research has focussed on the performance of disproportionality statistics and of different signalling algorithms in different ADR datasets. However, to date no attention has been paid to describe and explain differences in the calculated PRR values in different datasets
Recommendation: Consideration should be given to further exploring whether PRRs adjusted by subgroup variables improve the correlation with measures of association from studies
Rationale: The findings from an IMI PROTECT study on subgrouping and stratification [45] suggest that subgrouping may be a useful strategy to try to improve the correlation between the PRR and the estimates of risk from studies

Recommendation: Consideration should be given to exploring whether PRRs calculated for single MedDRA® PTs as is in EudraVigilance monitoring behave in the same way as clinically defined case definitions in terms of correlation with measures of association from studies
Rationale: The medical concepts used in the studies to derive the estimates of relative risk often described broader medical concepts than the MedDRA® PT level used in EudraVigilance for the PRR screening analysis (see also Recommendations in relation to Timeliness of Quantitative Signal Detection using MedDRA® Terms and Groupings)

2.10 Signal Detection in Longitudinal Observational Data

Post-marketing surveillance aims to identify and characterise risks of medicines. At present, signal detection is predominantly based on individual case reports, but the use of electronic health records and insurance claims to detect ADRs is an area of active research [61–64]. While some work has focussed on traditional sequential approaches and looking to use them for signal detection, new methods have also been adapted or proposed [65–68]. Other publications have highlighted the challenges and limitations of using longitudinal observational data for signal detection [69, 70].

The objectives of the three studies that made up this PROTECT work package were (1) to better characterise the opportunities and challenges for prospective signal detection in longitudinal electronic health data, including the development and evaluation of a process for the structured assessment of potential safety signals from electronic health records; (2) a comparison between exploratory and confirmatory analyses of longitudinal electronic health data and;

The comparison between exploratory and confirmatory analysis was based on 13 published studies of possible ADRs in THIN [71]. The selected studies listed a total of 56 drug-event pairs that were included in the analysis. For each pair, the analysis results of the study (positive or negative) were compared with those that would result from analysis with standard design choices for a calibrated self-controlled cohort analysis in THIN. Observed differences were closely scrutinised to identify areas of possible improvement for the standard implementation of the design, for exploratory analysis. In our comparison to published epidemiological studies, a common discrepancy was that the epidemiological study performed analysis for all drugs in a class together and/or for a number of related medical events together, which improves power. However, our detailed review often found substantial and important differences among different drugs in the same class or among different medical events in the same category. Clearly, more research is needed to minimise terminological obstacles for exploratory analysis of longitudinal data, as well as across the different data streams [70]. None of the false positives were considered to represent a chance association: all were considered to be the result of systematic variability.

For prospective identification of potential signals in electronic health records [72], a questionnaire for structured assessment was designed and iteratively refined through pilot testing. It covered aspects such as the nature of the temporal pattern, the presence of co-medications associated with the medical event, the likelihood of confounding by underlying disease, and other alternative explanations for observed temporal associations. In real-world use, its purpose would be to provide analysis and decision support for potential signals identified through prospective and open-ended screening of longitudinal electronic health data. For the purpose of the study, drug-event pairs temporally associated according to a calibrated self-controlled cohort analysis in THIN were randomly selected for review. Six assessors trained in pharmacovigilance and/or epidemiology participated in the main study and each evaluated up to 20 temporally associated medical
(3) a performance evaluation of quantitative signal detection events per drug, for seven randomly selected drugs [72].
in longitudinal electronic health data and individual case Our analysis highlighted a number of potential safety sig-
reports, respectively, for emerging safety signals. All anal- nals in electronic health records that merit further review.
yses were performed in The Health Improvement Network These range from life threatening to those that are less
(THIN4) database of longitudinal electronic health records serious, but important for patients and for adherence.
from general practices in the UK. However, three out of four temporal associations identified
in the initial screen could be dismissed from further eval-
uation after the initial review. In other words: without a
4
THIN is an electronic medical record data resource including over review, the majority of the highlighted associations would
12 million individual patients from the UK, with over 3.8 million have been false positives. A minority of the dismissed
currently active patients. The electronic medical records are collected
associations were considered to be owing to random vari-
from general practices in primary care (http://www.ncbi.nlm.nih.gov/
pubmed/22828580). ability; most were the result of biases and other systematic
. effects [72].
The performance evaluation of quantitative signal detection in longitudinal electronic health data and spontaneous reports was based on a reference set of 264 historical safety signals derived from a previously published study [11], and 5280 negative controls. The literature is very sparse on such comparisons. One example is from Trifirò et al. [73], who as part of the EU-ADR (European Commission sponsored project ‘Exploring and Understanding Adverse Drug Reactions by integrative mining of clinical records and biomedical knowledge’), looked at the number of drug-event combinations that were highlighted as disproportional in each of the data sources. Analyses were performed in THIN and VigiBase, each backdated to the end of 2004, to match approximately the time of the historical signals. The analysis in VigiBase was repeated with a restriction to reports originating only from the UK. The retrospective evaluation against historical safety signals for European centrally authorised products showed that none of them could be detected in THIN with the method we used, prior to the initial signal at the EMA. In many cases, this was because of the drug not being available on the UK market at the time, or the drug or medical event not being reliably captured in primary care. In contrast, some of the positive controls could be detected in VigiBase, even when we restricted the analysis to individual case reports originating from the UK.

Recommendation: Longitudinal observational data should be further explored as a complement to signal detection using individual case reports but cannot currently replace individual case reports for this purpose
Rationale: Individual case reports of suspected harm from medicines have a proven value for safety signal detection. However, they are not optimal for detecting increased rates of multifactorial adverse drug reactions with high background incidence. Longitudinal observational data provide the basis for epidemiological evaluation of such associations and should in principle enable their initial identification. However, we lack evidence to suggest that signal detection in longitudinal observational data can match the performance of signal detection in individual case reports for all drugs and medical events. In our evaluation of historical safety signals from the EMA, none of the positive controls could be detected in the THIN database at an early stage, whereas this was possible in VigiBase for some of the signals, even when we considered only the subset of the UK individual case reports [72]

Recommendation: Safety signal detection in longitudinal observational data should include clinical, pharmacological and epidemiological review of identified temporal associations
Rationale: Clinical review of statistical signals is fundamental in evaluating signals arising from spontaneous report databases. In our study of structured assessment for prospective identification of safety signals in electronic health records, three out of four temporal associations identified in the initial screen could be dismissed from further evaluation after initial review. Without review, the majority of the highlighted associations would have been false positives [72]

Recommendation: To the extent possible, temporal associations detected in longitudinal observational data should be further explored with statistical graphical methods
Rationale: In our prospective identification study, in-depth review of the chronograph temporal patterns proved a valuable component of the expert review. Univariate measures of temporal association may over-simplify or obscure the underlying patterns in such rich, complex and often long records [72]

Recommendation: Safety signal detection in longitudinal observational data should account for limitations of the underlying data and take measures to ensure appropriate interpretation. In selecting the data set for analysis, one should account for both its size and scope (which drugs and diagnoses it captures) and for the fact that effective review of identified temporal associations requires expert knowledge of the underlying data, which is particularly relevant for large heterogeneous data sets
Rationale: Our retrospective evaluation against historical safety signals for European centrally authorised products showed that none of them could be detected in THIN with the method we used, prior to the initial signal at the EMA. In many cases (to be further specified once we have the data), this was because of the drug not being available on the UK market at the time, or the drug or medical event not being reliably captured in primary care

Recommendation: Future research should explore the relative merits of performing safety signal detection in longitudinal observational data for groups of medicinal products and medical events, instead of or in parallel with that of individual products and events
Rationale: In our comparison to published epidemiological studies, a common discrepancy was that they performed analysis for all drugs in a class together and/or for a number of related medical events together, which improves power. However, our detailed review often found substantial and important differences among different drugs in the same class or among different medical events in the same category.
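
The disproportionality screening of individual case reports that this section compares against rests on simple 2×2 counts of reports. As a minimal sketch only, the Python below computes the proportional reporting ratio (PRR) referred to in the recommendations above, together with an approximate 95% confidence interval based on a log-normal approximation. The function name `prr_with_ci`, the counts in the usage note and the choice of interval are illustrative assumptions; this is not the implementation used in the PROTECT analyses, and VigiBase screening in particular may rely on a different statistic.

```python
import math


def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional reporting ratio with an approximate confidence interval.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if a <= 0 or c <= 0:
        # The log-based interval is undefined when either event count is zero.
        raise ValueError("a and c must be positive")
    if b < 0 or d < 0:
        raise ValueError("b and d must be non-negative")

    # Proportion of the drug's reports that mention the event, divided by
    # the same proportion among reports for all other drugs.
    prr = (a / (a + b)) / (c / (c + d))

    # Standard error of log(PRR), as for a ratio of two proportions.
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper
```

For hypothetical counts such as `prr_with_ci(20, 980, 100, 98900)`, the event makes up 2% of the drug's reports against roughly 0.1% for all other drugs, so the PRR is about 19.8. Whether such a value is highlighted depends on the thresholds chosen for the report count, the PRR and the lower interval limit.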
2.11 Signal Detection in Clinical Study Data

In the pre-approval phase of a drug, systematically collected adverse event data from randomised clinical trials are the principal data source and therefore the cornerstone of safety analysis. Specifically, clinical trial data allow the estimation of exposure, the lack of which is one of the limitations of spontaneous reporting datasets [7]. In addition, randomisation itself is a powerful feature of randomised controlled trials that addresses the issue of confounding, which otherwise complicates attempts to identify imbalances in the incidence of adverse events between the treated and untreated subjects in data from non-randomised sources. Because of this, clinical trial data form a natural source for signal detection, especially in the early phase of a drug’s lifecycle.

Two different types of analyses performed in company clinical trial databases are presented: in the first, the use of extreme value modelling for the prediction of potential drug toxicity was evaluated; in the second, different approaches for dealing with multiplicity issues for adverse event-based signal detection were compared.

It is often the case that potential drug toxicity is suggested by the occurrence of extreme values of clinical variables, rather than changes in the location of the distribution of these variables. For example, large values of serum creatinine and blood urea nitrogen are used in the diagnosis of acute renal failure, large values of the QT interval are suggestive of cardiotoxicity and large values of alanine aminotransferase suggest hepatotoxicity. Many other examples of abnormally large (or small) values of laboratory variables being indicative of safety issues can be found in the Common Terminology Criteria guidelines (see footnote 5) and, in general, any laboratory values that fall above the upper limit of normal or beneath the lower limit of normal may be indicative of a potential safety issue. Often, extreme values are poorly predicted by an analysis that focuses on the central features of the distribution. Therefore, analyses of means or medians tend to be uninformative. Extreme value modelling might therefore provide a novel approach to the prediction of drug toxicity early on during drug development.

A retrospective analysis of phase II data using extreme value modelling was conducted for ximelagatran [74], a compound that was denied marketing approval in the USA and was withdrawn from those markets in which it had been approved because of concerns over potential liver toxicity. The analysis showed that the phase II data were predictive of the phase III results and, had the methods been available at the time, such analyses would have provided valuable information relating to the decision to proceed with further development of the compound.

Footnote 5: Common Terminology Criteria for Adverse Events, US National Cancer Institute (http://ctep.cancer.gov/).

One important characteristic of signal detection in clinical trials as well as in other databases is a focus on the evaluation of a large number of endpoints. Analyses of adverse event data typically generate multiple risk measurement estimates associated with events across several body systems. Selection of outcomes for further evaluation by conventional hypothesis testing of between-group differences for each endpoint can be problematic. Even ignoring the fact that the data determine the hypotheses that are tested, a process that involves a large number of comparisons across multiple analysis time points makes multiplicity an issue. Acknowledging P-values as a useful flagging device, ICH E9 recommends statistical adjustments for multiplicity when applying hypothesis tests to a large number of safety variables in clinical trial data. However, probably owing to the concern of missing true safety signals (i.e. false-negative signals), multiplicity adjustment has also been described as counterproductive. Probably because of these and similar reservations, adjustments are often avoided, despite the fact that ignoring multiplicity may easily result in an excessive rate of false-positive signals. The EudraVigilance Expert Working Group notes that thresholds commonly used to detect signals in spontaneous data are a trade-off between two conflicting goals: “either generating too many false positive signals if the threshold is too low or missing true signals if this threshold is too high”. Given this trade-off, it is important to identify and calibrate methods to strike a reasonable balance between these two goals.

The objective of this study [75] was to investigate different approaches to address multiplicity for the use of signal detection methods to select ADR candidates in clinical trial data. The aim was to identify the best performing method that maximises the proportion of correct signals (i.e. the positive predictive value) as compared with the CDS as the gold standard. In addition, the use of different MedDRA® levels for signal detection reflects the importance of considering the MedDRA® hierarchy for signal generation (see above, ‘Timeliness of Quantitative Signal Detection using MedDRA® Terms and Groupings’). In addition to basing the analyses on the PT as the smallest unit of analysis, the use of available company-specific groupings of PTs developed specifically for the purpose of labelling (so-called medical labelling groupings) was evaluated.

As expected, the ability for ADR detection was highly influenced by ADR frequency. In general, all model types that took multiplicity into consideration proved appropriate for the detection of signals. The Bayesian hierarchical model that can make use of the hierarchy, thereby borrowing strength, performed best among the quantitative
signal detection methods, especially with regard to time to signal. In our analysis, the use of medical labelling groupings did slightly improve the performance of quantitative signal detection algorithms.

In summary, taking multiplicity into consideration resulted in lower rates of false-positive signals. The hierarchical Bayes and double-FDR, which also account for the hierarchical structure of the underlying medical terminology dictionary, demonstrated the best performance in the investigated setting. Considering the good positive predictive value while providing similar sensitivity, these methods can provide an alternative to the often-used unadjusted analysis for identifying or flagging potential imbalances between treatment arms and as such could be used in the detection of ADR candidates. However, selection of the most appropriate methods must consider the size of the clinical trial database and computational requirements.

Recommendation: If prior knowledge suggests data from a particular organ system should be monitored, consider extreme value modelling on data arising from each trial for the compound of interest. For example, if preclinical data suggested a potential liver issue, prepare to model ALT; if another compound in the class showed kidney signals, prepare to model creatinine
Rationale: Extreme-value modelling has been demonstrated, in various examples, to provide useful predictions of drug toxicities from early-phase data. If it is possible to pre-specify the modelling and prediction exercise, the results have greater credibility than if they are data driven, and resources can be allocated up front to ensure the work is done to appropriate deadlines [74, 76]

Recommendation: Some analyses will be data driven, suggested by observed extremes in the data. These could also be subjected to extreme value modelling and the statistical evidence thus acquired interpreted in context
Rationale: Not all potential safety issues are known in advance, so some analyses are necessarily data driven. It is inappropriate to consider such analysis illegitimate or to yield unreliable results provided they are interpreted in context. Statistical inference is only one part of the larger process of scientific inference [74]

Recommendation: Extreme value modelling can commence as early as phase I; however, in most cases, phase II data need to be available for reliable inferences to be made
Rationale: Experience suggests that phase I data may be sufficient for extreme value modelling to identify toxicity, but that sometimes the sample sizes are too small. Modelling and prediction have the most value to add when the volume of available data is low, so such exercises should be commenced as soon as possible [74]

Recommendation: Properly trained, Independent Data Monitoring Committees or Safety Review Boards are likely to benefit from extreme value modelling of unblinded data
Rationale: When an IDMC exists, there are sometimes reasons for additional monitoring. It follows that applying proven methodology to emerging data will provide the best chance of identifying and characterising the safety issue as soon as possible [74]

Recommendation: When extreme value modelling does not find evidence of a safety signal in studies of short duration, extrapolation beyond observed durations of exposure is discouraged
Rationale: It is reasonable to expect that some toxic effects of drugs will not manifest themselves until several weeks or months of exposure have occurred. If extreme values are not observed at relevant doses in short trials, proceed with caution, acknowledging that they could occur after longer durations of exposure

Recommendation: Multiplicity adjustment provides a useful tool to improve the positive predictive value in signal detection in clinical trial data. The use of multiplicity adjustment needs to be evaluated against the size of the available clinical trial database
Rationale: The ability for ADR detection is highly influenced by ADR frequency in the source dataset. Thus, database size and event reporting frequency must be taken into consideration when the use of multiplicity adjustments for ADR candidate selection is considered [75]

Recommendation: The use of Bayesian Hierarchical Models can improve the efficiency of signal detection through borrowing of strength from other relevant events in the clinical trial dataset. This must be weighed against the more complex computational requirements of Bayesian modelling
Rationale: Bayesian Hierarchical Models provided the best performance with regard to positive predictive value, specificity, sensitivity and negative predictive value, mainly owing to their ability to “borrow strength” across similar terms [75]

Recommendation: The use of more specific MedDRA® groupings can further improve signal detection in clinical trial data
Rationale: The use of narrow-term groupings for analysis provided slightly better results for signal detection compared with the analysis based on MedDRA® PTs alone [75]

3 Discussion

These recommendations are presented to highlight the outcomes of the research conducted under the auspices of IMI PROTECT Work Package 3 and in such a way that pharmacovigilance professionals, particularly those with an interest in research and methods development can readily adopt appropriate potential improvements in their quantitative signal detection practices. They should not be considered a comprehensive treatise on the subject of quantitative signal detection but should be considered by
readers within the context of the entire body of signal detection research and guidance documents that exist on good signal detection practice [7, 77] outside of IMI PROTECT. For example, PROTECT’s focus on quantitative signal detection methods means that recommendations relating to the relative merits of quantitative vs. qualitative signal detection methods cannot be made and pharmacovigilance organisations are likely to have to continue with both approaches to maintain an optimal signal detection capability. Although a broad cross-section of databases [78] and data sources were employed in the work package, the generalisability of recommendations to other databases and data sources should be considered carefully before implementing them, particularly in databases of adverse event reports smaller than those used in this project.

An important strength of PROTECT’s signal detection research was the execution of standardised analysis protocols across multiple spontaneous reporting datasets [35, 45]. Several of these studies compared the databases of pharmaceutical companies, national regulatory authorities, and international organisations, such as the EMA and the WHO.

PROTECT found no overall benefit in conducting signal detection using MedDRA® HLTs or SMQs compared with using individual PTs [9]. Some relatively minor gain in time to signalling was seen when closely related (in a clinical sense) ADR terms were grouped together, an area for potential future research. This is compatible with an earlier study of the US FDA Adverse Event Reporting System, which found that analysis at the level of HLTs or SMQs decreased sensitivity but increased specificity. Its reference dataset comprised drug-event combinations with some degree of support in the literature that were not on the original drug label, but it did not consider timeliness [8]. PROTECT showed that knowledge engineering techniques can be used to derive novel groupings of adverse event terms dynamically based on the relevant semantic dimensions for a specific topic of interest, although no net improvement in signal detection performance was seen in these studies [16, 17]. It is interesting to speculate whether it may be possible to go beyond this restricted example and generate alternatives to the standard groupings available in existing adverse event terminologies, such as MedDRA® HLTs or SMQs.

One of the tangible outcomes of PROTECT work package 3 is the structured database of MedDRA® coded ADR information taken from section 4.8 of the SPC for all products authorised in Europe through the centrally authorised procedure [25]. This database is publicly available on the EMA website and is maintained. It reduces the need for manual inspection of SPCs when the focus of safety monitoring is detection of new risks. Its use has already been tested and implemented into signal detection processes at the EMA and the UMC.

In a broad study across spontaneous report databases, PROTECT showed that the choice of signal detection algorithm (e.g. threshold on the number of reports, the threshold on the disproportionality statistic, and/or statistical significance) was much more important than the choice of disproportionality statistic itself [35]. Performance of any single algorithm might be very different between one spontaneous report database and another but the relative performance of two algorithms was generally similar across different databases. In a related study across almost the same range of databases, the use of stratification and subgroup analysis in disproportionality analysis was explored [45]. An earlier study by Caster et al. [37] showed that the use of subgroup analyses and stratification both out-performed crude disproportionality analysis although the relative contributions of each approach were not determined. In the PROTECT study, it was shown that subgroup analyses tended to be more beneficial than stratified analyses across all datasets studied. Subgroup analyses also provided clear benefits over crude analyses in some datasets whereas stratified analyses did not increase sensitivity or precision beyond random variation. This unexpected finding, which has not been reported elsewhere, has important implications because a number of organisations routinely use stratification in their analyses, but very few routinely employ subgrouping. Other previous studies [36, 38] have observed modest improvements for stratified analyses consistent with the results from the stratified analyses in the PROTECT study and therefore these findings from the other studies may also be artefacts from the stratification process rather than a true effect.

PROTECT explored the impact of masking on statistical signal detection in spontaneous report databases [49, 50]. Under the conditions of the study, it was rare for masking to affect whether a drug-event pair was considered to be disproportionally reported or not; however, the drug-event pairs that were affected in this way primarily involved rarely reported ADRs. Furthermore, the study only considered removal of single masking drugs from the calculation and in some cases, multiple masking drugs may be present [51].

For the detection of adverse DDIs, PROTECT showed that statistical interaction measures with additive baseline models outperform those with multiplicative baseline models [55], typically available in standard statistical software. This finding was true for both established and emerging DDIs. Notably, for emerging adverse DDIs, the statistical interaction measures with multiplicative baseline models that might be the easiest to implement performed worse than would be expected by chance.
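
The distinction between additive and multiplicative baseline models for DDIs can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the interaction measure evaluated in [55]: it assumes that relative risks (or analogous relative reporting rates) are available for each drug alone and for the combination, all against a common baseline, and compares the observed combined effect with what each baseline model would expect in the absence of interaction. All function names and numbers are hypothetical.

```python
def expected_rr_additive(rr_a, rr_b):
    """Expected combined relative risk if the excess risks simply add."""
    return rr_a + rr_b - 1.0


def expected_rr_multiplicative(rr_a, rr_b):
    """Expected combined relative risk if the relative risks multiply."""
    return rr_a * rr_b


def excess_over_additive(rr_ab, rr_a, rr_b):
    """Observed minus additive-expected; positive values suggest interaction."""
    return rr_ab - expected_rr_additive(rr_a, rr_b)


def ratio_to_multiplicative(rr_ab, rr_a, rr_b):
    """Observed over multiplicative-expected; values above 1 suggest interaction."""
    return rr_ab / expected_rr_multiplicative(rr_a, rr_b)


# Hypothetical example: each drug alone raises the risk (2-fold and 3-fold),
# and the combination is observed at 5.5-fold.
rr_a, rr_b, rr_ab = 2.0, 3.0, 5.5

additive_excess = excess_over_additive(rr_ab, rr_a, rr_b)
multiplicative_ratio = ratio_to_multiplicative(rr_ab, rr_a, rr_b)

print(additive_excess)       # positive: more than additive
print(multiplicative_ratio)  # below 1: less than multiplicative
```

In this hypothetical case the additive baseline expects a 4-fold risk, so the observed 5.5-fold risk counts as an excess, whereas the multiplicative baseline expects 6-fold and would not flag the combination. The more demanding multiplicative expectation is one possible reason why such measures could miss interactions between drugs that each carry some risk on their own.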
Many organisations rely on rule-based methods for the detection of duplicate individual case safety reports; however, PROTECT showed that probabilistic record matching performed better than rule-based screening [57] and should be considered as a viable alternative. Specifically, probabilistic record matching demonstrated a high predictive value, above that of rule-based screening, and is expected to improve efficiency and accuracy of duplicate management. This has important resource and quality implications for organisations where large volumes of case reports are exchanged on a routine basis.

A comparison between estimates of association from formal epidemiological studies and proportional reporting ratios in spontaneous reporting data for a set of known ADRs found a correlation, at a point in time before the ADR was first publicly recognised [59]. This study suggests that it may be possible to use the proportional reporting ratio at the early phase of the analysis of a new safety signal as an indicator of the likely strength of the association, should the signal be confirmed. Acknowledged limitations exist in the current evidence base for this association and have been discussed in an earlier publication [60].

At present, signal detection is predominantly based on spontaneous reports, but the use of longitudinal electronic health data in pharmacovigilance is an area of active research. Most studies to date have focussed on the statistical evaluation of well-established ADRs [79, 80] but have not sought to define processes for effective identification of emerging safety signals in longitudinal health data. Similarly, comparisons between individual case reports and longitudinal health data for signal detection have focussed on established and not emerging ADRs. Broad studies such as those performed by the Observational Medical Outcomes Partnership and EU-ADR have explored the merits of different epidemiological designs when applied automatically across broad ranges of drugs and outcomes [80–83] but such studies have been primarily retrospective in nature, and there is also a lack of studies to determine the relative merits of exploratory screening vs. customised confirmatory analyses of longitudinal health data. Some research has proposed signalling outputs that are highlighted in both spontaneous reports and observational data [84] but without attempting to assess the relative value of the two data sources.

PROTECT performed research on statistical signal detection in the THIN database of longitudinal electronic health records from general practices in the UK. A process for structured clinical and epidemiological assessment of temporally associated prescriptions and events in electronic health records was developed and evaluated. It showed that important potential safety signals can be identified in these data, but that clinical and epidemiological review of highlighted statistical associations is crucial to attain an acceptable false-positive rate [72]. Conversely, a retrospective evaluation did not detect any of about 500 historical safety signals in THIN, prior to the initial signal at the EMA. In many cases, this was because of the drug not being reliably captured in primary care data, and on a few occasions because the drugs had not yet been marketed in the UK. In contrast, some of the ADRs could be detected in VigiBase, even when the analysis was restricted to spontaneous reports from the UK. This shows that comprehensive surveillance for early safety signals requires broad population coverage as well as effective ascertainment of a wide spectrum of newly marketed drugs and adverse events. Concurrent research has found that even networks of longitudinal observational databases can be underpowered for rare adverse reactions, whereas common adverse reactions should be possible to detect for commonly used drugs [85]. There is an increasing number of observational databases available throughout the world for potential pharmacoepidemiology and signal detection work, each with widely varying characteristics, data structure and data quality concerns. It would be necessary to repeat the PROTECT analyses in these databases to determine to what extent these findings are generalisable.

Before approval of a drug, information on adverse events from clinical trials constitutes the primary basis for safety analysis and signal detection. PROTECT explored two statistical approaches to enhancing signal detection in clinical trials. One study explored the utility of extreme value modelling in early clinical studies as the basis for predicting drug toxicity in the subsequent phases of clinical development and evaluation [74]. A retrospective analysis showed that extreme value analysis of phase II data would have highlighted the risk for liver toxicity for a compound eventually withdrawn from the market on account of this risk. A second study evaluating different approaches to adjust for multiplicity found that Bayesian Hierarchical Models can improve signal detection performance through borrowing strength from related adverse events in the clinical trial dataset [75].

4 Conclusions

Over a period of 5 years, IMI PROTECT has addressed key research questions relevant to the science of safety signal detection. The resultant recommendations point to pragmatic steps that those working in the pharmacovigilance community can take to improve signal detection practices, whether in a national or international agency or a pharmaceutical company setting. PROTECT has also pointed to areas of potentially fruitful future research and some areas where further effort is likely to yield less.
Acknowledgments The views expressed in this paper are those of the authors only and not of their respective institution or company. The following persons contributed to research within the various work packages that form the basis for these recommendations. All contributors were invited to review the final draft manuscript of this article prior to submission.
Christine Ahlers (3); David Ansell (4); Ramin Arani (2); Alex Asiimwe (10, formerly 2); Andrew Bate (15); Dorthe Bech Fink (15); Tomas Bergvall (18); Fatima Bhayat (17, formerly 2); Cedric Bousquet (9); Gunnar Brobert (3); Andreas Brueckner (14, formerly 3); Gianmario Candore (6); Benedicte Cappelli (6); Ola Caster (18); Susanna Cederholm (18); Gunnar Declerck (9); Susan Duke (8); Marie Dupuch (9); Ralph Edwards (18); Johan Ellenius (18); Cristina Fernandez (1); Natalia Grabar (9); Vincent Guy Bauchau (8); Manfred Hauben (15); Elke Heer-Klopottek (3); Richard Hill (18); Johan Hopstadius (18); Marie-Christine Jaulent (9); Kristina Juhlin (18); Ghazaleh Karimi (18); Michael Kayser (3); Mona Vestergaard Laursen (15); Edurne Lazaro (1); Magnus Lerch (3); Hanna Lindroos (18); Miguel Macia-Martinez (1); Francois Maignen (6); Anngret Mallick (3); Katrin Manlik (3); G. Niklas Norén (18); Nils Opitz (3); Klas Östlund (18); Jeffrey Painter (8); Antoine Pariente (9); Vlasta Pinkston (8); Jutta Pospisil (3); Annette Prelle (3); Naashika Quarcoo (8); Gilly Roberts (8); Marietta Rottenkolber (11); Anette Sahlin (18); Lovisa Sandberg (18); Ruth Savage (18); Suzie Seabroke (13); Jim Slattery (6); Daniel Soeria-Atmadja (18); Maria Montserrat Soriano Gabarro (3); Harry Southworth (2); Julien Souvignet (9); Kristina Star (18); Johanna Strandell (18); Torbjörn Sund (18); Bharat Thakrar (16); Mary Thompson (4); Bruno Tran (17); Phil Tregunno (13); Béatrice Trombert (9); Lionel Van Holle (8); Harald Vangerow (10); Taner Vardar (12); Sarah Watson (18); Arnold Willemsen (7); Antoni Wisniewski (2); Jenny Wong (13).
1. AEMPS, Agencia Española de Medicamentos y Productos Sanitarios; 2. AZ, AstraZeneca; 3. Bayer, Bayer HealthCare AG; 4. Cegedim, Cegedim (prev. EPIC); 5. DHMA, Danish Health and Medicines Authority; 6. EMA, European Medicines Agency; 7. Genzyme, Genzyme Europe B.V.; 8. GSK, GlaxoSmithKline; 9. INSERM, Institut National de la Santé et de la Recherche Médicale; 10. Lilly, Eli-Lilly and Company; 11. LUM, Ludwig-Maximilians-Universität-München; 12. Merck, Merck KGaA; 13. MHRA, Medicines and Healthcare Products Regulatory Agency; 14. Novartis, Novartis AG; 15. Pfizer, Pfizer Inc.; 16. Roche, F. Hoffmann-La Roche Ltd; 17. Takeda; 18. UMC, Uppsala Monitoring Centre

Compliance with Ethical Standards

Sources of Funding The PROTECT project has received support from the Innovative Medicine Initiative Joint Undertaking (http://www.imi.europa.eu) under Grant Agreement No. 115004, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007–2013) and European Federation of Pharmaceutical Industries and Associations companies’ in kind contribution.

Conflicts of Interest The following authors have declared no potential conflicts of interest: Cedric Bousquet, Gianmario Candore, Kristina Juhlin, Miguel Macia-Martinez, Suzie Seabroke, Jim Slattery, Phil Tregunno, G. Niklas Norén. Antoni Wisniewski is an employee and shareholder of AstraZeneca; Andrew Bate is an employee and shareholder of Pfizer Inc.; Andreas Brueckner is an employee of Novartis and formerly of Bayer Pharma AG; Katrin Manlik is an employee of Bayer Pharma AG; Naashika Quarcoo and Lionel Van Holle are both employees and shareholders of GlaxoSmithKline; Bharat Thakrar is an employee and shareholder of Roche; Michael Kayser is an employee, a shareholder and is listed as an inventor in patents of Bayer Pharma AG; Harry Southworth was an employee of AstraZeneca for some of the time working on PROTECT and has since worked as a consultant to the pharmaceutical sector.

Open Access This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Patwary KM. Report on statistical aspects of the pilot research project for international drug monitoring. Geneva: World Health Organisation; 1969.
2. Mandel SP, Levine A, Beleno GE. Signalling increases in reporting in international monitoring of adverse reactions to therapeutic drugs. Methods Inf Med. 1976;15:1–10.
3. Inman WH. Monitoring of adverse reactions to drugs in the United Kingdom. Proc R Soc Med. 1970;63:1302–4.
4. Napke E, Bishop J. The Canadian drug adverse reaction reporting program. Can Med Assoc J. 1966;95:1307.
5. Finney DJ. Statistical logic in the monitoring of reactions to therapeutic drugs. Methods Inf Med. 1971;10:237–45.
6. Hauben M, Noren GN. Editorial: a decade of data mining and still counting. Drug Saf. 2010;33:527–34.
7. CIOMS Working Group VIII. Practical aspects of signal detection in pharmacovigilance. 2010.
8. Pearson RK, Hauben M, Goldsmith DI, Gould AL, Madigan D, O’Hara DJ, Reisinger SJ, Hochberg AM. Influence of the MedDRA hierarchy on pharmacovigilance data mining results. Int J Med Inform. 2009;78:e97–103.
9. Hill R, Hopstadius J, Lerch M, Noren GN. An attempt to expedite signal detection by grouping related adverse reaction terms. Drug Saf. 2012;35:1194–5.
10. Trifiro G, Pariente A, Coloma PM, Kors JA, Polimeni G, Miremont-Salame G, Catania MA, Salvo F, David A, Moore N, Caputi AP, Sturkenboom M, Molokhia M, Hippisley-Cox J, Acedo CD, van der Lei J, Fourrier-Reglat A, EU-ADR group. Data mining on electronic health record databases for signal detection in pharmacovigilance: which events to monitor? Pharmacoepidemiol Drug Saf. 2009;18:1176–84.
11. Alvarez Y, Hidalgo A, Maignen F, Slattery J. Validation of statistical signal detection procedures in EudraVigilance post-authorization data: a retrospective evaluation of the potential for earlier signalling. Drug Saf. 2010;33:475–87.
12. Noren GN, Hopstadius J, Bate A. Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery. Stat Methods Med Res. 2013;22:57–69.
13. Trombert-Paviot B, Rodrigues JM, Rogers JE, Baud R, Van Der Haring E, Rassinoux AM, Abrial V, Clavel L, Idir H. GALEN: a third generation terminology tool to support a multipurpose national coding system for surgical procedures. Int J Med Inform. 2000;58–59:71–85.
14. Tudorache T, Falconer S, Nyulas C, Storey MA, Ustun TB, Musen MA. Supporting the Collaborative Authoring of ICD-11 with WebProtege. In: AMIA Annual Symposium Proceedings; 2010. p. 802–806.
15. Bousquet C, Lagier G, Lillo-Le Louet A, Le Beller C, Venot A, Jaulent M. Appraisal of the MedDRA conceptual structure for describing and grouping adverse drug reactions. Drug Saf. 2005;28:19–34.
16. Souvignet J, Declerck G, Jaulent M, Bousquet C. Evaluation of automated term groupings for detecting upper gastrointestinal bleeding signals for drugs. Drug Saf. 2012;35:1195–6.
17. Souvignet J, Declerck G, Trombert B, Rodrigues JM, Jaulent MC, Bousquet C. Evaluation of automated term groupings for detecting anaphylactic shock signals for drugs. In: AMIA Annual Symposium Proceedings; 2012. p. 882–890.
18. Bousquet C, Sadou E, Souvignet J, Jaulent MC, Declerck G. Formalizing MedDRA to support semantic reasoning on adverse drug reaction terms. J Biomed Inform. 2014;49:282–91.
19. Declerck G, Bousquet C, Jaulent MC. Automatic generation of MedDRA terms groupings using an ontology. Stud Health Technol Inform. 2012;180:73–7.
20. European Union. Notice to applicants A guideline on summary of product characteristics (SmPC) revision 2. 2009. http://ec.europa.eu/health/files/eudralex/vol-2/c/smpc_guideline_rev2_en.pdf. Accessed 15 Aug 2015.
21. Bergvall T, Dahlberg G, Cappelli B, Norén G. Fuzzy text matching to identify known adverse drug reactions. Pharmacoepidemiol Drug Saf. 2011;20(S1):S143. doi:10.1002/pds.2206.
22. Hauben M, Reich L, Chung S. Postmarketing surveillance of potentially fatal reactions to oncology drugs: potential utility of two signal-detection algorithms. Eur J Clin Pharmacol. 2004;60:747–50.
23. European Medicines Agency. Guideline on the use of statistical signal detection methods in the Eudravigilance data analysis system. http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2009/11/WC500011437.pdf. Accessed 31 Jan 2016.
24. Boyce RD, Ryan PB, Noren GN, Schuemie MJ, Reich C, Duke J, Tatonetti NP, Trifiro G, Harpaz R, Overhage JM, Hartzema AG, Khayter M, Voss EA, Lambert CG, Huser V, Dumontier M. Bridging islands of information to establish an integrated knowledge base of drugs and health outcomes of interest. Drug Saf. 2014;37:557–67.
25. IMI PROTECT ADR Database. http://www.imi-protect.eu/methodsRep.shtml. Accessed 17 Mar 2014.
26. EMA. Database structure. EMA. 2012. http://www.imi-protect.eu/documents/Databasestructure.pdf?_sm_au_=i2VfNqLvSJjQsqsH. Accessed 18 Jan 2015.
27. EMA EudraLex. Volume 2: pharmaceutical legislation notice to applicants and regulatory guidelines medicinal products for human use. European Commission. http://ec.europa.eu/health/documents/eudralex/vol-2/index_en.htm. Accessed 18 Jan 2015.
28. Slattery J, Cappelli B, Bergvall T, Opitz N, Kurz X. A structured database of adverse drug reaction based on information from the summary of product characteristics. Drug Saf. 2013;35:1192–3.
29. Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol. 1998;54:315–21.
30. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am Stat. 1999;53:177–90.
31. Evans SJW, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf. 2001;10:483–6.
32. Bate A, Evans SJW. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol Drug Saf. 2009;18:427–36.
33. Van Puijenbroek EP, Bate A, Leufkens HGM, Lindquist M, Orre R, Egberts ACG. A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiol Drug Saf. 2002;11:3–10.
34. Gipson G. A shrinkage-based comparative assessment of observed-to-expected disproportionality measures. Pharmacoepidemiol Drug Saf. 2012;21:589–96.
35. Candore G, Juhlin K, Manlik K, Thakrar B, Quarcoo N, Seabroke S, Wisniewski A, Slattery J. Comparison of statistical signal detection methods within and across spontaneous reporting databases. Drug Saf. 2015;38:577–87.
36. Hopstadius J, Noren GN, Bate A, Edwards IR. Impact of stratification on adverse drug reaction surveillance. Drug Saf. 2008;31:1035–48.
37. Caster O, Juhlin K, Watson S, Norén GN. Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank: retrospective evaluation against emerging safety signals. Drug Saf. 2014;37:617–28.
38. Van Holle L, Bauchau V. Optimization of a quantitative signal detection algorithm for spontaneous reports of adverse events post immunization. Pharmacoepidemiol Drug Saf. 2013;22:477–87.
39. Hopstadius J, Norén GN. Robust discovery of local patterns: subsets and stratification in adverse drug reaction surveillance. In: Proceedings of the 2nd ACM SIGHIT international health informatics symposium; 2012. p. 265–274.
40. Woo EJ, Ball R, Burwen DR, Braun MM. Effects of stratification on data mining in the US Vaccine Adverse Event Reporting System (VAERS). Drug Saf. 2008;31:667–74.
41. Zeinoun Z, Seifert H, Verstraeten T. Quantitative signal detection for vaccines: effects of stratification, background and masking on GlaxoSmithKline’s spontaneous reports database. Hum Vaccines. 2009;5:599–607.
42. Almenoff JS, LaCroix KK, Yuen NA, Fram D, DuMouchel W. Comparative performance of two quantitative safety signalling methods: implications for use in a pharmacovigilance department. Drug Saf. 2006;29:875–87.
43. Grundmark B, Holmberg L, Garmo H, Zethelius B. Reducing the noise in signal detection of adverse drug reactions by standardizing the background: a pilot study on analyses of proportional reporting ratios-by-therapeutic area. Eur J Clin Pharmacol. 2014;70:627–35.
44. De Bie S, Verhamme KMC, Straus SMJM, Stricker BHC, Sturkenboom MCJM. Vaccine-based subgroup analysis in VigiBase: effect on sensitivity in paediatric signal detection. Drug Saf. 2012;35:335–46.
45. Seabroke S, Candore G, Juhlin K, Quarcoo N, Wisniewski A, Arani R, Painter J, Tregunno P, Norén GN, Slattery J. Performance of stratified and subgrouped disproportionality analyses in spontaneous databases. Drug Saf. 2016. doi:10.1007/s40264-015-0388-3.
46. Pariente A, Didailler M, Avillach P, Miremont-Salame G, Fourrier-Reglat A, Haramburu F, Moore N. A potential competition bias in the detection of safety signals from spontaneous reporting databases. Pharmacoepidemiol Drug Saf. 2010;19:1166–71.
47. Pariente A, Avillach P, Salvo F, Thiessard F, Miremont-Salame G, Fourrier-Reglat A, Haramburu F, Bégaud B, Moore N. Effect of competition bias in safety signal generation: analysis of a research database of spontaneous reports in France. Drug Saf. 2012;35:855–64.
48. Juhlin K, Ye X, Star K, Noren GN. Outlier removal to uncover patterns in adverse drug reaction surveillance: a simple unmasking strategy. Pharmacoepidemiol Drug Saf. 2013;22:1119–29.
49. Maignen F, Hauben M, Hung E, Van Holle L, Dogne J-M. A conceptual approach to the masking effect of measures of disproportionality. Pharmacoepidemiol Drug Saf. 2014;23:208–17.
50. Maignen F, Hauben M, Hung E, Van Holle L, Dogne J. Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases. Pharmacoepidemiol Drug Saf. 2014;23:195–207.
51. Wang H, Hochberg AM, Pearson RK, Hauben M. An experimental investigation of masking in the US FDA adverse event reporting system database. Drug Saf. 2010;33:1117–33.
52. Noren GN, Sundberg R, Bate A, Edwards IR. A statistical methodology for drug-drug interaction surveillance. Stat Med. 2008;27:3057–70.
53. Thakrar BT, Grundschober SB, Doessegger L. Detecting signals of drug-drug interactions in a spontaneous reports database. Br J Clin Pharmacol. 2007;64:489–95.
54. Strandell J, Caster O, Hopstadius J, Edwards IR, Noren GN. The development and evaluation of triage algorithms for early discovery of adverse drug interactions. Drug Saf. 2013;36:371–88.
55. Soeria-Atmadja D, Juhlin K, Thakrar B, Norén G. Evaluation of statistical measures for adverse drug interaction surveillance. Pharmacoepidemiol Drug Saf. 2014;23(S1):294–5.
56. Hauben M, Reich L, DeMicco J, Kim K. ‘Extreme duplication’ in the US FDA adverse events reporting system database. Drug Saf. 2007;30:551–4.
57. Tregunno PM, Fink DB, Fernandez-Fernandez C, Lazaro-Bengoa E, Noren GN. Performance of probabilistic method to detect duplicate individual case safety reports. Drug Saf. 2014;37:249–58.
58. Noren GN, Orre R, Bate A, Edwards IR. Duplicate detection in adverse drug reaction surveillance. Data Min Knowl Discov. 2007;14:305–28.
59. Maciá-Martínez M, de Abajo FJ, Roberts G, Slattery J, Thakrar B, Wisniewski AFZ. An empirical approach to explore the relationship between measures of disproportionate reporting and relative risks from analytical studies. Drug Saf. 2015. doi:10.1007/s40264-015-0351-3.
60. Evans SJW. What is the plural of a ‘yellow’ anecdote? Drug Saf. 2015. doi:10.1007/s40264-015-0368-7.
61. Norén GN, Hopstadius J, Bate A, Star K, Edwards IR. Temporal pattern discovery in longitudinal electronic patient records. Data Min Knowl Discov. 2010;20:361–87. doi:10.1007/s10618-009-0152-3.
62. Choi N, Chang Y, Choi YK, Hahn S, Park B. Signal detection of rosuvastatin compared to other statins: data-mining study using national health insurance claims database. Pharmacoepidemiol Drug Saf. 2010;19:238–46.
63. Dore DD, Seeger JD, Chan KA. Use of a claims-based active drug safety surveillance system to assess the risk of acute pancreatitis with exenatide or sitagliptin compared to metformin or glyburide. Curr Med Res Opin. 2009;25:1019–27.
64. Curtis JR, Cheng H, Delzell E, Fram D, Kilgore M, Saag K, Yun H, Dumouchel W. Adaptation of Bayesian data mining algorithms to longitudinal claims data: Coxib safety as an example. Med Care. 2008;46:969–75.
65. Jin HW, Chen J, He H, Williams GJ, Kelman C, O’Keefe CM. Mining unexpected temporal associations: applications in detecting adverse drug reactions. IEEE Trans Inf Technol Biomed. 2008;12:488–500.
66. Noren GN, Hopstadius J, Bate A, Edwards IR. Safety surveillance of longitudinal databases: methodological considerations. Pharmacoepidemiol Drug Saf. 2011;20:714–7.
67. Schuemie MJ. Methods for drug safety signal detection in longitudinal observational databases: LGPS and LEOPARD. Pharmacoepidemiol Drug Saf. 2011;20:292–9.
68. Walker AM. Signal detection for vaccine side effects that have not been specified in advance. Pharmacoepidemiol Drug Saf. 2010;19:311–7.
69. Moore TJ, Furberg CD. Electronic health data for postmarket surveillance: a vision not realized. Drug Saf. 2015;38:601–10.
70. Bate A, Brown EG, Goldman SA, Hauben M. Terminological challenges in safety surveillance. Drug Saf. 2012;35:79–84.
71. Edwards I, Karimi G, Bergvall T, Caster O, Asiimwe A, Star K, Bhayat F, Soriano-Gabarro M, Thakrar B, Thompson M, Bate A, Norén G. Risk identification in healthcare records: comparison to epidemiological studies. Pharmacoepidemiol Drug Saf. 2013;22(S1):311–2.
72. Cederholm S, Hill G, Asiimwe A, Bate A, Bhayat F, Persson Brobert G, Bergvall T, Ansell D, Star K, Norén GN. Structured assessment for prospective identification of safety signals in electronic medical records: evaluation in The Health Improvement Network. Drug Saf. 2015;38:87–100.
73. Trifiro G, Patadia V, Schuemie MJ, Coloma PM, Gini R, Herings R, Hippisley-Cox J, Mazzaglia G, Giaquinto C, Scotti L, Pedersen L, Avillach P, Sturkenboom MC, van der Lei J. EU-ADR healthcare database network vs. spontaneous reporting system database: preliminary comparison of signal detection. Stud Health Technol Inform. 2011;166:25–30.
74. Southworth H. Predicting potential liver toxicity from phase 2 data: a case study with ximelagatran. Stat Med. 2014;33:2914–23.
75. Ahlers C, Brueckner A, Mallick A, Opitz N, Pinkston V, Tran B, Scott J, Southworth H, Van Holle L, Wallis N. Statistical signal detection in Clinical Trial data. http://www.imi-protect.eu/documents/PROTECTsymposium_signaldetectioninClinicaltrials_talk.pdf. Accessed 12 Dec 2015.
76. Southworth H, Heffernan JE. Multivariate extreme value modelling of laboratory safety data from clinical studies. Pharm Stat. 2012;11:367–72.
77. European Medicines Agency (EMA) Guideline on good pharmacovigilance practices (GVP) Module IX: signal management. 2012.
78. Wisniewski AFZ, Juhlin K, Laursen M, Macia MM, Manlik K, Pinkston VK, Seabroke S, Slattery J. Characterisation of databases (DBS) used for signal detection (SD): results of a survey of IMI PROTECT work package (WP) 3 participants. Pharmacoepidemiol Drug Saf. 2012;21:233–4.
79. Ryan PB, Madigan D, Stang PE, Marc-Overhage J, Racoosin JA, Hartzema AG. Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Stat Med. 2012;31:4401–15.
80. Coloma PM, Schuemie MJ, Trifiro G, Gini R, Herings R, Hippisley-Cox J, Mazzaglia G, Giaquinto C, Corrao G, Pedersen L, Van Der Lei J, Sturkenboom M. Combining electronic healthcare databases in Europe to allow for large-scale drug safety monitoring: the EU-ADR Project. Pharmacoepidemiol Drug Saf. 2011;20:1–11.
81. Stang PE, Ryan PB, Racoosin JA, Overhage JM, Hartzema AG, Reich C, Welebob E, Scarnecchia T, Woodcock J. Advancing the science for active surveillance: rationale and design for the observational medical outcomes partnership. Ann Intern Med. 2010;153:600–6.
82. Schuemie MJ, Gini R, Coloma PM, Straatman H, Herings RMC, Pedersen L, Innocenti F, Mazzaglia G, Picelli G, Van Der Lei J, Sturkenboom MCJM. Replication of the OMOP experiment in Europe: evaluating methods for risk identification in electronic health record databases. Drug Saf. 2013;36(S159):S169.
83. Schuemie MJ, Coloma PM, Straatman H, Herings RMC, Trifiro G, Matthews JN, Prieto-Merino D, Molokhia M, Pedersen L, Gini R, Innocenti F, Mazzaglia G, Picelli G, Scotti L, Van Der Lei J, Sturkenboom MCJM. Using electronic health care records for drug safety signal detection: a comparative evaluation of statistical methods. Med Care. 2012;50:890–7.
84. Harpaz R, Vilar S, DuMouchel W, Salmasian H, Haerian K, Shah NH, Chase HS, Friedman C. Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions. J Am Med Inform Assoc. 2013;20:413–9.
85. Coloma PM, Trifirò G, Schuemie MJ, Gini R, Herings R, Hippisley-Cox J, Mazzaglia G, Picelli G, Corrao G, Pedersen L, van der Lei J, Sturkenboom M. Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiol Drug Saf. 2012;21:611–21.
