
All blue passages in the text are links that can be clicked (and, heaven forbid, even read as further material).

With the significant growth of government funding allocated worldwide to scientific research after the Second World War, the exponential growth in the number of scientific publications quickly made itself felt, a trend that accelerated toward the end of the 1960s. Although the question of how results from multiple scientific studies might be combined for a more complex understanding of a studied phenomenon had been discussed since the beginning of the 20th century, when Karl Pearson first published an attempt to summarize a set of medical studies, the market-driven need for complex mathematical modeling of disparate results from multiple independently published studies would only make itself felt after the Second World War.

If until the 1970s psychiatry and clinical psychology journals published few enough results that they could be evaluated individually by researchers interested in joining a debate (see the citations in Rosenhan and Spitzer, seminar 2; although clinical psychology studies had already been criticized by Jacob Cohen for having statistical power inadequate for the results they reported), in 1976 Gene Glass, an American statistician, introduced the term meta-analysis to describe the systematic mathematical modeling of disparate results published in a given area of interest. Quickly becoming widely used because of its evident epistemological power, meta-analysis just as quickly raised a series of difficult problems for researchers interested in such a procedure: the often very high heterogeneity of the reported results (caused by the significantly different methodologies and instruments used by the researchers publishing the various studies); publication bias, caused by the fact that scientific journals tend to be interested only in confirmatory results, while results confirming the null hypothesis often remain locked in researchers' drawers; and the variable quality of the studies exploring a topic of interest, which often introduces the so-called "garbage in, garbage out" risk (the fact that summarizing methodologically deficient studies can only lead to a false result).

Given the complexity of the mathematical modeling underlying a meta-analysis, Pim Cuijpers and Ioana Cristea offer, in this chapter from the volume Evidence-Based Practice in Action: Bridging Clinical Science and Intervention, a short introduction to how scientific data from mental health research are consumed and understood today.
Chapter 5

Systematic Reviews in Mental Health

Pim Cuijpers
Ioana A. Cristea

Every year, hundreds of thousands of biomedical articles are published, including more than 16,000 randomized controlled trials (RCTs) examining the effects of treatments (Cuijpers, 2016), and these numbers increase every year. With such large numbers of randomized trials, it becomes more and more difficult to maintain an overview of a field of interventions. Both clinical practitioners and researchers who want to stay "on top" of the latest empirical findings find that it is not possible to do so, despite the best of intentions. Consequently, reviews are needed to help clinicians and researchers make good use of the clinical intervention literature in their work. Reviews also help to inform clinical practice guidelines, which are the focus of Hollon (Chapter 6, this volume).

There are different types of reviews. Traditional (narrative) reviews are written by experts in a specific field, and they rely very much on the authority of the author, as well as the author's own perspective on what counts as relevant and how data should be interpreted. It is usually not clear how the studies included in the review were selected, whether all relevant outcomes for all studies are described, and how the quality of the studies was assessed. It is difficult, therefore, to verify the conclusions of such reviews. Also, the author's own biases and allegiance can play a significant role in influencing the conclusions of the review. In contrast, systematic reviews have a clear objective and try to answer a precise research question. Based on that question, the criteria for which studies should be included are defined, and orderly searches for these studies are conducted using a systematic and reproducible methodology. These reviews also assess the quality, the characteristics of the included studies, and the outcomes of the studies in a systematic way. Meta-analyses are a specific type of systematic review. The only difference with other systematic reviews is that the findings of the included studies are statistically integrated into estimates of the effects of the interventions being tested, and these are accompanied by an assessment of their statistical significance.

In this chapter, we focus mostly on traditional meta-analyses of outcome studies examining the effects of interventions. What we say about meta-analyses is also true for systematic reviews, except the parts about the statistical integration of the results. Meta-analyses, in principle, can integrate all outcomes of studies that have a standard error; therefore, they are not limited to studies about effects of interventions; for example, meta-analysis can be used to integrate the findings of longitudinal prospective studies that examine risk factors for a specific disorder. Many of the principles that we describe are also true for such meta-analyses, although some issues may differ, such as the formulation of the research question or the assessment of the validity of the included studies.

The results of meta-analyses are used by all stakeholders in mental health care. Policymakers use meta-analyses to decide about treatments that are included in health care programs and the financial coverage to be provided for such treatments. Patients use the results of meta-analyses to make decisions about whether they want a treatment or not. Researchers use meta-analyses to generate new research questions, to examine methodological limitations of existing trials, and to estimate sample sizes for future trials. Clinicians use meta-analyses for the development of decision-making tools and treatment guidelines, and they may use them to direct them to key individual studies (e.g., specific RCTs that may be particularly informative for some of their patients). Given the importance of meta-analyses in delivering (and improving) evidence-based practice, it is critical to develop the skills to interpret and make use of meta-analyses.

This chapter is designed to serve as your guide. We lead you, step by step, through the process of conducting a meta-analysis, so that you can get a detailed look "behind the scenes". We also highlight core questions to ask and dimensions to evaluate along the way, so that you can determine how much you can trust a meta-analysis to guide your clinical practice and research.

Advantages and Problems of Meta-Analyses

Integrating the results of multiple trials in a meta-analysis has several advantages (Cuijpers, 2016). First of all, because the results of many studies are combined into one effect size, the precision and accuracy with which an effect can be estimated is much better than in each of the included trials. This precision is better because the number of included participants is much larger, or in technical terms, the statistical power is higher. Also, meta-analyses can address questions that require large samples and would be hard to address with individual trials, including questions about moderators of treatment effects.

There are, however, also several problems with meta-analyses. One is often referred to as the "garbage in, garbage out" problem. This means that if the studies that are included in the meta-analysis are of low quality, the results of the meta-analysis also will be of low quality (although the meta-analysis in itself may be done very well). So a meta-analysis can never be better than the sum of the studies it summarizes.

Another problem is that meta-analyses "combine apples and oranges": Especially in mental health care, there are usually considerable differences between studies. For example, the exact inclusion criteria for participants, the recruitment methods, the characteristics of the participants, the manuals for the treatments, and the therapists delivering the intervention all may vary. Also, rarely are trials examining one intervention exact replications of another. Consequently, some critics say that the results of these studies cannot be integrated in a single meta-analysis.

A third problem of meta-analyses is researcher allegiance, or "agenda-driven" bias of the researchers who conduct the meta-analyses. Meta-analyses are often written by researchers who are strong supporters of the interventions they examine, and they may be inclined to stress the positive effects of those interventions. Fortunately, there are ways to address these potential problems, which we explore in detail in this chapter. This is good news, because systematic reviews and meta-analyses form the foundation of clinical guidelines, which are presented by Hollon (Chapter 6, this volume). Together, these types of publications are powerful tools for researchers and practitioners who want to use the existing evidence base to shape future studies or the care that they deliver in their clinical practices.

Formulating Research Questions for Meta-Analyses with the PICO Acronym

Every study starts with a good research question. That is also true for meta-analyses. Research questions for meta-analyses are typically formulated with the use of the PICO acronym (although some investigators also recommend attention to "time" and "setting" and propose using the PICOTS framework). PICO stands for Participants, Interventions, Comparisons, and Outcomes.
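As an illustration of how the four PICO elements pin down a reviewable question, the following minimal Python sketch (ours, not the chapter's; the field names and example values are illustrative) represents a PICO question as a small data structure, using the insomnia example discussed just below:

```python
from dataclasses import dataclass

@dataclass
class PicoQuestion:
    participants: str   # P: who is studied
    intervention: str   # I: what is tested
    comparison: str     # C: what it is tested against
    outcomes: str       # O: how effects are measured

# The insomnia example discussed below (Trauer et al., 2015)
question = PicoQuestion(
    participants="adults with chronic insomnia",
    intervention="cognitive-behavioral therapy (CBT)",
    comparison="control groups",
    outcomes="sleep diary outcomes",
)
print(question)
```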
A research question for a meta-analysis could be, for example, "What is the efficacy of cognitive-behavioral therapy (CBT) on sleep diary outcomes, compared with control, for the treatment of adults with chronic insomnia?" (Trauer, Qian, Doyle, Rajaratnam, & Cunnington, 2015). All four elements of the PICO are in there: P = adults with chronic insomnia; I = CBT; C = control groups; O = sleep diary outcomes.

Just like randomized trials, meta-analyses focus on a contrast between an intervention and a comparator. Effect sizes and outcomes of trials and meta-analyses typically describe the difference between the intervention and the comparator after treatment. However, some meta-analyses also compare the difference between baseline and posttest within one group of participants receiving an intervention. In both of these cases, however, there is a comparison, either between two groups or between two moments in time.

Identifying Trials in Bibliographical Databases

Once a good research question has been articulated, the next step in conducting a meta-analysis is to identify trials in a systematic, reproducible way. The goal is to include all studies that are relevant and meet the inclusion criteria.

Included studies are identified most often through searches in bibliographical databases. For meta-analyses in mental health, at least three of these bibliographical databases should be searched. PubMed is a website that provides free access to Medline, life science journals, and online books. Medline is the National Library of Medicine's database of citations and abstracts in the fields of medicine, nursing, dentistry, veterinary medicine, health care systems, and preclinical sciences. PubMed now has 25 million citations and abstracts from more than 5,600 biomedical journals, and it is free for any user (www.ncbi.nlm.nih.gov/pubmed). PsycInfo is a bibliographical database from the American Psychological Association with almost 4 million bibliographic records from more than 2,500 scientific journals, books, and theses on the behavioral and social sciences. Unfortunately, it is not freely accessible. The Cochrane Central Register of Controlled Trials (CENTRAL) is the database from the Cochrane Collaboration and contains only randomized trials in the biomedical sciences. The Cochrane database identifies trials by searching other bibliographical databases and by hand-searching the contents of about 2,400 scientific journals. www.ClinicalTrials.gov is also a very useful source for identifying clinical trial protocols.

In addition to these core databases, there are many others that may be relevant to mental health. Embase (another general biomedical bibliographical database) includes many journals that are not included in PubMed. There are also many subject-specific databases, such as CINAHL (nursing science), BiblioMap (health promotion research), ERIC (education), and AgeLine (aging issues). Other databases include citation databases (e.g., ISI Web of Knowledge, Scopus, and Google Scholar) and national and regional databases (e.g., LILACS from Latin America, IndMED from India, and several Chinese databases; Xia, Wright, & Adams, 2008).

Relevant studies can also be identified using other methods, such as checking the references of included trials, identifying earlier meta-analyses to see which studies were included, hand-searching the contents of major journals of the field, searching conference proceedings, or contacting key experts in the field to check whether you missed studies.
Often, articles that appear outside of published journals are referred to as the "grey literature," and there are differences of opinion regarding the pros and cons of including such articles in a meta-analysis. As a reader of meta-analyses, what is critical is that the meta-analysis clearly describes the search strategy and the decisions about what types of studies are and are not included. In addition, it is helpful if the meta-analysis is registered, so that one can determine that these basic methods were applied consistently throughout the meta-analytic process; one can check for the preregistration of systematic reviews through the database PROSPERO (www.crd.york.ac.uk/prospero), which is an international registry of health-related systematic reviews and meta-analyses.

Searching in Bibliographical Databases

Based on the PICO research question, inclusion and exclusion criteria for the trials that are the object of the meta-analysis are formulated. These typically describe the characteristics of the participants of the trials to be included, the interventions, the comparator, and the outcomes. The PICO terms are also used to develop search strings to identify studies in the bibliographical databases. In these searches, a balance has to be found between sensitivity and precision. Broad searches generate large numbers of records, but the chance of missing trials that meet inclusion criteria is small. Narrow searches result in a smaller number of records, but the chance that trials are missed is greater. The identification of search terms, based on the inclusion and exclusion criteria, that filter adequately is an extremely important part of the meta-analytic process. The search terms help to ensure the quality and similarity of included studies.

When searching in bibliographical databases, it is important to search for not only text words in the title and abstract but also key words. Every bibliographical database has a system of attaching key words to records using a thesaurus, and these key words are hierarchically structured into a taxonomy. For example, in PubMed, these key words are called MeSH terms (Medical Subject Headings).

Searches in bibliographical databases make intensive use of Boolean operators (like AND, OR, and NOT). Brackets can help with defining such search strings. Suppose, for example, that you want to conduct a meta-analysis on psychological treatment for generalized anxiety disorder and you need to do a search for individual RCTs. In that case, you could develop a search string combining terms for generalized anxiety disorder and treatment, and it might look like this: ("generalized anxiety" OR "worry*") AND (psychotherapy OR "cognitive behavior therapy" OR "interpersonal psychotherapy"). In this example, you can also see how truncation may be used. Truncation is a searching technique used in databases in which a word ending is replaced by a symbol, usually the asterisk (*). For example, if you use "worry*" as a search term in PubMed, you will find records with not only "worry" but also "worrying." Apart from truncation, wildcards ("?") can be used to replace a letter of a word. For example, the term "m?n" will identify records with the terms "man," "men," "min," "mun," and so forth.

Search filters are often used in searches in bibliographical databases (see the website of the "InterTASC Information Specialists' Sub-Group Search Filter Resource" (www.york.ac.uk/inst/crd/intertasc) for a useful overview of search filters for many different types of studies). When conducting meta-analyses of randomized trials, search filters for trials are often used. For example, PubMed has a useful MeSH term for randomized trials ("Randomized Controlled Trial" [Publication Type]).
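To make such a search reproducible, the query can be assembled and run programmatically. A minimal sketch, assuming the public NCBI E-utilities endpoint for PubMed (the query follows the generalized anxiety example above; the retmax value and JSON output format are our choices, not the chapter's):

```python
import json
import urllib.parse
import urllib.request

# Boolean search string from the example above, with truncation ("worry*")
# and the trial search filter mentioned in the text
query = ('("generalized anxiety" OR "worry*") AND '
         '(psychotherapy OR "cognitive behavior therapy" OR '
         '"interpersonal psychotherapy") AND '
         '"Randomized Controlled Trial"[Publication Type]')

# NCBI E-utilities esearch endpoint for PubMed (free, no login required)
url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
       + urllib.parse.urlencode({"db": "pubmed", "term": query,
                                 "retmax": 20, "retmode": "json"}))

with urllib.request.urlopen(url) as response:
    result = json.load(response)["esearchresult"]

print("records found:", result["count"])
print("first PubMed IDs:", result["idlist"])
```

Storing the exact query string alongside the meta-analysis (e.g., in its PROSPERO registration) lets an independent researcher rerun the same search later.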
As a reader of meta-analytic reviews, what this means is that it is critical to pay attention to the search terms used by the authors. The choice of search terms determines which studies are included or excluded. Knowing this helps one to determine how useful a meta-analysis will be in guiding one's clinical practice, given the types of patients and problems with which one works.

Selection of Studies

A published meta-analysis should have a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart, which delineates the process of selecting studies, from the bibliographical databases up until the inclusion of the studies in the meta-analysis. The PRISMA flowchart requires that the exact number of records found in bibliographical databases be reported, as well as the total number of full-text papers that are retrieved, the number of trials that meet inclusion criteria, and the reasons why full-text papers were excluded. This is to ensure that an independent researcher has all the necessary information to reproduce the search. As a reader of meta-analytic reviews, one might have cause for concern if one does not find a PRISMA diagram. It is challenging to assess the search and inclusion-exclusion process without this information. Also, it might indicate that the authors are not adhering to other generally accepted standards in the field.

Moreover, the records resulting from the searches should preferably be read by two independent researchers. During this process, those records possibly meeting inclusion criteria are selected, and the full texts of these records are retrieved. It is not required in this phase to indicate reasons why records are not selected. It is usually only reported that they were excluded based on the title and the abstract. After retrieval of the full-text papers, these should be read in order to see whether they meet inclusion criteria. This should again be done by two independent reviewers. This selection process results in a first list of studies to be included in the meta-analysis. The first list of studies probably is not the definitive list, because often, during the extraction of the data from the individual trials, it turns out that one of the inclusion criteria is not met after all or, for example, that it is not possible to calculate effect sizes because essential information is missing.

Data Extraction: Study Characteristics

When the decisions about the inclusion of studies have been made, the data extraction can begin. There are three types of data that have to be extracted from each included study: characteristics of the studies, risk of bias (or quality assessment), and the data that are needed for the calculation of effect sizes.

The characteristics of the studies are always summarized in a table in the paper describing the included studies. These descriptions typically follow the PICO of the meta-analysis and illustrate the key characteristics of the participants, the intervention that is examined, and the comparators. The outcomes are not always included in the descriptive table, because these are converted to effect sizes and reported in the results of the paper. There are no straightforward rules about which characteristics of the included studies are extracted. That depends on the subject of the meta-analysis, the included studies, and the exact research question.

Quality and Risk of Bias

Assessment of risk of bias is an essential part of any meta-analysis. It is directly related to the need to address concerns about the "garbage in, garbage out" problem. A meta-analysis can never be better than the sum of its parts, that is, the set of studies that it summarizes.
If the original studies have a high risk of bias, no meta-analysis, regardless of how sophisticated it is, can solve this problem.

There is a difference between quality and risk of bias, although the two concepts overlap. Quality indicates how well a study has been designed and conducted. What is good quality, however, is not so easy to define. There are many rating scales of study quality, but it is often not clear which concepts these scales measure; therefore, these scales vary considerably. In fact, a recent analysis (Armijo-Olivo, Fuentes, Ospina, Saltaji, & Hartling, 2013) of tools for evaluating the methodological quality of RCTs revealed inconsistencies among the tools themselves, and between the items in these tools and the Cochrane Collaboration Risk of Bias tool (RoB; Higgins et al., 2011).

Defining and assessing risk of bias is more straightforward than methodological quality. Bias is a systematic error in a study, or a deviation from the true or actual outcomes, in results or inferences. Risk of bias can be seen as denoting "weak spots" of randomized trials, where the researchers (usually without intention or even awareness) can influence the outcomes of the study. These weak spots do not automatically imply that there is bias; hence, it is more correct to talk about "risk of bias" instead of bias. Many meta-analyses use the Cochrane RoB assessment tool (described in the Cochrane Handbook for Systematic Reviews of Interventions [Higgins & Green, 2011], which gives an excellent overview of the different types of risk of bias; http://handbook.cochrane.org). There are different kinds of bias; here, we discuss five important areas of risk of bias: selection bias, detection bias, attrition bias, reporting bias, and allegiance bias.

Selection bias refers to systematic differences between the groups that were randomized in the trial. If the random assignment to conditions in trials (as reviewed by Kraemer & Periyakoil in Chapter 4, this volume) has not been done well, there may be systematic differences between participants in the intervention and the comparison group, respectively. Selection bias can be caused by errors in the randomization process. There are two "weak spots" in this process. The first "weak spot" is the generation of the order in which participants are assigned to conditions. This is called sequence generation. Using a random numbers table, a computerized random number generator, throwing dice, or tossing a coin are all valid ways of generating random numbers. Assigning participants by date of birth, date of admission, patient record number, or the judgment of a clinician, however, are not valid methods of randomizing. The second "weak spot" is allocation concealment. This means that the researchers and the participants cannot foresee the assignment, because foreseeing it could allow them to influence the process of randomization. The allocation to conditions should therefore be concealed as much as possible from researchers and participants. Some strategies for doing so include asking an independent person, who is not involved in the trial, to do the assignment to conditions, or making sequentially numbered, opaque, and sealed envelopes containing the condition to which the participant is assigned. In mental health research, until about 10 years ago the method of randomization was not described at all in most studies on mental health problems, and it was usually only reported that participants were randomized (Chen et al., 2014; Cuijpers, van Straten, Bohlmeijer, Hollon, & Andersson, 2010).

Another "weak spot" of randomized trials is detection bias, which refers to systematic differences between groups in how outcomes are assessed. Detection bias can be prevented by blinding (or masking) of participants, the personnel involved in the study, and outcome assessors.
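As a small illustration of valid sequence generation, a computerized random number generator can produce the full allocation order before the trial starts. The sketch below is our own (the block size and the fixed seed, chosen only so the sequence can be audited, are assumptions, not the chapter's method); it uses randomly permuted blocks so that group sizes stay balanced:

```python
import random

def blocked_allocation(n_blocks: int, seed: int = 2024) -> list:
    """Generate a randomized allocation sequence in blocks of 4
    (two 'intervention' and two 'control' slots per block)."""
    rng = random.Random(seed)  # seeded so the sequence can be audited
    sequence = []
    for _ in range(n_blocks):
        block = ["intervention", "intervention", "control", "control"]
        rng.shuffle(block)  # random order within each block
        sequence.extend(block)
    return sequence

# Allocation list for 20 participants; in practice the list would be held
# by an independent person to preserve allocation concealment.
print(blocked_allocation(n_blocks=5))
```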
In trials testing the effects of drugs, it is possible to blind the patients who participate. Patients receive a pill that may contain either the medication that is tested or a placebo. The placebo pill is exactly the same as the medication, but without the active substance. In psychological interventions, blinding of participants is usually not possible, because participants and the clinicians delivering the intervention typically know whether they are assigned to the intervention or to a control condition. It is therefore very possible that the effects of interventions are (partly) caused by the participants' expectations about the intervention. Unfortunately, there is no solution for this problem. It is nonetheless possible in trials on psychological interventions to blind assessors of outcome. If assessors are not blinded, they are inclined to assume that the participants who received the intervention are better off than the ones who did not, and therefore to overrate the outcomes of the intervention (Higgins & Green, 2011). In many psychological interventions, outcomes of a trial are assessed with self-report measures, and not through interviews with (blinded) assessors. Self-assessment of outcomes is also not blinded and may therefore result in bias, too.

Attrition bias refers to the bias that is caused by participants who drop out of the trial. Trials should include intent-to-treat analyses, which means that all participants who were randomized are included in the analyses of the outcome, regardless of whether they dropped out of the trial. Missing data from participants who dropped out should be imputed. There are good methods for imputation of missing data available, such as using the last observation that is available (last observation carried forward), multiple imputation techniques, or mixed models for repeated measurements (Crameri, von Wyl, Koemeda, Schulthess, & Tschuschke, 2015; Siddiqui, Hung, & O'Neill, 2009). There is, however, no consensus about whether any of these methods is better or whether there are differences between their results. Earlier trials on psychological interventions rarely used these methods.

Another type of bias in randomized trials is reporting bias. It often happens that more than one primary outcome is used to measure the effects of an intervention. If one of these outcomes shows better effects of the intervention than another outcome, researchers are sometimes inclined to report only the outcome showing the most favorable effects for the intervention. But that is wrong, because it results in an overestimation of the effects when the study is included in a meta-analysis. In recent years, many trials are registered in trial registries, or the design of the study is published in a protocol paper before the study has started (see, e.g., clinicaltrials.gov). Through these protocols, it can be verified whether the planned outcome measures were indeed used in the analyses.

These four types of bias can be assessed with the Cochrane RoB Assessment Tool, which has clear criteria for each of these types of risk of bias. Each type is scored as low risk of bias (when the paper clearly describes that this type of bias was handled well), high risk of bias (when the paper describes a procedure that indicates that the risk of bias is present), or unclear risk of bias (when the paper does not give enough information to say whether there was a risk of bias or not). Again, it is important that these assessments of risk of bias are done by two independent researchers when conducting the meta-analysis.

A final type of bias that is not included in the RoB Assessment Tool but is very important for psychological interventions is researcher allegiance. This can be defined as a researcher's "belief in the superiority of a treatment [and] ... the superior validity of the theory of change that is associated with the treatment" (Leykin & DeRubeis, 2009, p. 55).
There is considerable evidence that researcher allegiance is associated with better outcomes for the preferred treatment (Dragioti, Dimoliatis, Fountoulakis, & Evangelou, 2015; Munder, Brütsch, Leonhart, Gerger, & Barth, 2013). This may be because treatments are implemented with better fidelity by investigators who have allegiance to those treatments; however, the impact of researcher allegiance is not well understood or measured in the literature. One's confidence in the findings reported by a meta-analytic review should be based, in part, on the extent to which the authors have attended consistently, accurately, and transparently to questions of study quality and risk of bias. These considerations are protections for authors, readers, and the field at large when it comes to concerns about the "garbage in, garbage out" problem.

Calculating Effect Sizes

In meta-analyses, the effects of an intervention are statistically integrated. In order to do this, an effect size has to be calculated for each study that is included in the meta-analysis. The effect size is a way of quantifying the difference between groups. This effect size for each study has to be standardized in some way to make it comparable to the effect sizes of the other studies; otherwise, they cannot be integrated. In mental health research, usually Cohen's d or Hedges' g is used as the effect size. These effect sizes are based on continuous outcomes (that can take any value in a given range) and indicate the difference between the means of the two groups, in terms of the standard deviations of the two groups. An effect size of 0.5 means that the two groups differ from each other by 0.5 standard deviations.

Cohen's d can be calculated as the difference between the means of the two groups, divided by their pooled standard deviation. Hedges' g is calculated with the same formula, but with a somewhat different method of calculating the pooled standard deviation. For small samples, Hedges' g gives a more accurate estimate of the effect size. In order to calculate an effect size, the mean, the standard deviation, and the number of participants in each group are needed. If the mean and standard deviation are not given in one of the original studies to be included in a meta-analysis, other statistics, such as the t-value or the p-value, can be used to calculate the effect size.

One of the big advantages of effect sizes is that they allow us to see how large effects are. Significance testing of the difference between a group that received an intervention and a control group only indicates whether that difference is statistically significant, not how large that difference is. In contrast, an effect size says something about the size of the effects. Usually, effect sizes of 0.20 are considered to be small, 0.50 moderate, and 0.80 large. Based on several hundred meta-analyses of educational and psychological interventions, Lipsey and Wilson (1993) estimated that effect sizes less than d = 0.32 are small, those from 0.33 to 0.55 are moderate, and those greater than 0.56 are large.

It is important to remember, however, that the effect size is still a statistical concept and does not directly say something about the clinical relevance of the effects of an intervention. For example, an effect size of d = 0.1 would be considered very clinically relevant for an intervention aimed at improving survival, but this same effect size may not be considered clinically meaningful for an intervention aimed at improving social skills or knowledge about a mental health problem.

A big disadvantage of effect sizes is that they are difficult to explain to participants, clinicians, and policymakers. They also say very little about the chance that a patient will be free from a mental health problem after receiving a treatment.
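The d and g formulas described above translate directly into code. A minimal sketch (our own implementation of the standard formulas; the small-sample correction factor used for g is one common approximation, and the trial numbers are made up):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference: (m1 - m2) / pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d with a correction that reduces small-sample bias."""
    d = cohens_d(m1, sd1, n1, m2, sd2, n2)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # common approximation
    return d * correction

# Hypothetical trial: control minus intervention on a symptom scale
# (lower scores are better, so a positive value favors the intervention)
print(round(cohens_d(24.0, 8.0, 40, 20.0, 8.5, 42), 2))
print(round(hedges_g(24.0, 8.0, 40, 20.0, 8.5, 42), 2))
```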
Table 5.1. Possible Dichotomous Outcomes in an RCT

              Event (success)   No event (fail)   Total
  Therapy            A                 C          A + C
  Control            B                 D          B + D

  Odds ratio (OR) = (A × D) / (C × B)
      = odds of success in treatment group / odds of success in comparison group

  Relative risk (RR) = [A / (A + C)] / [B / (B + D)]
      = risk of success in treatment group / risk of success in comparison group

  Risk difference (RD) = [A / (A + C)] - [B / (B + D)]
      = risk in therapy group - risk in control group

  Number needed to treat (NNT) = 1 / RD
      = 1 divided by the risk difference
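The quantities in Table 5.1 can be computed directly from the four cell counts of a trial. A minimal sketch (our own, with made-up counts for illustration):

```python
def dichotomous_outcomes(a, b, c, d):
    """a, b = events in therapy/control; c, d = non-events (Table 5.1)."""
    odds_ratio = (a * d) / (c * b)
    risk_treatment = a / (a + c)
    risk_control = b / (b + d)
    relative_risk = risk_treatment / risk_control
    risk_difference = risk_treatment - risk_control
    nnt = 1 / risk_difference  # number needed to treat
    return odds_ratio, relative_risk, risk_difference, nnt

# Hypothetical trial: 30/50 responders with therapy, 15/50 with control
odds, rr, rd, nnt = dichotomous_outcomes(a=30, b=15, c=20, d=35)
print(f"OR={odds:.2f}  RR={rr:.2f}  RD={rd:.2f}  NNT={nnt:.1f}")
```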

One common way to address this communication problem is to transform effect sizes into the number needed to treat (NNT), which indicates the number of patients that would have to be treated in order to generate one additional positive outcome (Laupacis, Sackett, & Roberts, 1988). The NNT is much easier to understand than effect sizes. Although the NNT is based on dichotomous outcomes of trials (see below), it can still be estimated from effect sizes (based on continuous outcomes). There are several ways of converting an effect size to the NNT, all of which assume that the mean scores follow a normal or near-normal distribution (da Costa et al., 2012; Furukawa & Leucht, 2011).

Although most meta-analyses in mental health care focus on continuous outcomes, such as depressive or anxiety symptom severity, and effect sizes such as Cohen's d and Hedges' g, many important mental health outcomes are dichotomous: for example, the number or proportion of responders to treatment, the number of patients who relapse, and the number of participants dropping out of an intervention are dichotomous outcomes. Dichotomous outcomes are much easier to understand for patients and clinicians, as a patient either responds to a treatment or not, or has a relapse or not.

Dichotomous effects of treatments in trials are usually expressed in terms of relative risks (RRs) or odds ratios (ORs). The RR is the "risk" of an outcome for participants in the intervention group, divided by the risk in the comparison group. The exact formula for the RR is given in Table 5.1. In other words, the RR is the proportion of participants with an outcome in the intervention group, divided by the proportion in the comparison group. So if there is no difference between the two groups, the RR is 1, and if the 95% confidence interval (CI) around the RR does not include 1, the RR is significant (significantly different from 1).
The OR is more difficult to understand. The OR is the "odds" that an event will occur in the treatment group, compared to the "odds" of it occurring in the comparison group. The "odds" itself is the ratio of the probability that a particular event will occur to the probability that it will not occur. Because the OR is so difficult to understand, it is often advised not to use it as an outcome in trials and meta-analyses. The risk difference (RD) is the risk of the event in the intervention group minus the risk of the event in the comparison group. The NNT is 1 divided by the RD.

Pooling of Effect Sizes and Heterogeneity as a Key Concept in Meta-Analyses

The strength of a meta-analysis is that the effect sizes for each individual study can be "pooled" by calculating the mean of these effect sizes. This pooled effect size is the best available estimate of the "true" effect size for that intervention. However, it is not adequate to simply calculate the mean of these effect sizes across the studies because, in that case, small studies would have the same "weight" as large studies. It is important that, in calculating the pooled effect size, large studies get more weight than small studies.

A key issue for the pooling of effect sizes is heterogeneity, which is the variability among studies. It refers directly to the problem of comparing "apples and oranges" in meta-analyses that we mentioned earlier. Statistical heterogeneity refers to the variability across the effect sizes that are found for the included studies. If there is statistical heterogeneity, this means that the observed effect sizes are more different from each other than what would be expected due to chance (random error) alone. Clinical heterogeneity refers to the variability among the participants, interventions, and outcomes across the studies included in the meta-analysis. Methodological heterogeneity refers to variability in study design and risk of bias. If there is too much clinical and methodological heterogeneity, a meta-analysis is not useful.

Statistical heterogeneity is caused by clinical and methodological heterogeneity. In a meta-analysis, the term heterogeneity usually indicates statistical heterogeneity. Heterogeneity is a key issue in understanding the results of a meta-analysis. Heterogeneity in meta-analyses in mental health is often high, and this is true for both statistical and clinical heterogeneity. It is therefore very important in meta-analyses with high levels of heterogeneity to examine the sources of heterogeneity. We explain below how that can be done. If (statistical) heterogeneity is high and its causes cannot be identified, this means that there is variability among the effect sizes of this intervention that cannot be explained. This further implies that we in fact do not know under which conditions an intervention is effective and how large these effects are.

Basically, there are two methods for pooling effect sizes in meta-analyses. The fixed effect model assumes that all studies are exact replications of each other and share a common (true) effect size (Borenstein, Hedges, Higgins, & Rothstein, 2009). All variables that may have an effect on the outcomes of the interventions are identical across all trials. Because these trials estimate exactly the same effect size, the effect sizes found in the trials vary only due to the random error inherent in each study. In the random effects model, it is not assumed that all trials are exact replications of each other. Each trial can introduce its own underlying variance because of the differences between the trials. So effect sizes differ from each other not only because of random error, as in the fixed effect model, but also because of true differences between the studies.
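Both pooling approaches weight each study by the inverse of its variance; the random effects model simply adds an estimate of the between-study variance (tau-squared) to each study's own variance before weighting. A minimal sketch, assuming effect sizes and their variances have already been computed per study (the DerSimonian-Laird estimator used here for tau-squared is one common choice, not the only one, and the input numbers are made up):

```python
import math

def pool_effect_sizes(effects, variances, random_effects=True):
    """Inverse-variance pooling; optionally add DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    pooled_fixed = sum(wi * es for wi, es in zip(w, effects)) / sum(w)
    # Q statistic: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (es - pooled_fixed) ** 2 for wi, es in zip(w, effects))
    tau2 = 0.0
    if random_effects:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)              # DerSimonian-Laird
    w_star = [1 / (v + tau2) for v in variances]        # re-weight studies
    pooled = sum(wi * es for wi, es in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Five hypothetical trials (Hedges' g and its variance)
g = [0.55, 0.30, 0.72, 0.10, 0.45]
v = [0.040, 0.025, 0.060, 0.015, 0.030]
print(pool_effect_sizes(g, v))                          # random effects
print(pool_effect_sizes(g, v, random_effects=False))    # fixed effect
```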
In mental health research, trials can hardly ever be considered to be exact replications of each other, and it is recommended that researchers use the random effects model and not the fixed effect model. Sometimes researchers let the choice for the fixed or the random effects model depend on the level of heterogeneity found (see below), but that is wrong. If there are differences between the studies (clinical and methodological heterogeneity), the random effects model should always be used.

The Forest Plot as the Core of a Meta-Analysis

The forest plot is a good summary of a meta-analysis and in many ways its core. Any paper reporting the results of a meta-analysis has (or should have) such a forest plot. It provides a graphical representation of the effect size for each study, as well as the 95% CI around that effect size and the pooled effect size (with its 95% CI). The 95% CI of the effect size is given as a line through the effect size. The longer that line is, the broader the 95% CI and the smaller the sample size. So if an effect size has a short line through it, that means it is a large study. And large studies should be closer to the mean effect size (because they estimate the effect size more precisely) than smaller studies. If large studies still deviate much from the pooled effect size, this indicates probable heterogeneity. If the 95% CI of the effect size of a study does not overlap with the 95% CI of the pooled effect size for all studies together, that study could be an "outlier." If a study is an outlier, it is important to examine whether there are characteristics of that study that could explain why this is the case. If there are many outliers in a meta-analysis, heterogeneity is probably also high. Hollon (Chapter 6, this volume) provides an example of a forest plot and additional explanation about how to interpret such a plot in a meta-analysis.

Although the forest plot can provide a first indication as to whether there is heterogeneity (many studies, especially large ones, with effect sizes deviating from the pooled effect size; many outliers), a much better way to examine heterogeneity is to calculate it in terms of percentages. The I² statistic indicates heterogeneity in percentages or, in other words, the percentage of the variability in effect sizes that can be attributed to heterogeneity rather than chance (Higgins, Thompson, Deeks, & Altman, 2003). A percentage of 25% is considered low heterogeneity, 50% moderate, and 75% high.

It is important to calculate not only the I² for a meta-analysis but also the 95% CI around I² (Ioannidis, Patsopoulos, & Evangelou, 2007). Especially with smaller numbers of studies and small numbers of participants, the uncertainty around I² can be considerable. For example, it is very well possible that I² is zero, but that the 95% CI goes from 0 to 75%. In that case, the 0% heterogeneity is not very meaningful.

There is also a formal test for the significance of heterogeneity, based on what is called Q (a chi-squared statistic). This tests whether the observed differences between effect sizes can be explained by chance alone. If this is the case, then there is no significant heterogeneity. However, this test should be interpreted with caution because it has low power, and this is a problem when there are small numbers of studies and small sample sizes per study. So the I² statistic is more informative when it comes to assessing heterogeneity.

If heterogeneity is very high, it may be advisable not to perform or publish a meta-analysis at all, because a pooled effect size may mislead and suggest that this pooled effect size is meaningful when it is not.
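Q and I² follow directly from the fixed-effect weights sketched earlier: I² = max(0, (Q - df) / Q), expressed as a percentage. A minimal sketch (our own; the p-value for Q uses the chi-squared survival function from SciPy, which is an assumption about available tooling, and the data are again made up):

```python
from scipy.stats import chi2

def heterogeneity(effects, variances):
    """Return Cochran's Q, its p-value, and the I^2 percentage."""
    w = [1 / v for v in variances]
    pooled = sum(wi * es for wi, es in zip(w, effects)) / sum(w)
    q = sum(wi * (es - pooled) ** 2 for wi, es in zip(w, effects))
    df = len(effects) - 1
    p_value = chi2.sf(q, df)        # note: the Q test has low power
    # share of variability beyond what chance alone would produce
    i2 = 0.0 if q == 0 else max(0.0, (q - df) / q) * 100
    return q, p_value, i2

g = [0.55, 0.30, 0.72, 0.10, 0.45]
v = [0.040, 0.025, 0.060, 0.015, 0.030]
q, p, i2 = heterogeneity(g, v)
print(f"Q={q:.2f}, p={p:.3f}, I2={i2:.0f}%")
```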
Examining Sources of Heterogeneity

When a meta-analysis finds heterogeneity, it is important to examine its possible sources. It may be that a characteristic of the participants, the intervention, or the study is related to the effect size, and that subgroups of studies exist with effect sizes that differ very much from each other. This results in high levels of heterogeneity, but the heterogeneity is in fact related to the different effect sizes of these subgroups of studies. There are different ways of examining sources of heterogeneity. We discussed the first one already, namely, checking whether there are outliers. Outliers are studies that differ considerably from the rest of the studies. Outliers usually result in an increase in heterogeneity. If the characteristics of these studies are really different from those of the other studies, then this may explain why these studies result in such different effect sizes.

Another way of examining possible sources of heterogeneity is through subgroup analyses. In these analyses, the set of included studies is divided into two or more subgroups. Then researchers test whether the effect sizes differ significantly from each other, and whether heterogeneity is lower in each of the subgroups compared to the overall group of studies. Usually, researchers perform these subgroup analyses using a mixed effects model, in which the effect sizes within the subgroups are pooled with a random effects model, and the effect sizes between the subgroups are then tested with a fixed effect model.

Metaregression analyses may also be used to examine heterogeneity in meta-analyses. In a bivariate metaregression analysis, the association between a continuous characteristic of the studies and the effect sizes is computed. For example, the association between the effect size and the number of sessions in an intervention might be examined in a metaregression analysis. In multivariate metaregression, more than one predictor is examined at the same time in one model. In these analyses, continuous predictors (e.g., number of sessions, number of participants per condition, year of publication) and categorical variables may be examined at the same time, just as is done in a "normal" regression analysis.

It is important to note that subgroup and metaregression analyses are useful for examining possible sources of heterogeneity, but the results should always be interpreted with caution. A significant predictor is not evidence for a causal association between this predictor and the outcome. For example, suppose that a meta-analysis of a therapy compared with care-as-usual control groups shows that individual treatment has significantly higher effect sizes than group treatments. This finding cannot be considered causal evidence that individual treatments are indeed more effective than group treatments. It is very well possible that this difference in effect size is caused by another variable that is not measured. The best way to show that there is a difference between individual and group treatments is to focus on trials that directly randomize patients to individual or group treatment. A meta-analysis of such studies does result in the best evidence for a possible difference. Subgroup analyses can only result in indirect evidence for such differences.
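A bivariate metaregression can be sketched as a weighted regression of effect sizes on a study-level covariate, with inverse-variance weights. The sketch below uses ordinary weighted least squares from statsmodels purely for illustration; a full metaregression would also model between-study variance (e.g., with dedicated meta-analysis packages), so treat this as a simplified approximation on made-up data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical studies: Hedges' g, its variance, and number of sessions
g = np.array([0.55, 0.30, 0.72, 0.10, 0.45, 0.60])
var = np.array([0.040, 0.025, 0.060, 0.015, 0.030, 0.050])
sessions = np.array([12, 8, 16, 6, 10, 14])

X = sm.add_constant(sessions)            # intercept + predictor
model = sm.WLS(g, X, weights=1 / var)    # inverse-variance weights
results = model.fit()

# Slope: estimated change in effect size per additional session
print(results.params)    # [intercept, slope]
print(results.pvalues)   # caution: association, not causation
```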
Publication Bias

One problem of meta-analyses is that not all of the studies conducted are actually published. Authors, editors, and journals are inclined to favor publication of studies that show significant effects of interventions. If a study shows no effects or only small effects, it often is not published. And this is a problem for meta-analyses, because these are based on published studies; if negative studies are not published, a meta-analysis may considerably overestimate the true effect of an intervention. But if a study is not published, how can we solve this problem? We do not know what the effect sizes of these unpublished studies are, so we also do not know what the true effect size is.

Sometimes direct estimates of unpublished studies can be made, for example, by checking the trials on drugs submitted to the U.S. Food and Drug Administration (Turner, Matthews, Linardatos, Tell, & Rosenthal, 2008) or by checking whether funded grants for trials on psychotherapy led to published studies (Driessen, Hollon, Bockting, Cuijpers, & Turner, 2015). However, usually such direct estimates of publication bias are not possible.

It is possible nonetheless to get indirect estimates of publication bias. These estimates are based on the assumption that large studies (with many participants) can make a more precise estimate of the effect size, while the effect sizes found in smaller studies can deviate more from the pooled effect size because they are less precise in their estimates of the effect size. Random variations of the effect sizes are larger in studies with relatively fewer participants compared to those with many participants. This difference can be represented graphically in a funnel plot, in which the effect size is represented on the horizontal axis and the size of the study on the vertical axis. When the size of the study is smaller, the effect sizes can deviate more from the mean effect size, and when the study has more participants, its effect size should be closer to the pooled effect size across all studies. So when small studies deviate more from the pooled effect size, they should be found in both the positive and the negative direction. If they are found in the positive direction but not in the negative direction, this can be seen as an indication of publication bias. This "asymmetry of the funnel plot" can be tested with formal statistical tests (e.g., Egger's regression intercept test or Begg and Mazumdar's rank correlation test), but the missing studies can also be imputed, using Duval and Tweedie's trim and fill procedure (Duval & Tweedie, 2000), which estimates how many studies are missing due to publication bias and also calculates the pooled effect size after adjustment for publication bias. Hollon (Chapter 6, this volume) provides an example of a funnel plot and its interpretation.

Other Types of Meta-Analyses

We have focused most of this chapter on "traditional" meta-analyses. However, two other types of meta-analyses are worthy of brief mention.

In network meta-analyses, different comparisons may be included at the same time. In traditional meta-analyses, only one comparison can be examined at a time. The PICO describes only one comparison between an intervention and a comparison group. In network meta-analyses, more comparisons may be examined at the same time. Suppose, for example, that there are two treatments for one mental disorder, both treatments have been tested in trials that have compared them with control groups, and other trials have directly compared these two treatments with each other. In a traditional meta-analysis, three separate analyses would have to be done: two for each treatment compared with a control group, and one for the direct comparisons between the two treatments. In a network meta-analysis, all these comparisons may be examined at once in the same analysis. The network meta-analysis is also called a multiple treatment comparison meta-analysis or a mixed treatment meta-analysis.

In "individual patient data" meta-analyses, the primary data of the trials from a systematic review are collected and analyzed (Riley, Lambert, & Abo-Zaid, 2010). The advantage of this type of meta-analysis is that all analyses may be done in the same way across trials, and in that way a better estimate of the true effect size can be made. There is also enough statistical power to examine moderators of outcome. "Individual patient data" meta-analyses are also sometimes called mega-analyses.
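Funnel-plot asymmetry can be checked numerically as well as visually. A minimal sketch of Egger's regression intercept test (our own simplified version on made-up data: the standardized effect is regressed on precision, and an intercept far from zero suggests asymmetry):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical studies: effect sizes and their standard errors
effects = np.array([0.62, 0.48, 0.55, 0.20, 0.70, 0.35])
se = np.array([0.30, 0.22, 0.25, 0.08, 0.35, 0.12])

standardized = effects / se          # "z-score" of each study
precision = 1 / se                   # large studies have high precision

X = sm.add_constant(precision)
results = sm.OLS(standardized, X).fit()

intercept, slope = results.params
print(f"Egger intercept: {intercept:.2f} (p={results.pvalues[0]:.3f})")
# An intercept clearly different from zero hints that small, imprecise
# studies drift in one direction, consistent with publication bias.
```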
Conclusion

In this chapter we have discussed the methods of meta-analyses in mental health care and have presented a guide for readers on how to interpret the methods and results of meta-analyses. We have discussed the methods that researchers use for identifying relevant studies, extracting data from studies that meet the inclusion criteria, analyzing the results of these studies, and statistically integrating the results of these studies into pooled effect sizes.

Many resources exist for learning more about meta-analyses, including more extensive books on the methods of meta-analyses, such as the Cochrane handbook (Higgins & Green, 2011). We also encourage readers to familiarize themselves with the PRISMA Statement (Moher, Liberati, Tetzlaff, Altman, & the PRISMA Group, 2009). PRISMA is a guide for authors of meta-analyses about what should be reported. The PRISMA Statement, which contains an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses, has been accepted by most journals in the biomedical field. Authors of meta-analyses are advised to use PRISMA to improve the reporting of systematic reviews and meta-analyses.

Meta-analyses have become indispensable tools for integrating the results of the thousands of randomized trials in health care, including mental health care. The results of meta-analyses are used by patients, clinicians, and policymakers in mental health care. In order to use meta-analyses well, it is necessary to understand how they are conducted and reported, and to bring a critical lens to the findings.

References

Armijo-Olivo, S., Fuentes, J., Ospina, M., Saltaji, H., & Hartling, L. (2013). Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: A descriptive analysis. BMC Medical Research Methodology, 13, 116.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: Wiley.

Chen, P., Furukawa, T. A., Shinohara, K., Honyashiki, M., Imai, H., Ichikawa, K., et al. (2014). Quantity and quality of psychotherapy trials for depression in the past five decades. Journal of Affective Disorders, 165, 190-195.

Crameri, A., von Wyl, A., Koemeda, M., Schulthess, P., & Tschuschke, V. (2015). Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy. Frontiers in Psychology, 6, 1042.

Cuijpers, P. (2016). Meta-analyses in mental health research: A practical guideline. Retrieved from http://bit.do/meta-analysis.

Cuijpers, P., van Straten, A., Bohlmeijer, E., Hollon, S. D., & Andersson, G. (2010). The effects of psychotherapy for adult depression are overestimated: A meta-analysis of study quality and effect size. Psychological Medicine, 40(2), 211-223.

da Costa, B. R., Rutjes, A. W. S., Johnston, B. C., Reichenbach, S., Nüesch, E., Tonia, T., et al. (2012). Methods to convert continuous outcomes into odds ratios of treatment response and numbers needed to treat: Meta-epidemiological study. International Journal of Epidemiology, 41(5), 1445-1459.

Dragioti, E., Dimoliatis, I., Fountoulakis, K. N., & Evangelou, E. (2015). A systematic appraisal of allegiance effect in randomized controlled trials of psychotherapy. Annals of General Psychiatry, 14, 25.

Driessen, E., Hollon, S. D., Bockting, C. L. H., Cuijpers, P., & Turner, E. H. (2015). Does publication bias inflate the apparent efficacy of psychological treatment for major depressive disorder?: A systematic review and meta-analysis of US National Institutes of Health-funded trials. PLOS ONE, 10(9), e0137864.

Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455-463.

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLOS ONE, 6(4), e19070.

Higgins, J. P. T., Altman, D. G., Gøtzsche, P. C., Jüni, P., Moher, D., Oxman, A. D., et al. (2011). The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. British Medical Journal, 343, d5928.

Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook for systematic reviews of interventions, Version 5.1.0 [updated March 2011]. Retrieved from www.cochrane-handbook.org.

Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327, 557-560.

Ioannidis, J. P. A., Patsopoulos, N. A., & Evangelou, E. (2007). Uncertainty in heterogeneity estimates in meta-analyses. British Medical Journal, 335, 914-916.

Laupacis, A., Sackett, D. L., & Roberts, R. S. (1988). An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine, 318(26), 1728-1733.

Leykin, Y., & DeRubeis, R. J. (2009). Allegiance in psychotherapy outcome research: Separating association from bias. Clinical Psychology: Science and Practice, 16(1), 54-65.

Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & the PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. PLoS Medicine, 6(7), e1000097.

Munder, T., Brütsch, O., Leonhart, R., Gerger, H., & Barth, J. (2013). Researcher allegiance in psychotherapy outcome research: An overview of reviews. Clinical Psychology Review, 33(4), 501-511.

Riley, R. D., Lambert, P. C., & Abo-Zaid, G. (2010). Meta-analysis of individual participant data: Rationale, conduct, and reporting. British Medical Journal, 340, c221.

Siddiqui, O., Hung, H. M. J., & O'Neill, R. (2009). MMRM vs. LOCF: A comprehensive comparison based on simulation study and 25 NDA datasets. Journal of Biopharmaceutical Statistics, 19(2), 227-246.

Trauer, J. M., Qian, M. Y., Doyle, J. S., Rajaratnam, S. M. W., & Cunnington, D. (2015). Cognitive behavioral therapy for chronic insomnia: A systematic review and meta-analysis. Annals of Internal Medicine, 163(3), 191-204.

Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252-260.

Xia, J., Wright, J., & Adams, C. E. (2008). Five large Chinese biomedical bibliographic databases: Accessibility and coverage. Health Information and Libraries Journal, 25(1), 55-61.
