
Causal Inference Using Machine Learning:

An Application to Human Rights Treaty Ratification

October 28, 2017

Abstract

Recent advances in machine learning can be leveraged and incorporated into a causality
framework to make robust causal inference in political science. I demonstrate the applicabil-
ity of machine learning-based causal inference to the enduring puzzle of human rights treaty
ratification. The literature remains divided when it comes to explaining why states commit
to international human rights law. Many theories have been proposed, only to be empirically
disputed. I address this conundrum in a causal variable importance analysis. Specifically, I
employ Judea Pearl’s structural causal framework and use an ensemble machine learning
method to estimate and compare the causal effect of multiple predictors of state decisions
to become and remain a party to three international human rights treaties on civil and po-
litical rights, women’s rights, and the right not to be tortured. The substantive findings have
important implications for our understanding of the issue of human rights treaty ratification.
1 Introduction

Despite its breathtaking advances and widespread impact, machine learning has rarely been

used in political science research. A major reason could be that political scientists tend to focus

on explaining the causal process rather than making predictions about unobserved instances of

an outcome. This cultural distinction between prediction and explanation, however, should not prevent us from taking advantage of powerful machine learning methods to make

robust causal inference. I present a causal variable importance analysis that combines machine

learning and a modern causal inference method to address a thorny problem in international

relations—the puzzle of human rights treaty ratification. This template of causal analysis can be

applied to a large variety of research questions in political science to draw causal inference from

observational data. Unlike studies aiming to detect statistical association, the findings of a causal

variable importance analysis should have a causal interpretation and could be used to make

decisions as to which variables are more important to intervene upon and which policy areas

are more effective to change in order to stop torture and state repression, protect human rights,

reduce war duration, and promote economic development, among others. Since almost every

outcome worth examining in social sciences is multicausal, evaluating the relative importance

of its causal predictors could be extremely beneficial.

In the field of international relations, the existing literature remains divided over two

unresolved questions regarding (a) whether and how human rights treaties causally impact

state behavior; and (b) why countries ratify human rights treaties in the first place. In this

paper, I take a novel approach to addressing the second question by investigating the factors

that potentially cause states to ratify three major United Nations (UN) human rights treaties,

including the International Covenant on Civil and Political Rights (ICCPR), the Convention on

the Elimination of All Forms of Discrimination against Women (CEDAW), and the Convention

against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment (CAT). The

goal of my analysis is to estimate and compare the relative causal importance of these factors.

While I do not propose yet another theory as to why states ratify human rights treaties, this

paper nonetheless makes substantive contributions through its innovative methodological appli-

cation. First, it subjects multiple existing theories to a different kind of empirical testing that

does not merely rely on a statistical significance test of regression model coefficients. Rather,

my theory-testing constructs a causal model that is more transparent in its causal assumptions

and uses machine learning-based estimation methods that are less dependent on functional form

assumptions. These two features of identification transparency and modeling flexibility are miss-

ing in many current empirical inquiries. Second, my analysis provides new substantive insights

into the causal determinants of treaty ratification. Previous research has analyzed predictors

of state commitment to universal treaties (Lupu 2014). Others have applied the machine learning

technique of random forest to examine the predictive associations between various covariates

and state repression (Hill and Jones 2014). My investigation improves upon the former by using

machine learning in lieu of parametric linear regression models and upon the latter by endowing

the findings with a causal interpretation.

Fundamentally, my causal analysis follows Judea Pearl’s philosophy of “define first, iden-

tify second, estimate last” (van der Laan and Rose 2011). I start by examining the literature,

describing the research gaps, and reformulating them within a causal inference framework. A

careful review of existing theories and models of treaty ratification is critical not only to iden-

tify the research problems, but also to provide the substantive foundation upon which I can

then construct my graphical model of the causal process. Any causal analysis either implicitly

assumes or explicitly specifies a graphical causal model that represents the underlying data-

generating process. In an experimental setting, this causal model could be relatively simple. For

an observational study, particularly of a complex problem, a graphical causal model could be

substantially more intricate, but also exponentially more important to explicitly specify because

it encodes the many causal assumptions for identification. I then employ Pearl’s causal infer-

ence method (Pearl 2009) to identify the causal effects of interest and use an ensemble machine

learning technique called Super Learner (Polley and van der Laan 2010) to produce more robust

effect estimates. Finally, I interpret the causal findings in their substantive context.

2 Theories and Models of Treaty Ratification

International human rights law is created to protect and promote universal human rights. It es-

tablishes substantive obligations for states parties and designs procedural mechanisms to mon-

itor the implementation of those obligations (De Schutter 2010; Alfredsson et al. 2009; Buer-

genthal 2006). A major global regime is the UN human rights treaty system, which includes

many treaties and their associated monitoring bodies (Keller and Ulfstein 2012; Rodley 2013).

A natural question arises in the literature as to why more and more countries have ratified and

remained committed to human rights treaties that are designed precisely to limit their freedom

in how to treat their own citizens. Figure 1 shows the increasing number of states parties to

three major human rights treaties from 1966 when the ICCPR was opened for ratification until

2013. The question of treaty ratification is a simple, yet vexing, puzzle that scholars have wres-

tled with for a long time. Many theories have been proposed, identifying various explanatory

variables, but consensus remains elusive.


Figure 1: Numbers of states parties to the ICCPR, the CEDAW, and the CAT from 1966 to 2013. The
three treaties were opened for ratification in 1966, 1979, and 1984, respectively.

First, some scholars believe that international socialization and the pressure of normative

conformity cause state leaders to realize that treaty ratification is the expected and appro-

priate thing to do (Finnemore and Sikkink 1998). Two studies by Goodliffe and Hawkins (2006)

and Hathaway (2007) find correlative evidence to support this argument when they use global

and regional ratification rates as proxies for international socialization. A prominent study that

follows, however, casts doubt on the role of socialization as the driving force behind treaty rat-

ification. Simmons (2009, 90–96) creates a series of variables (measuring regional normative

convergence, socialization opportunities, an index for two different time periods, and informa-

tion environments) that interact with density of regional ratification and argues that regional

ratification rates do not necessarily reflect a normative force as much as a strategic calculation. It

is not immediately clear what causal model Simmons (2009) assumes would generate the

data and whether and how the effect estimates of those interactive variables could be causally

interpreted.

The second group of explanations focuses on the economic reasons that states voluntarily

commit to universal human rights standards and subject themselves to international monitoring.

According to these explanations, states use ratification as a signaling device to improve their so-

cial standing, expecting to gain material benefits in return, even if they are disingenuous about

treaty compliance. The need for social signaling could be significant given the pressures on

lending institutions, foreign investors, and developed countries to link foreign aid (Lebovic and

Voeten 2006; Spence 2014), international investment (Blanton and Blanton 2007), and pref-

erential trade agreements (Hafner-Burton 2005) to human rights issues in recipient countries.

Participation in international trade in particular has been shown to be a significant predictor of

treaty commitment (Lupu 2014). The transactional rationale of treaty ratification could be even

more pressing for transitional and newly independent countries since they often need external

economic assistance and financial support (Smith-Cannoy 2012, 64–91). This instrumental ar-

gument, however, turns out to have virtually no empirical support according to a critical study

by Nielsen and Simmons (2015). The two authors find no correlation between ratifications of

four major human rights regimes (under the ICCPR and the CAT) and either the amounts of

foreign aid from OECD countries or other measures of tangible and intangible benefits.

Third, the most popular explanations of treaty ratification often identify domestic institu-

tions as the key predictors. An early theory advances what is often referred to as the “lock in”

argument, according to which transitional countries or those facing potential democratic insta-

bility tend to join human rights regimes to lock in and consolidate their democratic institutions

(Moravcsik 2000). Although this argument finds some empirical support in another study (Neu-

mayer 2007), there are some dissenting findings as well, indicating that neither new democra-

cies nor unstable, volatile regimes are significant predictors of CAT ratification (Goodliffe and

Hawkins 2006).

Researchers also focus on the interaction of domestic institutions and human rights prac-

tices to explain ratifications (Hathaway 2007). Post-ratification, they argue, states that have

sub-standard human rights protection will likely incur a higher cost of policy adjustment. This

cost, in turn, is more likely to actually materialize if democratic institutions are in place to

constrain state leaders. As a result, a poor human rights record predicts a low probability of

ratification, but only among democracies. Ratification cost may rise as well, depending on the

types of domestic institutions, including constitutional ratification rules, political regimes, and

an independent court system (Simmons 2009, 67–77). Hill (2016a) applies the same logic to

explain how governments selectively make reservations when they ratify human rights treaties

based on their domestic standards and legal institutions. Conversely, autocracies are just as

likely to ratify human rights treaties since their ratifications are usually empty promises that do

not bring any real cost of behavioral change (von Stein 2016). The theoretical expectation is

that, among autocracies, prior human rights practices have little impact on the probability of

treaty ratification.

Generally, it should be noted, states are believed to be less likely to commit to international

treaties if their prior level of compliance is low. This is often known as the selection effect

argument (Downs, Rocke and Barsoom 1996; von Stein 2005; Simmons and Hopkins 2005). In

the literature on international human rights law, however, this selection effect is often treated as

a source of potential bias, where prior measures of the human rights outcome may confound the causal

relationship between human rights treaties and contemporaneous measure of the outcome. The

causal impact of prior human rights practices on treaty ratification is rarely a quantity of interest

to investigate.

For the most part, democracies are also believed to be more likely than autocracies to ratify

human rights treaties (Landman 2005) because of their domestic pressures or an incentive to

export rights-respecting norms. Hafner-Burton, Mansfield and Pevehouse (2015) similarly argue

that autocracies are less likely to join human rights regimes that may expose them to a high cost

of compliance. Vreeland (2008) adds an important caveat, however. He agrees that because

dictators are more inclined to use torture to retain power, they are indeed less likely to ratify

the CAT so as to avoid the cost associated with treaty violations. Yet dictators that co-exist with multiple political parties must bear the cost of non-ratification in the form of pressure from the opposition parties. It turns out, according to Vreeland (2008), that dictatorships with multiple parties are actually more likely to ratify the treaty.

Hollyer and Rosendorff (2011) concur with Vreeland (2008), but they differ with respect

to his reasoning. For repressive leaders, the two authors claim, ratifying the CAT can actually

bring some significant signaling benefits with respect to a particular audience: the domestic

opposition. Opposition groups perceive an authoritarian leader’s act of committing to the CAT

(and then flaunting treaty violations) as a credible signal of her strength. As a result, the oppo-

sition is less likely to mount a challenge, in effect prolonging the survival of the authoritarian

leader. The implication is that autocracies are more likely to ratify costly human rights treaties

not because they concede to pressures from the opposition parties as Vreeland (2008) argues,

but rather because they actively seek ratification to reap its domestic signaling benefits. For

many human rights scholars, this credible commitment argument to explain treaty ratification

among autocratic regimes “has some plausibility problems on its face” (Simmons 2011, 743),

but it has not been disputed empirically. Even Hollyer and Rosendorff (2011) have conducted no

causal tests, pointing instead to the statistical association between CAT ratification and several

different outcomes such as leadership survival, level of government repression, and the extent

of opposition efforts.

To summarize, exactly why states ratify human rights treaties is still unclear. There could

be many reasons and multiple theories, but the findings often contradict one another or remain causally untested. Whether they are ideational, instrumental, or institutional,

theories of treaty ratification remain contested and the variation in treaty ratification “has not yet

been fully explored” (Hafner-Burton 2012, 271). As Simmons (2011, 737–744) also observes,

the question of why states ratify international human rights law remains “an enduring puzzle.”

My causal variable importance analysis offers a solution to this puzzle by comparing the

causal effect estimates of a large number of theoretically identified predictors of treaty ratifica-

tion across three major human rights treaties. Its novelty is the machine learning-based causal

inference approach that I adopt to address two major limitations in existing empirical inquiries.

First, existing studies mostly rely on regression models and the statistical significance of

ratification predictors. These models almost always make some restrictive parametric assump-

tions such as linearity, normality, and additivity to characterize the shape of the relationships

between treaty ratification and its predictors. Usually no justifications are provided as to why,

for example, a linear functional form or additivity of covariate effects is appropriate or accurate

instead of, for example, exponential, U-shaped, higher-order, or threshold effects. Since we do

not know a priori the underlying data-generating process, and usually it is virtually impossible to

know especially with regard to complex political phenomena, a conveniently specified statistical

model is likely a misspecified one, thus producing unreliable and biased effect estimates.

The second limitation is that virtually every study implies a causal query about the deter-

minants of treaty ratification. Yet, none has openly embraced a causal language and inference

framework within which to formulate and estimate the quantities of interest that correspond to

the research questions. This inattention to causal identification has unfortunate implications.

Scholars usually do not make explicit all their causal assumptions for identification. For ex-

ample, rather than an identification issue, endogeneity is often viewed as a statistical problem

because “there is no agreement on the most appropriate statistical approach” (von Stein 2016,

661). Researchers, as a result, often mistake estimation techniques such as propensity score

matching for an identification strategy (Pearl 2009, 349) and fail to explicitly link causal iden-

tification to transparent causal assumptions. Moreover, researchers often do not employ highly

useful and intuitive causal inference methods such as the backdoor criterion to guide their co-

variate selection and inform their statistical modeling, resorting instead to statistical “fixes” such

as country fixed effects and time trends that, without proper substantive justifications, could be

arbitrary or even counterproductive (Chaudoin, Hays and Hicks 2016).

The following two examples underscore the benefits of embracing a transparent causal

inference framework. In a prominent study of treaty commitment, the researcher fits multiple

regression models and successively regresses ratifications of human rights treaties and optional

protocols and provisions on several predictors that are measured contemporaneously, including

democracy, human rights violations, and their interaction term. The regression coefficient for

democracy is then interpreted as indication that “for each point increase in the measure of

Democracy, states with no human rights violations have between 10 and 54 percent increased

chance of ratifying human rights treaties than nondemocratic ones” (Hathaway 2007, 609).

This modeling procedure and interpretation are appropriate for a causal model repre-

sented in Figure 2a where X denotes democracy, Y stands for human rights violations, and

A is ratification. The majority of the literature, however, suggests that it is at least as likely

that democracy contemporaneously influences the extent of human rights violations rather than

the other way around even if it is possible that state repression may impede democratization

or undermine democracy in the next time period. A different causal model in Figure 2b could

be deemed just as, if not more, plausible, in which conditioning on human rights violations Y

would induce a post-treatment bias in estimating the causal effect of democracy X on ratifi-

cation A. The broader point is that whether the causal effect of interest can be identified and

estimated without bias depends intimately on the topology of the causal model and it is unnec-

essarily difficult, if not impossible, to fairly evaluate the causal model’s substantive plausibility

in the absence of an explicit, preferably graphical, representation of the causal model.


Figure 2: (a) Simplified causal model inferred from Hathaway (2007) of the effect of X (democracy)
on A (treaty ratification), which is confounded by Y (torture practice); (b) Modified causal model
adapted from Hathaway (2007) of the effect of X (democracy) on A (treaty ratification) both directly
and indirectly through Y , suggesting a potential post-treatment bias in the simplified model.

For a more complicated example, the study by Vreeland (2008) raises the possibility of

omitted variable bias in explaining the positive correlation between CAT ratification and torture

practices in dictatorships. The situation is represented in Figure 3 where the vertices X, Y , and

A respectively denote multiple parties, torture, and CAT ratification. Failing to condition on

X in this case would confound the potential (non)relationship between Y and A and explain

why “the more a dictatorship practices torture, the more likely it is to sign and ratify the CAT”

(Vreeland 2008, 68).


Figure 3: Simplified causal model inferred from Vreeland (2008) of the effect of X (multiple parties)
on both Y (torture) and A (CAT ratification).

Assuming the goal of Vreeland (2008) is to make causal inference, we can infer from his

statistical models various causal models that the author implicitly assumes. Table 1 in Vreeland

(2008, 83) presents multiple regression models that estimate the instantaneous effect of multi-

ple parties on torture among dictatorships. These models are represented in Figure 4a where X

denotes multiple parties, Y denotes torture, and W1 is a set of control variables (gross domestic

product per capita, population, trade/GDP, civil war, and communist regime). I add the node S

in a double circle to indicate the sample selection of only dictatorships.1

Vreeland (2008) then proceeds to estimate the instantaneous effect of multiple parties

(X) on CAT ratification (A) among dictatorships (S). His regression models in Table 3 (Vree-

land 2008, 90) assume the causal model in Figure 4b where W2 is a different set of control

variables (communist regime, lagged regional score of CAT ratification, the number of countries

that have ratified the CAT, the percentage of the population that are Muslims, GDP per capita,

population, and the trade/GDP proportion). Vreeland (2008, 89) also controls for “the log of

the Hathaway torture scale.” This is a curious modeling decision, however, since it implies that

Y is a confounding variable that affects both X and A. Thus, it can be seen that between the
1 The original study does not discuss sample selection and its consequences for identification. Here I assume that
sample selection S, which is based on regime type, is dependent on the control variables W1 and W2 . This is not un-
reasonable since democracy arguably depends on economic development, the presence or absence of civil war, trade,
among others. This assumption is also convenient because we can then remove from consideration the consequences
of sample selection in order to focus on the causal relationships between multiple parties and, respectively, torture
and treaty ratification. In other cases, though, as Bareinboim, Tian and Pearl (2014) demonstrate, sample selection
could potentially render the causal effect of X on Y in Figure 4a non-identifiable from the sample data. For example,
insofar as legally organized political parties (treatment X) and torture (outcome Y ) both influence sample selection
S, that is, the use of torture may suppress and undermine democracy (Y → S) while mobilization by opposition
parties promotes democratization (X → S), we will end up with a collider bias X → S ← Y and the causal effect
of X on Y will not be recoverable from the sample data.


Figure 4: (a) Causal model inferred from Vreeland (2008, 83) of the effect of X (multiple parties) on
Y (torture) among S (dictatorships) with control variables W1 ; (b) Causal model inferred from Vree-
land (2008, 90) of the effect of X (multiple parties) on A (CAT ratification) among S (dictatorships)
with control variables W2 . Arrows of opposite directions between X and Y across the two causal
models suggest incoherent assumptions about the causal process.

causal model in Figure 4a (where X → Y ) and the causal model in Figure 4b (where Y → X),

some incoherent assumptions are made with respect to the contemporaneous causal relationship

between multiple parties and torture. If multiple parties only affect torture as assumed in Fig-

ure 4a but not the other way around, then controlling for torture as Vreeland (2008, 90) does

would introduce a post-treatment bias. It might be that X and Y mutually cause each other

instantaneously, but then it would not be possible to identify the causal effect of X (multiple

parties) on either A (CAT ratification) or Y (torture).

It should be emphasized that I remain agnostic at this point as to whether these causal

models accurately depict the true underlying causal process or which specific statistical methods

are used to estimate the causal quantities of interest from observational data. Nevertheless, the

two examples illustrate the critical importance of graphically representing our causal models.

A graphical model would make explicit our assumptions, consistent or otherwise, about the

underlying data-generating process and reveal potential identification problems that may arise.

3 Causal Variable Importance Analysis of Treaty Ratification

3.1 Notation and causal model formulation

Traditional variable importance analyses use parametric models to estimate the association be-

tween input variables and an outcome, using a variety of metrics such as regression coefficients

and p-values, model fit, or predictive accuracy. Taking a different approach, I instead formulate

variable importance in terms of average causal effects. Informally, the causal effect of a

variable is defined as the effect of an intervention to fix, as opposed to observe, that variable.

For a binary variable, the treatment and control values are intuitively clear. For a continuous

variable, I use its observed maximum and minimum values.

In an observational setting, the first step in identifying and estimating causal effects is to

build a non-parametric structural causal model as a set of equations to describe, to the best of

our knowledge, the underlying data-generating process. In my following model, W is a set of

time-invariant covariates; X1 and X2 are either binary or continuous time-varying predictors; Y

is the human rights outcome; and A is treaty ratification.2 The subscript t indicates the time periods

during which the variables are measured. Together these equations form a generative system

from which n country–year observations On are sampled and the joint probability distribution

of the observed data is On = (W, X1t , X2t , At , Yt ) ∼ PO .

W = fW (UW )

X1t = fX1 (W, At−1 , Yt−1 , X1t−1 , X2t−1 , UX1 )

X2t = fX2 (W, At−1 , Yt−1 , X1t−1 , X2t−1 , UX2 ) (1)

At = fA (W, At−1 , Yt−1 , X1t , X2t , UA )

Yt = fY (W, Yt−1 , At , X1t , X2t , UY )


2 Quantitative human rights law research mostly focuses on the influence of human rights treaties on state prac-
tices. It therefore often considers treaty ratification as the treatment, the impact of which is to be evaluated. In the
epidemiology and biomedical literature, from which I derive a lot of methodological insights, the treatment is usually
denoted A and the outcome Y . To be consistent with the larger research program on international human rights
law, throughout the paper I use A to denote treaty ratification, which is the outcome in this study. The treatments in
my causal variable importance analysis are ratification predictors denoted X such as {X1, X2}. As annotated and
explained later in my graphical causal model, human rights practice, denoted Y , is actually a potential confounder.
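
To make the generative system in Equation set 1 concrete, the following R sketch simulates country–year data from one hypothetical parameterization. The functional forms, coefficients, and sample dimensions are illustrative assumptions only, not the estimated model; they merely instantiate the parent–child structure of the equations.

```r
## Minimal R sketch of the generative system in Equation set (1).
## All functional forms, coefficients, and dimensions below are illustrative.
set.seed(2017)

simulate_panel <- function(n_countries = 100, n_years = 30) {
  W  <- rnorm(n_countries)                      # time-invariant covariate
  X1 <- A <- matrix(0L, n_countries, n_years)   # binary predictor, ratification
  X2 <- Y <- matrix(0,  n_countries, n_years)   # continuous predictor, HR practice
  X1[, 1] <- rbinom(n_countries, 1, plogis(W))
  X2[, 1] <- rnorm(n_countries, mean = W)
  A[, 1]  <- rbinom(n_countries, 1, 0.05)
  Y[, 1]  <- rnorm(n_countries, mean = X1[, 1])
  for (t in 2:n_years) {
    # X1t = fX1(W, At-1, Yt-1, X1t-1, X2t-1, UX1)
    X1[, t] <- rbinom(n_countries, 1,
                      plogis(W + A[, t - 1] + 0.5 * Y[, t - 1] +
                             2 * X1[, t - 1] + 0.3 * X2[, t - 1]))
    # X2t = fX2(W, At-1, Yt-1, X1t-1, X2t-1, UX2)
    X2[, t] <- 0.8 * X2[, t - 1] +
               0.2 * (W + A[, t - 1] + Y[, t - 1] + X1[, t - 1]) +
               rnorm(n_countries, sd = 0.5)
    # At = fA(W, At-1, Yt-1, X1t, X2t, UA)
    A[, t]  <- rbinom(n_countries, 1,
                      plogis(-3 + 4 * A[, t - 1] + W + Y[, t - 1] +
                             X1[, t] + 0.5 * X2[, t]))
    # Yt = fY(W, Yt-1, At, X1t, X2t, UY)
    Y[, t]  <- 0.6 * Y[, t - 1] +
               0.3 * (W + A[, t] + X1[, t] + X2[, t]) +
               rnorm(n_countries, sd = 0.5)
  }
  list(W = W, X1 = X1, X2 = X2, A = A, Y = Y)
}

panel <- simulate_panel()
```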

A structural causal model is best represented in the form of a directed acyclic graph (DAG). A causal DAG (Darwiche 2009; Elwert 2013; Pearl 2009) comprises a set of nodes/vertices denoting random variables. An edge/arrow denotes one variable's (the parent node's) direct causal influence on another node (the child node). A path in a causal DAG is an arrow or a sequence of arrows, regardless of their directions, that connects one node to another. A causal (or directed) path has all of its arrows pointing in the same direction. Otherwise, it is a

non-causal path.

My causal DAG in Figure 5 has a dynamic structure that reflects a temporal order with

past nodes in the left shaded block and future nodes in the right shaded block. Each block

represents a single time period. There are no arrows or sequence of arrows going from the

block on the right to the block on the left, meaning that no variable in the future should have

a causal influence on any variable in the past. The DAG is also acyclic in the sense that, within

the same temporal block, there are no loops or directed paths going from a node to itself. I

make no assumptions about any of the functional forms f = {fW , fX1 , fX2 , fA , fY }, which is

consistent with the recognition that usually we do not have enough knowledge to specify the

exact functional forms that characterize the relationships between variables. For the sake of

simplicity and without loss of generality, I construct a causal model with only two time-varying

predictors X1 and X2 over two time periods from t − 1 to t. A larger number of predictors over

a longer time span can be represented in a similar fashion.

As in any causal analysis, we have to make a few assumptions about the underlying causal

process. Similar to Díaz et al. (2015, 6), I assume ratification predictors do not instantaneously

affect each other, although they may influence every other predictor in the next time period.

That means, for example, the amount of official development assistance (ODA) and economic

development are conditionally independent from each other in the same time period. ODA

at time t − 1, however, could certainly affect economic development at time t (notationally,

X1t−1 → X2t ). From an identification standpoint, this assumption is necessary because if the

predictors are allowed to mutually cause each other instantaneously, it would render the causal

model cyclical and make it impossible to identify their causal effects.

I further assume the exogenous variables U = {UW , UX1 , UX2 , UA , UY } are jointly inde-

pendent. As a result, the value of any node is strictly a function of its parent nodes and some

exogenous factors. This implies that observing a variable’s parent nodes will render that variable

independent from other covariates except for its descendants. For example, treaty ratification At

has as its parent nodes time-invariant covariates W , predictors Xt , human rights practice in the

immediate past Yt−1 , and prior ratification status At−1 . If we observe the set {W, Yt−1 , At−1 , Xt },

then At is conditionally independent from other nodes, including all Xt−1 , except for the de-

scendants of At such as Yt and At+1 .


Figure 5: A dynamic graphical causal model with shaded blocks indicating two temporal periods.
Time-invariant covariates W , which precede and potentially affect all other variables, are not repre-
sented. The sufficient adjustment sets to identify the causal effects of X1t → At and X2t → At are
{W, At−1 , Yt−1 , X2t } and {W, At−1 , Yt−1 , X1t }, respectively.

It should be emphasized that, short of a randomization of the treatment as in an experimental design, any observational study that aims to make causal inference has to make this exogeneity assumption, and the only way to justify it is to rely on the domain knowledge in the

literature (Table 1). In other words, since one cannot know if a model accurately represents

the causal process based on a scrutiny of the observed data alone, it is important that the body

of knowledge in the literature should guide and justify the construction of my causal model as

follows. First, the causal dependence Yt−1 → At is informed by the selection effect argument

that a state may make a commitment decision based in part on its prior level of compliance

because it will significantly determine its ratification cost (Downs, Rocke and Barsoom 1996;

von Stein 2005).

Second, I allow for the causal dependencies Xt−1 → Xt and Yt−1 → Yt . This is a routine

assumption in the context of time-series cross-section data structure. Substantively, this assump-

tion also permits the possibility that human rights violations may have some inherent dynamic

that goes beyond contextual factors such as poverty, dictatorship, involvement in conflicts, and

so forth. As Hill and Jones (2014, 674) observe, this argument means that “the governments

can become habituated to the use of violence to resolve political conflict.” I include this causal

relationship, bearing in mind that, in a graphical causal model, an arrow between variables in-

dicates a possible, but not necessarily an actual causal link. A missing arrow, on the other hand,

is equivalent to ruling out any direct causality.

Third, an argument can also be made that human rights practices affect some ratification

predictors in the next time period. An obvious example is that the use of torture and other

extrajudicial measures by the government could intimidate its critics, suppress movements for

democratization, and undermine democracy. The inclusion of the directed arrows Yt−1 → X1t

and Yt−1 → X2t in my causal model is informed by this argument.

Fourth, I similarly speculate a direct causal dependence At−1 → At based on the obser-

vation that once governments ratify an international human rights treaty, they are unlikely to

withdraw from that treaty. It should be noted that in many cases withdrawal is entirely legally

possible. Many human rights treaties and their optional protocols have denunciation provisions

that allow states to exit from these institutions, including Article 31 of the CAT, Article 12 of

the First Optional Protocol to the ICCPR, and Article 19 of the Optional Protocol to the CEDAW.

This is not the case with the ICCPR and the CEDAW, which do not have a denunciation clause

or provision. That, however, has not prevented some states from denouncing and attempting

to withdraw from the ICCPR (Tyagi 2009). I therefore code treaty membership as an implicit

annual ratification as opposed to a terminal event. This is also consistent with conventional

modeling practices in the literature that estimates the impact of human rights treaty ratification

as a time-varying treatment.

Finally, the causal dependencies At−1 → X1t and At−1 → X2t suggest that we leave open

the possibility that a human rights treaty, once ratified, could influence state behavior in the next

time period through a variety of mediators such as public opinion and electoral accountability in

democracies (Dai 2005; Wallace 2013), legislative constraints of the executive by the opposition

parties (Lupu 2015), and judicial effectiveness of the domestic court system (Crabtree and Fariss

2015; Powell and Staton 2009).

Table 1 lists the model variables and data sources for their measurements. It also refers

to studies in the literature that similarly classify or assume these variables as time-invariant

covariates, confounders, and ratification predictors. For example, if a study that investigates the

impact of a human rights treaty on state practice includes democracy and independent judiciary

as time-varying control variables in its statistical models, we can infer that the study views these two

covariates as ratification predictors. Appendix A provides more detailed variable descriptions,

coding, and data sources.

Given the causal model and its encoded assumptions, I formulate the causal importance

of a predictor in terms of its contemporaneous average causal effect on treaty ratification. It is denoted by τ = E[At|do(X1t = 1)] − E[At|do(X1t = 0)], where the do-operator is notation for

an active intervention to fix the value of X1. In the interventional framework of causal inference

(Pearl 2009), that means we would intervene on the generative system (Equation set 1) to fix

the equation X1t = fX1(W, At−1, Yt−1, X1t−1, X2t−1, UX1) in turn at X1t = {0, 1}. From the

two resulting modified generative systems At = fA (W, At−1 , Yt−1 , x, X2t , UA ) for x = {0, 1}, we

then compute the difference between the two mean values of treaty ratification, which will be a

consistent estimate of the causal effect of X1 as long as causal identification is established.

3.2 Causal identification

Causal identification involves establishing the conditions under which a property of an interventional distribution, such as the expectation E[A|do(X = x)], can be computed without bias from

an observational probability distribution. My causal identification strategy is to identify a valid

adjustment set of observed variables that makes the interventional distribution of the outcome

A (treaty ratification) essentially equivalent to its observed conditional distribution.

Table 1: Model variables

Sets Variables and references


Ratification rules (Simmons 2009) measured by Simmons (2009).
W Domestic legal traditions (Mitchell, Ring and Spellman 2013)
measured by La Porta, Lopez-de Silanes and Shleifer (2008).
ICCPR proportion of ratification globally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by Office of High Commissioner for Human Rights (OHCHR).
CEDAW proportion of ratification globally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by OHCHR.
CAT proportion of ratification globally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by OHCHR.
ICCPR proportion of ratification regionally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by OHCHR.
CEDAW proportion of ratification regionally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by OHCHR.
CAT proportion of ratification regionally (Goodliffe and Hawkins 2006; Hathaway 2007)
measured by OHCHR.
Democracy/dictatorship classification
(Hathaway 2007; Chapman and Chaudoin 2013; Neumayer 2007)
measured by Cheibub, Gandhi and Vreeland (2010).
X Multiple parties (Vreeland 2008; Hollyer and Rosendorff 2011)
measured by Cheibub, Gandhi and Vreeland (2010).
Transition to/from democracy (Goodliffe and Hawkins 2006; Moravcsik 2000)
measured by Cheibub, Gandhi and Vreeland (2010).
Involvement in militarized interstate dispute (Chapman and Chaudoin 2013)
measured by Melander, Pettersson and Themnér (2016) and Gleditsch et al. (2002).
Judicial independence (Powell and Staton 2009) measured by
Linzer and Staton (2015).
Population size (Hafner-Burton and Tsutsui 2007)
measured by the World Bank Indicators.
Gross domestic product (GDP) per capita (Hafner-Burton and Tsutsui 2007)
measured by the World Bank Indicators.
Participation in international trade (Hafner-Burton 2013)
measured as trade volume/GDP by the World Bank Indicators.
Net official development assistance (Nielsen and Simmons 2015)
measured by the World Bank Indicators.
CIRI torture index (Cingranelli, Richards and Clay 2013).
Y CIRI women’s political rights index (Cingranelli, Richards and Clay 2013).
Human rights dynamic latent score (Fariss 2014).
ICCPR ratification measured by OHCHR.
A CEDAW ratification measured by OHCHR.
CAT ratification measured by OHCHR.

Any causal identification in the setting of observational data ultimately depends on the un-

derlying causal structure, which is best represented by a causal DAG. DAGs are thus an effective

tool to make all causal assumptions transparent and facilitate a clear and easy determination of

sufficient adjustment sets using the backdoor criterion. To illustrate identification of the causal

effect of X1t on At , for example, I apply the following backdoor criterion (Pearl, Glymour and

Jewell 2016, 61–66) to find an adjustment set of variables such that conditioning on that set

will:

(a) block any (non-causal) paths from X1t to At that have an arrow coming into X1t ;

(b) leave open all causal paths from X1t to At ; and

(c) not condition on a collider (a node that lies on any paths between X1t and At and has

two arrows coming into it) or a descendant of a collider (a node connected to a collider

through a directed path emanating from the collider).

When we condition on an adjustment set that satisfies the backdoor criterion, we essen-

tially remove all non-causal pathways from X1t to At and render these two variables condition-

ally independent or d-separated and, as a result, the interventional distribution of the outcome

A when X1 is intervened upon is essentially equivalent to its observational distribution. More

generally, when all non-causal paths between a predictor and the outcome are closed off, any

remaining significant correlation between them is evidence of a causal relationship.

From the graphical causal model in Figure 5, I derive a sufficient set of covariates for

adjustment Z1 = {W, At−1 , Yt−1 , X2t } that satisfies the backdoor requirement to identify the

causal effect of X1t on At . Specifically, conditioning on Yt−1 will, according to rule (a), block

five non-causal paths from X1t to At , including (i) X1t ← At−1 → Yt−1 → At ; (ii) X1t ←

X1t−1 → At−1 → Yt−1 → At ; (iii) X1t ← Yt−1 → At ; (iv) X1t ← Yt−1 → X2t → At ;

and (v) X1t ← X2t−1 → Yt−1 → At . Similarly, conditioning on At−1 will, by the same rule,

block two other non-causal paths from X1t to At , including (i) X1t ← At−1 → At and (ii)

X1t ← At−1 → X2t → At .

However, Yt−1 is also a collider on the path X1t ← X1t−1 → Yt−1 ← X2t−1 → X2t →

At . Conditioning on Yt−1 will therefore open that non-causal path and violate rule (b) of the

backdoor requirement. I therefore further condition on X2t to block this non-causal path. For

the same reason that I have accidentally opened the non-causal path X1t ← X1t−1 → At−1 ←

X2t−1 → X2t → At when conditioning on the collider At−1 , I block this path by conditioning

on X2t . Conditioning on X2t also happens to block three other non-causal paths that traverse

through X2t , including (i) X1t ← X2t−1 → X2t → At ; (ii) X1t ← At−1 → X2t → At ; and

(iii) X1t ← X2t−1 → At−1 → X2t → At . The latter two of these three non-causal paths run

through At−1 as well and therefore are already blocked when we condition on At−1 .

We should not condition on the contemporaneous measure of human rights practice Yt when

estimating the causal effect of X1t , however. Since it is a collider on the path X1t → Yt ← At ,

conditioning on Yt would violate rule (c) of the backdoor criterion, introducing a non-causal

association between X1t and At and biasing the causal effect estimate of X1t . For identification

of the causal effect of X2t on At , I apply the same rules and similarly derive a sufficient adjust-

ment set Z2 = {W, At−1 , Yt−1 , X1t }. In summary, to identify the contemporaneous causal effect

of a ratification predictor, I condition on time-invariant covariates, immediately prior ratification

status and level of compliance, and other contemporary time-varying covariates.
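
These hand-derived adjustment sets can also be checked mechanically against the DAG. The R sketch below encodes the two-period structure of Figure 5 in the dagitty package, with the suffix _l denoting period t−1, the suffix _t denoting period t, and W included as a common ancestor of all nodes; the encoding and the package choice are my own illustration rather than part of the original analysis.

```r
## Sketch: checking the backdoor adjustment sets of Figure 5 with dagitty.
## Suffix _l denotes period t-1, suffix _t denotes period t; W is the set of
## time-invariant covariates that precede everything else.
library(dagitty)

g <- dagitty("dag {
  W    -> { X1_l X2_l A_l Y_l X1_t X2_t A_t Y_t }
  X1_l -> { A_l Y_l X1_t X2_t }
  X2_l -> { A_l Y_l X1_t X2_t }
  A_l  -> { Y_l X1_t X2_t A_t }
  Y_l  -> { X1_t X2_t A_t Y_t }
  X1_t -> { A_t Y_t }
  X2_t -> { A_t Y_t }
  A_t  -> Y_t
}")

## Minimal sufficient adjustment sets for the effect of X1_t on A_t;
## {W, A_l, Y_l, X2_t} should appear among them, matching the derivation above.
adjustmentSets(g, exposure = "X1_t", outcome = "A_t")
```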

In addition to a causal variable importance analysis, I use the same graphical causal model

to develop a causal test of many theories of CAT ratification. First, I test the argument by Hath-

away (2007) that democracy (X1t ) and torture practices (Yt ) interact to lower the probability

of CAT ratification (At ). Based on the causal DAG in Figure 5, one should not condition on Yt or,

for that matter, use an interaction term of Yt and X1t while estimating the effect of X1t on At .

Since Yt is a collider on two different paths X1t → Yt ← At and X1t → Yt ← Yt−1 → At , con-

ditioning on Yt will induce a collider bias. I instead causally test this interactive effect argument

by estimating the Yt−1 -specific effect of X1t on At , using the adjustment set Z = {W, At−1 , X2t }

that satisfies the backdoor requirement within each subset of observations based on the values

of Yt−1 (Pearl, Glymour and Jewell 2016, 71–72). The test results will provide evidence as to

whether there is any effect modification by past torture practice, that is, whether the effect of

democracy on treaty ratification varies across levels of compliance in the previous year. The con-

ventional expectation is that the positive causal effect of democracy on treaty ratification will

diminish and eventually reverse its direction as the level of torture in the prior year increases.

Note that we cannot identify the X1t -specific causal effect of Yt−1 on At because of potential

post-treatment bias since X1t could be a descendant of Yt−1 along the path Yt−1 → X1t → At if

the use of torture possibly undermines democratic institutions.

Second, I test Vreeland’s omitted variable bias argument by directly estimating the causal

effect of multiple political parties (X2) on CAT ratification (A) among dictatorships (X1 =

0). The quantity of interest corresponding to the test is formulated as the X1t -specific causal

effect of X2t on At , that is, the causal effect of multiple parties on treaty ratification among

observations with the value X1t = 0. The sufficient adjustment set for identification is Z =

{W, At−1 , Yt−1 , X1t }. As Vreeland (2008, 79) predicts, “the effect of the multiparty institution

is to make a dictatorship more likely to enter into the CAT,” implying a positive causal effect of

multiple parties.

Third, I estimate the average causal effect of prior torture practice on CAT ratification

(Yt−1 → At ) in a causal test of the selection effect argument. This argument is often made

but has rarely been empirically quantified within a causal inference framework. The theoretical

expectation is a negative causal effect of Yt−1, suggesting that a higher level of torture in the

previous year is expected to cause state leaders to be less likely to ratify the CAT in the following

year. A sufficient adjustment set I derive for identification is Z = {W, At−1 , X1t−1 , X2t−1 }.

Finally, I also test the argument with respect to the signaling benefits of CAT ratification

for dictators (Hollyer and Rosendorff 2011) by estimating the causal effect of torture on CAT

ratification among autocracies, that is, the X1t−1 -specific causal effect of Yt−1 on At . The the-

oretical expectation is that “authoritarian governments that torture heavily are more likely to

sign the treaty than those that torture less” (Hollyer and Rosendorff 2011, 276), which implies

a positive effect of Yt−1 among observations that have the value X1t−1 = 0. A sufficient set that

satisfies the backdoor criterion for causal effect identification is Z = {W, At−1 , X2t−1 }.

3.3 Machine learning-based estimation

Having determined the sufficient adjustment sets Z that satisfy the backdoor requirement for identification of the various causal effects, I adopt two machine learning-based methods for causal effect estimation: substitution estimation and targeted maximum likelihood estimation (TMLE). My estimation methods would be analogous to the OLS estimator if the underlying causal system in Equation set 1 were assumed to be linear, all covariate effects additive, and all the noise terms U Gaussian. The use of machine learning is aimed at relaxing these assumptions.

For each of the continuous predictors of treaty ratification (global proportion of ratifica-

tion, regional proportions of ratification, population size, GDP per capita, trade/GDP propor-

tion, net amount of ODA, and judicial independence), the substitution estimator (Robins 1986; Robins, Greenland and Hu 1999) computes τ̂ = (1/n) Σ_{i=1}^{n} [Q̄n(1, Zi) − Q̄n(0, Zi)] as an estimate of its average causal effect τ = E[A|do(X = 1)] − E[A|do(X = 0)]. Specifically, I fit a prediction model Q̄n(X, Z) = E[A|X, Z] of treaty ratification A using X and the corresponding sufficient

adjustment set Z. I then substitute, in turn, the predictor values X = 1 (the empirical maximum) and X = 0 (the empirical minimum) for each observation, generate the

counterfactual outcomes, and compute the mean difference.
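
Concretely, this substitution (G-computation) step can be sketched in R with the SuperLearner package as follows; the data frame dat, the column names A, X, and Z1–Z3, and the reduced algorithm library are illustrative placeholders rather than the actual specification.

```r
## Sketch of the Super Learner-based substitution estimator (G-computation).
## `dat`, its column names, and the reduced algorithm library are placeholders.
library(SuperLearner)

covars <- c("X", "Z1", "Z2", "Z3")            # predictor plus adjustment set Z
sl_fit <- SuperLearner(Y          = dat$A,    # treaty ratification (binary)
                       X          = dat[, covars],
                       family     = binomial(),
                       SL.library = c("SL.glmnet", "SL.gam", "SL.xgboost"))

## Counterfactual data sets: fix X at its rescaled maximum (1) and minimum (0)
dat_max <- dat_min <- dat[, covars]
dat_max$X <- 1
dat_min$X <- 0

p1 <- predict(sl_fit, newdata = dat_max, onlySL = TRUE)$pred
p0 <- predict(sl_fit, newdata = dat_min, onlySL = TRUE)$pred
tau_hat <- mean(p1 - p0)                      # estimated average causal effect
```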

For variance estimation, I use the nonparametric bootstrap method. In the presence of

missing data, my procedure is similar to that of Daniel et al. (2011, 491) and follows the suggestion of Tsiatis

(2007, 362–371). I combine bootstrap with single stochastic imputation rather than multiple

imputation in order to make efficient and still valid inference. In addition to its greater efficiency,

another benefit of combining nonparametric bootstrap and single (improper) imputation is that

we do not have to rely on the normality assumption required by Rubin's approach (Little

and Rubin 2014) when pooling variances across imputed datasets. Instead, I create distribution-

free confidence intervals, using the 2.5% and 97.5% quantiles of the bootstrap distribution to

obtain the desired coverage.
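
The bootstrap step can be wrapped around the estimator sketched above; substitution_estimate() below is a hypothetical helper that re-runs the Super Learner fit and returns τ̂ for a given (singly imputed) data set.

```r
## Sketch: nonparametric bootstrap of the substitution estimate.
## `substitution_estimate()` is a hypothetical wrapper around the Super Learner
## fit above that returns tau_hat for a given (singly imputed) data set.
B <- 200
boot_tau <- replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)    # resample country-year rows
  substitution_estimate(dat[idx, ])
})
ci_95 <- quantile(boot_tau, probs = c(0.025, 0.975))  # distribution-free 95% CI
```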

The key to obtaining consistent effect estimates with a substitution estimator is to fit a cor-

rectly specified outcome model Q̄n that approximates the (unknown) data generating mecha-

nism. The standard practice is to assume a binomial distribution for the binary outcome of treaty

ratification and then model a property of the outcome distribution as a linear, additive function

of a set of covariates, sometimes with an interaction term included. If these distributional and

functional form assumptions are wrong, which they likely are for highly complex and probably non-linear political phenomena, the results will be misspecified models, biased effect estimates,

invalid inference, and misleading conclusions. The ensemble machine learning technique Super

Learner (van der Laan, Polley and Hubbard 2007; Sinisi et al. 2007) offers a powerful solution

to this problem of correct functional forms.

Super Learner has been used in economics (Kreif et al. 2015), political science (Samii,

Paler and Daly 2016), and epidemiology and biomedical research (Neugebauer et al. 2013;

Pirracchio, Petersen and van der Laan 2015). It stacks a user-selected library of predictive

algorithms and uses cross-validation to evaluate the performance of each algorithm in minimiz-

ing a specified loss function. For the binary outcome of treaty ratification, an appropriate loss function is the negative log-likelihood −log[Q(X, Z)^A (1 − Q(X, Z))^(1−A)], which measures the

degree of misfit with the observed data. User-selected predictive algorithms can include a simple main-term linear regression model, a semi-parametric generalized additive model (Hastie and Tib-

shirani 1990), regularized regression models (Tibshirani 1996), and non-parametric tree-based

ensemble methods such as boosting (Friedman 2001) and random forest (Breiman 2001). Ta-

ble 2 lists the algorithms I use for my machine learning-based substitution estimation given the

constraints in terms of computational resources.

Table 2: Algorithms used in Super Learning-based Substitution Estimation

Algorithm Description
GLMnet Regularized logistic regression with lasso penalty Σ_{j=1}^{p} |βj|.
GAM Generalized additive model.
(Tuned) XGBoost Extreme gradient boosting (eta = 0.01, depth = 4, ntree = 500).

The use of cross-validation is crucial for the algorithms to generalize well in terms of pre-

dicting unknown outcome values and avoiding overfitting. Super Learner then creates a linear

combination of these algorithms, each of which is weighted by its average predictive accuracy,

to build a hybrid prediction function that performs approximately as well as, and often better,

than the best algorithm in the library. The ability of Super Learner to assemble a rich, diverse set

of algorithms makes it particularly effective and much more likely to approximate the underlying

data generating process (Polley and van der Laan 2010).

One popular, state-of-the-art algorithm is extreme gradient boosting (Chen and He 2015;

Chen and Guestrin 2016), a faster implementation of the very popular and effective machine

learning technique of gradient boosting machine (Friedman 2001; Schapire and Freund 2012;

Natekin and Knoll 2013). Extreme gradient boosting (XGBoost) is non-parametric and its tree-

based nature allows it to capture non-linear, interactive dynamics among a large number of

predictors. Furthermore, unlike other tree-based methods such as random forest and gradient

boosting machine, XGBoost has greater computational efficiency, which makes it particularly

suitable to use in the context of nonparametric bootstrap for inference.

The performance of XGBoost could be sensitive to hyper-parameter setting. I employ a

combination of cross-validation and grid search to select the best among a large number of

configurations (comprising varying learning rates, tree depths, and numbers of trees) that are

tuned specifically to each of the three singly imputed ICCPR, CEDAW, and CAT datasets.
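
In the SuperLearner package, such a grid can be generated with the create.Learner helper and compared by V-fold cross-validated risk. The tuning values below reproduce the configuration names that appear in Figure 6; the data frame and covariate names remain placeholders, and the exact implementation details are an assumption on my part.

```r
## Sketch: building the XGBoost tuning grid and comparing V-fold CV risk.
library(SuperLearner)

xgb_grid <- create.Learner("SL.xgboost",
                           tune = list(ntrees    = c(200, 500, 1000),
                                       max_depth = 4:7,
                                       shrinkage = c(0.01, 0.05, 0.1, 0.2)),
                           detailed_names = TRUE, name_prefix = "XGB")

## Cross-validated risk under the negative log-likelihood loss for every
## configuration, the discrete Super Learner, and the Super Learner ensemble
cv_sl <- CV.SuperLearner(Y          = dat$A,
                         X          = dat[, covars],
                         family     = binomial(),
                         SL.library = xgb_grid$names,
                         method     = "method.NNloglik",
                         cvControl  = list(V = 10))
summary(cv_sl)
plot(cv_sl)   # produces a risk comparison plot like Figure 6
```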
[Figure 6 plots, in three panels, the V-fold cross-validated risk estimate of each XGBoost configuration (labeled XGB_ntrees_depth_shrinkage), the discrete Super Learner, and the Super Learner ensemble.]

Figure 6: Cross-validated risk of XGB algorithms in predicting (a) ICCPR ratification, (b) CEDAW
ratification, and (c) CAT ratification.

To estimate the causal effect of binary predictors (democracy, multiple political parties, democratic transition, and involvement in militarized interstate disputes), I use targeted maximum likelihood estimation (TMLE; van der Laan and Rose 2011). Similar to the substitution estimator, TMLE also starts by fitting an initial predictive outcome model of treaty ratification, $Q_n^0 = E(A \mid X, Z)$. It then modifies the initial model $Q_n^0(X, Z)$ into an updated model $Q_n^1(X, Z)$, using the modifying equation $\text{logit}(Q_n^1) = \text{logit}(Q_n^0) + \epsilon_n H_n$, where the "clever covariate" $H_n(X, Z) = \frac{I(X = 1)}{g_n(X = 1 \mid Z)} - \frac{I(X = 0)}{g_n(X = 0 \mid Z)}$ is a function of the treatment mechanism $g_n = E(X \mid Z)$ and the coefficient $\epsilon_n$ is obtained via a separate regression model $\text{logit}(A) = \text{logit}(Q_n^0) + \epsilon_n H_n$. In the third and final step, TMLE similarly substitutes the two distinct values of a binary predictor, plugs them into the updated outcome model $Q_n^1(X, Z)$ to generate the counterfactual outcomes for each observation, and computes the average causal effect as the mean difference of the counterfactual outcome values.
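To make the updating step concrete, the following is a minimal sketch of the TMLE fluctuation on simulated data, with plain logistic regressions standing in for the Super Learner fits of the outcome model and the treatment mechanism; all object names are illustrative and not the paper's estimation code.

# Minimal TMLE sketch on simulated data: plain logistic regressions stand in
# for the Super Learner fits of the outcome model Q and treatment mechanism g.
set.seed(1)
n <- 2000
Z <- rnorm(n)                                   # a single confounder
X <- rbinom(n, 1, plogis(0.5 * Z))              # binary "treatment" (e.g., democracy)
A <- rbinom(n, 1, plogis(-0.5 + 0.8 * X + Z))   # binary outcome (treaty ratification)

# Step 1: initial outcome model Q^0_n = E(A | X, Z)
Q_fit <- glm(A ~ X + Z, family = binomial())
Q0    <- predict(Q_fit, type = "response")
Q0_1  <- predict(Q_fit, newdata = data.frame(X = 1, Z = Z), type = "response")
Q0_0  <- predict(Q_fit, newdata = data.frame(X = 0, Z = Z), type = "response")

# Step 2: treatment mechanism g_n = E(X | Z) and the clever covariate H_n
g_fit <- glm(X ~ Z, family = binomial())
g1    <- predict(g_fit, type = "response")
H     <- X / g1 - (1 - X) / (1 - g1)

# Fluctuation: logit(Q^1_n) = logit(Q^0_n) + eps * H_n, with eps estimated by a
# logistic regression of A on H using logit(Q^0_n) as an offset
eps_fit <- glm(A ~ -1 + H + offset(qlogis(Q0)), family = binomial())
eps     <- coef(eps_fit)["H"]

# Step 3: updated counterfactual predictions and the average causal effect
Q1_1 <- plogis(qlogis(Q0_1) + eps / g1)
Q1_0 <- plogis(qlogis(Q0_0) - eps / (1 - g1))
mean(Q1_1 - Q1_0)

On these toy data, the final line returns the targeted estimate of the average causal effect of X on A.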

TMLE is essentially the substitution estimator but with an additional updating step in between to incorporate information about treatment assignment. This updating step is at the heart of the TMLE methodology. It makes the estimator doubly robust by reducing any remaining bias in the initial outcome model, producing unbiased estimates if either the initial outcome model $Q_n^0$ or the treatment assignment model $g_n$ is consistent. It is maximally efficient asymptotically if both $Q_n^0$ and $g_n$ are consistent. Note that both $Q_n^0$ and $g_n$ are already more robust to misspecification, and thus more likely to be consistent than standard parametric statistical models, because I have incorporated machine learning in my estimation.

In short, the TMLE methodology computes causal effect estimates of binary treatment

variables that are more robust than both parametric regression models and propensity score-

based estimators. Machine learning-based TMLE is even more robust and less computationally

expensive than the machine learning-based substitution estimator with bootstrapped samples

thanks to its efficient influence function-based approach to variance estimation (van der Laan

and Rose 2011, 94–97). Because of TMLE’s greater computational efficiency, I am able to employ

a more diverse and richer set of learning algorithms in Table 3.
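One way to obtain Super Learner-based TMLE estimates with influence function-based confidence intervals is the tmle package; the call below is an illustrative sketch on simulated data rather than the paper's estimation code, with a candidate library patterned on Table 3. Note that in the tmle package's argument names, Y is the outcome (the paper's A) and A is the treatment (the paper's X).

# Illustrative Super Learner-based TMLE call for one binary predictor.
# Simulated data; the library is a stand-in resembling Table 3.
library(tmle)
library(SuperLearner)

set.seed(2)
n <- 500
W <- data.frame(z1 = rnorm(n), z2 = rnorm(n))          # adjustment covariates Z
A <- rbinom(n, 1, plogis(0.4 * W$z1))                  # binary predictor (e.g., democracy)
Y <- rbinom(n, 1, plogis(-0.3 + 0.7 * A + 0.5 * W$z2)) # ratification outcome

sl_lib <- c("SL.glmnet", "SL.gam", "SL.polymars", "SL.randomForest", "SL.xgboost")

fit <- tmle(Y = Y, A = A, W = W, family = "binomial",
            Q.SL.library = sl_lib,   # outcome model library
            g.SL.library = sl_lib)   # treatment mechanism library

fit$estimates$ATE$psi   # point estimate of the average causal effect
fit$estimates$ATE$CI    # influence function-based 95% confidence interval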

To handle missing data when estimating the causal effect of binary predictors, I conduct

multiple imputation, using the Amelia II program (Honaker et al. 2011), and combine estimates

across m = 5 imputed data sets. Appendix B provides the summary statistics of the observed

data and Appendix C summarizes the imputation process. The ICCPR, the CEDAW, and the

CAT were opened for ratification at different times. I thus create three separate datasets (and,

correspondingly, 15 imputed datasets) that have different temporal coverage periods, including

1967–2013 for the ICCPR (opened for ratification on 16 December 1966), 1982–2013 for the
CEDAW (adopted and opened for ratification on 18 December 1979, but the CIRI measure of
women's political rights only begins in 1981), and 1985–2013 for the CAT (opened for ratification
on 10 December 1984). For algorithmic learning stability and ease of interpretation, I

standardize all continuous covariates into a bounded range between zero and one.
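Estimates obtained separately on each imputed dataset can be pooled with Rubin's rules, one standard way of combining multiply imputed results; the minimal sketch below uses hypothetical point estimates and standard errors for a single binary predictor.

# Pool estimates across m = 5 imputed datasets with Rubin's rules.
# The psi and se vectors are hypothetical placeholder values.
psi <- c(0.24, 0.22, 0.25, 0.23, 0.24)   # point estimates from each imputation
se  <- c(0.060, 0.058, 0.062, 0.059, 0.061)

m       <- length(psi)
psi_bar <- mean(psi)                     # combined point estimate
W_var   <- mean(se^2)                    # within-imputation variance
B_var   <- var(psi)                      # between-imputation variance
T_var   <- W_var + (1 + 1 / m) * B_var   # total variance

ci <- psi_bar + c(-1, 1) * qnorm(0.975) * sqrt(T_var)
round(c(estimate = psi_bar, lower = ci[1], upper = ci[2]), 3)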

Table 3: Algorithms used in Super Learner-based Targeted Maximum Likelihood Estimation

Algorithm Description
GLMnet Regularized logistic regression with lasso penalty $\sum_{j=1}^{p} |\beta_j|$.
GAM Generalized additive model (degree of polynomials = 2).
polymars Polynomial multivariate adaptive regression with splines.
randomForest Random Forest (ntree = 1,000).
XGBoost Extreme gradient boosting (eta = 0.01, depth = 4, ntree = 500).
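As a sketch of how a library like Table 3 can be assembled with the SuperLearner package, the snippet below uses create.Learner() to build the tuned XGBoost wrapper; the remaining entries rely on the package's default wrappers, whose settings broadly match the table, so this is an approximation of the configuration rather than the exact analysis code.

# Sketch of a Table 3-style Super Learner library. create.Learner() builds an
# XGBoost wrapper with the tuning values from the table; the other wrappers
# use the SuperLearner package defaults.
library(SuperLearner)
library(xgboost)

xgb_tuned <- create.Learner("SL.xgboost",
                            tune = list(ntrees = 500, max_depth = 4, shrinkage = 0.01),
                            detailed_names = TRUE, name_prefix = "XGB")

SL.library <- c("SL.glmnet",       # lasso-penalized logistic regression
                "SL.gam",          # generalized additive model
                "SL.polymars",     # polynomial MARS
                "SL.randomForest", # random forest
                xgb_tuned$names)   # tuned extreme gradient boosting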

3.4 Results and interpretation

Table 4 reports the estimates of the contemporaneous average causal effects of the ratification

predictors. Despite some differences, their causal effect estimates are relatively consistent across

three human rights treaties. First, the results underscore the importance of regional socialization

and norm diffusion in causing states to ratify human rights treaties. Going from the observed

lowest proportion to the observed highest proportion of regional ratifications will increase a

country's probability of becoming and remaining a state party by somewhere between 7.2%
and 9.5%, depending on the treaty. Density of regional ratification is, in fact, the single most

causally consistent and the second most causally important predictor of treaty ratification across

all three human rights treaties.

Second, similar to other studies in the literature (Landman 2005), my findings further

confirm that democracy is a significant predictor of treaty ratification. In fact, I find that democ-

racy is the most causally important variable for the ratification of the ICCPR and the CEDAW.

Table 4: Causal effect point estimates and 95% CI of predictors on treaty ratification

Predictors ICCPR CEDAW CAT


Super Learner-based Targeted Maximum Likelihood Estimator
Influence function-based CI with multiple imputation
Democracy 0.237 0.116 0.093
[0.121, 0.353] [0.064, 0.168] [−0.065, 0.251]
Multiple parties 0.153 0.197 0.192
[−0.063, 0.370] [−0.114, 0.508] [0.040, 0.344]
Democratic transition 0.186 0.091 −0.013
[−0.080, 0.451] [−0.046, 0.227] [−0.144, 0.118]
Involvement in militarized −0.004 −0.002 −0.010
interstate disputes [−0.015, 0.007] [−0.017, 0.013] [−0.023, 0.004]
Super Learner-based Substitution Estimator
Bootstrap (B = 500) quantile-based CI with single stochastic imputation
Global proportion of ratification −0.011 −0.011 −0.019
[−0.032, 0.000] [−0.025, 0.000] [−0.042, 0.002]
Regional proportions of ratification 0.095 0.072 0.094
[0.039, 0.190] [0.034, 0.155] [0.033, 0.241]
Population size 0.009 0.025 0.028
[−0.004, 0.027] [0.001, 0.087] [0.005, 0.056]
GDP per capita −0.003 −0.017 0.037
[−0.020, 0.011] [−0.043, −0.001] [−0.007, 0.121]
Trade/GDP −0.002 0.007 0.003
[−0.015, 0.011] [−0.010, 0.032] [−0.014, 0.016]
Net official development assistance 0.014 0.003 0.004
[−0.010, 0.043] [−0.025, 0.019] [−0.027, 0.025]
Judicial independence −0.005 0.029 0.024
[−0.031, 0.014] [0.004, 0.094] [−0.008, 0.108]
Number of countries 192 192 192
Number of years 47 32 29
Number of observations 7,870 5,823 5,354

Being a democracy causes the probability of being a state party to these two treaties to go up

by 23.7% and 11.6%, respectively. Democracy is defined here as having direct election

of the executive, election of the legislature, and an alternation of power, among other criteria

(Cheibub, Gandhi and Vreeland 2010). The coding criteria for democracy, in other words, are

unlikely to overlap conceptually with various measures of human rights outcomes (Hill 2016b;

von Stein 2016). By implication, my findings suggest that the best way to push a state to

ratify and remain committed to human rights treaties is to support its domestic democratic in-

stitutions and promote ratifications by its regional neighbors. In the case of CAT ratification, it

should be cautioned, it is not democracy per se that has a significant causal impact. Rather, it

is the existence of de facto multiple political parties that increases the probability of ratification

by 19.2%.

Third, as to other predictors, their causal importance is either very limited or inconsistent.

Like Goodliffe and Hawkins (2006), I find that democratic transition does not significantly affect

ratification of any of the three treaties, indicating a lack of empirical support for the "lock in" argument.
Involvement in militarized interstate disputes is not causally important, either. My findings also share

the skepticism by Nielsen and Simmons (2015) with respect to many economic variables such as

economic development, the amount of ODA received, and participation in international trade.

These variables do not seem to matter causally for human rights treaty ratification. Population

size tends to have a significantly positive, but substantively very small, causal impact, averaging

about 2% across the three treaties. Independence of the judiciary makes states slightly more likely

to ratify the CEDAW, but otherwise has no impact on the ratification of the ICCPR and the CAT.

I employ the same template of causal analysis, including graphical identification and ma-

chine learning-based TMLE estimation, to test many theories of CAT ratification. The results

reported in Table 5 offer several interesting findings. First, I find scant evidence to support

the commonly accepted argument regarding the interactive effect of democratic institutions and

human rights practice on CAT ratification (Hathaway 2007). Instead, my findings suggest that,

irrespective of a state's torture practice in the year prior, changing the regime type from a dictatorship
to a democracy does not lower the probability that it ratifies the CAT. If anything,
being a democracy causes an increase, not a decrease, of 8.2% in the chance of becoming and

remaining a state party to the CAT even at the highest level of torture practice during the previ-

ous year, although this estimate is certainly not statistically significant.

One speculative reason could be that the executives in non-compliant democracies do

want to ratify and comply because torture practices in the past were more a legacy of abusive
government agencies than of the executive itself. Such executives, perhaps under pressure from the democratic public,

could have an incentive to ratify the CAT and even use treaty obligations as a way to constrain

domestic abusive forces. In any event, these causal tests partially challenge the conventional

wisdom that poorly performing democracies are reluctant to become a treaty member because

their democratic institutions will make subsequent compliance very costly. Nevertheless, there is

some evidence, though not extremely solid, that being a democracy does increase the probability

of becoming a state party to the CAT by 14% among those countries that did not practice torture

at all—a significantly greater effect than among those that engaged in torture in the immediate

past.

Table 5: CAT ratification theories and causal effect point estimates and 95% CI

Theory tested                          Notation                                       Mean    SE      Lower   Upper

Interactive effect argument
Democracy w/ No Torture                $X_{1t} \rightarrow A_t$ at $Y_{t-1} = 2$      0.140   0.075   −0.007  0.287
Democracy w/ Occasional Torture        $X_{1t} \rightarrow A_t$ at $Y_{t-1} = 1$      0.056   0.047   −0.037  0.148
Democracy w/ Frequent Torture          $X_{1t} \rightarrow A_t$ at $Y_{t-1} = 0$      0.082   0.071   −0.056  0.221
Omitted variable bias argument
Multiple parties in Dictatorships      $X_{2t} \rightarrow A_t$ at $X_{1t} = 0$       0.050   0.043   −0.034  0.134
Selection effect argument
Torture in All                         $Y_{t-1} \rightarrow A_t$                      0.116   0.044   0.029   0.202
Torture in Democracies                 $Y_{t-1} \rightarrow A_t$ at $X_{1,t-1} = 1$   −0.018  0.012   −0.042  0.005
Credible commitment argument
Torture in Dictatorships               $Y_{t-1} \rightarrow A_t$ at $X_{1,t-1} = 0$   0.201   0.125   −0.043  0.445

Second, as indicated previously, the kind of domestic institution that significantly improves
the probability of a country being a CAT member is not democracy in general, but rather the
presence of de facto multiple political parties. However, contrary to Vreeland (2008), multiple
political parties existing under authoritarian regimes do not seem to have a significant causal
impact on treaty ratification. This raises an interesting puzzle: on the margin, multiple political
parties appear to be a causally important variable, but their regime type-specific effects can
vary significantly. This also suggests the need for further inquiry into the potentially heterogeneous
causal effects of different components within the definition of democracy.

Third, I rescale and dichotomize the CIRI torture index (with zero indicating no torture

and one indicating occasional or frequent torture) and test the selection effect argument by

directly estimating the causal impact of torture practices on CAT ratification in the following

time period. States that engage in occasional or even frequent torture practices are actually

11.6% more likely than those engaging in no torture at all to be a state party to the CAT in

the following year. In other words, this is evidence of an adverse selection effect. Governments

whose prior human rights practices do not conform to international standards tend to self-select

into, not away from, the CAT.
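For concreteness, the recoding described here amounts to a one-line transformation of the CIRI torture index (coded 0 = frequent, 1 = occasional, 2 = no torture, as in Appendix A); the snippet below is a minimal sketch with toy values rather than the actual dataset.

# Dichotomize the CIRI torture index: 1 = occasional or frequent torture, 0 = none.
# Toy values only; in the paper's data this is the CIRI torture column.
torture <- c(0, 1, 2, NA, 2, 0)             # CIRI coding: 0 frequent, 1 occasional, 2 none
torture_bin <- ifelse(torture == 2, 0, 1)   # missing values remain NA
table(torture_bin, useNA = "ifany")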

For a closer look at this surprising finding about the adverse selection effect, I further

disaggregate the sample observations into democracies and dictatorships based on their regime

classification during the time period when their human rights practices are recorded so as not

to introduce a post-treatment bias. It turns out that among democracies, engaging in torture

practices would cause only a small 1.8% decrease in their chance of being a CAT member the

following year. This comports with my previous findings that democracy and rights practices do

not significantly interact to determine CAT ratification.

Among dictatorships, though, the estimates are highly variable and uncertain. The point

estimate suggests that authoritarian regimes that practice torture are, on average, 20% more

likely to ratify the CAT the following year, which seems to support a claim in the literature

that “[t]he empirical record has shown fairly consistently that among non-democracies, the less

compliant are as likely (and in some cases even more likely) to ratify” (von Stein 2016, 661).

However, the high variability of the causal effect estimates means that we do not find solid empiri-

cal support for the counterintuitive claim by Hollyer and Rosendorff (2011) that authoritarian

leaders may be signaling their strength to opposition groups by way of a CAT ratification. In

short, my causal effect estimation indicates that prior torture practices do not significantly make
CAT ratification more likely within either regime type, even though the pooled estimate points to the
potential existence of an adverse selection effect. This, by implication, reiterates the need to take into account prior rights practices

if one wants to single out and estimate the causal impact of CAT ratification on human rights

practices. Otherwise, the causal effect of the CAT would be biased downward towards zero or

even negative and CAT ratification would likely appear to exacerbate human rights violations.

4 Conclusion

Machine learning in many respects has outpaced statistical theory in terms of modeling reality

(Efron and Hastie 2016). Political scientists could leverage these powerful methods in service

of the goal of making causal inference about political behavior and institutions. Embedding

machine learning within a causal inference framework is an effective way to increase model

flexibility while circumventing the inherent issue of model interpretability in machine learning.

One area of application is causal variable importance analysis. As demonstrated in recent

research in public health and biomedical studies (Díaz et al. 2015; Hubbard et al. 2013; Pir-

racchio et al. 2016; Ahern et al. 2016), one can reformulate traditional measures of variable

importance in terms of the causal effects of predictor variables. I adopt a similar template of

analysis that specifically leverages state-of-the-art machine learning techniques and incorporates

them into the structural causal inference framework for causal effect estimation. I then apply

this template to address the puzzle of human rights treaty ratification and test many existing

theories of CAT ratification in the literature.

In terms of broad interpretation, my analysis casts doubt on the instrumental explanations,

questions some popular institutional models, and generally supports the norms-based theories

of human rights treaty ratification. It partially confirms some less intuitive arguments in the

literature while challenging some of the most commonly accepted conventional wisdom, includ-

ing that democracy and state practices interact to determine ratification decisions and that states

self-select into treaty regimes based on their high level of compliance. Importantly, my findings

have a causal, rather than correlative, interpretation. Additionally, the data-adaptive, machine

learning-based estimation methods that I use are much less dependent upon distributional and

functional form assumptions than are traditional statistical models.

Despite the great promise of machine learning and the structural causal inference frame-

work, the dearth of applied research that combines these two methods suggests that there is a

gap to bridge between methodological advances in causal inference and machine learning on

the one hand and substantive applications in political science research on the other hand. Given

that any causal analysis requires a sufficient understanding of the literature in any particular

research area, applied researchers are probably better positioned to bridge this gap by adopting

machine learning methods in their political science research.

Finally, there is a critical need to openly embrace the structural, interventional framework

of causal inference in political science given that a lot of research questions in the discipline are

explicitly causal queries. This framework has developed significantly in the last decade or so

(Pearl 2014) and has been adopted very successfully in sociology, epidemiology, and biomedical

research. It was also recently presented in a way that makes it much more accessible to scholars

of various methodological persuasions (Pearl, Glymour and Jewell 2016). My application of this

framework to the issue of human rights treaty ratification shows that it can help researchers

clarify confusion about the assumed underlying causal process, identify incoherence in causal

assumptions, and modify our causal models to increase their substantive plausibility. Employing

this structural causal inference framework could be extremely beneficial to applied political

science research.

References
Ahern, Jennifer, K Ellicott Colson, Claire Margerson-Zilko, Alan Hubbard and Sandro Galea. 2016. “Pre-
dicting the Population Health Impacts of Community Interventions: The Case of Alcohol Outlets and
Binge Drinking.” American Journal of Public Health 106(11):1938–1943. 29

Alfredsson, Gudmundur, Jonas Grimheden, BC Ramcharan and Alfred de Zayas. 2009. International
Human Rights Monitoring Mechanisms: Essays in Honour of Jakob Th. Möller. Martinus Nijhoff. 3

Bareinboim, Elias, Jin Tian and Judea Pearl. 2014. Recovering from Selection Bias in Causal and Sta-
tistical Inference. In Proceedings of the 28th AAAI Conference on Artificial Intelligence. pp. 2410–2416.
9

Blanton, Shannon Lindsey and Robert G Blanton. 2007. “What Attracts Foreign Investors? An Examina-
tion of Human Rights and Foreign Direct Investment.” Journal of Politics 69(1):143–155. 4

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45(1):5–32. 21

Buergenthal, Thomas. 2006. “The Evolving International Human Rights System.” American Journal of
International Law 100:783–807. 3

Chapman, Terrence L and Stephen Chaudoin. 2013. “Ratification Patterns and the International Criminal
Court." International Studies Quarterly 57(2):400–409. 16

Chaudoin, Stephen, Jude Hays and Raymond Hicks. 2016. “Do We Really Know the WTO Cures Cancer?
False Positives and the Effects of International Institutions.” British Journal of Political Science pp. 1–26.
7

Cheibub, José Antonio, Jennifer Gandhi and James Raymond Vreeland. 2010. “Democracy and Dictator-
ship Revisited.” Public choice 143(1-2):67–101. 16, 25, 36

Chen, Tianqi and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” arXiv preprint
arXiv:1603.02754 . 21

Chen, Tianqi and Tong He. 2015. Higgs Boson Discovery with Boosted Trees. In JMLR: Workshop and
Conference Proceedings. Number 42 pp. 69–80. 21

Cingranelli, David L., David L. Richards and K. Chad Clay. 2013. “The Cingranelli-Richards (CIRI) Human
Rights Dataset.” CIRI Human Rights Data Website: http: // www. humanrightsdata. org . 16

Crabtree, Charles D and Christopher J Fariss. 2015. “Uncovering Patterns among Latent Variables: Human
Rights and De Facto Judicial Independence.” Research & Politics 2(3):2053168015605343. 15

Dai, Xinyuan. 2005. “Why Comply? The Domestic Constituency Mechanism.” International Organization
59(02):363–398. 15

Daniel, Rhian M, Bianca L De Stavola, Simon N Cousens et al. 2011. “gformula: Estimating Causal Effects
in the Presence of Time-varying Confounding or Mediation Using the g-computation Formula.” Stata
Journal 11(4):479. 20

Darwiche, Adnan. 2009. Modeling and Reasoning with Bayesian Networks. Cambridge University Press.
12

De Schutter, Olivier. 2010. International Human Rights Law: Cases, Materials, Commentary. Cambridge
University Press. 3

Díaz, Iván, Alan Hubbard, Anna Decker and Mitchell Cohen. 2015. “Variable Importance and Prediction
Methods for Longitudinal Problems with Missing Variables.” PloS One 10(3):1–17. 12, 29

Downs, George W., David M. Rocke and Peter N. Barsoom. 1996. "Is the Good News about Compliance
Good News about Cooperation?” International Organization 50:379–406. 5, 14

Efron, Bradley and Trevor Hastie. 2016. Computer Age Statistical Inference. Vol. 5 Cambridge University
Press. 28

Elwert, Felix. 2013. Graphical Causal Models. In Handbook of Causal Analysis for Social Research. Springer
pp. 245–273. 12

Elwert, Felix and Christopher Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning
on a Collider Variable.” Annual Review of Sociology 40:31–53.

Fariss, Christopher J. 2014. “Respect for Human Rights Has Improved Over Time: Modeling the Changing
Standard of Accountability.” American Political Science Review 108(2):297–318. 16, 36

Finnemore, Martha and Kathryn Sikkink. 1998. “International Norm Dynamics and Political Change.”
International Organization 52(4):887–917. 3

Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of
Statistics pp. 1189–1232. 21

Gleditsch, Nils Petter, Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg and Håvard Strand.
2002. “Armed Conflict 1946-2001: A New Dataset.” Journal of Peace Research 39(5):615–637. 16

Goodliffe, Jay and Darren G. Hawkins. 2006. “Explaining Commitment: States and the Convention
against Torture.” Journal of Politics 68(2):358–371. 3, 5, 16, 26

Hafner-Burton, Emilie and Kiyoteru Tsutsui. 2007. “Justice Lost! The Failure of International Human
Rights Law to Matter Where Needed Most.” Journal of Peace Research 44(4):407–425. 16

Hafner-Burton, Emilie M. 2005. “Trading Human Rights: How Preferential Trade Agreements Influence
Government Repression.” International Organization 59(3):593–629. 4

Hafner-Burton, Emilie M. 2012. “International Regimes for Human Rights.” Annual Review of Political
Science 15:265–286. 6

Hafner-Burton, Emilie M. 2013. Forced to Be Good: Why Trade Agreements Boost Human Rights. Cornell
University Press. 16

Hafner-Burton, Emilie M, Edward D Mansfield and Jon CW Pevehouse. 2015. “Human Rights Institutions,
Sovereignty Costs and Democratization.” British Journal of Political Science 45(1):1–27. 6

Hastie, Trevor J and Robert J Tibshirani. 1990. Generalized Additive Models. Vol. 43 CRC Press. 21

Hathaway, Oona A. 2007. “Why Do Countries Commit to Human Rights Treaties?” Journal of Conflict
Resolution 51(4):588–621. 4, 5, 8, 16, 18, 26

Hill, Daniel W. 2016a. “Avoiding Obligation: Reservations to Human Rights Treaties.” Journal of Conflict
Resolution 60(6):1–30. 5

Hill, Daniel W. 2016b. “Democracy and the Concept of Personal Integrity Rights.” Journal of Politics
78(3):822–835. 25, 36

Hill, Daniel W and Zachary M Jones. 2014. “An Empirical Evaluation of Explanations for State Repres-
sion.” American Political Science Review 108(3):1–27. 2, 14

Hollyer, James and B. Peter Rosendorff. 2011. “Why Do Authoritarian Regimes Sign the Convention
against Torture? Signaling, Domestic Politics and Non-Compliance.” Quarterly Journal of Political Sci-
ence 6(3-4):275–327. 6, 16, 19, 28

Honaker, James, Gary King, Matthew Blackwell et al. 2011. “Amelia II: A Program for Missing Data.”
Journal of Statistical Software 45(7):1–47. 23

Hubbard, Alan, Ivan Diaz Munoz, Anna Decker, John B Holcomb, Martin A Schreiber, Eileen M Bulger,
Karen J Brasel, Erin E Fox, Deborah J Del Junco, Charles E Wade et al. 2013. “Time-Dependent
Prediction and Evaluation of Variable Importance Using SuperLearning in High Dimensional Clinical
Data.” The Journal of Trauma and Acute Care Surgery 75(1):S53–S60. 29

Keller, Helen and Geir Ulfstein. 2012. UN Human Rights Treaty Bodies: Law and Legitimacy. Vol. 1
Cambridge University Press. 3

Kreif, Noémi, Richard Grieve, Iván Díaz and David Harrison. 2015. “Evaluation of the Effect of a Contin-
uous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain
Injury.” Health Economics 24(9):1213–1228. 20

La Porta, Rafael, Florencio Lopez-de Silanes and Andrei Shleifer. 2008. “The Economic Consequences of
Legal Origins.” Journal of Economic Literature 46(2):285–332. 16, 36

Landman, Todd. 2005. Protecting Human Rights: A Comparative Study. Georgetown University Press. 5,
24

Lebovic, James H and Erik Voeten. 2006. “The Politics of Shame: The Condemnation of Country Human
Rights Practices in the UNHCR." International Studies Quarterly 50(4):861–888. 4

Linzer, Drew A and Jeffrey K Staton. 2015. “A Global Measure of Judicial Independence, 1948–2012.”
Journal of Law and Courts 3(2):223–256. 16

Little, Roderick JA and Donald B Rubin. 2014. Statistical Analysis with Missing Data. John Wiley & Sons.
20

Lupu, Yonatan. 2014. “Why Do States Join Some Universal Treaties but Not Others? An Analysis of Treaty
Commitment Preferences.” Journal of Conflict Resolution pp. 1–32. 2, 4

Lupu, Yonatan. 2015. “Legislative Veto Players and the Effects of International Human Rights Agree-
ments.” American Journal of Political Science 59(3):578–594. 15

Melander, Erik, Therése Pettersson and Lotta Themnér. 2016. “Organized Violence, 1989–2015.” Journal
of Peace Research 53(5):727–742. 16

Mitchell, Sara McLaughlin, Jonathan J Ring and Mary K Spellman. 2013. “Domestic Legal Traditions and
States’ Human Rights Practices.” Journal of Peace Research 50(2):189–202. 16

Moravcsik, Andrew. 2000. “The Origins of Human Rights Regimes: Democratic Delegation in Postwar
Europe.” International Organization 54(2):217–252. 5, 16

Natekin, Alexey and Alois Knoll. 2013. “Gradient Boosting Machines, A Tutorial.” Frontiers in Neuro-
robotics 7. 21

Neugebauer, Romain, Bruce Fireman, Jason A Roy, Marsha A Raebel, Gregory A Nichols and Patrick J
O’Connor. 2013. “Super Learning to Hedge against Incorrect Inference from Arbitrary Parametric
Assumptions in Marginal Structural Modeling.” Journal of Clinical Epidemiology 66(8):S99–S109. 21

Neumayer, Eric. 2007. “Qualified Ratification: Explaining Reservations to International Human Rights
Treaties.” The Journal of Legal Studies 36(2):397–429. 5, 16

Nielsen, Richard A and Beth A Simmons. 2015. “Rewards for Ratification: Payoffs for Participating in the
International Human Rights Regime?” International Studies Quarterly 59(2):197–208. 4, 16, 26

Pearl, Judea. 2009. Causality. Cambridge University Press. 2, 7, 12, 15

Pearl, Judea. 2014. “The Deductive Approach to Causal Inference.” Journal of Causal Inference 2(2):115–
129. 30

Pearl, Judea, Madelyn Glymour and Nicholas P Jewell. 2016. Causal Inference in Statistics: A Primer.
John Wiley & Sons. 17, 18, 30

Pirracchio, Romain, John K Yue, Geoffrey T Manley, Mark J van der Laan, Alan E Hubbard et al. 2016.
“Collaborative Targeted Maximum Likelihood Estimation for Variable Importance Measure: Illustration
for Functional Outcome Prediction in Mild Traumatic Brain Injuries.” Statistical Methods in Medical
Research pp. 1–15. 29

Pirracchio, Romain, Maya L Petersen and Mark van der Laan. 2015. “Improving Propensity Score Esti-
mators’ Robustness to Model Misspecification Using Super Learner.” American Journal of Epidemiology
181(2):108–119. 21

Polley, Eric C and Mark J van der Laan. 2010. “Super Learner in Prediction.” Working Paper Series UC
Berkeley Division of Biostatistics . 2, 21

Powell, Emilia J. and Jeffrey K. Staton. 2009. “Domestic Judicial Institutions and Human Rights Treaty
Violation.” International Studies Quarterly 53(1):149–174. 15, 16

Robins, James. 1986. “A New Approach to Causal Inference in Mortality Studies with a Sustained Expo-
sure Period – Application to Control of the Healthy Worker Survivor Effect.” Mathematical Modelling
7(9-12):1393–1512. 20

Robins, James M, Sander Greenland and Fu-Chang Hu. 1999. “Estimation of the Causal Effect of a
Time-varying Exposure on the Marginal Mean of a Repeated Binary Outcome.” Journal of the American
Statistical Association 94(447):687–700. 20

Rodley, Nigel S. 2013. The Role and Impact of Treaty Bodies. In The Oxford Handbook of International
Human Rights Law, ed. Dinah Shelton. Oxford University Press pp. 621–648. 3

Samii, Cyrus, Laura Paler and Sarah Daly. 2016. “Retrospective Causal Inference with Machine Learning
Ensembles: An Application to Anti-Recidivism Policies in Colombia.” Political Analysis Forthcoming. 20

Schapire, Robert E and Yoav Freund. 2012. Boosting: Foundations and Algorithms. MIT press. 21

Simmons, Beth A. 2009. Mobilizing for Human Rights: International Law in Domestic Politics. Cambridge:
Cambridge University Press. 4, 5, 16, 36

Simmons, Beth A. 2011. “Reflections on Mobilizing for Human Rights.” NYU Journal of International Law
and Politics 44:729–750. 6

Simmons, Beth A and Daniel J Hopkins. 2005. “The Constraining Power of International Treaties: Theory
and Methods.” American Political Science Review 99(04):623–631. 5

Sinisi, Sandra E, Eric C Polley, Maya L Petersen, Soo-Yon Rhee and Mark J van der Laan. 2007. “Super
Learning: An Application to the Prediction of HIV-1 Drug Resistance.” Statistical Applications in Genetics
and Molecular Biology 6(1). 20

Smith-Cannoy, Heather. 2012. Insincere Commitments: Human Rights Treaties, Abusive States, and Citizen
Activism. Georgetown University Press. 4

Spence, Douglas Hamilton. 2014. “Foreign Aid and Human Rights Treaty Ratification: Moving beyond
the Rewards Thesis.” The International Journal of Human Rights 18(4-5):414–432. 4

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statis-
tical Society. Series B (Methodological) pp. 267–288. 21

Tsiatis, Anastasios. 2007. Semiparametric Theory and Missing Data. Springer Science & Business Media.
20

Tyagi, Yogesh. 2009. “The Denunciation of Human Rights Treaties.” British Yearbook of International Law
79(1):86–193. 14

van der Laan, Mark J, Eric C Polley and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications
in Genetics and Molecular Biology 6(1). 20

van der Laan, Mark J and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and
Experimental Data. Springer. 2, 23

VanderWeele, Tyler J. 2009. “On the Distinction between Interaction and Effect Modification.” Epidemi-
ology 20(6):863–871.

von Stein, Jana. 2005. “Do Treaties Constrain or Screen? Selection Bias and Treaty Compliance.” Ameri-
can Political Science Review 99(4):611–622. 5, 14

von Stein, Jana. 2016. “Making Promises, Keeping Promises: Democracy, Ratification and Compliance in
International Human Rights Law.” British Journal of Political Science 46(3):655–679. 5, 7, 25, 28

Vreeland, James R. 2008. “Political Institutions and Human Rights: Why Dictatorships Enter into the
United Nations Convention Against Torture.” International Organization 62(1):65. 6, 8, 9, 10, 16, 19,
27

Wallace, Geoffrey PR. 2013. “International Law and Public Attitudes toward Torture: An Experimental
Study.” International Organization 67(01):105–140. 15

A Variable Description
• Treaty ratification status of the ICCPR, CEDAW, CAT: A country–year binary variable coded 1 for
ratification and 0 otherwise. Data are coded manually from the database of the Office of the High
Commissioner for Human Rights.
(http://www.ohchr.org/EN/HRBodies/Pages/HumanRightsBodies.aspx).
• Human rights dynamic latent protection scores: a country–year interval variable that measures
respect for physical integrity rights. Rescaled to a 0–1 range from the empirical range for ease
of estimation and interpretation. The scores were generated by Fariss (2014) using a dynamic
ordinal item-response theory model that accounts for systematic change in the way human rights
abuses have been monitored over time. The human rights scores model builds on data from the
CIRI Human Rights Data Project, the Political Terror Scale, the Ill Treatment and Torture Data
Collection, the Uppsala Conflict Data Program, and several other public sources.
Variable name in original dataset is latentmean.
(http://humanrightsscores.org).
• CIRI women’s political rights: an ordinal variable from 0 – 3 that measures the extent to which
women’s political rights are protected, including the rights to vote, run for political office, hold
elected office, join political parties, and petition government officials.
A score of 0 indicates these rights are not guaranteed by law; a score of 1 indicates rights are
guaranteed by law but severely restricted in practice; a score of 2 indicates rights are guaranteed
by law but moderately restricted in practice; and a score of 3 indicates rights are guaranteed in
law and practice.
(http://www.humanrightsdata.com/p/data-documentation.html).
• CIRI torture index: an ordinal index that measures the extent of torture practice by government
officials or by private individuals at the instigation of government officials. A score of zero indi-
cates frequent torture practice; a score of 1 indicates occasional torture practice; and a score of 2
indicates that torture did not occur in a given year.
(http://www.humanrightsdata.com/p/data-documentation.html).
• Legal origins: a cross-sectional (country) multinomial variable coded for British, French, German,
Scandinavian, and Socialist legal origins. Data are from La Porta, Lopez-de Silanes and Shleifer
(2008). I recoded 1 for common law and 0 otherwise.
• Ratification rules: a cross-sectional (country) five-point ordinal variable (1, 1.5, 2, 3, 4) by Sim-
mons (2009). Its empirical maximum value, however, is only a score of 3. It measures "the insti-
tutional “hurdle” that must be overcome in order to get a treaty ratified.” The coding is based on
descriptions of national constitution or basic rule.
(http://scholar.harvard.edu/files/bsimmons/files/APP_3.2_Ratification_rules.pdf).
• Global and regional ratification rates: continuous variables measuring the cumulative ratification
rates globally and by region. Regional classification is defined using the United Nations Regional
Groups of Member States, including Africa Group (AG), Asia-Pacific Group (APG), Eastern Euro-
pean Group (EEG), Latin American and Caribbean Group (GRULAC), and Western European and
Others Group (WEOG).
(http://www.un.org/depts/DGACM/RegionalGroups.shtml).
• Democracy: measured by the dummy variable democracy in the Democracy-Dictatorship dataset
by Cheibub, Gandhi and Vreeland (2010). It is coded 1 if the regime qualifies as democratic and 0
otherwise. This measure is preferred to the Polity 4 dataset to avoid a conceptual overlap between
democracy and physical integrity rights (Hill 2016b).
(https://sites.google.com/site/joseantoniocheibub/datasets/democracy-and-dictatorship-revisited).

• Multiple parties: an ordinal variable coded 0 for no parties, 1 for a single party, and 2 for multi-
ple parties. Variable name in original dataset is defacto. I recoded 1 for multiple parties and 0
otherwise.
(https://sites.google.com/site/joseantoniocheibub/datasets/democracy-and-dictatorship-revisited).
• Democratic transition: a binary variable coded 1 when there is transition to or from democracy
and 0 otherwise.
Variable name in original dataset is tt.
(https://sites.google.com/site/joseantoniocheibub/datasets/democracy-and-dictatorship-revisited).
• Judicial independence: a time-series cross-sectional latent score (0 – 1) measuring judicial inde-
pendence. The scores range from 0 (no judicial independence) to 1 (complete judicial indepen-
dence).
(http://polisci.emory.edu/faculty/jkstato/page3/index.html).
• GDP per capita: a country–year interval variable measuring gross domestic product divided by
midyear population measured in current US dollars. A few country-year observations have a GDP
per capita value of zero. I replace these zero values with the next smallest observed value, 65.
(http://data.worldbank.org/indicator/NY.GDP.PCAP.CD).
• Population: a country–year interval variable measuring the total number of residents in a country
regardless of their legal status.
(http://data.worldbank.org/indicator/SP.POP.TOTL).
• Trade: a country–year interval variable measuring the sum of exports and imports of goods and
services as a share of gross domestic product.
(http://data.worldbank.org/indicator/NE.TRD.GNFS.ZS).
• Net ODA received (current USD): data are from the World Bank Indicators database.
(http://data.worldbank.org/indicator/DT.ODA.ODAT.CD).
• Involvement in militarized interstate dispute: a country–year binary variable from the Milita-
rized Interstate Dispute Data (MIDB dataset, version 4.1). It is recoded 1 to indicate a country’s
involvement on any side of a militarized dispute between the start year and the end year of the
dispute, and 0 otherwise.
(http://cow.dss.ucdavis.edu/data-sets/MIDs).

B Summary Statistics
B.1 Summary statistics

Table 6: Summary Statistics

Statistic N Mean St. Dev. Min Max


COW country code 8,062 — — 2 990
Year 8,062 — — 1966 2013
ICCPR ratification 8,062 0.560 0.496 0 1
CEDAW ratification 8,062 0.563 0.496 0 1
CAT ratification 8,062 0.370 0.483 0 1
Human rights scores 8,062 0.345 1.420 −3.110 4.710
CIRI women’s political rights 4,840 1.780 0.649 0 3
CIRI torture index 4,850 0.778 0.747 0 2
Legal origins 7,956 — — 1 5
Ratification rules 7,796 1.800 0.640 1 3
ICCPR global rate 8,062 0.561 0.268 0 0.869
CEDAW global rate 8,062 0.564 0.379 0 0.964
CAT global rate 8,062 0.369 0.316 0 0.792
ICCPR regional rates 8,062 0.563 0.311 0 1
CEDAW regional rates 8,062 0.565 0.397 0 1
CAT regional rates 8,062 0.372 0.356 0 1
Democracy 6,886 0.442 0.497 0 1
Multiple parties 6,886 1.650 0.653 0 2
Transition 6,886 0.018 0.134 0 1
Judicial independence 7,679 0.465 0.321 0.01 0.995
Population 7,798 31,846,961 115,863,080 9,419 1,357,380,000
GDP per capita 7,055 6,907 14,088 37.5 193,648
Trade 6,536 75.7 49.3 0.021 532
Net ODA 7,490 268,622,622 619,681,691 −943,150,000 22,057,090,000
Militarized dispute 7,501 0.308 0.462 0 1

B.2 R code for data preprocessing


• R version 3.3.2 (2016-10-31)
• Platform: x86_64-w64-mingw32/x64 (64-bit)
• Running under: Windows >= 8 x64 (build 9200)

1 options(digits = 3)
2 options(dplyr.width = Inf)
3 rm(list = ls())
4 cat("\014")
5
6 library(dplyr) # Upload dplyr to process data
7 library(tidyr) # tidyr package to tidy data
8 library(foreign) # Read Stata data
9 library(lubridate) # Handle dates data
10 library(stargazer) # Export summary statistics in latex table

11 library(reshape2) # convert data sets into long and wide formats
12 library(ggplot2)
13 library(ggthemes)
14
15 ######################################
16 # Ratification and human rights scores
17 ######################################
18
19 # Treaty ratification years by 192 countries
20 # ICCPR open 1966, entry 1976
21 # CAT open 1984, entry 1987
22 # CEDAW open 1979 entry 1981
23 ratifcow <− read.csv("ratification.csv") %>%
24 mutate(catyear = year(as.Date(cat.date, format = "%m/%d/%Y")),
25 iccpryear = year(as.Date(iccpr.date, format = "%m/%d/%Y")),
26 cedawyear = year(as.Date(cedaw.date, format = "%m/%d/%Y"))) %>%
27 dplyr::select(cow, name, catyear, iccpryear, cedawyear) %>%
28 mutate(catyear = ifelse(is.na(catyear), 0, catyear),
29 iccpryear = ifelse(is.na(iccpryear), 0, iccpryear),
30 cedawyear = ifelse(is.na(cedawyear), 0, cedawyear))
31
32 # PTS on 203 states from 1976−2014 by order Amnesty > SD > HRW
33 ptscores <− read.csv("PTS2015.csv") %>%
34 mutate(pts = ifelse(is.na(Amnesty),
35 ifelse(is.na(State.Dept),
36 ifelse(is.na(HRW), NA, HRW),
37 State.Dept), Amnesty)) %>%
38 rename(cow = COWnum, year = Year) %>%
39 dplyr::select(cow, year, pts)
40
41 # HR protection scores on 205 states from 1949−2013
42 hrscores <− read.csv("hrscores.csv") %>%
43 rename(cow = COW, year = YEAR, hrs = latentmean) %>%
44 dplyr::select(cow, year, hrs)
45
46 ######################
47 # Baseline covariates
48 ######################
49
50 # Regional indicator of 194 states
51 regional <− read.csv("region.csv") %>%
52 dplyr::select(cow, region)
53
54 # Legal origin and ratification rules in 187 states
55 # legal origins (1−English, 2−French, 4−German, 5−Scandinavian)
56 # ratification rules 1 (lowest hurdle) to 4 (highest)
57 legalratif <− read.csv("legalratif.csv") %>%
58 mutate(cow, legor = as.factor(legor), ratifrule = as.factor(ratifrule)) %>%
59 dplyr::select(−name)
60
61 # Combine TSCS dataset using PTS (region, legalratif)
62 # 192 states from 1966−2013 (remember to filter by treaty opening years later)
63 data <− ratifcow %>% left_join(hrscores, by = "cow") %>%

64 mutate(iccpr = ifelse(iccpryear == 0, 0, ifelse(year >= iccpryear, 1, 0)),
65 cedaw = ifelse(cedawyear == 0, 0, ifelse(year >= cedawyear, 1, 0)),
66 cat = ifelse(catyear == 0, 0, ifelse(year >= catyear, 1, 0))) %>%
67 dplyr::select(−c(name, iccpryear, cedawyear, catyear)) %>%
68 dplyr::filter(year > 1965) %>%
69 left_join(regional, by = "cow") %>%
70 left_join(legalratif, by = "cow")
71
72 ###############################
73 # Plot number of states parties
74 # to each treaty over time
75 ###############################
76 # Count the numbers of states parties to each procedure over time since 2003
77 ratify <− data.frame(data %>% group_by(year) %>%
78 summarise(CAT = sum(cat),
79 ICCPR = sum(iccpr),
80 CEDAW = sum(cedaw)))
81
82 # Convert to long format and plot the number of states parties since 2003
83 ratify_long <− melt(ratify, id = "year") %>%
84 filter(year > 1965) %>%
85 rename(Number = value, Treaty = variable, Year = year)
86
87 ggplot(data = ratify_long, aes(x = Year, y = Number, colour = Treaty)) +
88 geom_point(size = 6) + geom_line(size = 2) +
89 xlab("Year") + ylab("Number of States Parties") +
90 labs(x = "Year", y = "Number of States Parties") + theme_wsj() +
91 scale_x_continuous(breaks = c(1965, 1970, 1975, 1980, 1985, 1990,
92 1995, 2000, 2005, 2010, 2015)) +
93 scale_y_continuous(breaks = c(0, 25, 50, 75, 100, 125, 150, 175, 200, 225)) +
94 theme(legend.title = element_text(size = 25, hjust = 3, vjust = 7)) +
95 theme(axis.text = element_text(size = 25)) +
96 theme(legend.text = element_text(size = 25))
97
98 # Time−varying covariates
99 #########################
100 # Calculate regional and global ratification rates from 1966−2013
101 ratify_global <− data.frame(data %>% group_by(year) %>%
102 summarise(iccpr_glbavg = mean(iccpr),
103 cedaw_glbavg = mean(cedaw),
104 cat_glbavg = mean(cat)))
105
106 ratify_regional <− data.frame(data %>% group_by(year, region) %>%
107 summarise(iccpr_regavg = mean(iccpr),
108 cedaw_regavg = mean(cedaw),
109 cat_regavg = mean(cat)))
110
111 # Join data with two diffusion ratification rates
112 data <− dplyr::left_join(data, ratify_global, by = c("year" = "year")) %>%
113 left_join(ratify_regional, by = c("year" = "year", "region" = "region")) %>%
114 dplyr::select(−c(region))
115
116 # CIRI data on 198 states from 1981−2011

117 # torture (0 = frequent, 2 = no)
118 # women’s political rights (0 = no, 3 = law and practice)
119 ciri <− read.csv("CIRI.csv") %>%
120 dplyr::select(cow = COW, year = YEAR, torture = TORT, wpol = WOPOL) %>%
121 mutate(wpol = ifelse(wpol < 0, NA, wpol),
122 torture = ifelse(torture < 0, NA, torture))
123
124 # Latent Judicial Independence estimates 200 states from 1948−2012
125 judind <− read.csv("lji.csv") %>%
126 dplyr::select(cow = ccode, year = year, ji = LJI)
127
128 # Population in 185 states from 1966−2015
129 pop <− read.csv("Pop-WDI.csv", header = TRUE, stringsAsFactors = FALSE) %>%
130 gather(year, population, −c(name, cow), factor_key = TRUE) %>%
131 mutate(year = as.integer(gsub("X", "", year)),
132 population = as.numeric(population)) %>%
133 dplyr::select(cow, year, population)
134
135 # GDP per capita in constant US dollars in 185 states from 1966−2015
136 GDPpc <− read.csv("GDPpcWDI.csv", header = TRUE, stringsAsFactors = FALSE) %>%
137 gather(year, gdppc, −c(name, cow), factor_key = TRUE) %>%
138 mutate(year = as.integer(gsub("X", "", year)), gdppc = as.numeric(gdppc)) %>%
139 dplyr::select(cow, year, gdppc) %>%
140 mutate(gdppc = ifelse(gdppc <= 0, NA, gdppc))
141
142 # Trade data 170 states from 1960−2014
143 trade <− read.csv("trade.csv") %>%
144 dplyr::select(−c(Country.Name)) %>%
145 gather(year, trade, −c(COW), factor_key = TRUE) %>%
146 mutate(year = as.integer(gsub("X", "", year))) %>%
147 dplyr::select(cow = COW, year, trade)
148
149 # Net ODA data 172 states from 1960−2015
150 oda <− read.csv("netODA.csv") %>%
151 dplyr::select(−c(name)) %>%
152 gather(year, oda, −c(cow), factor_key = TRUE) %>%
153 mutate(year = as.integer(gsub("X", "", year))) %>%
154 mutate(oda = ifelse(is.na(oda), 0, oda))
155
156 # Democracy−dictatorship data on 202 countries from 1946 − 2008
157 # defacto multiple parties (0 = no, 1 = single, 2 = multiple)
158 # transition to/from democracy
159 dd <− read.dta("DD.dta") %>%
160 dplyr::select(year = year, cow = cowcode2,
161 parties = defacto, transition = tt, democracy = democracy)
162
163 # MID data on 178 states from 1966 to 2013
164 MID <− read.csv("MIDB.csv") %>%
165 dplyr::select(cow = ccode, start = StYear, end = EndYear) %>%
166 right_join(hrscores, by = c("cow")) %>%
167 dplyr::select(cow, start, end, year) %>%
168 filter(start > 1965) %>% filter(year > 1965) %>%
169 mutate(dispute = ifelse(year >= start, ifelse(year <= end, 1, 0), 0)) %>%

170 dplyr::select(cow, year, dispute) %>%
171 distinct() %>% group_by(cow, year) %>% summarize(dispute = max(dispute))
172
173 # Combine data on 192 states from 1966−2013
174 data <− left_join(data, ciri, by = c("cow" = "cow", "year" = "year")) %>%
175 left_join(dd, by = c("cow" = "cow", "year" = "year")) %>%
176 left_join(judind, by = c("cow" = "cow", "year" = "year")) %>%
177 left_join(pop, by = c("cow" = "cow", "year" = "year")) %>%
178 left_join(GDPpc, by = c("cow" = "cow", "year" = "year")) %>%
179 left_join(trade, by = c("cow" = "cow", "year" = "year")) %>%
180 left_join(oda, by = c("cow" = "cow", "year" = "year")) %>%
181 left_join(MID, by = c("cow" = "cow", "year" = "year"))
182
183 # Summary statistics in LaTeX
184 sum(complete.cases(data))
185 stargazer(data)
186
187 # Saving data into drive
188 write.csv(data, "finaldata.csv", row.names = FALSE)
189
190 # Saving data in RData file
191 save.image("datawork.RData")

C Multiple Imputation of Missing Data
C.1 Multiple imputation
Multiple imputation is used to fill in missing data and create five imputed datasets, covering 192 countries
from 1965 – 2013. All variables in Table 6 are used to make the MAR assumption as plausible as possible.
When modeling and estimating causal effects, however, I subset the observations by their appropriate
time periods. For example, I only use observations from 1985–2013 when estimating the causal effects
of predictive covariates on CAT ratification and 1982–2013 for modeling CEDAW ratification. As a result,
the fractions of imputed missing data that are actually used for estimation tend to be lower. Variables
with the highest missing fractions that are in use are CIRI torture index (missing fraction is 0.197) and
CIRI measures of women’s political rights (missing fraction is 0.196).

Table 7: Fractions of missing data by variables

Variables Missing fraction


CIRI women’s political rights 0.400
CIRI torture index 0.398
Trade participation 0.189
DD transition 0.146
DD multiple parties 0.146
DD democracy 0.146
GDP per capita 0.125
Judicial independence 0.048
Net ODA 0.071
Involvement in militarized dispute 0.070
Population size 0.033
Ratification rules 0.033
Legal origins 0.013
CAT ratification 0.000
CAT global ratification rate 0.000
CAT regional ratification rates 0.000
CEDAW ratification 0.000
CEDAW global ratification rate 0.000
CEDAW regional ratification rates 0.000
ICCPR ratification 0.000
ICCPR global ratification rate 0.000
ICCPR regional ratification rates 0.000
N of obs. after list-wise deletion 3,615
N of obs. after imputation 8,062

[Figure 7 about here: the missingness map produced by Amelia, with COW country codes on one axis and the dataset variables (wpol, torture, trade, democracy, transition, parties, gdppc, oda, dispute, ji, ratifrule, population, legor, the treaty ratification indicators with their global and regional rates, hrs, year, cow) on the other.]

Figure 7: Map of missing data for multiple imputation

C.2 R code for multiple imputation

1 options(digits = 2)
2 options(dplyr.width = Inf)
3 rm(list = ls())
4 cat("\014")
5
6 library(dplyr) # Upload dplyr to process data
7 library(tidyr) # tidyr package to tidy data
8 library(foreign) # Read Stata data
9 library(ggplot2) # ggplot graphics
10 library(Amelia) # Multiple imputation
11 library(lubridate) # Handle dates data
12
13 # Summary statistics of raw data and impute GDP = min and conflict NA = 0
14 data <− read.csv("finaldata.csv")
15 stargazer(data)
16
17 # Multiple imputation using Amelia package
18 set.seed(123)
19 mi.data <− amelia(data, m = 5, ts = "year", cs = "cow",

20 p2s = 2, polytime = 3,
21 logs = c("gdppc", "population"),
22 noms = c("legor", "ratifrule", "democracy", "transition",
23 "parties", "dispute"),
24 ords = c("wpol", "torture"),
25 emburn = c(50, 500), boot.type = "none",
26 bounds = rbind(c(3, −3.1, 4.7), c(20, 0, 1),
27 c(21, 9, 21), c(22, 4, 12), c(23, 0, 532),
28 c(24, 0, 24)))
29
30 # Write imputed data sets into CSV files
31 save(mi.data, file = "midata.RData")
32 write.amelia(obj = mi.data, file.stem = "midata", row.names = FALSE)
33
34 # Create missingness map and diagnostics
35 missmap(mi.data)
36 summary(mi.data)
37
38 # Stack all five data sets and export into a CSV file
39 data1 <− read.csv("midata1.csv")
40 data2 <− read.csv("midata2.csv")
41 data3 <− read.csv("midata3.csv")
42 data4 <− read.csv("midata4.csv")
43 data5 <− read.csv("midata5.csv")
44 stackdata <− rbind(data1, data2, data3, data4, data5)
45 write.csv(stackdata, file = "stackeddata.csv", row.names = FALSE)

D R code for main analysis
D.1 R code for XGBoost tuning

1 ###########################################
2 # Tuning XGBoost Hyperparameters Using a Combination of
3 # Grid Search and Cross−validated Super Learner
4 ###########################################
5 options(digits = 3)
6 options(dplyr.width = Inf)
7 options("scipen" = 5)
8 rm(list = ls())
9 cat("\014")
10
11 # Load packages
12 library(dplyr) # Upload dplyr to process data
13 library(ggplot2) # visualize data
14 library(ggthemes) # use various themes in ggplot
15 library(SuperLearner) # use Super Learner predictive method
16 library(gam) # algorithm used within TMLE
17 library(glmnet) # algorithm used within TMLE
18 library(randomForest) # algorithm used within TMLE
19 library(xgboost) # algorithm for XGBoost
20 library(xtable)
21 library(Amelia)
22 library(foreach) # do parallel loop
23 library(doParallel) # do parallel loop
24 library(RhpcBLASctl) #multicore
25
26 # Setup parallel computation − use all cores on our computer.
27 num_cores = RhpcBLASctl::get_num_cores()
28
29 # Use all of those cores for parallel SuperLearner.
30 options(mc.cores = num_cores)
31
32 # Check how many parallel workers we are using:
33 getOption("mc.cores")
34
35 # We need to set a different type of seed that works across cores.
36 set.seed(1, "L’Ecuyer-CMRG")
37
38 # Create function rescaling outcome into 0−1
39 std <− function(x) {x = (x − min(x))/(max(x) − min(x))}
40
41 # Read stacked data sets and process data
42 data <− read.csv("midata5.csv")
43 datatuning <− data %>%
44 mutate(legor = ifelse(legor == 1, 1, 0),
45 ratifrule = std(ratifrule),
46 iccpr_glbavg = std(iccpr_glbavg),
47 cedaw_glbavg = std(cedaw_glbavg),
48 cat_glbavg = std(cat_glbavg),

49 iccpr_regavg = std(iccpr_regavg),
50 cedaw_regavg = std(cedaw_regavg),
51 cat_regavg = std(cat_regavg),
52 ji = std(ji), population = std(population),
53 gdppc = std(gdppc), trade = std(trade), oda = std(oda),
54 parties = ifelse(parties < 2, 0, 1),
55 hrs = std(hrs),
56 wpol = std(wpol),
57 torture = std(torture)) %>%
58 group_by(cow) %>%
59 mutate(laghrs = lag(hrs, 1),
60 lagwpol = lag(wpol, 1),
61 lagtorture = lag(torture, 1),
62 lagiccpr = lag(iccpr, 1),
63 lagcedaw = lag(cedaw, 1),
64 lagcat = lag(cat, 1))
65
66 # Use 1967−2013 because ICCPR opened in 12/1966
67 # HRP scores 1948−2013
68 # Total 7,870 obs across 192 countries over 47 years
69 datatuning_iccpr <− datatuning %>%
70 dplyr::select(c(iccpr_glbavg, iccpr_regavg,
71 population, gdppc, trade, oda, ji,
72 democracy, parties, transition, dispute,
73 legor, ratifrule,
74 lagiccpr, laghrs,
75 iccpr,
76 year, cow)) %>%
77 filter(year >= 1967) %>% na.omit()
78 Y1 <− datatuning_iccpr$iccpr
79 X1 <− data.frame(datatuning_iccpr[, 1:15])
80 id1 <− factor(datatuning_iccpr$cow)
81
82 # Use 1982−2012 because CIRI wpol starts at 1981 and stops at 2011
83 # Total 5,631 obs across 192 countries over 31 years
84 datatuning_cedaw <− datatuning %>%
85 dplyr::select(c(cedaw_glbavg, cedaw_regavg,
86 population, gdppc, trade, oda, ji,
87 democracy, parties, transition, dispute,
88 legor, ratifrule,
89 lagcedaw, lagwpol,
90 cedaw,
91 year, cow)) %>%
92 filter(year >= 1982) %>% na.omit()
93 Y2 <− datatuning_cedaw$cedaw
94 X2 <− data.frame(datatuning_cedaw[, 1:15])
95 id2 <− factor(datatuning_cedaw$cow)
96
97 # Use 1985−2012 because CAT opened in 1984
98 # CIRI torture stops at 2011
99 # Total 5,162 obs across 192 countries over 28 years
100 datatuning_cat <− datatuning %>%
101 dplyr::select(c(cat_glbavg, cat_regavg,

102 population, gdppc, trade, oda, ji,
103 democracy, parties, transition, dispute,
104 legor, ratifrule,
105 lagcat, lagtorture,
106 cat,
107 cow, year)) %>%
108 filter(year >= 1985) %>% na.omit()
109 Y3 <− datatuning_cat$cat
110 X3 <− data.frame(datatuning_cat[, 1:15])
111 id3 <− factor(datatuning_cat$cow)
112
113 # 3∗4∗4 = 48 different configurations.
114 tune = list(ntrees = c(200, 500, 1000),
115 max_depth = c(4:7),
116 shrinkage = c(0.01, 0.05, 0.1, 0.2))
117
118 # Set detailed names = T so we can see the configuration for each function.
119 learners = create.Learner("SL.xgboost", tune = tune,
120 detailed_names = T, name_prefix = "XGB")
121
122 # Fit the SuperLearner using ICCPR, CEDAW, and CAT stacked datasets
123 SL.library <− c(learners$names)
124
125 set.seed(3)
126 sl3 = CV.SuperLearner(Y = Y3, X = X3,
127 family = binomial(), SL.library = SL.library,
128 method = "method.NNLS", id = id3, verbose = TRUE,
129 control = list(saveFitLibrary = TRUE, trimLogit = 1e−04),
130 cvControl = list(V = 5L, shuffle = TRUE),
131 parallel = "multicore")
132 plot.CV.SuperLearner(sl3)
133 result_CVcat <− summary.CV.SuperLearner(sl3)$Table
134 result_CVcat[order(result_CVcat$Ave), ]
135
136 set.seed(2)
137 sl2 = CV.SuperLearner(Y = Y2, X = X2,
138 family = binomial(), SL.library = SL.library,
139 method = "method.NNLS", id = id2, verbose = TRUE,
140 control = list(saveFitLibrary = TRUE, trimLogit = 1e−04),
141 cvControl = list(V = 5L, shuffle = TRUE),
142 parallel = "multicore")
143 plot.CV.SuperLearner(sl2)
144 result_CVcedaw <− summary.CV.SuperLearner(sl2)$Table
145 result_CVcedaw[order(result_CVcedaw$Ave), ]
146
147 set.seed(1)
148 sl1 = CV.SuperLearner(Y = Y1, X = X1,
149 family = binomial(), SL.library = SL.library,
150 method = "method.NNLS", id = id1, verbose = TRUE,
151 control = list(saveFitLibrary = TRUE, trimLogit = 1e−04),
152 cvControl = list(V = 5L, shuffle = TRUE),
153 parallel = "multicore")
154 plot.CV.SuperLearner(sl1)

155 result_CViccpr <− summary.CV.SuperLearner(sl1)$Table
156 result_CViccpr[order(result_CViccpr$Ave), ]
157
158 save.image("xgboost-tuning-updated.RData")

D.2 R code for comparing predictive algorithms

1 options(digits = 4)
2 options(dplyr.width = Inf)
3 options("scipen" = 5)
4 rm(list = ls())
5 cat("\014")
6
7 # Load packages
8 library(dplyr) # load dplyr to process data
9 library(ggplot2) # visualize data
10 library(ggthemes) # use various themes in ggplot
11 library(SuperLearner) # use the Super Learner predictive method
12 library(gam) # candidate algorithm for the Super Learner
13 library(glmnet) # candidate algorithm for the Super Learner
14 library(randomForest) # candidate algorithm for the Super Learner
15 library(xgboost) # algorithm for XGBoost
16 library(xtable)
17 library(Amelia)
18 library(foreach) # do parallel loop
19 library(doParallel) # do parallel loop
20 library(RhpcBLASctl) #multicore
21
22 # Set up parallel computation - use all cores on this machine.
23 num_cores = RhpcBLASctl::get_num_cores()
24
25 # Use all of those cores for parallel SuperLearner.
26 options(mc.cores = num_cores)
27
28 # Check how many parallel workers we are using:
29 getOption("mc.cores")
30
31 # We need to set a different type of seed that works across cores.
32 set.seed(1, "L'Ecuyer-CMRG")
33
34 # Create a function rescaling a variable to the [0, 1] interval
35 std <− function(x) {
36 x = (x − min(x))/(max(x) − min(x))
37 }
38

39 # Read an imputed data set and process data


40 data <− read.csv("midata1.csv")
41 datatuning <− data %>%
42 mutate(legor = ifelse(legor == 1, 1, 0),
43 ratifrule = std(ratifrule),
44 iccpr_glbavg = std(iccpr_glbavg),

45 cedaw_glbavg = std(cedaw_glbavg),
46 cat_glbavg = std(cat_glbavg),
47 iccpr_regavg = std(iccpr_regavg),
48 cedaw_regavg = std(cedaw_regavg),
49 cat_regavg = std(cat_regavg),
50 ji = std(ji), population = std(population),
51 gdppc = std(gdppc), trade = std(trade), oda = std(oda),
52 parties = ifelse(parties < 2, 0, 1),
53 hrs = std(hrs),
54 wpol = std(wpol),
55 torture = std(torture)) %>%
56 group_by(cow) %>%
57 mutate(laghrs = lag(hrs, 1),
58 lagwpol = lag(wpol, 1),
59 lagtorture = lag(torture, 1),
60 lagiccpr = lag(iccpr, 1),
61 lagcedaw = lag(cedaw, 1),
62 lagcat = lag(cat, 1))
63

64 # Use 1967−2013 because ICCPR opened in 12/1966


65 # HRP scores 1948−2013
66 # Total 7,870 obs across 192 countries over 47 years
67 datatuning_iccpr <− datatuning %>%
68 dplyr::select(c(iccpr_glbavg, iccpr_regavg,
69 population, gdppc, trade, oda, ji,
70 democracy, parties, transition, dispute,
71 legor, ratifrule,
72 lagiccpr, laghrs,
73 iccpr,
74 year, cow)) %>%
75 filter(year >= 1967) %>% na.omit()
76 Y1 <− datatuning_iccpr$iccpr
77 X1 <− data.frame(datatuning_iccpr[, 1:15])
78 id1 <− factor(datatuning_iccpr$cow)
79
80 # Tuning XGB
81 XGB_iccpr1 = create.Learner("SL.xgboost",
82 tune = list(ntrees = 500, max_depth = 4, shrinkage = 0.01),
83 detailed_names = T, name_prefix = "XGB_iccpr1")
84 XGB_iccpr2 = create.Learner("SL.xgboost",
85 tune = list(ntrees = 500, max_depth = 5, shrinkage = 0.01),
86 detailed_names = T, name_prefix = "XGB_iccpr2")
87 XGB_iccpr3 = create.Learner("SL.xgboost",
88 tune = list(ntrees = 1000, max_depth = 4, shrinkage = 0.01),
89 detailed_names = T, name_prefix = "XGB_iccpr3")
90 XGB_iccpr4 = create.Learner("SL.xgboost",
91 tune = list(ntrees = 500, max_depth = 6, shrinkage = 0.01),
92 detailed_names = T, name_prefix = "XGB_iccpr4")
93 XGB_iccpr5 = create.Learner("SL.xgboost",
94 tune = list(ntrees = 200, max_depth = 4, shrinkage = 0.05),
95 detailed_names = T, name_prefix = "XGB_iccpr5")
96
97 # Create Super Learner library

98 SL.library_iccpr <− c("SL.glm", "SL.glmnet",
99 "SL.gam", "SL.polymars",
100 "SL.randomForest", "SL.gbm", "SL.xgboost",
101 XGB_iccpr1$names, XGB_iccpr2$names,
102 XGB_iccpr3$names, XGB_iccpr4$names,
103 XGB_iccpr5$names)
104

105 set.seed(1)
106 sl1full = CV.SuperLearner(Y = Y1, X = X1,
107 family = binomial(), SL.library = SL.library_iccpr,
108 method = "method.NNloglik", id = id1, verbose = TRUE,
109 control = list(saveFitLibrary = TRUE, trimLogit = 1e−04),
110 cvControl = list(V = 5L, shuffle = TRUE),
111 parallel = "multicore")
112 plot.CV.SuperLearner(sl1full)
113 result_CViccprfull <− summary.CV.SuperLearner(sl1full)$Table
114 tuning_iccpr <− result_CViccprfull[order(result_CViccprfull$Ave), ]
115 xtable(tuning_iccpr, digits = rep(5, 6))
116

117 save.image("tuning-full.RData")
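Besides ranking algorithms by cross-validated risk, it can be useful to see how much weight the ensemble actually gives each candidate in every fold. A minimal sketch (not part of the original script), assuming tuning-full.RData has been loaded and that the CV.SuperLearner object stores per-fold ensemble weights in its coef component and the per-fold discrete winner in whichDiscreteSL, as fitted objects from this package typically do:

load("tuning-full.RData")
# Average Super Learner weight assigned to each candidate across the 5 folds
round(sort(colMeans(sl1full$coef), decreasing = TRUE), 3)
# Single best algorithm selected by the discrete Super Learner in each fold
table(unlist(sl1full$whichDiscreteSL))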

D.3 R code for variable importance analysis and theory testing

1 #########################################
2 # Estimating Causal Effects of Binary Variables Using SL−based TMLE
3 #########################################
4 options(digits = 4)
5 options(dplyr.width = Inf)
6 rm(list = ls())
7 cat("\014")
8 options("scipen" = 100, "digits" = 4)
9
10 # Load packages
11 library(dplyr) # Manage data
12 library(ggplot2) # visualize data
13 library(ggthemes) # use various themes in ggplot
14 library(SuperLearner) # use Super Learner predictive method
15 library(tmle) # use TMLE method
16 library(gam) # algorithm used within TMLE
17 library(glmnet) # algorithm used within TMLE
18 library(randomForest) # algorithm used within TMLE
19 library(polspline) # algorithm used within TMLE
20 library(xgboost) # algorithm used within TMLE
21 library(xtable) # create LaTeX tables
22 library(Amelia) # combine estimates from multiple imputation
23 library(RhpcBLASctl) #multicore
24 library(parallel) # parallel computing
25
26 # Tuning XGB
27 XGB_cat = create.Learner("SL.xgboost",
28 tune = list(ntrees = 500, max_depth = 4, shrinkage = 0.01),

29 detailed_names = T, name_prefix = "XGB_cat")
30
31 # Create Super Learner library
32 SL.library <− c("SL.glmnet", "SL.gam", "SL.polymars",
33 "SL.randomForest", XGB_cat$names)
34
35 # Set multicore compatible seed.
36 set.seed(1, "L'Ecuyer-CMRG")
37
38 # Set up parallel computation - use all cores on this machine.
39 num_cores = RhpcBLASctl::get_num_cores()
40
41 # Use all of those cores for parallel SuperLearner.
42 options(mc.cores = num_cores)
43
44 # Check how many parallel workers we are using:
45 getOption("mc.cores")
46
47 # Create a function rescaling a variable to the [0, 1] interval
48 std <− function(x) {
49 x = (x − min(x))/(max(x) − min(x))
50 }
51
52 ###############################
53 # Read stacked data sets and process data
54 data <− read.csv("stackeddata.csv")
55 d <− split(data, rep(1:5, each = nrow(data)/5))
56 iccpr_bin <− data.frame(matrix(NA, nrow = 10, ncol = 4))
57 cedaw_bin <− data.frame(matrix(NA, nrow = 10, ncol = 4))
58 cat_bin <− data.frame(matrix(NA, nrow = 10, ncol = 4))
59

60 for (m in 1:5) {
61
62 # Create holders for TMLE estimates for each imputed dataset
63 tmle_bin_iccpr <− data.frame(matrix(NA, nrow = 2, ncol = 4))
64 tmle_bin_cat <− data.frame(matrix(NA, nrow = 2, ncol = 4))
65 tmle_bin_cedaw <− data.frame(matrix(NA, nrow = 2, ncol = 4))
66
67 # Transform variables
68 d[[m]] <− d[[m]] %>%
69 mutate(legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
70 iccpr_glbavg = std(iccpr_glbavg),
71 cedaw_glbavg = std(cedaw_glbavg),
72 cat_glbavg = std(cat_glbavg),
73 iccpr_regavg = std(iccpr_regavg),
74 cedaw_regavg = std(cedaw_regavg),
75 cat_regavg = std(cat_regavg),
76 ji = std(ji), population = std(population),
77 gdppc = std(gdppc), trade = std(trade), oda = std(oda),
78 parties = ifelse(parties < 2, 0, 1),
79 hrs = std(hrs), wpol = std(wpol), torture = std(torture)) %>%
80 group_by(cow) %>%
81 mutate(laghrs = lag(hrs), lagwpol = lag(wpol), lagtorture = lag(torture),

82 lagiccpr = lag(iccpr), lagcedaw = lag(cedaw), lagcat = lag(cat))
83
84 # Create a dataset for each treaty ICCPR 1966, CEDAW 1979, CAT 1984
85 # Use 1967-2013 because the ICCPR opened in 12/1966 and HRP scores cover 1948-2013
86 # Use 1982-2012 because CIRI wpol starts at 1981 and stops at 2011
87 # Use 1985-2012 because the CAT opened in 12/1984 and CIRI torture stops at 2011
88 data_iccpr <− data.frame(d[[m]]) %>%
89 filter(year >= 1967) %>% na.omit()
90 data_cedaw <− data.frame(d[[m]]) %>%
91 filter(year >= 1982) %>% na.omit()
92 data_cat <− data.frame(d[[m]]) %>%
93 filter(year >= 1985) %>% na.omit()
94

95 # Create ICCPR ratification dataset


96 data_iccpr <− data_iccpr %>%
97 dplyr::select(c(democracy, parties, transition, dispute,
98 iccpr_glbavg, iccpr_regavg,
99 population, gdppc, trade, oda, ji,
100 legor, ratifrule,
101 lagiccpr, laghrs,
102 iccpr, cow))
103 # Model ICCPR ratification
104 for (i in 1:4) {
105 # Identify model variables
106 id <− factor(data_iccpr$cow)
107 Y <− data_iccpr$iccpr
108 A <− data_iccpr[, i]
109 W <− data.frame(dplyr::select(data_iccpr, −c(i, 16, 17)))
110 tmle_iccpr <− tmle(Y, A, W,
111 Qbounds = c(0, 1), Q.SL.library = SL.library,
112 gbound = 1e−4, g.SL.library = SL.library,
113 family = "binomial", fluctuation = "logistic",
114 id = id, verbose = TRUE)
115 tmle_bin_iccpr[1:2, i] <− c(tmle_iccpr$estimates$ATE$psi,
116 sqrt(tmle_iccpr$estimates$ATE$var.psi))
117 print(c(m, "ICCPR", i))
118 }
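# Note (added explanatory comments, not in the original script): tmle() reports
# its targeted estimate of the average treatment effect in estimates$ATE$psi and
# an influence-curve-based variance in estimates$ATE$var.psi; supplying the
# country identifier through id treats repeated country-years as clusters when
# that variance is computed, so the square root stored above is a
# cluster-robust standard error for each binary predictor.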
119
120 # Create CEDAW ratification dataset
121 data_cedaw <− data_cedaw %>%
122 dplyr::select(c(democracy, parties, transition, dispute,
123 cedaw_glbavg, cedaw_regavg,
124 ji, population, gdppc, trade, oda,
125 legor, ratifrule,
126 lagcedaw, lagwpol,
127 cedaw, cow))
128 # Model CEDAW ratification
129 for (i in 1:4) {
130 # Identify model variables
131 id <− factor(data_cedaw$cow)
132 Y = data_cedaw$cedaw
133 A = data_cedaw[, i]
134 W = data.frame(dplyr::select(data_cedaw, −c(i, 16, 17)))

135 tmle_cedaw <− tmle(Y, A, W,
136 Qbounds = c(0, 1), Q.SL.library = SL.library,
137 gbound = 1e−4, g.SL.library = SL.library,
138 family = "binomial", fluctuation = "logistic",
139 id = id, verbose = TRUE)
140 tmle_bin_cedaw[1:2, i] <− c(tmle_cedaw$estimates$ATE$psi,
141 sqrt(tmle_cedaw$estimates$ATE$var.psi))
142 print(c(m, "CEDAW", i))
143 }
144
145 # Create CAT ratification dataset
146 data_cat <− data_cat %>%
147 dplyr::select(c(democracy, parties, transition, dispute,
148 cat_glbavg, cat_regavg,
149 ji, population, gdppc, trade, oda,
150 legor, ratifrule,
151 lagcat, lagtorture,
152 cat, cow))
153 # Model CAT ratification
154 for (i in 1:4) {
155 # Identify model variables
156 id <− factor(data_cat$cow)
157 Y = data_cat$cat
158 A = data_cat[, i]
159 W = data.frame(dplyr::select(data_cat, −c(i, 16, 17)))
160 tmle_cat <− tmle(Y, A, W,
161 Qbounds = c(0, 1), Q.SL.library = SL.library,
162 gbound = 1e−4, g.SL.library = SL.library,
163 family = "binomial", fluctuation = "logistic",
164 id = id, verbose = TRUE)
165 tmle_bin_cat[1:2, i] <− c(tmle_cat$estimates$ATE$psi,
166 sqrt(tmle_cat$estimates$ATE$var.psi))
167 print(c(m, "CAT", i))
168 }
169
170 iccpr_bin[(2∗m − 1):(2∗m), ] <− tmle_bin_iccpr
171 cedaw_bin[(2∗m − 1):(2∗m), ] <− tmle_bin_cedaw
172 cat_bin[(2∗m − 1):(2∗m), ] <− tmle_bin_cat
173 }
174
175 # Combine TMLE estimates from 5 imputed datasets
176 variables <− c("Democracy", "Multiple parties",
177 "Transition", "Militarized disputes")
178 vim_bin_iccpr <− data.frame(mi.meld(q = iccpr_bin[c(1, 3, 5, 7, 9), ],
179 se = iccpr_bin[c(2, 4, 6, 8, 10), ],
180 byrow = TRUE))
181 result_iccpr <− data.frame(cbind(t(vim_bin_iccpr[, 1:4]),
182 t(vim_bin_iccpr[, 5:8]))) %>%
183 mutate(variables = variables, mean = X1, sd = X2,
184 lower = X1 − 1.96∗X2, upper = X1 + 1.96∗X2) %>%
185 dplyr::select(variables, mean, lower, upper)
186
187 vim_bin_cedaw <− data.frame(mi.meld(q = cedaw_bin[c(1, 3, 5, 7, 9), ],

188 se = cedaw_bin[c(2, 4, 6, 8, 10), ],
189 byrow = TRUE))
190 result_cedaw <− data.frame(cbind(t(vim_bin_cedaw[, 1:4]),
191 t(vim_bin_cedaw[, 5:8]))) %>%
192 mutate(variables = variables, mean = X1, sd = X2,
193 lower = X1 − 1.96∗X2, upper = X1 + 1.96∗X2) %>%
194 dplyr::select(variables, mean, lower, upper)
195
196 vim_bin_cat <− data.frame(mi.meld(q = cat_bin[c(1, 3, 5, 7, 9), ],
197 se = cat_bin[c(2, 4, 6, 8, 10), ],
198 byrow = TRUE))
199 result_cat <− data.frame(cbind(t(vim_bin_cat[, 1:4]),
200 t(vim_bin_cat[, 5:8]))) %>%
201 mutate(variables = variables, mean = X1, sd = X2,
202 lower = X1 − 1.96∗X2, upper = X1 + 1.96∗X2) %>%
203 dplyr::select(variables, mean, lower, upper)
204
205 effect <− data.frame(rbind(result_iccpr, result_cedaw, result_cat))
206 xtable(effect, digits = c(rep(3, 5)))
207
208 save.image("CVIA-bin-TMLE.RData")
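# Note (added sketch, not part of the original script): mi.meld() pools the five
# imputation-specific TMLE estimates with Rubin's rules. For one coefficient with
# estimates q_m and standard errors se_m, m = 1, ..., M, the pooled values it
# returns are equivalent to:
rubin_pool <- function(q, se) {
  M <- length(q)
  qbar <- mean(q)                        # pooled point estimate
  within <- mean(se^2)                   # average within-imputation variance
  between <- var(q)                      # between-imputation variance
  c(estimate = qbar,
    se = sqrt(within + (1 + 1/M) * between))
}
# e.g., for the first binary predictor of ICCPR ratification:
# rubin_pool(q = as.numeric(iccpr_bin[c(1, 3, 5, 7, 9), 1]),
#            se = as.numeric(iccpr_bin[c(2, 4, 6, 8, 10), 1]))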
209
210 ################################################
211 # Estimating Causal Effects of Continuous Variables Using SL−based Substitution
212 ################################################
213 options(digits = 4)
214 options(dplyr.width = Inf)
215 rm(list = ls())
216 cat("\014")
217
218 # Load packages
219 library(dplyr) # Manage data
220 library(ggplot2) # visualize data
221 library(ggthemes) # use various themes in ggplot
222 library(SuperLearner) # use Super Learner predictive method
223 library(tmle) # use TMLE method
224 library(gam) # algorithm used within TMLE
225 library(glmnet) # algorithm used within TMLE
226 library(randomForest) # algorithm used within TMLE
227 library(polspline) # algorithm used within TMLE
228 library(xgboost) # algorithm used within TMLE
229 library(xtable) # create LaTeX tables
230 library(Amelia) # combine estimates from multiple imputation
231 library(RhpcBLASctl) #multicore
232 library(doParallel) # parallel computing
233 library(foreach) # parallel computing
234
235 # Tuning XGB
236 XGB = create.Learner("SL.xgboost",
237 tune = list(ntrees = 500, max_depth = 4, shrinkage = 0.01),
238 detailed_names = T, name_prefix = "XGB_cat")
239
240 # Create Super Learner library

241 SL.library <− c("SL.glmnet", "SL.gam", XGB$names)
242
243 # Set multicore compatible seed.
244 set.seed(1, "L'Ecuyer-CMRG")
245
246 # Set up parallel computation - use all cores on this machine.
247 num_cores = RhpcBLASctl::get_num_cores()
248
249 # Use all of those cores for parallel SuperLearner.
250 options(mc.cores = num_cores)
251
252 # Check how many parallel workers we are using:
253 getOption("mc.cores")
254
255 # Create a function rescaling a variable to the [0, 1] interval
256 std <− function(x) {
257 x = (x − min(x))/(max(x) − min(x))
258 }
259

260 # Read a single imputed data set (midata1.csv) and process data


261 data <− read.csv("midata1.csv")
262 data <− data %>%
263 mutate(legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
264 iccpr_glbavg = std(iccpr_glbavg),
265 cedaw_glbavg = std(cedaw_glbavg),
266 cat_glbavg = std(cat_glbavg),
267 iccpr_regavg = std(iccpr_regavg),
268 cedaw_regavg = std(cedaw_regavg),
269 cat_regavg = std(cat_regavg),
270 ji = std(ji), population = std(population),
271 gdppc = std(gdppc), trade = std(trade), oda = std(oda),
272 parties = ifelse(parties < 2, 0, 1),
273 hrs = std(hrs), wpol = std(wpol), torture = std(torture)) %>%
274 group_by(cow) %>%
275 mutate(laghrs = lag(hrs, 1), lagwpol = lag(wpol, 1), lagtorture = lag(torture, 1),
276 lagiccpr = lag(iccpr, 1), lagcedaw = lag(cedaw, 1), lagcat = lag(cat, 1))
277

278 # Create a dataset for each treaty ICCPR 1966, CEDAW 1979, CAT 1984
279 # Use 1967-2013 because the ICCPR opened in 12/1966 and HRP scores cover 1948-2013
280 # Use 1982-2012 because CIRI wpol starts at 1981 and stops at 2011
281 # Use 1985-2012 because the CAT opened in 12/1984 and CIRI torture stops at 2011
282 # Count number of observations for each dataset
283 data_iccpr <− data %>%
284 filter(year >= 1967) %>% na.omit()
285 n_iccpr <− nrow(data_iccpr)
286 data_cedaw <− data %>%
287 filter(year >= 1982) %>% na.omit()
288 n_cedaw <− nrow(data_cedaw)
289 data_cat <− data %>%
290 filter(year >= 1985) %>% na.omit()
291 n_cat <− nrow(data_cat)
292
293 # Set multicore compatible seed.

294 set.seed(1, "L'Ecuyer-CMRG")
295
296 # Set up parallel computation - use all cores on this machine.
297 num_cores = RhpcBLASctl::get_num_cores()
298
299 # Use all of those cores for parallel SuperLearner.
300 options(mc.cores = num_cores)
301
302 # Check how many parallel workers we are using:
303 getOption("mc.cores")
304
305 # For bootstrap-based inference, use a single stochastically imputed dataset
306 # and take quantiles of the bootstrap distribution for CIs (no normality assumption needed)
307 cl <− makeCluster(2)
308 registerDoParallel(cl)
309 B <− 500
310 psi_boot <− data.frame(matrix(NA, nrow = B, ncol = 21))
311
312 foreach(b = 1:B, .packages = c("dplyr", "xgboost", "glmnet", "SuperLearner"),
313 .verbose = TRUE) %do% {
314 bootIndices_iccpr <− sample(1:n_iccpr, replace = TRUE)
315 bootIndices_cedaw <− sample(1:n_cedaw, replace = TRUE)
316 bootIndices_cat <− sample(1:n_cat, replace = TRUE)
317
318 bootData_iccpr <− data_iccpr[bootIndices_iccpr, ]
319 bootData_cedaw <− data_cedaw[bootIndices_cedaw, ]
320 bootData_cat <− data_cat[bootIndices_cat, ]
321
322 # Create ICCPR ratification resample dataset
323 bootData_iccpr <− bootData_iccpr %>%
324 dplyr::select(c(iccpr_glbavg, iccpr_regavg,
325 population, gdppc, trade, oda, ji,
326 democracy, parties, transition, dispute,
327 legor, ratifrule,
328 lagiccpr, laghrs,
329 iccpr, cow))
330 bootData_iccpr <− data.frame(bootData_iccpr)
331 niccpr <− nrow(bootData_iccpr)
332 # Model ICCPR ratification using resample dataset
333 psi_iccpr <− data.frame(matrix(NA, nrow = 1, ncol = 7))
334 for (i in 1:7) {
335 id <− factor(bootData_iccpr$cow)
336 Y <− bootData_iccpr$iccpr
337 X <− data.frame(bootData_iccpr[, 1:15])
338 X1 <− X0 <− X
339 X1[, i] <− 1
340 X0[, i] <− 0
341 newdata <− rbind(X, X1, X0)
342 Q_iccpr <− mcSuperLearner(Y = Y, X = X, newX = newdata,
343 SL.library = SL.library,
344 family = "binomial",
345 method = "method.NNloglik",
346 cvControl = list(V = 5L),

347 verbose = TRUE)
348 predX1 <− Q_iccpr$SL.predict[(niccpr + 1):(2∗niccpr)]
349 predX0 <− Q_iccpr$SL.predict[(2∗niccpr + 1):(3∗niccpr)]
350 psi_iccpr[, i] <− mean(predX1 − predX0)
351 print(c(b, "ICCPR", i))
352 }
353

354 # Create CEDAW ratification resample dataset


355 bootData_cedaw <− bootData_cedaw %>%
356 dplyr::select(c(cedaw_glbavg, cedaw_regavg,
357 population, gdppc, trade, oda, ji,
358 democracy, parties, transition, dispute,
359 legor, ratifrule,
360 lagcedaw, lagwpol,
361 cedaw, cow))
362 bootData_cedaw <− data.frame(bootData_cedaw)
363 ncedaw <− nrow(bootData_cedaw)
364 # Model CEDAW ratification using resample dataset
365 psi_cedaw <− data.frame(matrix(NA, nrow = 1, ncol = 7))
366 for (i in 1:7) {
367 id <− factor(bootData_cedaw$cow)
368 Y <− bootData_cedaw$cedaw
369 X <− data.frame(bootData_cedaw[, 1:15])
370 X1 <− X0 <− X
371 X1[, i] <− 1
372 X0[, i] <− 0
373 newdata <− rbind(X, X1, X0)
374 Q_cedaw <− mcSuperLearner(Y = Y, X = X, newX = newdata,
375 SL.library = SL.library,
376 family = "binomial",
377 method = "method.NNloglik",
378 cvControl = list(V = 5L),
379 verbose = TRUE)
380 predX1 <− Q_cedaw$SL.predict[(ncedaw + 1):(2∗ncedaw)]
381 predX0 <− Q_cedaw$SL.predict[(2∗ncedaw + 1):(3∗ncedaw)]
382 psi_cedaw[, i] <− mean(predX1 − predX0)
383 print(c(b, "CEDAW", i))
384 }
385
386 # Create CAT ratification resampled dataset
387 bootData_cat <− bootData_cat %>%
388 dplyr::select(c(cat_glbavg, cat_regavg,
389 population, gdppc, trade, oda, ji,
390 democracy, parties, transition, dispute,
391 legor, ratifrule,
392 lagcat, lagtorture,
393 cat, cow))
394 bootData_cat <− data.frame(bootData_cat)
395 ncat <− nrow(bootData_cat)
396 # Model CAT ratification using resampled dataset
397 psi_cat <− data.frame(matrix(NA, nrow = 1, ncol = 7))
398 for (i in 1:7) {
399 id <− factor(bootData_cat$cow)

400 Y <− bootData_cat$cat
401 X <− data.frame(bootData_cat[, 1:15])
402 X1 <− X0 <− X
403 X1[, i] <− 1
404 X0[, i] <− 0
405 newdata <− rbind(X, X1, X0)
406 Q_cat <− mcSuperLearner(Y = Y, X = X, newX = newdata,
407 SL.library = SL.library,
408 family = "binomial",
409 method = "method.NNloglik",
410 cvControl = list(V = 5L),
411 verbose = TRUE)
412 predX1 <− Q_cat$SL.predict[(ncat + 1):(2∗ncat)]
413 predX0 <− Q_cat$SL.predict[(2∗ncat + 1):(3∗ncat)]
414 psi_cat[, i] <− mean(predX1 − predX0)
415 print(c(b, "CAT", i))
416 }
417
418 # Combine bootstrap estimates
419 psi_boot[b, 1:21] <− cbind(psi_iccpr, psi_cedaw, psi_cat)
420 }
421 psi_boot2 <− psi_boot
422 psi_boot3 <− psi_boot
423
424 lower_quantile <− function(x, prob){quantile(x, prob = 0.025)}
425 upper_quantile <− function(x, prob){quantile(x, prob = 0.975)}
426 mean_boot <− apply(psi_boot, 2, mean)
427 lower_boot <− apply(psi_boot, 2, lower_quantile)
428 upper_boot <− apply(psi_boot, 2, upper_quantile)
429
430 # Summarize bootstrap estimates for the seven continuous covariates
431 variables <− c("Global rate", "Regional rate",
432 "Population", "GDP per capita",
433 "Trade", "Net ODA", "Judicial independence")
434
435 via_iccpr <− data.frame(cbind(mean = mean_boot[1:7],
436 lower = lower_boot[1:7],
437 upper = upper_boot[1:7]))
438 via_cedaw <− data.frame(cbind(mean = mean_boot[8:14],
439 lower = lower_boot[8:14],
440 upper = upper_boot[8:14]))
441 via_cat <− data.frame(cbind(mean = mean_boot[15:21],
442 lower = lower_boot[15:21],
443 upper = upper_boot[15:21]))
444
445 # Tabulate VIM results for all three treaty ratifications
446 effect <− rbind(via_iccpr, via_cedaw, via_cat) %>%
447 mutate(treaty = rep(c("ICCPR", "CEDAW", "CAT"), each = 7),
448 variables = rep(variables, 3)) %>%
449 dplyr::select(c(treaty, variables, mean, lower, upper))
450 row.names(effect) <− NULL
451 colnames(effect) <− c( "Treaty","Covariate", "Mean", "Lower", "Upper")
452 xtable(effect, digits = c(rep(3, 6)))

453

454 save.image("CVIA-continuous-SL.RData")
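# Note (added sketch, not part of the original script): because each continuous
# covariate was rescaled with std() to the [0, 1] interval, setting a column to 1
# versus 0 in the counterfactual matrices X1 and X0 contrasts that covariate at
# its sample maximum versus its sample minimum while all other covariates keep
# their observed values, so each psi is a min-to-max substitution (G-computation)
# effect on the predicted ratification probability. To map a rescaled value back
# to raw units:
unstd <- function(z, raw) min(raw) + z * (max(raw) - min(raw))
# unstd(1, raw) equals max(raw) and unstd(0, raw) equals min(raw)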
455
456 ######################################
457 # Testing Theories of Treaty Ratification Using SL−based TMLE
458 ######################################
459 options(digits = 4)
460 options(dplyr.width = Inf)
461 rm(list = ls())
462 cat("\014")
463 options("scipen" = 100, "digits" = 4)
464
465 # Load packages
466 library(dplyr) # Manage data
467 library(ggplot2) # visualize data
468 library(ggthemes) # use various themes in ggplot
469 library(SuperLearner) # use Super Learner predictive method
470 library(tmle) # use TMLE method
471 library(gam) # algorithm used within TMLE
472 library(glmnet) # algorithm used within TMLE
473 library(randomForest) # algorithm used within TMLE
474 library(polspline) # algorithm used within TMLE
475 library(xgboost) # algorithm used within TMLE
476 library(xtable) # create LaTeX tables
477 library(Amelia) # combine estimates from multiple imputation
478 library(RhpcBLASctl) #multicore
479 library(parallel) # parallel computing
480
481 # Tuning XGB
482 XGB = create.Learner("SL.xgboost",
483 tune = list(ntrees = 500, max_depth = 4, shrinkage = 0.01),
484 detailed_names = T, name_prefix = "XGB")
485
486 # Create Super Learner library
487 SL.library <− c("SL.glmnet", "SL.gam", "SL.polymars",
488 "SL.randomForest", XGB$names)
489

490 # Set multicore compatible seed.


491 set.seed(1, "L'Ecuyer-CMRG")
492
493 # Set up parallel computation - use all cores on this machine.
494 num_cores = RhpcBLASctl::get_num_cores()
495

496 # Use all of those cores for parallel SuperLearner.


497 options(mc.cores = num_cores)
498
499 # Check how many parallel workers we are using:
500 getOption("mc.cores")
501

502 # Create a function rescaling a variable to the [0, 1] interval


503 std <− function(x) {
504 x = (x − min(x))/(max(x) − min(x))
505 }

506

507 ###############################
508 # Test the effect of Democracy on Ratification among Torture = 0
509 set.seed(0)
510 data <− read.csv("stackeddata.csv")
511 d <− split(data, rep(1:5, each = nrow(data)/5))
512 DemTor0 <− data.frame(matrix(NA, nrow = 2, ncol = 5))
513
514 for (m in 1:5) {
515 print(c("Democracy on Ratification among Frequent Torture (0)", m))
516 # Subset to country-years with lagged torture = 0 (frequent torture) and transform variables
517 data_dem0 <− d[[m]] %>%
518 mutate(parties = ifelse(parties < 2, 0, 1),
519 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
520 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
521 ji = std(ji), population = std(population),
522 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
523 group_by(cow) %>% mutate(lagtorture = lag(torture),
524 lagcat = lag(cat)) %>%
525 filter(year >= 1985, lagtorture == 0) %>% na.omit()
526
527 # Identify model variables
528 id <− factor(data_dem0$cow)
529 Y = data_dem0$cat
530 A <− data_dem0$democracy
531 W <− data_dem0 %>% dplyr::select(legor, ratifrule,
532 lagcat,
533 parties, transition, dispute,
534 cat_glbavg, cat_regavg,
535 population, gdppc, trade, oda, ji,
536 cow)
537 W <− data.frame(W) %>% dplyr::select(−c(cow))
538 tmle_demtor0 <− tmle(Y, A, W,
539 Qbounds = c(0, 1), Q.SL.library = SL.library,
540 gbound = 1e−4, g.SL.library = SL.library,
541 family = "binomial", fluctuation = "logistic",
542 id = id, verbose = TRUE)
543 DemTor0[1:2, m] <− c(tmle_demtor0$estimates$ATE$psi,
544 sqrt(tmle_demtor0$estimates$ATE$var.psi))
545 }
546
547 # Combine estimates of the democracy effect (frequent-torture subgroup)
548 demtor0_comest <− data.frame(mi.meld(q = DemTor0[1, ], se = DemTor0[2, ],
549 byrow = FALSE))
550
551 ###############################
552 # Test the effect of Democracy on Ratification among Torture = 1
553 set.seed(1)
554 data <− read.csv("stackeddata.csv")
555 d <− split(data, rep(1:5, each = nrow(data)/5))
556 DemTor1 <− data.frame(matrix(NA, nrow = 2, ncol = 5))
557
558 for (m in 1:5) {

559 print(c("Democracy on Ratification among Occasional Torture (1)", m))
560 # Subset to country-years with lagged torture = 1 (occasional torture) and transform variables
561 data_dem1 <− d[[m]] %>%
562 mutate(parties = ifelse(parties < 2, 0, 1),
563 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
564 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
565 ji = std(ji), population = std(population),
566 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
567 group_by(cow) %>% mutate(lagtorture = lag(torture),
568 lagcat = lag(cat)) %>%
569 filter(year >= 1985, lagtorture == 1) %>% na.omit()
570
571 # Identify model variables
572 id <− factor(data_dem1$cow)
573 Y = data_dem1$cat
574 A <− data_dem1$democracy
575 W <− data_dem1 %>% dplyr::select(legor, ratifrule,
576 lagcat,
577 parties, transition, dispute,
578 cat_glbavg, cat_regavg,
579 population, gdppc, trade, oda, ji,
580 cow)
581 W <− data.frame(W) %>% dplyr::select(−c(cow))
582 tmle_demtor1 <− tmle(Y, A, W,
583 Qbounds = c(0, 1), Q.SL.library = SL.library,
584 gbound = 1e−4, g.SL.library = SL.library,
585 family = "binomial", fluctuation = "logistic",
586 id = id, verbose = TRUE)
587 DemTor1[1:2, m] <− c(tmle_demtor1$estimates$ATE$psi,
588 sqrt(tmle_demtor1$estimates$ATE$var.psi))
589 }
590
591 # Combine estimates of the democracy effect (occasional-torture subgroup)
592 demtor1_comest <− data.frame(mi.meld(q = DemTor1[1, ], se = DemTor1[2, ],
593 byrow = FALSE))
594
595 ###############################
596 # Test the effect of Democracy on Ratification among Torture = 2
597 set.seed(2)
598 data <− read.csv("stackeddata.csv")
599 d <− split(data, rep(1:5, each = nrow(data)/5))
600 DemTor2 <− data.frame(matrix(NA, nrow = 2, ncol = 5))
601

602 for (m in 1:5) {


603 print(c("Democracy on Ratification among No Torture (2)", m))
604 # Subset to country-years with lagged torture = 2 (no torture) and transform variables
605 data_dem2 <− d[[m]] %>%
606 mutate(parties = ifelse(parties < 2, 0, 1),
607 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
608 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
609 ji = std(ji), population = std(population),
610 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
611 group_by(cow) %>% mutate(lagtorture = lag(torture),

612 lagcat = lag(cat)) %>%
613 filter(year >= 1985, lagtorture == 2) %>% na.omit()
614
615 # Identify model variables
616 id <− factor(data_dem2$cow)
617 Y = data_dem2$cat
618 A <− data_dem2$democracy
619 W <− data_dem2 %>% dplyr::select(legor, ratifrule,
620 lagcat,
621 parties, transition, dispute,
622 cat_glbavg, cat_regavg,
623 population, gdppc, trade, oda, ji,
624 cow)
625 W <− data.frame(W) %>% dplyr::select(−c(cow))
626 tmle_demtor2 <− tmle(Y, A, W,
627 Qbounds = c(0, 1), Q.SL.library = SL.library,
628 gbound = 1e−4, g.SL.library = SL.library,
629 family = "binomial", fluctuation = "logistic",
630 id = id, verbose = TRUE)
631 DemTor2[1:2, m] <− c(tmle_demtor2$estimates$ATE$psi,
632 sqrt(tmle_demtor2$estimates$ATE$var.psi))
633 }
634
635 # Combine estimates of the democracy effect (no-torture subgroup)
636 demtor2_comest <− data.frame(mi.meld(q = DemTor2[1, ], se = DemTor2[2, ],
637 byrow = FALSE))
638
639 ###############################
640 # Test the effect of Torture on Ratification among All countries
641 # Contrast: occasional/frequent torture (1) vs. no torture (0)
642 set.seed(3)
643 data <− read.csv("stackeddata.csv")
644 d <− split(data, rep(1:5, each = nrow(data)/5))
645 TorAll <− data.frame(matrix(NA, nrow = 2, ncol = 5))
646
647 for (m in 1:5) {
648 print(c("Torture on Ratification among All", m))
649 # Transform variables (torture: 1 = occasional/frequent, 0 = none; parties: 1 = two or more parties)
650 data_cat <− d[[m]] %>%
651 mutate(torture = ifelse(torture < 2, 1, 0),
652 parties = ifelse(parties < 2, 0, 1),
653 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
654 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
655 ji = std(ji), population = std(population),
656 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
657 group_by(cow) %>% mutate(lagtorture = lag(torture),
658 lagcat = lag(cat),
659 lagdemocracy = lag(democracy),
660 lagparties = lag(parties),
661 lagtransition = lag(transition),
662 lagdispute = lag(dispute),
663 lagcat_glbavg = lag(cat_glbavg),
664 lagcat_regavg = lag(cat_regavg),

665 lagji = lag(ji),
666 lagpopulation = lag(population),
667 laggdppc = lag(gdppc),
668 lagtrade = lag(trade),
669 lagoda = lag(oda)) %>%
670 filter(year >= 1985) %>% na.omit()
671

672 # Identify model variables


673 id <− factor(data_cat$cow)
674 Y <− data_cat$cat
675 A <− data_cat$lagtorture
676 W <− data_cat %>% dplyr::select(legor, ratifrule,
677 lagcat,
678 lagdemocracy, lagparties, lagtransition, lagdispute,
679 lagcat_glbavg, lagcat_regavg,
680 lagpopulation, laggdppc, lagtrade, lagoda, lagji,
681 cow)
682 W <− data.frame(W) %>% dplyr::select(−c(cow))
683 tmle_torall <− tmle(Y, A, W,
684 Qbounds = c(0, 1), Q.SL.library = SL.library,
685 gbound = 1e−4, g.SL.library = SL.library,
686 family = "binomial", fluctuation = "logistic",
687 id = id, verbose = TRUE)
688 TorAll[1:2, m] <− c(tmle_torall$estimates$ATE$psi,
689 sqrt(tmle_torall$estimates$ATE$var.psi))
690 }
691
692 # Combine estimates of TorDic effect
693 torall_comest <− data.frame(mi.meld(q = TorAll[1, ], se = TorAll[2, ],
694 byrow = FALSE))
695

696 ###############################
697 # Test the effect of Torture on Ratification among Democracies
698 # Contrast: occasional/frequent torture (1) vs. no torture (0)
699 set.seed(4)
700 data <− read.csv("stackeddata.csv")
701 d <− split(data, rep(1:5, each = nrow(data)/5))
702 TorDemo <− data.frame(matrix(NA, nrow = 2, ncol = 5))
703
704 for (m in 1:5) {
705 print(c("Torture on Ratification among Democracies", m))
706 # Transform variables (torture: 1 = occasional/frequent, 0 = none; parties: 1 = two or more parties)
707 data_tordemo <− d[[m]] %>%
708 mutate(torture = ifelse(torture < 2, 1, 0),
709 parties = ifelse(parties < 2, 0, 1),
710 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
711 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
712 ji = std(ji), population = std(population),
713 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
714 group_by(cow) %>% mutate(lagtorture = lag(torture),
715 lagcat = lag(cat),
716 lagdemocracy = lag(democracy),
717 lagparties = lag(parties),

718 lagtransition = lag(transition),
719 lagdispute = lag(dispute),
720 lagcat_glbavg = lag(cat_glbavg),
721 lagcat_regavg = lag(cat_regavg),
722 lagji = lag(ji),
723 lagpopulation = lag(population),
724 laggdppc = lag(gdppc),
725 lagtrade = lag(trade),
726 lagoda = lag(oda)) %>%
727 filter(year >= 1985, lagdemocracy == 1) %>% na.omit()
728
729 # Identify model variables
730 id <− factor(data_tordemo$cow)
731 Y <− data_tordemo$cat
732 A <− data_tordemo$lagtorture
733 W <− data_tordemo %>% dplyr::select(legor, ratifrule,
734 lagcat,
735 lagparties, lagtransition, lagdispute,
736 lagcat_glbavg, lagcat_regavg,
737 lagpopulation, laggdppc, lagtrade, lagoda, lagji,
738 cow)
739 W <− data.frame(W) %>% dplyr::select(−c(cow))
740 tmle_tordemo <− tmle(Y, A, W,
741 Qbounds = c(0, 1), Q.SL.library = SL.library,
742 gbound = 1e−4, g.SL.library = SL.library,
743 family = "binomial", fluctuation = "logistic",
744 id = id, verbose = TRUE)
745 TorDemo[1:2, m] <− c(tmle_tordemo$estimates$ATE$psi,
746 sqrt(tmle_tordemo$estimates$ATE$var.psi))
747 }
748

749 # Combine estimates of TorDic effect


750 tordemo_comest <− data.frame(mi.meld(q = TorDemo[1, ], se = TorDemo[2, ],
751 byrow = FALSE))
752
753 ###############################
754 # Test the effect of Torture on Ratification among Dictatorships
755 # Contrast: occasional/frequent torture (1) vs. no torture (0)
756 set.seed(5)
757 data <− read.csv("stackeddata.csv")
758 d <− split(data, rep(1:5, each = nrow(data)/5))
759 TorAuto <− data.frame(matrix(NA, nrow = 2, ncol = 5))
760

761 for (m in 1:5) {


762 print(c("Torture on Ratification among Dictatorships", m))
763 # Transform variables (torture: 1 = occasional/frequent, 0 = none; parties: 1 = two or more parties)
764 data_torauto <− d[[m]] %>%
765 mutate(torture = ifelse(torture < 2, 1, 0),
766 parties = ifelse(parties < 2, 0, 1),
767 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
768 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
769 ji = std(ji), population = std(population),
770 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%

771 group_by(cow) %>% mutate(lagtorture = lag(torture),
772 lagcat = lag(cat),
773 lagdemocracy = lag(democracy),
774 lagparties = lag(parties),
775 lagtransition = lag(transition),
776 lagdispute = lag(dispute),
777 lagcat_glbavg = lag(cat_glbavg),
778 lagcat_regavg = lag(cat_regavg),
779 lagji = lag(ji),
780 lagpopulation = lag(population),
781 laggdppc = lag(gdppc),
782 lagtrade = lag(trade),
783 lagoda = lag(oda)) %>%
784 filter(year >= 1985, lagdemocracy == 0) %>% na.omit()
785
786 # Identify model variables
787 id <− factor(data_torauto$cow)
788 Y <− data_torauto$cat
789 A <− data_torauto$lagtorture
790 W <− data_torauto %>% dplyr::select(legor, ratifrule,
791 lagcat,
792 lagparties, lagtransition, lagdispute,
793 lagcat_glbavg, lagcat_regavg,
794 lagpopulation, laggdppc, lagtrade, lagoda, lagji,
795 cow)
796 W <− data.frame(W) %>% dplyr::select(−c(cow))
797 tmle_torauto <− tmle(Y, A, W,
798 Qbounds = c(0, 1), Q.SL.library = SL.library,
799 gbound = 1e−4, g.SL.library = SL.library,
800 family = "binomial", fluctuation = "logistic",
801 id = id, verbose = TRUE)
802 TorAuto[1:2, m] <− c(tmle_torauto$estimates$ATE$psi,
803 sqrt(tmle_torauto$estimates$ATE$var.psi))
804 }
805
806 # Combine estimates of TorDic effect
807 torauto_comest <− data.frame(mi.meld(q = TorAuto[1, ], se = TorAuto[2, ],
808 byrow = FALSE))
809
810 #############################
811 # Test the effect of Multiple Parties on Ratification among Dictatorships
812 set.seed(6)
813 data <− read.csv("stackeddata.csv")
814 d <− split(data, rep(1:5, each = nrow(data)/5))
815 PartyDic <− data.frame(matrix(NA, nrow = 2, ncol = 5))
816
817 for (m in 1:5) {
818 print(c("Multiple Parties on Ratification among Dictatorships", m))
819 # Subset to dictatorships (democracy = 0) and transform variables
820 # (torture kept on its rescaled 0-1 scale; parties dichotomized at two or more)
821 data_cat <− d[[m]] %>%
822 mutate(torture = std(torture),
823 parties = ifelse(parties < 2, 0, 1),

824 legor = ifelse(legor == 1, 1, 0), ratifrule = std(ratifrule),
825 cat_glbavg = std(cat_glbavg), cat_regavg = std(cat_regavg),
826 ji = std(ji), population = std(population),
827 gdppc = std(gdppc), trade = std(trade), oda = std(oda)) %>%
828 group_by(cow) %>% mutate(lagcat = lag(cat), lagtorture = lag(torture)) %>%
829 filter(year >= 1985, democracy == 0) %>% na.omit()
830

831 # Identify model variables


832 Y <− data_cat$cat
833 id <− factor(data_cat$cow)
834 A <− data_cat$parties
835 W <− data_cat %>% dplyr::select(legor, ratifrule,
836 lagcat, lagtorture,
837 transition, dispute,
838 cat_glbavg, cat_regavg,
839 population, gdppc, trade, oda, ji,
840 cow)
841 W <− data.frame(W) %>% dplyr::select(−c(cow))
842 tmle_partydic <− tmle(Y, A, W,
843 Qbounds = c(0, 1), Q.SL.library = SL.library,
844 gbound = 1e−4, g.SL.library = SL.library,
845 family = "binomial", fluctuation = "logistic",
846 id = id, verbose = TRUE)
847 PartyDic[1:2, m] <− c(tmle_partydic$estimates$ATE$psi,
848 sqrt(tmle_partydic$estimates$ATE$var.psi))
849 }
850
851 # Combine estimates of PartyDic effect
852 partydic_comest <− data.frame(mi.meld(q = PartyDic[1, ], se = PartyDic[2, ],
853 byrow = FALSE))
854

855 #############################
856 effect <− data.frame(rbind(demtor0_comest, demtor1_comest, demtor2_comest,
857 torall_comest, torauto_comest, tordemo_comest,
858 partydic_comest))
859 effect <− cbind(Theory = c("Democracy w/ Torture (Frequent)",
860 "Democracy w/ Torture (Occasional)",
861 "Democracy w/ Torture (Never)",
862 "Torture among All",
863 "Torture among Dictatorships",
864 "Torture among Democracies",
865 "Parties among Dictatorships"),
866 effect) %>%
867 mutate(Lower = X1 − 1.96∗X2, Upper = X1 + 1.96∗X2) %>%
868 rename(Mean = X1, SE = X2)
869 xtable(effect, digits = rep(3, 6))
870
871 save.image("TheoryTesting-TMLE.RData")
