Semiparametric Inference For Causal Effects

In this section, we discuss how to estimate the effect of treatment \(T\) on outcome \(Y\) commonly expressed through the use of counterfactuals of the form \(Y(t)\) or \(Y|\text{do}(t)\) – which reads as the potential outcome \(Y\) had treatment been assigned to \(t\). We follow the estimation procedure outlined in our work Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables (Bhattacharya, Nabi, & Shpitser, 2020). If the outcome is continuous, the effect is typically captured by the average causal effect (ACE) defined as,

\(\psi = E[Y(1)] - E[Y(0)], \quad \text{(when Y is continuous.)}\)

When \(Y\) is binary, the effect is typically reported as the log of odds ratios (LOR) as follows.

\(\psi = \log(\frac{p(Y(1) = 1)/p(Y(1) = 0)}{p(Y(0) = 1)/p(Y(0) = 0)}), \quad \text{(when Y is binary.)}\)

In Ananke, we consider identification and estimation of these quantities using the language of causal graphs. We illustrate these through a series of examples with increasing levels of complications. The good news is, all the user needs to specify is the data (as a Pandas data frame) and a story for what they believe the world looks like in the form of an acyclic directed mixed graph (ADMG) where each bidirected edge encodes unmeasured common confounders. Ananke will then provide suggestions for the best estimator given the graphical story provided by the user. If it is not possible to compute the ACE/LOR given this story (i.e., the quantity is not identified) then Ananke will also inform the user if this is the case.

Following are the necessary packages that need to be imported.

[1]:
from ananke.graphs import ADMG
from ananke.estimation import CausalEffect
from ananke.datasets import load_afixable_data, load_conditionally_ignorable_data, load_frontdoor_data
from ananke.estimation import AutomatedIF
import numpy as np

Let’s say we are studying the efficacy of a new antiretroviral therapy vs. an old one. Call this the treatment \(T\) where \(T=1\) represents receiving the new drug and \(T=0\) represents receiving the old. Our outcome of interest is the patients CD4 counts post treatment.

Estimation via Adjustment

As a first pass, an analyst might check the causal effect under the causal graphical model where all common confounders of the treatment and outcome are measured. A highly simplistic story would be as follows. The initial viral load of the patient affects both which treatment the patient is assigned to as well as their final outcome. This is represented graphically as follows. Just be careful not to have spaces in your variable names, the machine learning packages underlying Ananke are not compatible with that.

[2]:
vertices = ['ViralLoad', 'T', 'CD4']
di_edges = [('ViralLoad', 'T'), ('ViralLoad', 'CD4'), ('T', 'CD4')]
G = ADMG(vertices, di_edges)
G.draw(direction="LR")
[2]:
../_images/notebooks_estimation_3_0.svg

We now load some toy data on these three variables, and ask Ananke to estimate the effect. We see that as soon as we instantiate the CausalEffect object, Ananke offers a list of estimators that are available under the specified causal graph. Further, it recommends one that achieves the semiparametric efficiency bound, i.e., the estimator with the lowest variance.

[3]:
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the CausalEffect object

 Treatment is a-fixable and graph is mb-shielded.

 Available estimators are:

1. IPW (ipw)
2. Outcome regression (gformula)
3. Generalized AIPW (aipw)
4. Efficient Generalized AIPW (eff-aipw)

Suggested estimator is Efficient Generalized AIPW

In this case, it recommends Efficient Generalized AIPW which is to say, that the estimator used to compute the effect in this case looks a lot like Augmented IPW. Further, Ananke uses semiparametric theory in order to provide an estimator that achieves the lowest asymptotic variance. Turns out, the efficient estimator in this case is trivial, but the next case demonstrates a harder scenario.

Given the list of estimators, it is up to the user to specify what estimators they want to work with. For instance, if the user decides to use the efficient generalized AIPW, they only need to use the keyword given in front of it, i.e., eff-aipw, when computing the effect. All the nuisance models are fit using generalized linear models provided in the statsmodels. Users interested in using different modeling approaches can refer to the documentation on accessing the functional form of the influence functions at the end of this notebook.

[4]:
data = load_conditionally_ignorable_data() # loading the data
ace = ace_obj.compute_effect(data, "eff-aipw") # computing the effect
print("ace = ", ace, "\n")
ace =  0.9848858526281847

A realistic model will rarely be so simple. So what if we want to build a more complicated model such as the one shown below. Where, bidirected edges (\(\leftrightarrow\)) are used to encode unmeasured common confounders. For example, we hypothesize that we do not have information on some unmeasured common confounders between an individual’s income and treatment assignment.

[5]:
vertices = ['Income', 'Insurance', 'ViralLoad', 'Education', 'T', 'Toxicity', 'CD4']
di_edges = [('ViralLoad', 'Income'), ('ViralLoad', 'T'), ('ViralLoad', 'Toxicity'),
            ('Education', 'Income'), ('Education', 'T'), ('Education', 'Toxicity'),
            ('Income', 'Insurance'), ('Insurance', 'T'), ('T', 'Toxicity'), ('Toxicity', 'CD4'), ('T', 'CD4')]
bi_edges = [('Income', 'T'), ('Insurance', 'ViralLoad'), ('Education', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
G.draw(direction="LR")
[5]:
../_images/notebooks_estimation_9_0.svg
[6]:
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the ACE object

 Treatment is a-fixable and graph is mb-shielded.

 Available estimators are:

1. IPW (ipw)
2. Outcome regression (gformula)
3. Generalized AIPW (aipw)
4. Efficient Generalized AIPW (eff-aipw)

Suggested estimator is Efficient Generalized AIPW

In the following, we print the outputs of all four estimators.

[7]:
data = load_afixable_data() # load some pre-simulated data
ace_ipw = ace_obj.compute_effect(data, "ipw")
ace_gformula = ace_obj.compute_effect(data, "gformula")
ace_aipw = ace_obj.compute_effect(data, "aipw")
ace_eff = ace_obj.compute_effect(data, "eff-aipw")
print("ace using IPW = ", ace_ipw)
print("ace using g-formula = ", ace_gformula)
print("ace using AIPW = ", ace_aipw)
print("ace using efficient AIPW = ", ace_eff)
ace using IPW =  0.4917151858666866
ace using g-formula =  0.49680779965965227
ace using AIPW =  0.5005117929752627
ace using efficient AIPW =  0.4917151858667226

In addition, the user can run bootstraps and obtain \((1-\alpha)*100\%\) confidence level for the causal effect point estimate. The user needs to specify two arguments: number of bootstraps n_bootstraps and the significance level \(\alpha\) alpha. The confidence interval can then be obtained via bootstrap percentile intervals, as described in All of Nonparametric Statistics. In this case, the call to the compute_effect returns three values: the first corresponds to the effect computed on the original data, the second and third are the pre-specified lower and upper quantiles, i.e., \((\frac{\alpha}{2}, 1 - \frac{\alpha}{2}).\) The default value for \(\alpha\) is set to be \(0.05\). We illustrate this through the following toy example.

[8]:
np.random.seed(0)  # setting the seed to get consistent numbers in the documentation
ace, Ql, Qu = ace_obj.compute_effect(data, "eff-aipw", n_bootstraps=5, alpha=0.05)
print(" ace = ", ace, "\n",
      "lower quantile = ", Ql, "\n",
      "upper quantile = ", Qu)
 ace =  0.4917151858667208
 lower quantile =  0.45738153193197817
 upper quantile =  0.5898403691123565

Estimation beyond Adjustment

The previous examples showed that there exists a large class of causal graphs for which we can obtain the causal effect via covariate adjustment. But what happens when covariate adjustment fails? It turns out, that there is an even wider class of models for which we can obtain the causal effect, even when we cannot block all backdoor paths from the treatment to the outcome. Revisiting our running example on antiretroviral therapies: Let’s first see an example where the causal effect is not (non-parametrically) identified from the observed data.

[9]:
vertices = ['ViralLoad', 'T', 'CD4']
di_edges = [('ViralLoad', 'T'), ('ViralLoad', 'CD4'), ('T', 'CD4')]
bi_edges = [('T', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
G.draw(direction="LR")
[9]:
../_images/notebooks_estimation_16_0.svg

You will notice that this differs ever so slightly yet crucially from our first example in that we as analysts now believe that there are unmeasured confounders between the treatment (T) and the outcome (CD4). If we ask Ananke to estimate the causal effect under this model of the world, it will declare (correctly) that the effect is not identified non-parametrically under the given model.

[10]:
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')
Effect is not identified!

So what can we do then? Well, if we believe another model wherein the effect of the treatment is mediated completely through some biologically known mechanism whose by-product we can measure. Say this occurs through a reduction in viral load, and that it addition to a baseline measurement we have a second follow-up measurement.

[11]:
vertices = ['ViralLoad1', 'T', 'ViralLoad2', 'CD4']
di_edges = [('ViralLoad1', 'T'), ('ViralLoad1', 'CD4'), ('ViralLoad1', 'ViralLoad2'),
            ('T', 'ViralLoad2'), ('ViralLoad2', 'CD4')]
bi_edges = [('T', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
G.draw(direction="LR")
[11]:
../_images/notebooks_estimation_20_0.svg

It so happens that this is quite similar to the front-door graph that may be familiar to some users, and the effect is now identified! Here is how we can estimate it using Ananke.

[12]:
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the ACE object
data = load_frontdoor_data()
ace = ace_obj.compute_effect(data, "eff-apipw")
print("ace = ", ace)

 Treatment is p-fixable and graph is mb-shielded.

 Available estimators are:

1. Primal IPW (p-ipw)
2. Dual IPW (d-ipw)
3. APIPW (apipw)
4. Efficient APIPW (eff-apipw)

Suggested estimator is Efficient APIPW

ace =  -0.7813625198154692

Useful Graphical Criteria

In the previous section, we have seen that causal effects are not always identified, and if they are, they could be identified and estimated in different ways. Here we provide some simple graphical criteria that a user should keep in mind while creating their graphical model that may help guide them to obtain the best estimator possible. These graphical criteria and their corresponding estimators can be found in our work Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables.

Adjustment Fixability

If the treatment of interest (say \(T\)) has no bidirected path to any of its descendants (i.e., there is no \(T \leftrightarrow \cdots \leftrightarrow X\) for any \(X\) that is a descendant of \(T\)), then we say the treatment is adjustment or a-fixable. In this case, Ananke will find estimators that are based on IPW and covariate adjustment as well as a semiparametric estimator that has the form of Augmented IPW. Ananke will always recommend Generalized AIPW as the preferred estimator due to its double robustness property. Further, if the graph satisfies a property known as the mb-shielded (Markov blanket shielded) property, Ananke will be able to provide the most efficient influence function based estimator – one that achieves the semiparametric efficiency bound, and recommend using this instead when possible. Below is an example of a causal graph where the treatment is a-fixable and the graph is mb-shielded.

[13]:
vertices = ['Income', 'Insurance', 'ViralLoad', 'Education', 'T', 'Toxicity', 'CD4']
di_edges = [('ViralLoad', 'Income'), ('ViralLoad', 'T'), ('ViralLoad', 'Toxicity'),
            ('Education', 'Income'), ('Education', 'T'), ('Education', 'Toxicity'),
            ('Income', 'Insurance'), ('Insurance', 'T'), ('T', 'Toxicity'), ('Toxicity', 'CD4'), ('T', 'CD4')]
bi_edges = [('Income', 'T'), ('Insurance', 'ViralLoad'), ('Education', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the ACE object
G.draw(direction="LR")

 Treatment is a-fixable and graph is mb-shielded.

 Available estimators are:

1. IPW (ipw)
2. Outcome regression (gformula)
3. Generalized AIPW (aipw)
4. Efficient Generalized AIPW (eff-aipw)

Suggested estimator is Efficient Generalized AIPW

[13]:
../_images/notebooks_estimation_24_1.svg

Primal Fixability

If the treatment of interest (say \(T\)) has no bidirected path to any of its children (i.e., there is no \(T \leftrightarrow \cdots \leftrightarrow X\) for any \(X\) that is a child of \(T\)), then we say the treatment is primal or p-fixable. Notice that due to the graphical criterion being related to children as opposed to all descendants, p-fixability is strictly more general than a-fixability. Further, p-fixability covers many important and popular classes of causal graphs such as the front-door graph.

When \(T\) is p-fixable, Ananke finds estimators based on Primal IPW, Dual IPW, and Augmented Primal IPW. Ananke will always recommend Augmented Primal IPW as the preferred estimator. As before, when the graph is mb-shielded, Ananke is also able to provide the efficient influence function based estimator and will instead recommend Efficient Augmented Primal IPW. Below is an example of an mb-shielded graph where \(T\) is p-fixable.

[14]:
vertices = ['ViralLoad', 'Income', 'T', 'Toxicity', 'Adherence', 'CD4']
di_edges = [('ViralLoad', 'T'), ('ViralLoad', 'Toxicity'), ('ViralLoad', 'Adherence'), ('ViralLoad', 'CD4'),
            ('Income', 'T'), ('Income', 'Adherence'),
            ('T', 'Toxicity'), ('Toxicity', 'Adherence'), ('Adherence', 'CD4'), ('T', 'CD4')]
bi_edges = [('T', 'Adherence'), ('Toxicity', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the ACE object
G.draw(direction="LR")

 Treatment is p-fixable and graph is mb-shielded.

 Available estimators are:

1. Primal IPW (p-ipw)
2. Dual IPW (d-ipw)
3. APIPW (apipw)
4. Efficient APIPW (eff-apipw)

Suggested estimator is Efficient APIPW

[14]:
../_images/notebooks_estimation_26_1.svg

Nested Fixability

Finally, this if the treatment is neither a-fixable nor p-fixable, Ananke checks if it is what we call nested fixable. That is, the treatment is nested fixable if running Algorithm 2 in our paper succeeds in finding an estimator. Ananke will then allow the user to use either Nested IPW or Augmented Nested IPW but its recommendation will be the latter. Below is an example.

[15]:
vertices = ['Income', 'Exercise', 'T', 'Toxicity', 'Adherence', 'CD4', 'ViralLoad']
di_edges = [('ViralLoad', 'T'), ('ViralLoad', 'CD4'), ('Income', 'T'), ('Exercise', 'CD4'),
            ('T', 'Toxicity'), ('T', 'CD4'), ('Toxicity', 'Adherence'), ('Adherence', 'CD4')]
bi_edges = [('Income', 'Toxicity'), ('Exercise', 'Income'), ('Exercise', 'T'),
            ('ViralLoad', 'Adherence'), ('ViralLoad', 'CD4')]
G = ADMG(vertices, di_edges, bi_edges)
ace_obj = CausalEffect(graph=G, treatment='T', outcome='CD4')  # setting up the ACE object
G.draw(direction="LR")

 Effect is identified.

 Available estimators:

1. Nested IPW (n-ipw)
2. Augmented NIPW (anipw)

Suggested estimator is Augmented NIPW

[15]:
../_images/notebooks_estimation_28_1.svg

Note that naturally the class of models that satisfy a-fixability, p-fixability and nested fixability are subsets of each other. That is,

\[\text{adjustment fixability} \subset \text{primal fixability} \subset \text{nested fixability}.\]

Thus, if the user wanted to use Nested IPW for a-fixable graphs, Ananke would not stop you. However, the reverse will result in an error from Ananke. Further, if the effect is not identified, Ananke will not allow the user to use any of the estimators.

Accessing the Functional Form of the Influence Function

For advanced users, we also provide tools to access the functional form of the influence funciton directly. If the treatment is adjustment fixable, the code spits out the functional form of the non-parametric influence function given in Theorem 2 in our paper. If the treatment is primal fixable, the code provides the functional form of the non-parametric influence function given in Theorem 11 in our paper. Further, if the graph is mb-shielded, we provide the respective efficient influence function. In the following examples we are interested in the effect of treatment \(T\) on outcome \(Y\).

[16]:
vertices = ['Z1', 'Z2', 'C2', 'C1', 'T', 'M', 'Y']
di_edges = [('C2', 'Z1'), ('C2', 'T'), ('C2', 'M'),
            ('C1', 'Z1'), ('C1', 'T'), ('C1', 'M'),
            ('Z1', 'Z2'), ('Z2', 'T'), ('T', 'M'), ('M', 'Y'), ('T', 'Y')]
bi_edges = [('Z1', 'T'), ('Z2', 'C2'), ('C1', 'Y')]
G = ADMG(vertices, di_edges, bi_edges)
influence = AutomatedIF(G, 'T', 'Y')
print("np-IF = ", influence.nonparametric_if_, "\n")
print("beta primal = ", influence.beta_primal_, "\n")
print("efficient IF = \n", influence.eff_if_, "\n")
G.draw(direction="LR")

 Treatment is a-fixable and graph is mb-shielded.

 Available estimators are:

1. IPW (ipw)
2. Outcome regression (gformula)
3. Generalized AIPW (aipw)
4. Efficient Generalized AIPW (eff-aipw)

Suggested estimator is Efficient Generalized AIPW

np-IF =  I(T=t) x 1/p(T|C2,Z1,C1,Z2) x (Y - E[Y|T=t,C2,Z1,C1,Z2]) + E[Y|T=t,C2,Z1,C1,Z2] - Ψ

beta primal =  I(T=t) x 1/p(T|C2,Z1,C1,Z2) x Y

efficient IF =
 E[βprimal|C1] - E[βprimal] + E[βprimal|T,M,Y,C1] - E[βprimal|T,M,C1] + E[βprimal|C2] - E[βprimal] + E[βprimal|C2,Z1,C1] - E[βprimal|C2,C1] + E[βprimal|T,M,C2,C1] - E[βprimal|T,C2,C1] + E[βprimal|Z1,C2,Z2] - E[βprimal|C2,Z1]

[16]:
../_images/notebooks_estimation_31_1.svg
[17]:
vertices = ['C2', 'C1', 'T', 'M1', 'M2', 'Y']
di_edges = [('C2', 'T'), ('C2', 'M1'), ('C2', 'M2'), ('C2', 'Y'),
            ('C1', 'T'), ('C1', 'M2'),
            ('T', 'M1'), ('M1', 'M2'), ('M2', 'Y'), ('T', 'Y')]
bi_edges = [('T', 'M2'), ('M1', 'Y')]
G = ADMG(vertices, di_edges, bi_edges)
influence = AutomatedIF(G, 'T', 'Y')
print("beta primal = ", influence.beta_primal_, "\n")
print("beta dual = ", influence.beta_dual_, "\n")
print("np-IF = ", influence.nonparametric_if_, "\n")
print("efficient IF = \n", influence.eff_if_, "\n")
G.draw(direction="LR")

 Treatment is p-fixable and graph is mb-shielded.

 Available estimators are:

1. Primal IPW (p-ipw)
2. Dual IPW (d-ipw)
3. APIPW (apipw)
4. Efficient APIPW (eff-apipw)

Suggested estimator is Efficient APIPW

beta primal =  I(T=t) x 1/[p(T|C2,C1)p(M2|T,C2,C1,M1)] x Σ_T p(T|C2,C1)p(M2|T,C2,C1,M1) x Y

beta dual =  [p(M1|T=t,C2)/p(M1|T,C2)] x E[Y|T=t,M2,C2,M1]

np-IF =  E[βprimal or βdual|C2,C1] - E[βprimal or βdual|C1] + E[βprimal or βdual|C1] - E[βprimal or βdual] + E[βdual|T,C2,C1] - E[βdual|C1,C2] + E[βprimal|T,C2,C1,M1] - E[βprimal|C1,C2,T] + E[βdual|T,C1,M1,M2,C2] - E[βdual|C1,C2,T,M1] + E[βprimal|T,C1,M1,M2,Y,C2] - E[βprimal|C1,C2,T,M1,M2]

efficient IF =
 E[βprimal or βdual|C2] - E[βprimal or βdual] + E[βprimal or βdual|C1] - E[βprimal or βdual] + E[βdual|T,C2,C1] - E[βdual|C2,C1] + E[βprimal|T,C2,M1] - E[βprimal|T,C2] + E[βdual|T,C2,C1,M1,M2] - E[βdual|T,C2,C1,M1] + E[βprimal|T,C2,M1,M2,Y] - E[βprimal|T,M2,C2,M1]

[17]:
../_images/notebooks_estimation_32_1.svg