Lecture 01

Goals of the course:

  • Quantify causal questions using the mathematical language of potential outcomes (one framework)
  • Design studies to estimate causal effects
  • Analyze data from these studies to estimate causal effects
  • Assess robustness of analysis to violations of underlying modeling assumptions.

Potential Outcomes Model for Defining Effects Caused by a Treatment

Definitions

These are due to Neyman (1923) and Rubin (1974). More background can be found in Holland (1986).

Let $\mathcal{A}$ be a set of treatments. The simplest example is perhaps $\mathcal{A} = \{0,1\}$, where $0$ corresponds to a control and $1$ corresponds to a treatment.

A "unit" refers to a sample to which a treatment can be applied. E.g. a patient, or perhaps a model run? We use the notation

  • $Y_i(1)$ denotes the potential outcome if treatment is applied.
  • $Y_i(0)$ denotes the potential outcome if control is applied.

The causal effect of treatment compared to control for unit $i$ can be expressed as $\tau_i = Y_i(1) - Y_i(0)$.

Let's put together a climate example: measure whether a tropical cyclone forms under deep versus shallow atmospheric dynamics in a weather-scale experiment (a toy encoding follows the list below).

  • $\tau_i = 1$ would show that the deep atmosphere causes a tropical cyclone for unit $i$.
  • $\tau_i = 0$ means the deep atmosphere has no effect on tropical cyclone formation for unit $i$.
  • $\tau_i = -1$ means the deep atmosphere inhibits a tropical cyclone from forming for unit $i$.
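A toy encoding of these three cases in Python; the values are invented purely for illustration:

```python
# Hypothetical potential outcomes for three simulation units (1 = cyclone forms).
# Y1 is the outcome under the deep atmosphere, Y0 under the shallow atmosphere.
potential_outcomes = [
    {"unit": 1, "Y1": 1, "Y0": 0},  # tau_i = +1: deep atmosphere causes the cyclone
    {"unit": 2, "Y1": 1, "Y0": 1},  # tau_i =  0: cyclone forms either way
    {"unit": 3, "Y1": 0, "Y0": 1},  # tau_i = -1: deep atmosphere inhibits the cyclone
]

for row in potential_outcomes:
    tau_i = row["Y1"] - row["Y0"]
    print(f"unit {row['unit']}: tau_i = {tau_i:+d}")
```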

Some notes on causal effects:

  • The causal effect of a treatment can only be defined in reference to another treatment (e.g. a control). Do these treatments have to be mutually exclusive?
  • This framework focuses on effects of causes (the effect of a deep/shallow atmosphere on tropical cyclones) rather than causes of effects (why did a tropical cyclone form?). In the real world (why did Judy get lung cancer?) you have the problem of infinite regress (she has lung cancer because she smoked, because her parents smoked, because her parents hated each other...). In the computational world this might not be true?
    • The potential outcomes framework gives actionable information on how to live our lives, and can do so in purely observational situations.
  • Cause-effect relationships have to have a temporal ordering.
    • Can't have effect before a cause.
    • Can't have causal simultaneity --> impossible to distinguish directionality.
  • Relationship to do-calculus:
    • $\tau_i = \probp(Y_i = y \mid \textrm{do}(1)) - \probp(Y_i = y \mid \textrm{do}(0))$

Before-after study: temporal stability and causal transience

Fundamental problem of causal inference:

  • We cannot observe both $Y_i(1)$ and $Y_i(0)$ in the real world, and therefore we cannot observe the causal effect of the active treatment.

Temporal stability

  • Temporal stability assumption: the value of $Y_i(0)$ does not depend on when we apply treatment $0$ to unit $i$ and then measure.
  • If this holds, then we can measure $Y_i(z)$ for any $z \in \mathcal{A}$ via a sequence of experiments on the same unit.

Causal Transience

  • The value of $Y_i(1)$ is not impacted by first applying control to unit $i$ and measuring $Y_i$.
  • This gives us the ability to measure both $Y_i(0)$ and $Y_i(1)$ for the same unit $i$ via a sequence of experiments, under limited assumptions.

An example where this is dubious: measuring the impact of a treatment for an illness (patients tend to get better over time regardless).
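Here is a minimal simulation sketch of a before-after study; the `respond` function and its numbers are hypothetical, constructed so that both assumptions hold:

```python
def respond(unit_effect, z):
    """Hypothetical unit response: the same function at every time point
    (temporal stability) and unaffected by earlier applications of control
    (causal transience)."""
    baseline = 10.0
    return baseline + unit_effect * z

unit_effect = 2.5                        # the true tau_i for this unit
y_control = respond(unit_effect, z=0)    # first experiment: apply control, measure
y_treated = respond(unit_effect, z=1)    # second experiment: apply treatment, measure

# Under temporal stability + causal transience, the before-after difference
# recovers the unit-level causal effect.
print(y_treated - y_control)             # 2.5
```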

Lab controlled experiments and Unit Homogeneity

This is the assumption that different units respond identically to treatment, i.e. $Y_i(z) = Y_j(z)$ for all $z \in \mathcal{A}$. E.g. knockout experiments on mice: engineer nearly genetically identical mice and vary a single gene. Potential outcomes should have the same distribution across units.
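A hypothetical sketch of how unit homogeneity lets us split the two measurements across two units:

```python
# Hypothetical potential outcomes for two nearly identical units (e.g. knockout mice).
# Unit homogeneity: Y_i(z) = Y_j(z) for every z in {0, 1}.
potential_outcomes = {
    "mouse_i": {0: 1.0, 1: 3.0},
    "mouse_j": {0: 1.0, 1: 3.0},
}

# Apply treatment to unit i and control to unit j; under homogeneity the
# cross-unit difference equals the causal effect for either unit.
effect = potential_outcomes["mouse_i"][1] - potential_outcomes["mouse_j"][0]
print(effect)  # 2.0
```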

Statistical approaches to causality

In a statistical approach to causal inference, we seek to infer some analogue of the difference between potential outcomes. A frequent estimand is the Average Treatment Effect (ATE):

$$\begin{align*} \tau &= \probe[Y_i(1) - Y_i(0)] \\ &= \probe[Y_i(1)] - \probe[Y_i(0)], \end{align*}$$

which is linear in a way that allows us to approximate the treatment effect using only the marginal expectations $\probe[Y_i(1)]$ and $\probe[Y_i(0)]$.

There are estimands which do not immediately have this property, and more work must be done (see the numerical sketch after this list):

  • The median of $\tau_i$
  • $\probe[\boldsymbol{1}_{\{\tau_i > 0\}}]$
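A numerical sketch (with invented potential outcomes) of why the ATE only needs the marginal distributions while these other estimands depend on the joint distribution:

```python
import numpy as np

# Hypothetical full table of potential outcomes (never observable in practice).
y1 = np.array([3.0, 1.0, 5.0, 2.0])
y0 = np.array([1.0, 2.0, 1.0, 4.0])
tau = y1 - y0

# ATE: expectation is linear, so the difference of marginal means matches.
print(tau.mean(), y1.mean() - y0.mean())            # 0.75 0.75

# Median: not linear, so the difference of marginal medians need not match.
print(np.median(tau), np.median(y1) - np.median(y0))  # 0.5 vs 1.0

# P(tau_i > 0) also depends on the joint distribution, not just the marginals.
print((tau > 0).mean())                             # 0.5
```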

A first look at SUTVA

Suppose we are interested in $\bar{\tau} = \probe[\tau_i]$. Why is this still challenging if we focus on marginals instead of the joint distribution? Suppose $Z_i$ is the chosen treatment that unit $i$ receives. SUTVA (which will be introduced later) gives us

$$Y_i = Y_i(Z_i) = \begin{cases} Y_i(1) & Z_i = 1 \\ Y_i(0) & Z_i = 0 \end{cases}$$

and thus the observed data is $\{(Y_i, Z_i)\}_{1 \leq i \leq n}$. The population which is treated (and for which we observe the outcome under treatment) is disjoint from the population for which we observe the outcome under control.

To put this mathematically,

$$\begin{align*} \probe[Y_i \mid Z_i = 1] &= \probe[Y_i(1) \mid Z_i = 1] \\ \probe[Y_i \mid Z_i = 0] &= \probe[Y_i(0) \mid Z_i = 0]. \end{align*}$$

But in general, $\probe[Y_i(z) \mid Z_i = z] \neq \probe[Y_i(z)]$. Especially in observational settings, the way treatments are assigned (or self-selected) can bias these populations.
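A simulation sketch of this bias under a hypothetical self-selection mechanism, where units with larger $Y_i(1)$ are more likely to take the treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes with a constant unit-level effect of 1.
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + 1.0                        # true ATE = 1

# Self-selection: units with larger Y_i(1) are more likely to choose treatment.
p_treat = 1 / (1 + np.exp(-y1))
z = rng.binomial(1, p_treat)

# SUTVA: we only observe the potential outcome for the treatment received.
y_obs = np.where(z == 1, y1, y0)

naive = y_obs[z == 1].mean() - y_obs[z == 0].mean()
print(naive)             # noticeably larger than the true ATE of 1
print((y1 - y0).mean())  # 1.0
```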

Modeling causality as a missing-data problem

The crux of Rubin's causal model is treating causality as a missing-data problem (Pearl takes significant issue with this framing).

Fundamentally, the "science table" tends to look like this:

| $Y_i(1)$ | $Y_i(0)$ | $\tau_i$ |
| --- | --- | --- |
| ? | 2 | ? |
| 6 | ? | ? |
| ? | 8 | ? |
| ? | 10 | ? |

Are these entries missing completely at random (MCAR)? We don't usually know.
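As a sketch, the observed science table above can be encoded directly as a missing-data problem (using `np.nan` for the unobserved entries):

```python
import numpy as np

# Observed science table: each unit reveals only one potential outcome.
y1_obs = np.array([np.nan, 6.0, np.nan, np.nan])
y0_obs = np.array([2.0, np.nan, 8.0, 10.0])

# The unit-level effect column is therefore entirely missing.
tau_obs = y1_obs - y0_obs
print(tau_obs)  # [nan nan nan nan]

# Whether the missing entries are MCAR depends on how treatment was assigned,
# which the table alone does not tell us.
```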

Randomized experiments vs Observational studies

One sufficient condition which gives $\probe[Y_i \mid Z_i = z] = \probe[Y_i(z)]$ is independence:

  • $(Y_i(1), Y_i(0)) \perp Z_i$
  • $\probp(Z_i = 1 \mid Y_i(1), Y_i(0)) = \probp(Z_i = 1)$

One can get this in a randomized experiment.

Randomized experiments

If we randomly assign individuals to treatment groups, then we can enforce this independence by design.

A simple starting example: a completely randomized experiment:

  • $n_1$ individuals are given treatment, $n_0$ are given control, $n = n_0 + n_1$. The assignment proportions may be imbalanced, e.g. if treatment is very expensive.
  • $Z \equiv (Z_1, \ldots, Z_n)^\top$
  • $\Omega = \{z \mid \sum_{i} z_i = n_1\}$ is the set of allowable treatment assignments.
  • $\probp(Z = z \mid Z \in \Omega, Y(1), Y(0)) = \probp(Z = z \mid Z \in \Omega) = \frac{1}{|\Omega|} = \frac{1}{{n \choose n_1}}$. This is complete randomization.
  • $\probp(Z_i = 1 \mid Z \in \Omega, Y_i(1), Y_i(0)) = \probp(Z_i = 1 \mid Z \in \Omega) = \frac{n_1}{n}$ by symmetry, since exactly $n_1$ of the $n$ units are treated (see the sampling sketch after this list).
  • Assignment is independent of the potential outcomes, both in an informal and a formal sense.
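A sampling sketch of complete randomization, with hypothetical sizes, checking the marginal assignment probability by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n1 = 10, 3                      # hypothetical sizes

def complete_randomization(n, n1, rng):
    """Draw Z uniformly from the set Omega of 0/1 vectors with exactly n1 ones."""
    z = np.zeros(n, dtype=int)
    treated = rng.choice(n, size=n1, replace=False)
    z[treated] = 1
    return z

# Monte Carlo check: each unit is treated with probability n1 / n.
draws = np.array([complete_randomization(n, n1, rng) for _ in range(20_000)])
print(draws.mean(axis=0))          # each entry is approximately 0.3
```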

Observational studies:

An observational study must be done when it is not feasible (or not ethical) to run a controlled experiment. Self-selection is possible, so the above independence condition does not hold in general.

  • Observational studies MUST be done in certain situations.
  • Poorly designed observational studies can be complete garbage.
