Lecture 02
The code segments are translated to Python in this notebook.
Let $A_i$ denote the treatment assignment and $Y_i$ denote the observed outcome for unit $i$. We denote pre-treatment covariates by $X_i$; these should be measured before the treatment is assigned.
Controlling for post-treatment variables can introduce bias to estimates of treatment effects (Rosenbaum, 1984).
For the example of college and earnings, we define
$A_i$: whether student $i$ graduated from a 4-year college ($A_i = 1$) or not ($A_i = 0$) in 1964.
$Y_i$: income (in dollars) for the year 1973.
$X_i$: student $i$'s high school characteristics, family economic status, and sex. These were measured in the junior/senior year of high school.
For each unit we observe the triple $(X_i, A_i, Y_i)$.
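For concreteness, the observed data can be arranged as one row per unit. Below is a tiny hypothetical example; the column names and numbers are made up for illustration and are not the notebook's actual data.

```python
import pandas as pd

# One row per unit i: pre-treatment covariates X_i, treatment A_i, observed outcome Y_i.
df = pd.DataFrame({
    "parent_educ": [12, 16, 10, 14],            # X: parental years of schooling (hypothetical)
    "female": [0, 1, 1, 0],                     # X: sex
    "college": [1, 0, 1, 0],                    # A: graduated from a 4-year college (1) or not (0)
    "income_1973": [13000, 9000, 12500, 8000],  # Y: income in 1973, in dollars (hypothetical)
})
```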
We can formulate causality by defining:
$Y_i(1)$: income of student $i$ in 1973 if they graduated from college
$Y_i(0)$: income of student $i$ in 1973 if they did not graduate from college
$\tau_i = Y_i(1) - Y_i(0)$: the difference in income if student $i$ graduated from college
$\tau = E[Y(1) - Y(0)]$: the average difference in income that would happen if a student went to college.
Being sloppy:
A naive estimator of $\tau$ would be to take the difference between the mean income in the population who graduated from college and the mean income of those who didn't. That is,
$$\hat{\tau}_{\text{naive}} = \frac{1}{n_1}\sum_{i: A_i = 1} Y_i \;-\; \frac{1}{n_0}\sum_{i: A_i = 0} Y_i,$$
which estimates $E[Y \mid A = 1] - E[Y \mid A = 0]$. Note that this should be interpreted as "the two populations of interest should differ in mean income by this amount".
This completely fails to address any sort of counterfactual reality. The fact that people can self-select into going to college gives us, in general,
$$E[Y \mid A = 1] - E[Y \mid A = 0] \;\neq\; E[Y(1)] - E[Y(0)].$$
The estimands that we want are $E[Y(1)]$ and $E[Y(0)]$, which vary over the same population of individuals.
Supposing this is the case, a non-zero difference $E[Y(1)] - E[Y(0)]$ is attributable to the treatment. But if the populations differ (as in the college case), then differences between the populations could explain the apparent treatment effect.
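As a quick illustration, here is a minimal sketch of the naive difference in means, written against a DataFrame like the hypothetical `df` above (a sketch, not the notebook's own code):

```python
import pandas as pd

def naive_diff_in_means(df: pd.DataFrame, treatment: str = "college", outcome: str = "income_1973") -> float:
    """Difference between the mean outcome of treated units and the mean outcome of control units."""
    treated = df.loc[df[treatment] == 1, outcome]
    control = df.loc[df[treatment] == 0, outcome]
    return treated.mean() - control.mean()

# tau_naive = naive_diff_in_means(df)  # compares the self-selected groups, not counterfactuals
```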
Note: the power of randomized trials is that the distributions of both unobserved and observed covariates are balanced between the two groups.
This can be seen in the attached notebook. While covariates like parental education are highly unbalanced when college is not randomized, the balance is nearly perfect when we fictitiously assign
students to go to college or not.
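A balance check of this kind can be sketched as follows, again using the hypothetical `df` from earlier; the standardized mean difference is one common balance diagnostic (a sketch, not the notebook's code):

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(df: pd.DataFrame, covariate: str, treatment: str = "college") -> float:
    """Difference in covariate means between treated and control, scaled by the pooled standard deviation."""
    x1 = df.loc[df[treatment] == 1, covariate]
    x0 = df.loc[df[treatment] == 0, covariate]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return (x1.mean() - x0.mean()) / pooled_sd

# Balance under the observed (self-selected) assignment vs. a fictitious random assignment:
# smd_observed = standardized_mean_difference(df, "parent_educ", "college")
# df["college_random"] = np.random.default_rng(0).integers(0, 2, size=len(df))
# smd_random = standardized_mean_difference(df, "parent_educ", "college_random")
```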
Formalizing conditions for treatment effect identification
In this section let's show why randomizing gives us
$$E[Y(a)] = E[Y \mid A = a] \quad \text{for } a \in \{0, 1\}.$$
This would give us
$$\tau = E[Y(1)] - E[Y(0)] = E[Y \mid A = 1] - E[Y \mid A = 0],$$
so the average treatment effect is identified from the observed data. There are three assumptions needed to show the above equality: SUTVA, ignorability, and overlap.
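Granting these three conditions, a standard sketch of the argument for the first equality is
$$E[Y \mid A = a] = E[Y(a) \mid A = a] = E[Y(a)],$$
where the first step uses SUTVA/consistency ($Y = Y(A)$), the second uses ignorability ($(Y(1), Y(0)) \perp A$), and overlap ($0 < P(A = 1) < 1$) ensures that both conditional expectations are well defined.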
SUTVA:
The Stable Unit Treatment Value Assumption links $Y_i(1)$ and $Y_i(0)$ to the observed outcome $Y_i$ (Rubin, 1980).
SUTVA states that
$$Y_i = A_i Y_i(1) + (1 - A_i) Y_i(0),$$
or equivalently $Y_i = Y_i(A_i)$, which seems mathematically intuitive, but the validity of this assumption in the context of real-world complexity must be assessed by subject-matter experts. There are two main things that should be assessed (a short code sketch of the consistency equation follows this list):
- There are no hidden versions of treatment, or treatment variation is negligible. This means that if $A_i = a$, either (a) there is only one version for unit $i$ to receive the treatment (or control), or (b) variations in how the treatment is received do not affect the potential outcomes.
- E.g. the college example above does not satisfy this assumption.
- There is no interference: that is, unit $i$'s treatment value $A_i$ does not affect any other unit's potential outcomes.
- E.g. "being on a diet" does not satisfy assumption 1 (furthermore, even fixing a type of diet, this is violated)
Verbatim from the notes: the two articles by Hernan and Taubman and by Cole and Frangakis provide excellent expositions on this topic. Also, check out a recent twitter thread by Miguel on this topic. In summary, the Rubin (via Frangakis, goes back to this paper) and Robins (via Hernan) schools of causal inference emphasize the notion of well-defined interventions to define treatment in a good causal inference study. In fact, because of this phenomenon, there is also debate as to whether we can estimate the causal effect of race, since intervening on race is impossible. There was an entire journal dedicated to defining treatments in causal inference around 2017.
Takeaway: before you define a causal effect, think about whether the treatment is well-defined.
Ignorability:
This is called different things in different fields.
Firstly, we state unconditional ignorability. This holds in randomized experiments:
$$(Y(1), Y(0)) \perp A,$$
where it is important that this set is taken together as the pair of potential outcomes rather than the observed outcome (if the observed $Y$ itself were independent of $A$, our experiment would be meaningless). In the sense of missing data, this gives us that the missingness of $Y(1)$ or $Y(0)$, viewed column by column in the table of potential outcomes, happens completely at random (MCAR).
This gets pretty mangy in observational contexts!
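To see what unconditional ignorability buys us, here is a small simulation sketch (all quantities are made up): under random assignment the naive difference in means lands near the true average effect, while self-selection on the potential outcomes biases it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential incomes; the true ATE is 2000 by construction.
Y0 = rng.normal(10_000, 2_000, size=n)
Y1 = Y0 + 2_000

# Randomized assignment: (Y(1), Y(0)) is independent of A.
A_rand = rng.integers(0, 2, size=n)
Y_rand = A_rand * Y1 + (1 - A_rand) * Y0  # SUTVA / consistency
naive_rand = Y_rand[A_rand == 1].mean() - Y_rand[A_rand == 0].mean()

# Self-selection: units with higher potential incomes are more likely to "go to college".
p = 1 / (1 + np.exp(-(Y0 - 10_000) / 1_000))
A_self = rng.binomial(1, p)
Y_self = A_self * Y1 + (1 - A_self) * Y0
naive_self = Y_self[A_self == 1].mean() - Y_self[A_self == 0].mean()

print(f"naive difference, randomized:    {naive_rand:.0f}")  # close to 2000
print(f"naive difference, self-selected: {naive_self:.0f}")  # biased upward
```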