Class 5: DAGs, do()ing stuff, and measuring stuff

# In-person session 5

**February 9, 2023**

]

---

# Plan for today

.box-5.medium.sp-after-half[DAGs]

.box-6.medium.sp-after-half[Logic models, DAGs, and measurement]

.box-2.medium.sp-after-half[Potential outcomes vs. do() notation]

.box-4.medium.sp-after-half[do-calculus, adjustment, and CATEs]

---

layout: false
name: dags
class: center middle section-title section-title-5 animated fadeIn

# DAGs

---

---

.box-5.large[Causal thinking is necessary, even for descriptive work!]

---

]

???

Necessity of causal thinking: Mention the McElreath tweet on birth certificate introduction and death ages: <https://twitter.com/rlmcelreath/status/1427564280744976384>

<https://www.biorxiv.org/content/10.1101/704080v2>

---

.box-5.less-medium["Every time I get a haircut, I become more mature!"]

---

.box-5.less-medium["Every time I get a haircut, I become more mature!"]

---

.box-5.less-medium[Getting older opens a backdoor path]
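
???

A quick simulation can make the haircut story concrete. This is only a sketch with made-up numbers: age drives both the number of haircuts and a hypothetical maturity score, and haircuts have no effect on maturity at all. The coefficients, the noise levels, and the `correlation()` helper are all assumptions for illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from scratch (standard library only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(5)

# Hypothetical data-generating process: age causes both variables,
# and haircuts have NO effect on maturity.
people = []
for _ in range(60_000):
    age = rng.randint(5, 60)
    haircuts = 6 * age + rng.gauss(0, 10)  # lifetime haircuts (made up)
    maturity = 1 * age + rng.gauss(0, 5)   # maturity score (made up)
    people.append((age, haircuts, maturity))

corr_everyone = correlation([p[1] for p in people], [p[2] for p in people])

# Hold age fixed by looking only at 30-year-olds
thirty = [p for p in people if p[0] == 30]
corr_age_30 = correlation([p[1] for p in thirty], [p[2] for p in thirty])

print(f"Everyone:          {corr_everyone:.2f}")
print(f"30-year-olds only: {corr_age_30:.2f}")
```

In the full sample the two variables move together because both follow age. Once age is held fixed, the association should mostly disappear, which is what closing the backdoor path means.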

---

.box-5.medium[But what does that mean, "opening a backdoor path"?]

.box-5.medium[How does statistical association get passed through paths?]

---

.box-5.less-medium.sp-after[How do I know which of these is which?]

.center[
<figure>
 <img src="img/05-class/dag-associations.png" alt="DAG associations" title="DAG associations" width="100%">
</figure>
]

---

.pull-left[
<figure>
 <img src="img/04-class/slider-switch-plain-80.jpg" alt="Switch and slider" title="Switch and slider" width="100%">
</figure>
]

.pull-right[
<img src="05-class_files/figure-html/confounding-dag-alone-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.pull-left[
<figure>
 <img src="img/04-class/slider-switch-plain-80.jpg" alt="Switch and slider" title="Switch and slider" width="100%">
</figure>
]

.pull-right[
<img src="05-class_files/figure-html/mediating-dag-alone-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.pull-left[
<figure>
 <img src="img/04-class/slider-switch-plain-80.jpg" alt="Switch and slider" title="Switch and slider" width="100%">
</figure>
]

.pull-right[
<img src="05-class_files/figure-html/colliding-dag-alone-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.center[
<video controls loop>
 <source src="img/05-class/video-confounding-unblocked.mp4" type="video/mp4">
</video>
]

---

.center[
<video controls loop>
 <source src="img/05-class/video-confounding-blocked.mp4" type="video/mp4">
</video>
]

---

.center[
<video controls loop>
 <source src="img/05-class/video-mediation.mp4" type="video/mp4">
</video>
]

---

---

.box-5.medium[d-separation]

.box-inv-5[Except for the one arrow between X and Y, no statistical association can flow between X and Y]

.box-inv-5[This is **identification**: all alternative stories are ruled out and the relationship is isolated]

---

.box-5.large[How exactly do colliders mess up your results?]

.box-5.medium[It looks like you can still get the effect of X on Y]
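
???

One way to see the problem is to simulate a world where the true effect is known and then adjust for a collider anyway. This is a sketch under assumed numbers: X raises Y by 2, and both X and Y cause a collider C. The helper functions and all coefficients are hypothetical. Adjustment is done by residualizing X and Y on C, which gives the same X coefficient as a regression of Y on X and C (the Frisch-Waugh-Lovell result).

```python
import random


def slope(xs, ys):
    """Slope from a simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


def residuals(xs, ys):
    """What is left of ys after regressing it on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = slope(xs, ys)
    return [(y - mean_y) - b * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(147)

# Assumed DAG: X -> Y (true effect = 2), X -> C <- Y
x = [rng.gauss(0, 1) for _ in range(20_000)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

# No backdoor paths exist here, so the unadjusted slope recovers the effect
unadjusted = slope(x, y)

# "Controlling for" the collider: regress Y on X after removing C from both
collider_adjusted = slope(residuals(c, x), residuals(c, y))

print(f"True effect:             2.00")
print(f"Unadjusted estimate:     {unadjusted:.2f}")
print(f"Adjusted for collider C: {collider_adjusted:.2f}")
```

You do still get *a* coefficient for X after adjusting for C, but it is no longer the effect of X on Y. With these particular coefficients it should shrink to roughly a quarter of the true value.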

---

---

.center[
<figure>
 <img src="img/04-class/facebook.png" alt="Facebook collider" title="Facebook collider" width="55%">
</figure>
]

???

<https://www.businessinsider.com/facebook-sent-incomplete-misinformation-data-flawed-researchers-2021-9>

<https://www.nytimes.com/live/2020/2020-election-misinformation-distortions#facebook-sent-flawed-data-to-misinformation-researchers>

---

# Does niceness improve appearance?

]

]

---

# Collider distorts the true effect!

]

]
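
???

A rough simulation of this kind of selection story. Everything here is assumed for illustration: niceness and appearance are generated independently, and the collider is a hypothetical "would date this person" rule that fires when the two add up to more than 1.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from scratch (standard library only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(167)

# Niceness and appearance are independent by construction
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]

# Hypothetical collider: you only date people who are nice enough,
# attractive enough, or some mix of both
dated = [(nice, looks) for nice, looks in population if nice + looks > 1]

corr_all = correlation([p[0] for p in population], [p[1] for p in population])
corr_dated = correlation([p[0] for p in dated], [p[1] for p in dated])

print(f"Whole population:  {corr_all:.2f}")
print(f"People you dated:  {corr_dated:.2f}")
```

Looking only at the selected group conditions on the collider, so a negative association shows up even though none exists in the population.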

---

.box-5.large[Effect of race on police use of force using administrative data]

---

.box-5.medium[Effect of race on police use of force using administrative data]

.pull-left[
<figure>
 <img src="img/05-class/klm-dag.png" alt="Use of force" title="Use of force" width="100%">
</figure>
]

.pull-right[
<figure>
 <img src="img/05-class/klm.png" alt="Use of force" title="Use of force" width="100%">
</figure>
]

---

---

.box-5.large[Smoking → Cardiac arrest example]

???

| Person | Smoker | Cardiac arrest | Cholesterol | Weight | Lifestyle healthiness |
|--------|--------|----------------|-------------|--------|-----------------------|
| 1      | TRUE   | TRUE           | 150         | 170    | 6                     |
| 2      | TRUE   | FALSE          | 170         | 180    | 3                     |
| 3      | FALSE  | FALSE          | 130         | 110    | 9                     |
| 4      | FALSE  | TRUE           | 140         | 140    | 8                     |
| 5      | TRUE   | TRUE           | 120         | 150    | 2                     |
| 6      | TRUE   | FALSE          | 130         | 230    | 3                     |
| 7      | FALSE  | FALSE          | 140         | 250    | 10                    |

```text
dag {
bb="0,0,1,1"
"Cardiac arrest" [outcome,pos="0.599,0.432"]
Cholesterol [pos="0.415,0.440"]
Lifestyle [pos="0.156,0.317"]
Smoking [exposure,pos="0.243,0.428"]
Weight [adjusted,pos="0.297,0.255"]
Cholesterol -> "Cardiac arrest"
Lifestyle -> Smoking
Lifestyle -> Weight
Smoking -> Cholesterol
Weight -> Cholesterol
}
```
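
The paths in this DAG can also be listed by hand or in code. The sketch below copies the edges from the dagitty model above, walks every path between smoking and cardiac arrest, and checks whether each path is open under the usual rules (adjusting for a non-collider blocks a path; a collider blocks a path unless it or one of its descendants is adjusted for). The function names are made up for this example, and it is not a replacement for dagitty.

```python
# Edges copied from the dagitty model above
EDGES = [
    ("Cholesterol", "Cardiac arrest"),
    ("Lifestyle", "Smoking"),
    ("Lifestyle", "Weight"),
    ("Smoking", "Cholesterol"),
    ("Weight", "Cholesterol"),
]


def all_paths(edges, start, end):
    """Every path from start to end, ignoring arrow direction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for nxt in sorted(neighbors[path[-1]]):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return paths


def descendants(edges, node):
    """All nodes reachable from node by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in edges:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def is_open(edges, path, adjusted):
    """Can association flow along this path, given the adjustment set?"""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            if not ({mid} | descendants(edges, mid)) & adjusted:
                return False
        elif mid in adjusted:
            return False
    return True


paths = all_paths(EDGES, "Smoking", "Cardiac arrest")
for adjusted in (set(), {"Weight"}):
    print(f"Adjusting for {sorted(adjusted) or 'nothing'}:")
    for path in paths:
        kind = "backdoor" if (path[1], path[0]) in set(EDGES) else "causal"
        state = "open" if is_open(EDGES, path, adjusted) else "blocked"
        print(f"  [{kind}, {state}] " + " - ".join(path))
```

Adjusting for weight, as the `[adjusted]` flag in the model does, blocks the backdoor path through lifestyle and leaves the causal path through cholesterol open.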

---

.box-5.less-medium[How do you know if the DAG is right??]

.box-5.less-medium[How can you be sure you include everything in a DAG?]

.box-5.less-medium[How do you know when to stop?]

.box-5.less-medium[Is there a rule of thumb for the number of nodes?]

???

<https://evalsp23.classes.andrewheiss.com/example/dags.html#mosquito-net-example>

---

.box-5.medium[Why can we combine nodes in a DAG if they don't represent the same concept?]

.box-5.medium[Why include unmeasurable things in a DAG?]

???

<https://stats.andrewheiss.com/testy-turtle/notebook/causal-model.html>

---

.box-5.medium[Why do DAGs have to be acyclic?]

.box-5.medium[What if there really is reverse causation?]

---

.box-5.large[How do we actually adjust for these things?]

---

layout: false
name: logic-dag
class: center middle section-title section-title-6 animated fadeIn

# Logic models, DAGs, and measurement

---

---

.box-6.large[What's the difference between logic models and DAGs?]

---

# DAGs vs. Logic models

.box-6.large[DAGs are a *statistical* tool]

.box-6.large.sp-before[Logic models are a *managerial* tool]

---

.pull-left[
<figure>
 <img src="img/04-class/greenspace-eater.png" alt="Green space in Berkeley" title="Green space in Berkeley" width="100%">
</figure>
]

.pull-right[
<figure>
 <img src="img/04-class/greenspace-conversation.png" alt="Covid green spaces" title="Covid green spaces" width="100%">
</figure>
]

???

<https://theconversation.com/how-cities-can-add-accessible-green-space-in-a-post-coronavirus-world-139194>

<https://sf.eater.com/2020/5/14/21258980/berkeley-coronavirus-covid-19-jesse-arreguin-street-closures>

---

layout: false
name: po-do
class: center middle section-title section-title-2 animated fadeIn

# Potential outcomes vs. do() notation

---

---

# Expectations

.large[
`$$\operatorname{E}(\cdot), \mathbf{E}(\cdot), \mathbb{E}(\cdot) \quad \text{vs.}\quad \operatorname{P}(\cdot)$$`
]

.box-inv-2.small[Basically a fancy way of saying "average"]
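
???

A tiny illustration of the difference, using a fair six-sided die as an assumed example: P(·) gives the probability of each value, and E(·) is the probability-weighted average of those values, which a large sample mean should approximate.

```python
import random
from fractions import Fraction

# P(.): the probability of each face of a fair die
die = {face: Fraction(1, 6) for face in range(1, 7)}

# E(.): each value weighted by its probability, then summed
expected_value = sum(face * p for face, p in die.items())

# The plain average of many simulated rolls should land close to E(.)
rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(f"E(die)      = {expected_value} = {float(expected_value)}")
print(f"Sample mean = {sample_mean:.3f}")
```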

---

# Outcomes and programs

---

# Causal effects with potential outcomes

$$
`\begin{aligned}
& \textbf{Potential outcomes notation:} \\
\delta\ =&\ {\textstyle \frac{1}{n} \sum_{i=1}^n} [Y_i (1) - Y_i (0)] \\
& \\
& \text{or alternatively with } \textbf{E} \\
\delta\ =&\ \textbf{E} [Y_i (1) - Y_i (0)] \\
\end{aligned}`
$$
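
???

A toy version of the formula, with entirely made-up numbers. In real data we only ever see one of the two potential outcomes per person; here both are written down so the average of the individual effects can be compared with what the observed data would show.

```python
# Hypothetical people: (Y(1), Y(0), actually treated?)
people = [
    (80, 60, True),
    (75, 70, True),
    (85, 80, True),
    (90, 70, True),
    (60, 55, False),
    (65, 60, False),
    (55, 50, False),
    (70, 55, False),
]

# delta = (1 / n) * sum of [Y_i(1) - Y_i(0)]
individual_effects = [y1 - y0 for y1, y0, _ in people]
ate = sum(individual_effects) / len(people)

# What we could actually observe: Y(1) for the treated, Y(0) for the untreated
treated = [y1 for y1, _, got_program in people if got_program]
untreated = [y0 for _, y0, got_program in people if not got_program]
observed_difference = sum(treated) / len(treated) - sum(untreated) / len(untreated)

print(f"True average treatment effect:    {ate}")
print(f"Observed difference in averages:  {observed_difference}")
```

The two numbers differ because the made-up treated group has higher outcomes with or without the program, so the observed difference mixes the effect with selection.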

---

# Causal effects with do()

$$
`\begin{aligned}
& \textbf{Pearl notation:} \\
\delta\ =&\ \textbf{E}[Y_i \mid \operatorname{do}(X = 1)] - \textbf{E}[Y_i \mid \operatorname{do}(X = 0)] \\
& \\
& \text{or more simply} \\
\delta\ =&\ \textbf{E} [Y_i \mid \operatorname{do}(X)] \\
\end{aligned}`
$$

---

.large[
$$
`\begin{aligned}
\textbf{E} [Y_i\ \mid\ &\operatorname{do}(X)] \quad  \\
&= \\
\quad \textbf{E} [Y_i (1&) - Y_i (0)]
\end{aligned}`
$$
]

---

.box-2.medium[We can't see this]

`$$\textbf{E} [Y_i \mid \operatorname{do}(X)] \quad \text{or} \quad \textbf{E} [Y_i (1) - Y_i (0)]$$`

.box-2.medium[So we estimate the average causal effect (ACE)]

$$
\hat{\delta} = \textbf{E} [Y_i \mid X = 1] - \textbf{E} [Y_i \mid X = 0]
$$
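
???

This observed difference is easy to compute. As a sketch, here it is for the seven people in the smoking and cardiac arrest table from the notes earlier in this deck, with X = smoker and Y = cardiac arrest. Seven rows are far too few to conclude anything, and the result is only a causal effect if nothing confounds smoking and cardiac arrest.

```python
# (smoker, cardiac_arrest) for persons 1-7 in the smoking table
rows = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
    (True, False),
    (False, False),
]

# E[Y | X = 1] and E[Y | X = 0]: share with cardiac arrest in each group
smokers = [arrest for smoker, arrest in rows if smoker]
non_smokers = [arrest for smoker, arrest in rows if not smoker]

mean_smokers = sum(smokers) / len(smokers)
mean_non_smokers = sum(non_smokers) / len(non_smokers)
delta_hat = mean_smokers - mean_non_smokers

print(f"E[Y | X = 1] = {mean_smokers:.3f}")
print(f"E[Y | X = 0] = {mean_non_smokers:.3f}")
print(f"delta-hat    = {delta_hat:.3f}")
```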

---

.center[
<figure>
 <img src="img/05-class/cor-not-cause.png" alt="Correlation is not causation" title="Correlation is not causation" width="100%">
</figure>
]

---

layout: false
name: po-do
class: center middle section-title section-title-4 animated fadeIn

# do-calculus, adjustment, and CATEs

---

---

# DAGs and identification

.box-inv-4.medium[DAGs are a statistical tool, but they don't tell you what statistical method to use]

.box-inv-4.medium[DAGs help you with the **identification strategy**]

---

???

<https://twitter.com/RepThomasMassie/status/1491441851748204546>

---

# Easiest identification

.box-inv-4.medium[Identification through research design]

.box-inv-4.sp-after[RCTs]

.box-4.medium[No need for any do-calculus!]

---

# Most other identification

.box-inv-4.medium[Identification through do-calculus]

.box-inv-4.sp-after[Rules for graph surgery]

---

.box-4.medium[Where can we learn more about *do*-calculus?]

.center[
<figure>
 <img src="img/05-class/do-calculus-math.png" alt="Do-calculus" title="Do-calculus" width="70%">
</figure>
]

---

**Rule 1**: Decide if we can ignore an observation

.small[
`$$P(y \mid z, \operatorname{do}(x), w) = P(y \mid \operatorname{do}(x), w) \qquad \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}}}$$`
]

**Rule 2**: Decide if we can treat an intervention as an observation

.small[
`$$P(y \mid \operatorname{do}(z), \operatorname{do}(x), w) = P(y \mid z, \operatorname{do}(x), w) \qquad \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}}$$`
]

**Rule 3**: Decide if we can ignore an intervention

.small[
`$$P(y \mid \operatorname{do}(z), \operatorname{do}(x), w) = P(y \mid \operatorname{do}(x), w) \qquad \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z(W)}}}$$`
]
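
???

Each rule checks an independence condition in a modified graph: a bar over a node means deleting the arrows going into it, and a bar under a node means deleting the arrows coming out of it. The sketch below shows only that surgery step, using the smoking DAG from earlier as the example graph. The function names are made up, the independence check itself is not implemented, and neither is the extra Z(W) condition in Rule 3.

```python
# Smoking DAG from earlier in the deck, as (cause, effect) pairs
EDGES = [
    ("Cholesterol", "Cardiac arrest"),
    ("Lifestyle", "Smoking"),
    ("Lifestyle", "Weight"),
    ("Smoking", "Cholesterol"),
    ("Weight", "Cholesterol"),
]


def cut_incoming(edges, nodes):
    """Overline: delete every arrow pointing INTO the given nodes."""
    return [(a, b) for a, b in edges if b not in nodes]


def cut_outgoing(edges, nodes):
    """Underline: delete every arrow pointing OUT OF the given nodes."""
    return [(a, b) for a, b in edges if a not in nodes]


# do(Smoking): nothing causes smoking any more, so Lifestyle -> Smoking goes away
g_bar_smoking = cut_incoming(EDGES, {"Smoking"})

# Underlined Smoking: smoking stops affecting anything downstream
g_under_smoking = cut_outgoing(EDGES, {"Smoking"})

print("Arrows into Smoking removed:")
for a, b in g_bar_smoking:
    print(f"  {a} -> {b}")

print("Arrows out of Smoking removed:")
for a, b in g_under_smoking:
    print(f"  {a} -> {b}")
```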

---

---

.box-4.medium[Adjusting for backdoor confounding]

.center[
<figure>
 <img src="img/05-class/backdoor.png" alt="Backdoor adjustment" title="Backdoor adjustment" width="100%">
</figure>
]
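
???

A simulated example of backdoor adjustment by stratification, using the standard formula P(y | do(x)) = Σ_z P(y | x, z) P(z). The data-generating process is assumed for illustration: a binary confounder Z makes treatment more likely and raises the outcome, and the true effect of X on Y is set to 1.

```python
import random

rng = random.Random(42)

# Assumed DAG: Z -> X, Z -> Y, X -> Y (true effect of X on Y = 1)
rows = []
for _ in range(20_000):
    z = rng.random() < 0.5
    x = rng.random() < (0.8 if z else 0.2)
    y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)
    rows.append((z, x, y))


def mean_y(data, x, z=None):
    """Average outcome for a treatment value, optionally within one stratum of Z."""
    values = [yi for zi, xi, yi in data if xi == x and (z is None or zi == z)]
    return sum(values) / len(values)


# Naive comparison: treated vs. untreated, ignoring Z
naive = mean_y(rows, True) - mean_y(rows, False)

# Backdoor adjustment: difference within each stratum of Z, weighted by P(z)
adjusted = 0.0
for z in (False, True):
    share_z = sum(1 for zi, _, _ in rows if zi == z) / len(rows)
    adjusted += (mean_y(rows, True, z) - mean_y(rows, False, z)) * share_z

print(f"True effect:       1.00")
print(f"Naive difference:  {naive:.2f}")
print(f"Backdoor-adjusted: {adjusted:.2f}")
```

The naive difference is inflated because treated people are more likely to have Z = 1. Comparing within levels of Z and averaging over P(z) should land near the true effect.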

---

---

.box-4.medium[Adjusting for frontdoor confounding]

???

Smoking/tar + Uber

Effect of shared rides on tips; use frontdoor magic

Like IV but in reverse:

- IV: instrument → treatment → outcome
- Frontdoor: treatment → instrumenty-mediator → outcome

```text
dag {
bb="0,0,1,1"
"Actually take shared ride" [pos="0.528,0.508"]
"Authorize shared ride" [exposure,pos="0.288,0.504"]
"Lots of unobserved stuff" [pos="0.521,0.342"]
"Tip driver" [outcome,pos="0.743,0.518"]
"Actually take shared ride" -> "Tip driver"
"Authorize shared ride" -> "Actually take shared ride"
"Lots of unobserved stuff" -> "Authorize shared ride"
"Lots of unobserved stuff" -> "Tip driver"
}
```

<https://twitter.com/andrewheiss/status/1361686426820222977>
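
A simulated sketch of the frontdoor idea for the shared-ride DAG above, using the standard frontdoor formula P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'). All of the numbers are invented: the unobserved stuff U affects both authorizing a shared ride and tipping, authorizing only matters through actually taking the shared ride, and the true effect of authorizing on the tip works out to 0.8 in these made-up units. U is generated but never used in the estimate.

```python
import random

rng = random.Random(2023)

rows = []
for _ in range(50_000):
    u = rng.random() < 0.5                   # lots of unobserved stuff
    x = rng.random() < (0.8 if u else 0.2)   # authorize shared ride
    m = rng.random() < (0.9 if x else 0.1)   # actually take shared ride
    y = 1.0 * m + 2.0 * u + rng.gauss(0, 1)  # tip (made-up scale)
    rows.append((x, m, y))                   # u is deliberately not recorded


def share(data, keep):
    """Proportion of rows that satisfy a condition."""
    return sum(1 for row in data if keep(row)) / len(data)


def mean_y(data, keep):
    """Average outcome among rows that satisfy a condition."""
    values = [row[2] for row in data if keep(row)]
    return sum(values) / len(values)


def frontdoor_mean(data, x):
    """E[Y | do(X = x)] = sum_m P(m | x) * sum_x' E[Y | m, x'] * P(x')"""
    given_x = [row for row in data if row[0] == x]
    total = 0.0
    for m in (False, True):
        p_m_given_x = share(given_x, lambda r: r[1] == m)
        inner = 0.0
        for x_prime in (False, True):
            p_x_prime = share(data, lambda r: r[0] == x_prime)
            inner += mean_y(data, lambda r: r[0] == x_prime and r[1] == m) * p_x_prime
        total += p_m_given_x * inner
    return total


naive = mean_y(rows, lambda r: r[0]) - mean_y(rows, lambda r: not r[0])
frontdoor = frontdoor_mean(rows, True) - frontdoor_mean(rows, False)

print(f"True effect:         0.80")
print(f"Naive difference:    {naive:.2f}")
print(f"Frontdoor estimate:  {frontdoor:.2f}")
```

The naive difference picks up the unobserved confounding, while the frontdoor estimate should recover something close to the true effect without ever measuring U. That only works because the mediator is assumed to be unaffected by U except through the treatment.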

---

.box-4.medium[More complex DAGs without obvious backdoor or frontdoor solutions]

.box-4.sp-after[Chug through the rules of do-calculus to see if the relationship is identifiable]

---

.center[
<figure>
 <img src="img/05-class/fusion1.png" alt="Causal Fusion example" title="Causal Fusion example" width="100%">
</figure>
]

---

.center[
<figure>
 <img src="img/05-class/fusion2.png" alt="Causal Fusion example" title="Causal Fusion example" width="100%">
</figure>
]

---

.center[
<figure>
 <img src="img/05-class/fusion3.png" alt="Causal Fusion example" title="Causal Fusion example" width="100%">
</figure>
]

---

.center[
<figure>
 <img src="img/05-class/fusion4.png" alt="Causal Fusion example" title="Causal Fusion example" width="100%">
</figure>
]

---

.box-4.less-medium.sp-after[When things are identified, there are still arrows leading into Y. What do we do with those? How do you explain those relationships?]

.box-4.less-medium[Outcomes have multiple causes. How do you justify that your proposed cause is the most causal factor?]

???

100% depends on your research question

---

.box-4.medium[Why can't we just subtract the averages between treated and untreated groups?]

---

.box-4.medium[When you're making groups for CATE, how do you decide what groups to put people in?]
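
???

Mechanically, a CATE is just the treated-minus-untreated difference computed inside each group. The records below are hypothetical, and the two age groups are only one possible way to split people. The within-group differences are causal effects only if treatment is unconfounded inside each group.

```python
# Hypothetical records: (age_group, treated, outcome)
records = [
    ("younger", True, 12),
    ("younger", True, 14),
    ("younger", False, 10),
    ("younger", False, 8),
    ("older", True, 20),
    ("older", True, 24),
    ("older", False, 12),
    ("older", False, 14),
]


def difference_in_means(rows):
    """Average outcome of treated rows minus average outcome of untreated rows."""
    treated = [y for _, got_program, y in rows if got_program]
    untreated = [y for _, got_program, y in rows if not got_program]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# One conditional average treatment effect per group
cate = {}
for group in ("younger", "older"):
    cate[group] = difference_in_means([r for r in records if r[0] == group])

overall = difference_in_means(records)

for group, effect in cate.items():
    print(f"CATE for {group}: {effect}")
print(f"Overall difference: {overall}")
```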

---

# Unconfoundedness assumption

.box-inv-4[It seems unlikely. Wouldn't there be other factors within the older/younger group that make a person more/less likely to engage in treatment (e.g., health status)?]

---

.box-4.medium[Does every research question need an identification strategy?]

.box-inv-4.huge.sp-after[No!]

---

.center[
<figure>
 <img src="img/05-class/moderna-ebv.png" alt="Moderna EBV trials" title="Moderna EBV trials" width="65%">
</figure>
]

???

A correlational study found that MS is strongly associated with Epstein-Barr virus (EBV). Researchers don't know the exact mechanism yet, but mRNA vaccine technology lets them develop a vaccine against EBV that might help prevent MS. They'll figure out the exact mechanism later; for now, they've started clinical trials.

<https://www.forbes.com/sites/roberthart/2022/01/14/moderna-starts-human-trials-of-mrna-vaccine-for-virus-that-likely-causes-multiple-sclerosis/?sh=74f52ca51a04>