class: center middle main-title section-title-7 # Randomization<br>and matching .class-info[ **Session 7** .light[PMAP 8521: Program evaluation<br> Andrew Young School of Policy Studies ] ] --- name: outline class: title title-inv-8 # Plan for today -- .box-4.medium.sp-after-half[The magic of randomization] -- .box-5.medium.sp-after-half[How to analyze RCTs] -- .box-6.medium.sp-after-half[The "gold" standard] -- .box-1.medium.sp-after-half[Adjustment with matching] --- name: magic-randomization class: center middle section-title section-title-4 animated fadeIn # The magic<br>of randomization --- layout: true class: title title-4 --- # Why randomize? .box-4.large[Fundamental problem<br>of causal inference] $$ \delta_i = Y_i^1 - Y_i^0 \quad \text{in real life is} \quad \delta_i = Y_i^1 - ??? $$ .box-inv-4[Individual-level effects are impossible to observe!] .box-inv-4[There are no individual counterfactuals!] --- # Why randomize? .medium[ $$ \delta = (\bar{Y}\ |\ P = 1) - (\bar{Y}\ |\ P = 0) $$ ] -- .box-inv-4.medium[Comparing average outcomes only works<br>if groups that received/didn't receive<br>treatment look the same] --- # Why randomize? .box-inv-4[With big enough samples, the magic of randomization<br>helps make comparison groups comparable] .center[ <figure> <img src="img/07/wb-4-1.png" alt="Figure 4.1 from WB book" title="Figure 4.1 from WB book" width="80%"> </figure> ] --- # RCTs and DAGs $$ E[\text{Malaria infection rate}\ |\ do(\text{Mosquito net})] $$ -- .pull-left[ .box-4.smaller[Observational DAG] <img src="07-slides_files/figure-html/observational-dag-1.png" width="90%" style="display: block; margin: auto;" /> ] -- .pull-right[ .box-4.smaller[Experimental DAG] <img src="07-slides_files/figure-html/experimental-dag-1.png" width="90%" style="display: block; margin: auto;" /> ] -- .box-inv-4.small[When you *do*() X, delete all arrows into X; **confounders don't influence treatment!**] --- # How to randomize? .center[ <figure> <img src="img/07/wb-4-4.png" alt="Figure 4.4 from WB book" title="Figure 4.4 from WB book" width="75%"> </figure> ] --- # Random assignment .box-inv-4.medium[Coins] -- .box-inv-4.medium[Dice] -- .box-inv-4.medium[Unbiased lottery] -- .box-inv-4.medium[Random numbers + threshold] -- .box-inv-4.medium[Atmospheric noise] .box-4.tiny[random.org] --- # How big of a sample? .box-inv-4[A training program causes incomes to rise by $40] .center.small[ <table> <thead> <tr> <th style="text-align:left;"> Person </th> <th style="text-align:left;"> Group </th> <th style="text-align:center;"> Before </th> <th style="text-align:center;"> After </th> <th style="text-align:center;"> Difference </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 295 </td> <td style="text-align:left;"> Control </td> <td style="text-align:center;"> 122.09 </td> <td style="text-align:center;"> 229.04 </td> <td style="text-align:center;"> 106.95 </td> </tr> <tr> <td style="text-align:left;"> 126 </td> <td style="text-align:left;"> Treatment </td> <td style="text-align:center;"> 205.60 </td> <td style="text-align:center;"> 199.84 </td> <td style="text-align:center;"> -5.76 </td> </tr> <tr> <td style="text-align:left;"> 400 </td> <td style="text-align:left;"> Control </td> <td style="text-align:center;"> 133.25 </td> <td style="text-align:center;"> 130.40 </td> <td style="text-align:center;"> -2.85 </td> </tr> <tr> <td style="text-align:left;"> 94 </td> <td style="text-align:left;"> Treatment </td> <td style="text-align:center;"> 270.11 </td> <td style="text-align:center;"> 206.56 </td> <td style="text-align:center;"> -63.54 </td> </tr> <tr> <td style="text-align:left;"> 250 </td> <td style="text-align:left;"> Control </td> <td style="text-align:center;"> 344.37 </td> <td style="text-align:center;"> 222.89 </td> <td style="text-align:center;"> -121.49 </td> </tr> <tr> <td style="text-align:left;"> 59 </td> <td style="text-align:left;"> Treatment </td> <td style="text-align:center;"> 312.41 </td> <td style="text-align:center;"> 268.06 </td> <td style="text-align:center;"> -44.35 </td> </tr> </tbody> </table> ] --- # Power .pull-left[ .box-4.small[Enroll 10 participants] <img src="07-slides_files/figure-html/power-small-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ .box-4.small[Enroll 200 participants] <img src="07-slides_files/figure-html/power-big-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # What's the right sample size? .box-inv-4[Use a statistical power calculator to<br>make sure you can potentially detect an effect] .center[ <figure> <img src="img/06/power-search.png" alt="Google power calculator" title="Google power calculator" width="50%"> </figure> ] --- layout: false name: rct-how class: center middle section-title section-title-5 animated fadeIn # How to analyze RCTs --- layout: true class: title title-5 --- # How to analyze RCTs .box-inv-5.sp-after[Surprisingly easy, statistically!] -- .box-5[Step 1: Check that key demographics<br>and other confounders are balanced] -- .box-5[Step 2: Find difference in average outcome<br>in treatment and control groups] --- # Example RCT .small-code[ ```r imaginary_program ``` ``` ## # A tibble: 800 × 6 ## person treatment age sex income_after male_num ## <int> <chr> <dbl> <chr> <dbl> <dbl> ## 1 498 Control 45 Female 179. 0 ## 2 308 Treatment 37 Male 247. 1 ## 3 677 Control 35 Female 369. 0 ## 4 31 Treatment 39 Female 203. 0 ## 5 543 Control 36 Female 190. 0 ## 6 434 Control 30 Female 278. 0 ## 7 234 Treatment 28 Male 356. 1 ## 8 272 Treatment 45 Male 260. 1 ## 9 523 Control 49 Female 174. 0 ## 10 649 Control 49 Male 224. 1 ## # … with 790 more rows ``` ] --- # 1. Check balance ```r imaginary_program %>% group_by(treatment) %>% summarize(avg_age = mean(age), prop_male = mean(sex == "Male")) ``` ``` ## # A tibble: 2 × 3 ## treatment avg_age prop_male ## <chr> <dbl> <dbl> ## 1 Control 35.1 0.562 ## 2 Treatment 35.1 0.512 ``` --- # 1. Check balance .left-code[ ```r ggplot(imaginary_program, aes(x = treatment, y = age, color = treatment)) + stat_summary(geom = "pointrange", fun.data = "mean_se", fun.args = list(mult=1.96)) + guides(color = FALSE) + labs(x = NULL, y = "Age") ``` ] .right-plot[ ![](07-slides_files/figure-html/balance-age-1.png) ] --- # 1. Check balance .left-code[ ```r ggplot(imaginary_program, aes(x = treatment, y = male_num, color = treatment)) + stat_summary(geom = "pointrange", fun.data = "mean_se", fun.args = list(mult=1.96)) + guides(color = FALSE) + labs(x = NULL, y = "Proportion male") ``` ] .right-plot[ ![](07-slides_files/figure-html/balance-sex-1.png) ] --- # 2. Calculate difference .left-code[ .box-inv-5.small[Group means] ```r imaginary_program %>% group_by(treatment) %>% summarize(avg_outcome = mean(income_after)) ``` ``` ## # A tibble: 2 × 2 ## treatment avg_outcome ## <chr> <dbl> ## 1 Control 205. ## 2 Treatment 251. ``` ```r 251 - 205 ``` ``` ## [1] 46 ``` ] -- .right-code[ .box-inv-5.small[Regression] ```r rct_model <- lm(income_after ~ treatment, data = imaginary_program) tidy(rct_model) ``` ``` ## # A tibble: 2 × 3 ## term estimate std.error ## <chr> <dbl> <dbl> ## 1 (Intercept) 205. 3.66 ## 2 treatmentTreatment 46.0 5.17 ``` ] --- # 2a. Show difference .left-code[ ```r ggplot(imaginary_program, aes(x = treatment, y = income_after, color = treatment)) + stat_summary(geom = "pointrange", fun.data = "mean_se", fun.args = list(mult=1.96)) + guides(color = FALSE) + labs(x = NULL, y = "Income") ``` ] .right-plot[ ![](07-slides_files/figure-html/rct-diff-1.png) ] --- # Should you control for stuff? -- .box-inv-5.large[No!] -- .box-inv-5[All arrows into the treatment node are removed;<br>there's theoretically no confounding!] --- layout: false name: gold-standard class: center middle section-title section-title-6 animated fadeIn # The "gold" standard --- layout: true class: title title-6 --- # Types of research .box-inv-6.medium.sp-after[Experimental studies vs.<br>observational studies] .box-inv-6.medium[Which is better?] --- layout: false .center[ <figure> <img src="img/07/nyt-wellness-program.png" alt="NYT wellness program report" title="NYT wellness program report" width="75%"> </figure> ] ??? https://www.nytimes.com/2018/08/06/upshot/employer-wellness-programs-randomized-trials.html --- class: bg-full background-image: url("img/07/gold-standard.png") --- class: bg-full background-image: url("img/07/jpal-nobel.png") ??? https://twitter.com/MIT/status/1183752282988564480 --- .box-6.large.sp-after[RCTs are great!] -- .box-6.large[Super impractical to do<br>all the time though!] --- layout: true class: title title-6 --- # "Gold standard" .box-inv-6.medium.sp-after["Gold standard" implies that all<br>causal inferences will be valid it<br>you do the experiment right] -- .box-6[We don't care if studies are experimental or not] -- .box-6[We care if our causal inferences are valid] -- .box-6[RCTs are a helpful baseline/rubric for other methods] --- # Moving to Opportunity .center[ <figure> <img src="img/07/hud.jpg" alt="HUD HQ in DC" title="HUD HQ in DC" width="70%"> </figure> ] ??? **MTO** - main question = does your neighborhood matter? Subtext = kids who grow up in poor neighborhood do worse - lots of theories about that. Problems reside in the family vs. something inherent in the neighborhood / social capital or other benefits inherent in the community (be in the network of people who know where the jobs for people with low education are) Alternative explanations: - Poor parents don't do well in the labor market, so they live in cheap neighborhoods, where they're surrounded by the same type of people - Racism and discrimination Randomly assign people to where they live - ideally from birth or even pre-birth - but families that we can possibly randomly assign already have kids (required for receiving public housing assistance) So they randomly assign families in public housing who are willing to move and accept risk of having that move controlled - no other way to really do this - you'd have to pay middle-class and higher people a ton of money to get them to move Then randomly assign them to (1) stay, (2) move to anywhere that would take voucher, (3) anywhere with less than 10% poverty rate + get relocation counseling People were hoping to get option 2, since people already weren't choosing 3 - it's uncomfortable to move to a neighborhood where you don't fit in https://commons.wikimedia.org/wiki/File:Housing_and_Urban_Development_headquarters.jpg --- # RCTs and validity .box-inv-6.medium[Randomization fixes a ton of<br>internal validity issues] -- .pull-left[ .box-6[**Selection**<br>Treatment and control<br>groups are comparable;<br>people don't self-select] ] -- .pull-right[ .box-6[**Trends**<br>Maturation, secular<br>trends, seasonality,<br>regression to the mean<br>all generally average out] ] --- # RCTs and validity -- .box-inv-6.medium[RCTs don't fix attrition!] .box-6[Worst threat to internal validity for RCTs] -- .box-inv-6.medium.sp-before[If attrition is correlated<br>with treatment, that's bad] -- .box-6[People might drop out because of the treatment,<br>or because they got/didn't get into the control group] ??? You don't have data on them. NA values. Even if people don't comply (like move to a private school), you can still get data on them, so that's okay. People might drop out randomly, and that would be fine. But if attrition is correlated with the treatment, then it's bad. People might drop because of the treatment, or because they didn't get the control group. Impossible to sign the bias—could be tons of different reasons. --- # Addressing attrition -- .box-inv-6.less-medium[Recruit as effectively as possible] .box-6.sp-after[You don't just want weird/WEIRD participants] -- .box-inv-6.less-medium[Get people on board] .box-6.sp-after[Get participants invested in the experiment] -- .box-inv-6.less-medium[Collect as much baseline information as possible] .box-6[Check for randomization of attrition] ??? - Recruit as effectively as possible (so you don't get volunteers that routinely sign up for randomized experiments). Use money for recruitment - Get people on board. Treatment might be less/more enjoyable than the control group, so people will feel lucky or unlucky to be in the treatment group - explain why the experiment itself is important. Make them invested in getting the experiment to work. Science! Why you want the right answer beyond whether or not the treatment works - Collect as much baseline information as possible before assigning to treatment and control groups - doesn't help reduce attrition like getting people on board, but it lets us see if attrition is random with respect to preexisting characteristics - checks for randomization failures. Don't rerandomize. Randomize and then suck it up. --- # RCTs and validity .box-inv-6.less-medium[Randomization failures] .box-6[Check baseline pre-data] -- .box-inv-6.less-medium[Noncompliance] .box-6[Some people assigned to treatment won't take it;<br>some people assigned to control will take it] .box-6[Intent-to-treat (ITT) vs. Treatment-on-the-treated (TTE)] ??? ITT is probably the most policy-related measure - if there's a low compliance rate but a good ITT effect, you can try to make the program nicer, better --- # Other limitations .box-inv-6.medium.sp-after[RCTs don't magically fix construct validity<br>or statistical conclusion validity] -- .box-inv-6.medium.sp-after[RCTs **definitely** don't<br>magically fix external validity] ??? MTO varied income, not race, since it's illegal to tell people they can only move to a white neighborhood. So they used income instead of race. Keys under the light issue, since the original issue was about race Scalability issue with STAR. It wasn't hard to hire 40 more teachers in TN. California couldn't find enough teachers, so they emergency certified "a bunch of morons," which messed up the program effect --- layout: false class: bg-full background-image: url("img/07/vox-nobel.png") ??? https://www.vox.com/future-perfect/2019/10/14/20913928/nobel-prize-economics-duflo-banerjee-kremer --- layout: true class: title title-6 --- # When to randomly assign -- .box-inv-6[Demand for treatment exceeds supply] -- .box-inv-6[Treatment will be phased in over time] -- .box-inv-6[Treatment is in equipoise (genuine uncertainty)] -- .box-inv-6[Local culture open to randomization] -- .box-inv-6[When you're a nondemocratic monopolist] -- .box-inv-6[When people won't know (and it's ethical!)] -- .box-inv-6[When lotteries are going to happen anyway] ??? - When demand for treatment exceeds supply or treatment must be phased in over time (instead of doing the closest place first, etc.) - When people don't know what they want to do = equipoise (medical trials have to be in equipoise - unethical to use a treatment that's clearly beneficial) - When the local culture is favorable to random assignment - prime people to be more comfortable with it - When you are a nondemocratic monopoly provider - if you're the only one with the treatment, you decide who gets it - like Google and Facebook and A/B testing - When people won't know (as long as it's ethical) - resume name experiments, e-mails to politicians - When lotteries are going to happen anyway --- # When to <span style="color: #F6D645;">not</span> randomly assign -- .box-inv-6[When you need immediate results] -- .box-inv-6[When it's unethical or illegal] -- .box-inv-6[When it's something that happened in the past] -- .box-inv-6[When it involves universal ongoing phenomena] ??? Past: effects of city segregation or political regime type Universal phenomena: climate change, social norms --- layout: false name: matching class: center middle section-title section-title-1 animated fadeIn # Adjustment<br>with matching --- .center[ <figure> <img src="img/05/mm-matching.png" alt="Matching table from Mastering 'Metrics" title="Matching table from Mastering 'Metrics" width="70%"> </figure> ] --- class: title title-1 # Why match? -- .box-inv-1.medium[Reduce model dependence] .box-1.sp-after[Imbalance → model dependence → researcher discretion → bias] -- .box-inv-1.medium.sp-after[Compare apples to apples] -- .box-inv-1.medium[It's a way to adjust for backdoors!] --- .smaller[ $$ \color{white}{\beta_0 \text{E}^2} $$ ] <img src="07-slides_files/figure-html/matching-general-1.png" width="100%" style="display: block; margin: auto;" /> --- .smaller[ $$ \color{white}{\beta_0 \text{E}^2} \text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Treatment} \color{white}{\beta_0 \text{E}^2} $$ ] <img src="07-slides_files/figure-html/matching-dependency1-1.png" width="100%" style="display: block; margin: auto;" /> --- .smaller[ $$ \text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Education}^2 + \beta_3 \text{Treatment} $$ ] <img src="07-slides_files/figure-html/dependency2-1.png" width="100%" style="display: block; margin: auto;" /> --- .smaller[ $$ \color{white}{\beta_0 \text{E}^2} $$ ] <img src="07-slides_files/figure-html/reduced-1.png" width="100%" style="display: block; margin: auto;" /> --- .smaller[ $$ \color{white}{\beta_0 \text{E}^2} \text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Treatment} \color{white}{\beta_0 \text{E}^2} $$ ] <img src="07-slides_files/figure-html/reduced-dependency1-1.png" width="100%" style="display: block; margin: auto;" /> --- .smaller[ $$ \text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Education}^2 + \beta_3 \text{Treatment} $$ ] <img src="07-slides_files/figure-html/reduced-dependency2-1.png" width="100%" style="display: block; margin: auto;" /> --- .box-1[How do we know that we can remove these points?] <img src="07-slides_files/figure-html/reduced-again-1.png" width="100%" style="display: block; margin: auto;" /> --- class: title title-1 # General process for matching -- .box-inv-1.medium[Step 1. Preprocessing] .box-1[Do something to guess or model the assignment to treatment] .box-1.sp-after[Use what you know about the DAG to inform this guessing!] -- .box-inv-1.medium[Step 2. Estimation] .box-1[Use the new trimmed/preprocessed data to build a model,<br>calculate difference in means, etc.] --- <img src="07-slides_files/figure-html/reduced-dependency-taller-1.png" width="100%" style="display: block; margin: auto;" /> --- class: title title-1 # Different methods .box-inv-1.medium[Nearest neighbor matching (NN)] .box-1.small.sp-after[Mahalanobis distance / Euclidean distance] -- .box-inv-1.medium.sp-after-half[~~Propensity score matching (PSM)~~] -- .box-inv-1.medium.sp-after-half[Inverse probability weighting (IPW)] -- .box-inv-1.small[(and lots of other methods we're not covering!)] --- class: title title-1 # Nearest neighbor matching .box-inv-1.medium.sp-after[Find untreated observations that are<br>very close/similar to treated<br>observations based on confounders] -- .box-inv-1.medium[Lots of mathy ways to measure distance] .box-1[Mahalanobis and Euclidean distance are fairly common] --- class: bg-full background-image: url("img/07/recession1.png") ??? https://www.cnbc.com/2020/02/05/70percent-chance-of-recession-in-next-six-months-study-from-mit-and-state-street-finds.html --- class: bg-full background-image: url("img/07/recession2.png") --- class: title title-1 # Matching and eugenics .box-inv-1.medium[Prasanta Chandra Mahalanobis] .pull-left.center[ <figure> <img src="img/07/mahalanobis.png" alt="Prasanta Chandra Mahalanobis" title="Prasanta Chandra Mahalanobis" width="50%"> </figure> ] .pull-right[ .box-1[Tried to prove brain size differences between castes; low-key eugenicist] ] ??? https://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis#/media/File:PCMahalanobis.png --- <img src="07-slides_files/figure-html/edu-age-dag-1.png" width="100%" style="display: block; margin: auto;" /> --- <img src="07-slides_files/figure-html/edu-age-unmatched-1.png" width="100%" style="display: block; margin: auto;" /> --- <img src="07-slides_files/figure-html/edu-age-matched-1.png" width="100%" style="display: block; margin: auto;" /> --- <img src="07-slides_files/figure-html/edu-age-trimmed-1.png" width="100%" style="display: block; margin: auto;" /> --- class: title title-1 # Potential problems with matching .box-inv-1[Nearest neighbor matching can be greedy!] .center[ <figure> <img src="07-slides_files/figure-html/edu-age-matched-1.png" alt="Lost data" title="Lost data" width="50%%"> </figure> ] -- .box-1[Solution: Don't throw everything away!] --- class: title title-1 # Propensity scores .box-inv-1.medium[Predict the probability of<br>assignment to treatment using a model] .box-1.sp-after[Logistic regression, probit regression, machine learning, etc.] -- .box-1.smaller[Here's logistic regression:] `$$\operatorname{log} \frac{p_\text{Treated}}{1 - p_\text{Treated}} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Age}$$` --- <img src="07-slides_files/figure-html/logit-graph-1.png" width="50%" style="display: block; margin: auto;" /> .smaller[ `$$\operatorname{log} \frac{p_\text{Manual}}{1 - p_\text{Manual}} = \beta_0 + \beta_1 \text{MPG}$$` ] .small-code[ ```r model_transmission <- glm(am ~ mpg, data = mtcars, family = binomial(link = "logit")) ``` ] --- .box-1[Log odds .tiny[(default coefficient unit of measurement; fairly uninterpretable)]] .box-1[Odds ratios .tiny[(e<sup>β</sup>; centered around 1: 1.5 means 50% more likely; 0.75 means 25% less likely)]] .pull-left-narrow.small-code[ ```r tidy(model_transmission) tidy(model_transmission, exponentiate = TRUE) ``` ] .pull-right-wide.small-code[ ``` ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -6.60 2.35 -2.81 0.00498 ## 2 mpg 0.307 0.115 2.67 0.00751 ``` ``` ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 0.00136 2.35 -2.81 0.00498 ## 2 mpg 1.36 0.115 2.67 0.00751 ``` ] --- .box-inv-1.smaller[Plug all the values of MPG into the model and find the predicted probability of manual transmission] .small-code[ ```r augment(model_transmission, data = mtcars, type.predict = "response") ``` ] .pull-left.small-code[ ```r ## # A tibble: 32 x 3 ## mpg am .fitted ## <dbl> <dbl> <dbl> ## 1 21 1 0.461 ## 2 21 1 0.461 ## 3 22.8 1 0.598 ## 4 21.4 0 0.492 ## 5 18.7 0 0.297 ## 6 18.1 0 0.260 *## 7 14.3 0 0.0986 *## 8 24.4 0 0.708 ## 9 22.8 0 0.598 ## 10 19.2 0 0.330 ## # … with 22 more rows ``` ] .pull-right[ .box-1.smaller[Row 7 is highly unlikely to be manual (1)] .box-1.smaller[Row 8 is highly likely to be manual] ] --- class: title title-1 # Propensity score matching .box-inv-1.medium.sp-after-half[Super popular method] -- .box-inv-1.medium.sp-after-half[There are mathy reasons why it's not great<br>for matching *for identification purposes*] -- .box-inv-1.medium.sp-after-half[Propensity scores are fine!<br>Using them for matching isn't!] --- class: bg-full background-image: url("img/07/gary-king.png") ??? https://gking.harvard.edu/files/gking/files/pan1900011_rev.pdf --- layout: true class: title title-1 --- # Weighting .box-inv-1[Make some observations more important than others] -- <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Young </th> <th style="text-align:center;"> Middle </th> <th style="text-align:center;"> Old </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Population </td> <td style="text-align:center;"> 30% </td> <td style="text-align:center;"> 40% </td> <td style="text-align:center;"> 30% </td> </tr> <tr> <td style="text-align:left;"> Sample </td> <td style="text-align:center;"> 60% </td> <td style="text-align:center;"> 30% </td> <td style="text-align:center;"> 10% </td> </tr> </tbody> </table> --- # Weighting .box-inv-1[Make some observations more important than others] <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Young </th> <th style="text-align:center;"> Middle </th> <th style="text-align:center;"> Old </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Population </td> <td style="text-align:center;"> 30% </td> <td style="text-align:center;"> 40% </td> <td style="text-align:center;"> 30% </td> </tr> <tr> <td style="text-align:left;"> Sample </td> <td style="text-align:center;"> 60% </td> <td style="text-align:center;"> 30% </td> <td style="text-align:center;"> 10% </td> </tr> <tr> <td style="text-align:left;"> Weight </td> <td style="text-align:center;"> &ensp;30 / 60&ensp;<br>**0.5** </td> <td style="text-align:center;"> &ensp;40 / 30&ensp;<br>**1.333** </td> <td style="text-align:center;"> &ensp;30 / 10&ensp;<br>**3** </td> </tr> </tbody> </table> -- .box-1[Multiply weights by average values<br>(or us in regression) to adjust for importance] --- # Inverse probability weighting .box-inv-1.medium[Use propensity scores to weight<br>observations by how "weird" they are] .box-1.small[Observations with high probability of treatment<br>who don't get it (and vice versa) have higher weight] $$ \frac{\text{Treatment}}{\text{Propensity}} + \frac{1 - \text{Treatment}}{1 - \text{Propensity}} $$ --- layout: false .small-code[ ```r augment(model_transmission, data = mtcars, type.predict = "response") %>% select(mpg, am, propensity = .fitted) %>% * mutate(ip_weight = (am / propensity) + ((1 - am) / (1 - propensity))) ``` ] .pull-left.small-code[ ```r ## # A tibble: 32 x 4 ## mpg am propensity ip_weight ## <dbl> <dbl> <dbl> <dbl> ## 1 21 1 0.461 2.17 ## 2 21 1 0.461 2.17 ## 3 22.8 1 0.598 1.67 ## 4 21.4 0 0.492 1.97 ## 5 18.7 0 0.297 1.42 ## 6 18.1 0 0.260 1.35 *## 7 14.3 0 0.0986 1.11 *## 8 24.4 0 0.708 3.43 ## 9 22.8 0 0.598 2.49 ## 10 19.2 0 0.330 1.49 ### … with 22 more rows ``` ] .pull-right[ .box-1.smaller[Row 7 is highly unlikely to be manual and isn't.<br>**Boring! Low IPW.**] .box-1.smaller[Row 8 is highly likely to be manual, but isn't.<br>**That's weird! High IPW.**] ] --- <img src="07-slides_files/figure-html/edu-age-ipw-1.png" width="100%" style="display: block; margin: auto;" /> --- class: middle .box-1.huge[Examples!]