Class 2: Regression and inference

# In-person session 2

**January 19, 2023**


---

# Plan for today

.box-6.medium.sp-after-half[R FAQs]

.box-5.medium.sp-after-half[Regression FAQs]

.box-2.medium.sp-after-half[Transforming data with **dplyr**]

.box-1.medium[Regression with R]

---

layout: false
name: r-faqs
class: center middle section-title section-title-6 animated fadeIn

# R FAQs

---


.box-6.large[RStudio fun]

---

.box-6.large[Column types]

.box-inv-6[Numeric, continuous, count, ordinal, interval, qualitative, categorical, a billion other inconsistent names]
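
???

Whatever names a textbook uses, R stores each column as one of a handful of types. A minimal sketch, assuming a made-up data frame (the `survey` object and all of its columns are hypothetical):

```r
# Hypothetical survey data; the column names are illustrative only
survey <- data.frame(
  income = c(41250.5, 38900, 52000.75),   # continuous -> "numeric"
  children = c(0L, 2L, 1L),               # count -> "integer"
  region = c("South", "West", "South"),   # categorical -> "character"
  employed = c(TRUE, FALSE, TRUE),        # yes/no -> "logical"
  satisfaction = factor(
    c("low", "high", "medium"),
    levels = c("low", "medium", "high"),
    ordered = TRUE
  ),                                      # ordinal -> ordered factor
  stringsAsFactors = FALSE
)

# Look up the first class of every column
col_types <- sapply(survey, function(x) class(x)[1])
col_types
```

The useful question is usually which of these types R thinks a column is, since that determines how functions like `lm()` and `ggplot()` treat it.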

---

.box-6.large[File paths, working directories, and RStudio projects]

???

- <https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z>
- Working directories and RStudio projects
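
One way to demo the working directory idea live, as a rough sketch in base R. The `data` folder and `example.csv` file are hypothetical names, not files from the course:

```r
# Where R currently looks for files (this differs from machine to machine)
getwd()

# Build a path relative to the working directory.
# "data" and "example.csv" are hypothetical names used for illustration.
relative_path <- file.path("data", "example.csv")
relative_path

# Check whether the file is really there before trying to read it
file.exists(relative_path)
```

Opening an RStudio project sets the working directory to the project folder, so relative paths like this one keep working when the folder moves to another computer.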

---

.box-6.large[The hyperliterality of computers]

---

.box-6.large[R Markdown fun]

???

- Markdown metadata and outputs, figure sizes
- Show final Rmd products: explanatory documents, like a blog post code walkthrough or a problem set, vs. a stand-alone research paper

---

.box-6.large[Fun with histograms]

---

.box-6.large[Logging]

???

<https://stats.stackexchange.com/a/18639/3025>
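
A possible demo of why logging helps and how to read a logged outcome. This is only a sketch with simulated data: the variable names, the 8% effect, and the noise level are all assumptions chosen for illustration.

```r
set.seed(1234)  # simulated data, for illustration only

# Hypothetical right-skewed outcome: income grows about 8% per year of education
education <- runif(500, min = 8, max = 20)
income <- exp(9 + 0.08 * education + rnorm(500, sd = 0.4))

# Skewed data: the mean sits above the median
c(mean = mean(income), median = median(income))

# After logging, the mean and median are much closer together
c(mean = mean(log(income)), median = median(log(income)))

# With a logged outcome, the slope is roughly a proportional change
model_log <- lm(log(income) ~ education)
slope <- coef(model_log)[["education"]]

# Approximate percent change in income for one more year of education
(exp(slope) - 1) * 100
```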

---

layout: false
name: regression-faqs
class: center middle section-title section-title-5 animated fadeIn

# Regression FAQs

---


.box-5.large[Regression equations]

???

<https://evalsp23.classes.andrewheiss.com/slides/02-slides.html#38>

<https://www.andrewheiss.com/blog/2022/05/20/marginalia/#regression-sliders-switches-and-mixing-boards>

---

.box-5.medium.sp-after[Why use two steps to create a regression in R (i.e., assigning it to an object with `<-`)?]

.box-5.medium[Why use `tidy()` from the broom package?]

???

Show model with `lm()`; show t-test with `t.test()`; show both through `tidy()`

Use **marginaleffects**
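
A sketch of that demo, assuming the **broom** package is installed and using the built-in `mtcars` data (the choice of variables is arbitrary):

```r
library(broom)  # assumes the broom package is installed

# Step 1: fit the model and store it in an object
model <- lm(mpg ~ wt, data = mtcars)

# Step 2: reuse the stored object as many times as needed
tidy(model, conf.int = TRUE)  # coefficients as a data frame
glance(model)                 # model-level details like R-squared

# A t-test returns a differently shaped object...
diff_test <- t.test(mpg ~ am, data = mtcars)

# ...but tidy() converts it into a data frame too
tidy(diff_test)
```

Storing the model first means it is estimated once and can then be passed to `tidy()`, `glance()`, plotting functions, or anything else without refitting.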

---

.box-5.medium[How was the 0.05 significance threshold determined?]

.box-5.medium[Could we say something is significant if p > 0.05, but just note that it is at a higher p-value? Or does it have to fall under 0.05?]

---

.box-5.large[Why all this convoluted logic of null worlds?]

---

# Different "dialects" of statistics

.box-5.medium[Frequentist]

`$$P(\text{data} \mid H_0)$$`

.box-5.medium[Bayesian]

`$$P(H \mid \text{data})$$`

---


.box-5.medium[Do we care about the actual coefficients or just whether or not they're significant?]

.box-5.medium[How does significance relate to causation?]

.box-5.medium[If we can't use statistics to assert causation, how are we going to use this information in program evaluation?]

---

.box-5.large[What counts as a "good" R²?]
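
???

It may help to show where the number comes from before debating what counts as "good." A minimal sketch using the built-in `mtcars` data (the model itself is an arbitrary example):

```r
model <- lm(mpg ~ wt, data = mtcars)

# R-squared as reported by R
r2_reported <- summary(model)$r.squared

# The same number by hand: 1 - (unexplained variation / total variation)
ss_residual <- sum(residuals(model)^2)
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_by_hand <- 1 - ss_residual / ss_total

# Both approaches agree
all.equal(r2_reported, r2_by_hand)
```

R² only measures how much of the variation in the outcome the model accounts for. It says nothing on its own about whether a coefficient is estimated well.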

---

.center[
<figure>
 <img src="img/02-class/nice-plot-1.png" alt="Euler diagram" title="Euler diagram" width="45%">
</figure>
]

---

.center[
<figure>
 <img src="img/02-class/plot-diagram-prediction-1.png" alt="R2 prediction" title="R2 prediction" width="75%">
</figure>
]

---

.center[
<figure>
 <img src="img/02-class/plot-diagram-estimation-1.png" alt="R2 estimation" title="R2 estimation" width="75%">
</figure>
]

---

.center[
<figure>
 <img src="img/02-class/prediction-vs-estimation.jpg" alt="R2 estimation vs prediction" title="R2 estimation vs prediction" width="100%">
</figure>
]

---

layout: false
name: dplyr
class: center middle section-title section-title-2 animated fadeIn

# Transforming data with dplyr
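
???

A possible starting point for the live coding, assuming the **dplyr** package is installed. It uses the built-in `mtcars` data, and the cutoff and new column names are arbitrary choices for illustration:

```r
library(dplyr)  # assumes the dplyr package is installed

car_summary <- mtcars %>%
  filter(wt < 4) %>%                       # keep lighter cars (wt is in 1000s of lbs)
  mutate(wt_kg = wt * 1000 * 0.4536) %>%   # add a column: weight in kilograms
  group_by(cyl) %>%                        # one group per cylinder count
  summarize(
    avg_mpg = mean(mpg),
    avg_wt_kg = mean(wt_kg),
    n_cars = n()
  )

car_summary
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain together with the pipe.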

---

layout: false
name: ggplot
class: center middle section-title section-title-1 animated fadeIn

# Regression with R