450C
Stanford University
Department of Political Science
Toby Nowacki
"What is the effect of X on Y?"
"Given the (known) DGP, what is the probability of observing our sample?"
In a clean causal inference setting, we know what the data-generating process is because we are in charge of assigning treatment (in an experiment) or assume that it is as-if random.
But not all questions lend themselves to clean causal inference.
What if we don't know the DGP?
"How can we best describe the process that generated this data?"
"Given our observed sample, what is the DGP?"
Previously: given our model, how likely are the results that we observe?
Now: given our results, how likely is the model that we assume?
This approach yields powerful solutions to some questions
Some methodological debates yielded dead ends (OLS v logit/probit)
Motivation: Suppose that we have N elections with two parties (A,B).
A's vote share in the last five elections (y_samp) was:
## [1] 51.88486 51.50774 44.50988 44.34797 36.01733
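To make the later code chunks self-contained, the sample can be entered directly in R (values copied from the printout above):

y_samp <- c(51.88486, 51.50774, 44.50988, 44.34797, 36.01733)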
Can we describe the underlying data-generating process?
What's our best guess?
How might we best model A's vote share?
What is your best guess for the vote share in the next election?
Let's assume: A's vote share is drawn i.i.d. from a normal distribution with (unknown) mean μ and (unknown) variance σ².
This gives us enough structure to proceed with MLE.
If we knew mean and variance, we could calculate the probability of observing any value:
f(x) = pdf(N(μ, σ²))
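As a quick illustration, R's dnorm() evaluates this density at any point; the parameter values below are purely hypothetical guesses, used only to show the mechanics:

# Density of the first observed vote share under a hypothetical N(mu = 45, sigma2 = 30)
dnorm(51.88486, mean = 45, sd = sqrt(30))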
$$L(\mu, \sigma^2 \mid y) = \prod_i f(y_i \mid \mu, \sigma^2) = \prod_i \frac{\exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}}$$

$$L(\mu, \sigma^2 \mid y) = \frac{\exp\left(-\frac{\sum_i (y_i - \mu)^2}{2\sigma^2}\right)}{(2\pi)^{n/2}(\sigma^2)^{n/2}}$$

$$\ell(\mu, \sigma^2 \mid y) = -\frac{\sum_i (y_i - \mu)^2}{2\sigma^2} - \frac{n}{2}\log(\sigma^2) + C$$
Now we can plug in any combination of candidate values for μ and σ2 into this function and we get a score.
We have a nicely defined function → time for some coding!
# Log-likelihood of a normal sample, up to the additive constant C
likelihood_normal <- function(dvec, mu, sigma2){
  - sum((dvec - mu)^2) / (2 * sigma2) - length(dvec) / 2 * log(sigma2)
}

# Quick example
likelihood_normal(y_samp, 43, 2)
## [1] -52.77709
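For instance, a candidate pair closer to the sample mean and spread should return a higher (less negative) score than the pair above; the values here are arbitrary guesses for illustration:

# Compare two arbitrary candidate parameter pairs
likelihood_normal(y_samp, 45, 30)   # closer to the data, higher score
likelihood_normal(y_samp, 43, 2)    # same call as above, lower score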
Let's set up some more code to plot the likelihood for every combination of μ and σ2.
library(tidyverse)   # provides %>%, rowwise(), mutate(), and map_dfr() used below

mu_rg <- seq(30, 70, by = 0.5)
sigma2_rg <- seq(3, 60, by = 0.5)

# Evaluate the log-likelihood at every (mu, sigma2) grid point
viz_df <- expand.grid(mu = mu_rg, sigma2 = sigma2_rg) %>%
  as.data.frame() %>%
  rowwise() %>%
  mutate(likelihood = likelihood_normal(y_samp, mu, sigma2))

max_row <- which.max(viz_df$likelihood)
viz_df[max_row, ]
## Source: local data frame [1 x 3]
## Groups: <by row>
## 
## # A tibble: 1 x 3
##      mu sigma2 likelihood
##   <dbl>  <dbl>      <dbl>
## 1  45.5     34      -11.3
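As an alternative to the grid, the same function can be handed to a numerical optimiser. A minimal sketch using base R's optim(); the starting values c(45, 30) are arbitrary guesses, and optim() minimises by default, so fnscale = -1 turns it into a maximiser:

fit <- optim(
  par = c(45, 30),
  fn  = function(par){
    if (par[2] <= 0) return(-1e10)   # penalise non-positive variance candidates
    likelihood_normal(y_samp, par[1], par[2])
  },
  control = list(fnscale = -1)
)
fit$par   # numerical MLE of mu and sigma2, should sit near the grid maximum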
We can also find the parameter combination that optimises the likelihood algebraically.
Recall from Wednesday's lecture:
$$\mu^* = \frac{\sum_i y_i}{n} = \bar{y} \qquad\qquad \sigma^{2*} = \frac{1}{n}\sum_i (y_i - \bar{y})^2$$
Implement the two functions for the MLE of mean and variance in R and compute the estimated mean and variance of the MLE normal for y_samp.
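One possible sketch of that exercise (the function names mle_mean and mle_var are my own labels, not fixed by the slides):

# MLE of the mean: the sample average
mle_mean <- function(dvec){
  sum(dvec) / length(dvec)
}

# MLE of the variance: mean squared deviation from the sample mean
# (note the 1/n divisor, unlike the 1/(n-1) used by var())
mle_var <- function(dvec){
  sum((dvec - mle_mean(dvec))^2) / length(dvec)
}

mle_mean(y_samp)
mle_var(y_samp)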
What if I told you that the data were generated with: μ = 50, σ² = 25?
Our MLE estimate only takes the "sample" from the DGP.
We can't make any assumptions about the parameters in the DGP: that's the thing we're trying to estimate using MLE!
But because the MLE converges in distribution, we can still infer how likely the observed MLE estimate is if we assume a true parameter θ₀.
# Draw a sample of size n from N(mean = 50, sd = 5) and return the MLEs of mean and variance
get_mean_variance <- function(n){
  y_samp <- rnorm(n, 50, 5)
  y_mean <- mean(y_samp)
  y_sigma <- 1/n * sum((y_samp - y_mean)^2)
  return(c(y_mean, y_sigma))
}

n <- 5
m <- 1000
rep_vec <- 1:m
names(rep_vec) <- 1:m

samp_df <- map_dfr(rep_vec, ~ get_mean_variance(n)) %>%
  t %>%
  as.data.frame
ggplot(samp_df, aes(V1, V2)) + geom_point(alpha = .5) + theme_tn() + labs(x = "mu_hat", y = "sigma2_hat")
This is a two-dimensional distribution. We can characterise its (empirical) mean and variance.
map_dfr(samp_df, ~ tibble(sampling_mean = mean(.x), sampling_var = var(.x))) %>% mutate(dim = c("mle_mean", "mle_var"))
## # A tibble: 2 x 3
##   sampling_mean sampling_var dim     
##           <dbl>        <dbl> <chr>   
## 1          49.9         4.97 mle_mean
## 2          20.5       227.   mle_var
What do these quantities correspond to?
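One way to check them against theory, taking the true parameters μ = 50 and σ² = 25 from above as given:

$$E[\hat{\mu}] = \mu = 50, \qquad \mathrm{Var}(\hat{\mu}) = \frac{\sigma^2}{n} = \frac{25}{5} = 5, \qquad E[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2 = \frac{4}{5} \cdot 25 = 20$$

These sit close to the simulated 49.9, 4.97, and 20.5; the downward bias of the variance MLE is clearly visible at n = 5.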
What happens if we increase the sample size?
n <- 100
samp_df <- map_dfr(rep_vec, ~ get_mean_variance(n)) %>%
  t %>%
  as.data.frame
ggplot(samp_df, aes(V1, V2)) + geom_point(alpha = .5) + theme_tn() + labs(x = "mu_hat", y = "sigma2_hat")
map_dfr(samp_df, ~ tibble(sampling_mean = mean(.x), sampling_var = var(.x))) %>% mutate(dim = c("mle_mean", "mle_var"))
## # A tibble: 2 x 3
##   sampling_mean sampling_var dim     
##           <dbl>        <dbl> <chr>   
## 1          50.0        0.239 mle_mean
## 2          24.6       12.4   mle_var
What do these quantities correspond to?
Recall the asymptotic property of MLE estimators as n→∞:
$$(\hat{\mu}, \hat{\sigma}^2) \xrightarrow{d} \mathrm{MVN}\left(\left(\bar{y},\ \frac{1}{n}\sum_i (y_i - \bar{y})^2\right),\ \begin{bmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2(\sigma^2)^2}{n} \end{bmatrix}\right)$$
Since we know the true parameters...
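A minimal sketch of that check, plugging the true σ² = 25 into the asymptotic variance terms above and comparing them with the n = 100 simulation:

# Asymptotic variances evaluated at the true parameters (sigma2 = 25, n = 100)
sigma2_true <- 25
n <- 100

sigma2_true / n         # theoretical Var(mu_hat)     = 0.25, simulated: 0.239
2 * sigma2_true^2 / n   # theoretical Var(sigma2_hat) = 12.5, simulated: 12.4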