Load the Data

The data and some of the examples are derived from these two sources (plus a couple more).

Our dependent variable (Y) is the percentage change in GOP support from 2012 to 2016.

The independent variables are the following:

  • pop_change: population change (percent) between Apr. 1, 2010 and July 1, 2014
  • age_65+: persons 65 and older in 2014, percent
  • black: blacks or African-Americans alone, percent, 2014
  • hispanic: Hispanic or Latino, percent, 2014
  • hs_grad: high school graduate or above, percentage of people aged 25+, 2009-2013
  • undergrad: Bachelor’s degree holder or above, percentage of people aged 25+, 2009-2013
  • homeownership_rate: like it says, 2009-2013
  • median_home_value: median value of owner-occupied housing units, USD, 2009-2013
  • median_income: median household income, USD, 2009-2013
  • poverty_rate: persons below poverty level, percent, 2009-2013

Note that all of these fields (columns) are on a continuous scale, so linear regression is a natural first try.

Frequentist lm()

You could (and should) think about which model to use, and the data also leave something to be desired, but as this is an exemplar, we are more interested in comparing and contrasting the approaches than in the analysis itself. We’ll therefore proceed with lm().
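For reference, the fit itself is a one-liner; the data frame is d, per the Call shown in the output:

```r
# Regress the change in GOP support on all other columns of d
fit1 <- lm(gop_support_change ~ ., data = d)
summary(fit1)
```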

## 
## Call:
## lm(formula = gop_support_change ~ ., data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.268  -4.292  -0.703   3.870  52.981 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.088e+01  4.161e+00  -2.614  0.00898 ** 
## pop_change         -2.693e-01  3.920e-02  -6.869 7.78e-12 ***
## `age_65+`           2.066e-01  4.508e-02   4.583 4.76e-06 ***
## black              -1.095e-01  1.161e-02  -9.434  < 2e-16 ***
## hispanic           -1.526e-01  1.283e-02 -11.890  < 2e-16 ***
## hs_grad             2.606e-01  3.697e-02   7.051 2.19e-12 ***
## undergrad          -7.169e-01  3.000e-02 -23.895  < 2e-16 ***
## homeownership_rate -1.572e-04  2.509e-02  -0.006  0.99500    
## median_home_value  -1.748e-05  3.007e-06  -5.815 6.70e-09 ***
## median_income       1.541e-04  3.166e-05   4.866 1.19e-06 ***
## poverty_rate        2.275e-01  4.416e-02   5.151 2.75e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.482 on 3100 degrees of freedom
## Multiple R-squared:  0.4929, Adjusted R-squared:  0.4912 
## F-statistic: 301.3 on 10 and 3100 DF,  p-value: < 2.2e-16

So, based on this simple (plain-vanilla frequentist) linear model, we would say, for instance, that with each unit change (here, a one-percentage-point increase) in the proportion of the county that is Hispanic, GOP support in that county drops by ~0.15 percent (estimate −0.1526), where “***” denotes statistical significance at the 0.001 level.

But how does this compare with a Bayesian approach to linear regression? And, importantly, does this really make sense?

Bayesian Methods

rstanarm’s stan_glm()

(Definition lifted from here: “rstanarm is an R package that emulates other R model-fitting functions but uses Stan (via the rstan package) for the back-end estimation.”)

By not specifying our priors (which is a perfectly valid option), we implicitly use the default weakly informative priors, which are set to normal(0, 10) for the intercept and normal(0, 5) for the other regression coefficients.
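A minimal call might look like the following sketch; the formula matches the lm() fit above, and post1 is the object name assumed later in this post:

```r
library(rstanarm)

# Default weakly informative priors; 4 chains x 2000 iterations is the default
post1 <- stan_glm(gop_support_change ~ ., data = d, family = gaussian())
summary(post1)
```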


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 4e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.133796 seconds (Warm-up)
Chain 1:                0.440498 seconds (Sampling)
Chain 1:                0.574294 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 1.6e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.132338 seconds (Warm-up)
Chain 2:                0.422851 seconds (Sampling)
Chain 2:                0.555189 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 1.5e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.126153 seconds (Warm-up)
Chain 3:                0.409034 seconds (Sampling)
Chain 3:                0.535187 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 1.5e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.123996 seconds (Warm-up)
Chain 4:                0.423385 seconds (Sampling)
Chain 4:                0.547381 seconds (Total)
Chain 4: 
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      gop_support_change ~ .
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 3111
##  predictors:   11
## 
## Estimates:
##                      mean   sd    10%   50%   90%
## (Intercept)        -10.9    4.1 -16.0 -11.0  -5.7
## pop_change          -0.3    0.0  -0.3  -0.3  -0.2
## `age_65+`            0.2    0.0   0.1   0.2   0.3
## black               -0.1    0.0  -0.1  -0.1  -0.1
## hispanic            -0.2    0.0  -0.2  -0.2  -0.1
## hs_grad              0.3    0.0   0.2   0.3   0.3
## undergrad           -0.7    0.0  -0.8  -0.7  -0.7
## homeownership_rate   0.0    0.0   0.0   0.0   0.0
## median_home_value    0.0    0.0   0.0   0.0   0.0
## median_income        0.0    0.0   0.0   0.0   0.0
## poverty_rate         0.2    0.0   0.2   0.2   0.3
## sigma                7.5    0.1   7.4   7.5   7.6
## 
## Fit Diagnostics:
##            mean   sd   10%   50%   90%
## mean_PPD 6.7    0.2  6.4   6.7   6.9  
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                    mcse Rhat n_eff
## (Intercept)        0.1  1.0  2890 
## pop_change         0.0  1.0  4309 
## `age_65+`          0.0  1.0  2767 
## black              0.0  1.0  3730 
## hispanic           0.0  1.0  3297 
## hs_grad            0.0  1.0  2658 
## undergrad          0.0  1.0  2956 
## homeownership_rate 0.0  1.0  2980 
## median_home_value  0.0  1.0  2993 
## median_income      0.0  1.0  2474 
## poverty_rate       0.0  1.0  3183 
## sigma              0.0  1.0  5015 
## mean_PPD           0.0  1.0  3787 
## log-posterior      0.1  1.0  1719 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
## stan_glm
##  family:       gaussian [identity]
##  formula:      gop_support_change ~ .
##  observations: 3111
##  predictors:   11
## ------
##                    Median   MAD_SD  
## (Intercept)        -10.9646   4.0877
## pop_change          -0.2687   0.0390
## `age_65+`            0.2055   0.0440
## black               -0.1093   0.0116
## hispanic            -0.1521   0.0125
## hs_grad              0.2612   0.0369
## undergrad           -0.7165   0.0297
## homeownership_rate   0.0007   0.0246
## median_home_value    0.0000   0.0000
## median_income        0.0002   0.0000
## poverty_rate         0.2266   0.0432
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 7.4801 0.0971
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
## 
## Computed from 4000 by 3111 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -10685.3  72.7
## p_loo        19.0   2.5
## looic     21370.6 145.4
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     3110  100.0%  564       
##  (0.5, 0.7]   (ok)          1    0.0%  550       
##    (0.7, 1]   (bad)         0    0.0%  <NA>      
##    (1, Inf)   (very bad)    0    0.0%  <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Most interestingly, perhaps, we can use MCMC plots. mcmc_intervals() defaults to showing 50% intervals (the thick lines) and 90% intervals (the thinner outer segments), with the circular point in the middle representing the median.
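As a sketch (parameter names as in the printed summary, post1 as the fitted object):

```r
library(bayesplot)

# 50% (thick) and 90% (thin) posterior intervals, medians as points
mcmc_intervals(as.matrix(post1),
               pars = c("hispanic", "undergrad", "poverty_rate"))
```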

So, for instance, we might say that the median effect of being Hispanic (net of other factors) is ~-0.15 percent, and that there is not much uncertainty around this value. (Indeed, summary(post1) reported the hispanic effect as -0.2 at the 50% quantile, which is a rounded value.)

By the bye, we can also calculate these quantities manually from the posterior fit (the stanreg object): for instance, the median effect of hispanic, or the median together with the 1st and 9th deciles for the same.
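A sketch of those manual calculations, assuming post1 is the stanreg fit from above:

```r
m1 <- as.matrix(post1)  # posterior draws; one column per parameter

# Median effect of hispanic
median(m1[, "hispanic"])

# Median plus the 1st and 9th deciles
round(quantile(m1[, "hispanic"], probs = c(0.1, 0.5, 0.9)), digits = 2)
```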

## [1] -0.1520994
##   10%   50%   90% 
## -0.17 -0.15 -0.14

Now the thinner outer line represents the 99% credible interval, and the thicker inner line the 80% credible interval around the posterior draws.

We might say that, conditional on the data and the model, there is a 99% probability that the net effect of a one-percentage-point increase in the county-level Hispanic / Latino population is a change (decrease) in GOP support in that county of between ~-0.17 and ~-0.13.
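That 99% interval can be read off the posterior draws directly; a one-liner sketch, again assuming post1:

```r
# 99% credible interval for the hispanic coefficient
round(quantile(as.matrix(post1)[, "hispanic"], probs = c(0.005, 0.995)),
      digits = 2)
```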

We would say that there is a net positive effect of being “poorer” and of being aged 65 plus, though there is a distribution around those median values (~0.23 and ~0.21, respectively). In contrast, there is a negative effect on the change in GOP support for persons holding a bachelor’s (or higher) degree around this period, with less uncertainty about that value (the distribution being somewhat narrower in the center).

We see that there are some systematic discrepancies between the model-generated (simulated) data and the actual data in that the medians are off somewhat, which could be remedied e.g. by changing our priors.
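The comparison meant here is a posterior predictive check; in rstanarm, quick versions look like this (a sketch, not necessarily the exact call used above):

```r
# Overlay datasets simulated from the posterior on the observed outcome
pp_check(post1)

# Or compare the observed median against the simulated medians explicitly
pp_check(post1, plotfun = "stat", stat = "median")
```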

There are many, many more sophisticated diagnostic tools out there, but in the interest of brevity, I don’t include them here.

rstan()

Another (somewhat more interesting) option is to do this in rstan, because it requires one to specify the model by hand in a Stan program, which presumes a deeper familiarity / comfort with the syntax and a greater understanding of (i) your model and (ii) Bayesian statistics. There are some workarounds (namely brms), but save for that, this is also the approach that promises potentially greater rewards.

There are two ways to do this: (i) feed in a hand-written Stan program, or (ii) the “stanfit to array” method. While the latter may be faster to type on the fly, there are discrepancies between the two instantiations. (I would also advise perusing what a Stan program looks like, just to gain some familiarity.) Let’s start with the first alternative.
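For orientation, a minimal Stan program for this kind of linear regression might look like the following. This is a sketch matching the alpha / beta / sigma parameter names in the output below, not necessarily the exact program used here:

```stan
data {
  int<lower=0> N;          // number of counties
  int<lower=0> K;          // number of predictors
  matrix[N, K] x;          // predictor matrix
  vector[N] y;             // change in GOP support
}
parameters {
  real alpha;              // intercept
  vector[K] beta;          // regression coefficients
  real<lower=0> sigma;     // residual standard deviation
}
model {
  y ~ normal(alpha + x * beta, sigma);  // likelihood; priors left implicit (flat)
}
```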

Direct Feed
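Feeding a hand-written Stan program to rstan might look like this sketch; the file name and the construction of the data list are illustrative:

```r
library(rstan)

stan_data <- list(
  N = nrow(d),
  K = 10,
  x = as.matrix(d[, 2:11]),   # the ten predictors
  y = d$gop_support_change
)

# 'model.stan' is a hypothetical file holding the linear-regression Stan program
post2 <- stan(file = "model.stan", data = stan_data, chains = 4, iter = 2000)
```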

Warning: There were 40 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 22. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: The largest R-hat is 2.49, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

This evaluates with a number of warnings, which we would definitely need to pay attention to. Especially worrisome is the large R-hat, as we should only keep samples with an R-hat below 1.01. For instance, we could try increasing the number of iterations and / or the warmup. But since we’re doing a toy example, and are more interested in the method than in the sample itself, we shall cautiously proceed.
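Concretely, the remedies hinted at above might look like this (file name, data list name, and tuning values all illustrative):

```r
post2 <- stan(
  file = "model.stan",      # hypothetical file holding the Stan program
  data = stan_data,         # hypothetical name for the data list
  chains = 4,
  iter = 4000,              # more iterations...
  warmup = 2000,            # ...and a longer warmup
  control = list(max_treedepth = 15, adapt_delta = 0.95)
)
```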

## $summary
##                   mean      se_mean           sd          2.5%           25%
## alpha    -5.643825e+00 3.742245e+00 6.050255e+00 -1.777010e+01 -1.072475e+01
## beta[1]  -2.797613e-01 1.143882e-01 1.874313e-01 -6.449839e-01 -3.140699e-01
## beta[2]   4.073862e-01 5.498121e-01 8.146556e-01 -8.007572e-01  1.572087e-01
## beta[3]  -5.394443e-01 5.497283e-01 7.783000e-01 -1.891372e+00 -4.705748e-01
## beta[4]   1.666335e-01 6.550987e-01 9.587287e-01 -1.113213e+00 -1.675690e-01
## beta[5]   3.046706e-01 1.091231e-01 1.652439e-01  8.524253e-02  2.090472e-01
## beta[6]  -4.304885e-01 2.049281e-01 3.372039e-01 -7.684925e-01 -7.194674e-01
## beta[7]   7.806897e-02 3.965853e-01 6.256411e-01 -1.064679e+00 -2.786597e-02
## beta[8]  -1.679482e-05 8.065881e-05 1.324780e-04 -2.850348e-04 -2.075913e-05
## beta[9]  -2.069659e-04 1.305420e-03 2.011853e-03 -3.161358e-03 -2.226757e-03
## beta[10] -1.633722e-01 2.686959e-01 5.145957e-01 -1.175573e+00 -4.323969e-01
## sigma     7.631036e+00 7.452859e-01 3.883005e+00  3.287142e+00  7.104001e+00
## lp__     -5.026426e+04 2.923082e+04 7.624866e+04 -2.777874e+05 -6.077667e+04
##                    50%           75%         97.5%     n_eff      Rhat
## alpha    -4.524295e+00  3.749063e-01  3.751365e-01  2.613864  2.106949
## beta[1]  -2.638793e-01 -8.833821e-02 -5.868218e-02  2.684859  2.957257
## beta[2]   2.130624e-01  6.783222e-01  1.674224e+00  2.195430  5.360723
## beta[3]  -1.127710e-01 -9.825907e-02 -1.599613e-02  2.004462 31.828540
## beta[4]  -1.515365e-01  7.054339e-01  1.723215e+00  2.141795  6.547172
## beta[5]   2.608692e-01  3.923856e-01  5.580799e-01  2.293070  3.010321
## beta[6]  -6.806487e-01 -4.699477e-02 -5.053794e-03  2.707585  3.119563
## beta[7]   3.182071e-03  6.937067e-01  9.528238e-01  2.488727  4.051529
## beta[8]  -1.693310e-05  9.287369e-05  1.509566e-04  2.697640  3.641940
## beta[9]   1.438059e-04  1.823811e-04  3.275744e-03  2.375156  4.619087
## beta[10]  1.720597e-01  2.276467e-01  2.982045e-01  3.667840  2.638807
## sigma     7.458685e+00  7.563030e+00  1.743241e+01 27.145018  1.184475
## lp__     -7.817495e+03 -7.814246e+03 -7.811392e+03  6.804286  1.891442
## 
## $c_summary
## , , chains = chain:1
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.078257e+01 4.127428e+00 -1.900760e+01 -1.356999e+01 -1.072152e+01
##   beta[1]  -2.715686e-01 3.803757e-02 -3.423761e-01 -2.987807e-01 -2.706132e-01
##   beta[2]   2.062127e-01 4.220532e-02  1.290390e-01  1.762080e-01  2.056600e-01
##   beta[3]  -1.098833e-01 1.104514e-02 -1.322337e-01 -1.171466e-01 -1.098480e-01
##   beta[4]  -1.525941e-01 1.246936e-02 -1.769956e-01 -1.615340e-01 -1.529115e-01
##   beta[5]   2.603368e-01 3.795429e-02  1.879903e-01  2.351353e-01  2.589766e-01
##   beta[6]  -7.168292e-01 3.115185e-02 -7.752745e-01 -7.366352e-01 -7.161933e-01
##   beta[7]  -5.641945e-04 2.419858e-02 -4.631415e-02 -1.633642e-02 -4.388926e-04
##   beta[8]  -1.748372e-05 3.044750e-06 -2.353423e-05 -1.965682e-05 -1.740555e-05
##   beta[9]   1.536635e-04 3.208543e-05  8.899636e-05  1.318763e-04  1.532369e-04
##   beta[10]  2.271903e-01 4.328898e-02  1.427492e-01  1.989586e-01  2.267151e-01
##   sigma     7.489441e+00 9.291613e-02  7.302373e+00  7.431041e+00  7.487831e+00
##   lp__     -7.814591e+03 2.391303e+00 -7.820164e+03 -7.816061e+03 -7.814308e+03
##           stats
## parameter            75%         97.5%
##   alpha    -7.862144e+00 -2.870583e+00
##   beta[1]  -2.459935e-01 -1.970984e-01
##   beta[2]   2.355479e-01  2.870316e-01
##   beta[3]  -1.025917e-01 -8.948434e-02
##   beta[4]  -1.441935e-01 -1.284327e-01
##   beta[5]   2.855940e-01  3.383278e-01
##   beta[6]  -6.964181e-01 -6.584558e-01
##   beta[7]   1.537231e-02  4.693211e-02
##   beta[8]  -1.539045e-05 -1.163928e-05
##   beta[9]   1.740971e-04  2.149337e-04
##   beta[10]  2.560421e-01  3.156325e-01
##   sigma     7.549453e+00  7.676185e+00
##   lp__     -7.812801e+03 -7.810851e+03
## 
## , , chains = chain:2
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.317673e+00 1.994983e+00 -5.739514e+00 -2.577650e+00 -1.390793e-01
##   beta[1]  -5.204165e-01 1.740678e-01 -6.450784e-01 -6.449451e-01 -6.447793e-01
##   beta[2]  -4.560364e-01 4.689260e-01 -8.011637e-01 -8.006564e-01 -7.996866e-01
##   beta[3]  -5.239417e-02 4.781044e-02 -1.323599e-01 -1.089012e-01 -1.749011e-02
##   beta[4]  -7.460725e-01 4.801078e-01 -1.113389e+00 -1.113085e+00 -1.110906e+00
##   beta[5]   1.422492e-01 1.129119e-01  7.958036e-02  8.747409e-02  1.109860e-01
##   beta[6]  -2.423236e-01 3.270515e-01 -7.395499e-01 -6.845626e-01 -5.239647e-03
##   beta[7]  -6.366270e-01 5.310979e-01 -1.071722e+00 -1.061501e+00 -1.030028e+00
##   beta[8]  -1.800621e-04 1.279778e-04 -2.867809e-04 -2.830384e-04 -2.767319e-04
##   beta[9]   2.011288e-03 1.537303e-03 -2.506151e-05  1.537200e-04  3.151213e-03
##   beta[10] -6.754367e-01 6.445903e-01 -1.176817e+00 -1.175061e+00 -1.168816e+00
##   sigma     8.257512e+00 5.795448e+00  2.894723e+00  4.199566e+00  7.433392e+00
##   lp__     -6.906640e+04 8.485140e+04 -2.877534e+05 -1.376690e+05 -2.271964e+04
##           stats
## parameter            75%         97.5%
##   alpha    -1.390625e-01 -1.229687e-01
##   beta[1]  -2.994462e-01 -2.286109e-01
##   beta[2]   1.622419e-01  2.689916e-01
##   beta[3]  -1.704905e-02 -1.272861e-02
##   beta[4]  -1.673472e-01 -1.386905e-01
##   beta[5]   2.008534e-01  2.605752e-01
##   beta[6]  -5.099895e-03 -5.043428e-03
##   beta[7]  -2.656818e-02  3.465446e-01
##   beta[8]  -1.948543e-05 -1.273801e-05
##   beta[9]   3.259039e-03  3.301644e-03
##   beta[10]  1.567966e-01  2.254535e-01
##   sigma     7.621195e+00  3.032207e+01
##   lp__     -7.817282e+03 -7.812931e+03
## 
## , , chains = chain:3
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha     3.750221e-01 6.003938e-04  3.749266e-01  3.749614e-01  3.749889e-01
##   beta[1]  -5.874805e-02 3.324668e-04 -5.882056e-02 -5.876119e-02 -5.871329e-02
##   beta[2]   1.673768e+00 3.611971e-03  1.673469e+00  1.674035e+00  1.674117e+00
##   beta[3]  -1.885536e+00 3.773542e-02 -1.891479e+00 -1.891040e+00 -1.888566e+00
##   beta[4]   1.717973e+00 3.467118e-02  1.718288e+00  1.720326e+00  1.720799e+00
##   beta[5]   5.555022e-01 1.681570e-02  5.553907e-01  5.566673e-01  5.569294e-01
##   beta[6]  -4.695834e-02 7.617352e-04 -4.725686e-02 -4.702728e-02 -4.700269e-02
##   beta[7]   9.491342e-01 2.226978e-02  9.493211e-01  9.503628e-01  9.507576e-01
##   beta[8]   1.478727e-04 5.459648e-06  1.420595e-04  1.470596e-04  1.483075e-04
##   beta[9]  -3.145657e-03 8.052943e-05 -3.165745e-03 -3.158171e-03 -3.154689e-03
##   beta[10] -4.329237e-01 5.670700e-03 -4.328616e-01 -4.324251e-01 -4.323799e-01
##   sigma     7.290665e+00 5.120612e+00  3.289568e+00  3.618035e+00  6.291320e+00
##   lp__     -1.163610e+05 8.799147e+04 -2.773882e+05 -2.301347e+05 -8.031604e+04
##           stats
## parameter            75%         97.5%
##   alpha     3.749996e-01  3.751815e-01
##   beta[1]  -5.869616e-02 -5.855289e-02
##   beta[2]   1.674198e+00  1.674258e+00
##   beta[3]  -1.888147e+00 -1.885052e+00
##   beta[4]   1.722907e+00  1.723284e+00
##   beta[5]   5.578584e-01  5.581060e-01
##   beta[6]  -4.699121e-02 -4.695205e-02
##   beta[7]   9.525743e-01  9.528752e-01
##   beta[8]   1.495829e-04  1.527521e-04
##   beta[9]  -3.147495e-03 -3.129211e-03
##   beta[10] -4.323085e-01 -4.322424e-01
##   sigma     7.654742e+00  1.863129e+01
##   lp__     -5.668882e+04 -1.757558e+04
## 
## , , chains = chain:4
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.085008e+01 4.117619e+00 -1.917777e+01 -1.358951e+01 -1.072707e+01
##   beta[1]  -2.683120e-01 4.258112e-02 -3.514652e-01 -2.959329e-01 -2.698957e-01
##   beta[2]   2.056006e-01 4.394513e-02  1.210885e-01  1.752740e-01  2.034125e-01
##   beta[3]  -1.099641e-01 1.250928e-02 -1.342640e-01 -1.177316e-01 -1.095763e-01
##   beta[4]  -1.527729e-01 1.363018e-02 -1.789459e-01 -1.614894e-01 -1.528696e-01
##   beta[5]   2.605942e-01 3.909231e-02  1.846488e-01  2.352769e-01  2.599652e-01
##   beta[6]  -7.158430e-01 3.107566e-02 -7.775983e-01 -7.364005e-01 -7.160770e-01
##   beta[7]   3.328701e-04 2.521173e-02 -4.876894e-02 -1.721098e-02  4.209088e-04
##   beta[8]  -1.750616e-05 3.177052e-06 -2.360141e-05 -1.983615e-05 -1.756847e-05
##   beta[9]   1.528417e-04 3.065102e-05  9.670673e-05  1.318800e-04  1.521228e-04
##   beta[10]  2.276813e-01 4.150496e-02  1.418653e-01  2.015881e-01  2.270199e-01
##   sigma     7.486526e+00 9.523980e-02  7.302816e+00  7.423827e+00  7.483642e+00
##   lp__     -7.815033e+03 2.397938e+00 -7.820096e+03 -7.816417e+03 -7.814821e+03
##           stats
## parameter            75%         97.5%
##   alpha    -8.051108e+00 -2.850536e+00
##   beta[1]  -2.402894e-01 -1.816115e-01
##   beta[2]   2.348464e-01  2.960704e-01
##   beta[3]  -1.019813e-01 -8.610035e-02
##   beta[4]  -1.430143e-01 -1.273395e-01
##   beta[5]   2.869138e-01  3.343632e-01
##   beta[6]  -6.924851e-01 -6.598883e-01
##   beta[7]   1.817989e-02  4.850977e-02
##   beta[8]  -1.535842e-05 -1.079485e-05
##   beta[9]   1.722210e-04  2.161451e-04
##   beta[10]  2.563781e-01  3.118462e-01
##   sigma     7.552109e+00  7.665957e+00
##   lp__     -7.813352e+03 -7.811190e+03
m2 <- as.matrix(post2)
colnames(m2)[1:11] <- c('(Intercept)', colnames(d)[2:11])

# Same quantiles for each predictor, built in one pass instead of column by column
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
vars  <- c('poverty_rate', 'median_income', 'median_home_value', 'homeownership_rate',
           'undergrad', 'hs_grad', 'hispanic', 'black', 'age_65+', 'pop_change')
df2 <- as.data.frame(sapply(vars, function(v)
  round(quantile(m2[, v], probs = probs), digits = 2)))
colnames(df2) <- vars   # keep names like 'age_65+' intact
## Warning: `as.tibble()` was deprecated in tibble 2.0.0.
## Please use `as_tibble()` instead.
## The signature and semantics have changed, see `?as_tibble`.
## # A tibble: 10 × 6
##    rowname            `2.5%` `25%` `50%` `75%` `97.5%`
##    <fct>               <dbl> <dbl> <dbl> <dbl>   <dbl>
##  1 poverty_rate        -1.18 -0.43  0.17  0.23    0.3 
##  2 median_income        0     0     0     0       0   
##  3 median_home_value    0     0     0     0       0   
##  4 homeownership_rate  -1.06 -0.03  0     0.69    0.95
##  5 undergrad           -0.77 -0.72 -0.68 -0.05   -0.01
##  6 hs_grad              0.09  0.21  0.26  0.39    0.56
##  7 hispanic            -1.11 -0.17 -0.15  0.71    1.72
##  8 black               -1.89 -0.47 -0.11 -0.1    -0.02
##  9 age_65+             -0.8   0.16  0.21  0.68    1.67
## 10 pop_change          -0.64 -0.31 -0.26 -0.09   -0.06
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 2 rows containing missing values (geom_segment).

From within R

Warning: There were 77 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 21. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: The largest R-hat is 1.55, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
## $summary
##                   mean      se_mean           sd          2.5%           25%
## alpha    -8.089427e+00 3.371873e+00 5.892194e+00 -1.876007e+01 -1.253637e+01
## beta[1]  -1.092389e-01 1.943874e-01 3.279540e-01 -3.444118e-01 -2.886433e-01
## beta[2]   3.893662e-01 2.213753e-01 3.743635e-01  1.179320e-01  1.836181e-01
## beta[3]  -1.156354e-01 7.536748e-03 1.624521e-02 -1.367531e-01 -1.281441e-01
## beta[4]  -1.070079e-01 5.522436e-02 9.598061e-02 -1.784274e-01 -1.601809e-01
## beta[5]   5.040721e-01 2.932880e-01 5.164141e-01  1.774184e-01  2.351958e-01
## beta[6]  -5.878351e-01 1.570408e-01 2.609274e-01 -7.738212e-01 -7.299527e-01
## beta[7]   1.709747e-01 2.058945e-01 3.605102e-01 -5.400072e-02 -1.566645e-02
## beta[8]   3.461403e-05 6.276317e-05 1.080758e-04 -2.338802e-05 -1.902770e-05
## beta[9]  -8.844660e-04 1.251914e-03 2.135451e-03 -5.321445e-03  1.057523e-04
## beta[10]  2.753851e-01 5.626334e-02 1.163544e-01  1.351983e-01  2.002963e-01
## sigma     7.366271e+00 4.107805e-01 2.450721e+00  3.811837e+00  7.377861e+00
## lp__     -3.188791e+04 2.622184e+04 6.026312e+04 -2.364669e+05 -7.819232e+03
##                    50%           75%         97.5%     n_eff     Rhat
## alpha    -8.916341e+00 -1.712196e+00  9.768782e-02  3.053597 1.682743
## beta[1]  -2.572258e-01 -2.120129e-01  5.526320e-01  2.846359 2.315812
## beta[2]   2.211476e-01  2.734301e-01  1.148398e+00  2.859754 2.323990
## beta[3]  -1.138458e-01 -1.043941e-01 -8.762310e-02  4.646035 1.329616
## beta[4]  -1.496916e-01 -1.336641e-01  8.761114e-02  3.020686 2.199290
## beta[5]   2.674629e-01  3.144107e-01  1.574168e+00  3.100328 2.237410
## beta[6]  -7.057480e-01 -6.716268e-01 -6.005388e-02  2.760671 2.409702
## beta[7]   5.492209e-03  3.761745e-02  9.197728e-01  3.065813 2.266112
## beta[8]  -1.661724e-05 -1.326041e-05  2.611732e-04  2.965151 2.341918
## beta[9]   1.425592e-04  1.687923e-04  2.139462e-04  2.909581 2.387176
## beta[10]  2.371775e-01  2.891482e-01  5.007450e-01  4.276759 1.717712
## sigma     7.469825e+00  7.545976e+00  8.635352e+00 35.593274 1.152376
## lp__     -7.815609e+03 -7.813554e+03 -7.811197e+03  5.281736 2.502053
## 
## $c_summary
## , , chains = chain:1
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.089275e+01 4.204113e+00 -1.905170e+01 -1.369045e+01 -1.094454e+01
##   beta[1]  -2.711680e-01 3.992163e-02 -3.467507e-01 -2.994336e-01 -2.708615e-01
##   beta[2]   2.077396e-01 4.521436e-02  1.143327e-01  1.780911e-01  2.100598e-01
##   beta[3]  -1.100505e-01 1.170578e-02 -1.330819e-01 -1.177103e-01 -1.097276e-01
##   beta[4]  -1.529990e-01 1.261815e-02 -1.764700e-01 -1.613721e-01 -1.528102e-01
##   beta[5]   2.600372e-01 3.725606e-02  1.900449e-01  2.339418e-01  2.605320e-01
##   beta[6]  -7.177332e-01 3.180844e-02 -7.827212e-01 -7.392918e-01 -7.175234e-01
##   beta[7]  -7.944003e-04 2.623649e-02 -5.270366e-02 -1.819667e-02 -9.448664e-04
##   beta[8]  -1.760905e-05 2.929636e-06 -2.351628e-05 -1.961566e-05 -1.758788e-05
##   beta[9]   1.562375e-04 3.308665e-05  9.353016e-05  1.335349e-04  1.557011e-04
##   beta[10]  2.296064e-01 4.532492e-02  1.485630e-01  1.983394e-01  2.274494e-01
##   sigma     7.486758e+00 8.828622e-02  7.319347e+00  7.425178e+00  7.485725e+00
##   lp__     -7.814923e+03 2.487589e+00 -7.820834e+03 -7.816379e+03 -7.814579e+03
##           stats
## parameter            75%         97.5%
##   alpha    -7.990165e+00 -2.484948e+00
##   beta[1]  -2.427803e-01 -1.959469e-01
##   beta[2]   2.407183e-01  2.925905e-01
##   beta[3]  -1.019045e-01 -8.777471e-02
##   beta[4]  -1.448332e-01 -1.276708e-01
##   beta[5]   2.856762e-01  3.336807e-01
##   beta[6]  -6.968596e-01 -6.557484e-01
##   beta[7]   1.627053e-02  5.203847e-02
##   beta[8]  -1.560086e-05 -1.201464e-05
##   beta[9]   1.782638e-04  2.241522e-04
##   beta[10]  2.615096e-01  3.192757e-01
##   sigma     7.543243e+00  7.666810e+00
##   lp__     -7.813124e+03 -7.810968e+03
## 
## , , chains = chain:2
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -9.811063e-02 5.835178e-01 -1.991075e+00  9.733619e-02  9.736015e-02
##   beta[1]   3.715588e-01 3.427045e-01 -3.282269e-01  5.525189e-01  5.525436e-01
##   beta[2]   9.375480e-01 3.919196e-01  1.377822e-01  1.135730e+00  1.148275e+00
##   beta[3]  -1.332392e-01 1.455861e-02 -1.458390e-01 -1.364984e-01 -1.363759e-01
##   beta[4]   3.021254e-02 1.060297e-01 -1.814223e-01  8.691249e-02  8.758265e-02
##   beta[5]   1.237576e+00 5.875328e-01  1.559294e-01  1.331732e+00  1.572109e+00
##   beta[6]  -2.000813e-01 2.629381e-01 -7.233552e-01 -6.018609e-02 -6.014427e-02
##   beta[7]   6.852888e-01 4.064511e-01 -6.229696e-02  6.736548e-01  9.179952e-01
##   beta[8]   1.909943e-04 1.187044e-04 -2.287044e-05  2.137460e-04  2.581738e-04
##   beta[9]  -4.000047e-03 2.301253e-03 -5.327046e-03 -5.317616e-03 -5.312115e-03
##   beta[10]  4.189146e-01 1.445921e-01  1.192601e-01  4.509383e-01  5.006532e-01
##   sigma     7.009818e+00 4.883066e+00  3.487205e+00  4.717937e+00  6.500748e+00
##   lp__     -1.041069e+05 8.703962e+04 -2.815820e+05 -1.563143e+05 -8.547478e+04
##           stats
## parameter            75%         97.5%
##   alpha     9.745290e-02  4.383232e-01
##   beta[1]   5.525676e-01  5.528527e-01
##   beta[2]   1.148384e+00  1.148409e+00
##   beta[3]  -1.362206e-01 -1.014445e-01
##   beta[4]   8.760291e-02  8.761967e-02
##   beta[5]   1.573365e+00  1.574486e+00
##   beta[6]  -6.010705e-02 -5.966620e-02
##   beta[7]   9.191746e-01  9.200914e-01
##   beta[8]   2.597673e-04  2.630330e-04
##   beta[9]  -4.378572e-03  1.783107e-04
##   beta[10]  5.007164e-01  5.008153e-01
##   sigma     7.530702e+00  1.742326e+01
##   lp__     -1.289981e+04 -7.814668e+03
## 
## , , chains = chain:3
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.027851e+01 3.935162e+00 -1.815793e+01 -1.303319e+01 -1.018853e+01
##   beta[1]  -2.687844e-01 3.846027e-02 -3.467569e-01 -2.932410e-01 -2.676421e-01
##   beta[2]   2.068854e-01 4.596277e-02  1.120413e-01  1.758963e-01  2.080868e-01
##   beta[3]  -1.099855e-01 1.175710e-02 -1.325836e-01 -1.178441e-01 -1.101094e-01
##   beta[4]  -1.535383e-01 1.261821e-02 -1.776779e-01 -1.619497e-01 -1.541782e-01
##   beta[5]   2.563518e-01 3.546091e-02  1.894425e-01  2.316116e-01  2.567655e-01
##   beta[6]  -7.163233e-01 2.888384e-02 -7.746099e-01 -7.361904e-01 -7.148978e-01
##   beta[7]  -2.296453e-03 2.478498e-02 -5.316283e-02 -1.806397e-02 -2.194718e-03
##   beta[8]  -1.763236e-05 3.036748e-06 -2.369265e-05 -1.960225e-05 -1.756260e-05
##   beta[9]   1.536136e-04 3.042189e-05  9.456167e-05  1.340505e-04  1.532408e-04
##   beta[10]  2.249860e-01 4.153726e-02  1.421934e-01  1.973850e-01  2.265887e-01
##   sigma     7.482734e+00 9.666691e-02  7.297999e+00  7.413139e+00  7.482942e+00
##   lp__     -7.814843e+03 2.476517e+00 -7.820385e+03 -7.816417e+03 -7.814500e+03
##           stats
## parameter            75%         97.5%
##   alpha    -7.454780e+00 -2.630131e+00
##   beta[1]  -2.436641e-01 -1.959149e-01
##   beta[2]   2.380468e-01  2.960170e-01
##   beta[3]  -1.019364e-01 -8.650623e-02
##   beta[4]  -1.449055e-01 -1.294188e-01
##   beta[5]   2.798369e-01  3.232117e-01
##   beta[6]  -6.966730e-01 -6.607049e-01
##   beta[7]   1.380535e-02  4.562846e-02
##   beta[8]  -1.559695e-05 -1.179859e-05
##   beta[9]   1.732190e-04  2.132616e-04
##   beta[10]  2.518159e-01  3.079415e-01
##   sigma     7.548322e+00  7.671876e+00
##   lp__     -7.812937e+03 -7.811112e+03
## 
## , , chains = chain:4
## 
##           stats
## parameter           mean           sd          2.5%           25%           50%
##   alpha    -1.108834e+01 4.458940e+00 -1.988968e+01 -1.405500e+01 -1.090460e+01
##   beta[1]  -2.685623e-01 3.850300e-02 -3.453408e-01 -2.938178e-01 -2.682774e-01
##   beta[2]   2.052920e-01 4.666143e-02  1.151465e-01  1.731513e-01  2.060298e-01
##   beta[3]  -1.092664e-01 1.245699e-02 -1.324887e-01 -1.174216e-01 -1.095481e-01
##   beta[4]  -1.517067e-01 1.352355e-02 -1.775057e-01 -1.609095e-01 -1.520997e-01
##   beta[5]   2.623237e-01 3.972196e-02  1.883534e-01  2.368804e-01  2.622825e-01
##   beta[6]  -7.172027e-01 2.969132e-02 -7.761633e-01 -7.363015e-01 -7.172031e-01
##   beta[7]   1.700741e-03 2.640753e-02 -4.899237e-02 -1.640645e-02  1.982792e-03
##   beta[8]  -1.729672e-05 2.936367e-06 -2.299477e-05 -1.925433e-05 -1.729124e-05
##   beta[9]   1.523316e-04 3.163501e-05  9.020852e-05  1.311849e-04  1.533244e-04
##   beta[10]  2.280334e-01 4.471717e-02  1.340770e-01  1.991702e-01  2.284190e-01
##   sigma     7.485773e+00 9.365341e-02  7.309991e+00  7.423684e+00  7.484107e+00
##   lp__     -7.814997e+03 2.547451e+00 -7.820770e+03 -7.816592e+03 -7.814730e+03
##           stats
## parameter            75%         97.5%
##   alpha    -7.990486e+00 -2.892546e+00
##   beta[1]  -2.433668e-01 -1.966578e-01
##   beta[2]   2.364995e-01  2.934979e-01
##   beta[3]  -1.011386e-01 -8.487518e-02
##   beta[4]  -1.427053e-01 -1.259372e-01
##   beta[5]   2.866227e-01  3.477804e-01
##   beta[6]  -6.971213e-01 -6.603525e-01
##   beta[7]   1.921063e-02  5.307765e-02
##   beta[8]  -1.525424e-05 -1.155501e-05
##   beta[9]   1.748310e-04  2.145334e-04
##   beta[10]  2.594984e-01  3.155510e-01
##   sigma     7.550317e+00  7.679438e+00
##   lp__     -7.813050e+03 -7.811035e+03

The warnings tell us we'd need to increase max_treedepth, and the R-hat values (around 2 and above) are far too large: the chains haven't mixed, and chain 2 in particular is stuck far from the others, as its lp__ shows. Fixing that would require more debugging and time to evaluate than we'll spend here.
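To see why R-hat values like those above are alarming, here is a toy, self-contained sketch of the diagnostic (a simplification: Stan's actual R-hat additionally uses split chains and rank normalization, and the function and data names below are my own). It compares between-chain to within-chain variance, so a chain stuck in the wrong place, like chain 2 above, inflates it well past 1.

```r
# Toy potential-scale-reduction statistic (simplified version)
rhat <- function(chains) {              # chains: iterations x chains matrix
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
set.seed(1)
good <- cbind(rnorm(1000), rnorm(1000))            # both chains on target
bad  <- cbind(rnorm(1000), rnorm(1000, mean = 5))  # one chain stuck elsewhere
c(good = rhat(good), bad = rhat(bad))              # ~1 vs. far above 1
```

When all chains explore the same distribution, B is tiny and R-hat sits at ~1; the stuck chain drives it to several times that, which is exactly the pattern in the summaries above.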

m3 <- as.matrix(post3)
colnames(m3)[1:11] <- c('(Intercept)', colnames(d)[2:11])

# Posterior quantiles for every coefficient, one column per predictor
# (poverty_rate first, matching the table below)
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
vars <- rev(colnames(d)[2:11])
df3 <- as.data.frame(sapply(vars, function(v)
  round(quantile(m3[, v], probs = probs), digits = 2)))
colnames(df3) <- vars
## # A tibble: 10 × 6
##    rowname            `2.5%` `25%` `50%` `75%` `97.5%`
##    <fct>               <dbl> <dbl> <dbl> <dbl>   <dbl>
##  1 poverty_rate         0.14  0.2   0.24  0.29    0.5 
##  2 median_income       -0.01  0     0     0       0   
##  3 median_home_value    0     0     0     0       0   
##  4 homeownership_rate  -0.05 -0.02  0.01  0.04    0.92
##  5 undergrad           -0.77 -0.73 -0.71 -0.67   -0.06
##  6 hs_grad              0.18  0.24  0.27  0.31    1.57
##  7 hispanic            -0.18 -0.16 -0.15 -0.13    0.09
##  8 black               -0.14 -0.13 -0.11 -0.1    -0.09
##  9 age_65+              0.12  0.18  0.22  0.27    1.15
## 10 pop_change          -0.34 -0.29 -0.26 -0.21    0.55

Next up: modeling our priors and a prior predictive with JQPDs (Johnson Quantile-Parameterized Distributions) and/or the GLD (Generalized Lambda Distribution). These let us discard the usual predefined distribution families and instead specify prior beliefs about parameters directly in terms of quantiles, e.g. a median and an interquartile range. The benefit is that with a couple of functions you can simulate millions of differently shaped distributions. Let's look into that next.

JQPD & GLD

Johnson Quantile-Parameterized Distribution (JQPD) & Generalized Lambda Distribution (GLD).
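The common thread is that both families are defined directly by their quantile function Q(p), so their parameters are themselves quantiles, and random draws are just inverse-transform sampling: Q(runif(1)). A toy stand-in shows the mechanics; the monotone spline and the quantile values below are assumptions for illustration only, not the actual JQPD/GLD formulas that GLD_rng and JQPDS_rng use later.

```r
# A "distribution" pinned to chosen quantiles via a monotone interpolating
# spline (illustrative stand-in for a closed-form quantile function)
q_toy <- splinefun(x = c(0.01, 0.25, 0.50, 0.75, 0.99),
                   y = c(-2.0, -0.3, 0.0, 0.4, 2.5),  # assumed Q(p) values
                   method = 'hyman')                  # keeps Q monotone
set.seed(42)
draws <- q_toy(runif(1e5))          # inverse-transform sampling
round(quantile(draws, c(0.25, 0.5, 0.75)), 2)
# recovers roughly the specified quartiles: -0.3, 0.0, 0.4
```

This is why eliciting a median, IQR, asymmetry, and steepness is enough: those few numbers pin down the whole quantile function, and sampling from it is mechanical.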

Prior Distribution

We'll draw S times from the prior distribution of GOP support change over a sample of N data points, where each of the S simulated outcome vectors of size N has a calculated prior \(R_s^2=\frac{\frac{1}{N-1}\sum_{n=1}^{N}(\tilde{\mu}_n-\overline{\mu})^2}{\tilde{\sigma}^2+\frac{1}{N-1}\sum_{n=1}^{N}(\tilde{\mu}_n-\overline{\mu})^2}\)

with \(\overline{\mu}\) being the average of the conditional means over the N “observations” in the s-th simulation.

We get some warnings and messages here, and they need attention.

R_squared <- replicate(1000, {
  alpha_ <- rnorm(1, mean = m_alpha, sd = s_alpha) # one draw from the intercept's normal prior
  
  beta_PC_ <- GLD_rng(medians['PC'], IQR = uppers['PC'] - lowers['PC'],
                      asymmetry = a_s$PC[1], steepness = a_s$PC[2])
  beta_A_ <- GLD_rng(medians['A'], IQR = uppers['A'] - lowers['A'],
                     asymmetry = a_s$A[1], steepness = a_s$A[2])
  beta_B_ <- GLD_rng(medians['B'], IQR = uppers['B'] - lowers['B'],
                     asymmetry = a_s$B[1], steepness = a_s$B[2])
  beta_H_ <- GLD_rng(medians['H'], IQR = uppers['H'] - lowers['H'],
                     asymmetry = a_s$H[1], steepness = a_s$H[2])
  beta_HS_ <- GLD_rng(medians['HS'], IQR = uppers['HS'] - lowers['HS'],
                      asymmetry = a_s$HS[1], steepness = a_s$HS[2])
  beta_U_ <- GLD_rng(medians['U'], IQR = uppers['U'] - lowers['U'],
                     asymmetry = a_s$U[1], steepness = a_s$U[2])
  beta_HR_ <- GLD_rng(medians['HR'], IQR = uppers['HR'] - lowers['HR'],
                      asymmetry = a_s$HR[1], steepness = a_s$HR[2])
  beta_MHV_ <- GLD_rng(medians['MHV'], IQR = uppers['MHV'] - lowers['MHV'],
                      asymmetry = a_s$MHV[1], steepness = a_s$MHV[2])
  beta_MI_ <- GLD_rng(medians['MI'], IQR = uppers['MI'] - lowers['MI'],
                      asymmetry = a_s$MI[1], steepness = a_s$MI[2])
  beta_PR_ <- GLD_rng(medians['PR'], IQR = uppers['PR'] - lowers['PR'],
                      asymmetry = a_s$PR[1], steepness = a_s$PR[2])
  
  mu_ <- alpha_ + beta_PC_ * X_[, 'pop_change'] + beta_A_ * X_[, '`age_65+`'] +
         beta_B_ * X_[, 'black'] + beta_H_ * X_[, 'hispanic'] + beta_HS_ * X_[, 'hs_grad'] +
         beta_U_ * X_[, 'undergrad'] + beta_HR_ * X_[, 'homeownership_rate'] + beta_MHV_ * X_[, 'median_home_value'] +
         beta_MI_ * X_[, 'median_income'] + beta_PR_ * X_[, 'poverty_rate']
  sigma_ <- JQPDS_rng(lower_bound = 0, alpha = 0.25, quantiles = q_sigma)
  numer <- var(mu_)
  denom <- sigma_^2 + numer
  R_squared_ <- numer / denom
  R_squared_
})
summary(R_squared)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1082  0.9938  0.9989  0.9796  0.9998  1.0000
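Note where that mass sits: the median implied prior R² is ~0.999, i.e. before seeing any data these priors jointly claim the predictors explain nearly everything. The mechanics follow from R² = var(mu) / (sigma² + var(mu)): coefficient priors that are wide relative to the residual scale push R² toward 1. A self-contained toy makes the point (a one-predictor model; all prior scales below are assumptions for illustration):

```r
# Implied prior R^2 under wide vs. narrow slope priors
set.seed(4)
x <- rnorm(500)
prior_R2 <- function(b_sd, sigma) {
  replicate(2000, {
    mu <- rnorm(1, 0, 5) + rnorm(1, 0, b_sd) * x   # draw intercept & slope
    var(mu) / (sigma^2 + var(mu))                  # the prior R^2 formula
  })
}
round(c(wide   = median(prior_R2(b_sd = 10,  sigma = 1)),
        narrow = median(prior_R2(b_sd = 0.5, sigma = 1))), 2)
```

The wide-prior median lands near 1 and the narrow-prior one near 0.1, so a prior predictive R² piled up near 1, as above, is a signal to tighten the coefficient quantiles or widen the prior on sigma.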

Drawing from the Posterior

Warning in qbeta(0.5, half_K, qexp(eta)): qbeta(a, *) =: x0 with |pbeta(x0,*) -
alpha| = 0.46348 is not accurate
## stan_lm
##  family:       gaussian [identity]
##  formula:      gop_support_change ~ .
##  observations: 3111
##  predictors:   11
## ------
##                    Median MAD_SD
## (Intercept)        -10.77   2.84
## pop_change          -0.27   0.04
## `age_65+`            0.21   0.05
## black               -0.11   0.01
## hispanic            -0.15   0.01
## hs_grad              0.26   0.03
## undergrad           -0.72   0.03
## homeownership_rate   0.00   0.02
## median_home_value    0.00   0.00
## median_income        0.00   0.00
## poverty_rate         0.23   0.04
## 
## Auxiliary parameter(s):
##               Median MAD_SD
## R2            0.49   0.01  
## log-fit_ratio 0.00   0.01  
## sigma         7.48   0.09  
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg

Check

m4 <- as.matrix(post4)
colnames(m4)[1:11] <- c('(Intercept)', colnames(d)[2:11])

# Same per-coefficient posterior quantile summary as for the rstan fit
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
vars <- rev(colnames(d)[2:11])
df4 <- as.data.frame(sapply(vars, function(v)
  round(quantile(m4[, v], probs = probs), digits = 2)))
colnames(df4) <- vars
## # A tibble: 10 × 6
##    rowname            `2.5%` `25%` `50%` `75%` `97.5%`
##    <fct>               <dbl> <dbl> <dbl> <dbl>   <dbl>
##  1 poverty_rate         0.15  0.2   0.23  0.25    0.3 
##  2 median_income        0     0     0     0       0   
##  3 median_home_value    0     0     0     0       0   
##  4 homeownership_rate  -0.05 -0.02  0     0.02    0.05
##  5 undergrad           -0.77 -0.74 -0.72 -0.7    -0.66
##  6 hs_grad              0.2   0.24  0.26  0.28    0.32
##  7 hispanic            -0.18 -0.16 -0.15 -0.14   -0.13
##  8 black               -0.13 -0.12 -0.11 -0.1    -0.09
##  9 age_65+              0.12  0.18  0.21  0.24    0.3 
## 10 pop_change          -0.34 -0.3  -0.27 -0.24   -0.19

So, what shall we say as a conclusion? Eliciting priors through quantile functions is not trivial: you still have to pin down medians, interquartile ranges, asymmetry, and steepness. The tradeoff is that you can dispense with pre-defined distribution families altogether, which is arguably the more flexible approach, just harder to implement.