Abstract
This is a lab script for POST 8000, a graduate-level quantitative methods for public policy class that I teach at Clemson University. It will not be the most sophisticated R-related write-up of mine—check my blog for those—but it should be useful for discussion around the associated R script for the week’s ‘lab’ session. Some of what I write here is taken from my website, so please read the related posts there.
Here are the R packages we’ll be using today. {tidyverse} is for all things workflow and {stevedata} has toy data we’ll use. {brms} is the only new addition to this list. It’s my favorite wrapper for estimating Bayesian models. It’s a wrapper for the Stan programming language, which it should install for you.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ dplyr 1.0.3
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(brms)
## Loading required package: Rcpp
## This version of bslib is designed to work with shiny version 1.6.0 or higher.
## Loading 'brms' package (version 2.14.4). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
##
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
##
## ar
library(stevedata)
We’ll start with a basic illustration of Bayesian inference as it’s implemented computationally and join it with a pitch of why you should think about this approach, no matter its limitations (i.e. computational run-time). The biggest benefit is that Bayesian inference is actually providing an answer to the question you’re asking. In a lot of applications, you want to know the probability of some hypothesis, given the data you obtained and analyzed. The alternative/standard approach—the so-called “frequentist” approach—is giving you the probability of the data, given some null hypothesis. Thus, the frequentist answer to the question you’re asking is basically indirect. The answer you get from the frequentist approach amounts to “here are some values I can’t rule out.” The Bayesian answer is a bit more direct: the probability of theta, given the data.
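If the distinction seems abstract, a toy calculation can make it concrete. Here is a minimal sketch, with entirely made-up numbers, of why the probability of A given B is not the probability of B given A.

# Made-up numbers for illustration only: suppose 1% of runners are elite,
# 90% of elite runners break a five-minute mile, and 2% of everyone else does.
p_elite <- .01
p_sub5_given_elite <- .90
p_sub5_given_rest <- .02
# Total probability of observing a sub-five-minute mile
p_sub5 <- p_sub5_given_elite*p_elite + p_sub5_given_rest*(1 - p_elite)
# Bayes' rule: P(elite | sub-5) = P(sub-5 | elite)*P(elite)/P(sub-5)
p_sub5_given_elite*p_elite/p_sub5
## [1] 0.3125

The probability of a sub-five-minute mile given an elite runner is .90; the probability of an elite runner given a sub-five-minute mile is about .31. Conditioning matters.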
However, this approach requires the introduction of prior beliefs that effectively weight the likelihood function of the data to produce the posterior estimates that you want. These priors amount to reasonable expectations of the data, which you can approach any number of ways. Here’s one way of thinking about it, with respect to a simple example that I’ve seen around the block a few times and that incidentally involves a university that has a special place in my heart. Suspend disbelief about adequately sourcing the information presented here and take it as a matter of fact: the introduction of physical training during World War II apparently made for quite the fit male student body at the University of Illinois at the time. A simple random sample, purportedly of 100 (male) students, resulted in a mean mile run time of 7.11 minutes with a standard deviation of .74 minutes. That would be an amazing level of fitness, I’d think.
Let’s mimic that here, standardizing those times to seconds, and recreating a random sample of 100 observations that would roughly correspond to that.
mean_time <- 7.11*60
sd_time <- .74*60

set.seed(8675309)
illinois_times <- tibble(y = rnorm(100, mean_time, sd_time))

mean_time
## [1] 426.6
sd_time
## [1] 44.4
mean(illinois_times$y)
## [1] 428.9222
sd(illinois_times$y)
## [1] 41.25398
Good enough.
Here’s what we want to know, though. We observe these mile run times where the mean is 428.92 seconds and the standard deviation is 41.25 seconds. What should we think is the mean mile time (and error around the mean) given the data we observed?
Here, we reiterate a few things at stake: for one, the probability of A given B is not equal to the probability of B given A. Frequentist approaches aren’t going to give you a direct answer to this question. Further, we have 100 observations from which to infer. In the broad scheme of things, that’s not a whole lot. This means if we have strong beliefs about what mile times should be, that should exert some influence on how we interpret the data we observe.
We could approach this a lot of different ways, but let’s do two simple approaches to thinking about prior expectations. First, though, let’s use the get_prior() function in {brms} to remind us what parameters are at stake here.
mile_form <- bf(y ~ 1)
get_prior(mile_form, data = illinois_times)
##                     prior     class coef group resp dpar nlpar bound  source
##  student_t(3, 429, 42.9) Intercept                                   default
##    student_t(3, 0, 42.9)     sigma                                   default
In this simple model, we’re calculating just two things: an intercept that will amount to the mean and a residual standard deviation that will amount to the standard deviation around that mean. This approach will also show you what the defaults are. My blog post on how to think about priors for a Bayesian analysis will explain where {brms} is getting these priors. If you were curious, this is where {brms} would be starting for these two parameters.
set.seed(8675309)
tibble(int_prior = rstudent_t(10000, 3, 429, 42.9),
       sig_prior = rstudent_t(10000, 3, 0, 42.9)) %>%
  gather(var, value) %>%
  ggplot(., aes(value)) +
  facet_wrap(~var, scales = "free") +
  geom_density()
When curious about priors, embrace simulating values from them to see what they look like. Here, the intercept prior seems okay, even as its left tail clusters around a five-minute mile (and I think we can probably rule that out as a plausible value since that would be an elite time). The sigma amounts to a “half-t” because standard deviations cannot be negative. These aren’t awful priors, but I think we can do a tiny bit better while encouraging you to think about your data a little bit more.
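If you want to put a number on that left-tail concern, one hypothetical check is to simulate draws from the default intercept prior and ask what share falls at or below a five-minute mile (300 seconds).

set.seed(8675309)
# Share of default intercept prior draws at or below a five-minute mile
mean(rstudent_t(10000, 3, 429, 42.9) <= 300)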
This seems daunting, but it’s a clear “use your head” issue and, thing is, you don’t even have to use all of your head! Just use a little of it. Here are two examples to get you started. For one, here’s a news source that suggests the average mile time for young men in and around that age bracket is “about 9 to 10 minutes.” Let’s take the upper bound of that as the mean and use the minute window in the “about 9 to 10 minutes” to suggest a standard deviation of about a minute. We’ll make these skinny priors too. In other words, we’re reasonably sure about these as prior estimates for the mean mile time and the standard deviation of mile times.
<- c(set_prior("normal(600, 10)", class="Intercept"),
ten_minute_prior set_prior("normal(60, 10)", class = "sigma"))
Here’s another approach that requires only a little of your head. We know that a four-minute mile is nearly impossible to crack. People have broken it, but that is an elite time. So, we don’t think the mean could possibly be less than four minutes. We also know the average person walks at a pace of around 3 to 4 miles per hour. That would mean a person could complete a mile just by walking in around 15 minutes. The mean could not possibly be less than 4 minutes and, in all likelihood, it could not be more than 15. Suppose, though, that any time between those extremes is equally likely. If we standardize that to seconds, we have a minimum of 240 and a maximum of 900. In other words, the mean here could follow a uniform distribution between both extremes. We’ll set a similar one for the residual standard deviation too. Let’s say: 0 to 60. I want to note that thinking of an upper bound for the residual standard deviation is really dumb and {brms}/Stan is going to bark at you for this. It’ll still run, though. It’ll just tell you that it thinks it’s stupid to impose an upper bound on a standard deviation like this.
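Just to make the unit conversion behind those bounds explicit:

# Elite floor: a four-minute mile, in seconds
4*60
## [1] 240
# Walking ceiling: a mile at 4 miles per hour takes 15 minutes
(60/4)*60
## [1] 900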
unif_form <- bf(y ~ 0 + Intercept) # see what I did there
get_prior(unif_form, data = illinois_times)
##                   prior class      coef group resp dpar nlpar bound       source
##                  (flat)     b                                            default
##                  (flat)     b Intercept                             (vectorized)
##   student_t(3, 0, 42.9) sigma                                           default
<- c(prior("uniform(240, 900)", lb=240, ub=900, class="b"),
uniform_prior prior("uniform(0, 60)", class="sigma"))
I’ll only add the caveat here that Bayesian modelers are generally loath to encourage people, especially beginners, to hide behind the uniform distribution. It truly is the laziest distribution. I’m sure that’s why {brms} is going to make it just a tiny bit more difficult to do this.
No matter, let’s see what our priors mean for the inferences we want to make. Let’s start with the default priors given by {brms}. I want to note, in this application, I’m going to care mostly about the intercept (i.e. the mean mile time).
M1 <- brm(mile_form,
          family = gaussian(), # the normal distribution
          data = illinois_times,
          seed = 8675309, # reproducible seed
          sample_prior = TRUE, # give us samples from the prior distribution too
          refresh = 0 # by default, Stan spams the console with sampling updates. Disable that.
          )
## Compiling Stan program...
## Start sampling
## Running MCMC with 4 chains, at most 32 in parallel...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.4 seconds.
Now, let’s use some {brms} functions to extract the output of interest to us. posterior_samples() returns the posterior estimates (b_Intercept and sigma) along with simulations from the prior distribution as well (i.e. prior_Intercept and prior_sigma). Some “tidy”-friendly verbs will make the following plot.
posterior_samples(M1) %>% as_tibble() %>%
  select(b_Intercept, prior_Intercept) %>%
  gather(var, val) %>%
  bind_rows(., tibble(var = "actual", val = illinois_times$y)) %>%
  ggplot(., aes(val, fill = var)) + geom_density(alpha = 0.8) +
  theme_bw() + theme(legend.position = "bottom")
I want you to notice what happened. The {brms} default priors lean on the available data. As a result, the posterior estimate of the mean basically converges on the actual mean. The {brms} default approach, in the absence of further guidance, is to devise priors that exert minimal weight on the posterior estimates.
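One hypothetical way to eyeball that convergence is to put the posterior summary of the intercept next to the sample mean (fixef() is the {brms} function for summarizing population-level effects).

# Posterior summary of b_Intercept vs. the sample mean
fixef(M1)
mean(illinois_times$y)
## [1] 428.9222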
Let’s see what a bit more informative prior looks like. Here, we suspect the mean is about 10 minutes, give or take 10 seconds. We’re comparably sure about the residual standard deviation as well, suggesting a mean of 60 seconds with a standard deviation of 10 seconds.
M2 <- brm(mile_form,
          family = gaussian(),
          data = illinois_times,
          seed = 8675309,
          sample_prior = TRUE,
          refresh = 0,
          prior = ten_minute_prior # specify our priors
          )
## Compiling Stan program...
## Start sampling
## Running MCMC with 4 chains, at most 32 in parallel...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.3 seconds.
posterior_samples(M2) %>% as_tibble() %>%
  select(b_Intercept, prior_Intercept) %>%
  gather(var, val) %>%
  bind_rows(., tibble(var = "actual", val = illinois_times$y)) %>%
  ggplot(., aes(val, fill = var)) + geom_density(alpha = 0.8) +
  theme_bw() + theme(legend.position = "bottom")
Here, observe that the prior had a fairly strong effect on the posterior estimates. We were so sure the mean was 10 minutes. The data we observed, which had a mean of just over 7 minutes, led us to update our assessment of what the mean could be. However, given the somewhat paltry nature of the data (100 observations) and the strength of the prior, we update our belief to a mean of about 7.99 minutes.
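You can roughly reproduce that compromise by hand. Here is a back-of-the-envelope, known-sigma sketch of normal-normal precision weighting, assuming sigma sits near its prior mean of 60 seconds (an approximation; the model actually estimates sigma too).

# Precision-weighted compromise between prior and data (approximation)
prior_mean <- 600  # our ten-minute prior, in seconds
prior_sd <- 10
sigma <- 60        # assume sigma near its prior mean
n <- 100
ybar <- mean(illinois_times$y)
w_prior <- 1/prior_sd^2
w_data <- n/sigma^2
((w_prior*prior_mean + w_data*ybar)/(w_prior + w_data))/60 # in minutes: ~ 7.9

The skinnier the prior (smaller prior_sd) or the smaller the sample, the more the posterior mean leans toward the prior.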
Now, the stupid uniform distribution is going to produce a takeaway similar to the default {brms} approach. Because the uniform distribution is flat with no natural peak, the posterior estimates are going to converge on the observed data.
M3 <- brm(unif_form,
          family = gaussian(),
          data = illinois_times,
          seed = 8675309,
          sample_prior = TRUE,
          refresh = 0,
          prior = uniform_prior)
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## sigma ~ uniform(0, 60)
## Compiling Stan program...
## Start sampling
## Running MCMC with 4 chains, at most 32 in parallel...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.0 seconds.
## Chain 4 finished in 0.0 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.3 seconds.
posterior_samples(M3) %>% as_tibble() %>%
  select(b_Intercept, prior_b) %>%
  gather(var, val) %>%
  bind_rows(., tibble(var = "actual", val = illinois_times$y)) %>%
  ggplot(., aes(val, fill = var)) + geom_density(alpha = 0.8) +
  theme_bw() + theme(legend.position = "bottom")
If you have weak data and some prior expectations about it, you should use a Bayesian approach. If you want a bit more direct answer to your question about the state of the world, given some observed data, you should also use a Bayesian approach.
The “why” for Bayes has multiple answers, and the rationale above reduces to 1) it’s actually answering the question you’re asking and 2) it allows prior beliefs about the data to inform our probabilistic assessments about the nature of the world, given the question you’re asking. If you don’t have a lot of data to bring to bear, prior expectations can exert a fair bit of influence on the posterior estimates. The “how” may not be 100% obvious, though. So, here are some tips for how to use Bayesian approaches to statistical inference for practical purposes.
My first introduction to Bayesian inference was in the form of a competitive hypothesis test pitting two arguments about union density. This will clearly be aped from my blog post on this from 2019. Here’s the tl;dr, though. The outcome of interest is union density for 20 rich countries around the end of the 1970s. One side says the primary driver of union density is logged labor force size (intuition: higher labor force -> greater difficulty in achieving collective action). Another side says the primary driver of union density is industrialization (intuition: industrialization played an outsized role in the labor movement and addresses the collective action problem through concentration of the population into urban centers). The problem, however, is that both industrialization and logged labor force size are almost perfectly correlated.
The data are available in {stevedata} as uniondensity.
uniondensity
## # A tibble: 20 x 5
## country union left size concen
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Sweden 82.4 112. 8.28 1.55
## 2 Israel 80 73.2 6.90 1.71
## 3 Iceland 74.3 17.2 4.39 2.06
## 4 Finland 73.3 59.3 7.62 1.56
## 5 Belgium 71.9 43.2 8.12 1.52
## 6 Denmark 69.8 90.2 7.71 1.52
## 7 Ireland 68.1 0 6.79 1.75
## 8 Austria 65.6 48.7 7.81 1.53
## 9 NZ 59.4 60 6.96 1.64
## 10 Norway 58.9 83.1 7.41 1.58
## 11 Australia 51.4 33.7 8.60 1.37
## 12 Italy 50.6 0 9.67 0.86
## 13 UK 48 43.7 10.2 1.13
## 14 Germany 39.6 35.3 10.0 0.92
## 15 Netherlands 37.7 31.5 8.41 1.25
## 16 Switzerland 35.4 11.9 7.81 1.68
## 17 Canada 31.2 0 9.26 1.35
## 18 Japan 31 1.92 10.6 1.11
## 19 France 28.2 8.67 9.84 0.95
## 20 USA 24.5 0 11.4 1
?uniondensity
Here’s the correlation of the two variables.
uniondensity %>%
  summarize(cor = cor(concen, size))
## # A tibble: 1 x 1
## cor
## <dbl>
## 1 -0.922
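If you want to see that struggle for yourself before going Bayesian, a quick hypothetical check is a plain OLS fit; the near-collinearity should show up as inflated standard errors on size and concen.

# OLS version of the same model; expect large standard errors on size/concen
summary(lm(union ~ left + size + concen, data = uniondensity))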
This means any regression model is going to struggle to find the partial effect of either one for evaluating this hypothesis. No matter, Bayes is going to provide a path forward through the use of prior beliefs about the very paltry nature of the data. First, let’s specify the regression model we’ll be using. Note only that there is a control variable for an index of left-wing governments over time in these 20 countries.
union_form <- bf(union ~ left + size + concen)
Let’s see what parameters in the model would need prior beliefs slapped on them.
get_prior(union_form, data = uniondensity)
##                      prior     class   coef group resp dpar nlpar bound       source
##                     (flat)         b                                          default
##                     (flat)         b concen                             (vectorized)
##                     (flat)         b   left                             (vectorized)
##                     (flat)         b   size                             (vectorized)
##   student_t(3, 55.1, 25.4) Intercept                                        default
##      student_t(3, 0, 25.4)     sigma                                        default
Now, let’s create two sets of priors. One is “Team Logged Labor Force Size” and the other is “Team Industrial Concentration.” Please read the underlying article/texts for the justification for these particular values.
<- c(set_prior("normal(3,1.5)", class = "b", coef= "left"),
labor_priors # team labor force size
set_prior("normal(-5,2.5)", class = "b", coef="size"),
# doesn't think industrial concentration has anything to do with it
set_prior("normal(0,10^6)", class="b", coef="concen"),
set_prior("normal(0,10^6)", class="Intercept"))
<- c(set_prior("normal(3,1.5)", class = "b", coef= "left"),
industry_priors # doesn't think labor force size has anything to do with it
set_prior("normal(0,10^6)", class = "b", coef="size"),
# team industrial concentration
set_prior("normal(10,5)", class="b", coef="concen"),
set_prior("normal(0,10^6)", class="Intercept"))
Now, let’s run two models comparing what these prior beliefs do to the nature of our data, which have so few observations to bear on this debate.
union_mods <- list()

union_mods$"labor" <- brm(union_form,
                          data = uniondensity,
                          seed = 8675309,
                          sample_prior = TRUE,
                          refresh = 0,
                          prior = labor_priors)
## Compiling Stan program...
## Start sampling
## Running MCMC with 4 chains, at most 32 in parallel...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.3 seconds.
$"industry" <- brm(union_form,
union_modsdata = uniondensity,
seed = 8675309,
sample_prior = TRUE,
refresh = 0,
prior = industry_priors)
## Compiling Stan program...
## Start sampling
## Running MCMC with 4 chains, at most 32 in parallel...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.3 seconds.
Let’s compare the results.
summary(union_mods[[1]])
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: union ~ left + size + concen
## Data: uniondensity (Number of observations: 20)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 82.70 33.42 16.79 148.54 1.00 2140 2618
## left 0.28 0.08 0.12 0.43 1.00 2760 2610
## size -5.45 2.10 -9.49 -1.28 1.00 2259 2675
## concen 4.76 12.91 -20.65 30.70 1.00 1968 2584
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 10.84 1.97 7.77 15.44 1.00 2874 2814
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
summary(union_mods[[2]])
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: union ~ left + size + concen
## Data: uniondensity (Number of observations: 20)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 70.76 20.19 31.73 111.14 1.00 3430 3051
## left 0.27 0.08 0.12 0.43 1.00 3573 2814
## size -4.79 1.83 -8.45 -1.27 1.00 3655 3155
## concen 9.43 4.74 0.27 18.75 1.00 4006 3584
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 10.86 2.00 7.77 15.41 1.00 3557 2589
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The takeaway, as it was in Western and Jackman (1994) and in the replication on my blog: if you were on “Team Logged Labor Force Size”, the data observed suggest the effect is there whether you built in strong prior beliefs that it should be there OR you built in ignorance priors and were agnostic to the effect of that variable. If you were on “Team Industrial Concentration”, the effect is there only if you built in strong prior beliefs that it should be there. Thus, given the nature of these analyses, you’re probably going to give the day to “Team Logged Labor Force Size” and tell “Team Industrial Concentration” that it takes strong prior beliefs about the effect to produce a posterior distribution consistent with those prior beliefs.
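If you want that comparison in one place, a quick hypothetical way to line up the concen coefficient across the two models is to loop fixef() over the list.

# Compare the posterior summary of 'concen' under each set of priors
lapply(union_mods, function(m) fixef(m)["concen", ])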
Dedicated Bayesian researchers will say that this is just “a” thing you can do with Bayes that you really can’t do with standard models. They’ll probably also discourage you from thinking too much about it, or from thinking it’s all you can do with Bayes, but I think of it as a nice place to start thinking about Bayes.
I think I lack the space and time to really say how else you can use Bayesian analysis. Other things you can do with Bayes include multinomial models and hurdle models, which {brms}/{stan} do better than any frequentist package I’ve seen. One thing I like about Bayesian models, the more I use them, is how verbose they are about potential problems. Standard modeling approaches will either leave you to figure out a potential problem on your own (e.g. weak data in the above case of unemployment and immigration sentiment), or fail completely (e.g. separation problems in a logistic regression). Bayesian models in the Stan programming language are, by contrast, much more verbose about potential issues and things for you to explore. Here are the things you should do to assess your model output.
First, plot the chains/samplers. If everything has gone well, you should see something like this. Informally—super informally—you want to see “fuzzy caterpillars.” If you see “fuzzy caterpillars” with no weird spikes, the sampler had no issues.
plot(immig_mods$"uempl_prior")
Next, look at the summary of the model.
summary(immig_mods$uempl_prior)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: immigsent ~ agea + female + eduyrs + uempla + hinctnta + lrscale
## Data: ESS9GB (Number of observations: 1454)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 11.73 1.06 9.64 13.75 1.00 6897 3188
## agea -0.00 0.01 -0.02 0.02 1.00 6332 3559
## female -0.25 0.34 -0.92 0.43 1.00 7188 3004
## eduyrs 0.49 0.05 0.39 0.58 1.00 6779 3071
## uempla -1.60 0.86 -3.25 0.11 1.00 7375 2926
## hinctnta 0.33 0.06 0.22 0.45 1.00 7123 3593
## lrscale -0.58 0.09 -0.75 -0.41 1.00 6887 3046
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 6.39 0.12 6.15 6.63 1.00 6337 3069
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Take inventory of the following things: Rhat, Bulk_ESS, and Tail_ESS. Rhat is the R-hat convergence diagnostic that compares between- and within-chain estimates for the parameters of interest. If things have not mixed well, Rhat will be larger than 1. If there is a problem, believe me, {stan} will tell you. Bulk_ESS is the estimated bulk effective sample size using rank-normalized draws. It’s useful for measuring sampling efficiency. Tail_ESS is the estimated tail effective sample size. It’s useful for measuring the sampling efficiency of the tails of the distribution, which you want to sample well precisely because they are the tails. You want both to be at least 100 per chain you estimate. If they are, it suggests the posterior estimates are reliable. If you fail to get this from your model, trust me, Stan will let you know.
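If you’d rather check these programmatically than read them off the summary, one hypothetical sketch uses the rhat() and neff_ratio() generics that {brms} supports for its fitted models.

# Largest R-hat across parameters; want this very close to 1
max(rhat(immig_mods$uempl_prior))
# Smallest ratio of effective sample size to total draws; higher is better
min(neff_ratio(immig_mods$uempl_prior))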
Stan is really good at pointing at problems in your data and model—and, from my experience, frustratingly good. As it points to problems, it also points to potential solutions. I’ll only reiterate a few things you may encounter, which are mostly going to be applicable to more complicated models than your standard linear models (and what have you).
If your chains have not mixed well or your effective sample size is really low, Stan thinks you should use stronger priors on your data. You will typically encounter this in cases where you’re electing not to think too much about your data (speaking from experience) and you’re hoping the programming language does that for you. For simple models, that works well enough. For more complicated models that have things like random intercepts, slopes, and built-in covariance structures, you should think a little bit more about your data.
You may get a warning about “divergent transitions.” From experience, this means you’re probably using a very complicated model and the data for it have some features that are hard to explore. You have one of two options here. You can explore model reparameterization, which I almost never do, or you can increase adapt_delta as a control function. In {brms}, something like this should work in the brm() wrapper: control = list(adapt_delta = .9). Of note: I believe the default is .8; I kick it up to .9 if I have a problem and then keep increasing it if I still have the problem. Should you fix this problem, you’ll address the concern of bias in the sampler. However, sampling time does go up.
You may also get a warning that says something to the effect of “maximum treedepth exceeded.” This is more an efficiency concern. I think of this as an easy problem to fix. Add control = list(max_treedepth = 15) to your brm() wrapper. The default maximum treedepth is 10, btw. A minimal sketch of where both control options would sit follows.
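This sketch reuses the mile-time model from earlier; combining both control options in one call is my illustration (and M2_robust a hypothetical name), not something those models actually needed.

M2_robust <- brm(mile_form,
                 family = gaussian(),
                 data = illinois_times,
                 seed = 8675309,
                 refresh = 0,
                 prior = ten_minute_prior,
                 # nudge the sampler: raise adapt_delta for divergent
                 # transitions, max_treedepth for treedepth warnings
                 control = list(adapt_delta = .9, max_treedepth = 15))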