
**Affiliations:** 1) Max Planck Institute for
Evolutionary Anthropology, Leipzig, Germany, 2) University of California
Los Angeles, USA, 3) University of California Santa Barbara, USA, 4)
Arizona State University, Tempe, AZ USA. *Corresponding author: dieter_lukas@eva.mpg.de

**This is one of three post-study manuscripts of the
preregistration that was pre-study peer reviewed and received an In
Principle Recommendation on 26 Mar 2019 by:**

Aurélie Coulon (2019) Can context changes improve behavioral
flexibility? Towards a better understanding of species adaptability to
environmental changes. *Peer Community in Ecology*, 100019. 10.24072/pci.ecology.100019.
Reviewers: Maxime Dahirel and Andrea Griffin

**Preregistration:** html,
pdf,
rmd

**Post-study manuscript** (submitted to PCI Ecology for
post-study peer review on 3 Jan 2022): html,
preprint here

Behavioral flexibility, adapting behavior to changing situations, is hypothesized to be related to adapting to new environments and geographic range expansions. However, flexibility is rarely directly tested in a way that allows insight into how flexibility works. Research on great-tailed grackles, a bird species that has rapidly expanded its range into North America over the past 140 years, shows that grackle flexibility is manipulable using colored tube reversal learning and that flexibility is generalizable across contexts (multi-access box). Here, we use these grackle results to conduct a set of post hoc analyses using a model that breaks down performance on the reversal learning task into different components. We show that the rate of learning to be attracted to an option (phi) is a stronger predictor of reversal performance than the rate of deviating from learned attractions that were rewarded (lambda). This result was supported in simulations and in the data from the grackles: learning rates in the manipulated grackles doubled by the end of the manipulation compared to control grackles, while the rate of deviation slightly decreased. Grackles with intermediate rates of deviation in their last reversal, independently of whether they had gone through the serial reversal manipulation, solved fewer loci on the plastic and wooden multi-access boxes, and those with intermediate learning rates in their last reversal were faster to attempt a new locus on both multi-access boxes. These findings provide additional insights into how grackles changed their behavior when conditions changed. Their ability to rapidly change their learned associations validates that the manipulation had an effect on the cognitive ability we think of as flexibility.

The field of comparative cognition is strongly suspected to be in a
replicability crisis, which calls into question the validity of the
conclusions produced by this research (Brecht et
al., 2021; Farrar, Boeckle, et al., 2020; Farrar, Altschul, et al.,
2020; Farrar et al., 2021; Lambert et al., 2022; Tecwyn, 2021).
The lack of replicability in experimental design, analyses, and results
stems, in part, from the lack of clear theoretical frameworks (Frankenhuis et al., 2022), the resulting heavy
reliance on measuring operationalized variables that are assumed to
represent broad concepts, and from small sample sizes (Farrar, Boeckle, et al., 2020). One solution is
to start from *mechanistic models* informed by a theoretical
framework that can represent and make predictions about how individuals
behave in a given task, rather than just relying on *statistical
models* that simply describe the observed data (McElreath, 2020). Statistical models cannot
infer what leads to the differences in behavior, whereas mechanistic
models offer the opportunity to infer the underlying processes (McElreath, 2020).

Here, we apply a mechanistic model to a commonly studied trait in animal cognition: behavioral flexibility. Recent work provides clearer conceptualizations of behavioral flexibility that allow us to apply such a mechanistic model. The theoretical framework argues that the critical element of behavioral flexibility is that individuals change their behavior when circumstances change (Mikhalevich et al., 2017), with freedom from instinctual constraints (Lea et al., 2020). These theoretical models point out that behavioral flexibility appears to contain two internal learning processes: the suppression of a previous behavioral choice and the simultaneous adoption of a new behavioral choice. Based on this framework, Blaisdell et al. (2021) showed how reversal learning experiments, where individuals have to choose between two options until they learn to prefer the rewarded option and then the reward is moved to the other option and they reverse their preference, reflect these learning processes. Blaisdell et al. (2021) built a mechanistic model by adapting Bayesian reinforcement learning models to infer the potential cognitive processes underlying behavioral flexibility.

As their name implies, Bayesian reinforcement learning models (Doya, 2007) assume that individuals will gain from learning which of the options leads to the reward. This learning is assumed to occur through reinforcement because individuals repeatedly experience that an option is either rewarded or not. The approach is represented as Bayesian because individuals continuously update their knowledge about the reward with each choice (Deffner et al., 2020). At their core, these models contain two individual-specific parameters that we aim to estimate from reversal performance: how quickly individuals update their attraction to an option based on the reward they received during their most recent choice relative to the rewards they received when choosing this option previously (their learning rate, termed “phi” \(\phi\)), and whether individuals already act on small differences in their attraction or whether they continue to explore the less attractive option (the deviation rate, termed “lambda” \(\lambda\)). Applied to the serial reversal learning setup, where an individual’s preferences are reversed multiple times, the model assumes that, at the beginning of the experiment, individuals have equally low attractions to both options. Depending on which option they choose first, they either experience the reward or not. Experiencing the reward will potentially increase their attraction to this option: if \(\phi\) is zero, their attraction remains unchanged; if \(\phi\) is one, their attraction is completely dominated by the reward they just gained. In environments that are predictable for short periods of time, similar to the rewarded option during a single reversal in our experiment, individuals are likely to gain more rewards if they update their information based on their latest experience. 
In situations where rewards change frequently or novel options become available often, individuals are expected to deviate from their learned attractions to continue to explore, while in more stable environments individuals benefit from large \(\lambda\) values to exploit the associations they formed (Cohen et al., 2007). While performance in the reversal learning task has sometimes been decomposed between the initial association learning and the reversal learning phase (e.g. Federspiel et al., 2017), the reinforcement learning model does not make such a distinction. However, it does predict a difference between phases because individuals’ internal states, in particular their attraction toward the different options, are expected to continuously change throughout the experiment. We also expect individuals to “learn to learn” over subsequent reversals (Neftci & Averbeck, 2019), changing their learning and deviation rate over repeated reversals. The parameters of the serial reversal model can also capture broader concepts that have previously been used to describe variation in reversal learning performance, such as “proactive interference” (Morand-Ferron et al., 2022) as the tendency to continue to choose the previously rewarded option which would occur if individuals do not update their attractions quickly.
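To make the roles of the two parameters concrete, the following minimal Python sketch (an illustration written for this description, not the analysis code used in this study) simulates a single learner choosing between two options, with the reward moving to the other option halfway through. All numeric values here are arbitrary choices for illustration:

```python
import math
import random

def simulate_reversal(phi, lam, n_trials=100, seed=1):
    """Simulate one learner on an initial phase plus one reversal.

    phi: learning rate (weight given to the most recent payoff);
    lam: rate of deviating from learned attractions
         (higher values = more deterministic choices).
    """
    rng = random.Random(seed)
    A = [0.1, 0.1]              # equal low attractions at the start
    rewarded = 0                # option 0 is rewarded first
    correct = []
    for t in range(n_trials):
        if t == n_trials // 2:
            rewarded = 1        # reversal: reward moves to the other option
        # softmax choice based on attractions, scaled by lambda
        weights = [math.exp(lam * a) for a in A]
        p1 = weights[1] / (weights[0] + weights[1])
        choice = 1 if rng.random() < p1 else 0
        payoff = 1.0 if choice == rewarded else 0.0
        # update only the chosen option's attraction
        A[choice] = (1 - phi) * A[choice] + phi * payoff
        correct.append(choice == rewarded)
    # how many of the last 10 trials were correct
    return sum(correct[-10:])

# with these values, a learner that updates quickly tends to recover
# from the reversal, while a slow updater tends not to
fast = simulate_reversal(phi=0.10, lam=8.0)
slow = simulate_reversal(phi=0.01, lam=8.0)
```

Running such a simulation across many learners with different parameter combinations is, in essence, what the power-analysis simulations described below do.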

We applied this model to our great-tailed grackle (*Quiscalus
mexicanus*, hereafter grackle) research on behavioral flexibility,
which we measured as reversal learning of a color preference using two
differently colored tubes (one light gray and one
dark gray; C. Logan et al., 2022). In one population, we conducted
a flexibility manipulation using serial reversal learning, reversing
individuals until their reversal speeds were consistently fast (50
trials or fewer in two consecutive reversals). We randomly assigned
individuals to a manipulated group, who received serial reversals, or to
a control group, who received one reversal and then a similar amount of
experience in making choices between two yellow tubes that both
contained rewards (C. Logan et al., 2022).
After the manipulation, grackles were given a flexibility and
innovativeness test using one or two different multi-access boxes to
determine whether improving flexibility in reversal learning also
improved flexibility (the latency to attempt to solve a new locus) and
innovativeness (the number of loci solved) in a different context (the
multi-access boxes). We found that we were able to manipulate reversal
learning performance (flexibility) and this improved flexibility and
problem solving in a new context (multi-access boxes) (C. Logan et al., 2022). However, we were left
with some lingering questions: what specifically did we manipulate about
flexibility? And how might the cognitive changes induced by the
manipulation transfer to influence performance in a new context? These
questions are the focus of the current article.

How are the two parameters \(\phi\) and \(\lambda\) linked to individual differences in reversal learning behavior in simulations? Can we reliably estimate \(\phi\) and \(\lambda\) based on the performance of individuals in the reversal learning task?

Prediction 1: We predicted that the Bayesian reinforcement learning model can reliably infer these two components based on the choices individuals make, which we tested by assigning individuals \(\phi\) and \(\lambda\) values, simulating their choices based on these, and back-estimating \(\phi\) and \(\lambda\) from the simulated choice data.

Prediction 2: We predicted that both \(\phi\) and \(\lambda\) influence the performance of individuals in a reversal learning task, with higher \(\phi\) (faster learning rate) and higher \(\lambda\) (less exploration) values leading to individuals more quickly reaching the passing criterion after a reversal in the color of the rewarded option.

Which of the two parameters, \(\phi\) or \(\lambda\), explains more of the variation in the reversal performance of the tested grackles, and which changed more across the serial reversals?

Prediction 3: We predicted that whichever of the two parameters, \(\phi\) or \(\lambda\), explains more of the variation in the first reversal performance is also the parameter that shows more change after the manipulation. However, in the serial reversals, birds need to be able to quickly learn the new reward location and also be ready to explore the other option. Accordingly, birds might end up with one of two solutions: they might adopt a strategy of weighting recent information more heavily while also showing low exploration, or they might show high exploration while being slow at updating their attractions.

Are \(\phi\) and \(\lambda\), the two components of flexibility in reversal learning, associated with performance on the multi-access boxes across control and manipulated birds?

Prediction 4: We predicted that birds that are more flexible, presumably those who have a high \(\phi\) (faster learning rate), have shorter latencies to attempt a new locus and solve more loci on the two multi-access boxes. Given that birds might use different strategies to be flexible (see prediction 3), we also explore whether the relationship between \(\phi\) or \(\lambda\) and the performance on the multi-access boxes is non-linear.

We used the version of the Bayesian model that was developed by Blaisdell et al. (2021) and modified by Logan CJ et al. (2020) (see their Analysis Plan > “Flexibility analysis” for model specifications and validation). This model uses data from every trial of reversal learning (rather than only the total number of trials to pass criterion) and represents behavioral flexibility using two parameters: the learning rate of attraction to either option (\(\phi\)) and the rate of deviating from learned attractions (\(\lambda\)). The model repeatedly estimates the series of choices each bird made, based on two equations:

Equation 1 (attraction and \(\phi\)): \(A_{j,i,t+1} = (1-\phi_j)\,A_{j,i,t} + \phi_j\,\pi_{j,i,t}\)

Equation 1 tells us how attractions A of individual j to the two different options (i=1,2) change from one trial to the next (time t+1) as a function of previous attractions \(A_{j,i,t}\) (how preferable option i is to the bird j at time t) and recently experienced payoffs \(\pi\) (i.e., 1 when they received a reward in a given trial, 0 when not). The (bird-specific) parameter \(\phi_j\) describes the weight of recent experience. The higher the value of \(\phi_j\), the faster the bird updates their attraction. Attraction scores thus reflect the accumulated learning history up to this point. At the beginning of the experiment, we assume that individuals have the same low attraction to both options (\(A_{j,1}\) = \(A_{j,2}\) = 0.1).

Equation 2 (choice and \(\lambda\)): \(P(j,i)_{t+1} = \displaystyle \frac{\exp(\lambda_j A_{j,i,t})}{\sum_{i = 1}^{2} \exp(\lambda_j A_{j,i,t})}\)

Equation 2 expresses the probability P that an individual j chooses option i in the next trial, t+1, based on the attractions. The parameter \(\lambda_j\) represents the rate of deviating from learned attractions of an individual. It controls how sensitive choices are to differences in attraction scores. As \(\lambda_j\) gets larger, choices become more deterministic and individuals consistently choose the option with the higher attraction even if attractions are very similar; as \(\lambda_j\) gets smaller, choices become more exploratory (random choice independent of the attractions if \(\lambda_j\) = 0).
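Written out in code, the two equations are compact. The sketch below is an illustrative Python transcription of Equations 1 and 2 (the actual analyses use the Stan implementation described next); the example values of \(\phi\) and \(\lambda\) are arbitrary:

```python
import math

def update_attraction(A, choice, payoff, phi):
    """Equation 1: shift the chosen option's attraction toward the payoff."""
    A = list(A)                      # copy; only the chosen option changes
    A[choice] = (1 - phi) * A[choice] + phi * payoff
    return A

def choice_probabilities(A, lam):
    """Equation 2: softmax over the attractions, scaled by lambda."""
    weights = [math.exp(lam * a) for a in A]
    total = sum(weights)
    return [w / total for w in weights]

# a bird starts with equal low attractions and is rewarded for
# the first option (index 0)
A = update_attraction([0.1, 0.1], choice=0, payoff=1, phi=0.03)
# after the update, a larger lambda turns the same small difference in
# attractions into a more deterministic preference for that option
p_small_lambda = choice_probabilities(A, lam=1)
p_large_lambda = choice_probabilities(A, lam=10)
```

Note that only the chosen option's attraction is updated, matching the model's assumption that individuals learn only about the reward of the option they chose.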

We implemented the Bayesian reinforcement learning model in the statistical language Stan (Stan Development Team, 2019), calling the model and analyzing its output in R [current version 4.1.2; R Core Team (2017)]. The values for \(\phi\) and \(\lambda\) for each individual are estimated as the mean from 2000 samples from the posterior.

We ran the Bayesian model on simulated data to first understand
whether we could recover the \(\phi\)
and \(\lambda\) values assigned to each
individual from the choices individuals made based on their phis and
lambdas in the initial and first reversal learning phases; and second to
see whether inter-individual variation in \(\phi\) or in \(\lambda\) contributed more to variation in
their performance. The settings for the simulations were based on the
previous analysis of data from grackles in a different population (Santa
Barbara, Blaisdell et al. (2021)). We
re-analyzed data we had simulated for power analyses to estimate sample
sizes for population comparisons (Logan CJ et
al., 2020). In brief, we simulated 20 individuals each from 32
different populations (640 individuals). The \(\phi\) and \(\lambda\) values for each individual were
drawn from a distribution representing that population, with a different
mean \(\phi\) (8 different means) and
mean \(\lambda\) (4 different means)
for each population (the 32 populations are the combinations of these \(\phi\) and \(\lambda\) means). Based on their \(\phi\) and \(\lambda\) values, each individual was
simulated to pass first through the initial association learning phase
and, after they reached criterion, a reversal learning phase. Each
choice each individual made was simulated consecutively, updating their
internal attraction to the two options based on their \(\phi\) values and setting their next choice
based on their \(\lambda\) weighing of
their attractions. We first attempted to recover \(\phi\) and \(\lambda\) for different subsets of the data
(initial association learning and reversal learning separately or
combined). Next, we determined how the \(\phi\) and \(\lambda\) values that were assigned to the
individuals influenced their performance in the reversal learning trial,
building a regression model to determine which of the two parameters had
a more direct influence on the number of trials individuals needed to
reach criterion:

number of trials to reverse ~ normal(mu, sigma)

mu <- a + b * \(\phi\) + c * \(\lambda\)

The model was also estimated in Stan, using functions from the package
‘rethinking’ (McElreath, 2020) to build
the model.
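As a simplified stand-in for that regression (ordinary least squares instead of the Bayesian model fit with 'rethinking', and with made-up generative coefficients purely for illustration), the logic can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 640                                  # same size as the simulated data set

# hypothetical individual-level parameters and generative coefficients,
# chosen only to demonstrate that the regression recovers them
phi = rng.uniform(0.01, 0.08, n)
lam = rng.uniform(1.0, 10.0, n)
a_true, b_true, c_true = 90.0, -600.0, -2.0
trials_to_reverse = a_true + b_true * phi + c_true * lam + rng.normal(0, 5, n)

# fit mu = a + b*phi + c*lambda by ordinary least squares
X = np.column_stack([np.ones(n), phi, lam])
coef, *_ = np.linalg.lstsq(X, trials_to_reverse, rcond=None)
a_hat, b_hat, c_hat = coef
```

The size of the recovered coefficients b and c (relative to the spread of \(\phi\) and \(\lambda\)) indicates which parameter contributes more to variation in the number of trials to reverse.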

```
################################################################################################
# Load previously simulated data from xpop
# There are two separate sets of simulations, with initial
# attractions at 0.1 and eight different phi and four different
# lambda combinations
simulatedreversaldata_attractionscores_1 <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction02_Aug2021.csv"),
header = T, sep = ",", stringsAsFactors = F)
simulatedreversaldata_attractionscores_2 <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction04_Aug2021.csv"),
header = T, sep = ",", stringsAsFactors = F)
# In both sets of simulations, populations with different phi and
# lambda values were counted from 1-16; for the second set we change
# this to 17-32
simulatedreversaldata_attractionscores_2$Site <- simulatedreversaldata_attractionscores_2$Site +
16
# In both simulations, individuals were counted from 1-320; for the
# second set we change the ids to start at 321
simulatedreversaldata_attractionscores_2$Bird_ID <- simulatedreversaldata_attractionscores_2$Bird_ID +
320
# We combine the two data sets for the further analyses
library(dplyr)
simulatedreversaldata_attractionscores <- bind_rows(simulatedreversaldata_attractionscores_1,
simulatedreversaldata_attractionscores_2)
################################################################################################
# In the simulations, trials were counted continuously for each bird.
# We now want to change this so that it restarts counting trials from
# 1 upward once a bird switches to reversal.
for (birds in 1:length(unique(simulatedreversaldata_attractionscores$Bird_ID))) {
currentbird <- unique(simulatedreversaldata_attractionscores$Bird_ID)[birds]
maximuminitial <- max(simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID ==
currentbird & simulatedreversaldata_attractionscores$Reversal ==
"initial", ]$Trial)
simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID ==
currentbird & simulatedreversaldata_attractionscores$Reversal ==
"reversal", ]$Trial <- simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID ==
currentbird & simulatedreversaldata_attractionscores$Reversal ==
"reversal", ]$Trial - maximuminitial
}
# We recode the choices so that the two options are numbered 1 and 2
# (the categorical likelihood in the STAN model expects choices coded
# as 1 and 2, not 0 and 1)
simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Choice ==
0, ]$Choice <- 2
# To use the model to estimate the phi and lambda parameters, we
# first need to change the column names to match these to the
# specifications in the model: change Bird_ID to id; change Reversal
# to Choice, change CorrectChoice to Correct, change Site to Expid
colnames(simulatedreversaldata_attractionscores) <- c("counter", "id",
"Session", "Trial", "Reversal", "Choice", "Correct", "Phi_mean", "Lambda_mean",
"Site", "Phi_sd", "Lambda_sd", "ThisBirdsPhi", "ThisBirdsLambda", "Attraction1",
"Attraction2")
# There are several simulated individuals who never reached the
# criterion during the initial learning phase. We need to remove
# these from the dataset
birdswithreversal <- as.data.frame(simulatedreversaldata_attractionscores %>%
group_by(id) %>%
summarise(experiments = length(unique(Reversal))))
birdswithreversal <- birdswithreversal[birdswithreversal$experiments ==
2, ]
simulatedreversaldata_attractionscores <- simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$id %in%
birdswithreversal$id, ]
# Next, we need to change the ids of the birds to be continuous again
# so the STAN model will include them all
simulatedreversaldata_attractionscores$id <- as.integer(as.factor(simulatedreversaldata_attractionscores$id))
# We first focus only on the performance in the reversal trials
simulatedreversaldata_attractionscores_reversalphase <- simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Reversal ==
"reversal", ]
# Let's start with 30 individuals for comparison
firstreversal_simulated <- simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in%
c(20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 300,
320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620), ]
firstreversal_simulated$id <- as.numeric(as.factor(firstreversal_simulated$id))
# We can now extract the relevant data from the first reversal for
# the STAN model to estimate phi and lambda at the beginning
datfirstsimulated <- as.list(firstreversal_simulated)
datfirstsimulated$N <- nrow(firstreversal_simulated)
datfirstsimulated$N_id <- length(unique(firstreversal_simulated$id))
# Next, we also look at the estimation of the phi and lambda values
# based on their performance in the initial association learning
# phase
# Here we select only the performance in the initial learning trials
simulatedreversaldata_attractionscores_learningphase <- simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Reversal ==
"initial", ]
# Let's start with 30 individuals for comparison
initiallearning_simulated <- simulatedreversaldata_attractionscores_learningphase[simulatedreversaldata_attractionscores_learningphase$id %in%
c(20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 300,
320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620), ]
initiallearning_simulated$id <- as.numeric(as.factor(initiallearning_simulated$id))
# We can now extract the relevant data from the initial learning phase
# for the STAN model to estimate phi and lambda at the beginning
datinitialsimulated <- as.list(initiallearning_simulated)
datinitialsimulated$N <- nrow(initiallearning_simulated)
datinitialsimulated$N_id <- length(unique(initiallearning_simulated$id))
# The STAN model is set up to have the initial attraction for each
# option set to 0.1, and so that individuals only learn the reward of
# the option they chose in a given trial.
reinforcement_model_nonzeroattraction <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[2,N_id] z_ID;
vector<lower=0>[2] sigma_ID; //SD of parameters among individuals
cholesky_factor_corr[2] Rho_ID;
}
transformed parameters{
matrix[N_id,2] v_ID; // varying effects on stuff
v_ID = ( diag_pre_multiply( sigma_ID , Rho_ID ) * z_ID )';
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(z_ID) ~ normal(0,1);
sigma_ID ~ exponential(1);
Rho_ID ~ lkj_corr_cholesky(4);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.1; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
# This is the modified, simplified model that appears to be more
# accurate
reinforcement_model_nonzeroattraction_alternativepriors <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[N_id,2] v_ID;
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(v_ID) ~ normal(0,1);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.1; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
# We run this model for the first reversal
m_firstsimulated <- stan(model_code = reinforcement_model_nonzeroattraction,
data = datfirstsimulated, iter = 5000, cores = 4, chains = 4, control = list(adapt_delta = 0.9,
max_treedepth = 12))
sfirstsimulated <- extract.samples(m_firstsimulated)
firstreversal_simulatedlambda <- sapply(1:datfirstsimulated$N_id, function(x) exp(mean(sfirstsimulated$log_L) +
mean(sfirstsimulated$v_ID[, x, 1])))
firstreversal_simulatedphi <- sapply(1:datfirstsimulated$N_id, function(x) inv_logit(mean(sfirstsimulated$logit_phi) +
mean(sfirstsimulated$v_ID[, x, 2])))
# alternative using cmdstan
library(cmdstanr)
currentlocation <- getwd()
cmdstanlocation <- cmdstan_path()
setwd(cmdstanlocation)
# write the reinforcement model to a .stan file so that cmdstan can
# compile and run it
write(reinforcement_model_nonzeroattraction_alternativepriors, file = "myowntrial.stan")
file <- file.path(cmdstan_path(), "myowntrial.stan")
mod <- cmdstan_model(file)
options(mc.cores = 4)
datfirstsimulated$Reversal <- as.numeric(as.factor(datfirstsimulated$Reversal))
# RUN the model
fit <- mod$sample(data = datfirstsimulated, seed = 123, chains = 4, parallel_chains = 4,
refresh = 500)
# Extract relevant variables
outcome_firstsimulated <- data.frame(fit$summary())
rownames(outcome_firstsimulated) <- outcome_firstsimulated$variable
# Extract the posterior draws and calculate the per-individual phi and
# lambda estimates from the reinforcement learning model
library(posterior)
library(rethinking)
drawsarray_firstsimulated <- fit$draws()
drawsdataframe_firstsimulated <- as_draws_df(drawsarray_firstsimulated)
drawsdataframe_firstsimulated <- data.frame(drawsdataframe_firstsimulated)
firstsimulated_lambda <- sapply(1:datfirstsimulated$N_id, function(x) exp(mean(drawsdataframe_firstsimulated$log_L) +
mean(drawsdataframe_firstsimulated[, x + 3])))
firstsimulated_phi <- sapply(1:datfirstsimulated$N_id, function(x) inv_logit(mean(drawsdataframe_firstsimulated$logit_phi) +
mean(drawsdataframe_firstsimulated[, x + 33])))
# Remove the stan model file and its compiled executable that we
# created for this particular model from your computer
file.remove(c("myowntrial.stan", "myowntrial"))
# Reset your working directory to what it was before we ran the model
setwd(currentlocation)
# And we run this model for the initial learning phase
m_initialsimulated <- stan(model_code = reinforcement_model_nonzeroattraction,
data = datinitialsimulated, iter = 5000, cores = 4, chains = 4, control = list(adapt_delta = 0.9,
max_treedepth = 12))
sinitialsimulated <- extract.samples(m_initialsimulated)
initiallearning_simulatedlambda <- sapply(1:datinitialsimulated$N_id, function(x) exp(mean(sinitialsimulated$log_L) +
mean(sinitialsimulated$v_ID[, x, 1])))
initiallearning_simulatedphi <- sapply(1:datinitialsimulated$N_id, function(x) inv_logit(mean(sinitialsimulated$logit_phi) +
mean(sinitialsimulated$v_ID[, x, 2])))
# alternative run the model for the initial learning phase with
# cmdstanr
currentlocation <- getwd()
cmdstanlocation <- cmdstan_path()
setwd(cmdstanlocation)
# write the reinforcement model to a .stan file so that cmdstan can
# compile and run it
write(reinforcement_model_nonzeroattraction_alternativepriors, file = "myowntrial.stan")
file <- file.path(cmdstan_path(), "myowntrial.stan")
mod <- cmdstan_model(file)
options(mc.cores = 4)
datinitialsimulated$Reversal <- as.numeric(as.factor(datinitialsimulated$Reversal))
# RUN the model
fit <- mod$sample(data = datinitialsimulated, seed = 123, chains = 4, parallel_chains = 4,
refresh = 500)
# Extract relevant variables
outcome_initialsimulated <- data.frame(fit$summary())
rownames(outcome_initialsimulated) <- outcome_initialsimulated$variable
# Extract the posterior draws and calculate the per-individual phi and
# lambda estimates from the reinforcement learning model
library(posterior)
library(rethinking)
drawsarray_initialsimulated <- fit$draws()
drawsdataframe_initialsimulated <- as_draws_df(drawsarray_initialsimulated)
drawsdataframe_initialsimulated <- data.frame(drawsdataframe_initialsimulated)
initialsimulated_lambda <- sapply(1:datinitialsimulated$N_id, function(x) exp(mean(drawsdataframe_initialsimulated$log_L) +
mean(drawsdataframe_initialsimulated[, x + 3])))
initialsimulated_phi <- sapply(1:datinitialsimulated$N_id, function(x) inv_logit(mean(drawsdataframe_initialsimulated$logit_phi) +
mean(drawsdataframe_initialsimulated[, x + 33])))
# Remove the stan model file and its compiled executable that we
# created for this particular model from your computer
file.remove(c("myowntrial.stan", "myowntrial"))
# Reset your working directory to what it was before we ran the model
setwd(currentlocation)
# We now can get back the phi and lambda values 30 individuals were
# assigned at the beginning of the simulation
simulatedphis <- unique(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in%
c(20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 300,
320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620), ]$ThisBirdsPhi)
simulatedlambdas <- unique(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in%
c(20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 300,
320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620), ]$ThisBirdsLambda)
# Some of the phi values estimated from the performance during the
# first reversal are estimated as higher than what the individuals
# had during the simulation.
plot(firstsimulated_phi ~ simulatedphis, xlim = c(0, 0.08), ylim = c(0,
0.08))
abline(a = 0, b = 1)
# In contrast, some of the lambda values estimated from the
# performance during the first reversal are estimated as lower than
# what the individuals had during the simulation
plot(firstsimulated_lambda ~ simulatedlambdas, xlim = c(0, 10), ylim = c(0,
10))
abline(a = 0, b = 1)
# The issue likely arises because the STAN model assumes that the phi
# and lambda values are correlated - whereas in the simulations they
# were allowed to vary independently from each other
plot(firstsimulated_phi ~ firstsimulated_lambda)
plot(simulatedphis ~ simulatedlambdas)
# In the simulation, we set some high lambda values and low phi
# values - because of the assumed correlation, the STAN model
# estimates higher phi values than simulated in cases when lambda was
# high, and lower lambda values than simulated when phi was low
plot(firstsimulated_phi[simulatedlambdas < 5] ~ simulatedphis[simulatedlambdas <
5], xlim = c(0, 0.08), ylim = c(0, 0.08))
points(firstsimulated_phi[simulatedlambdas > 5] ~ simulatedphis[simulatedlambdas >
5], xlim = c(0, 0.08), ylim = c(0, 0.08), col = "red")
abline(a = 0, b = 1)
# We can see how skewed the attraction scores were in the simulation
# at the beginning of the first reversal learning trial and use these
# values as priors in the STAN model (instead of the current setup
# where both attraction scores are set to be 0.1)
median(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$Trial ==
1, ]$Attraction1/simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$Trial ==
1, ]$Attraction2)
median(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$Trial ==
1, ]$Attraction1)
median(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$Trial ==
1, ]$Attraction2)
# Based on this, we want to set the starting attractions to 0.7 for
# option 1 (rewarded during the initial learning) and 0.1 for option 2
# Try different priors to reduce the correlation between estimated
# phis and lambdas
reinforcement_model_nonzeroattraction_alternativepriors <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[N_id,2] v_ID;
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(v_ID) ~ normal(0,1);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.1; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
m_initialsimulated_alternativepriors <- stan(model_code = reinforcement_model_nonzeroattraction_alternativepriors,
data = datinitialsimulated, iter = 5000, cores = 4, chains = 4, control = list(adapt_delta = 0.9,
max_treedepth = 12))
sinitialsimulatedalternativepriors <- extract.samples(m_initialsimulated_alternativepriors)
initiallearning_simulatedlambda_alternativepriors <- sapply(1:datinitialsimulated$N_id,
function(x) exp(mean(sinitialsimulatedalternativepriors$log_L) + mean(sinitialsimulatedalternativepriors$v_ID[,
x, 1])))
initiallearning_simulatedphi_alternativepriors <- sapply(1:datinitialsimulated$N_id,
function(x) inv_logit(mean(sinitialsimulatedalternativepriors$logit_phi) +
mean(sinitialsimulatedalternativepriors$v_ID[, x, 2])))
# Need to change the starting attraction scores to 0.7 and 0.1.
# Based on this information, we can now modify the STAN model to have
# the starting attraction for option 1 (the option rewarded during
# the initial learning) set to 0.7 and for option 2 set to 0.1, and
# that individuals only learn the reward of the option they chose in
# a given trial.
reinforcement_model_nonzeroattraction_skewedpriorattraction <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[2,N_id] z_ID;
vector<lower=0>[2] sigma_ID; //SD of parameters among individuals
cholesky_factor_corr[2] Rho_ID;
}
transformed parameters{
matrix[N_id,2] v_ID; // varying effects on stuff
v_ID = ( diag_pre_multiply( sigma_ID , Rho_ID ) * z_ID )';
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(z_ID) ~ normal(0,1);
sigma_ID ~ exponential(1);
Rho_ID ~ lkj_corr_cholesky(4);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.7; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
# We run this model for the first reversal
m_firstsimulated_skewedpriorattraction <- stan(model_code = reinforcement_model_nonzeroattraction_skewedpriorattraction,
data = datfirstsimulated, iter = 5000, cores = 4, chains = 4, control = list(adapt_delta = 0.9,
max_treedepth = 12))
sfirstsimulatedskewedpriorattraction <- extract.samples(m_firstsimulated_skewedpriorattraction)
firstreversalsimulated_lambda_skewedpriorattraction <- sapply(1:datfirstsimulated$N_id,
function(x) exp(mean(sfirstsimulatedskewedpriorattraction$log_L) +
mean(sfirstsimulatedskewedpriorattraction$v_ID[, x, 1])))
firstreversalsimulated_phi_skewedpriorattraction <- sapply(1:datfirstsimulated$N_id,
function(x) inv_logit(mean(sfirstsimulatedskewedpriorattraction$logit_phi) +
mean(sfirstsimulatedskewedpriorattraction$v_ID[, x, 2])))
plot(firstreversalsimulated_phi_skewedpriorattraction ~ simulatedphis,
xlim = c(0, 0.06), ylim = c(0, 0.06))
# In these estimations based on the performance during single setups
# (either just the initial learning or the first reversal learning)
# the model always estimates that lambda and phi are correlated. This
# likely reflects equifinality - individuals can achieve the same
# performance with a range of phis and lambdas, and the model will
# slide to the middle along the line for each individual:
plot(NULL, xlab = "lambda", ylab = "phi", xlim = c(0, 10), ylim = c(0, 0.1))
# Individuals who needed a long time to learn the association will be
# in the bottom left corner
abline(a = 0.04, b = -0.01, lty = 2)
abline(a = 0.06, b = -0.01, lty = 2)
abline(a = 0.08, b = -0.01, lty = 2)
# Individuals who needed a short time to learn the association will
# be in the top right corner
abline(a = 0.1, b = -0.01, lty = 2)
abline(a = 0.12, b = -0.01, lty = 2)
abline(a = 0.14, b = -0.01, lty = 2)
points(x = 1, y = 0.03, cex = 2)
points(x = 2, y = 0.04, cex = 2)
points(x = 3, y = 0.05, cex = 2)
points(x = 4, y = 0.06, cex = 2)
points(x = 5, y = 0.07, cex = 2)
points(x = 6, y = 0.08, cex = 2)
abline(a = 0.02, b = 0.01, col = "red", lwd = 1.5)
points(initiallearning_simulatedphi ~ initiallearning_simulatedlambda,
pch = 2)
# Maybe the model can better separate the lambda and phi values when
# combining data from multiple runs - in the case of the simulations
# that means combining the data from the initial learning with the
# data of the first reversal
# Let's start with 30 individuals for comparison
initialandreversal_simulated <- simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$id %in%
c(20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 300,
320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620), ]
initialandreversal_simulated$id <- as.numeric(as.factor(initialandreversal_simulated$id))
# We can now extract the relevant data from the first reversal for
# the STAN model to estimate phi and lambda at the beginning
datinitialandreversalsimulated <- as.list(initialandreversal_simulated)
datinitialandreversalsimulated$N <- nrow(initialandreversal_simulated)
datinitialandreversalsimulated$N_id <- length(unique(initialandreversal_simulated$id))
m_initialandreversal <- stan(model_code = reinforcement_model_nonzeroattraction,
data = datinitialandreversalsimulated, iter = 5000, cores = 4, chains = 4,
control = list(adapt_delta = 0.9, max_treedepth = 12))
sinitialandreversal <- extract.samples(m_initialandreversal)
initialandreversal_lambda <- sapply(1:datinitialandreversalsimulated$N_id,
function(x) exp(mean(sinitialandreversal$log_L) + mean(sinitialandreversal$v_ID[,
x, 1])))
initialandreversal_phi <- sapply(1:datinitialandreversalsimulated$N_id,
function(x) inv_logit(mean(sinitialandreversal$logit_phi) + mean(sinitialandreversal$v_ID[,
x, 2])))
plot(initialandreversal_phi ~ simulatedphis)
abline(a = 0, b = 1)
plot(initialandreversal_lambda ~ simulatedlambdas)
abline(a = 0, b = 1)
plot(initialandreversal_phi ~ initialandreversal_lambda)
# setup with cmdstanr
currentlocation <- getwd()
cmdstanlocation <- cmdstan_path()
setwd(cmdstanlocation)
datinitialandreversalsimulated$Reversal <- as.numeric(as.factor(datinitialandreversalsimulated$Reversal))
# Write the STAN code of the reinforcement model
# (reinforcement_model_nonzeroattraction_alternativepriors) to a file
# so that cmdstanr can compile and run it
write(reinforcement_model_nonzeroattraction_alternativepriors, file = "myowntrial.stan")
file <- file.path(cmdstan_path(), "myowntrial.stan")
mod <- cmdstan_model(file)
options(mc.cores = 4)
# RUN the model
fit <- mod$sample(data = datinitialandreversalsimulated, seed = 123, chains = 4,
parallel_chains = 4, refresh = 500)
# Extract the posterior draws from the fitted model and calculate the
# individual-level lambda and phi estimates from the reinforcement
# learning model
drawsarray <- fit$draws()
drawsdataframe <- as_draws_df(drawsarray)
drawsdataframe <- data.frame(drawsdataframe)
initialandreversal_lambda <- sapply(1:datinitialandreversalsimulated$N_id,
function(x) exp(mean(drawsdataframe$log_L) + mean(drawsdataframe[,
x + 3])))
initialandreversal_phi <- sapply(1:datinitialandreversalsimulated$N_id,
function(x) inv_logit(mean(drawsdataframe$logit_phi) + mean(drawsdataframe[, x + 33])))
# The individual-level values follow from the varying effects:
# lambda_i = exp(log_L + v_ID[, 1]); phi_i = inv_logit(logit_phi + v_ID[, 2])
# Remove the stan command line file we created for this particular
# model from your computer
fn <- "myowntrial.stan"
file.remove(fn)
# Reset your working directory to what it was before we ran the model
setwd(currentlocation)
simulatedphi <- initialandreversal_simulated %>%
group_by(id) %>%
summarise(mean(Phi_mean))
simulatedphi <- as.data.frame(simulatedphi)
simulatedphis <- simulatedphi[, 2]
plot(firstsimulated_phi ~ simulatedphis, xlim = c(0, 0.07), ylim = c(0,
0.07), bty = "n", cex = 3, pch = 18, col = "#0072B2", ann = F)
points(initialandreversal_phi ~ simulatedphis, col = "#E69F00", pch = 16,
cex = 2)
abline(a = 0, b = 1, lty = 2)
legend(x = "topleft", legend = c(pch16 = "Initial plus Reversal", pch18 = "Single Reversal"),
pch = c(16, 18), col = c("#E69F00", "#0072B2"), box.lty = 0, cex = 1.2,
pt.cex = 1.4)
mtext("simulated phi", side = 1, at = 0.04, line = 3, cex = 1.5)
mtext("estimated phi", side = 2, at = 0.04, line = 2.5, cex = 1.5)
mtext("1:1 line", side = 1, at = 0.065, line = -18)
plot(firstsimulated_lambda ~ simulatedlambdas, xlim = c(0, 10), ylim = c(0,
10))
points(initialandreversal_lambda ~ simulatedlambdas, col = "red")
abline(a = 0, b = 1)
# OPEN QUESTIONS: Did the manipulation work? Is it easier to change
# phi or lambda to get to 50 or fewer trials?
# We might want to compare first 20 trials to last 20 trials, both
# for the simulated data, and for the observed data look at the first
# and last 20 trials for each the first and the last reversal
summarysimulateddata <- matrix(nrow = length(unique(simulatedreversaldata_attractionscores$id)),
ncol = 5)
summarysimulateddata <- as.data.frame(summarysimulateddata)
colnames(summarysimulateddata) <- c("id", "ThisBirdsPhi", "ThisBirdsLambda",
"TrialsInitial", "TrialsReversal")
summarysimulateddata$id <- unique(simulatedreversaldata_attractionscores$id)
for (i in 1:nrow(summarysimulateddata)) {
summarysimulateddata[i, ]$TrialsInitial <- max(filter(simulatedreversaldata_attractionscores,
id == unique(simulatedreversaldata_attractionscores$id)[i], Reversal ==
"initial")$Trial)
}
for (i in 1:nrow(summarysimulateddata)) {
summarysimulateddata[i, ]$TrialsReversal <- max(filter(simulatedreversaldata_attractionscores,
id == unique(simulatedreversaldata_attractionscores$id)[i], Reversal ==
"reversal")$Trial)
}
for (i in 1:nrow(summarysimulateddata)) {
summarysimulateddata[i, ]$ThisBirdsPhi <- max(filter(simulatedreversaldata_attractionscores,
id == unique(simulatedreversaldata_attractionscores$id)[i])$ThisBirdsPhi)
}
for (i in 1:nrow(summarysimulateddata)) {
summarysimulateddata[i, ]$ThisBirdsLambda <- max(filter(simulatedreversaldata_attractionscores,
id == unique(simulatedreversaldata_attractionscores$id)[i])$ThisBirdsLambda)
}
plot(summarysimulateddata$TrialsReversal ~ summarysimulateddata$ThisBirdsPhi)
plot(summarysimulateddata$TrialsReversal ~ summarysimulateddata$ThisBirdsLambda)
dat_trialsphiandlambda <- list(Trials = (summarysimulateddata$TrialsReversal),
bird = c(as.numeric(as.factor(summarysimulateddata$id))), phi = standardize(c(summarysimulateddata$ThisBirdsPhi)),
lambda = standardize(c(summarysimulateddata$ThisBirdsLambda)))
trials.phiandlambda <- ulam(alist(Trials ~ normal(mu, sigma), mu <- a +
b * phi + c * lambda, a ~ normal(70, 40), b ~ normal(0, 20), c ~ normal(0,
20), sigma ~ exponential(1)), data = dat_trialsphiandlambda, chains = 4,
cores = 4, iter = 10000)
precis(trials.phiandlambda, depth = 2)
#        mean   sd   5.5%  94.5% n_eff Rhat4
# a     92.33 0.94  90.84  93.83 24367     1
# b    -20.62 0.94 -22.12 -19.11 25492     1
# c    -14.25 0.94 -15.74 -12.75 24876     1
# sigma 23.38 0.64  22.37  24.43 24251     1
summarysimulateddata_forplotting <- matrix(ncol = 3, nrow = 2 * nrow(summarysimulateddata))
summarysimulateddata_forplotting <- as.data.frame(summarysimulateddata_forplotting)
colnames(summarysimulateddata_forplotting) <- c("TrialsReversal", "Predictor",
"Value")
summarysimulateddata_forplotting$TrialsReversal <- c(summarysimulateddata$TrialsReversal,
summarysimulateddata$TrialsReversal)
summarysimulateddata_forplotting$Predictor <- c(rep("phi", nrow(summarysimulateddata)),
rep("lambda", nrow(summarysimulateddata)))
summarysimulateddata_forplotting$Value <- c(standardize(summarysimulateddata$ThisBirdsPhi),
standardize(summarysimulateddata$ThisBirdsLambda))
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
181, ]$TrialsReversal <- 8
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
151, ]$TrialsReversal <- 7
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
131, ]$TrialsReversal <- 6
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
111, ]$TrialsReversal <- 5
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
91, ]$TrialsReversal <- 4
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
71, ]$TrialsReversal <- 3
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
51, ]$TrialsReversal <- 2
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal >
31, ]$TrialsReversal <- 1
summarysimulateddata_forplotting$TrialsReversal <- as.factor(summarysimulateddata_forplotting$TrialsReversal)
library(ggplot2)
ggplot(summarysimulateddata_forplotting, aes(x = TrialsReversal, y = Value,
fill = Predictor)) + geom_boxplot() + xlab("Trials simulated individuals needed in reversal") +
scale_y_continuous(name = "Standardised phi/lambda of simulated individuals") +
theme_classic() + scale_x_discrete(name = "Trials simulated individuals needed in reversal",
breaks = 1:8, labels = c("31-50", "51-70", "71-90", "91-110", "111-130",
"131-150", "151-180", "181-220")) + theme(axis.text.x = element_text(size = 14,
colour = "black", hjust = 0.5, angle = 0)) + theme(axis.title.x = element_text(size = 18,
colour = "black", face = "bold", hjust = 0.5, vjust = -0.5, angle = 0)) +
theme(axis.text.y = element_text(size = 14, colour = "black", hjust = 0.5,
angle = 0)) + theme(axis.title.y = element_text(size = 16, colour = "black",
face = "bold", hjust = 0.5, angle = 90)) + theme(legend.title = element_text(size = 13))
```
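The equifinality noted in the code comments above (similar reversal performance can arise from different combinations of \(\phi\) and \(\lambda\)) can also be illustrated outside of STAN. The following standalone Python sketch uses the same attraction-updating rule as the reinforcement learning models and the 17-out-of-20 passing criterion; all parameter values are hypothetical and purely illustrative, not taken from the grackle analysis:

```python
import math
import random

def p_choose_first(attractions, lam):
    # Softmax probability of choosing option 1, given the two
    # attraction scores and the inverse temperature lambda
    diff = lam * (attractions[0] - attractions[1])
    return 1.0 / (1.0 + math.exp(-diff))

def trials_to_criterion(phi, lam, seed, rewarded=0, start=(0.1, 0.1),
                        max_trials=500):
    # Simulate one learner with the update rule used in the STAN models:
    # A[choice] = (1 - phi) * A[choice] + phi * payoff,
    # and return the trial at which 17 of the last 20 choices were correct
    rng = random.Random(seed)
    a = list(start)
    recent = []
    for trial in range(1, max_trials + 1):
        choice = 0 if rng.random() < p_choose_first(a, lam) else 1
        payoff = 1.0 if choice == rewarded else 0.0
        a[choice] = (1.0 - phi) * a[choice] + phi * payoff
        recent.append(choice == rewarded)
        recent = recent[-20:]
        if len(recent) == 20 and sum(recent) >= 17:
            return trial
    return max_trials

def mean_trials(phi, lam, n=200):
    # Average trials-to-criterion across n simulated learners
    return sum(trials_to_criterion(phi, lam, seed) for seed in range(n)) / n

# A faster-updating learner reaches the criterion sooner on average...
print(mean_trials(0.05, 5), mean_trials(0.01, 5))
# ...while different phi/lambda combinations can yield similar averages
print(mean_trials(0.03, 8), mean_trials(0.06, 4))
```

Because the starting attractions are equal, early choices are near random, and performance depends jointly on \(\phi\) and \(\lambda\), which is why a single reversal constrains only a ridge of plausible parameter pairs.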

The collection of the great-tailed grackle data, as described in the main article (C. Logan et al., 2022), was based on our preregistration, which received in principle acceptance at PCI Ecology (PDF version). None of the analyses of the C. Logan et al. (2022) data reported here were part of the original preregistration.

The data are available at the Knowledge Network for Biocomplexity’s data repository: https://knb.ecoinformatics.org/view/doi:10.5063/F1H41PWS.

Great-tailed grackles were caught in the wild in Tempe, Arizona, USA, and marked for individual identification with unique combinations of colored leg bands. Some individuals were brought temporarily into aviaries for testing and then released back to the wild. Individuals first participated in the reversal learning tasks. A subset of individuals formed the control group: they learned the association of a reward with one color before experiencing one reversal, in which they learned that the other color was rewarded. The remaining individuals formed the manipulated group. These individuals went through a series of reversals until they reached the criterion of having formed an association (17 out of 20 choices correct) in fewer than 50 trials in two consecutive reversals.
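The 17-out-of-20 criterion can be expressed as a sliding-window check over a bird's sequence of choices. The Python sketch below is one plausible implementation of such a check (the exact scoring code used in the study may differ):

```python
from collections import deque

def trials_to_pass(correct, window=20, needed=17):
    """Return the 1-indexed trial at which at least `needed` of the
    last `window` choices were correct, or None if never reached."""
    recent = deque(maxlen=window)  # keeps only the last `window` choices
    for trial, was_correct in enumerate(correct, start=1):
        recent.append(was_correct)
        if len(recent) == window and sum(recent) >= needed:
            return trial
    return None

# Ten errors followed by consistently correct choices: the window first
# contains 17 correct choices at trial 27
choices = [0] * 10 + [1] * 25
print(trials_to_pass(choices))  # → 27
```

A bird would then pass a reversal if this function returns a value below 50 for that reversal's choice sequence.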

We fit the Bayesian reinforcement learning model to the data of both the control and the manipulated birds. For the manipulated birds, we calculated \(\phi\) and \(\lambda\) separately for their performance in the beginning (initial association and first reversal) and at the end of the manipulation (final two reversals). Next, as with the simulated data, we fit a series of regression models to determine how \(\phi\) and \(\lambda\) link to the number of trials birds needed during their reversals.
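The structure of these regressions (trials regressed on standardized \(\phi\) and \(\lambda\)) can be sketched outside of the Bayesian framework with a plain ordinary-least-squares fit. The Python example below is purely illustrative: the toy coefficients only loosely echo the estimates reported in the code, and `ols_two_predictors` is a hypothetical helper, not part of the analysis code:

```python
def standardize(xs):
    # Center and scale to unit standard deviation (denominator n - 1,
    # matching R's sd()), as done for phi and lambda in the regressions
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return [(x - mean) / sd for x in xs]

def ols_two_predictors(y, x1, x2):
    # Least-squares fit of y = a + b*x1 + c*x2 via the normal
    # equations (X'X) beta = X'y, solved by Gauss-Jordan elimination
    n = len(y)
    X = [[1.0, u, v] for u, v in zip(x1, x2)]
    XtX = [[sum(X[i][r] * X[i][col] for i in range(n)) for col in range(3)]
           for r in range(3)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(3)]
    M = [row + [val] for row, val in zip(XtX, Xty)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][3] / M[r][r] for r in range(3)]

# Recover known coefficients from noise-free toy data
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]
y = [70.0 - 20.0 * u - 14.0 * v for u, v in zip(x1, x2)]
a, b, c = ols_two_predictors(y, x1, x2)
print(round(a, 2), round(b, 2), round(c, 2))
```

The Bayesian fits in the analysis additionally provide full posterior uncertainty for each coefficient, which the point estimates here cannot.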

```
### Code below copied from Blaisdell et al. 2021
# Using OBSERVED not simulated data
# We want to estimate lambda and phi differently. For the initial
# values, we combine the data from the first association learning
# with the first reversal.
dflex <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_flexmanip_data_reverseraw.csv"),
header = T, sep = ",", stringsAsFactors = F)
library(rstan)
library(rethinking)
library(cmdstanr)
library(posterior)
# If you have cmdstan installed, use the following:
# set_ulam_cmdstan(TRUE)
# PREPARE reversal learning data: exclude yellow tube trials for
# control birds because we are only interested in reversal data
dflex <- subset(dflex, dflex$Reversal != "Control: Yellow Tube" & dflex$ID !=
"Memela")
# include only those trials where the bird made a choice (0 or 1)
dflex <- subset(dflex, dflex$CorrectChoice != -1)
# Reversal number: 0 = initial discrimination
dflex$Reversal <- as.integer(dflex$Reversal)
dflex$Correct <- as.integer(dflex$CorrectChoice)
dflex$Trial <- as.integer(dflex$Trial)
# exclude NAs from the CorrectChoice column
dflex <- subset(dflex, is.na(dflex$Correct) == FALSE)
# Want data ONLY from initial learning and first reversal to
# determine phi and lambda at the beginning. This is for all birds,
# including those that did not experience the reversal manipulation
# experiment
reduceddata <- matrix(ncol = ncol(dflex), nrow = 0)
reduceddata <- data.frame(reduceddata)
for (i in 1:length(unique(dflex$ID))) {
thisbird <- unique(dflex$ID)[i]
thisbirddata <- dflex[dflex$ID == thisbird, ]
thisbirdslastreversal <- thisbirddata[thisbirddata$Reversal %in% c(0,
1), ]
reduceddata <- rbind(reduceddata, thisbirdslastreversal)
}
dflex_beginning <- reduceddata
# We want to remove the birds who did not go through at least the
# first reversal trial
birdscompletedreversal <- unique(dflex_beginning[dflex_beginning$Reversal ==
1, ]$ID)
dflex_beginning <- dflex_beginning[dflex_beginning$ID %in% birdscompletedreversal,
]
length(unique(dflex_beginning$ID)) #21 birds
# Construct Choice variable
dflex_beginning$Choice <- NA
for (i in 1:nrow(dflex_beginning)) {
if (dflex_beginning$Reversal[i] %in% seq(0, max(unique(dflex_beginning$Reversal)),
by = 2)) {
if (dflex_beginning$Correct[i] == 1) {
dflex_beginning$Choice[i] <- 1
} else {
dflex_beginning$Choice[i] <- 2
}
} else {
if (dflex_beginning$Correct[i] == 1) {
dflex_beginning$Choice[i] <- 2
} else {
dflex_beginning$Choice[i] <- 1
}
}
}
dflex_beginning <- dflex_beginning[with(dflex_beginning, order(dflex_beginning$ID)),
]
colnames(dflex_beginning)[4] <- "id"
# Sort birds alphabetically
dflex_beginning <- dflex_beginning[with(dflex_beginning, order(dflex_beginning$id)),
]
birdnames <- unique(dflex_beginning$id)
# Convert bird names into numeric ids
dflex_beginning$id <- as.numeric(as.factor(dflex_beginning$id))
datinitialandfirstreversal <- as.list(dflex_beginning)
datinitialandfirstreversal$N <- nrow(dflex_beginning)
datinitialandfirstreversal$N_id <- length(unique(dflex_beginning$id))
# The STAN model is set up to have the initial attraction for each
# option set to 0.1, and that individuals only learn the reward of
# the option they chose in a given trial.
reinforcement_model_nonzeroattraction <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[2,N_id] z_ID;
vector<lower=0>[2] sigma_ID; //SD of parameters among individuals
cholesky_factor_corr[2] Rho_ID;
}
transformed parameters{
matrix[N_id,2] v_ID; // varying effects on stuff
v_ID = ( diag_pre_multiply( sigma_ID , Rho_ID ) * z_ID )';
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(z_ID) ~ normal(0,1);
sigma_ID ~ exponential(1);
Rho_ID ~ lkj_corr_cholesky(4);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.1; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
m_initialandreversal <- stan(model_code = reinforcement_model_nonzeroattraction,
data = datinitialandfirstreversal, iter = 5000, cores = 4, chains = 4,
control = list(adapt_delta = 0.9, max_treedepth = 12))
sinitialandreversal <- extract.samples(m_initialandreversal)
initialandreversal_lambda <- sapply(1:datinitialandfirstreversal$N_id,
function(x) exp(mean(sinitialandreversal$log_L) + mean(sinitialandreversal$v_ID[,
x, 1])))
initialandreversal_phi <- sapply(1:datinitialandfirstreversal$N_id, function(x) inv_logit(mean(sinitialandreversal$logit_phi) +
mean(sinitialandreversal$v_ID[, x, 2])))
plot(initialandreversal_phi ~ initialandreversal_lambda)
# Next, for comparison, we want data ONLY from the last two reversals
# to determine phi and lambda at the end. This is for the manipulated
# birds only because the control group only went through a single
# reversal.
# Need to do the analysis for the last two reversals with the skewed
# priors for the attraction values for the manipulated birds.
# link manipulatedbirdids to birdnames
dflex_last_manipulated <- dflex[dflex$ID == "Chalupa" | dflex$ID == "Mole" |
dflex$ID == "Habanero" | dflex$ID == "Diablo" | dflex$ID == "Burrito" |
dflex$ID == "Adobo" | dflex$ID == "Chilaquile" | dflex$ID == "Pollito" |
dflex$ID == "Memela", ]
colnames(dflex_last_manipulated)[4] <- "id"
# Sort birds alphabetically
dflex_last_manipulated <- dflex_last_manipulated[with(dflex_last_manipulated,
order(dflex_last_manipulated$id)), ]
birdnames_manipulated <- unique(dflex_last_manipulated$id)
# Convert bird names into numeric ids
dflex_last_manipulated$id <- as.numeric(as.factor(dflex_last_manipulated$id))
length(unique(dflex_last_manipulated$id)) #8 birds
# Construct Choice variable
dflex_last_manipulated$Choice <- NA
for (i in 1:nrow(dflex_last_manipulated)) {
if (dflex_last_manipulated$Reversal[i] %in% seq(0, max(unique(dflex_last_manipulated$Reversal)),
by = 2)) {
if (dflex_last_manipulated$Correct[i] == 1) {
dflex_last_manipulated$Choice[i] <- 1
} else {
dflex_last_manipulated$Choice[i] <- 2
}
} else {
if (dflex_last_manipulated$Correct[i] == 1) {
dflex_last_manipulated$Choice[i] <- 2
} else {
dflex_last_manipulated$Choice[i] <- 1
}
}
}
# Want data ONLY from the last two reversals to determine phi and
# lambda at the end. This is for the manipulated birds only, as the
# control birds did not experience the serial reversal manipulation.
reduceddata <- matrix(ncol = ncol(dflex), nrow = 0)
reduceddata <- data.frame(reduceddata)
for (i in 1:length(unique(dflex_last_manipulated$id))) {
thisbird <- unique(dflex_last_manipulated$id)[i]
thisbirddata <- dflex_last_manipulated[dflex_last_manipulated$id ==
thisbird, ]
thisbirdslastreversal <- thisbirddata[thisbirddata$Reversal %in% c(max(thisbirddata$Reversal) -
1, max(thisbirddata$Reversal)), ]
reduceddata <- rbind(reduceddata, thisbirdslastreversal)
}
dflex_last_manipulated <- reduceddata
datlasterversalsskewed <- as.list(dflex_last_manipulated)
datlasterversalsskewed$N <- nrow(dflex_last_manipulated)
datlasterversalsskewed$N_id <- length(unique(dflex_last_manipulated$id))
# The STAN model is set up to have the attraction for the previously
# rewarded option set to 0.7 and the unrewarded option set to 0.1
# when birds start with their final reversals, and that individuals
# only learn the reward of the option they chose in a given trial.
reinforcement_model_nonzeroattraction_skewedpriorattraction <- "
data{
int N;
int N_id;
int id[N];
int Trial[N];
int Choice[N];
int Correct[N];
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[2,N_id] z_ID;
vector<lower=0>[2] sigma_ID; //SD of parameters among individuals
cholesky_factor_corr[2] Rho_ID;
}
transformed parameters{
matrix[N_id,2] v_ID; // varying effects on stuff
v_ID = ( diag_pre_multiply( sigma_ID , Rho_ID ) * z_ID )';
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(z_ID) ~ normal(0,1);
sigma_ID ~ exponential(1);
Rho_ID ~ lkj_corr_cholesky(4);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.7; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
m_lastreversals_skewed <- stan(model_code = reinforcement_model_nonzeroattraction_skewedpriorattraction,
data = datlasterversalsskewed, iter = 5000, cores = 4, chains = 4,
control = list(adapt_delta = 0.9, max_treedepth = 12))
slastreversals_skewed <- extract.samples(m_lastreversals_skewed)
lastreversals_lambda_skewed <- sapply(1:datlasterversalsskewed$N_id, function(x) exp(mean(slastreversals_skewed$log_L) +
mean(slastreversals_skewed$v_ID[, x, 1])))
lastreversals_phi_skewed <- sapply(1:datlasterversalsskewed$N_id, function(x) inv_logit(mean(slastreversals_skewed$logit_phi) +
mean(slastreversals_skewed$v_ID[, x, 2])))
# We can now combine the information on the estimated phis and
# lambdas for the initial performance of all birds and the last
# performance of the manipulated birds into a single table
eachbirdslearningparameters <- matrix(nrow = datinitialandfirstreversal$N_id,
ncol = 8)
eachbirdslearningparameters <- data.frame(eachbirdslearningparameters)
colnames(eachbirdslearningparameters) <- c("Bird", "Number", "beginningphi",
"beginninglambda", "manipulatedphi", "manipulatedlambda",
```