Comparison of the RNN variants with the behaviour expected under different cognitive strategies. We trained six task-optimized RNN variants using different combinations of constraints. The plot shows the negative log-likelihood (NLL) of the choices made by each RNN variant under each of the four cognitive strategies (O, optimal; H, hierarchical; P, postdictive; C, counterfactual); a lower NLL indicates a more likely strategy. The legend below the abscissa shows which constraints (attention bottleneck, rationality and counterfactual noise) were included (check mark) or excluded (X mark) for each RNN variant. The whiskers of each violin plot mark the maximum, median and minimum of the NLLs obtained from 50 model initializations. Asterisks mark the most likely strategy for each RNN variant (P < 0.001, one-tailed t-test against the strategy with the second-lowest NLL, n = 50). The yellow band highlights the RNN with all three constraints present, which is the variant that best matched the participants’ counterfactual strategy. We refer to this variant as RNN best.
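The asterisk criterion can be sketched in code. The following is a minimal illustration, not the authors' analysis code. It assumes that the test is paired across the 50 initializations (the caption does not say whether it is paired or independent), and the NLL values are synthetic placeholders generated only to exercise the code. The names `t_upper_tail` and `most_likely_strategy` are hypothetical helpers introduced here.

```python
import math
import random
from statistics import mean, stdev


def t_upper_tail(t, df, width=60.0, steps=20000):
    """P(T > t) for Student's t with df degrees of freedom.

    Uses Simpson's rule on [t, t + width]. This is adequate for the
    moderately large df used here (df = 49); it is not a general-purpose
    replacement for a statistics library.
    """
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, width, steps)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = width / steps  # steps must be even for Simpson's rule
    total = pdf(t) + pdf(t + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(t + i * h)
    return total * h / 3


def most_likely_strategy(nll):
    """nll maps strategy label -> per-initialization NLLs (paired by index).

    Returns the lowest-mean-NLL strategy, the runner-up, the paired
    t statistic and the one-tailed P value (assumption: paired test).
    """
    ranked = sorted(nll, key=lambda s: mean(nll[s]))
    best, runner_up = ranked[0], ranked[1]
    diffs = [a - b for a, b in zip(nll[best], nll[runner_up])]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    # One-tailed alternative: the best strategy has the lower NLL (t_stat < 0).
    p = t_upper_tail(-t_stat, n - 1)
    return best, runner_up, t_stat, p


# Synthetic placeholder NLLs for one RNN variant (not data from the study).
random.seed(0)
centers = {"O": 1.00, "H": 0.90, "P": 0.85, "C": 0.70}
nll = {s: [random.gauss(mu, 0.05) for _ in range(50)]
       for s, mu in centers.items()}

best, runner_up, t_stat, p = most_likely_strategy(nll)
print(best, runner_up, p < 0.001)
```

A variant would receive an asterisk on `best` when `p` falls below 0.001. If the test is instead independent across initializations, the difference-based statistic would need to be replaced by a two-sample t statistic.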