Cross-validation (CV) performance for various numbers of states ($n_s$), for macaques (left) and mice (right). Circles indicate the maximum CV $R^2$, and the shaded region extends down to the 5th percentile over folds and repetitions ($n_{\mathrm{folds}} = 5$, $n_{\mathrm{rep}} = 5$) for each $n_s$. For both species, increasing the number of states improves model performance up to a plateau at $R^2 \approx 0.8$. Lasso is a regularized linear regression (i.e., an MSLR with 1 internal state). The arrows indicate the number of states we selected, based on the maximum difference of the CV performance curve (see Fig. S10). Insets show a box plot with the mean of the average model performance, with model performance for individual sessions shown as dots, for held-out data at the selected number of states; dashed horizontal lines indicate the 99th percentile of the surrogate performances (see Methods). Note that the shuffled $R^2$ is negative: only uncorrelated predictors are expected to be centred at 0, and, due to finite sampling effects, there is always a non-zero correlation between the shuffled data and the ground truth. Furthermore, as we are dealing with skewed distributions (see Fig. S1), the null tendency is not captured by the mean, as assumed by the default $R^2$.
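
The remark on negative shuffled scores can be illustrated with a minimal sketch. It assumes the default $R^2 = 1 - SS_{\mathrm{res}} / SS_{\mathrm{tot}}$ with the mean of the target as baseline, a synthetic log-normal target standing in for the skewed data, and a surrogate built by simply permuting the target. The function `r2_score` and all variable names are hypothetical, and the surrogate procedure described in Methods may differ.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Default R^2: 1 - SS_res / SS_tot, with the mean of y_true as baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)

# Assumption: a skewed synthetic target, not the recorded data.
y = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

# Predicting the mean everywhere is the baseline that scores exactly R^2 = 0.
mean_r2 = r2_score(y, np.full_like(y, y.mean()))

# Assumption: surrogate predictions are random permutations of the target.
# They keep the target's variance but carry no information about it, so the
# residual sum of squares is about twice SS_tot and R^2 falls below zero.
shuffled_r2 = np.array([r2_score(y, rng.permutation(y)) for _ in range(200)])

print(f"mean predictor R^2: {mean_r2:.3f}")
print(f"shuffled R^2, mean over permutations: {shuffled_r2.mean():.3f}")
print(f"shuffled R^2, 99th percentile: {np.percentile(shuffled_r2, 99):.3f}")
```

Under these assumptions only the constant mean predictor sits at zero; a shuffled predictor with the same spread as the target is penalised for its variance, which is why a surrogate threshold taken from the shuffled distribution can lie below zero.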