On a given trial, a ProDAtt or ExDAtt model agent encounters an example vector of features. (1) To recognize the latent context, the agent applies a Bayesian surprise threshold. Bayesian surprise is a well-described transformation of the posterior probability [59]. If no state's surprise falls below the threshold, the agent uses the vector to create a new state; otherwise, every state below the threshold is included in the candidate context. (2) The agent then calculates the mutual information between each feature and state identity. By comparing the entropy of a feature within each state with its entropy across states, the mutual information identifies the feature dimensions that best discriminate between the states in the context. Attention weights derived from the mutual information are modulated by the integrated reward history, such that shifts in reward statistics increase overall attention. (3) The attention weights are used to scale the feature values of each state. (4) The agent then re-estimates the state from the attention-scaled features, recalculating the surprise metric. If no state falls below this second surprise threshold, the agent creates a new state; otherwise, it selects the least surprising state. Each state learns a set of values for the actions available in the task. (5) Once a state has been selected, that state's values are used to choose an action. (6) Finally, the agent updates the state representation and the value of the chosen action, and tracks the reward history.
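The six-step loop above can be sketched in code. This is an illustrative approximation only, not the authors' implementation: the class and parameter names are invented, a diagonal-Gaussian negative log-likelihood stands in for Bayesian surprise, a variance-ratio score stands in for the per-feature mutual information, and a running unsigned reward-prediction-error trace stands in for the integrated reward history.

```python
import numpy as np

class LatentStateAgent:
    """Sketch of the six-step surprise -> attention -> state -> action loop.
    All names, thresholds, and proxies here are illustrative assumptions."""

    def __init__(self, n_features, n_actions, thresh=6.0, lr=0.2, beta=3.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.nf, self.na = n_features, n_actions
        self.thresh, self.lr, self.beta = thresh, lr, beta
        self.means, self.vars, self.q = [], [], []   # one entry per latent state
        self.arousal = 0.0                           # unsigned reward-error trace

    def _new_state(self, x):
        self.means.append(x.astype(float).copy())
        self.vars.append(np.ones(self.nf))
        self.q.append(np.zeros(self.na))
        return len(self.means) - 1

    def _surprise(self, x, i, w=None):
        # Steps 1 & 4: Gaussian negative log-likelihood as a stand-in for
        # Bayesian surprise; optional attention weights w rescale each feature.
        w = np.ones(self.nf) if w is None else w
        m, v = w * self.means[i], (w ** 2) * self.vars[i] + 1e-6
        return 0.5 * np.sum(np.log(2 * np.pi * v) + (w * x - m) ** 2 / v)

    def _attention(self, context):
        # Steps 2 & 3: weight features by how well they separate the context
        # states (across-state variance over mean within-state variance), then
        # modulate overall gain by the integrated reward-error trace.
        if context.size < 2:
            return np.ones(self.nf)
        M = np.stack([self.means[i] for i in context])
        within = np.mean([self.vars[i] for i in context], axis=0)
        info = M.var(axis=0) / (within + 1e-6)
        if info.sum() == 0:
            return np.ones(self.nf)
        return (1.0 + self.arousal) * self.nf * info / info.sum()

    def step(self, x):
        x = np.asarray(x, dtype=float)
        if not self.means:                                   # first trial
            state = self._new_state(x)
        else:
            s = np.array([self._surprise(x, i) for i in range(len(self.means))])
            context = np.flatnonzero(s < self.thresh)        # step 1
            if context.size == 0:
                state = self._new_state(x)
            else:
                w = self._attention(context)                 # steps 2-3
                s2 = np.array([self._surprise(x, i, w) for i in context])
                if s2.min() > self.thresh:                   # step 4
                    state = self._new_state(x)
                else:
                    state = int(context[np.argmin(s2)])
        # Step 5: softmax over the selected state's action values.
        p = np.exp(self.beta * (self.q[state] - self.q[state].max()))
        action = int(self.rng.choice(self.na, p=p / p.sum()))
        return state, action

    def learn(self, state, action, x, reward):
        # Step 6: update the state representation, the chosen action's value,
        # and the reward-history trace that gates attention.
        err = np.asarray(x, dtype=float) - self.means[state]
        self.means[state] += self.lr * err
        self.vars[state] += self.lr * (err ** 2 - self.vars[state])
        rpe = reward - self.q[state][action]
        self.q[state][action] += self.lr * rpe
        self.arousal += self.lr * (abs(rpe) - self.arousal)
```

With two well-separated feature clusters, an agent like this creates one latent state per cluster on its first surprising encounter and thereafter reuses the matching state, learning separate action values for each.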