We trained networks in contexts K_acc (blue) and K_rew (red) to have the same levels of discrimination accuracy (top, Bayesian paired t-test P_MCMC > 0.51 for all pairwise combinations) and investigated the amount of reward loss according to the reward contingencies in K_rew (bottom). As predicted by the normative model, we found that reward loss in K_rew was greater when the network had been trained in K_acc rather than in K_rew. Next, we investigated whether freezing the information bottleneck layers of the K_acc network after training (purple) would allow this network to reach optimal reward loss when retrained using the reward contingencies in K_rew. We found that, irrespective of the complexity of the downstream network, it was not possible to reduce reward loss when the information bottleneck encoding was fixed to maximize accuracy: reward loss remained at levels matching the network trained from scratch in K_acc (Bayesian paired t-test P_MCMC = 0.86). However, when the originally trained K_acc network was allowed to learn to minimize reward loss according to the K_rew reward contingencies without freezing any network weights, it reached optimal levels of reward loss reduction (light orange) (Bayesian paired t-test P_MCMC < 0.001). Critically, in this case the encoding scheme changed from infomax to fitness-maximizing, as predicted by the normative model. Points represent means and error bars represent 1 s.d. across neural network simulations. ***P_MCMC < 0.001 (Bayesian paired t-tests); NS, not significant.
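The freeze-and-retrain manipulation described in this legend can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration rather than the implementation used for the simulations: the layer sizes, the noisy one-dimensional stimulus model, the reward vector standing in for the K_rew contingencies, and the differentiable reward-loss surrogate are all assumptions made for the example.

```python
# Minimal sketch (assumed PyTorch setup, not the authors' code): train an
# encoder + readout for discrimination accuracy, then freeze the encoder
# ("information bottleneck" layers) and retrain only the downstream readout
# under a reward-weighted objective.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_STIM, BOTTLENECK = 8, 3                    # assumed stimulus count / code size
rewards = torch.linspace(0.2, 1.0, N_STIM)   # hypothetical K_rew reward contingencies

encoder = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, BOTTLENECK))
readout = nn.Sequential(nn.ReLU(), nn.Linear(BOTTLENECK, N_STIM))

def batch(n=256):
    """Sample stimulus identities and a noisy one-dimensional observation of each."""
    y = torch.randint(0, N_STIM, (n,))
    x = y.float().unsqueeze(1) + 0.3 * torch.randn(n, 1)
    return x, y

def train(params, loss_fn, steps=2000):
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(steps):
        x, y = batch()
        loss = loss_fn(readout(encoder(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 1 (K_acc): maximize discrimination accuracy via cross-entropy.
train(list(encoder.parameters()) + list(readout.parameters()), nn.CrossEntropyLoss())

# Stage 2 (purple condition): freeze the bottleneck encoder and retrain only the
# readout to minimize reward loss under the K_rew contingencies.
for p in encoder.parameters():
    p.requires_grad_(False)

def reward_loss(logits, y):
    # Expected reward foregone: the reward available for stimulus y, weighted by
    # the probability of not reporting y (a simple differentiable surrogate).
    p_correct = logits.softmax(dim=-1)[torch.arange(len(y)), y]
    return (rewards[y] * (1.0 - p_correct)).mean()

train(readout.parameters(), reward_loss)
```

Retraining without the freezing step (that is, passing both encoder and readout parameters to the second call to train) corresponds to the unfrozen, light-orange condition, in which the bottleneck encoding itself is free to shift from infomax towards fitness-maximizing.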