Heatmap from Scientific Research

Citation
Illustration of the GP-UCB model based on observations at the eighth trial (showing normalized reward). We use GP regression as a psychological model of reward generalization (ref. 38), making Bayesian estimates of the expected reward and uncertainty for each option. The free parameter lambda (equation (2)) controls the extent to which past observations generalize to new options. The expected rewards m(x) and uncertainty estimates v(x) are combined using UCB sampling (equation (3)) to produce a valuation for each option. The exploration bonus beta governs the value of exploring uncertain options relative to exploiting high reward expectations. Lastly, the UCB values are entered into a softmax function (equation (4)) to make probabilistic predictions about where the participant will search next. The decision temperature parameter tau governs the amount of random (undirected) exploration.
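The three stages described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: equations (2) to (4) are not reproduced in the caption, so the sketch assumes commonly used forms (an RBF kernel exp(-||x - x'||^2 / lambda), UCB(x) = m(x) + beta * sqrt(v(x)), and a softmax with temperature tau). The 5x5 grid, the three observations, the noise variance, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lam):
    """RBF kernel exp(-||a - b||^2 / lam). This parameterization is an assumption."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / lam)

def gp_posterior(x_obs, y_obs, x_all, lam, noise_var=1e-4):
    """Posterior mean m(x) and variance v(x) at x_all (zero prior mean assumed)."""
    k_oo = rbf_kernel(x_obs, x_obs, lam) + noise_var * np.eye(len(x_obs))
    k_ao = rbf_kernel(x_all, x_obs, lam)
    mean = k_ao @ np.linalg.solve(k_oo, y_obs)
    # The prior variance k(x, x) equals 1 for this kernel.
    reduction = np.einsum("ij,ji->i", k_ao, np.linalg.solve(k_oo, k_ao.T))
    var = np.clip(1.0 - reduction, 0.0, None)
    return mean, var

def ucb(mean, var, beta):
    """Upper confidence bound: expected reward plus a weighted uncertainty bonus."""
    return mean + beta * np.sqrt(var)

def softmax(values, tau):
    """Choice probabilities; larger tau means more random (undirected) exploration."""
    z = (values - values.max()) / tau  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical 5x5 grid of options and three made-up normalized rewards.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
x_obs = np.array([[0.0, 0.0], [2.0, 3.0], [4.0, 1.0]])
y_obs = np.array([0.2, 0.9, 0.5])

mean, var = gp_posterior(x_obs, y_obs, grid, lam=2.0)
values = ucb(mean, var, beta=0.5)
probs = softmax(values, tau=0.1)

# Reshape to the grid to view the predictions as a heatmap.
print(probs.reshape(5, 5).round(3))
```

In this sketch, raising lam spreads each observation's influence over more distant options, raising beta shifts value toward uncertain options, and raising tau flattens the predicted choice distribution.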