Uncertainty and exploration.

In order to discover the most rewarding actions, agents must collect information about their environment, potentially foregoing reward. The optimal solution to this “explore–exploit” dilemma is often computationally challenging, but principled algorithmic approximations exist. These approximations utilize uncertainty about action values in different ways. Some random exploration algorithms scale the level of choice stochasticity with the level of uncertainty. Other directed exploration algorithms add a “bonus” to action values with high uncertainty. Random exploration algorithms are sensitive to total uncertainty across actions, whereas directed exploration algorithms are sensitive to relative uncertainty. This article reports a multiarmed bandit experiment in which total and relative uncertainty were orthogonally manipulated. We found that humans employ both exploration strategies, and that these strategies are independently controlled by different uncertainty computations. (PsycINFO Database Record (c) 2019 APA, all rights reserved)