Understanding the development of reward learning through the lens of meta-learning

Nussenbaum, Kate; Hartley, Catherine A.

doi:10.1038/s44159-024-00304-1

Perspective
Published: 18 April 2024

Nature Reviews Psychology (2024)

Abstract

Determining how environments shape how people learn is central to understanding individual differences in goal-directed behaviour. Studies of the effects of early-life adversity on reward learning have revealed that the environments that infants and children experience exert lasting influences on reward-guided behaviour. However, the varied findings from this research are difficult to reconcile under a unified computational account. Studies of adaptive reinforcement learning have demonstrated that learning algorithms and parameters dynamically adapt to support reward-guided behaviour in varied contexts, but this body of research has largely focused on learning that proceeds within the short timeframes of experimental tasks. In this Perspective, we argue that, to understand how the structure of experienced environments shapes reward learning across development, computational accounts of the effects of environmental statistics on reinforcement learning need to be extended to encompass learning across multiple nested timescales of experience. To this end, we consider the development of reward learning through the lens of meta-learning models, in particular meta-reinforcement learning. This computational formalization can inspire new hypotheses and methods for empirical research to understand how features of experienced environments give rise to individual differences in learning and adaptive behaviour across development.

**Fig. 1: Learning to reinforcement learn over multiple timescales.**

**Fig. 2: Development through the lens of meta-reinforcement learning.**

References

Scott, L. S., Pascalis, O. & Nelson, C. A. A domain-general theory of the development of perceptual discrimination. Curr. Dir. Psychol. Sci. 16, 197–201 (2007).
Article PubMed PubMed Central Google Scholar
Scott, L. S. & Monesson, A. The origin of biases in face perception. Psychol. Sci. 20, 676–680 (2009).
Article PubMed Google Scholar
Werker, J. F. & Tees, R. C. Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant. Behav. Dev. 7, 49–63 (1984).
Article Google Scholar
Hospodar, C. M., Hoch, J. E., Lee, D. K., Shrout, P. E. & Adolph, K. E. Practice and proficiency: factors that facilitate infant walking skill. Dev. Psychobiol. 63, e22187 (2021).
Article PubMed PubMed Central Google Scholar
Saccani, R., Valentini, N. C., Pereira, K. R., Müller, A. B. & Gabbard, C. Associations of biological factors and affordances in the home with infant motor development. Pediatr. Int. 55, 197–203 (2013).
Article PubMed Google Scholar
Sheridan, M. A., Peverill, M., Finn, A. S. & McLaughlin, K. A. Dimensions of childhood adversity have distinct associations with neural systems underlying executive functioning. Dev. Psychopathol. 29, 1777–1794 (2017).
Article PubMed PubMed Central Google Scholar
Amso, D., Salhi, C. & Badre, D. The relationship between cognitive enrichment and cognitive control: a systematic investigation of environmental influences on development through socioeconomic status. Dev. Psychobiol. 61, 159–178 (2019).
Article PubMed Google Scholar
Harlow, H. F. The formation of learning sets. Psychol. Rev. 56, 51–65 (1949).
Article PubMed Google Scholar
Nussenbaum, K., Velez, J. A., Washington, B. T., Hamling, H. E. & Hartley, C. A. Flexibility in valenced reinforcement learning computations across development. Child Dev. 93, 1601–1615 (2022).
Article PubMed PubMed Central Google Scholar
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Article PubMed Google Scholar
Gagne, C., Zika, O., Dayan, P. & Bishop, S. J. Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife 9, e61387 (2020).
Article PubMed PubMed Central Google Scholar
Browning, M., Behrens, T. E., Jocham, G., O’Reilly, J. X. & Bishop, S. J. Anxious individuals have difficulty learning the causal statistics of aversive environments. Nat. Neurosci. 18, 590–596 (2015).
Article PubMed PubMed Central Google Scholar
Hanson, J. L., Williams, A. V., Bangasser, D. A. & Peña, C. J. Impact of early life stress on reward circuit function and regulation. Front. Psychiatry 12, 744690 (2021).
Article PubMed PubMed Central Google Scholar
Galván, A. Neural plasticity of development and learning. Hum. Brain Mapp. 31, 879–890 (2010).
Article PubMed PubMed Central Google Scholar
Wilkinson, M. P., Slaney, C. L., Mellor, J. R. & Robinson, E. S. J. Investigation of reward learning and feedback sensitivity in non-clinical participants with a history of early life stress. PLoS One 16, e0260444 (2021).
Article PubMed PubMed Central Google Scholar
Birn, R. M., Roeber, B. J. & Pollak, S. D. Early childhood stress exposure, reward pathways, and adult decision making. Proc. Natl Acad. Sci. USA 114, 13549–13554 (2017).
Article PubMed PubMed Central Google Scholar
Dorfman, H. M. & Gershman, S. J. Controllability governs the balance between Pavlovian and instrumental action selection. Nat. Commun. 10, 5826 (2019).
Article PubMed PubMed Central Google Scholar
Botvinick, M. et al. Reinforcement learning, fast and slow. Trends Cogn. Sci. 23, 408–422 (2019).
Article PubMed Google Scholar
Li, Z., Zhou, F., Chen, F. & Li, H. Meta-SGD: learning to learn quickly for few-shot learning. Preprint at arXiv https://doi.org/10.48550/arXiv.1707.09835 (2017).
Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).
Article PubMed Google Scholar
Wang, J. X. et al. Learning to reinforcement learn. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.05763 (2016).
Duan, Y. et al. RL2: fast reinforcement learning via slow reinforcement learning. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.02779 (2016).
Weng, L. Meta Reinforcement Learning https://lilianweng.github.io/posts/2019-06-23-meta-rl/ (2019).
Langdon, A. et al. Meta-learning, social cognition and consciousness in brains and machines. Neural Netw. 145, 80–89 (2022).
Article PubMed Google Scholar
Binz, M. et al. Meta-learned models of cognition. Behav. Brain Sci. https://doi.org/10.1017/S0140525X23003266 (2023).
Schaul, T. & Schmidhuber, J. Metalearning. Scholarpedia J. 5, 4650 (2010).
Article Google Scholar
Wang, J. X. Meta-learning in natural and artificial intelligence. Curr. Opin. Behav. Sci. 38, 90–95 (2021).
Article Google Scholar
Lansdell, B. J. & Kording, K. P. Towards learning-to-learn. Curr. Opin. Behav. Sci. 29, 45–50 (2019).
Article Google Scholar
Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 70, 1126–1135 (PMLR, 2017).
Doya, K. Metalearning and neuromodulation. Neural Netw. 15, 495–506 (2002).
Article PubMed Google Scholar
Griffiths, T. L. et al. Doing more with less: meta-reasoning and meta-learning in humans and machines. Curr. Opin. Behav. Sci. 29, 24–30 (2019).
Article Google Scholar
Behrens, T. E. J. et al. What is a cognitive map? Organizing knowledge for flexible behavior. Neuron 100, 490–509 (2018).
Article PubMed Google Scholar
Crowley, K. & Siegler, R. S. Explanation and generalization in young children’s strategy learning. Child Dev. 70, 304–316 (1999).
Article PubMed Google Scholar
Bielaczyc, K., Pirolli, P. L. & Brown, A. L. Training in self-explanation and self-regulation strategies: investigating the effects of knowledge acquisition activities on problem solving. Cogn. Instr. 13, 221–252 (1995).
Article Google Scholar
Bakst, L. & McGuire, J. T. Experience-driven recalibration of learning from surprising events. Cognition 232, 105343 (2023).
Article PubMed Google Scholar
Dubey, R., Grant, E., Luo, M., Narasimhan, K. & Griffiths, T. Connecting context-specific adaptation in humans to meta-learning. Preprint at https://doi.org/10.48550/arXiv.2011.13782 (2020).
Verbeke, P. & Verguts, T. Humans adaptively select different computational strategies in different learning environments. Preprint at bioRxiv https://doi.org/10.1101/2023.01.27.525944 (2023).
Werchan, D. M., Collins, A. G. E., Frank, M. J. & Amso, D. 8-month-old infants spontaneously learn and generalize hierarchical rules. Psychol. Sci. 26, 805–815 (2015).
Article PubMed Google Scholar
Mark, S., Moran, R., Parr, T., Kennerley, S. W. & Behrens, T. E. J. Transferring structural knowledge across cognitive maps in humans and models. Nat. Commun. 11, 4783 (2020).
Article PubMed PubMed Central Google Scholar
Brown, A., Kane, M. J. & Echols, C. H. Young children’s mental models determine analogical transfer across problems with a common goal structure. Cogn. Dev. 1, 103–121 (1986).
Article Google Scholar
Nussenbaum, K. et al. Causal information‐seeking strategies change across childhood and adolescence. Cognit. Sci. 44, e12888 (2020).
Article Google Scholar
Kuhn, D. & Phelps, E. The development of problem-solving strategies. Adv. Child Dev. Behav. 17, 1–44 (1982).
Article PubMed Google Scholar
Rescorla, R. A. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. Classical Conditioning Curr. Res. Theory 2, 64–69 (1972).
Google Scholar
Sutton, R. S. & Barto, A. G. Reinforcement Learning. An Introduction (MIT Press, 1998).
Kool, W., Gershman, S. J. & Cushman, F. A. Cost-benefit arbitration between multiple reinforcement-learning systems. Psychol. Sci. 28, 1321–1333 (2017).
Article PubMed Google Scholar
Ruel, A., Devine, S. & Eppinger, B. Resource-rational approach to meta-control problems across the lifespan. Wiley Interdiscip. Rev. Cogn. Sci. 12, e1556 (2021).
Article PubMed Google Scholar
Raab, H. A., Goldway, N., Foord, C. & Hartley, C. A. Adolescents flexibly adapt action selection based on controllability inferences. Learn. Mem. 31, a053901 (2024).
Article PubMed Google Scholar
Salter Ainsworth, M. D. The Bowlby-Ainsworth attachment theory. Behav. Brain Sci. 1, 436–438 (1978).
Article Google Scholar
Diederen, K. M. J. & Schultz, W. Scaling prediction errors to reward variability benefits error-driven learning in humans. J. Neurophysiol. 114, 1628–1640 (2015).
Article PubMed PubMed Central Google Scholar
Payzan-LeNestour, E. & Bossaerts, P. Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol. 7, e1001048 (2011).
Article PubMed PubMed Central Google Scholar
Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Nat. Commun. 12, 6587 (2021).
Article PubMed PubMed Central Google Scholar
Dayan, P., Kakade, S. & Montague, P. R. Learning and selective attention. Nat. Neurosci. 3, 1218–1223 (2000).
Article PubMed Google Scholar
Kalman, R. E. A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960).
Article Google Scholar
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Article PubMed PubMed Central Google Scholar
Nassar, M. R., Wilson, R. C., Heasly, B. & Gold, J. I. An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J. Neurosci. 30, 12366–12378 (2010).
Article PubMed PubMed Central Google Scholar
McGuire, J. T., Nassar, M. R., Gold, J. I. & Kable, J. W. Functionally dissociable influences on learning rate in a dynamic environment. Neuron 84, 870–881 (2014).
Article PubMed PubMed Central Google Scholar
Costa, V. D., Tran, V. L., Turchi, J. & Averbeck, B. B. Reversal learning and dopamine: a Bayesian perspective. J. Neurosci. 35, 2407–2416 (2015).
Article PubMed PubMed Central Google Scholar
Mathys, C., Daunizeau, J., Friston, K. J. & Stephan, K. E. A Bayesian foundation for individual learning under uncertainty. Front. Hum. Neurosci. 5, 39 (2011).
Article PubMed PubMed Central Google Scholar
Piray, P. & Daw, N. D. A simple model for learning in volatile environments. PLoS Comput. Biol. 16, e1007963 (2020).
Article PubMed PubMed Central Google Scholar
Farashahi, S. et al. Metaplasticity as a neural substrate for adaptive learning and choice under uncertainty. Neuron 94, 401–414.e6 (2017).
Article PubMed PubMed Central Google Scholar
Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).
Article PubMed PubMed Central Google Scholar
Cazé, R. D. & van der Meer, M. A. A. Adaptive properties of differential learning rates for positive and negative outcomes. Biol. Cybern. 107, 711–719 (2013).
Article PubMed Google Scholar
Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N. Y. Acad. Sci. 1251, 13–32 (2012).
Article PubMed Google Scholar
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Article PubMed PubMed Central Google Scholar
Gershman, S. J. Do learning rates adapt to the distribution of rewards? Psychonomic Bull. Rev. 22, 1320–1327 (2015).
Article Google Scholar
Daw, N. D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002).
Article PubMed Google Scholar
Frank, M. J., Seeberger, L. C. & O’Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943 (2004).
Article PubMed Google Scholar
Lefebvre, G., Lebreton, M., Meyniel, F., Bourgeois-Gironde, S. & Palminteri, S. Behavioural and neural characterization of optimistic reinforcement learning. Nat. Hum. Behav. 1, 0067 (2017).
Article Google Scholar
Niv, Y., Edlund, J. A., Dayan, P. & O’Doherty, J. P. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J. Neurosci. 32, 551–562 (2012).
Article PubMed PubMed Central Google Scholar
Rosenbaum, G., Grassie, H. & Hartley, C. A. Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory. eLife 11, e64620 (2022).
Article PubMed PubMed Central Google Scholar
Chambon, V. et al. Information about action outcomes differentially affects learning from self-determined versus imposed choices. Nat. Hum. Behav. 4, 1067–1079 (2020).
Article PubMed Google Scholar
Palminteri, S., Lefebvre, G., Kilford, E. J. & Blakemore, S.-J. Confirmation bias in human reinforcement learning: evidence from counterfactual feedback processing. PLoS Comput. Biol. 13, e1005684 (2017).
Article PubMed PubMed Central Google Scholar
Habicht, J., Bowler, A., Moses-Payne, M. E. & Hauser, T. U. Children are full of optimism, but those rose-tinted glasses are fading — reduced learning from negative outcomes drives hyperoptimism in children. J. Exp. Psychol. Gen. 151, 1843–1853 (2022).
Article PubMed PubMed Central Google Scholar
Villano, W. J. et al. Individual differences in naturalistic learning link negative emotionality to the development of anxiety. Sci. Adv. 9, eadd2976 (2023).
Article PubMed PubMed Central Google Scholar
Cools, R. et al. Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug administration. J. Neurosci. 29, 1538–1543 (2009).
Article PubMed PubMed Central Google Scholar
Michely, J., Eldar, E., Erdman, A., Martin, I. M. & Dolan, R. J. Serotonin modulates asymmetric learning from reward and punishment in healthy human volunteers. Commun. Biol. 5, 812 (2022).
Article PubMed PubMed Central Google Scholar
Cools, R., Robinson, O. J. & Sahakian, B. Acute tryptophan depletion in healthy volunteers enhances punishment prediction but does not affect reward prediction. Neuropsychopharmacology 33, 2291–2299 (2008).
Article PubMed Google Scholar
Tanaka, S. C. et al. Serotonin affects association of aversive outcomes to past actions. J. Neurosci. 29, 15669–15674 (2009).
Article PubMed PubMed Central Google Scholar
den Ouden, H. E. M. et al. Dissociable effects of dopamine and serotonin on reversal learning. Neuron 80, 1090–1100 (2013).
Article Google Scholar
Moscarello, J. M. & Hartley, C. A. Agency and the calibration of motivated behavior. Trends Cogn. Sci. 21, 725–735 (2017).
Article PubMed Google Scholar
Ligneul, R. Prediction or causation? Towards a redefinition of task controllability. Trends Cogn. Sci. 25, 431–433 (2021).
Article PubMed Google Scholar
Raab, H. A., Foord, C., Ligneul, R. & Hartley, C. A. Developmental shifts in computations used to detect environmental controllability. PLoS Comput. Biol. 18, e1010120 (2022).
Article PubMed PubMed Central Google Scholar
Ligneul, R., Mainen, Z. F., Ly, V. & Cools, R. Stress-sensitive inference of task controllability. Nat. Hum. Behav. 6, 812–822 (2022).
Article PubMed Google Scholar
Csifcsák, G., Melsæter, E. & Mittner, M. Intermittent absence of control during reinforcement learning interferes with Pavlovian bias in action selection. J. Cogn. Neurosci. 32, 646–663 (2020).
Article PubMed Google Scholar
Dorfman, H. M., Bhui, R., Hughes, B. L. & Gershman, S. J. Causal inference about good and bad outcomes. Psychol. Sci. 30, 516–525 (2019).
Article PubMed PubMed Central Google Scholar
Cohen, A. O., Nussenbaum, K., Dorfman, H. M., Gershman, S. J. & Hartley, C. A. The rational use of causal inference to guide reinforcement learning strengthens with age. NPJ Sci. Learn. 5, 16 (2020).
Article PubMed PubMed Central Google Scholar
Pulcu, E. & Browning, M. Affective bias as a rational response to the statistics of rewards and punishments. eLife 6, e27879 (2017).
Article PubMed PubMed Central Google Scholar
Dorfman, H. M. et al. Causal inference gates corticostriatal learning. J. Neurosci. 41, 6892–6904 (2021).
Article PubMed PubMed Central Google Scholar
O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
Article PubMed Google Scholar
Amat, J. et al. Medial prefrontal cortex determines how stressor controllability affects behavior and dorsal raphe nucleus. Nat. Neurosci. 8, 365–371 (2005).
Article PubMed Google Scholar
Gershman, S. J., Guitart-Masip, M. & Cavanagh, J. F. Neural signatures of arbitration between Pavlovian and instrumental action selection. PLoS Comput. Biol. 17, e1008553 (2021).
Article PubMed PubMed Central Google Scholar
Palminteri, S. & Lebreton, M. The computational roots of positivity and confirmation biases in reinforcement learning. Trends Cogn. Sci. 26, 607–621 (2022).
Article PubMed Google Scholar
Langer, E. J. The illusion of control. J. Pers. Soc. Psychol. 32, 311–328 (1975).
Article Google Scholar
Lefebvre, G., Summerfield, C. & Bogacz, R. A normative account of confirmation bias during reinforcement learning. Neural Comput. 34, 307–337 (2022).
Article PubMed PubMed Central Google Scholar
Huys, Q. J. M. & Dayan, P. A Bayesian formulation of behavioral control. Cognition 113, 314–328 (2009).
Article PubMed Google Scholar
Schubert, J. A., Jagadish, A. K., Binz, M. & Schulz, E. A rational analysis of the optimism bias using meta-reinforcement learning. In 2023 Conference on Cognitive Computational Neuroscience 557–559 (2023).
Greenough, W. T., Black, J. E. & Wallace, C. S. in Brain Development and Cognition: A Reader 2nd ed., 186–216 (Wiley, 2008).
Knudsen, E. I. Sensitive periods in the development of the brain and behavior. J. Cogn. Neurosci. 16, 1412–1425 (2004).
Article PubMed Google Scholar
Gabard-Durnam, L. & McLaughlin, K. A. Sensitive periods in human development: charting a course for the future. Curr. Opin. Behav. Sci. 36, 120–128 (2020).
Article Google Scholar
Hensch, T. K. Critical period regulation. Annu. Rev. Neurosci. 27, 549–579 (2004).
Article PubMed Google Scholar
Takesian, A. E. & Hensch, T. K. Balancing plasticity/stability across brain development. Prog. Brain Res. 207, 3–34 (2013).
Article PubMed Google Scholar
Fawcett, T. W. & Frankenhuis, W. E. Adaptive explanations for sensitive windows in development. Front. Zool. 12, S3 (2015).
Article PubMed PubMed Central Google Scholar
Golarai, G. & Ghahremani, D. G. The development of race effects in face processing from childhood through adulthood: neural and behavioral evidence. Dev. Sci. 24, e13058 (2021).
Article PubMed Google Scholar
Kuhl, P. K. et al. Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). Philos. Trans. R. Soc. Lond. B Biol. Sci. 363, 979–1000 (2008).
Article PubMed Google Scholar
Lin, W. C., Delevich, K. & Wilbrecht, L. A role for adaptive developmental plasticity in learning and decision making. Curr. Opin. Behav. Sci. 36, 48–54 (2020).
Article PubMed PubMed Central Google Scholar
Anzures, G. et al. Developmental origins of the other-race effect. Curr. Dir. Psychol. Sci. 22, 173–178 (2013).
Article PubMed PubMed Central Google Scholar
Kuhl, P. K., Tsao, F.-M. & Liu, H.-M. Foreign-language experience in infancy: effects of short-term exposure and social interaction on phonetic learning. Proc. Natl Acad. Sci. USA 100, 9096–9101 (2003).
Article PubMed PubMed Central Google Scholar
Best, C. T., McRoberts, G. W., LaFleur, R. & Silver-Isenstadt, J. Divergent developmental patterns for infants’ perception of two nonnative consonant contrasts. Infant. Behav. Dev. 18, 339–350 (1995).
Article Google Scholar
Kelly, D. J. et al. The other-race effect develops during infancy: evidence of perceptual narrowing. Psychol. Sci. 18, 1084–1089 (2007).
Article PubMed Google Scholar
McLaughlin, K. A., Sheridan, M. A. & Lambert, H. K. Childhood adversity and neural development: deprivation and threat as distinct dimensions of early experience. Neurosci. Biobehav. Rev. 47, 578–591 (2014).
Article PubMed PubMed Central Google Scholar
Ellis, B. J., Sheridan, M. A., Belsky, J. & McLaughlin, K. A. Why and how does early adversity influence development? Toward an integrated model of dimensions of environmental experience. Dev. Psychopathol. 34, 447–471 (2022).
Article PubMed Google Scholar
Mehta, M. A. et al. Hyporesponsive reward anticipation in the basal ganglia following severe institutional deprivation early in life. J. Cogn. Neurosci. 22, 2316–2325 (2010).
Article PubMed Google Scholar
Hanson, J. L. et al. Behavioral problems after early life stress: contributions of the hippocampus and amygdala. Biol. Psychiatry 77, 314–323 (2015).
Article PubMed Google Scholar
Dillon, D. G. et al. Childhood adversity is associated with left basal ganglia dysfunction during reward anticipation in adulthood. Biol. Psychiatry 66, 206–213 (2009).
Article PubMed PubMed Central Google Scholar
Park, A. T. et al. Early childhood stress is associated with blunted development of ventral tegmental area functional connectivity. Dev. Cogn. Neurosci. 47, 100909 (2021).
Article PubMed Google Scholar
Marusak, H. A., Hatfield, J. R. B., Thomason, M. E. & Rabinak, C. A. Reduced ventral tegmental area–hippocampal connectivity in children and adolescents exposed to early threat. Biol. Psychiatry Cognit. Neurosci. Neuroimaging 2, 130–137 (2017).
Article Google Scholar
Fareri, D. S. et al. Altered ventral striatal-medial prefrontal cortex resting-state connectivity mediates adolescent social problems after early institutional care. Dev. Psychopathol. 29, 1865–1876 (2017).
Article PubMed PubMed Central Google Scholar
Evans, G. W., Li, D. & Whipple, S. S. Cumulative risk and child development. Psychol. Bull. 139, 1342–1396 (2013).
Article PubMed Google Scholar
Ellis, B. J., Bianchi, J., Griskevicius, V. & Frankenhuis, W. E. Beyond risk and protective factors: an adaptation-based approach to resilience. Perspect. Psychol. Sci. 12, 561–587 (2017).
Article PubMed Google Scholar
Frankenhuis, W. E., Panchanathan, K. & Nettle, D. Cognition in harsh and unpredictable environments. Curr. Opin. Psychol. 7, 76–80 (2016).
Article Google Scholar
Ellwood-Lowe, M. E., Whitfield-Gabrieli, S. & Bunge, S. A. Brain network coupling associated with cognitive performance varies as a function of a child’s environment in the ABCD study. Nat. Commun. 12, 7183 (2021).
Article PubMed PubMed Central Google Scholar
Amso, D. Neighborhood poverty and brain development: adaptation or maturation, fixed or reversible? JAMA Netw. Open 3, e2024139 (2020).
Article PubMed Google Scholar
Burk, D. C. & Averbeck, B. B. Environmental uncertainty and the advantage of impulsive choice strategies. PLoS Comput. Biol. 19, e1010873 (2023).
Article PubMed PubMed Central Google Scholar
Frankenhuis, W. E. & Gopnik, A. Early adversity and the development of explore-exploit tradeoffs. Trends Cogn. Sci. 27, 616–630 (2023).
Article PubMed Google Scholar
Santarelli, S. et al. Evidence supporting the match/mismatch hypothesis of psychiatric disorders. Eur. Neuropsychopharmacol. 24, 907–918 (2014).
Article PubMed Google Scholar
Schmidt, M. V. Animal models for depression and the mismatch hypothesis of disease. Psychoneuroendocrinology 36, 330–338 (2011).
Article PubMed Google Scholar
Humphreys, K. L. et al. Exploration-exploitation strategy is dependent on early experience. Dev. Psychobiol. 57, 313–321 (2015).
Article PubMed PubMed Central Google Scholar
Harms, M. B., Shannon Bowen, K. E., Hanson, J. L. & Pollak, S. D. Instrumental learning and cognitive flexibility processes are impaired in children exposed to early life stress. Dev. Sci. 21, e12596 (2018).
Article PubMed Google Scholar
Hanson, J. L. et al. Early adversity and learning: implications for typical and atypical behavioral development. J. Child Psychol. Psychiatry 58, 770–778 (2017).
Article PubMed PubMed Central Google Scholar
Lloyd, A., McKay, R., Sebastian, C. L. & Balsters, J. H. Are adolescents more optimal decision-makers in novel environments? Examining the benefits of heightened exploration in a patch foraging paradigm. Dev. Sci. 24, e13075 (2021).
Article PubMed Google Scholar
Kamkar, N. H., Lewis, D. J., van den Bos, W. & Morton, J. B. Ventral striatal activity links adversity and reward processing in children. Dev. Cogn. Neurosci. 26, 20–27 (2017).
Article PubMed PubMed Central Google Scholar
Smith, K. E. & Pollak, S. D. Early life stress and perceived social isolation influence how children use value information to guide behavior. Child Dev. 93, 804–814 (2022).
Article PubMed Google Scholar
Gerin, M. I. et al. A neurocomputational investigation of reinforcement-based decision making as a candidate latent vulnerability mechanism in maltreated children. Dev. Psychopathol. 29, 1689–1705 (2017).
Article PubMed Google Scholar
Zador, A. M. A critique of pure learning and what artificial neural networks can learn from animal brains. Nat. Commun. 10, 3770 (2019).
Article PubMed PubMed Central Google Scholar
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Article PubMed Google Scholar
Harhen, N. C. & Bornstein, A. M. Interval timing as a computational pathway from early life adversity to affective disorders. Top. Cogn. Sci. 16, 92–112 (2024).
Article PubMed Google Scholar
Saxe, A. M., McClelland, J. L. & Ganguli, S. A mathematical theory of semantic development in deep neural networks. Proc. Natl Acad. Sci. USA 116, 11537–11546 (2019).
Article PubMed PubMed Central Google Scholar
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Article Google Scholar
Andrychowicz, M. et al. Learning to learn by gradient descent by gradient descent. Adv. Neural Inf. Process. Syst. 29, 3988–3996 (2016).
Google Scholar
Bechtle, S. et al. Meta-learning via learned loss. In Proc. IEEE International Conference on Pattern Recognition https://doi.org/10.1109/ICPR48806.2021.9412010 (ICPR, 2021).
Sutton, R. S. Adapting bias by gradient descent: an incremental version of delta-bar-delta. AAAI 92, 171–176 (1992).
Google Scholar
Nichol, A., Achiam, J. & Schulman, J. On first-order meta-learning algorithms. Preprint at https://doi.org/10.48550/arXiv.1803.02999 (2018).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Article PubMed Google Scholar
Xu, Z. et al. Meta-gradient reinforcement learning with an objective discovered online. Adv. Neural Inf. Proc. Syst. 33, 15254–15264 (2020).
Google Scholar
Ritter, S., Wang, J. X., Kurth-Nelson, Z. & Botvinick, M. Episodic control as meta-reinforcement learning. Preprint at bioRxiv https://doi.org/10.1101/360537 (2018).
Hattori, R. et al. Meta-reinforcement learning via orbitofrontal cortex. Nat. Neurosci. 26, 2182–2191 (2023).
Article PubMed PubMed Central Google Scholar
You, K., Long, M., Wang, J. & Jordan, M. I. How does learning rate decay help modern neural networks? Preprint at https://doi.org/10.48550/arXiv.1908.01878 (2019).
Frankenhuis, W. E. & Walasek, N. Modeling the evolution of sensitive periods. Dev. Cogn. Neurosci. 41, 100715 (2020).
Article PubMed Google Scholar
Xu, Z., van Hasselt, H. & Silver, D. Meta-gradient reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.1805.09801 (2018).
Zahavy, T. et al. A self-tuning actor-critic algorithm. Adv. Neural Inf. Process. Syst. 33, 20913–20924 (2020).
Google Scholar
Zheng, Z., Oh, J. & Satinder, S. On learning intrinsic rewards for policy gradient methods. Preprint at https://doi.org/10.48550/arXiv.1804.06459 (2018).
Sanders, B. & Becker-Lausen, E. The measurement of psychological maltreatment: early data on the Child Abuse and Trauma Scale. Child Abuse Negl. 19, 315–323 (1995).
Article PubMed Google Scholar
Rudolph, K. D. et al. Toward an interpersonal life-stress model of depression: the developmental context of stress generation. Dev. Psychopathol. 12, 215–234 (2000).
Article PubMed Google Scholar
Young, E. S., Frankenhuis, W. E. & Ellis, B. J. Theory and measurement of environmental unpredictability. Evol. Hum. Behav. 41, 550–556 (2020).
Article Google Scholar
Roy, D. et al. in Symbol Grounding and Beyond (eds. Vogt, P., Sugita, Y., Tuci, E. & Nehaniv, C.) 192–196 (Springer, 2006).
Sullivan, J., Mei, M., Perfors, A., Wojcik, E. & Frank, M. C. SAYCam: a large, longitudinal audiovisual dataset recorded from the infant’s perspective. Open Mind 5, 20–29 (2021).
Article PubMed PubMed Central Google Scholar
Ugarte, E. & Hastings, P. Assessing unpredictability in caregiver-child relationships: insights from theoretical and empirical perspectives. Dev. Psychopathol. https://doi.org/10.1017/S0954579423000305 (2022).
Tamis-LeMonda, C. S., Kuchirko, Y. & Song, L. Why is infant language learning facilitated by parental responsiveness? Curr. Dir. Psychol. Sci. 23, 121–126 (2014).
Article Google Scholar
Ainsworth, M. D. S., Bell, S. M. & Stayton, D. F. in The Integration of a Child into a Social World (ed. Richards, M. P. M.) 316, 99–135 (Cambridge Univ. Press, 1974).
Csikszentmihalyi, M., Larson, R. & Prescott, S. The ecology of adolescent activity and experience. J. Youth Adolesc. 6, 281–294 (1977).
Article PubMed Google Scholar
Russell, M. A. & Gajos, J. M. Annual research review: ecological momentary assessment studies in child psychology and psychiatry. J. Child Psychol. Psychiatry 61, 376–394 (2020).
Heller, A. S. et al. Association between real-world experiential diversity and positive affect relates to hippocampal–striatal functional connectivity. Nat. Neurosci. 23, 800–804 (2020).
Saragosa-Harris, N. M. et al. Real-world exploration increases across adolescence and relates to affect, risk taking, and social connectivity. Psychol. Sci. 33, 1664–1679 (2022).
Bath, K., Manzano-Nieves, G. & Goodwill, H. Early life stress accelerates behavioral and neural maturation of the hippocampus in male mice. Horm. Behav. 82, 64–71 (2016).
Rice, C. J., Sandman, C. A., Lenjavi, M. R. & Baram, T. Z. A novel mouse model for acute and long-lasting consequences of early life stress. Endocrinology 149, 4892–4900 (2008).
Ivy, A. S., Brunson, K. L., Sandman, C. & Baram, T. Z. Dysfunctional nurturing behavior in rat dams with limited access to nesting material: a clinically relevant model for early-life stress. Neuroscience 154, 1132–1142 (2008).
Goodkin, F. Rats learn the relationship between responding and environmental events: an expansion of the learned helplessness hypothesis. Learn. Motiv. 7, 382–393 (1976).
Overmier, J. B., Patterson, J. & Wielkiewicz, R. M. in Coping and Health (eds Levine, S. & Ursin, H.) 1–38 (Springer, 1980).
Powell, S. B., Newman, H. A., McDonald, T. A., Bugenhagen, P. & Lewis, M. H. Development of spontaneous stereotyped behavior in deer mice: effects of early and late exposure to a more complex environment. Dev. Psychobiol. 37, 100–108 (2000).
Marques, J. M. & Olsson, I. A. S. The effect of preweaning and postweaning housing on the behaviour of the laboratory mouse (Mus musculus). Lab. Anim. 41, 92–102 (2007).
Ivy, A. S. et al. Hippocampal dysfunction and cognitive impairments provoked by chronic early-life stress involve excessive activation of CRH receptors. J. Neurosci. 30, 13005–13015 (2010).
Moriceau, S., Shionoya, K., Jakubs, K. & Sullivan, R. M. Early-life stress disrupts attachment learning: the role of amygdala corticosterone, locus ceruleus corticotropin releasing hormone, and olfactory bulb norepinephrine. J. Neurosci. 29, 15745–15755 (2009).
Hartley, C. A., Nussenbaum, K. & Cohen, A. O. Interactive development of adaptive learning and memory. Annu. Rev. Dev. Psychol. 3, 59–85 (2021).
Zeng, Z., Pantic, M., Roisman, G. I. & Huang, T. S. A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31, 39–58 (2009).
Belo, J. P. R., Azevedo, H., Ramos, J. J. G. & Romero, R. A. F. Deep Q-network for social robotics using emotional social signals. Front. Robot. AI 9, 880547 (2022).
Qureshi, A. H., Nakamura, Y., Yoshikawa, Y. & Ishiguro, H. Intrinsically motivated reinforcement learning for human–robot interaction in the real-world. Neural Netw. 107, 23–33 (2018).
Kuhn, D. A developmental model of critical thinking. Educ. Res. 28, 16–46 (1999).
Kuhn, D. Education for Thinking (Harvard Univ. Press, 2005).
Joshi, S., Li, Y., Kalwani, R. M. & Gold, J. I. Relationships between pupil diameter and neuronal activity in the locus coeruleus, colliculi, and cingulate cortex. Neuron 89, 221–234 (2016).
Murphy, P. R., O’Connell, R. G., O’Sullivan, M., Robertson, I. H. & Balsters, J. H. Pupil diameter covaries with BOLD activity in human locus coeruleus. Hum. Brain Mapp. 35, 4140–4154 (2014).
Reimer, J. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat. Commun. 7, 13289 (2016).
Bouret, S. & Sara, S. J. Network reset: a simplified overarching theory of locus coeruleus noradrenaline function. Trends Neurosci. 28, 574–582 (2005).
Cook, J. L. et al. Catecholaminergic modulation of meta-learning. eLife 8, e51439 (2019).
Newcombe, N. S. What is neoconstructivism? Child Dev. Perspect. 5, 157–160 (2011).
Newcombe, N. S. Cognitive development: changing views of cognitive change. Wiley Interdiscip. Rev. Cogn. Sci. 4, 479–491 (2013).
Westermann, G. et al. Neuroconstructivism. Dev. Sci. 10, 75–83 (2007).
Karmiloff-Smith, A. Beyond Modularity: A Developmental Perspective on Cognitive Science (MIT Press, 1995).
Johnson, M. H. Functional brain development in infants: elements of an interactive specialization framework. Child Dev. 71, 75–81 (2000).
Westermann, G., Sirois, S., Shultz, T. R. & Mareschal, D. Modeling developmental cognitive neuroscience. Trends Cogn. Sci. 10, 227–232 (2006).
Mareschal, D. & Shultz, T. R. Generative connectionist networks and constructivist cognitive development. Cogn. Dev. 11, 571–603 (1996).
Astle, D. E., Johnson, M. H. & Akarca, D. Toward computational neuroconstructivism: a framework for developmental systems neuroscience. Trends Cogn. Sci. 27, 726–744 (2023).
Elman, J. L. Learning and development in neural networks: the importance of starting small. Cognition 48, 71–99 (1993).
Munakata, Y. & McClelland, J. L. Connectionist models of development. Dev. Sci. 6, 413–429 (2003).
Fahlman, S. E. The recurrent cascade-correlation architecture. Adv. Neural Inf. Process. Syst. 3, 190–196 (1990).
Mata, R., Josef, A. K. & Hertwig, R. Propensity for risk taking across the life span and around the globe. Psychol. Sci. 27, 231–243 (2016).
Falk, A. et al. Global evidence on economic preferences. Q. J. Econ. 133, 1645–1692 (2018).
Kidd, C., Palmeri, H. & Aslin, R. N. Rational snacking: young children’s decision-making on the marshmallow task is moderated by beliefs about environmental reliability. Cognition 126, 109–114 (2013).
Yanaoka, K. et al. Cultures crossing: the power of habit in delaying gratification. Psychol. Sci. 33, 1172–1181 (2022).
Amir, D. et al. The developmental origins of risk and time preferences across diverse societies. J. Exp. Psychol. Gen. 149, 650–661 (2020).
Amir, D. & Jordan, M. R. The behavioral constellation of deprivation may be best understood as risk management. Behav. Brain Sci. 40, e316 (2017).
Abebe, T. Reconceptualising children’s agency as continuum and interdependence. Soc. Sci. 8, 81 (2019).
Henrich, J., Heine, S. J. & Norenzayan, A. The weirdest people in the world? Behav. Brain Sci. 33, 61–83 (2010).
Nielsen, M., Haun, D., Kärtner, J. & Legare, C. H. The persistent sampling bias in developmental psychology: a call to action. J. Exp. Child Psychol. 162, 31–38 (2017).
Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. How to grow a mind: statistics, structure, and abstraction. Science 331, 1279–1285 (2011).
Wellman, H. M. & Gelman, S. A. Cognitive development: foundational theories of core domains. Annu. Rev. Psychol. 43, 337–375 (1992).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
Nettle, D., Frankenhuis, W. E. & Rickard, I. J. The evolution of predictive adaptive responses in human life history. Proc. Biol. Sci. 280, 20131343 (2013).
Gogtay, N. et al. Dynamic mapping of human cortical development during childhood through early adulthood. Proc. Natl Acad. Sci. USA 101, 8174–8179 (2004).
Averbeck, B. B. Pruning recurrent neural networks replicates adolescent changes in working memory and reinforcement learning. Proc. Natl Acad. Sci. USA 119, e2121331119 (2022).
Ajemian, R., D’Ausilio, A., Moorman, H. & Bizzi, E. A theory for how sensorimotor skills are learned and retained in noisy and nonstationary neural circuits. Proc. Natl Acad. Sci. USA 110, E5078–E5087 (2013).
Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Findling, C. & Wyart, V. Computation noise promotes cognitive resilience to adverse conditions during decision-making. Preprint at bioRxiv https://doi.org/10.1101/2020.06.10.145300 (2020).
Plappert, M. et al. Parameter space noise for exploration. Preprint at arXiv https://doi.org/10.48550/arXiv.1706.01905 (2017).
Fortunato, M. et al. Noisy networks for exploration. In Proc. International Conference on Learning Representations (ICLR) (2018).
McIntosh, A. R. et al. The development of a noisy brain. Arch. Ital. Biol. 148, 323–337 (2010).
Smith, L. B., Jayaraman, S., Clerkin, E. & Yu, C. The developing infant creates a curriculum for statistical learning. Trends Cogn. Sci. 22, 325–336 (2018).
Kidd, C. & Hayden, B. Y. The psychology and neuroscience of curiosity. Neuron 88, 449–460 (2015).
Gottlieb, J., Oudeyer, P.-Y., Lopes, M. & Baranes, A. Information-seeking, curiosity, and attention: computational and neural mechanisms. Trends Cogn. Sci. 17, 585–593 (2013).
Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proc. 26th Annual International Conference on Machine Learning 41–48 (Association for Computing Machinery, 2009).
Oudeyer, P.-Y. & Kaplan, F. What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1, 6 (2007).
Forestier, S., Mollard, Y. & Oudeyer, P.-Y. Intrinsically motivated goal exploration processes with automatic curriculum learning. J. Mach. Learn. Res. 23, 1–41 (2022).

Acknowledgements

The authors thank Bruno Averbeck, Rheza Budiono, Nathaniel Daw, Nora Harhen and Akshay Jagadish for helpful feedback on the manuscript. Preparation of this manuscript was supported by the C.V. Starr Fellowship (to K.N.).

Author information

Authors and Affiliations

Department of Psychology, New York University, New York, NY, USA
Kate Nussenbaum & Catherine A. Hartley
Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
Kate Nussenbaum
Center for Neural Science, New York University, New York, NY, USA
Catherine A. Hartley

Authors

Kate Nussenbaum
Catherine A. Hartley

Contributions

All authors conceptualized the article content. K.N. wrote the article draft and all authors edited the manuscript before submission.

Corresponding authors

Correspondence to Kate Nussenbaum or Catherine A. Hartley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Psychology thanks Dorsa Amir, who co-reviewed with Annya Dahmani; Jane Wang; and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nussenbaum, K., Hartley, C.A. Understanding the development of reward learning through the lens of meta-learning. Nat Rev Psychol (2024). https://doi.org/10.1038/s44159-024-00304-1

Accepted: 21 March 2024
Published: 18 April 2024
DOI: https://doi.org/10.1038/s44159-024-00304-1