Perspective

Backpropagation and the brain

Abstract

During learning, the brain modifies synapses to improve behaviour. In the cortex, synapses are embedded within multilayered networks, making it difficult to determine the effect of an individual synaptic modification on the behaviour of the system. The backpropagation algorithm solves this problem in deep artificial neural networks, but historically it has been viewed as biologically problematic. Nonetheless, recent developments in neuroscience and the successes of artificial neural networks have reinvigorated interest in whether backpropagation offers insights for understanding learning in the cortex. The backpropagation algorithm learns quickly by computing synaptic updates using feedback connections to deliver error signals. Although feedback connections are ubiquitous in the cortex, it is difficult to see how they could deliver the error signals required by strict formulations of backpropagation. Here we build on past and recent developments to argue that feedback connections may instead induce neural activities whose differences can be used to locally approximate these signals and hence drive effective learning in deep networks in the brain.
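
The closing claim of the abstract, that differences in feedback-induced activity can stand in for explicit error signals, can be made concrete in a toy network. The sketch below is ours rather than the authors': the linear hidden layer, the mirrored feedback pathway and the nudging factor eta are all simplifying assumptions. It shows that the difference between feedforward hidden activity and feedback-nudged hidden activity recovers exactly the error signal that backpropagation would deliver.

```python
# Toy illustration (ours, not the authors'): an activity difference induced
# by feedback recovers the backprop error signal in a linear hidden layer.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (8, 4))   # forward weights, input -> hidden
W2 = rng.normal(0.0, 0.5, (2, 8))   # forward weights, hidden -> output

x = rng.normal(size=4)              # input pattern
t = rng.normal(size=2)              # target output

h = W1 @ x                          # feedforward hidden activity
y = W2 @ h                          # network output
e = y - t                           # output error (squared-error gradient)

delta_bp = W2.T @ e                 # error signal prescribed by backprop

# Assume feedback weights mirror W2 and nudge the hidden layer towards a
# slightly better activity 'target' (eta is an assumed nudging factor).
eta = 0.1
h_target = h - eta * (W2.T @ e)

# The local difference between the two activity states recovers the error:
delta_diff = (h - h_target) / eta
assert np.allclose(delta_bp, delta_diff)

# Either signal drives the same local, Hebbian-like weight update:
W1 -= 0.01 * np.outer(delta_diff, x)
```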

Fig. 1: A spectrum of learning algorithms.
Fig. 2: Comparison of backprop-trained networks with neural responses in visual ventral cortex.
Fig. 3: Target propagation algorithms.
Fig. 4: Empirical findings suggest new ideas for how backprop-like learning might be approximated by the brain.

Author information

Contributions

T.P.L. and A.S. contributed equally to this work. T.P.L., G.H. and A.S. researched data for the article, and T.P.L., G.H., C.J.A. and A.S. wrote it. All of the authors contributed substantially to discussion of the content, and all reviewed and edited the manuscript before submission.

Corresponding authors

Correspondence to Timothy P. Lillicrap or Geoffrey Hinton.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Neuroscience thanks Y. Amit, J. DiCarlo, W. Senn and T. Toyoizumi for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Auto-encoders

Networks trained by unsupervised learning in which the target is the input itself. One application of auto-encoding is training feedback connections to coherently carry ‘targets’ to earlier layers.
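
As a minimal sketch of this idea (the toy data, network sizes and learning rate below are our assumptions, not the article's), an auto-encoder can be trained with plain gradient descent to reproduce its own input:

```python
# Minimal auto-encoder sketch: the target is the input itself.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))          # toy data: 100 samples, 6 features
W_enc = rng.normal(0.0, 0.3, (3, 6))   # encoder: 6 features -> 3-unit code
W_dec = rng.normal(0.0, 0.3, (6, 3))   # decoder: code -> reconstruction

lr = 0.01
for _ in range(200):                   # a few passes over the data
    for x in X:
        z = np.tanh(W_enc @ x)         # compressed code
        x_hat = W_dec @ z              # reconstruction
        e = x_hat - x                  # error against the 'target', i.e. x
        W_dec -= lr * np.outer(e, z)
        delta = (W_dec.T @ e) * (1.0 - z**2)
        W_enc -= lr * np.outer(delta, x)
```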

Backpropagation of error (backprop)

An algorithm for explicitly computing the changes to prescribe to synapses in deep networks in order to improve performance. It involves the flow of error signals through feedback connections from the output of the network towards the input.
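
In standard notation (ours, not the article's), for layer activations $h^{(l)} = f(a^{(l)})$ with pre-activations $a^{(l)} = W^{(l)} h^{(l-1)}$ and error function $E$, the error signals flow backward recursively from the output layer $L$, and each weight change is local once its layer's signal has arrived:

$$\delta^{(L)} = \frac{\partial E}{\partial a^{(L)}}, \qquad \delta^{(l)} = \big(W^{(l+1)}\big)^{\top} \delta^{(l+1)} \odot f'\big(a^{(l)}\big), \qquad \Delta W^{(l)} \propto -\,\delta^{(l)} \big(h^{(l-1)}\big)^{\top}.$$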

Credit assignment

Determination of the degree to which a particular parameter, such as a synaptic weight, contributes to the magnitude of the error signal.

Deep learning

Learning in networks that consist of hierarchical stacks, or layers, of neurons. Deep learning is especially challenging because credit must be assigned to a vast number of synapses situated deep within the network.

Error function

An explicit quantitative measure for determining the quality of a network’s output. It is also frequently called a loss or objective function.
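
Two common choices, sketched here for illustration (the vector-valued output and one-hot target are assumptions):

```python
# Illustrative error (loss) functions for a network output and its target.
import numpy as np

def mse(y, t):
    """Mean squared error: penalizes deviation of output y from target t."""
    return 0.5 * np.mean((y - t) ** 2)

def cross_entropy(p, t):
    """Cross-entropy for predicted probabilities p and a one-hot target t."""
    return -np.sum(t * np.log(p + 1e-12))   # small constant avoids log(0)
```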

Error signals

Contribution to the error by the activities of neurons situated closer to the output. In backpropagation, these signals are sent backward through the network in order to inform learning.

ImageNet

A large dataset of images with their corresponding word labels. The task associated with the dataset is to guess the correct label for each image. ImageNet has become a de facto standard for measuring the strength of deep-learning algorithms and architectures.

Internal representations

Hidden activity of a network that represents the network’s input data. ‘Useful’ representations tend to be those that efficiently code for redundant features of the input data and lead to good generalization, such as the existence of oriented edges in handwritten digits.

Learning

The modification of network parameters, such as synaptic weights, to enable better performance according to some measure, such as an error function.

Reinforcement learning

Learning in an interactive trial-and-error loop, whereby an agent acts stochastically in an environment and uses the correlations between actions and the accumulated scalar rewards to improve performance.
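
A minimal sketch of this loop (a two-armed bandit with a REINFORCE-style update; the payoff probabilities and learning rate are our assumptions):

```python
# REINFORCE-style sketch: stochastic actions, scalar rewards, no error signal.
import numpy as np

rng = np.random.default_rng(2)
payoff = np.array([0.2, 0.8])     # assumed reward probability for each arm
logits = np.zeros(2)              # learnable action preferences
lr = 0.1

for _ in range(2000):
    p = np.exp(logits) / np.exp(logits).sum()   # softmax policy
    a = rng.choice(2, p=p)                      # act stochastically
    r = float(rng.random() < payoff[a])         # scalar reward
    grad = -p                                   # d log p(a) / d logits ...
    grad[a] += 1.0                              # ... equals 1{k=a} - p_k
    logits += lr * r * grad                     # reinforce rewarded actions
```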

Supervised learning

Learning in which the error function involves an explicit target. The target tends to contain information that is unavailable to the network, such as ground truth labels.

Target

The desired output of a network, given some input. Deviation from the target is quantified with an error function.

Unsupervised learning

Learning in which the error function does not involve a separate output target. Instead, errors are computed using other information readily available to the network, such as the input itself or the next observation in a sequence.

Weights

Network parameters that determine the strength of neuron–neuron connections. A presynaptic neuron connected to a postsynaptic neuron by a large weight strongly influences the postsynaptic neuron’s activity, whereas a small weight confers little influence.
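
As a toy illustration (the rate-based abstraction and the numbers are assumptions):

```python
# A postsynaptic neuron's drive is a weighted sum of presynaptic activity.
import numpy as np

pre = np.array([1.0, 0.5, 0.0])   # presynaptic firing rates
w = np.array([2.0, 0.1, 1.5])     # synaptic weights
post = np.tanh(w @ pre)           # the high-weight input dominates the drive
```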

About this article

Cite this article

Lillicrap, T.P., Santoro, A., Marris, L. et al. Backpropagation and the brain. Nat Rev Neurosci 21, 335–346 (2020). https://doi.org/10.1038/s41583-020-0277-3
