Task representations in neural networks trained to perform many cognitive tasks

Yang, Guangyu Robert; Joglekar, Madhura R.; Song, H. Francis; Newsome, William T.; Wang, Xiao-Jing

doi:10.1038/s41593-018-0310-2

Article
Published: 14 January 2019

Nature Neuroscience volume 22, pages 297–306 (2019)

Abstract

The brain has the ability to flexibly perform many tasks, but the underlying mechanism cannot be elucidated in traditional experimental and modeling studies designed for one task at a time. Here, we trained single network models to perform 20 cognitive tasks that depend on working memory, decision making, categorization, and inhibitory control. We found that after training, recurrent units can develop into clusters that are functionally specialized for different cognitive processes, and we introduce a simple yet effective measure to quantify relationships between single-unit neural representations of tasks. Learning often gives rise to compositionality of task representations, a critical feature for cognitive flexibility, whereby one task can be performed by recombining instructions for other tasks. Finally, networks developed mixed task selectivity similar to recorded prefrontal neurons after learning multiple tasks sequentially with a continual-learning technique. This work provides a computational platform to investigate neural representations of many cognitive tasks.

**Fig. 1: A recurrent neural network model is trained to perform a large number of cognitive tasks.**

**Fig. 2: The emergence of functionally specialized clusters for task representation.**

**Fig. 3: The activation function dictates whether clusters emerge in a network.**

**Fig. 4: A diversity of neural relationships between pairs of tasks.**

**Fig. 5: Dissecting a reference network for the context-dependent DM tasks.**

**Fig. 6: Compositional representation of tasks in state space.**

**Fig. 7: Performing tasks with algebraically composite rule inputs.**

**Fig. 8: Sequential training of cognitive tasks.**

Code availability

All training and analysis codes are available on GitHub (https://github.com/gyyang/multitask).

Data availability

Data files for all trained models are available on GitHub in Python- and MATLAB-readable formats for further analysis (https://github.com/gyyang/multitask).

References

1. Fuster, J. The Prefrontal Cortex (Academic Press, Cambridge, 2015).
2. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
3. Wang, X.-J. in Principles of Frontal Lobe Function (Stuss, D. T. & Knight, R. T. eds.) (Cambridge Univ. Press, New York, 2013).
4. Wallis, J. D., Anderson, K. C. & Miller, E. K. Single neurons in prefrontal cortex encode abstract rules. Nature 411, 953–956 (2001).
5. Sakai, K. Task set and prefrontal cortex. Annu. Rev. Neurosci. 31, 219–245 (2008).
6. Cole, M. W., Etzel, J. A., Zacks, J. M., Schneider, W. & Braver, T. S. Rapid transfer of abstract rules to novel contexts in human lateral prefrontal cortex. Front. Hum. Neurosci. 5, 142 (2011).
7. Tschentscher, N., Mitchell, D. & Duncan, J. Fluid intelligence predicts novel rule implementation in a distributed frontoparietal control network. J. Neurosci. 37, 4841–4847 (2017).
8. Hanes, D. P., Patterson, W. F. II & Schall, J. D. Role of frontal eye fields in countermanding saccades: visual, movement, and fixation activity. J. Neurophysiol. 79, 817–834 (1998).
9. Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
10. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
11. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
12. Cole, M. W., Laurent, P. & Stocco, A. Rapid instructed task learning: a new window into the human brain’s unique capacity for flexible cognitive control. Cogn. Affect. Behav. Neurosci. 13, 1–22 (2013).
13. Reverberi, C., Görgen, K. & Haynes, J.-D. Compositionality of rule representations in human prefrontal cortex. Cereb. Cortex 22, 1237–1246 (2012).
14. Zipser, D. & Andersen, R. A. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331, 679–684 (1988).
15. Song, H. F., Yang, G. R. & Wang, X.-J. Training excitatory-inhibitory recurrent neural networks for cognitive tasks: a simple and flexible framework. PLoS Comput. Biol. 12, e1004792 (2016).
16. Carnevale, F., de Lafuente, V., Romo, R., Barak, O. & Parga, N. Dynamic control of response criterion in premotor cortex during perceptual detection under temporal uncertainty. Neuron 86, 1067–1077 (2015).
17. Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
18. Chaisangmongkon, W., Swaminathan, S. K., Freedman, D. J. & Wang, X.-J. Computing by robust transience: how the fronto-parietal network performs sequential, category-based decisions. Neuron 93, 1504–1517 (2017).
19. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).
20. Funahashi, S., Bruce, C. J. & Goldman-Rakic, P. S. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61, 331–349 (1989).
21. Gold, J. I. & Shadlen, M. N. The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574 (2007).
22. Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science 348, 1352–1355 (2015).
23. Raposo, D., Kaufman, M. T. & Churchland, A. K. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci. 17, 1784–1792 (2014).
24. Romo, R., Brody, C. D., Hernández, A. & Lemus, L. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399, 470–473 (1999).
25. Munoz, D. P. & Everling, S. Look away: the anti-saccade task and the voluntary control of eye movement. Nat. Rev. Neurosci. 5, 218–228 (2004).
26. Miller, E. K., Erickson, C. A. & Desimone, R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci. 16, 5154–5167 (1996).
27. Freedman, D. J. & Assad, J. A. Neuronal mechanisms of visual categorization: an abstract view on decision making. Annu. Rev. Neurosci. 39, 129–147 (2016).
28. Priebe, N. J. & Ferster, D. Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron 57, 482–497 (2008).
29. Abbott, L. F. & Chance, F. S. Drivers and modulators from push-pull and balanced synaptic input. Prog. Brain. Res. 149, 147–155 (2005).
30. Wang, X.-J. Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36, 955–968 (2002).
31. Sussillo, D. & Barak, O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 25, 626–649 (2013).
32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural. Inf. Process. Syst. 26, 3111–3119 (2013).
33. Benna, M. K. & Fusi, S. Computational principles of synaptic memory consolidation. Nat. Neurosci. 19, 1697–1706 (2016).
34. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).
35. Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. ICML 70, 3987–3995 (2017).
36. Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. USA 107, 11163–11170 (2010).
37. Rigotti, M., Ben Dayan Rubin, D., Wang, X.-J. & Fusi, S. Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front. Comput. Neurosci. 4, 24 (2010).
38. Cole, M. W. et al. Multi-task connectivity reveals flexible hubs for adaptive task control. Nat. Neurosci. 16, 1348–1355 (2013).
39. Yang, G. R., Ganichev, I., Wang, X.-J., Shlens, J. & Sussillo, D. A dataset and architecture for visual reasoning with a working memory. ECCV 714–731 (2018).
40. Lake, B. M. & Baroni, M. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. ICML 80, 2873–2882 (2017).
41. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
42. Song, H. F., Yang, G. R. & Wang, X.-J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).
43. Kingma, D. & Ba, J. Adam: A method for stochastic optimization. ICLR (2015).
44. Le, Q. V., Jaitly, N. & Hinton, G. E. A simple way to initialize recurrent networks of rectified linear units. Preprint at arXiv https://arxiv.org/abs/1504.00941 (2015).

Acknowledgements

We thank current and former members of the Wang lab, especially S.Y. Li, O. Marschall, and E. Ohran for fruitful discussions; J.A. Li, J.D. Murray, D. Ehrlich, and J. Jaramillo for critical comments on the manuscript; and S. Wang for assistance with the NYU HPC clusters. We are grateful to V. Mante for providing data and for discussion. This work was supported by an Office of Naval Research grant no. N00014-13-1-0297, a National Science Foundation grant no. 16-31586, a Google Computational Neuroscience Grant (X.J.W.), a Samuel J. and Joan B. Williamson Fellowship, a National Science Foundation Grant Number 1707398, and the Gatsby Charitable Foundation (G.R.Y.).

Author information

Madhura R. Joglekar
Present address: Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
H. Francis Song
Present address: DeepMind, London, UK

Authors and Affiliations

Center for Neural Science, New York University, New York, NY, USA
Guangyu Robert Yang, Madhura R. Joglekar, H. Francis Song & Xiao-Jing Wang
Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY, USA
Guangyu Robert Yang
Department of Neurobiology, Stanford University, Stanford, CA, USA
William T. Newsome
Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
William T. Newsome
Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
Xiao-Jing Wang

Contributions

G.R.Y. and X.J.W. designed the study. G.R.Y., M.R.J., H.F.S., W.T.N., and X.J.W. had frequent discussions. G.R.Y. and M.R.J. performed the research. G.R.Y., H.F.S., W.T.N., and X.J.W. wrote the manuscript.

Corresponding author

Correspondence to Xiao-Jing Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Sample trials from the 20 tasks trained.

(a) The convention is the same as in Fig. 1a. Output activities are obtained from a sample network after training. Green lines are the target activities for the fixation output unit.

Supplementary Figure 2 Psychometric tests for a range of tasks.

(a) Decision-making performance improves with longer stimulus presentation time and stronger stimulus coherence in the DM 1 task in a sample reference network. (b) Discrimination thresholds decrease with longer stimulus presentation time in the DM 1 task. The discrimination thresholds are estimated by fitting cumulative Weibull functions. (c-f) Same analyses as (a,b) for the Ctx DM 1 (c,d) and MultSen DM (e,f) tasks. In all n = 20 independent networks studied, performance improves with longer stimulus presentation time. However, in many networks the improvement differs from that expected of perfect integration (red line). This variation has no impact on other results. (g) A sample network is able to perform well above chance in the Dly DM 1 task for a delay period of up to five seconds.
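
As a rough illustration of the threshold estimate in (b), the sketch below fits a cumulative Weibull function to accuracy-versus-coherence data. The parameterization (chance level 0.5, threshold `alpha`, slope `beta`), the grid-search fit, and the synthetic data are all assumptions made for this example; the caption does not specify them, and the authors' code may differ.

```python
import numpy as np

def weibull(coherence, alpha, beta):
    """Cumulative Weibull for a two-choice task (assumed form):
    0.5 at zero coherence, approaching 1 at high coherence."""
    return 1.0 - 0.5 * np.exp(-(coherence / alpha) ** beta)

def fit_threshold(coherence, p_correct):
    """Least-squares grid search over (alpha, beta); returns the best pair."""
    alphas = np.logspace(-3, 0, 301)      # candidate thresholds
    betas = np.linspace(0.5, 4.0, 71)     # candidate slopes
    a, b = np.meshgrid(alphas, betas, indexing="ij")
    pred = weibull(coherence[None, None, :], a[..., None], b[..., None])
    err = np.sum((pred - p_correct) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(err), err.shape)
    return alphas[i], betas[j]

# Synthetic, noiseless data standing in for measured network accuracy.
coh = np.array([0.005, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
p = weibull(coh, alpha=0.05, beta=1.5)
alpha_hat, beta_hat = fit_threshold(coh, p)
```

Under this parameterization, `alpha_hat` is the coherence at which accuracy reaches 1 - 0.5/e (about 82%), which is one common way to define a discrimination threshold.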

Supplementary Figure 3 Task and epoch variances.

(a) Visualization of the task variance map using classical multi-dimensional scaling (MDS). MDS tends to preserve global structures, while tSNE tends to emphasize local structures (for example, clustering). (b) Epoch variance is computed in a similar way to task variance, except that it is computed for individual task epochs instead of tasks. There are clusters of units that are selective in specific epochs. (c) Visualization of the epoch variance map in the same style as Fig. 2d.
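
Classical MDS, mentioned in (a), can be sketched in a few lines of NumPy. The array `task_variance` (units by tasks), its per-unit normalization, and the random placeholder values are assumptions for illustration only; they are not taken from the trained models.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Embed the rows of X so that pairwise Euclidean distances
    are preserved as well as possible in n_components dimensions."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ sq_dist @ J                 # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0, None))

rng = np.random.default_rng(0)
task_variance = rng.random((50, 20))           # hypothetical: 50 units x 20 tasks
# Assumed normalization: each unit's largest task variance is scaled to 1.
normalized = task_variance / task_variance.max(axis=1, keepdims=True)
embedding = classical_mds(normalized)          # one 2D point per unit
```

Because the embedding comes from the leading eigenvectors of the full distance structure, it favors global geometry, consistent with the contrast to tSNE drawn in the caption.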

Supplementary Figure 4 Determining number of clusters.

The silhouette score as a function of the number of clusters for an example network with the Softplus activation function (a) and one with the Tanh activation function (b). The silhouette score assesses the quality of a clustering scheme (see Methods). The ‘optimal’ or natural number of clusters is chosen to be the one with the highest silhouette score.
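
The selection rule described here can be sketched as follows. The k-means routine with farthest-point initialization, the synthetic three-blob data, and the candidate range of cluster numbers are assumptions for this example, not details given in the caption.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def silhouette_score(X, labels):
    """Mean over points of (b - a) / max(a, b), where a is the mean distance
    to the point's own cluster and b the mean distance to the nearest other cluster."""
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)                 # convention for singleton clusters
            continue
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic data: three well-separated blobs standing in for unit features.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.concatenate([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centers])
scores = {k: silhouette_score(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)           # number of clusters with the highest score
```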

Supplementary Figure 5 Connectivity matrix.

The full connectivity matrix for an example reference network. The network units are first sorted according to their cluster identity. Within each cluster, the units are sorted according to their preferred input directions, as defined by the input direction making the strongest connection weights to each unit (summed across modality 1 and 2). Color range is determined separately for each sub-matrix for better visualization. Red means more excitatory and blue means more inhibitory.
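
The two-level sorting described above could look like the sketch below. All arrays are random placeholders with hypothetical names, and taking the preferred direction as the argmax of the summed (signed) input weights is one reading of "strongest connection weights", stated here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_dirs = 12, 8
w_mod1 = rng.standard_normal((n_units, n_dirs))   # input weights, modality 1 (placeholder)
w_mod2 = rng.standard_normal((n_units, n_dirs))   # input weights, modality 2 (placeholder)
cluster = rng.integers(0, 3, size=n_units)        # cluster identity per unit (placeholder)
w_rec = rng.standard_normal((n_units, n_units))   # recurrent connectivity (placeholder)

# Preferred input direction: strongest weight after summing both modalities.
preferred = np.argmax(w_mod1 + w_mod2, axis=1)

# np.lexsort treats the LAST key as primary: sort by cluster, then by preferred direction.
order = np.lexsort((preferred, cluster))

# Reorder rows and columns together so the matrix stays a unit-by-unit map.
w_sorted = w_rec[np.ix_(order, order)]
```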

Supplementary Figure 6 Fractional variance distributions for all pairs of tasks.

(a) There is a total of 190 unique pairs of tasks from all 20 tasks trained. Each fractional variance distribution (black) shown here is averaged across 20 independently trained networks. As a control, we also computed fractional variance distributions (gray) from activities of surrogate units that are generated by randomly mixing activities of the original network units (see Methods). The y-axis range is shared across all plots.
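
The count of 190 follows from choosing 2 of 20 tasks. The sketch below enumerates the pairs and computes one distribution per pair, assuming the fractional variance of a unit for tasks A and B is (TV(A) - TV(B)) / (TV(A) + TV(B)), where TV is the unit's task variance. The `task_variance` array holds random placeholder values, and the surrogate-unit control is not reproduced here.

```python
import numpy as np
from itertools import combinations

n_tasks = 20
pairs = list(combinations(range(n_tasks), 2))     # 20 * 19 / 2 = 190 unique pairs

def fractional_variance(tv_a, tv_b):
    """Assumed definition: +1 if a unit varies only in task A, -1 if only in task B."""
    return (tv_a - tv_b) / (tv_a + tv_b)

rng = np.random.default_rng(0)
task_variance = rng.random((256, n_tasks)) + 1e-6  # hypothetical: 256 units x 20 tasks

ftv = {
    (a, b): fractional_variance(task_variance[:, a], task_variance[:, b])
    for a, b in pairs
}

# Distribution across units for one example pair, binned over [-1, 1].
hist, edges = np.histogram(ftv[(0, 1)], bins=20, range=(-1, 1))
```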

Supplementary Figure 7 Detailed behavioral effect of lesioning on the Ctx DM 1 task.

(a-e) The network choice in the Ctx DM 1 task for different combinations of modality 1 and modality 2 coherence in various networks. (a) The intact network’s choice depends only on the coherence of modality 1. (b) Lesioning group 1 makes the network more dependent on the coherence of modality 2. (c) Lesioning group 2 has no impact on the Ctx DM 1 task. (d) Lesioning both groups 1 and 2 allows the network to weigh both modalities equally. (e) Lesioning group 12 leads to a failure in making decisions. Although some preference towards modality 1 is preserved, the network is largely unable to choose decisively.

Supplementary Figure 8 Representation of all tasks in state space.

(a) The representation of each task is computed in the same way as in Fig. 6. Shown here is the representation of all tasks in the top two principal components. The RT Go and RT Anti tasks are not shown because there is no well-defined stimulus epoch in these tasks.

Supplementary Figure 9 Visualization of connection weights of rule inputs.

(a) Connection weights from rule input units representing Go, Dly Go, Anti, Dly Anti tasks visualized in the space spanned by the top two principal components (PCs) for a sample network. Similar to Fig. 6, the top two PCs are rotated and reflected (rPCs) to form the two axes. (b) The same analysis as in (a) is performed for 40 networks, and the results are overlaid. (c) Connection weights from rule input units representing Ctx DM 1, Ctx DM 2, Ctx Dly DM 1, and Ctx Dly DM 2 tasks visualized in the top two PCs for a sample network. (d) The same analysis as in (c) for 40 networks.
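
A minimal sketch of the projection step, under stated assumptions: the rule weights below are synthetic and built to be exactly compositional (Dly Anti = Anti + Dly Go - Go), which real trained weights need not be, and the rotation and reflection used to form the rPCs are omitted.

```python
import numpy as np

def top_two_pcs(weights):
    """Project each row of `weights` onto the first two principal components."""
    centered = weights - weights.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
n_units = 256
base, delay, anti = rng.standard_normal((3, n_units))

# Hypothetical rule-input weight vectors for Go, Dly Go, Anti, Dly Anti.
rule_weights = np.stack([base, base + delay, base + anti, base + delay + anti])
proj = top_two_pcs(rule_weights)               # shape (4, 2)

# With compositional weights, the four points form a parallelogram:
# the Go -> Dly Go displacement equals the Anti -> Dly Anti displacement.
delay_from_go = proj[1] - proj[0]
delay_from_anti = proj[3] - proj[2]
```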

Supplementary Figure 10 Distributed rule representation.

(a) The same analysis and box-plot convention as Fig. 7b,c, except that the networks are trained using distributed, instead of one-hot, rule representations.

Supplementary Figure 11 Lack of compositionality for the family of matching tasks.

(a) Visualization of task-based network activity for the DMS, DNMS, DMC, and DNMC tasks, for an example network (left) and for 40 networks (right). These plots have the same style as Fig. 6. (b) Visualization of connection weights for the same set of tasks in an example network (left) and for 40 networks (right). The rule weights are not compositional. These plots have the same style as Supplementary Fig. 9. (c) The DMS task cannot be performed with a compositional rule input. The box-plot convention is the same as in Fig. 7b.

Supplementary Figure 12 Partially plastic networks and experimental data.

(a) Networks in which only 10% of connection weights are trained show a mixed FTV distribution for the Ctx DM 1 and Ctx DM 2 tasks. Solid lines are medians over 60 networks. Shaded areas indicate the 95% confidence interval of the median, estimated by bootstrapping. (b-e) FTV distributions derived from experimental data (reference 11). (b) Monkey A, single units. (c) Monkey A, all units. (d) Monkey F, single units. (e) Monkey F, all units.
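
One common way to obtain such a band is a percentile bootstrap over networks, sketched below. The resampling scheme, the number of bootstrap draws, and the placeholder `curves` array are assumptions; the caption states only that the interval was estimated by bootstrapping.

```python
import numpy as np

def bootstrap_median_ci(samples, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median along axis 0."""
    rng = np.random.default_rng(seed)
    n = samples.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))     # resample networks with replacement
    medians = np.median(samples[idx], axis=1)      # one median curve per resample
    lo, hi = np.percentile(
        medians, [100 * (1 - level) / 2, 100 * (1 + level) / 2], axis=0
    )
    return np.median(samples, axis=0), lo, hi

rng = np.random.default_rng(1)
curves = rng.random((60, 20))      # hypothetical: 60 networks x 20 FTV histogram bins
median, lower, upper = bootstrap_median_ci(curves)
```

Here `median` would correspond to the solid line and `lower`/`upper` to the edges of the shaded area, one value per histogram bin.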

Supplementary information

Supplementary Figures 1–12

Supplementary Figs. 1–12 and Supplementary Table 1

Reporting Summary

About this article

Cite this article

Yang, G.R., Joglekar, M.R., Song, H.F. et al. Task representations in neural networks trained to perform many cognitive tasks. Nat Neurosci 22, 297–306 (2019). https://doi.org/10.1038/s41593-018-0310-2

Received: 16 July 2018
Accepted: 30 November 2018
Published: 14 January 2019
Issue Date: February 2019
DOI: https://doi.org/10.1038/s41593-018-0310-2
