Abstract
Sequential and persistent activity models are two prominent models of short-term memory in neural circuits. In persistent activity models, memories are represented in persistent or nearly persistent activity patterns across a population of neurons, whereas in sequential models, memories are represented dynamically by a sequential activity pattern across the population. Experimental evidence for both models has been reported previously. However, it has been unclear under what conditions these two qualitatively different types of solutions emerge in neural circuits. Here, we address this question by training recurrent neural networks on several short-term memory tasks under a wide range of circuit and task manipulations. We show that both sequential and nearly persistent solutions are part of a spectrum that emerges naturally in trained networks under different conditions. Our results help to clarify some seemingly contradictory experimental results on the existence of sequential versus persistent activity-based short-term memory mechanisms in the brain.
Code availability
The code for reproducing the experiments and analyses reported in this article is available at https://github.com/eminorhan/recurrent-memory.
Data availability
The raw simulation data used for generating each figure are available upon request.
Change history
25 January 2019
In the version of this article initially published online, a word was misprinted in the abstract. Extra letters were removed from the word “Experimentalrep” to correct it to “Experimental”. The error has been corrected in the print, PDF and HTML versions of this article.
06 February 2019
The original and corrected figures are shown in the accompanying Publisher Correction.
References
Fuster, J. M. & Alexander, G. E. Neuron activity related to short-term memory. Science 173, 652–654 (1971).
Wang, X. J. Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci. 24, 455–463 (2001).
Goldman, M. S. Memory without feedback in a neural network. Neuron 61, 621–634 (2009).
Druckmann, S. & Chklovskii, D. B. Neural circuits underlying persistent representations despite time varying activity. Curr. Biol. 22, 2095–2103 (2012).
Murray, J. D. et al. Stable population coding for working memory coexists with heterogeneous neural dynamics in prefrontal cortex. Proc. Natl Acad. Sci. USA 114, 394–399 (2017).
Lundqvist, M., Herman, P. & Miller, E. K. Working memory: delay activity, yes! Persistent activity? Maybe not. J. Neurosci. 38, 7013–7019 (2018).
Constantinidis, C. et al. Persistent spiking activity underlies working memory. J. Neurosci. 38, 7020–7028 (2018).
Funahashi, S., Bruce, C. J. & Goldman-Rakic, P. S. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61, 331–349 (1989).
Miller, E. K., Erickson, C. A. & Desimone, R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci. 16, 5154–5167 (1996).
Romo, R., Brody, C. D., Hernández, A. & Lemus, L. Neural correlates of parametric working memory in the prefrontal cortex. Nature 399, 470–473 (1999).
Goard, M. J., Pho, G. N., Woodson, J. & Sur, M. Distinct roles of visual, parietal, and frontal motor cortices in memory-guided sensorimotor decisions. eLife 5, e13764 (2016).
Guo, Z. V. et al. Maintenance of persistent activity in a frontal thalamocortical loop. Nature 545, 181–186 (2017).
Baeg, E. H. et al. Dynamics of population code for working memory in the prefrontal cortex. Neuron 40, 177–188 (2003).
Fujisawa, S., Amarasingham, A., Harrison, M. T. & Buzsáki, G. Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nat. Neurosci. 11, 823–833 (2008).
MacDonald, C. J., Lepage, K. Q., Eden, U. T. & Eichenbaum, H. Hippocampal ‘time cells’ bridge the gap in memory for discontiguous events. Neuron 71, 737–749 (2011).
Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
Schmitt, L. I. et al. Thalamic amplification of cortical connectivity sustains attentional control. Nature 545, 219–223 (2017).
Scott, B. B. et al. Fronto-parietal cortical circuits encode accumulated evidence with a diversity of timescales. Neuron 95, 385–398 (2017).
Murray, J. D. et al. A hierarchy of intrinsic timescales across cortex. Nat. Neurosci. 17, 1661–1663 (2014).
Runyan, C. A., Piasini, E., Panzeri, S. & Harvey, C. D. Distinct timescales of population coding across cortex. Nature 548, 92–96 (2017).
Sussillo, D., Churchland, M. M., Kaufman, M. T. & Shenoy, K. V. A neural network that finds a naturalistic solution for the production of muscle activity. Nat. Neurosci. 18, 1025–1033 (2015).
Cueva, C. J. & Wei, X. X. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. Preprint at https://arxiv.org/abs/1803.07770 (2018).
Banino, A. et al. Vector-based navigation using grid-like representations in artificial agents. Nature 557, 429–433 (2018).
Wilken, P. & Ma, W. J. A detection theory account of change detection. J. Vis. 4, 1120–1135 (2004).
Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993).
Zucker, R. S. & Regehr, W. G. Short-term synaptic plasticity. Annu. Rev. Physiol. 64, 355–405 (2002).
Mongillo, G., Barak, O. & Tsodyks, M. Synaptic theory of working memory. Science 319, 1543–1546 (2008).
Rose, N. S. et al. Reactivation of latent working memories with transcranial magnetic stimulation. Science 354, 1136–1139 (2016).
Wolff, M. J., Jochim, J., Akyürek, E. G. & Stokes, M. G. Dynamic hidden states underlying working-memory-guided behavior. Nat. Neurosci. 20, 864–871 (2017).
Hinton, G. E. & Plaut, D. C. Using fast weights to deblur old memories. Proc. 9th Annual Conference of the Cognitive Science Society, 177–186 (Erlbaum, 1987).
Sompolinsky, H. & Kanter, I. Temporal association in asymmetric neural networks. Phys. Rev. Lett. 57, 2861–2864 (1986).
Fiete, I. R., Senn, W., Wang, C. Z. H. & Hahnloser, R. H. R. Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. Neuron 65, 563–576 (2010).
Klampfl, S. & Maass, W. Emergence of dynamic memory traces in cortical microcircuit models through STDP. J. Neurosci. 33, 11515–11529 (2013).
Krumin, M., Lee, J. J., Harris, K. D. & Carandini, M. Decision and navigation in mouse parietal cortex. eLife 7, e42583 (2018).
Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
Orhan, A. E. & Ma, W. J. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nat. Commun. 8, 138 (2017).
Ganguli, S., Huh, D. & Sompolinsky, H. Memory traces in dynamical systems. Proc. Natl Acad. Sci. USA 105, 18970–18975 (2008).
Clevert, D. A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). Preprint at https://arxiv.org/abs/1511.07289 (2016).
Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics (2011).
Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
Yamins, D. L. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible timing by temporal scaling of cortical responses. Nat. Neurosci. 21, 102–110 (2018).
Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).
Keshvari, S., van den Berg, R. & Ma, W. J. No evidence for an item limit in change detection. PLoS Comput. Biol. 9, e1002927 (2013).
Acknowledgements
This work was supported by grant no. R01EY020958 from the National Eye Institute. We thank the staff at the High-Performance Computing Cluster at New York University, especially S. Wang, for their help with troubleshooting.
Author information
Authors and Affiliations
Contributions
A.E.O. conceived the study and developed the research plan with input from W.J.M. In several iterations, A.E.O. performed the experiments and the analyses. A.E.O. and W.J.M. then discussed the results, which helped refine the experiments and the analyses. A.E.O. wrote the initial draft of the paper. A.E.O. and W.J.M. reviewed and edited later iterations of the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Fig. 1 Initial, untrained network dynamics for different (λ0, σ0) values.
The heat maps show the normalized responses of the recurrent units to a unit pulse delivered at time t = 0 to all units. Here, λ0 takes 10 uniformly spaced values between 0.8 and 0.98 (columns) and σ0 takes 10 uniformly spaced values between 0 and 0.4025 (rows).
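The (λ0, σ0) parameterization suggests initializing the recurrent weight matrix with self-coupling λ0 on the diagonal and random Gaussian off-diagonal couplings of scale σ0. A minimal numpy sketch, assuming a 1/√N scaling for the off-diagonal weights (an illustrative reconstruction; the exact scheme used in the paper may differ):

```python
import numpy as np

def init_recurrent_weights(n_units, lam0, sig0, seed=0):
    """Initial recurrent weights: self-coupling lam0 on the diagonal,
    Gaussian off-diagonal entries of scale sig0 / sqrt(n_units).
    Illustrative only; the paper's exact scaling may differ."""
    rng = np.random.default_rng(seed)
    W = (sig0 / np.sqrt(n_units)) * rng.standard_normal((n_units, n_units))
    np.fill_diagonal(W, lam0)
    return W

# One of the (lam0, sig0) settings used in the supplementary figures
W = init_recurrent_weights(500, lam0=0.96, sig0=0.313)
```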
Supplementary Fig. 2 Normalized responses of the recurrent units in networks trained with strong initial network coupling and no regularization.
Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.313, ρ = 0. After training, all networks shown here achieved a test set performance within 25% of the optimal performance. In Supplementary Figs. 2–5, only the active recurrent units are shown.
Supplementary Fig. 3 Normalized responses of the recurrent units in networks trained with weak initial network coupling and no regularization.
Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.134, ρ = 0. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.
Supplementary Fig. 4 Normalized responses of the recurrent units in networks trained with strong initial network coupling and strong regularization.
Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.313, ρ = 10−3. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.
Supplementary Fig. 5 Normalized responses of the recurrent units in networks trained with weak initial network coupling and strong regularization.
Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.134, ρ = 10−3. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.
Supplementary Fig. 6 Average normalized activity of recurrent units in an example network trained in the 2AFC task.
The network shown here was trained with λ0 = 0.96, σ0 = 0.313, ρ = 0. After training, the network achieved a test set performance within 0.1% of the optimal performance. As in ref. 16, we divided the recurrent units into left-preferring and right-preferring ones based on whether they responded more strongly during correct left choices or during correct right choices. The upper panel shows the average normalized responses of the left-preferring units in the correct left and correct right trials, respectively. Similarly, the lower panel shows the average normalized responses of the right-preferring units in the correct left and correct right trials. As reported in ref. 16, the trained network developed choice-specific sequences in the 2AFC task (cf. Fig. 2c in ref. 16). Only the most active 150 units from each group are shown in this figure; as always, the original network contained 500 recurrent units. This figure also demonstrates that the sequences are consistent from trial to trial, since the sequential activity pattern does not disappear when the responses are averaged over multiple trials.
Supplementary Fig. 7 A simplified model of recurrent dynamics.
A simplified model that only incorporated the ReLU nonlinearity and the mean recurrent connection weight profiles shown in the upper panel (with no fluctuations around the mean) qualitatively captured the difference between the emergent sequential vs. persistent activity patterns (lower panel, left and right plots respectively). The networks simulated here had 500 recurrent units (only the most active 50 units are shown in the lower panel). All recurrent units received a unit pulse input at t = 0. The self-recurrence term in the recurrent connectivity matrix (not shown in the upper panel for clarity) was set to 1 in both cases. In the sequential case, the off-diagonal band was set to 0.09 in the forward direction and 0.01 in the backward direction, that is, W_{i,i−1} = 0.09 and W_{i−1,i} = 0.01. The recurrent units did not have a bias term and they did not receive any direct inputs during the trial other than the unit pulse injected at the beginning of the trial.
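The sequential case of this simplified model can be reconstructed in a few lines of numpy from the parameters given in the caption (self-recurrence 1, forward coupling 0.09, backward coupling 0.01, ReLU, unit pulse at t = 0). This is an illustrative sketch, not the authors' code:

```python
import numpy as np

def simulate(W, n_steps=50):
    """Iterate r(t+1) = ReLU(W r(t)) from a unit pulse to all units
    (no bias, no external input after t = 0)."""
    r = np.ones(W.shape[0])  # unit pulse at t = 0
    rates = [r]
    for _ in range(n_steps):
        r = np.maximum(0.0, W @ r)  # ReLU nonlinearity
        rates.append(r)
    return np.array(rates)

n = 500
W_seq = np.eye(n)                 # self-recurrence set to 1
i = np.arange(1, n)
W_seq[i, i - 1] = 0.09            # forward coupling W_{i,i-1}
W_seq[i - 1, i] = 0.01            # backward coupling W_{i-1,i}
rates = simulate(W_seq)           # (n_steps + 1, n) array of responses
```

Plotting each unit's response normalized by its own maximum, as in the figure, reveals the traveling sequential pattern.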
Supplementary Fig. 8 Results for the clipped ReLU networks.
The clipped ReLU nonlinearity is similar to ReLU except that it is bounded above by a maximum value: that is, f(x) = clip(x, r_min, r_max), where r_min = 0 and r_max = 100. a SI increased significantly with σ0. Linear regression slope: 0.55 ± 0.28, R² = 0.01 (two-sided Wald test, n = 280 experimental conditions, p = 0.049). In a–c, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b SI decreased significantly with λ0. Linear regression slope: −3.87 ± 0.66, R² = 0.11 (two-sided Wald test, n = 280 experimental conditions, p < 0.001). Note that this result differs from the corresponding result in the case of ReLU networks, where λ0 did not have a significant effect on the SI (Fig. 2c). c SI decreased significantly with ρ. Linear regression slope: −418 ± 64, R² = 0.13 (two-sided Wald test, n = 280 experimental conditions, p < 0.001). d SI as a function of task. Overall, the ordering of the tasks by SI was similar to that obtained with the ReLU nonlinearity (Fig. 3a). However, note that training was substantially more difficult with the clipped ReLU nonlinearity than with the ReLU nonlinearity. Across all tasks and all conditions, ReLU networks had a training success (defined as reaching within 50% of the optimal performance) of ~60%, whereas the clipped ReLU networks had a training success of only ~9.3%. In particular, we were not able to successfully train any networks in the CD task and very few in the 2AFC task. As a consequence, some of the differences between the tasks ended up not being significant in the clipped ReLU case. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in d are reported in Supplementary Table 1. e, f Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.8 and in conditions where SI < 3, respectively. The weights were smaller in magnitude in f, because most of the low SI networks were trained under strong regularization. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.
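The clipped ReLU described in this caption maps directly onto numpy's clip; a minimal sketch:

```python
import numpy as np

def clipped_relu(x, r_min=0.0, r_max=100.0):
    """f(x) = clip(x, r_min, r_max): a ReLU bounded above by r_max."""
    return np.clip(x, r_min, r_max)

out = clipped_relu(np.array([-5.0, 3.0, 250.0]))  # -> [0., 3., 100.]
```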
Supplementary Fig. 9 Changing the amount of input noise.
In these simulations, we set ρ = 0 and varied the gain of the input population(s), g. g = 1 corresponds to the original case reported in the main text; lower and higher values of g correspond to higher and lower amounts of input noise, respectively. a Combined across all noise conditions, SI increased significantly with σ0. Linear regression slope: 0.76 ± 0.08, R² = 0.04 (two-sided Wald test, n = 2239 experimental conditions, p < 0.001). In a–c, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b λ0 did not have a significant effect on SI (two-sided Wald test, n = 2239 experimental conditions, p = 0.958). c The input gain g slightly increased the SI. Linear regression slope: 0.04 ± 0.02, R² = 0.003 (two-sided Wald test, n = 2239 experimental conditions, p = 0.003). d Again, combined across all input noise levels, the ordering of the tasks by SI was similar to that obtained in the main set of experiments, where g = 1 (Fig. 3a). Error bars represent mean ± standard errors across different hyperparameter settings and noise levels. Exact sample sizes for the derived statistics shown in d are reported in Supplementary Table 1.
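The slopes, R² values, and two-sided Wald-test p-values reported in these captions are the standard outputs of an ordinary least-squares regression, e.g. from scipy.stats.linregress. The sketch below uses synthetic data with hypothetical values, not the paper's results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma0 = rng.uniform(0.0, 0.4, size=200)            # hypothetical predictor values
si = 0.76 * sigma0 + rng.normal(0.0, 0.2, size=200)  # hypothetical SI values

# linregress reports the slope, its standard error, r, and a
# two-sided Wald-test p-value for the null hypothesis slope = 0
res = stats.linregress(sigma0, si)
print(f"slope = {res.slope:.2f} ± {res.stderr:.2f}, "
      f"R² = {res.rvalue**2:.3f}, p = {res.pvalue:.3g}")
```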
Supplementary Fig. 10 Results for the lowest level of input noise (g = 2.5).
a SI increased significantly with σ0. Linear regression slope: 0.76 ± 0.18, R² = 0.05 (two-sided Wald test, n = 365 experimental conditions, p < 0.001). In a, b, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b λ0 did not have a significant effect on SI (two-sided Wald test, n = 365 experimental conditions, p = 0.253). c The ordering of the tasks by SI was similar to that obtained in the main set of experiments. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in c are reported in Supplementary Table 1. d, e Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.9 and in conditions where SI < 2.8, respectively. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.
Supplementary Fig. 11 Results for the highest level of input noise (g = 0.5).
a SI increased significantly with σ0. Linear regression slope: 0.91 ± 0.21, R² = 0.05 (two-sided Wald test, n = 361 experimental conditions, p < 0.001). In a, b, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b λ0 did not have a significant effect on SI (two-sided Wald test, n = 361 experimental conditions, p = 0.457). c The ordering of the tasks by SI was similar to that obtained in the main set of experiments. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in c are reported in Supplementary Table 1. d, e Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.6 and in conditions where SI < 2.3, respectively. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.
Supplementary Fig. 12 Schur decomposition of trained and random connectivity matrices.
a Schur mode interaction matrices for the mean recurrent connectivity patterns shown in Fig. 6a–c. Only significant Schur modes with at least one interaction of magnitude greater than 0.04 with another Schur mode are shown here. b The corresponding significant Schur modes. Networks with more sequential activity (SI > 5) have more high-frequency Schur modes than networks with less sequential activity (SI < 2.5). The random networks are close to normal.
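The Schur analysis in this figure can be reproduced in outline with scipy.linalg.schur. In the real Schur form W = Q T Qᵀ, the strictly upper-triangular part of T contains the interactions between Schur modes; for a symmetric (hence normal, real-spectrum) matrix it is ~0, whereas a sequence-generating banded matrix is strongly non-normal. An illustrative sketch with toy matrices, not the trained networks:

```python
import numpy as np
from scipy.linalg import schur

n = 20
rng = np.random.default_rng(0)

# A normal matrix with real spectrum: symmetric random
W_sym = rng.standard_normal((n, n))
W_sym = (W_sym + W_sym.T) / 2

# A non-normal, sequence-generating matrix: identity plus an asymmetric band
W_seq = np.eye(n)
i = np.arange(1, n)
W_seq[i, i - 1] = 0.09   # forward coupling
W_seq[i - 1, i] = 0.01   # backward coupling

def schur_interactions(W):
    """Strictly upper-triangular part of the real Schur form T (W = Q T Q^T):
    the interaction terms between Schur modes."""
    T, Q = schur(W, output='real')
    return np.triu(T, k=1)

print(np.linalg.norm(schur_interactions(W_sym)))  # ~0: (nearly) normal
print(np.linalg.norm(schur_interactions(W_seq)))  # clearly nonzero: non-normal
```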
Supplementary Fig. 13 Results from networks explicitly trained to generate sequential activity as in ref. 35.
a, b are analogous to Fig. 6a, b and show the recurrent weight profiles obtained in trained networks with ReLU and tanh nonlinearities, respectively. c, d show example trials for the corresponding networks (trained with the same initial condition). Only networks with sequentiality index larger than 5.45 were included in the results shown here.
Supplementary Fig. 14 Circuit mechanism that generates sequential vs. persistent activity in networks with alternative activation functions.
This figure is analogous to Fig. 6a, b, but the results shown are for networks with the exponential linear unit (ELU) activation function (a) and networks with the softplus activation function (b). Note that the ELU activation function typically produced larger SIs than softplus, hence slightly different SI thresholds were used in the two cases to determine low and high SI networks.
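For reference, the two activation functions compared here have the standard definitions (with α = 1 for ELU, as in ref. 65):

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

def softplus(x):
    """Smooth approximation to ReLU: log(1 + exp(x))."""
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(elu(x), softplus(x))
```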
Supplementary information
Supplementary Text and Figures
Supplementary Figs. 1–14 and Supplementary Table 1
Rights and permissions
About this article
Cite this article
Orhan, A.E., Ma, W.J. A diverse range of factors affect the nature of neural representations underlying short-term memory. Nat Neurosci 22, 275–283 (2019). https://doi.org/10.1038/s41593-018-0314-y
This article is cited by
- Low-dimensional encoding of decisions in parietal cortex reflects long-term training history. Nature Communications (2023)
- Dynamical latent state computation in the male macaque posterior parietal cortex. Nature Communications (2023)
- The neuroconnectionist research programme. Nature Reviews Neuroscience (2023)
- Multiplexing working memory and time in the trajectories of neural networks. Nature Human Behaviour (2023)
- Spiking Recurrent Neural Networks Represent Task-Relevant Neural Sequences in Rule-Dependent Computation. Cognitive Computation (2023)