A diverse range of factors affect the nature of neural representations underlying short-term memory

A Publisher Correction to this article was published on 06 February 2019

This article has been updated

Abstract

Sequential and persistent activity models are two prominent models of short-term memory in neural circuits. In persistent activity models, memories are represented in persistent or nearly persistent activity patterns across a population of neurons, whereas in sequential models, memories are represented dynamically by a sequential activity pattern across the population. Experimental evidence for both models has been reported previously. However, it has been unclear under what conditions these two qualitatively different types of solutions emerge in neural circuits. Here, we address this question by training recurrent neural networks on several short-term memory tasks under a wide range of circuit and task manipulations. We show that both sequential and nearly persistent solutions are part of a spectrum that emerges naturally in trained networks under different conditions. Our results help to clarify some seemingly contradictory experimental results on the existence of sequential versus persistent activity-based short-term memory mechanisms in the brain.
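To make the training setup concrete, the sketch below illustrates, in schematic form, the kind of experiment described above: a vanilla recurrent network is trained to report a stimulus after a delay, and its delay-period activity can then be inspected for sequential versus persistent structure. This is not the paper's code (that is linked under Code availability below); the task, network size, noise level, and optimizer settings here are illustrative assumptions only.

```python
# Illustrative sketch only: a small delayed-report task with a vanilla RNN.
# All task and training details here are assumptions, not the paper's setup.
import torch
import torch.nn as nn

n_in, n_rec, t_stim, t_delay = 10, 100, 5, 20

class MemoryRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_rec, nonlinearity='relu', batch_first=True)
        self.readout = nn.Linear(n_rec, 1)

    def forward(self, x):
        h, _ = self.rnn(x)                  # (batch, time, n_rec) recurrent activity
        return self.readout(h[:, -1]), h    # report at the end of the delay

def make_batch(batch_size=64):
    # A scalar stimulus encoded by a noisy tuned input population,
    # followed by a delay period with noise only.
    stim = torch.rand(batch_size, 1)
    tuning = torch.linspace(0.0, 1.0, n_in)
    bump = torch.exp(-(tuning - stim) ** 2 / 0.02)          # (batch, n_in)
    x = torch.cat([bump.unsqueeze(1).repeat(1, t_stim, 1),
                   torch.zeros(batch_size, t_delay, n_in)], dim=1)
    return x + 0.1 * torch.randn_like(x), stim

model = MemoryRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    x, target = make_batch()
    pred, activity = model(x)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# `activity` during the delay period is what sequentiality analyses operate on.
```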


Fig. 1: Experimental setup.
Fig. 2: Intrinsic circuit properties and their effect on the sequentiality of recurrent activity in trained networks.
Fig. 3: The temporal complexity of the task increases the sequentiality of the recurrent activity in trained networks.
Fig. 4: Hebbian short-term synaptic plasticity, delay duration variability, and structured dynamic inputs affect the sequentiality of the recurrent activity in trained networks.
Fig. 5: Multitask learning experiments.
Fig. 6: Circuit mechanism that generates sequential versus persistent activity.

Code availability

The code for reproducing the experiments and analyses reported in this article is available at https://github.com/eminorhan/recurrent-memory.

Data availability

The raw simulation data used for generating each figure are available upon request.

Change history

  • 25 January 2019

    In the version of this article initially published online, the word “Experimental” in the abstract was misprinted as “Experimentalrep”. The error has been corrected in the print, PDF and HTML versions of this article.

  • 06 February 2019

    The original and corrected figures are shown in the accompanying Publisher Correction.

References

  1. Fuster, J. M. & Alexander, G. E. Neuron activity related to short-term memory. Science 173, 652–654 (1971).
  2. Wang, X. J. Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci. 24, 455–463 (2001).
  3. Goldman, M. S. Memory without feedback in a neural network. Neuron 61, 621–634 (2009).
  4. Druckmann, S. & Chklovskii, D. B. Neural circuits underlying persistent representations despite time varying activity. Curr. Biol. 22, 2095–2103 (2012).
  5. Murray, J. D. et al. Stable population coding for working memory coexists with heterogeneous neural dynamics in prefrontal cortex. Proc. Natl Acad. Sci. USA 114, 394–399 (2017).
  6. Lundqvist, M., Herman, P. & Miller, E. K. Working memory: delay activity, yes! Persistent activity? Maybe not. J. Neurosci. 38, 7013–7019 (2018).
  7. Constantinidis, C. et al. Persistent spiking activity underlies working memory. J. Neurosci. 38, 7020–7028 (2018).
  8. Funahashi, S., Bruce, C. J. & Goldman-Rakic, P. S. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61, 331–349 (1989).
  9. Miller, E. K., Erickson, C. A. & Desimone, R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci. 16, 5154–5167 (1996).
  10. Romo, R., Brody, C. D., Hernández, A. & Lemus, L. Neural correlates of parametric working memory in the prefrontal cortex. Nature 399, 470–473 (1999).
  11. Goard, M. J., Pho, G. N., Woodson, J. & Sur, M. Distinct roles of visual, parietal, and frontal motor cortices in memory-guided sensorimotor decisions. eLife 5, e13764 (2016).
  12. Guo, Z. V. et al. Maintenance of persistent activity in a frontal thalamocortical loop. Nature 545, 181–186 (2017).
  13. Baeg, E. H. et al. Dynamics of population code for working memory in the prefrontal cortex. Neuron 40, 177–188 (2003).
  14. Fujisawa, S., Amarasingham, A., Harrison, M. T. & Buzsáki, G. Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nat. Neurosci. 11, 823–833 (2008).
  15. MacDonald, C. J., Lepage, K. Q., Eden, U. T. & Eichenbaum, H. Hippocampal ‘time cells’ bridge the gap in memory for discontiguous events. Neuron 71, 737–749 (2011).
  16. Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
  17. Schmitt, L. I. et al. Thalamic amplification of cortical connectivity sustains attentional control. Nature 545, 219–223 (2017).
  18. Scott, B. B. et al. Fronto-parietal cortical circuits encode accumulated evidence with a diversity of timescales. Neuron 95, 385–398 (2017).
  19. Murray, J. D. et al. A hierarchy of intrinsic timescales across cortex. Nat. Neurosci. 17, 1661–1663 (2014).
  20. Runyan, C. A., Piasini, E., Panzeri, S. & Harvey, C. D. Distinct timescales of population coding across cortex. Nature 548, 92–96 (2017).
  21. Sussillo, D., Churchland, M. M., Kaufman, M. T. & Shenoy, K. V. A neural network that finds a naturalistic solution for the production of muscle activity. Nat. Neurosci. 18, 1025–1033 (2015).
  22. Cueva, C. J. & Wei, X. X. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. Preprint at https://arxiv.org/abs/1803.07770 (2018).
  23. Banino, A. et al. Vector-based navigation using grid-like representations in artificial agents. Nature 557, 429–433 (2018).
  24. Wilken, P. & Ma, W. J. A detection theory account of change detection. J. Vis. 4, 1120–1135 (2004).
  25. Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993).
  26. Zucker, R. S. & Regehr, W. G. Short-term synaptic plasticity. Annu. Rev. Physiol. 64, 355–405 (2002).
  27. Mongillo, G., Barak, O. & Tsodyks, M. Synaptic theory of working memory. Science 319, 1543–1546 (2008).
  28. Rose, N. S. et al. Reactivation of latent working memories with transcranial magnetic stimulation. Science 354, 1136–1139 (2016).
  29. Wolff, M. J., Jochim, J., Akyürek, E. G. & Stokes, M. G. Dynamic hidden states underlying working-memory-guided behavior. Nat. Neurosci. 20, 864–871 (2017).
  30. Hinton, G. E. & Plaut, D. C. Using fast weights to deblur old memories. In Proc. 9th Annual Conference of the Cognitive Science Society 177–186 (Erlbaum, 1987).
  31. Sompolinsky, H. & Kanter, I. Temporal association in asymmetric neural networks. Phys. Rev. Lett. 57, 2861–2864 (1986).
  32. Fiete, I. R., Senn, W., Wang, C. Z. H. & Hahnloser, R. H. R. Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. Neuron 65, 563–576 (2010).
  33. Klampfl, S. & Maass, W. Emergence of dynamic memory traces in cortical microcircuit models through STDP. J. Neurosci. 33, 11515–11529 (2013).
  34. Krumin, M., Lee, J. J., Harris, K. D. & Carandini, M. Decision and navigation in mouse parietal cortex. eLife 7, e42583 (2018).
  35. Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
  36. Orhan, A. E. & Ma, W. J. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nat. Commun. 8, 138 (2017).
  37. Ganguli, S., Huh, D. & Sompolinsky, H. Memory traces in dynamical systems. Proc. Natl Acad. Sci. USA 105, 18970–18975 (2008).
  38. Clevert, D. A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). Preprint at https://arxiv.org/abs/1511.07289 (2016).
  39. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics (2011).
  40. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
  41. Yamins, D. L. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
  42. Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible timing by temporal scaling of cortical responses. Nat. Neurosci. 21, 102–110 (2018).
  43. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).
  44. Keshvari, S., van den Berg, R. & Ma, W. J. No evidence for an item limit in change detection. PLoS Comput. Biol. 9, e1002927 (2013).

Acknowledgements

This work was supported by grant no. R01EY020958 from the National Eye Institute. We thank the staff at the High-Performance Computing Cluster at New York University, especially S. Wang, for their help with troubleshooting.

Author information

Contributions

A.E.O. conceived the study and developed the research plan with input from W.J.M. In several iterations, A.E.O. performed the experiments and the analyses. A.E.O. and W.J.M. then discussed the results, which helped refine the experiments and the analyses. A.E.O. wrote the initial draft of the paper. A.E.O. and W.J.M. reviewed and edited later iterations of the paper.

Corresponding author

Correspondence to A. Emin Orhan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Initial, untrained network dynamics for different (λ0,σ0) values.

The heat maps show the normalized responses of the recurrent units to a unit pulse delivered at time t = 0 to all units. Here, λ0 takes 10 uniformly-spaced values between 0.8 and 0.98 (columns) and σ0 takes 10 uniformly-spaced values between 0 and 0.4025 (rows).
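For readers who want to regenerate a single cell of this grid, the following sketch simulates the pulse response of an untrained network. It assumes, as one plausible reading of the initialization to be checked against the Methods, that λ0 scales the self-recurrence, that σ0 scales the random off-diagonal coupling, and that the units are rectified linear; the function name and defaults are ours.

```python
# Sketch (assumed initialization) of an untrained network's response to a unit
# pulse delivered to all units at t = 0, as in Supplementary Fig. 1.
import numpy as np

def pulse_response(lambda0, sigma0, n_units=500, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed parameterization: diagonal self-recurrence plus random coupling.
    W = lambda0 * np.eye(n_units) \
        + (sigma0 / np.sqrt(n_units)) * rng.standard_normal((n_units, n_units))
    x = np.ones(n_units)                      # unit pulse to all units at t = 0
    responses = [x.copy()]
    for _ in range(n_steps):
        x = np.maximum(W @ x, 0.0)            # ReLU recurrent dynamics (assumed)
        responses.append(x.copy())
    R = np.array(responses).T                 # units x time
    peak = R.max(axis=1, keepdims=True)
    return R / np.where(peak > 0, peak, 1.0)  # normalize each unit by its peak

# Example: one cell of the (lambda0, sigma0) grid.
heatmap = pulse_response(lambda0=0.9, sigma0=0.2)
print(heatmap.shape)  # (500, 51)
```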

Supplementary Fig. 2 Normalized responses of the recurrent units in networks trained with strong initial network coupling and no regularization.

Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.313, ρ = 0. After training, all networks shown here achieved a test set performance within 25% of the optimal performance. In Supplementary Figs. 2–5, only the active recurrent units are shown.

Supplementary Fig. 3 Normalized responses of the recurrent units in networks trained with weak initial network coupling and no regularization.

Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.134, ρ = 0. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.

Supplementary Fig. 4 Normalized responses of the recurrent units in networks trained with strong initial network coupling and strong regularization.

Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.313, ρ = 10−3. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.

Supplementary Fig. 5 Normalized responses of the recurrent units in networks trained with weak initial network coupling and strong regularization.

Each plot corresponds to an example trial from one of the six basic tasks. The SIs of the trials are indicated at the top of the plots. Trials are ordered by increasing SI from left to right. All trials shown here are from networks trained with λ0 = 0.96, σ0 = 0.134, ρ = 10−3. After training, all networks shown here achieved a test set performance within 50% of the optimal performance.

Supplementary Fig. 6 Average normalized activity of recurrent units in an example network trained in the 2AFC task.

The network shown here was trained with λ0 = 0.96, σ0 = 0.313, ρ = 0. After training, the network achieved a test set performance within 0.1% of the optimal performance. As in ref. 16, we divided the recurrent units into left-preferring and right-preferring ones based on whether they responded more strongly during correct left choices or during correct right choices. The upper panel shows the average normalized responses of the left-preferring units in the correct left and correct right trials, respectively. Similarly, the lower panel shows the average normalized responses of the right-preferring units in the correct left and correct right trials. As reported in ref. 16, the trained network developed choice-specific sequences in the 2AFC task (cf. Figure 2c in ref. 16). Only the most active 150 units from each group are shown in this figure; as always, the original network contained 500 recurrent units. This figure also demonstrates that the sequences are consistent from trial to trial, since the sequential activity pattern does not disappear when the responses are averaged over multiple trials.
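A minimal sketch of this grouping and averaging step is given below; it is not the analysis code, and the array layout (trials × units × time, correct trials only) and helper name are assumptions.

```python
# Sketch of the left/right-preferring split used in Supplementary Fig. 6
# (assumed array layout: trials x units x time, correct trials only).
import numpy as np

def split_and_average(resp, is_left):
    mean_left = resp[is_left].mean(axis=(0, 2))     # per-unit mean, correct left trials
    mean_right = resp[~is_left].mean(axis=(0, 2))   # per-unit mean, correct right trials
    left_pref = mean_left > mean_right              # left-preferring units
    groups = {}
    for name, pref in [('left_pref', left_pref), ('right_pref', ~left_pref)]:
        for cond, mask in [('left_trials', is_left), ('right_trials', ~is_left)]:
            avg = resp[mask][:, pref].mean(axis=0)  # trial-averaged responses
            peak = avg.max(axis=1, keepdims=True)
            groups[(name, cond)] = avg / np.where(peak > 0, peak, 1.0)  # peak-normalized
    return groups

# Toy usage with random data (500 recurrent units, as in the trained networks):
rng = np.random.default_rng(0)
resp = rng.random((40, 500, 60))
is_left = rng.random(40) < 0.5
groups = split_and_average(resp, is_left)
print(groups[('left_pref', 'left_trials')].shape)
```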

Supplementary Fig. 7 A simplified model of recurrent dynamics.

A simplified model that incorporated only the ReLU nonlinearity and the mean recurrent connection weight profiles shown in the upper panel (with no fluctuations around the mean) qualitatively captured the difference between the emergent sequential and persistent activity patterns (lower panel, left and right plots, respectively). The networks simulated here had 500 recurrent units (only the most active 50 units are shown in the lower panel). All recurrent units received a unit pulse input at t = 0. The self-recurrence term in the recurrent connectivity matrix (not shown in the upper panel for clarity) was set to 1 in both cases. In the sequential case, the off-diagonal band was set to 0.09 in the forward direction and 0.01 in the backward direction, that is, W_{i,i−1} = 0.09 and W_{i−1,i} = 0.01. The recurrent units did not have a bias term and did not receive any direct input during the trial other than the unit pulse injected at the beginning of the trial.
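The parameters quoted in this caption are enough to write down the sequential case of the simplified model directly. The sketch below includes only the diagonal and nearest-neighbour terms stated here; reproducing the figure would additionally require the full mean weight profiles of Fig. 6a–c, and the discrete-time ReLU update rule used is our assumption.

```python
# Sketch of the sequential case of the simplified model (caption parameters only).
import numpy as np

N, T = 500, 100
W = np.eye(N)                    # self-recurrence term set to 1
W += 0.09 * np.eye(N, k=-1)      # forward band: W[i, i-1] = 0.09
W += 0.01 * np.eye(N, k=1)       # backward band: W[i-1, i] = 0.01

x = np.ones(N)                   # unit pulse delivered to all units at t = 0
activity = [x.copy()]
for _ in range(T):
    x = np.maximum(W @ x, 0.0)   # ReLU dynamics; no bias, no further external input
    activity.append(x.copy())

activity = np.array(activity).T  # units x time; e.g. plot the 50 most active units
print(activity.shape)
```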

Supplementary Fig. 8 Results for the clipped ReLU networks.

The clipped ReLU nonlinearity is similar to ReLU except that it is bounded above by a maximum value: that is, f(x) = clip(x, r_min, r_max), where r_min = 0 and r_max = 100. a, SI increased significantly with σ0. Linear regression slope: 0.55 ± 0.28, R² = 0.01 (two-sided Wald test, n = 280 experimental conditions, p = 0.049). In a–c, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b, SI decreased significantly with λ0. Linear regression slope: −3.87 ± 0.66, R² = 0.11 (two-sided Wald test, n = 280 experimental conditions, p = 0.000). Note that this result differs from the corresponding result in the case of ReLU networks, where λ0 did not have a significant effect on the SI (Fig. 2c). c, SI decreased significantly with ρ. Linear regression slope: −418 ± 64, R² = 0.13 (two-sided Wald test, n = 280 experimental conditions, p = 0.000). d, SI as a function of task. Overall, the ordering of the tasks by SI was similar to that obtained with the ReLU nonlinearity (Fig. 3a). However, note that training was substantially more difficult with the clipped ReLU nonlinearity than with the ReLU nonlinearity. Across all tasks and all conditions, ReLU networks had a training success (defined as reaching within 50% of the optimal performance) of ~60%, whereas the clipped ReLU networks had a training success of only ~9.3%. In particular, we were not able to successfully train any networks in the CD task and very few in the 2AFC task. As a consequence, some of the differences between the tasks ended up not being significant in the clipped ReLU case. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in d are reported in Supplementary Table 1. e, f, Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.8 and in conditions where SI < 3, respectively. The weights were smaller in magnitude in f, because most of the low-SI networks were trained under strong regularization. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.
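As a concrete reference, the clipped ReLU and the style of regression reported in a–c can be written as follows. The SI values below are randomly generated placeholders, not the paper's data; scipy.stats.linregress reports the slope, its standard error, and a two-sided Wald-test p-value for a zero-slope null, which matches the statistics quoted in the caption.

```python
# Clipped ReLU nonlinearity and an example slope test of the kind reported above.
import numpy as np
from scipy.stats import linregress

def clipped_relu(x, r_min=0.0, r_max=100.0):
    """f(x) = clip(x, r_min, r_max) with r_min = 0 and r_max = 100."""
    return np.clip(x, r_min, r_max)

# Placeholder data standing in for (sigma0, SI) pairs across 280 conditions.
rng = np.random.default_rng(1)
sigma0 = rng.uniform(0.0, 0.4025, size=280)
si = 0.55 * sigma0 + rng.normal(3.0, 1.0, size=280)

fit = linregress(sigma0, si)        # two-sided Wald test on the slope
print(f"slope = {fit.slope:.2f} ± {fit.stderr:.2f}, "
      f"R² = {fit.rvalue ** 2:.2f}, p = {fit.pvalue:.3f}")
```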

Supplementary Fig. 9 Changing the amount of input noise.

In these simulations, we set ρ = 0 and varied the gain of the input population(s), g. g = 1 corresponds to the original case reported in the main text; lower and higher values of g correspond to higher and lower amounts of input noise, respectively. a, Combined across all noise conditions, SI increased significantly with σ0. Linear regression slope: 0.76 ± 0.08, R² = 0.04 (two-sided Wald test, n = 2239 experimental conditions, p = 0.000). In a–c, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b, λ0 did not have a significant effect on SI (two-sided Wald test, n = 2239 experimental conditions, p = 0.958). c, The input gain g slightly increased the SI. Linear regression slope: 0.04 ± 0.02, R² = 0.003 (two-sided Wald test, n = 2239 experimental conditions, p = 0.003). d, Again, combined across all input noise levels, the ordering of the tasks by SI was similar to that obtained in the main set of experiments, where g = 1 (Fig. 3a). Error bars represent mean ± standard errors across different hyperparameter settings and noise levels. Exact sample sizes for the derived statistics shown in d are reported in Supplementary Table 1.

Supplementary Fig. 10 Results for the lowest level of input noise (g = 2.5).

a, SI increased significantly with σ0. Linear regression slope: 0.76 ± 0.18, R² = 0.05 (two-sided Wald test, n = 365 experimental conditions, p = 0.000). In a and b, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b, λ0 did not have a significant effect on SI (two-sided Wald test, n = 365 experimental conditions, p = 0.253). c, The ordering of the tasks by SI was similar to that obtained in the main set of experiments. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in c are reported in Supplementary Table 1. d, e, Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.9 and in conditions where SI < 2.8, respectively. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.

Supplementary Fig. 11 Results for the highest level of input noise (g = 0.5).

a, SI increased significantly with σ0. Linear regression slope: 0.91 ± 0.21, R² = 0.05 (two-sided Wald test, n = 361 experimental conditions, p = 0.000). In a and b, solid black lines are the linear fits and shaded regions are 95% confidence intervals for the linear regression. b, λ0 did not have a significant effect on SI (two-sided Wald test, n = 361 experimental conditions, p = 0.457). c, The ordering of the tasks by SI was similar to that obtained in the main set of experiments. Error bars represent mean ± standard errors across different hyperparameter settings. Exact sample sizes for the derived statistics shown in c are reported in Supplementary Table 1. d, e, Recurrent connection weight profiles (as in Fig. 6a–c) in conditions where SI > 4.6 and in conditions where SI < 2.3, respectively. Solid lines represent mean weights and shaded regions represent standard deviations of weights. Both means and standard deviations are averages over multiple networks.

Supplementary Fig. 12 Schur decomposition of trained and random connectivity matrices.

a, Schur mode interaction matrices for the mean recurrent connectivity patterns shown in Fig. 6a–c. Only significant Schur modes with at least one interaction of magnitude greater than 0.04 with another Schur mode are shown here. b, The corresponding significant Schur modes. Networks with more sequential activity (SI > 5) have more high-frequency Schur modes than networks with less sequential activity (SI < 2.5). The random networks are close to normal.
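A sketch of the Schur analysis described here is given below, using scipy.linalg.schur on a stand-in random matrix. The 0.04 threshold is taken from the caption; any further criteria the paper applies to select significant modes would be described in its Methods, and the random connectivity used here is only for illustration.

```python
# Schur decomposition of a connectivity matrix: W = Z T Z^T, where the columns
# of Z are orthogonal Schur modes and the strictly upper-triangular part of T
# holds the feed-forward interactions between them (zero for a normal matrix).
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
N = 500
W = rng.standard_normal((N, N)) / np.sqrt(N)   # stand-in random connectivity

T, Z = schur(W, output='real')                 # real Schur form
interactions = np.triu(T, k=1)                 # Schur mode interaction matrix
strong = np.abs(interactions) > 0.04
significant = strong.any(axis=0) | strong.any(axis=1)   # >= 1 strong interaction
print(f"{significant.sum()} of {N} Schur modes have an interaction of magnitude > 0.04")
```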

Supplementary Fig. 13 Results from networks explicitly trained to generate sequential activity as in ref. 35.

a, b are analogous to Fig. 6a, b and show the recurrent weight profiles obtained in trained networks with ReLU and tanh nonlinearities, respectively. c, d show example trials for the corresponding networks (trained with the same initial condition). Only networks with sequentiality index larger than 5.45 were included in the results shown here.

Supplementary Fig. 14 Circuit mechanism that generates sequential vs. persistent activity in networks with alternative activation functions.

This figure is analogous to Fig. 6a, b, but the results shown are for networks with the exponential linear (elu) activation function (a) and networks with the softplus activation function (b). Note that the elu activation function typically produced larger SIs than softplus, hence slightly different SI thresholds were used in the two cases to determine low and high SI networks.
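For reference, the two alternative activation functions named here can be written as below (standard definitions, with α = 1 for the ELU; the paper's exact parameter choices are given in the Methods).

```python
# Standard forms of the ELU and softplus nonlinearities.
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def softplus(x):
    """Smooth approximation to ReLU: log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)
```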

Supplementary information

Supplementary Text and Figures

Supplementary Figs. 1–14 and Supplementary Table 1

Reporting Summary

About this article

Cite this article

Orhan, A.E., Ma, W.J. A diverse range of factors affect the nature of neural representations underlying short-term memory. Nat Neurosci 22, 275–283 (2019). https://doi.org/10.1038/s41593-018-0314-y
