
Perceptual straightening of natural videos

An Author Correction to this article was published on 15 May 2019

This article has been updated

Abstract

Many behaviors rely on predictions derived from recent visual input, but the temporal evolution of those inputs is generally complex and difficult to extrapolate. We propose that the visual system transforms these inputs to follow straighter temporal trajectories. To test this ‘temporal straightening’ hypothesis, we develop a methodology for estimating the curvature of an internal trajectory from human perceptual judgments. We use this to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved. Perceptual data validate all three predictions, as do population models of the early visual system. Together, these results provide evidence that the visual system specifically straightens natural videos, offering a solution for tasks that rely on prediction.
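The quantity at stake, trajectory curvature, has a simple discrete form: the angle between successive displacement vectors of the frame sequence, averaged along the trajectory (Fig. 1). A minimal sketch of that computation in the pixel-intensity domain (an illustration, not the authors' code):

```python
import numpy as np

def trajectory_curvature(frames):
    """Mean discrete curvature (degrees) of a sequence of frames.

    frames: array of shape (n_frames, n_pixels), each row one image
    flattened into a vector. Curvature at each interior frame is the
    angle between the two adjacent displacement vectors, so a perfectly
    straight trajectory scores 0 degrees.
    """
    diffs = np.diff(frames, axis=0)                        # displacements
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)  # unit steps
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)       # cos of turn angle
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0))).mean()

# A linear ramp of images is perfectly straight in the intensity domain.
ramp = np.linspace(0, 1, 11)[:, None] * np.random.rand(1, 64 * 64)
print(trajectory_curvature(ramp))  # ~0 degrees
```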


Fig. 1: Quantifying straightness of image sequences in the intensity and perceptual domains.
Fig. 2: Measuring perceptual straightness of image sequences.
Fig. 3: Curvature reduction for natural image sequences.
Fig. 4: Curvature increase for artificial image sequences.
Fig. 5: Curvature conservation for naturalistic, intensity-linear image sequences.
Fig. 6: Changes in curvature induced by models of the visual system.

Data availability

The data supporting the findings of this study are available from the corresponding author on reasonable request.

Code availability

The code used to analyze the data of this study is available from the corresponding author on reasonable request.

Change history

  • 15 May 2019

    The original and corrected figures are shown in the accompanying Author Correction.

References

  1. Barlow, H. B. Possible principles underlying the transformation of sensory messages. in Sensory Communication (ed. Rosenblith, W.) 217–234 (MIT Press, 1961).

  2. Atick, J. J. & Redlich, A. N. Towards a theory of early visual processing. Neural Comput. 2, 308–320 (1990).

  3. van Hateren, J. H. A theory of maximizing sensory information. Biol. Cybern. 68, 23–29 (1992).

  4. Meister, M., Lagnado, L. & Baylor, D. A. Concerted signaling by retinal ganglion cells. Science 270, 1207–1210 (1995).

  5. Balasubramanian, V. & Berry, M. J. A test of metabolically efficient coding in the retina. Network 13, 531–552 (2002).

  6. Puchalla, J. L., Schneidman, E., Harris, R. A. & Berry, M. J. Redundancy in the population code of the retina. Neuron 46, 493–504 (2005).

  7. Doi, E. et al. Efficient coding of spatial information in the primate retina. J. Neurosci. 32, 16256–16264 (2012).

  8. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).

  9. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

  10. Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).

  11. Goris, R. L. T., Simoncelli, E. P. & Movshon, J. A. Origin and function of tuning diversity in macaque visual cortex. Neuron 88, 819–831 (2015).

  12. Rust, N. C. & DiCarlo, J. J. Selectivity and tolerance (‘invariance’) both increase as visual information propagates from cortical area V4 to IT. J. Neurosci. 30, 12978–12995 (2010).

  13. Le Gall, D. MPEG: a video compression standard for multimedia applications. Commun. ACM 34, 46–58 (1991).

  14. Tishby, N., Pereira, F. C. & Bialek, W. The information bottleneck method. In Proc. Allerton Conference on Communication, Control and Computing 37, 368–377 (1999).

  15. Wiskott, L. & Sejnowski, T. J. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715–770 (2002).

  16. Richthofer, S. & Wiskott, L. Predictable feature analysis. In Proc. IEEE 14th International Conference on Machine Learning and Applications (2016).

  17. Palmer, S. E., Marre, O., Berry, M. J. & Bialek, W. Predictive information in a sensory population. Proc. Natl Acad. Sci. USA 112, 6908–6913 (2015).

  18. DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends Cogn. Sci. 11, 333–341 (2007).

  19. Noreen, D. L. Optimal decision rules for some common psychophysical paradigms. Proc. Symposium in Applied Mathematics of the American Mathematical Society and the Society for Industrial and Applied Mathematics 13, 237–279 (1981).

  20. Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).

  21. Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

  22. Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J. & Ganguli, S. Exponential expressivity in deep neural networks through transient chaos. Advances in Neural Information Processing Systems 29, 3360–3368 (2016).

  23. Mante, V., Bonin, V. & Carandini, M. Functional mechanisms shaping lateral geniculate responses to artificial and natural stimuli. Neuron 58, 625–638 (2008).

  24. Berardino, A., Ballé, J., Laparra, V. & Simoncelli, E. P. Eigen-distortions of hierarchical representations. Advances in Neural Information Processing Systems 30, 3530–3539 (2017).

  25. Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985).

  26. Carandini, M. & Heeger, D. J. Normalization as a canonical neural computation. Nat. Rev. Neurosci. 13, 51–62 (2012).

  27. Mallat, S. Group invariant scattering. Commun. Pure Appl. Math. 65, 1331–1398 (2012).

  28. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  29. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).

  30. Khaligh-Razavi, S. M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).

  31. Tacchetti, A., Isik, L. & Poggio, T. Invariant recognition drives neural representations of action sequences. PLoS Comput. Biol. 13, e1005859 (2017).

  32. Hong, H., Yamins, D. L. K., Majaj, N. J. & DiCarlo, J. J. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat. Neurosci. 19, 613–622 (2016).

  33. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1–9 (2012).

  34. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations 3, 1–14 (2015).

  35. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. International Conference on Machine Learning 7, 1–9 (2015).

  36. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. Conference on Computer Vision and Pattern Recognition 29, 770–778 (2016).

  37. Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. Conference on Computer Vision and Pattern Recognition 30, 2261–2269 (2017).

  38. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).

  39. Barlow, H. Redundancy reduction revisited. Network 12, 241–253 (2001).

  40. Machens, C. K., Gollisch, T., Kolesnikova, O. & Herz, A. V. M. Testing the efficiency of sensory coding with optimal stimulus ensembles. Neuron 47, 447–456 (2005).

  41. Geisler, W. S. Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192 (2008).

  42. Bialek, W., De Ruyter Van Steveninck, R. R. & Tishby, N. Efficient representation as a design principle for neural coding and computation. In Proc. International Symposium on Information Theory, 659–663 (2006).

  43. Fukushima, K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980).

  44. Serre, T., Oliva, A. & Poggio, T. A feedforward architecture accounts for rapid categorization. Proc. Natl Acad. Sci. USA 104, 6424–6429 (2007).

  45. Bai, Y. et al. Neural straightening of natural videos in macaque primary visual cortex. Soc. Neurosci. Abstr. 485.07 (2018).

  46. Hénaff, O. J. & Simoncelli, E. P. Geodesics of learned representations. In Proc. International Conference on Learning Representations 4, 1–10 (2016).

  47. Hénaff, O. J., Goris, R. L. T. & Simoncelli, E. P. Perceptual evaluation of artificial visual recognition systems using geodesics. Cosyne Abstr. II-72 (2016).

  48. Li, N. & DiCarlo, J. J. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321, 1502–1507 (2008).

  49. Li, N. & DiCarlo, J. J. Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron 67, 1062–1075 (2010).

  50. Cox, D. D., Meier, P., Oertelt, N. & DiCarlo, J. J. ‘Breaking’ position-invariant object recognition. Nat. Neurosci. 8, 1145–1147 (2005).

  51. Seshadrinathan, K., Soundararajan, R., Bovik, A. C. & Cormack, L. K. Study of subjective and objective quality assessment of video. IEEE Trans. Image Process. 19, 1427–1441 (2010).

  52. Seshadrinathan, K., Soundararajan, R., Bovik, A. C. & Cormack, L. K. A subjective study to evaluate video quality assessment algorithms. In Proc. SPIE Human Vision and Electronic Imaging, 1–10 (2010).

  53. Wichmann, F. A. & Hill, N. J. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept. Psychophys. 63, 1293–1313 (2001).

  54. Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. & Saul, L. K. Introduction to variational methods for graphical models. Mach. Learn. 37, 183–233 (1999).

  55. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. International Conference on Learning Representations 2, 1–14 (2014).

  56. Simoncelli, E. P. & Freeman, W. T. The steerable pyramid: a flexible architecture for multi-scale derivative computation. In Proc. 2nd IEEE International Conference on Image Processing, 444–447 (1995).

  57. Green, D. G. Regional variations in the visual acuity for interference fringes on the retina. J. Physiol. 207, 351–356 (1970).

Acknowledgements

We thank S. Palmer and J. Salisbury for making the video sequences in their Chicago Motion Database available. We are also grateful to Y. Bai for helpful comments on the manuscript. This work was supported by the Howard Hughes Medical Institute (O.J.H., R.L.T.G. and E.P.S.).

Author information

Contributions

O.J.H., R.L.T.G. and E.P.S. conceived the project and designed the experiments. O.J.H. designed the analysis and performed the experiments. O.J.H., R.L.T.G. and E.P.S. wrote the manuscript.

Corresponding author

Correspondence to Olivier J. Hénaff.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Neuroscience thanks Konrad Kording and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Recovery analysis for curvature estimation.

We simulated 4 observers with different sensitivities viewing 21 different sequences with varying perceptual curvature, and evaluated our ability to estimate the perceptual curvature from the same amount of data we use in our experiment. Simulated observers’ sensitivities span the range of human sensitivities, and perceptual curvatures vary from 0° to 180°. (a) Greedy, two-step estimation, which first estimates the most likely perceptual trajectory and then measures its curvature, is plagued by substantial bias. (b) Our method, which estimates the most likely perceptual curvature given many plausible perceptual trajectories, is largely unbiased.
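The bias of the two-step approach is easy to reproduce: in a high-dimensional response space, measurement noise alone inflates the plug-in curvature of even a perfectly straight trajectory, which is why marginalizing over plausible trajectories matters. A toy simulation (not the authors' observer model):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_frames = 50, 11

# A perfectly straight trajectory (true curvature: 0 degrees) in a
# high-dimensional response space.
direction = rng.standard_normal(dim)
traj = np.outer(np.arange(n_frames), direction / np.linalg.norm(direction))

def mean_curvature(x):
    d = np.diff(x, axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    c = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(c)).mean()

# 'Two-step' plug-in estimate: form a noisy point estimate of the
# trajectory, then measure its curvature. Noise alone pushes the
# estimate far above the true value of 0 degrees.
noisy = traj + 0.3 * rng.standard_normal(traj.shape)
print(mean_curvature(traj), mean_curvature(noisy))  # ~0 vs. heavily inflated
```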

Supplementary Fig. 2 Initial, middle and final frames from the first six natural and artificial sequences used in our experiment.

Natural image sequences follow the top (blue) path, whereas artificial sequences follow the bottom (green) path between the same end-points.

Supplementary Fig. 3 Initial, middle and final frames from the last five natural and artificial sequences used in our experiment.

Natural image sequences follow the top (blue) path, whereas artificial sequences follow the bottom (green) path between the same end-points.

Supplementary Fig. 4 Predictability of natural, artificial and naturalistic sequences, for first-, second-, third- and fourth-order predictors in the intensity and perceptual domains.

Each predictor is fit independently to a sequence in the pixel-intensity and perceptual domains by regressing the previous 2 (first-order), 3 (second-order), 4 (third-order) or 5 (fourth-order) samples onto the next one. We then compare the errors of these predictors in each domain. As expected, higher-order predictors are more accurate than lower-order ones, but all show the same pattern of errors across domains and sequence types. Circles indicate the median across sequences; error bars (where visible) represent the 68% confidence interval. Left: natural sequences (experiment 1, n = 12 sequences). Middle: artificial sequences (experiment 2, n = 12 sequences). Right: naturalistic ‘contrast’ sequences (experiment 3, n = 9 sequences). The ‘control’ trajectories show the same curvature as in the intensity domain, but are otherwise identical to the human observers’ perceptual trajectories.
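The fitting procedure described here amounts to ordinary least squares on lagged copies of the trajectory. A sketch, assuming each domain's trajectory is available as an (n_frames × dim) array (illustrative, not the authors' code):

```python
import numpy as np

def predictor_error(seq, order):
    """Relative error of a linear extrapolator of the given order.

    seq: array of shape (n_frames, dim), a trajectory in either the
    pixel-intensity or the perceptual domain. As in the caption, a
    first-order predictor regresses the previous 2 samples onto the
    next one, a second-order predictor the previous 3, and so on.
    """
    n_past = order + 1
    rows = [seq[t - n_past:t].ravel() for t in range(n_past, len(seq))]
    X = np.asarray(rows)                       # lagged copies of the trajectory
    Y = seq[n_past:]                           # the samples to be predicted
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares fit
    return np.linalg.norm(Y - X @ W) / np.linalg.norm(Y)

# A straight ramp is extrapolated perfectly even at first order.
straight = np.linspace(0, 1, 11)[:, None] * np.ones((1, 16))
print(predictor_error(straight, order=1))  # ~0
```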

Supplementary Fig. 5 Changes in curvature in contemporary deep convolutional neural network architectures.

Despite their strong performance in object recognition, none of these architectures straighten natural videos. Circles indicate the median across sequences; error bars representing the 68% confidence interval are smaller than these circles (n = 12 sequences for natural and artificial stimuli, n = 9 sequences for naturalistic ‘contrast’ stimuli). (a) 19-layer VGG architecture (ref. 34). (b) 19-layer VGG architecture with batch normalization (ref. 35). (c) 152-layer Residual Network architecture (ref. 36). (d) 121-layer Dense Network architecture (ref. 37).
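Curvature in a network is measured the same way as in pixel space, but on the trajectory of layer activations. A sketch using torchvision's pretrained VGG-19 (the weights, preprocessing and layer choice here are assumptions of convenience, not the authors' exact pipeline):

```python
import torch
import torchvision.models as models

# Pass video frames through a pretrained network and measure the
# discrete curvature of the resulting activation trajectory.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

def curvature_at_layer(frames, layer_idx):
    """frames: (n_frames, 3, 224, 224) tensor, ImageNet-normalized."""
    with torch.no_grad():
        feats = frames
        for i, module in enumerate(vgg.features):
            feats = module(feats)
            if i == layer_idx:
                break
    traj = feats.flatten(start_dim=1)              # (n_frames, n_units)
    d = traj[1:] - traj[:-1]
    d = d / d.norm(dim=1, keepdim=True)
    cos = (d[:-1] * d[1:]).sum(dim=1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).mean().item()

frames = torch.randn(11, 3, 224, 224)  # placeholder frames, not real video
print(curvature_at_layer(frames, layer_idx=10))
```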

Supplementary Fig. 6 Recovery analysis for multiple simulated populations.

In Figs. 3c, 4c and 5c we show a single, typical population of simulated controls (whose median change in curvature is depicted here by a gray arrow). The simulation process is inherently variable, as is the subsequent recovery, due to finite numbers of subjects and trials. Here we evaluate the dispersion of the median curvature change across repetitions of the simulation and recovery procedure (gray histogram, n = 5 independent repetitions). Experiments 1 and 2: the curvature change for human observers is much larger than for any of the simulated controls (p < 0.001, two-tailed Z-test). Experiment 3: human observers show increased curvature relative to controls, but much less so than in experiment 2 (p = 0.02, two-tailed Z-test). Together, these experiments show that the typical simulated populations shown in Figs. 3–5 are representative of the distribution across simulated populations.
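The comparison against simulated controls reduces to a standard two-tailed Z-test of the human median change against the distribution of control medians; a sketch with made-up numbers (not the paper's data):

```python
import numpy as np
from scipy.stats import norm

def two_tailed_z(human_median, control_medians):
    """Two-tailed Z-test of the human curvature change against the
    spread of simulated-control medians."""
    controls = np.asarray(control_medians, dtype=float)
    z = (human_median - controls.mean()) / controls.std(ddof=1)
    return 2.0 * norm.sf(abs(z))

# Illustrative numbers only: a large human curvature reduction against
# five near-zero control medians.
print(two_tailed_z(-30.0, [1.2, -0.5, 0.8, 0.1, -1.0]))
```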


About this article

Cite this article

Hénaff, O.J., Goris, R.L.T. & Simoncelli, E.P. Perceptual straightening of natural videos. Nat Neurosci 22, 984–991 (2019). https://doi.org/10.1038/s41593-019-0377-4
