Journal
Perspective
How the Brain Might Work:
Statistics Flowing in Redundant Population Codes
Xaq Pitkow1,2 and Dora E Angelaki1,2
1 Department of Neuroscience, Baylor College of Medicine
2 Department of Electrical and Computer Engineering, Rice University
It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in
the world from ambiguous sensory data. To understand these computations, we need to analyze how
information is represented and transformed by the actions of nonlinear recurrent neural networks. We
propose that these probabilistic computations function by a message-passing algorithm operating at the level
of redundant neural populations. To explain this framework, we review its underlying concepts, including
graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be
implemented by recurrently connected probabilistic population codes. The relevant information flow in these
networks will be most interpretable at the population level, particularly for redundant neural codes. We
therefore outline a general approach to identify the essential features of a neural message-passing algorithm.
Finally, we argue that to reveal the most important aspects of these neural computations, we must study
large-scale activity patterns during moderately complex, naturalistic behaviors.
distinct but possibly overlapping spatial patterns of neural activity (Figure 1B; Raju and Pitkow 2016). These activity patterns serve as summary statistics (see Box) that tell us about different properties or parameters of the posterior distribution.

Not every aspect of the neural responses will be a summary statistic encoding information about latent variables. The other aspects would count as irrelevant noise. In quantifying the neural code, we can then compress the neural responses, aiming to keep those aspects that are relevant to the brain's posterior probability. If this compression happens to be lossless, then these summary statistics are called sufficient statistics.

This concept of statistics is similar to the familiar notions of time- or trial-averaged means and covariances, except that these statistics are potentially more general functions of neural activity and estimate a wider range of parameters of the posterior probabilities. They are more like a single-trial population average: in fact, for a population of homogeneous Poisson neurons with Gaussian tuning curves, the population sum is proportional to the inverse variance of the posterior (more spikes, more certainty). For linear probabilistic population codes, the sufficient statistics are weighted averages of neural activity (Ma et al. 2006). More generally, the statistics may be arbitrary nonlinear functions, depending on how the relevant latent variables influence the neurons being considered.

Collectively, these summary statistics are a dimensionality reduction that encodes the brain's probabilistic knowledge. This reduction enables us to relate the transformations of neural activity to the transformations of the posterior probabilities during inference.

Recoding: nonlinear transformations

The brain does not simply encode information and leave it untouched. Instead, it recodes or transforms this information to make it more useful. It is crucial that this recoding is nonlinear, for two reasons: first, for isolating task-relevant latent variables, and second, for updating parameters for probabilistic inference.

In the first case, nonlinear computation is needed to extract task-relevant variables, because in natural tasks they are usually nonlinearly entangled with task-irrelevant nuisance variables (DiCarlo and Cox 2007). This is true even when there is no uncertainty about the underlying variables. Figure 2A illustrates how nonlinear computations can allow subsequent linear computation to isolate task variables.

Ethological tasks typically require complex nonlinearities to untangle task-relevant properties from nuisance variables. In principle, any untangling can be accomplished by a simple network with one layer of nonlinearities, since this architecture is a universal function approximator for both feedforward nets (Cybenko 1989; Hornik 1991) and recurrent nets (Schäfer and Zimmerman 2007). However, in practice this can be more easily accomplished by a deep cascade of simpler nonlinear transformations. This may be because the parameters are easier to learn, because the hierarchical structure imposed by the deep model is better matched to natural inputs (Montúfar et al. 2014), because certain representations use brain resources more economically, or all of the above. Indeed, trained artificial deep neural networks have notable similarities with biological neural networks (Yamins et al. 2014).

To what extent are these nonlinear networks probabilistic? One can trivially interpret neural responses as encoding probabilities, because neuronal responses differ upon repeated presentations of the same stimulus, and according to Bayes' rule that means that any given neural response could arise from multiple different stimuli. But to actually use that encoding of probability to perform inference (exactly or approximately), the brain's decisions must be influenced by uncertainty, such as placing less weight on less reliable information. Crucially, reliability often varies over time, so good inferences and the brain computations that produce them should be sensitive to the current uncertainty, rather than merely the typical uncertainty (Ma 2012).

[Figure 2. Panel labels recovered from the original figure: A, a nonlinear embedding turns a fine-scale classification task into a coarse-scale, linearly separable decoding problem; B, an expansive nonlinear embedding of neural responses r via neural nonlinearities f(r), with limited resolution in sensors and in cortex; C, a nonlinear basis for statistics supporting decoding of variables x̂1, x̂2, x̂3; D, tuned conditional mean versus untuned marginal for polarity and angle variables.]
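The untangling idea in Figure 2A can be made concrete with a toy construction of our own (not code from this article; the variables and thresholds are illustrative assumptions): a task variable defined as the sign of the product of two latent variables is invisible to any linear readout of those variables, but a single fixed quadratic feature makes it linearly separable.

```python
import numpy as np

# Toy illustration (not from the article): a task variable nonlinearly
# entangled with a nuisance variable defeats a linear readout, but a fixed
# nonlinear embedding restores linear separability (cf. Figure 2A).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(2000, 2))   # two latent variables
y = np.sign(x[:, 0] * x[:, 1])           # XOR-like task variable

def linear_readout_accuracy(features, labels):
    """Least-squares linear readout with a bias term, thresholded at zero."""
    A = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return float(np.mean(np.sign(A @ w) == labels))

acc_raw = linear_readout_accuracy(x, y)              # near chance
embedded = np.column_stack([x, x[:, 0] * x[:, 1]])   # one quadratic feature added
acc_embedded = linear_readout_accuracy(embedded, y)  # far above chance
```

Here the product feature stands in for the nonlinear embedding; in cortex such features would be supplied by upstream neural nonlinearities, after which a simple linear decoder suffices.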
Using probabilistic relationships typically requires a second kind of nonlinear transformation, one that accounts not only for relations between latent variables, but also
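The earlier claim that, for homogeneous Poisson neurons with Gaussian tuning curves, the population spike count sets the posterior precision ("more spikes, more certainty") can be checked numerically. This is a minimal sketch under assumptions we have filled in ourselves (flat prior, dense evenly spaced tuning centers, unit tuning width), not an implementation from the article:

```python
import numpy as np

# For independent Poisson neurons with dense Gaussian tuning curves and a
# flat prior, the posterior over the stimulus is Gaussian with variance
# sigma_tc**2 / (total spike count): more spikes mean more certainty.
rng = np.random.default_rng(0)
centers = np.linspace(-10.0, 10.0, 201)  # evenly spaced preferred stimuli
sigma_tc = 1.0                           # tuning-curve width
stim = 0.3                               # true stimulus value

def rates(s, gain):
    """Mean firing rates of the whole population for stimulus s."""
    return gain * np.exp(-(s - centers) ** 2 / (2 * sigma_tc ** 2))

def posterior_variance(r, gain):
    """Grid-based posterior variance given spike counts r (flat prior)."""
    grid = np.linspace(-2.0, 2.0, 4001)
    loglik = np.array([np.sum(r * np.log(rates(s, gain) + 1e-300)) for s in grid])
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    mean = np.sum(grid * post)
    return float(np.sum((grid - mean) ** 2 * post))

results = {}
for gain in (2.0, 10.0):                 # gain controls reliability
    r = rng.poisson(rates(stim, gain))
    results[gain] = (int(r.sum()), posterior_variance(r, gain))
```

Raising the gain multiplies the spike count and shrinks the posterior variance in proportion, which is the kind of moment-to-moment reliability signal that uncertainty-sensitive computations could exploit.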
variables are related through a sparse probabilistic graphical model of the world. Third, recurrent circuitry implements a nonlinear message-passing algorithm that selects and localizes the brain's statistical summaries of latent variables, so that all task-relevant information is actionable.

Finally, we emphasized the advantage of studying the computation at the level of neural population activity, rather than at the level of single neurons or membrane potentials: if the brain does use redundant population codes, then many fine details of neural processing don't matter for computation. Instead, it can be beneficial to characterize computation at a more abstract representational level, operating on variables encoded by populations rather than on the substrate.

Naturally, understanding the brain is difficult. Even with the torrents of data that may be collected in the near future, it will be challenging to build models that reflect meaningful neural computation. A model of approximate probabilistic inference operating at the representational level will be an invaluable abstraction.

1 The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.

References

Chen A, DeAngelis GC, Angelaki DE (2013a). Functional specializations of the ventral intraparietal area for multisensory heading discrimination. J Neurosci. 33: 3567–3581.

Chen X, DeAngelis GC, Angelaki DE (2013b). Diverse spatial reference frames of vestibular signals in parietal cortex. Neuron. 80: 1310–1321.

Chen X, DeAngelis GC, Angelaki DE (2013c). Eye-centered representation of optic flow tuning in the ventral intraparietal area. J Neurosci. 33: 18574–18582.

Cohen MR, Newsome WT (2008). Context-dependent changes in functional circuitry in visual area MT. Neuron. 60(1): 162–73.

Cohen MR, Newsome WT (2009). Estimates of the contribution of single neurons to perception depend on timescale and noise correlation. J Neurosci. 29(20): 6635–48.

Cybenko G (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems. 2(4): 303–14.

Daunizeau J, Den Ouden HE, Pessiglione M, Kiebel SJ, Stephan KE, Friston KJ (2010). Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS One. 5(12): e15554.

Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ (2006). Cortical substrates for exploratory decisions in humans. Nature. 441(7095): 876–9.

Denève S, Duhamel JR, Pouget A (2007). Optimal sensorimotor integration in recurrent cortical networks: a neural implementation of Kalman filters. J Neurosci. 27(21): 5744–56.
DiCarlo JJ, Cox DD (2007). Untangling invariant object recognition. Trends in cognitive sciences. 11(8): 333–41.

Dolan RJ, Dayan P (2013). Goals and habits in the brain. Neuron. 80(2): 312–25.

Fetsch CR, Pouget A, DeAngelis GC, Angelaki DE (2011). Neural correlates of reliability-based cue weighting during multisensory integration. Nature neuroscience. 15(1): 146–54.

Fiser J, Berkes P, Orbán G, Lengyel M (2010). Statistically optimal perception and learning: from behavior to neural representations. Trends in Cog. Sci. 14(3): 119–30.

Ganguli S, Huh D, Sompolinsky H (2008). Memory traces in dynamical systems. PNAS. 105(48): 18970–5.

Gao P, Ganguli S (2015). On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology 32: 148–55.

Geman S, Geman D (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence. 6: 721–41.

Goodman ND, Ullman TD, Tenenbaum JB (2009). Learning a theory of causality. Psychological review. 118(1): 110–9.

Gu Y, Angelaki DE, DeAngelis GC (2008). Neural correlates of multisensory cue integration in macaque MSTd. Nat neurosci. 11(10): 1201–10.

Haefner R, Berkes P, Fiser J (2016). Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron 90(3): 649–60.

Haefner RM, Gerwinn S, Macke JH, Bethge M (2013). Inferring decoding strategies from choice probabilities in the presence of correlated variability. Nature neuroscience. 16(2): 235–42.

Heinemann U, Globerson A (2011). What cannot be learned with Bethe approximations. In: Uncertainty in Artificial Intelligence. Corvallis, Oregon: AUAI Press, pp. 319–326.

Helmholtz H (1925). Physiological optics III. Optical Society of America: 318.

Hinton GE, Sejnowski TJ (1983). Optimal perceptual inference. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 448–453.

Hornik K (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks. 4(2): 251–7.

Hoyer PO, Hyvärinen A (2003). Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior. Advances in Neural Information Processing Systems.

Jazayeri M, Movshon JA (2006). Optimal representation of sensory information by neural populations. Nat neurosci. 9(5): 690–6.

Jiang X, Shen S, Cadwell CR, Berens P, Sinz F, Ecker AS, Patel S, Tolias AS (2015). Principles of connectivity among morphologically defined cell types in adult neocortex. Science. 350(6264): aac9462.

Jonas E, Körding KP (2017). Could a neuroscientist understand a microprocessor? PLoS Computational Biology. 13(1): e1005268.

Kanitscheider I, Coen-Cagli R, Pouget A (2015). Origin of information-limiting noise correlations. PNAS. 112(50): E6973–82.

Kim JS, Greene MJ, Zlateski A, Lee K, Richardson M, Turaga SC, Purcaro M, Balkam M, Robinson A, Behabadi BF, Campos M, Denk W, Seung S, EyeWirers (2014). Space-time wiring specificity supports direction selectivity in the retina. Nature. 509(7500): 331–6.

Kira S, Yang T, Shadlen MN (2015). A neural implementation of Wald's sequential probability ratio test. Neuron. 85(4): 861–73.

Koller D, Friedman N (2009). Probabilistic graphical models: principles and techniques. MIT press.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis: connecting the branches of systems neuroscience. Frontiers in systems neuroscience 2:4.

Krizhevsky A, Sutskever I, Hinton GE (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems.

Lakshminarasimhan K, Pouget A, DeAngelis G, Angelaki D, Pitkow X (2017). Inferring decoding strategies for multiple correlated neural populations. bioRxiv: 108019.

Laplace PS (1812). Théorie Analytique des Probabilités (Ve Courcier, Paris).

Lee TS, Mumford D (2003). Hierarchical Bayesian inference in the visual cortex. J Opt Soc Am A. 20(7): 1434–48.

Liu S, Dickman JD, Newlands SD, DeAngelis GC, Angelaki DE (2013a). Reduced choice-related activity and correlated noise accompany perceptual deficits following unilateral vestibular lesion. PNAS. 110: 17999–18004.

Liu S, Gu Y, DeAngelis GC, Angelaki DE (2013b). Choice-related activity and correlated noise in subcortical vestibular neurons. Nat Neurosci. 16: 89–97.

Ma WJ*, Beck J*, Latham P, Pouget A (2006). Bayesian inference with probabilistic population codes. Nat Neurosci 9(11): 1432–8.

Ma WJ (2010). Signal detection theory, uncertainty, and Poisson-like population codes. Vision Research 50: 2308–19.

Ma WJ (2012). Organizing probabilistic models of perception. Trends in cognitive sciences. 16(10): 511–8.

Mante V, Sussillo D, Shenoy KV, Newsome WT (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 503(7474): 78–84.

Marder E, Taylor AL (2011). Multiple models to capture the variability in biological neurons and networks. Nat Neurosci 14: 133–138.

Marr D (1982). Vision: A computational investigation into the human representation and processing of visual information. Henry Holt and Co. Inc., New York, NY.

Minka TP (2001). Expectation propagation for approximate Bayesian inference. Uncertainty in Artificial Intelligence (UAI), ed. Breese and Koller, Morgan Kaufmann Pub. 362–369.

Montúfar GF, Pascanu R, Cho K, Bengio Y (2014). On the number of linear regions of deep neural networks. Advances in Neural Information Processing Systems.

Moreno-Bote R, Knill D, Pouget A (2011). Bayesian sampling in visual perception. PNAS 108(30): 12491–6.

Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A (2014). Information-limiting correlations. Nat neurosci. 17(10): 1410–7.

Newsome WT, Britten KH, Movshon JA (1989). Neuronal correlates of a perceptual decision. Nature 341: 52–4.

Ng AY, Russell SJ (2000). Algorithms for inverse reinforcement learning. In: Proceedings of ICML.

Nienborg H, Cumming BG (2007). Psychophysically measured task strategy for disparity discrimination is reflected in V2 neurons. Nat Neurosci. 10: 1608–1614.

Olshausen BA, Field DJ (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 381(6583): 607–9.

Orbán G, Berkes P, Fiser J, Lengyel M (2016). Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron.

Pearl J (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann.

Pitkow X (2010). Exact feature probabilities in images with occlusion. Journal of Vision. 10(14):42, 1–20.

Pitkow X, Ahmadian Y, Miller KD (2011). Learning unbelievable probabilities. Advances in Neural Information Processing Systems. 738–746.

Pitkow X, Lakshminarasimhan K, Pouget A (2014). How brain areas can predict behavior, yet have no influence. COSYNE Abstract.

Pitkow X, Liu S, Angelaki DE, DeAngelis GC, Pouget A (2015). How can single sensory neurons predict behavior? Neuron. 87(2): 411–23.

Raju R, Pitkow X (2015). Marginalization in random nonlinear neural networks. American Physical Society Meeting Abstracts. Vol 1.

Raju R, Pitkow X (2016). Inference by Reparameterization in Neural Population Codes. Advances in Neural Information Processing Systems.

Rao RP (2004). Bayesian computation in recurrent neural circuits. Neural computation. 16(1): 1–38.

Rao RP (2010). Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Frontiers in computational neuroscience. 4:146.

Rigotti M, Barak O, Warden MR, Wang X-J, Daw ND, Miller EK, Fusi S (2013). The importance of mixed selectivity in complex cognitive tasks. Nature 497: 585–590.

Rust NC, Movshon JA (2005). In praise of artifice. Nat. Neurosci. 8(12): 1647–50.

Saez A, Rigotti M, Ostojic S, Fusi S, Salzman CD (2015). Abstract context representations in primate amygdala and prefrontal cortex. Neuron. 87(4): 869–81.

Savin C, Denève S (2014). Spatio-temporal representations of uncertainty in spiking neural networks. Advances in Neural Information Processing Systems 27.

Schäfer AM, Zimmerman HG (2007). Recurrent neural networks are universal approximators. Int. J of Neural Systems, 17(4): 253–263.

Sutton RS, Barto AG (1998). Reinforcement learning: An introduction. Cambridge: MIT press.

Uka T, DeAngelis GC (2004). Contribution of area MT to stereoscopic depth perception: choice-related response modulations reflect task strategy. Neuron. 42: 297–310.

Wainwright MJ, Jordan MI (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning. 1(1–2): 1–305.

Wainwright MJ, Simoncelli EP (2000). Scale mixtures of Gaussians and the statistics of natural images. Advances in neural information processing systems. 12(1): 855–861.

Wolpert D (1996). The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation 8: 1341–1390.

Yamins D*, Hong H*, Cadieu C, Solomon EA, Seibert D, DiCarlo JJ (2014). Performance-Optimized Hierarchical Models Predict Neural Responses in Higher Visual Cortex. PNAS. doi: 10.1073/pnas.1403112111.

Yang Q, Pitkow X (2015). Robust nonlinear neural codes. COSYNE Abstract.

Yu AJ, Dayan P (2005). Uncertainty, neuromodulation, and attention. Neuron. 46(4): 681–92.

Zeiler MD, Fergus R (2014). Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer International Publishing. p 818–833.

Zohary E, Shadlen MN, Newsome WT (1994). Correlated neuronal discharge rate and its implication for psychophysical performance. Nature 370: 140–143.

Zylberberg J, Pouget A, Latham PE, Shea-Brown E (2016). Robust information propagation through noisy neural circuits. arXiv:1608.05706.