
NEURO VISION

What is so Great about it?


Deceptively simple anatomical appearance.
Incredibly complicated structure.
Developed through millennia of evolution.
Hard-wired to prefer certain objects right at
birth.
Invariant to position, scale and rotation of the
object.
Tuned to quickly recognize objects.

IS HUMAN VISION PERFECT?

Tricks of the brain

PENROSE TRIANGLE

I'd love to see someone try to get to the top

PENROSE STAIRS

Why is it so?
Our visual system's neural network has been tuned
to recognize, process and classify phenomena that
were vital to our survival and progress. In this
context, every species has a different visual system.
Hence, we are not very good at dealing with
artificially generated images, as such phenomena
rarely occurred in nature during our evolution.
However, we are the best at natural images!

How human visual perception works.

Perceptions of static scenes are inadequate to describe
motion
Gibson's theory of affordances
Vision evolved in organisms embedded in a
dynamically changing environment
What is important to an organism is a collection of
processes, not a single unique one
These processes are at different levels of abstraction.
E.g. we see waves on a shore, and also the
innumerable molecules moving within them

How human visual perception works (contd.)
Seeing involves multi-level process simulations in partial
registration using different ontologies, with rich (but
changing) structural relations between levels
Use of structures of various sorts
Agglomeration/grouping: structures of different sizes at the
same level of abstraction
Interpretation: structures at different levels of abstraction, mapping to a new ontology
Fragments are recognized in parallel and assembled into larger
wholes; these may trigger higher-level fragments, or redirect
processing at lower levels to resolve ambiguities, etc.

Functions of vision
Segment the image (or scene) and recognize the
objects distinguished
Compute distance to contact in every direction
Provide feedback and triggers for action
Provide a low-level summary of the 2-D and/or 3-D
features of the image, leaving it to the central
non-visual processes to draw conclusions
Is something left out?

Visual/spatial reasoning
Our ability to use diagrams and visual images to
reason about very abstract mathematical problems,
like thinking about the complexity of a search
strategy
Seeing that 7+5=12 by a rearrangement of dots
Seeing that angles of a triangle add up to a
straight line
Visualize infinitely thin and long lines of
Euclidean geometry
Many more examples

Visual/spatial reasoning (contd.)
Uses of spatial reasoning: knowing where to search for an object
thrown over a wall, assembling a toy crane from a toy set, using spatial
concepts (such as the notion of a search space) in program design
Reasoning using a grasp of spatial structures requires at least the
ability to see the various structures involved in the proof, the possibilities
for variation (rearrangement) in them, the structures that stay invariant during
the rearrangements, etc.
In contrast, a reasoning system like logic is completely discrete, and all
syntactic composition involves function application
The requirements for visual reasoning are still specified only very vaguely,
and would not be easy to mechanize

Visual perception involves much more...
Visual perception involves affordances
Affordances are the possibilities for, and
constraints on action and change in a
situation. Seeing the possibility of things
that do not exist, but might exist
Example: A person perceiving a chair can
immediately see the possibility of sitting on
it, that is, the chair "affords" sitting

Visual perception involves much more... (contd.)
POPEYE (1970s): the Popeye project
investigated how it is possible for humans to see
structure in very cluttered scenes, where structure
exists at different levels of abstraction. It showed
that we recognize full words before individual
letters
Consider looking at a smiling or a sad face. Does
it involve only perceiving the structure of the
pattern? We are able to perceive mental states of
happiness or sadness

Visual perception involves much more... (contd.)
What may appear to be a single task might consist of many
different tasks in different contexts, e.g. estimating the length
of a plank to fit across a ditch
When a number of images are flashed before the eyes in quick
succession, people can see, at least roughly, what sort of scene
each image depicts
This implies that our visual mechanisms can find low-level
features, use them to cue in features at various levels of size
and abstraction, and arrive at percepts involving known types
of objects, all within 1 or 2 seconds
High-level percepts are formed in less than half a second

Artificial vision systems: what they aim at
recognize objects or people in static images without
acquiring or reasoning about the information using 3-D
structure
track moving objects represented as simple shapes (points
or blobs), often using 2-D representations
explore an environment, building a 2-D map of walls,
doors, etc. without any human-like understanding of the
maps
control a moving robot, regarded as a moving object
obtain some 3-D information about the environment, only
to generate new images

What vision systems cannot do
After Freddy, the Edinburgh robot built in 1973, there was a
need to move from 2-D to 3-D. This failed because of limited
computational power and the difficulty of choosing a
representation
Consider a cup on a table. Humans can "see" the orientation
required of the grasping object at different grasping
locations; artificial vision systems cannot. Alignment of the
grasped surfaces with the grasping ones is important
Affordances of the object being grasped (whether it has sharp
corners, whether some part of it is more fragile than others)
require a grasp of counterfactual conditionals involving
processes that do not actually exist

Why the limitation?
Developing a human-like visual system that does
what a small child or many animals can do needs an
adequate analysis of the requirements for such a
system
The requirements might seem much simpler than
they actually are if they are not studied in
sufficient depth
The failure to achieve set goals is not a fault of the
choice of domains or of the representation; it is a
problem of overoptimistic predictions

Where is the complexity?
Different levels of perception are needed: a high level of
precision to lift a hair with a pair of tweezers, much
lower precision to see that something is not graspable
Perception can involve multi-strand relationships
requiring much richer forms of representation than
just a logical form
Multiple levels of abstraction, affordances and
causation are all needed
Many more subtleties...

Visual Pathway
Hierarchical Neural Network
Architecture

Contents
Brain Mechanism of Vision
Hubel and Wiesel's hierarchy model

Cerebral Cortex
The evolution of the cerebral cortex is one of the
great successes in the history of living beings.
Insights into cortical organization:
Division into different regions with different
functions,
e.g., visual, auditory, somatic sensory, speech
and motor regions

Visual Pathway
Retina to the Visual Cortex

Hubel and Wiesel's Model
Hierarchical model of cortical cells.
The cortical cells are divided into various types:
Type IV
Simple cells
Complex cells

Hubel and Wiesel's Model
Type IV
Cells have circular symmetry.
The receptive field of the cell is one of two kinds:
On-center
(excitatory center and inhibitory surround)
Off-center
(inhibitory center and excitatory surround)
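
The on-center / off-center distinction above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the slides: the 3x3 receptive field, the +1 center weight against a -1/8 weight per surround pixel, the rectification at zero, and the function name center_surround_response.

```python
def center_surround_response(image, row, col, on_center=True):
    """Response of a hypothetical center-surround cell located at (row, col).

    Assumed weights: +1 for the center pixel, -1/8 for each of the 8
    surrounding pixels, so uniform illumination cancels out exactly.
    """
    center = image[row][col]
    surround = sum(
        image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ) / 8.0
    drive = center - surround      # on-center: excitatory center, inhibitory surround
    if not on_center:
        drive = -drive             # off-center: the two signs are swapped
    return max(0.0, drive)         # a firing rate cannot be negative


uniform = [[1.0] * 3 for _ in range(3)]
bright_spot = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
dark_spot = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

for name, img in (("uniform", uniform), ("bright spot", bright_spot), ("dark spot", dark_spot)):
    print(name,
          center_surround_response(img, 1, 1, on_center=True),
          center_surround_response(img, 1, 1, on_center=False))
```

In this toy setting the on-center cell fires only for the bright spot, the off-center cell only for the dark spot, and neither responds to uniform light.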

Hubel and Wiesel's Model
Simple Cells
Respond to an optimally oriented line in a
narrowly defined location.
Achieved by combining input from layer IV cells
whose receptive-field centers lie along the line.
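
One way to picture this wiring is the following minimal sketch, in which a simple cell just sums a few on-center cells whose centers lie on a vertical line. The 5x5 image, the 3x3 center-surround weights, the plain summation and all function names are assumptions made for illustration, not details from the slides.

```python
def on_center(image, row, col):
    """Hypothetical on-center cell: center pixel minus the mean of its 8 neighbours."""
    neighbours = [
        image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    return max(0.0, image[row][col] - sum(neighbours) / 8.0)


def simple_cell(image, centers):
    """Sum the on-center cells whose receptive-field centers are listed in `centers`."""
    return sum(on_center(image, r, c) for r, c in centers)


def bar_image(size, vertical):
    """A one-pixel-wide bright bar through the middle of a dark square image."""
    mid = size // 2
    return [
        [1.0 if (c == mid if vertical else r == mid) else 0.0 for c in range(size)]
        for r in range(size)
    ]


# Centers lying along a short vertical line in the middle column.
VERTICAL_LINE = [(1, 2), (2, 2), (3, 2)]

vertical_response = simple_cell(bar_image(5, vertical=True), VERTICAL_LINE)
horizontal_response = simple_cell(bar_image(5, vertical=False), VERTICAL_LINE)
print(vertical_response, horizontal_response)
```

The vertical bar drives all three afferent cells, whereas the horizontal bar drives only the one it crosses, so the summed response is clearly larger for the preferred orientation.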

Hubel and Wiesel's Model
Complex cells
The main features of complex cells:
o They are less particular about location and are
concerned mainly with orientation.
o Their input is acquired from a number of simple cells.
o They detect motion (direction-specific).
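
The position tolerance of complex cells can be sketched by pooling several simple cells that share an orientation but sit at neighbouring positions. The sketch assumes a max over the simple cells as the pooling rule; the slide only says that the input comes from a number of simple cells. Direction-specific motion sensitivity is not modelled, and all names and sizes are hypothetical.

```python
def on_center(image, row, col):
    """Hypothetical on-center cell: center pixel minus the mean of its 8 neighbours."""
    neighbours = [
        image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    return max(0.0, image[row][col] - sum(neighbours) / 8.0)


def simple_cell(image, col, rows=(2, 3, 4)):
    """Vertical-line detector at one fixed column (position-specific)."""
    return sum(on_center(image, r, col) for r in rows)


def complex_cell(image, cols=(2, 3, 4)):
    """Pool same-orientation simple cells over several columns (assumed max pooling)."""
    return max(simple_cell(image, c) for c in cols)


def vertical_bar(col, size=7):
    """A one-pixel-wide bright vertical bar at column `col` on a dark background."""
    return [[1.0 if c == col else 0.0 for c in range(size)] for _ in range(size)]


# The complex cell responds equally wherever the bar falls within its pooling range...
complex_responses = [complex_cell(vertical_bar(col)) for col in (2, 3, 4)]
# ...while a single simple cell misses a bar that is one column off.
shifted_simple_response = simple_cell(vertical_bar(2), col=3)
print(complex_responses, shifted_simple_response)
```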

Biological Visual Systems as Guides
Modelling attempts to imitate primate
vision systems

Extended Hubel-Wiesel
Hubel-Wiesel hierarchical models have been
extended to obtain a fine balance between
selectivity and invariance.
Simple and complex cells are interleaved at
different levels of the inferotemporal (IT) lobes.
Max-like pooling mechanisms have been
suggested at certain levels, as opposed to a
weighted sum of afferents, to boost invariance to
scale, position and rotation.

Feedforward Architecture
The S cells (simple cells) in the previous figure passed on information
to the C cells (complex cells) by a bell-tuned weighted sum or a max-like operation.

These cells were further arranged in a higher feature-level hierarchy.


Some cells bypass a level in propagating information.
This model only considers the feedforward architecture model for the
primary visual cortex, V4 and the posterior IT lobe, and a top-level
supervised learning mode (coloured regions). [Serre et al. 2007]
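
The two operations named on this slide can be sketched as follows. This is not the published model: the Gaussian form of the bell-shaped tuning, the value of sigma, the three-element patches and the function names are all assumptions chosen to keep the example small. It contrasts a max over afferents with an equally weighted sum (a mean) when the preferred pattern moves and clutter is added.

```python
import math


def s_unit(patch, preferred, sigma=1.0):
    """Bell-shaped tuning (assumed Gaussian): 1.0 when the patch equals the preferred pattern."""
    sq_dist = sum((x - w) ** 2 for x, w in zip(patch, preferred))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def c_unit_max(afferents):
    """Max-like pooling: only the strongest afferent matters."""
    return max(afferents)


def c_unit_mean(afferents):
    """Equally weighted sum of afferents, normalised by their number."""
    return sum(afferents) / len(afferents)


PREFERRED = [1.0, 0.0, 1.0]
target, blank, clutter = [1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]

# Scene A: target at the first position. Scene B: target moved, with clutter added.
scene_a = [s_unit(p, PREFERRED) for p in (target, blank, blank)]
scene_b = [s_unit(p, PREFERRED) for p in (blank, clutter, target)]

print("max :", c_unit_max(scene_a), c_unit_max(scene_b))
print("mean:", c_unit_mean(scene_a), c_unit_mean(scene_b))
```

Under these assumptions the max output is the same for both scenes, while the mean changes with the clutter, which is the intuition behind preferring max-like pooling for invariance.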

Feedforward Architecture
Primates have a very advanced form of attention modulation
(fixation), which relies on feedback propagating from the IT lobes to
the primary visual cortex and lower levels.
This mechanism allows attention to shift from one part of the
image to another.
However, crude object recognition is completed within a very short
time after the stimulus, which indicates that rapid categorization
uses only the feedforward architecture.
Such a model was attempted at the McGovern Institute for Brain
Research at MIT, with some simplifications.
The input stage consisted of 4 different orientations and several scales,
densely covering the gray-value input image of 7 x 7

Results
The model was evaluated against human responses for an input
stimulus of 20 ms followed by a varying inter-stimulus interval (ISI).
No model parameter was adjusted to fit the human
data. All unsupervised parts were fixed and constant
throughout all the runs.
The supervised mode was tuned differently in different runs,
using different test images. Humans were also shown these
test images.
An evaluation across all such runs for the identification of
animal objects was done for both humans and the model. The
results were compared.

Results
Images in various categories, differing in clutter,
scale, position and rotation, were presented.

Maximum similarity was found for ISIs up to 80 ms.

Conclusions
Biologically inspired computational models have
shown very promising results. They are versatile
and fast learners. Why not learn from nature's
best?
Advances in neuroscience are picking up,
allowing us greater understanding. Also,
simulations of hypothetical models will help us
validate neuroscience findings.

References
Talks by Aaron Sloman, Univ. of Birmingham, UK, 2005-2007
http://www.cs.bham.ac.uk/~axs/invited-talks.html
http://www.lifesci.sussex.ac.uk/home/George_Mather/Linked%20Pages/Physiol/Cortex.html
Last accessed: 13 April 2008
Brain Mechanisms of Vision, David H. Hubel and Torsten N. Wiesel, Scientific American, September 1979
How We See What We See, V. Demidov, Mir Publishers, 1986
A feedforward architecture accounts for rapid categorization, Serre et al., PNAS, 2007
Hierarchical Models of Object Recognition in Cortex, Poggio et al., Nature America, 1999
http://www.thebrain.mcgill.ca
Last accessed: 13 April 2008
