
CN530 S-2004 1- 1 CN530 S-2004 1- 2

WEEK 1: FUNDAMENTAL PROBLEMS OF VISION


1) Unit formation and grouping
2) Seeing and recognizing -- form/color interactions
3) Retinal veins and blind spot
4) Perceiving surface color: Constancy, contrast, and discounting the illuminant
5) Stabilized images: Boundaries and featural color and brightness
6) Complementary processing: Unoriented and oriented filters

PREFACE

The following lecture notes for Week 1 of CN 530 contain many terms that are likely to be unfamiliar to you. In addition, many ordinary-seeming words are used in a manner that can only be fully understood in the context of material that you will not encounter until later in the course.

The purpose of this lecture is to (dis?)orient you as quickly as possible to the scope of issues addressed and methods of inquiry adopted in the course.

Please do not worry if parts, or even most, of this lecture make no sense on first presentation. If you are looking at these notes before attending the first lecture, please be aware that they may be difficult to interpret without a spoken “sound track.” Lecture notes for later weeks are more straightforward.

Please use this lecture as a “site map” for locating the contents of subsequent lectures with respect to the goals of the entire course.

CN530 S-2004 1- 3 CN530 S-2004 1- 4

THE UNITS OF VISION

The objects of perception and the space in which they seem to lie are not abstracted by a rigid metric but a far looser one than any philosopher ever proposed or any psychologist dreamed.

Lettvin (1981)
CN530 S-2004 1- 5 CN530 S-2004 1- 6

GEOMETRY OF VISION

              Euclidean        Quotidian
unoriented    point            dot
oriented      line             bar (line)
              plane            surface (region)
              infinitesimal    extended

In what substrate do these functional units exist?
Are they biochemical processes?? … algorithms?? … network patterns?? … aspects of consciousness??

CN530 S-2004 1- 7 CN530 S-2004 1- 8

EMERGENT SEGMENTATION AND GROUPING

[Figure: Gestalt “Laws”; panels labeled Proximity and Collinearity]

The “Gestalt Laws” refer to perceptual grouping or unit formation on the basis of properties such as:
similarity
proximity
closure
symmetry
Prägnanz (literally “pregnant”)
“good continuation”
“common fate”

See Koffka, 1935 and Köhler, 1947
See also: http://www.ship.edu/~cgboeree/gestalt.html
CN530 S-2004 1- 9 CN530 S-2004 1- 10

GESTALT PROPERTIES

Bozzi, 1969
proximity < collinearity
long range “cooperation”
[Figure panels labeled “yes” and “no”]
These emergent groupings are “invisible.”

MULTIPLE PERCEPTUAL GROUPINGS

Multiple oriented 1-D groupings in a 2-D structure. (after Bozzi, 1969)

Consider whether interior and exterior orientations agree, and their alignment relative to a larger frame of reference.

CN530 S-2004 1- 11 CN530 S-2004 1- 12

INTERACTION OF TEXTURE AND (OUTER) FORM

No less an authority than Griffy has called attention to a disturbing phenomenon:

COHERENT PATTERNS

global randomness; local correlations (Glass, 1969)

Coherence: select globally consistent groupings


CN530 S-2004 1- 13 CN530 S-2004 1- 14

Glass patterns

Coherence: We select the globally most consistent groupings from among all possible local groupings.

These emergent boundary groupings are invisible. We continue to “see” only dots, although we are aware of and “recognize” circular groupings.

LINE ENDINGS AND GROUPING

Possible groupings of oriented line segments:
collinear
perpendicular
oblique

Global grouping may or may not be in the same orientation as local line segments.

CN530 S-2004 1- 15 CN530 S-2004 1- 16

Beck textures

Linking forms emergent features, which support region segregation.
tachistoscopic display: 100 msec

BECK TEXTURE DISPLAYS

“Emergent features” can form via linking of local features to enable us to segment one image region from another.

We “see” the same kind of image contrast for a given element, whether or not that element is part of some larger grouping.

Q: How are these displays constructed? Why?

The observer’s task in these displays is to say whether the top and bottom halves are “same” or “different.”
CN530 S-2004 1- 17 CN530 S-2004 1- 18

ILLUSORY CONTOURS

Some groupings do change what we see.

Ehrenstein (1941/1987):
The form we recognize is not “in the image.”
We see the disk region as brighter than the background.

NEON COLOR SPREADING

Varin, 1971

CN530 S-2004 1- 19 CN530 S-2004 1- 20

DOMINANCE OF GLOBAL GROUPINGS

can distort local information (even collinear groupings)

café wall illusion
http://www.illusionworks.com/html/cafe_wall.html

Note: For reasons that surpass understanding, the existence of the illusionworks.com web site is a stochastic process. . . .

DISCLAIMER: COGNITIVE FACTORS

From Marr 1982 -- R. C. James, by way of Sinha & Adelson, 1997
CN530 S-2004 1- 21 CN530 S-2004 1- 22

PERCEPTION “RESISTS” COGNITIVE INTRUSIONS

Fig 18, Kanizsa & Vicario, 1968

AUTONOMY OF PERCEPTION

“. . . two different moments, or aspects, in the perceptual process: the moment of segmentation of the visual field, which we called ‘pre-categorical’ or primary, and the secondary aspect of cognitive processing of the autonomously segregated perceptual units.” [emphasis added]

Kanizsa and Luccio (1987). Formation and categorization of visual objects: Höffding’s never refuted but always forgotten argument. Gestalt Theory, 9, 111-127.

CN530 S-2004 1- 23 CN530 S-2004 1- 24

CRAIK-O'BRIEN-CORNSWEET EFFECT

long-range interactions in what we see in a region.

After Todorović, 1987

The COCE is the first of many examples that we will encounter that suggest that . . .

a perceptual surface is not simply a union of local patches.

“Hold this thought” to compare with later evidence.

The kind of taxonomy of visual phenomena that we have been pursuing so far is not sufficiently constrained to help us model visual processing.

How, then, should we proceed?

computational constraints? physiology?
CN530 S-2004 1- 25 CN530 S-2004 1- 26

FROM PHENOMENOLOGY TO GEOMETRY

Bar (line): not a union of dots
Surface: not a union of local patches

For these units:
extension
interiors
contours separating them from surround

Not Euclidean geometry or calculus of Newton/Leibniz

“Fact: The notion of ‘surface area’ is devoid of any meaning unless you specify the resolution at which it is to be assessed. The same goes ipso facto for the notion of ‘arc length.’”
Koenderink, Solid Shape, 1990

ILLUMINATION, REFLECTANCE, AND VISION

At surface position (x, y):
light in: $I_{x,y}$
light out: $E_{x,y}$

$R_{x,y} = E_{x,y} / I_{x,y}$

CN530 S-2004 1- 27 CN530 S-2004 1- 28

AMBIGUITY OF LOCAL INFORMATION

How could you tell from purely local information whether a surface patch is planar under nonuniform illumination or smoothly curved in depth?

If you cannot make this distinction at a certain point in visual processing, what can you base subsequent processing on?

$E_{x,y} = f(I_{x,y}, R_{x,y})$
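The ambiguity can be made concrete with a small sketch. It assumes that f is plain multiplication, E = I * R, which matches the ratio R = E / I given on the earlier slide; the function name and all numbers are hypothetical.

```python
def light_out(illumination, reflectance):
    """Assumed image model: E = I * R at each position."""
    return [i * r for i, r in zip(illumination, reflectance)]

# Scene 1: a uniform surface (constant R) under an illumination gradient.
scene1 = light_out([100.0, 80.0, 60.0, 40.0], [0.5, 0.5, 0.5, 0.5])

# Scene 2: uniform illumination on a surface whose reflectance varies.
scene2 = light_out([100.0, 100.0, 100.0, 100.0], [0.5, 0.4, 0.3, 0.2])

# The two scenes deliver the same light to the eye (up to rounding).
same_image = all(abs(a - b) < 1e-9 for a, b in zip(scene1, scene2))
print(scene1, scene2, same_image)
```

Under this assumption, no single value of E identifies its own I and R; some additional constraint is needed.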


CN530 S-2004 1- 29 CN530 S-2004 1- 30

Illumination vs. Reflectance effects

Claim: We must be able to distinguish the visual effects of surface shape (orientation) and reflectance from the effects of variations in illumination* at a fairly early stage of visual processing, because virtually every other important visual competence depends on being able to make this distinction.

Note: How animals do this is still unknown for general illumination conditions! [That’s a major understatement; put another way: we’re still far away from having a general computer vision system.]

NOTE: We perceive a world of objects and events, not of light. (Gibson, 1950, 1966, 1979)

* E.g. cast shadows; variable distances from source, . . .

BRIGHTNESS CONSTANCY

[Figure: light in, light out; shading codes surface reflectance]

REFLECTANCE: A surface property

We estimate the ratio of reflectances across boundaries.

CN530 S-2004 1- 31 CN530 S-2004 1- 32


Physical reflectance is a surface property that is constant over time, specifically w/r/t variable illumination.

It would be nice if our perceptions of surface color were similarly constant.

Note: We will confront two definitions of “reflectance;” besides physical reflectance (radiance as a ratio of incident illumination), we have Grossbergian reflectance, the ratio of the magnitude of input to one node in a network relative to total input to the network.

Helmholtz: “Discounting the illuminant” (wavelength and intensity)

LAND'S EXPERIMENTS IN COLOR CONSTANCY

[Figure: Land -- McCann “Mondrians,” I (Land, 1971), with red, green, and blue illuminants. Shading within boxes stands for colored pigments.]

FIRST EXPERIMENT: If the intensity of the red (long wavelength) illuminant is doubled or tripled, the colors in the Mondrian still look “much the same.” We somehow factor away the “extra” red.
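A minimal sketch of why a ratio code of the Grossbergian kind could “factor away” a stronger illuminant: if every input in one wavelength channel is multiplied by the same factor, each node's share of the total does not change. The function name and patch values are hypothetical, and the sketch does not claim to be the mechanism used in the course's models.

```python
def grossbergian_reflectance(inputs):
    """Ratio of each node's input to the total input to the network."""
    total = sum(inputs)
    return [x / total for x in inputs]

# Hypothetical long-wavelength inputs from four Mondrian patches.
patches = [10.0, 20.0, 30.0, 40.0]
baseline = grossbergian_reflectance(patches)

# Tripling the red illuminant triples every input in that channel...
tripled = grossbergian_reflectance([3.0 * x for x in patches])

# ...but leaves the ratios unchanged (up to rounding).
unchanged = all(abs(a - b) < 1e-12 for a, b in zip(baseline, tripled))
print(baseline, unchanged)
```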
CN530 S-2004 1- 33 CN530 S-2004 1- 34

LAND -- McCANN MONDRIANS

Use different illumination gradients in different wavelengths, adjusted so as to “offset” the effects of spectral reflectance of two patches.

Different colors seen from the same spectrum . . . similar to those seen in white light

GRADIENTS OF ILLUMINATION AND REFLECTANCE

[Figure: three plots against position: illumination (I), reflectance in wavelength (R), and image (IR), with the image labeled E]

CN530 S-2004 1- 35 CN530 S-2004 1- 36

“RETINEX” STRATEGY

1. Recover relative reflectances (ratios) near image edges.
2. Suppress information from slowly varying region interiors.

[Figure: patches a, b, c, d and the corresponding image profile IR against position]

How to go from IR to f(R)?
Ideally, some simple function

To be able to “discount” the illuminant (intensity) with a Retinex-like strategy, the illumination gradient must be more gradual than the gradient of change in reflectance.

This is not true in the Gelb effect.
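The two steps can be sketched in one dimension. This is a simplified illustration under stated assumptions, not Land and McCann's published algorithm: it assumes the image is E = I * R, works on log differences between neighboring samples, and uses a hypothetical threshold to decide which changes count as edges.

```python
import math

def retinex_1d(image, threshold=0.1):
    """Estimate reflectance relative to the first sample of a 1-D image.

    Take logs, difference neighbours, zero out small differences
    (slow illumination change), then re-integrate the remaining steps.
    """
    logs = [math.log(e) for e in image]
    steps = [b - a for a, b in zip(logs, logs[1:])]
    edges = [s if abs(s) > threshold else 0.0 for s in steps]
    rel = [0.0]
    for s in edges:
        rel.append(rel[-1] + s)
    return [math.exp(v) for v in rel]

# Two patches (R = 0.2, then R = 0.8) under a gentle illumination gradient.
reflectance = [0.2] * 5 + [0.8] * 5
illumination = [100.0 * (1.02 ** k) for k in range(10)]
image = [i * r for i, r in zip(illumination, reflectance)]

estimate = retinex_1d(image)
print([round(v, 2) for v in estimate])
```

The estimate is flat within each patch and steps by a factor of about 4 at the edge, close to the true reflectance ratio 0.8 / 0.2. The small excess comes from the illumination change across the edge itself. If the illumination changed as abruptly as the reflectance, the threshold could not separate the two.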
CN530 S-2004 1- 37 CN530 S-2004 1- 38

THE GELB EFFECT

From Kaufman, 1974.

Figure 5-1. The Gelb experiment. A spinning black paper disc D is illuminated by a projector P. Stray light enters the room behind the disc so that it cannot be seen by the observer O. The disc appears luminous except when a small piece of white paper is placed in the path of the projector light.

Exercise: Look around you and point to the surface of an object whose appearance would be closest to that of a moon rock, if that moon rock were in the room with you now.

CN530 S-2004 1- 39 CN530 S-2004 1- 40

RATIOS FROM EARLY BIOLOGICAL PROCESSING

RATIOS NOT ENOUGH

If early representation looks like this, we need “filling-in” for perceived surface interiors; otherwise, we would “see” a world of line drawings.


CN530 S-2004 1- 41 CN530 S-2004 1- 42

BRIGHTNESS CONTRAST

Two small disk patches of equal luminance.
One annulus is of high luminance.
The other is of low luminance.

Contrast effect: Perceived brightness of inner disk varies in direction opposite to luminance of annuli.

Claim: This occurs because some representation of the “sum of ratios” of inputs for the two scenes is approximately constant.

Not just “lateral inhibition” -- but “anchoring”.
Gilchrist e amici

BRIGHTNESS CONTRAST

Normalization within functional spatial domains -- total “energy” in a representation is conserved, as the sum of inputs varies.

That is, normalization occurs, whereby -- for some functional spatial domain -- the quantity of total “energy” in a representation (or output of some network) is conserved, as the sum of inputs varies over some range. (Cf. Simulation Assignment 1.)
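A minimal sketch of how normalization could produce the contrast effect, assuming one common divisive form, x_i = b * I_i / (a + sum of inputs). The notes do not say which network Simulation Assignment 1 uses, so the formula, the parameters a and b, and the luminance values are all illustrative assumptions.

```python
def normalize(inputs, a=1.0, b=1.0):
    """Divisive normalization: x_i = b * I_i / (a + sum of all I)."""
    total = sum(inputs)
    return [b * i / (a + total) for i in inputs]

disk = 50.0
bright_scene = normalize([disk, 200.0])  # disk inside a high-luminance annulus
dark_scene = normalize([disk, 10.0])     # same disk inside a low-luminance annulus

# Same disk luminance, different normalized activity.
print(bright_scene[0], dark_scene[0])

# Total activity stays below b however large the inputs become.
big_total = sum(normalize([disk * 1000, 200.0 * 1000]))
print(big_total)
```

In this sketch the disk's activity is lower when its annulus is brighter, which is the direction of the contrast effect, and total activity approaches b once the inputs are large relative to a.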

CN530 S-2004 1- 43 CN530 S-2004 1- 44

REGIONS AND EDGES: TISSUE CONTRAST

The conditions of simultaneous contrast are subtle.
Consider “tissue contrast” (blurred presentation) in the chromatic domain (Helmholtz)

Stimulus: red, gray    Percept: red, green
Stimulus: red, gray    Percept: red, gray

RETINA

The retina is the “interface” between a mammal and its visual environment.

Starting from the interface, research on visual perception could be motivated by:

1) a detailed analysis of the visual system's mechanisms or
2) a detailed analysis of the visual environment.

(cf. Gibson, 1950, 1966, 1979; Marr, 1982)

Ultimately, these two lines of inquiry would converge, because the visual environment has shaped the evolution of our visual systems.
CN530 S-2004 1- 45 CN530 S-2004 1- 46

RETINA AS INPUT DEVICE, I

Lines and edges are registered . . . poorly!

PATH OF LIGHT

Kolb, Fernandez & Anderson
http://retina.umh.es/Webvision/sretina.html

CN530 S-2004 1- 47 CN530 S-2004 1- 48

RETINA AS INPUT DEVICE, II

Pattern formed on retina by a dark line . . . is not even connected!
[Figure labels: vein, blind spot]

Completion needed for “real” contours
Note: this statement is still controversial.

COMPLETION OVER THE BLIND SPOT

DEMO: Close right eye and fixate upper cross with left eye. Hold page at about 1 ft from the eye, and move it back and forth in depth slightly until the disk on the left disappears.

Similarly, when fixating the lower cross, the gap in the black line can be made to fall on the blind spot, and the line is seen as continuous.

Adapted from Kandel & Schwartz's (1985) adaptation from Hurvich (1981)
CN530 S-2004 1- 49 CN530 S-2004 1- 50

EYE MICROMOVEMENTS

The eye jiggles constantly in its orbit. (“tremor,” approximately 40 Hertz*)
The shadows of retinal veins do not move relative to the photoreceptor mosaic.
They therefore form “stabilized images.”
But stabilized images fade. (Time scale: seconds)
(Cornsweet, 1970; Krauskopf, 1963; Ratliff, 1965; Yarbus, 1967)

Note: While the stabilization of veins accounts for our not “seeing” them, the line that we do see still makes an incomplete pattern on the retina.

* Local expert: Prof. Rucci

COMPENSATION FOR RETINAL GAPS

Emergent boundary formation (completion)
Which boundaries to connect?

Featural filling-in
What color and brightness do we SEE*?

* Not the same as “recognition”

CN530 S-2004 1- 51 CN530 S-2004 1- 52

GLASS PATTERN RECONSIDERED

SEE: dots
RECOGNIZE: circular groupings

The emergent segmentation is “invisible”, in the sense that no brightness difference traces the form generated by the segmentation.

CLAIM: All boundaries are invisible.

Specifically: the segmentation of the circular illusory boundary in the Ehrenstein figure is also invisible; but it interacts with “something else” in a way that the Glass pattern segmentation does not.

What else? Why? And why not, respectively?

Is the boundary of the disk on the right invisible?

EMERGENT SEGMENTATION

Examples of emergent segmentation:
1) Collinear (Beck, Bozzi)
2) Perpendicular to line ends (Beck, Ehrenstein, Varin neon)
3) Diagonal, oblique (Beck, Kennedy)
4) “Other” (Beck's T's and L's, dalmatian)

The contrasts and hues that we “see” are not the only relevant perceptual structures.

We segment and group images on the basis of “emergent boundaries,” which are based on image contrasts but are not isomorphic to the pattern of contrasts.

Why?
CN530 S-2004 1- 53 CN530 S-2004 1- 54

STABILIZED IMAGE EXPERIMENTS

How to stabilize an image. . . the hard way!

Caps needed to be of low mass and low moment of inertia; why?
Fig 24 from Yarbus (1967)

BOUNDARIES AND FILLING-IN

An image like this, when stabilized on the retina, looks like this:

[Figure: a red and black image, before and after stabilization]

NOTE: time scale of seconds

CN530 S-2004 1- 55 CN530 S-2004 1- 56

HOME-MADE STABILIZATION

A red dot moves back and forth.

YARBUS EXPERIMENT (1967):

When indicated boundaries are stabilized, . . .
. . . white and black are no longer visible, but . . .
. . . the effects of contrast remain!

[Figure: red (same) before stabilization; darker red and lighter red after]


CN530 S-2004 1- 57 CN530 S-2004 1- 58

INFERENCES FROM YARBUS EXPERIMENT

Boundaries restrict featural filling-in (color and brightness).
When boundaries fade, colors flow.
Boundary fading does not imply color fading.

Two subsystems in early vision:
boundary system
feature system (brightness, color)

BOUNDARIES AND VISIBILITY

Q: What happens if there are no boundaries in a visual field?

Ganzfeld: Completely homogeneous visual field -- can be approximated by wearing goggles made from halves of a ping-pong ball.

A: When viewing a ganzfeld, brightness fades. (time scale?)

Note: Homogeneous regions of any image are de facto stabilized.
e.g. “clear blue sky”

Homogeneous regions carry no information.

CN530 S-2004 1- 59 CN530 S-2004 1- 60

Later in the course we will consider binocular vision.

For a homogeneous region, any two views match in the two eyes’ inputs.

For statistically homogeneous regions, there are many “false matches.”

Note: Cross-reference discussion of homogeneous regions with “featural noise suppression” in Week 2.

BOUNDARY PROCESSING

Boundaries: how to
detect
sharpen (sometimes)
and complete?

Unoriented and oriented receptive fields (masks, filters, kernels)

[Figure: unoriented VS oriented receptive field]

Hubel & Wiesel, 1962 . . . Nobel Prize
CN530 S-2004 1- 61 CN530 S-2004 1- 62

SIMPLE CELLS: ORIENTED LOCAL CONTRAST FILTERS

[Figure: active and inactive cases]

Sensitive to:

1) orientation
2) amount of contrast
3) direction-of-contrast (i.e. contrast polarity)
4) spatial scale
5) position

Not (just) “edge detectors.”

REVERSE-CONTRAST KANIZSA SQUARE

Local oriented-contrast filtering alone is not enough:

Cf: Shapley & Gordon, 1985

Again, boundary ≠ brightness.

CN530 S-2004 1- 63 CN530 S-2004 1- 64

“INVISIBILITY” OF BOUNDARIES

Boundary processing is insensitive to direction of contrast (contrast polarity of edges) when completing over gaps.

Waltz (1975) got a lot of mileage out of disallowing such linking in a syntactic scene labeling algorithm.

SCENE ANALYSIS ALGORITHMS

In the 1960s and 70s, the artificial intelligentsia tried to understand 3-D surface layout by labeling the edges of simple scenes, using syntactic operations on structures of symbolic edge and junction tokens.

Characteristic problem: Combinatorial explosion of possible labelings as number of edges increases

Guzman (1968) SEE labels
Huffman-Clowes (1971) labels
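The scale of the combinatorial explosion is easy to estimate. The sketch below counts labelings before any junction constraints prune them. It assumes four candidate labels per edge (convex, concave, and occluding in either of two directions), as in the usual textbook account of Huffman-Clowes labeling; the function name is hypothetical.

```python
def naive_labelings(num_edges, labels_per_edge=4):
    """Upper bound on edge labelings before any junction constraints."""
    return labels_per_edge ** num_edges

for n in (3, 9, 15, 30):
    print(n, naive_labelings(n))
```

The count grows exponentially with the number of edges, so exhaustive search is impractical for all but the simplest scenes unless constraints remove most candidates early.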
CN530 S-2004 1- 65 CN530 S-2004 1- 66

WHO PUT THE “A” IN THE OLD “AI”?

Winston (1975) raves about “the Waltz effect.”

Waltz (1972/1975) added initial complexity to prior schemes: a label for distinguishing “shadow” from “nonshadow” edges.

GOOD NEWS: The addition of this constraint drastically reduced the number of possible combinations to be searched in order to label a scene.

But . . .

REVERSE-CONTRAST KANIZSA SQUARE, RECALLED

Evidently, while sensitive to:

1) orientation,
2) amount of contrast,
3) spatial scale, and
4) position,

human boundary processing is insensitive to direction of contrast (i.e., contrast polarity of “edges”), when completing over gaps (i.e., over a sufficiently large spatial scale).
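One way to picture the difference between polarity-sensitive filtering and polarity-insensitive boundary processing is to pool the rectified outputs of two oppositely signed oriented filters. This pooling scheme is an assumption made for illustration, not something stated in these notes; the two-tap filter and all names are hypothetical.

```python
def simple_cell_pair(signal, position):
    """Responses of two opposite-polarity oriented filters at one position.

    The filter is an odd-symmetric two-tap difference; each cell keeps
    only its preferred sign (half-wave rectification).
    """
    r = signal[position + 1] - signal[position]
    return max(r, 0.0), max(-r, 0.0)

def boundary_signal(signal, position):
    """Pool both polarities: responds to an edge whatever its sign."""
    return sum(simple_cell_pair(signal, position))

dark_to_light = [0.2, 0.2, 0.8, 0.8]
light_to_dark = [0.8, 0.8, 0.2, 0.2]

for edge in (dark_to_light, light_to_dark):
    print(simple_cell_pair(edge, 1), boundary_signal(edge, 1))
```

Each cell in the pair responds to only one direction of contrast, while the pooled signal is the same for both edges.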

CN530 S-2004 1- 67 CN530 S-2004 1- 68

COMPLEMENTARITY

UNCERTAINTY: Some kinds of information are incompatible; they can not be represented simultaneously by a single unit.

[Figure labels: better orientation, worse localization]

QUANTIZATION: How to maximize resolution, given that vision can not “do calculus” with infinitesimals?

UNITS OF VISUAL REPRESENTATIONS

Boundaries: Completion                   Surfaces: Filling-in
cooperation                              filling-in (diffusion?)
oriented                                 unoriented
inward                                   outward
insensitive to direction-of-contrast     sensitive to direction-of-contrast
CN530 S-2004 1- 69 CN530 S-2004 1- 70

UNCERTAINTY AND QUANTIZATION

Thin perceptual line:
NOT a union of points
Fuzzy/statistical object
Boundary and feature

Oriented filter:
NOT an edge detector
Uncertainty
Localization of “edgel” AND plausible grouping directions

“EARLY” AND “MIDDLE” VISION PROCESSING

[Diagram, top to bottom]
OBJECT RECOGNITION SYSTEM: Learns and remembers perceptual codes
Boundary Contour System: grouping, completion, sharpening
Feature Contour System: filling-in of brightness, color
PREPROCESSING: ratios at edges
INPUT

NOTE: This diagram includes “cognitive expectations”

CN530 S-2004 1- 71 CN530 S-2004 1- 72

Where’s the “intelligence” in vision?

Helmholtz: “as if” unconscious inference
Gestaltists: field theory
Gibson: direct perception; “pickup” of environmental information
Marr: computational theory
Grossberg: network architecture & resonance
Schwartz: maps
You: _________

Consider: The “preattentive” visual system at times makes use of direction-of-contrast information and at other times ignores it, thereby rivaling the performance of the most efficient cognitive engine yet invented: the thermos bottle!

WHAT ARE THE UNITS OF VISION?

This course is about the geometry of vision.

(Recall Lettvin quote.)

In the visual universe, we have, at times:
coherence among disconnected elements, and
segmentations through homogeneous regions!

What kind of geometry does this?
What could possibly be its functional units?
