Vol. 63, No. 1 March, 1956 The Psychological Review


THE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO:
SOME LIMITS ON OUR CAPACITY FOR
PROCESSING INFORMATION 1
GEORGE A. MILLER
Harvard University
1 This paper was first read as an Invited Address before the Eastern Psychological Association in Philadelphia on April 15, 1955. Preparation of the paper was supported by the Harvard Psycho-Acoustic Laboratory under Contract N5ori-76 between Harvard University and the Office of Naval Research, U. S. Navy (Project NR142-201, Report PNR-174). Reproduction for any purpose of the U. S. Government is permitted.

My problem is that I have been persecuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.

I shall begin my case history by telling you about some experiments that tested how accurately people can assign numbers to the magnitudes of various aspects of a stimulus. In the traditional language of psychology these would be called experiments in absolute judgment. Historical accident, however, has decreed that they should have another name. We now call them experiments on the capacity of people to transmit information. Since these experiments would not have been done without the appearance of information theory on the psychological scene, and since the results are analyzed in terms of the concepts of information theory, I shall have to preface my discussion with a few remarks about this theory.

INFORMATION MEASUREMENT

The "amount of information" is exactly the same concept that we have talked about for years under the name of "variance." The equations are different, but if we hold tight to the idea that anything that increases the variance also increases the amount of information we cannot go far astray.

The advantages of this new way of talking about variance are simple enough. Variance is always stated in terms of the unit of measurement (inches, pounds, volts, etc.), whereas the amount of information is a dimensionless quantity. Since the information in a discrete statistical distribution does not depend upon the unit of measurement, we can extend the concept to situations where we have no metric and we would not ordinarily think of using
the variance. And it also enables us to compare results obtained in quite different experimental situations where it would be meaningless to compare variances based on different metrics. So there are some good reasons for adopting the newer concept.

The similarity of variance and amount of information might be explained this way: When we have a large variance, we are very ignorant about what is going to happen. If we are very ignorant, then when we make the observation it gives us a lot of information. On the other hand, if the variance is very small, we know in advance how our observation must come out, so we get little information from making the observation.

If you will now imagine a communication system, you will realize that there is a great deal of variability about what goes into the system and also a great deal of variability about what comes out. The input and the output can therefore be described in terms of their variance (or their information). If it is a good communication system, however, there must be some systematic relation between what goes in and what comes out. That is to say, the output will depend upon the input, or will be correlated with the input. If we measure this correlation, then we can say how much of the output variance is attributable to the input and how much is due to random fluctuations or "noise" introduced by the system during transmission. So we see that the measure of transmitted information is simply a measure of the input-output correlation.

There are two simple rules to follow. Whenever I refer to "amount of information," you will understand "variance." And whenever I refer to "amount of transmitted information," you will understand "covariance" or "correlation."

The situation can be described graphically by two partially overlapping circles. Then the left circle can be taken to represent the variance of the input, the right circle the variance of the output, and the overlap the covariance of input and output. I shall speak of the left circle as the amount of input information, the right circle as the amount of output information, and the overlap as the amount of transmitted information.

In the experiments on absolute judgment, the observer is considered to be a communication channel. Then the left circle would represent the amount of information in the stimuli, the right circle the amount of information in his responses, and the overlap the stimulus-response correlation as measured by the amount of transmitted information. The experimental problem is to increase the amount of input information and to measure the amount of transmitted information. If the observer's absolute judgments are quite accurate, then nearly all of the input information will be transmitted and will be recoverable from his responses. If he makes errors, then the transmitted information may be considerably less than the input. We expect that, as we increase the amount of input information, the observer will begin to make more and more errors; we can test the limits of accuracy of his absolute judgments. If the human observer is a reasonable kind of communication system, then when we increase the amount of input information the transmitted information will increase at first and will eventually level off at some asymptotic value. This asymptotic value we take to be the channel capacity of the observer: it represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli we give him.

Now just a brief word about the bit
and we can begin to look at some data. One bit of information is the amount of information that we need to make a decision between two equally likely alternatives. If we must decide whether a man is less than six feet tall or more than six feet tall and if we know that the chances are 50-50, then we need one bit of information. Notice that this unit of information does not refer in any way to the unit of length that we use (feet, inches, centimeters, etc.). However you measure the man's height, we still need just one bit of information.

Two bits of information enable us to decide among four equally likely alternatives. Three bits of information enable us to decide among eight equally likely alternatives. Four bits of information decide among 16 alternatives, five among 32, and so on. That is to say, if there are 32 equally likely alternatives, we must make five successive binary decisions, worth one bit each, before we know which alternative is correct. So the general rule is simple: every time the number of alternatives is increased by a factor of two, one bit of information is added.

There are two ways we might increase the amount of input information. We could increase the rate at which we give information to the observer, so that the amount of information per unit time would increase. Or we could ignore the time variable completely and increase the amount of input information by increasing the number of alternative stimuli. In the absolute judgment experiment we are interested in the second alternative. We give the observer as much time as he wants to make his response; we simply increase the number of alternative stimuli among which he must discriminate and look to see where confusions begin to occur. Confusions will appear near the point that we are calling his "channel capacity."

ABSOLUTE JUDGMENTS OF UNIDIMENSIONAL STIMULI

Now let us consider what happens when we make absolute judgments of tones. Pollack (17) asked listeners to identify tones by assigning numerals to them. The tones were different with respect to frequency, and covered the range from 100 to 8000 cps in equal logarithmic steps. A tone was sounded and the listener responded by giving a numeral. After the listener had made his response he was told the correct identification of the tone.

When only two or three tones were used the listeners never confused them. With four different tones confusions were quite rare, but with five or more tones confusions were frequent. With fourteen different tones the listeners made many mistakes.

FIG. 1. Data from Pollack (17, 18) on the amount of information that is transmitted by listeners who make absolute judgments of auditory pitch. As the amount of input information is increased by increasing from 2 to 14 the number of different pitches to be judged, the amount of transmitted information approaches as its upper limit a channel capacity of about 2.5 bits per judgment.

These data are plotted in Fig. 1. Along the bottom is the amount of input information in bits per stimulus. As the number of alternative tones was increased from 2 to 14, the input information increased from 1 to 3.8 bits. On the ordinate is plotted the amount of
transmitted information. The amount of transmitted information behaves in much the way we would expect a communication channel to behave; the transmitted information increases linearly up to about 2 bits and then bends off toward an asymptote at about 2.5 bits. This value, 2.5 bits, therefore, is what we are calling the channel capacity of the listener for absolute judgments of pitch.

So now we have the number 2.5 bits. What does it mean? First, note that 2.5 bits corresponds to about six equally likely alternatives. The result means that we cannot pick more than six different pitches that the listener will never confuse. Or, stated slightly differently, no matter how many alternative tones we ask him to judge, the best we can expect him to do is to assign them to about six different classes without error. Or, again, if we know that there were N alternative stimuli, then his judgment enables us to narrow down the particular stimulus to one out of N/6.

Most people are surprised that the number is as small as six. Of course, there is evidence that a musically sophisticated person with absolute pitch can identify accurately any one of 50 or 60 different pitches. Fortunately, I do not have time to discuss these remarkable exceptions. I say it is fortunate because I do not know how to explain their superior performance. So I shall stick to the more pedestrian fact that most of us can identify about one out of only five or six pitches before we begin to get confused.

It is interesting to consider that psychologists have been using seven-point rating scales for a long time, on the intuitive basis that trying to rate into finer categories does not really add much to the usefulness of the ratings. Pollack's results indicate that, at least for pitches, this intuition is fairly sound.

FIG. 2. Data from Garner (7) on the channel capacity for absolute judgments of auditory loudness.

Next you can ask how reproducible this result is. Does it depend on the spacing of the tones or the various conditions of judgment? Pollack varied these conditions in a number of ways. The range of frequencies can be changed by a factor of about 20 without changing the amount of information transmitted more than a small percentage. Different groupings of the pitches decreased the transmission, but the loss was small. For example, if you can discriminate five high-pitched tones in one series and five low-pitched tones in another series, it is reasonable to expect that you could combine all ten into a single series and still tell them all apart without error. When you try it, however, it does not work. The channel capacity for pitch seems to be about six and that is the best you can do.

While we are on tones, let us look next at Garner's (7) work on loudness. Garner's data for loudness are summarized in Fig. 2. Garner went to some trouble to get the best possible spacing of his tones over the intensity range from 15 to 110 db. He used 4, 5, 6, 7, 10, and 20 different stimulus intensities. The results shown in Fig. 2 take into account the differences among subjects and the sequential influence of the immediately preceding judgment. Again we find that there seems to be a limit.
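
The measures used up to this point can be made concrete with a small sketch. It assumes the standard information-theoretic definition of transmitted information, T = H(stimulus) + H(response) - H(stimulus, response), computed from a table of stimulus-response counts; the function names and both confusion matrices are hypothetical illustrations, not data from Pollack or Garner.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution (zero cells skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def input_information(n_alternatives):
    """Bits needed to pick one of n equally likely alternatives."""
    return math.log2(n_alternatives)


def equivalent_alternatives(bits):
    """Number of equally likely alternatives that a given number of bits separates."""
    return 2 ** bits


def transmitted_information(counts):
    """T = H(stimulus) + H(response) - H(stimulus, response), in bits.

    counts[i][j] is the number of times stimulus i drew response j.
    """
    total = sum(sum(row) for row in counts)
    stimulus = [sum(row) / total for row in counts]
    response = [sum(col) / total for col in zip(*counts)]
    joint = [cell / total for row in counts for cell in row]
    return entropy_bits(stimulus) + entropy_bits(response) - entropy_bits(joint)


# Hypothetical confusion matrices for four stimuli (made-up counts).
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
confused = [[8, 2, 0, 0], [2, 6, 2, 0], [0, 2, 6, 2], [0, 0, 2, 8]]

print(f"input for 14 tones: {input_information(14):.1f} bits")
print(f"2.5 bits is roughly {equivalent_alternatives(2.5):.1f} alternatives")
print(f"error-free observer: {transmitted_information(perfect):.2f} bits")
print(f"confused observer:   {transmitted_information(confused):.2f} bits")
```

With the error-free matrix all of the input information is transmitted; with the confused matrix the same input yields less, which is the sense in which errors pull transmitted information below the input.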
The channel capacity for absolute judgments of loudness is 2.3 bits, or about five perfectly discriminable alternatives.

Since these two studies were done in different laboratories with slightly different techniques and methods of analysis, we are not in a good position to argue whether five loudnesses is significantly different from six pitches. Probably the difference is in the right direction, and absolute judgments of pitch are slightly more accurate than absolute judgments of loudness. The important point, however, is that the two answers are of the same order of magnitude.

The experiment has also been done for taste intensities. In Fig. 3 are the results obtained by Beebe-Center, Rogers, and O'Connell (1) for absolute judgments of the concentration of salt solutions. The concentrations ranged from 0.3 to 34.7 gm. NaCl per 100 cc. tap water in equal subjective steps. They used 3, 5, 9, and 17 different concentrations. The channel capacity is 1.9 bits, which is about four distinct concentrations. Thus taste intensities seem a little less distinctive than auditory stimuli, but again the order of magnitude is not far off.

FIG. 3. Data from Beebe-Center, Rogers, and O'Connell (1) on the channel capacity for absolute judgments of saltiness.

On the other hand, the channel capacity for judgments of visual position seems to be significantly larger. Hake and Garner (8) asked observers to interpolate visually between two scale markers. Their results are shown in Fig. 4. They did the experiment in two ways. In one version they let the observer use any number between zero and 100 to describe the position, although they presented stimuli at only 5, 10, 20, or 50 different positions. The results with this unlimited response technique are shown by the filled circles on the graph. In the other version the observers were limited in their responses to reporting just those stimulus values that were possible. That is to say, in the second version the number of different responses that the observer could make was exactly the same as the number of different stimuli that the experimenter might present. The results with this limited response technique are shown by the open circles on the graph. The two functions are so similar that it seems fair to conclude that the number of responses available to the observer had nothing to do with the channel capacity of 3.25 bits.

FIG. 4. Data from Hake and Garner (8) on the channel capacity for absolute judgments of the position of a pointer in a linear interval.

The Hake-Garner experiment has been repeated by Coonan and Klemmer. Although they have not yet published their results, they have given me permission to say that they obtained channel capacities ranging from 3.2 bits for
very short exposures of the pointer position to 3.9 bits for longer exposures. These values are slightly higher than Hake and Garner's, so we must conclude that there are between 10 and 15 distinct positions along a linear interval. This is the largest channel capacity that has been measured for any unidimensional variable.

At the present time these four experiments on absolute judgments of simple, unidimensional stimuli are all that have appeared in the psychological journals. However, a great deal of work on other stimulus variables has not yet appeared in the journals. For example, Eriksen and Hake (6) have found that the channel capacity for judging the sizes of squares is 2.2 bits, or about five categories, under a wide range of experimental conditions. In a separate experiment Eriksen (5) found 2.8 bits for size, 3.1 bits for hue, and 2.3 bits for brightness. Geldard has measured the channel capacity for the skin by placing vibrators on the chest region. A good observer can identify about four intensities, about five durations, and about seven locations.

One of the most active groups in this area has been the Air Force Operational Applications Laboratory. Pollack has been kind enough to furnish me with the results of their measurements for several aspects of visual displays. They made measurements for area and for the curvature, length, and direction of lines. In one set of experiments they used a very short exposure of the stimulus (1/40 second) and then they repeated the measurements with a 5-second exposure. For area they got 2.6 bits with the short exposure and 2.7 bits with the long exposure. For the length of a line they got about 2.6 bits with the short exposure and about 3.0 bits with the long exposure. Direction, or angle of inclination, gave 2.8 bits for the short exposure and 3.3 bits for the long exposure. Curvature was apparently harder to judge. When the length of the arc was constant, the result at the short exposure duration was 2.2 bits, but when the length of the chord was constant, the result was only 1.6 bits. This last value is the lowest that anyone has measured to date. I should add, however, that these values are apt to be slightly too low because the data from all subjects were pooled before the transmitted information was computed.

Now let us see where we are. First, the channel capacity does seem to be a valid notion for describing human observers. Second, the channel capacities measured for these unidimensional variables range from 1.6 bits for curvature to 3.9 bits for positions in an interval. Although there is no question that the differences among the variables are real and meaningful, the more impressive fact to me is their considerable similarity. If I take the best estimates I can get of the channel capacities for all the stimulus variables I have mentioned, the mean is 2.6 bits and the standard deviation is only 0.6 bit. In terms of distinguishable alternatives, this mean corresponds to about 6.5 categories, one standard deviation includes from 4 to 10 categories, and the total range is from 3 to 15 categories. Considering the wide variety of different variables that have been studied, I find this to be a remarkably narrow range.

There seems to be some limitation built into us either by learning or by the design of our nervous systems, a limit that keeps our channel capacities in this general range. On the basis of the present evidence it seems safe to say that we possess a finite and rather small capacity for making such unidimensional judgments and that this capacity does not vary a great deal from one simple sensory attribute to another.
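
The conversion from bits to distinguishable categories in the summary above can be sketched in a few lines. This is only an illustration of the rule that b bits separate 2^b equally likely alternatives; the function and variable names are assumptions, and the inputs are the summary figures quoted in the text (mean 2.6 bits, standard deviation 0.6 bit, range 1.6 to 3.9 bits).

```python
def categories(bits):
    """Equally likely alternatives distinguishable with the given number of bits."""
    return 2 ** bits


# Summary figures quoted in the text, all in bits.
mean_bits = 2.6
sd_bits = 0.6
lowest_bits = 1.6   # curvature, chord length constant
highest_bits = 3.9  # pointer position, longer exposures

summary = {
    "mean": categories(mean_bits),
    "one sd below": categories(mean_bits - sd_bits),
    "one sd above": categories(mean_bits + sd_bits),
    "lowest": categories(lowest_bits),
    "highest": categories(highest_bits),
}

for label, value in summary.items():
    print(f"{label:>13}: {value:5.1f} categories")
```

The computed values land close to the rounded figures in the text, though not exactly on them: the lower and upper ends of the range come out near 3 and 15, one standard deviation spans roughly 4 to 9, and the mean comes out nearer 6 than 6.5.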
ABSOLUTE JUDGMENTS OF MULTIDIMENSIONAL STIMULI

You may have noticed that I have been careful to say that this magical number seven applies to one-dimensional judgments. Everyday experience teaches us that we can identify accurately any one of several hundred faces, any one of several thousand words, any one of several thousand objects, etc. The story certainly would not be complete if we stopped at this point. We must have some understanding of why the one-dimensional variables we judge in the laboratory give results so far out of line with what we do constantly in our behavior outside the laboratory. A possible explanation lies in the number of independently variable attributes of the stimuli that are being judged. Objects, faces, words, and the like differ from one another in many ways, whereas the simple stimuli we have considered thus far differ from one another in only one respect.

Fortunately, there are a few data on what happens when we make absolute judgments of stimuli that differ from one another in several ways. Let us look first at the results Klemmer and Frick (13) have reported for the absolute judgment of the position of a dot in a square. In Fig. 5 we see their results. Now the channel capacity seems to have increased to 4.6 bits, which means that people can identify accurately any one of 24 positions in the square.

FIG. 5. Data from Klemmer and Frick (13) on the channel capacity for absolute judgments of the position of a dot in a square.
[Figure labels: points in a square, no grid, .03 sec. exposure; abscissa: input information.]

The position of a dot in a square is clearly a two-dimensional proposition. Both its horizontal and its vertical position must be identified. Thus it seems natural to compare the 4.6-bit capacity for a square with the 3.25-bit capacity for the position of a point in an interval. The point in the square requires two judgments of the interval type. If we have a capacity of 3.25 bits for estimating intervals and we do this twice, we should get 6.5 bits as our capacity for locating points in a square. Adding the second independent dimension gives us an increase from 3.25 to 4.6, but it falls short of the perfect addition that would give 6.5 bits.

Another example is provided by Beebe-Center, Rogers, and O'Connell. When they asked people to identify both the saltiness and the sweetness of solutions containing various concentrations of salt and sucrose, they found that the channel capacity was 2.3 bits. Since the capacity for salt alone was 1.9, we might expect about 3.8 bits if the two aspects of the compound stimuli were judged independently. As with spatial locations, the second dimension adds a little to the capacity but not as much as it conceivably might.

A third example is provided by Pollack (18), who asked listeners to judge both the loudness and the pitch of pure tones. Since pitch gives 2.5 bits and loudness gives 2.3 bits, we might hope to get as much as 4.8 bits for pitch and loudness together. Pollack obtained 3.1 bits, which again indicates that the second dimension augments the channel capacity but not so much as it might.

A fourth example can be drawn from the work of Halsey and Chapanis (9) on confusions among colors of equal
luminance. Although they did not analyze their results in informational terms, they estimate that there are about 11 to 15 identifiable colors, or, in our terms, about 3.6 bits. Since these colors varied in both hue and saturation, it is probably correct to regard this as a two-dimensional judgment. If we compare this with Eriksen's 3.1 bits for hue (which is a questionable comparison to draw), we again have something less than perfect addition when a second dimension is added.

It is still a long way, however, from these two-dimensional examples to the multidimensional stimuli provided by faces, words, etc. To fill this gap we have only one experiment, an auditory study done by Pollack and Ficks (19). They managed to get six different acoustic variables that they could change: frequency, intensity, rate of interruption, on-time fraction, total duration, and spatial location. Each one of these six variables could assume any one of five different values, so altogether there were 5^6, or 15,625 different tones that they could present. The listeners made a separate rating for each one of these six dimensions. Under these conditions the transmitted information was 7.2 bits, which corresponds to about 150 different categories that could be absolutely identified without error. Now we are beginning to get up into the range that ordinary experience would lead us to expect.

Suppose that we plot these data, fragmentary as they are, and make a guess about how the channel capacity changes with the dimensionality of the stimuli. The result is given in Fig. 6. In a moment of considerable daring I sketched the dotted line to indicate roughly the trend that the data seemed to be taking.

FIG. 6. The general form of the relation between channel capacity and the number of independently variable attributes of the stimuli.

Clearly, the addition of independently variable attributes to the stimulus increases the channel capacity, but at a decreasing rate. It is interesting to note that the channel capacity is increased even when the several variables are not independent. Eriksen (5) reports that, when size, brightness, and hue all vary together in perfect correlation, the transmitted information is 4.1 bits as compared with an average of about 2.7 bits when these attributes are varied one at a time. By confounding three attributes, Eriksen increased the dimensionality of the input without increasing the amount of input information; the result was an increase in channel capacity of about the amount that the dotted function in Fig. 6 would lead us to expect.

The point seems to be that, as we add more variables to the display, we increase the total capacity, but we decrease the accuracy for any particular variable. In other words, we can make relatively crude judgments of several things simultaneously.

We might argue that in the course of evolution those organisms were most successful that were responsive to the widest range of stimulus energies in their environment. In order to survive in a constantly fluctuating world, it was better to have a little information about a lot of things than to have a lot of information about a small segment of the
environment. If a compromise was necessary, the one we seem to have made is clearly the more adaptive.

Pollack and Ficks's results are very strongly suggestive of an argument that linguists and phoneticians have been making for some time (11). According to the linguistic analysis of the sounds of human speech, there are about eight or ten dimensions (the linguists call them distinctive features) that distinguish one phoneme from another. These distinctive features are usually binary, or at most ternary, in nature. For example, a binary distinction is made between vowels and consonants, a binary decision is made between oral and nasal consonants, a ternary decision is made among front, middle, and back phonemes, etc. This approach gives us quite a different picture of speech perception than we might otherwise obtain from our studies of the speech spectrum and of the ear's ability to discriminate relative differences among pure tones. I am personally much interested in this new approach (15), and I regret that there is not time to discuss it here.

It was probably with this linguistic theory in mind that Pollack and Ficks conducted a test on a set of tonal stimuli that varied in eight dimensions, but required only a binary decision on each dimension. With these tones they measured the transmitted information at 6.9 bits, or about 120 recognizable kinds of sounds. It is an intriguing question, as yet unexplored, whether one can go on adding dimensions indefinitely in this way.

In human speech there is clearly a limit to the number of dimensions that we use. In this instance, however, it is not known whether the limit is imposed by the nature of the perceptual machinery that must recognize the sounds or by the nature of the speech machinery that must produce them. Somebody will have to do the experiment to find out. There is a limit, however, at about eight or nine distinctive features in every language that has been studied, and so when we talk we must resort to still another trick for increasing our channel capacity. Language uses sequences of phonemes, so we make several judgments successively when we listen to words and sentences. That is to say, we use both simultaneous and successive discriminations in order to expand the rather rigid limits imposed by the inaccuracy of our absolute judgments of simple magnitudes.

These multidimensional judgments are strongly reminiscent of the abstraction experiment of Külpe (14). As you may remember, Külpe showed that observers report more accurately on an attribute for which they are set than on attributes for which they are not set. For example, Chapman (4) used three different attributes and compared the results obtained when the observers were instructed before the tachistoscopic presentation with the results obtained when they were not told until after the presentation which one of the three attributes was to be reported. When the instruction was given in advance, the judgments were more accurate. When the instruction was given afterwards, the subjects presumably had to judge all three attributes in order to report on any one of them and the accuracy was correspondingly lower. This is in complete accord with the results we have just been considering, where the accuracy of judgment on each attribute decreased as more dimensions were added. The point is probably obvious, but I shall make it anyhow, that the abstraction experiments did not demonstrate that people can judge only one attribute at a time. They merely showed what seems quite reasonable, that people are less accurate if they must judge more than one attribute simultaneously.
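
The arithmetic behind the multidimensional comparisons in this section can be checked with a short sketch. All capacities are the figures quoted in the text; the variable names are illustrative assumptions, and the sweetness capacity is taken equal to the 1.9 bits reported for salt only because the text's expectation of about 3.8 bits implies that doubling.

```python
import math

# (label, capacities in bits when each dimension is judged alone, observed joint capacity)
two_dimensional = [
    ("dot in a square", [3.25, 3.25], 4.6),
    # Sweetness alone is assumed equal to salt alone (1.9 bits), as the
    # text's expectation of about 3.8 bits implies.
    ("saltiness and sweetness", [1.9, 1.9], 2.3),
    ("pitch and loudness", [2.5, 2.3], 3.1),
]


def shortfall(single_capacities, observed):
    """Bits by which the observed capacity falls short of perfect addition."""
    return sum(single_capacities) - observed


for label, singles, observed in two_dimensional:
    print(f"{label}: perfect addition {sum(singles):.2f} bits, "
          f"observed {observed:.1f}, shortfall {shortfall(singles, observed):.2f}")

# Six acoustic variables with five values each.
six_by_five = 5 ** 6                      # number of distinct tones
input_bits_six = math.log2(six_by_five)   # input information if all tones are equally likely
categories_six = 2 ** 7.2                 # categories implied by 7.2 transmitted bits

# Eight dimensions with a binary decision on each.
eight_binary = 2 ** 8                     # number of distinct tones
categories_eight = 2 ** 6.9               # categories implied by 6.9 transmitted bits

print(f"{six_by_five} tones, {input_bits_six:.1f} bits in, about {categories_six:.0f} categories out")
print(f"{eight_binary} tones, 8.0 bits in, about {categories_eight:.0f} categories out")
```

In every two-dimensional case the shortfall is positive, which is the "less than perfect addition" described above, and the category counts agree with the rounded values of about 150 and about 120 given in the text.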
SUBITIZING

I cannot leave this general area without mentioning, however briefly, the experiments conducted at Mount Holyoke College on the discrimination of number (12). In experiments by Kaufman, Lord, Reese, and Volkmann random patterns of dots were flashed on a screen for 1/5 of a second. Anywhere from 1 to more than 200 dots could appear in the pattern. The subject's task was to report how many dots there were.

The first point to note is that on patterns containing up to five or six dots the subjects simply did not make errors. The performance on these small numbers of dots was so different from the performance with more dots that it was given a special name. Below seven the subjects were said to subitize; above seven they were said to estimate. This is, as you will recognize, what we once optimistically called "the span of attention."

This discontinuity at seven is, of course, suggestive. Is this the same basic process that limits our unidimensional judgments to about seven categories? The generalization is tempting, but not sound in my opinion. The data on number estimates have not been analyzed in informational terms; but on the basis of the published data I would guess that the subjects transmitted something more than four bits of information about the number of dots. Using the same arguments as before, we would conclude that there are about 20 or 30 distinguishable categories of numerousness. This is considerably more information than we would expect to get from a unidimensional display. It is, as a matter of fact, very much like a two-dimensional display. Although the dimensionality of the random dot patterns is not entirely clear, these results are in the same range as Klemmer and Frick's for their two-dimensional display of dots in a square. Perhaps the two dimensions of numerousness are area and density. When the subject can subitize, area and density may not be the significant variables, but when the subject must estimate perhaps they are significant. In any event, the comparison is not so simple as it might seem at first thought.

This is one of the ways in which the magical number seven has persecuted me. Here we have two closely related kinds of experiments, both of which point to the significance of the number seven as a limit on our capacities. And yet when we examine the matter more closely, there seems to be a reasonable suspicion that it is nothing more than a coincidence.

THE SPAN OF IMMEDIATE MEMORY

Let me summarize the situation in this way. There is a clear and definite limit to the accuracy with which we can identify absolutely the magnitude of a unidimensional stimulus variable. I would propose to call this limit the span of absolute judgment, and I maintain that for unidimensional judgments this span is usually somewhere in the neighborhood of seven. We are not completely at the mercy of this limited span, however, because we have a variety of techniques for getting around it and increasing the accuracy of our judgments. The three most important of these devices are (a) to make relative rather than absolute judgments; or, if that is not possible, (b) to increase the number of dimensions along which the stimuli can differ; or (c) to arrange the task in such a way that we make a sequence of several absolute judgments in a row.

The study of relative judgments is one of the oldest topics in experimental psychology, and I will not pause to review it now. The second device, increasing the dimensionality, we have just considered. It seems that by adding
more dimensions and requiring crude, binary, yes-no judgments on each attribute we can extend the span of absolute judgment from seven to at least 150. Judging from our everyday behavior, the limit is probably in the thousands, if indeed there is a limit. In my opinion, we cannot go on compounding dimensions indefinitely. I suspect that there is also a span of perceptual dimensionality and that this span is somewhere in the neighborhood of ten, but I must add at once that there is no objective evidence to support this suspicion. This is a question sadly needing experimental exploration.

Concerning the third device, the use of successive judgments, I have quite a bit to say because this device introduces memory as the handmaiden of discrimination. And, since mnemonic processes are at least as complex as are perceptual processes, we can anticipate that their interactions will not be easily disentangled.

Suppose that we start by simply extending slightly the experimental procedure that we have been using. Up to this point we have presented a single stimulus and asked the observer to name it immediately thereafter. We can extend this procedure by requiring the observer to withhold his response until we have given him several stimuli in succession. At the end of the sequence of stimuli he then makes his response. We still have the same sort of input-output situation that is required for the measurement of transmitted information. But now we have passed from an experiment on absolute judgment to what is traditionally called an experiment on immediate memory.

Before we look at any data on this topic I feel I must give you a word of warning to help you avoid some obvious associations that can be confusing. Everybody knows that there is a finite span of immediate memory and that for a lot of different kinds of test materials this span is about seven items in length. I have just shown you that there is a span of absolute judgment that can distinguish about seven categories and that there is a span of attention that will encompass about six objects at a glance. What is more natural than to think that all three of these spans are different aspects of a single underlying process? And that is a fundamental mistake, as I shall be at some pains to demonstrate. This mistake is one of the malicious persecutions that the magical number seven has subjected me to.

My mistake went something like this. We have seen that the invariant feature in the span of absolute judgment is the amount of information that the observer can transmit. There is a real operational similarity between the absolute judgment experiment and the immediate memory experiment. If immediate memory is like absolute judgment, then it should follow that the invariant feature in the span of immediate memory is also the amount of information that an observer can retain. If the amount of information in the span of immediate memory is a constant, then the span should be short when the individual items contain a lot of information and the span should be long when the items contain little information. For example, decimal digits are worth 3.3 bits apiece. We can recall about seven of them, for a total of 23 bits of information. Isolated English words are worth about 10 bits apiece. If the total amount of information is to remain constant at 23 bits, then we should be able to remember only two or three words chosen at random. In this way I generated a theory about how the span of immediate memory should vary as a function of the amount of information per item in the test materials.

The measurements of memory span in the literature are suggestive on this question, but not definitive. And so it was necessary to do the experiment to see. Hayes (10) tried it out with five different kinds of test materials: binary digits, decimal digits, letters of the alphabet, letters plus decimal digits, and with 1,000 monosyllabic words. The lists were read aloud at the rate of one item per second and the subjects had as much time as they needed to give their responses. A procedure described by Woodworth (20) was used to score the responses.
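The prediction that the constant-information hypothesis makes for these five vocabularies can be worked out in a few lines. What follows is only a rough sketch under stated assumptions: every item in a vocabulary is taken to be equally likely, so that an item drawn from N alternatives carries log2(N) bits, and the vocabulary sizes of 26 for letters and 36 for letters plus digits are my assumptions for the illustration rather than figures reported by Hayes. The names `bits_per_item` and `predicted_span` are hypothetical.

```python
import math

# From the text: about seven decimal digits can be recalled.
SPAN_FOR_DECIMAL_DIGITS = 7

# Vocabulary sizes for the five kinds of material. The values 26 and 36
# are assumptions made for this illustration (equally likely items).
VOCABULARY_SIZES = {
    "binary digits": 2,
    "decimal digits": 10,
    "letters": 26,
    "letters plus digits": 36,
    "1,000 words": 1000,
}


def bits_per_item(vocabulary_size):
    """Information carried by one item chosen from equally likely alternatives."""
    return math.log2(vocabulary_size)


# About 23 bits: seven digits at about 3.3 bits apiece.
TOTAL_BITS = SPAN_FOR_DECIMAL_DIGITS * bits_per_item(10)


def predicted_span(vocabulary_size, total_bits=TOTAL_BITS):
    """Span in items if the information held in the span were constant."""
    return total_bits / bits_per_item(vocabulary_size)


if __name__ == "__main__":
    for name, size in VOCABULARY_SIZES.items():
        print(f"{name:20s} {bits_per_item(size):5.2f} bits/item  "
              f"predicted span {predicted_span(size):5.1f} items")
```

On these assumptions the hypothesis calls for a span of roughly 23 binary digits but only two or three words, which is the steep decline that the experiment was designed to look for.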
The results are shown by the filled circles in Fig. 7. Here the dotted line indicates what the span should have been if the amount of information in the span were constant. The solid curves represent the data. Hayes repeated the experiment using test vocabularies of different sizes but all containing only English monosyllables (open circles in Fig. 7). This more homogeneous test material did not change the picture significantly. With binary items the span is about nine and, although it drops to about five with monosyllabic English words, the difference is far less than the hypothesis of constant information would require.

FIG. 7. Data from Hayes (10) on the span of immediate memory plotted as a function of the amount of information per item in the test materials. [Plot: memory span against information per item in bits, with points for binary digits, decimal digits, letters, letters and digits, and 1000 words, and a dotted curve marked "constant information."]

There is nothing wrong with Hayes's experiment, because Pollack (16) repeated it much more elaborately and got essentially the same result. Pollack took pains to measure the amount of information transmitted and did not rely on the traditional procedure for scoring the responses. His results are plotted in Fig. 8. Here it is clear that the amount of information transmitted is not a constant, but increases almost linearly as the amount of information per item in the input is increased.

FIG. 8. Data from Pollack (16) on the amount of information retained after one presentation plotted as a function of the amount of information per item in the test materials. [Plot: horizontal axis, information per item in bits.]

And so the outcome is perfectly clear. In spite of the coincidence that the magical number seven appears in both places, the span of absolute judgment and the span of immediate memory are quite different kinds of limitations that are imposed on our ability to process information. Absolute judgment is limited by the amount of information. Immediate memory is limited by the number of items. In order to capture this distinction in somewhat picturesque terms, I have fallen into the custom of distinguishing between bits of information and chunks of information. Then I can say that the number of bits of information is constant for absolute judgment and the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk, at least over the range that has been examined to date.

The contrast of the terms bit and chunk also serves to highlight the fact that we are not very definite about what constitutes a chunk of information. For example, the memory span of five words that Hayes obtained when each word was drawn at random from a set of 1000 English monosyllables might just as appropriately have been called a memory span of 15 phonemes, since each word had about three phonemes in it. Intuitively, it is clear that the subjects were recalling five words, not 15 phonemes, but the logical distinction is not immediately apparent. We are dealing here with a process of organizing or grouping the input into familiar units or chunks, and a great deal of learning has gone into the formation of these familiar units.

RECODING

In order to speak more precisely, therefore, we must recognize the importance of grouping or organizing the input sequence into units or chunks. Since the memory span is a fixed number of chunks, we can increase the number of bits of information that it contains simply by building larger and larger chunks, each chunk containing more information than before.

A man just beginning to learn radiotelegraphic code hears each dit and dah as a separate chunk. Soon he is able to organize these sounds into letters and then he can deal with the letters as chunks. Then the letters organize themselves as words, which are still larger chunks, and he begins to hear whole phrases. I do not mean that each step is a discrete process, or that plateaus must appear in his learning curve, for surely the levels of organization are achieved at different rates and overlap each other during the learning process. I am simply pointing to the obvious fact that the dits and dahs are organized by learning into patterns and that as these larger chunks emerge the amount of message that the operator can remember increases correspondingly. In the terms I am proposing to use, the operator learns to increase the bits per chunk.

In the jargon of communication theory, this process would be called recoding. The input is given in a code that contains many chunks with few bits per chunk. The operator recodes the input into another code that contains fewer chunks with more bits per chunk. There are many ways to do this recoding, but probably the simplest is to group the input events, apply a new name to the group, and then remember the new name rather than the original input events.

Since I am convinced that this process is a very general and important one for psychology, I want to tell you about a demonstration experiment that should make perfectly explicit what I am talking about. This experiment was conducted by Sidney Smith and was reported by him before the Eastern Psychological Association in 1954.

Begin with the observed fact that people can repeat back eight decimal digits, but only nine binary digits. Since there is a large discrepancy in the amount of information recalled in these two cases, we suspect at once that a recoding procedure could be used to increase the span of immediate memory for binary digits. In Table 1 a method for grouping and renaming is illustrated. Along the top is a sequence of 18 binary digits, far more than any subject was able to recall after a single presentation. In the next line these same binary digits are grouped by pairs. Four possible pairs can occur: 00 is renamed 0, 01 is renamed 1, 10 is renamed 2, and 11 is
TABLE 1
WAYS OF RECODING SEQUENCES OF BINARY DIGITS

Binary Digits (Bits)    1 0 1 0 0 0 1 0 0 1 1 1 0 0 1 1 1 0

2:1   Chunks            10  10  00  10  01  11  00  11  10
      Recoding           2   2   0   2   1   3   0   3   2

3:1   Chunks            101  000  100  111  001  110
      Recoding            5    0    4    7    1    6

4:1   Chunks            1010  0010  0111  0011  10
      Recoding            10     2     7     3

5:1   Chunks            10100  01001  11001  110
      Recoding             20      9     25
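The grouping and renaming in Table 1 is mechanical enough to be sketched in a few lines of Python. This is offered only as an illustration of the table, not as a description of how Smith's subjects actually went about the task. The function name `recode` is hypothetical, and I assume, following the table, that any bits left over at the end of the sequence are simply not renamed.

```python
def recode(bits, ratio):
    """Group a string of binary digits into chunks of `ratio` digits and
    rename each complete chunk by its numerical value.

    Leftover digits that do not fill a chunk are dropped, as in Table 1.
    """
    usable = len(bits) - len(bits) % ratio
    return [int(bits[i:i + ratio], 2) for i in range(0, usable, ratio)]


# The 18 binary digits printed along the top of Table 1.
TABLE_1_BITS = "101000100111001110"

if __name__ == "__main__":
    for ratio in (2, 3, 4, 5):
        names = recode(TABLE_1_BITS, ratio)
        print(f"{ratio}:1  {len(names)} chunks  {names}")
```

Each increase in the ratio leaves fewer names to hold in memory, while each name stands for more of the original binary digits.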
renamed 3. That is to say, we recode from a base-two arithmetic to a base-four arithmetic. In the recoded sequence there are now just nine digits to remember, and this is almost within the span of immediate memory. In the next line the same sequence of binary digits is regrouped into chunks of three. There are eight possible sequences of three, so we give each sequence a new name between 0 and 7. Now we have recoded from a sequence of 18 binary digits into a sequence of 6 octal digits, and this is well within the span of immediate memory. In the last two lines the binary digits are grouped by fours and by fives and are given decimal-digit names from 0 to 15 and from 0 to 31.

It is reasonably obvious that this kind of recoding increases the bits per chunk, and packages the binary sequence into a form that can be retained within the span of immediate memory. So Smith assembled 20 subjects and measured their spans for binary and octal digits. The spans were 9 for binaries and 7 for octals. Then he gave each recoding scheme to five of the subjects. They studied the recoding until they said they understood it, for about 5 or 10 minutes. Then he tested their span for binary digits again while they tried to use the recoding schemes they had studied.

The recoding schemes increased their span for binary digits in every case. But the increase was not as large as we had expected on the basis of their span for octal digits. Since the discrepancy increased as the recoding ratio increased, we reasoned that the few minutes the subjects had spent learning the recoding schemes had not been sufficient. Apparently the translation from one code to the other must be almost automatic or the subject will lose part of the next group while he is trying to remember the translation of the last group.

Since the 4:1 and 5:1 ratios require considerable study, Smith decided to imitate Ebbinghaus and do the experiment on himself. With Germanic patience he drilled himself on each recoding successively, and obtained the results shown in Fig. 9. Here the data follow along rather nicely with the results you would predict on the basis of his span for octal digits. He could remember 12 octal digits. With the 2:1 recoding, these 12 chunks were worth 24 binary digits. With the 3:1 recoding they were worth 36 binary digits. With the 4:1 and 5:1 recodings, they were worth about 40 binary digits.

It is a little dramatic to watch a person get 40 binary digits in a row and then repeat them back without error. However, if you think of this merely as
a mnemonic trick for extending the memory span, you will miss the more important point that is implicit in nearly all such mnemonic devices. The point is that recoding is an extremely powerful weapon for increasing the amount of information that we can deal with. In one form or another we use recoding constantly in our daily behavior.

FIG. 9. The span of immediate memory for binary digits is plotted as a function of the recoding procedure used. The predicted function is obtained by multiplying the span for octals by 2, 3 and 3.3 for recoding into base 4, base 8, and base 10, respectively. [Plot: memory span for binary digits against recoding ratio (2:1, 3:1, 4:1, 5:1), with one curve marked "predicted from span for octal digits" and one marked "observed"; one highly practiced subject.]

In my opinion the most customary kind of recoding that we do all the time is to translate into a verbal code. When there is a story or an argument or an idea that we want to remember, we usually try to rephrase it "in our own words." When we witness some event we want to remember, we make a verbal description of the event and then remember our verbalization. Upon recall we recreate by secondary elaboration the details that seem consistent with the particular verbal recoding we happen to have made. The well-known experiment by Carmichael, Hogan, and Walter (3) on the influence that names have on the recall of visual figures is one demonstration of the process.

The inaccuracy of the testimony of eyewitnesses is well known in legal psychology, but the distortions of testimony are not random; they follow naturally from the particular recoding that the witness used, and the particular recoding he used depends upon his whole life history. Our language is tremendously useful for repackaging material into a few chunks rich in information. I suspect that imagery is a form of recoding, too, but images seem much harder to get at operationally and to study experimentally than the more symbolic kinds of recoding.

It seems probable that even memorization can be studied in these terms. The process of memorizing may be simply the formation of chunks, or groups of items that go together, until there are few enough chunks so that we can recall all the items. The work by Bousfield and Cohen (2) on the occurrence of clustering in the recall of words is especially interesting in this respect.

SUMMARY

I have come to the end of the data that I wanted to present, so I would like now to make some summarizing remarks.

First, the span of absolute judgment and the span of immediate memory impose severe limitations on the amount of information that we are able to receive, process, and remember. By organizing the stimulus input simultaneously into several dimensions and successively into a sequence of chunks, we manage to break (or at least stretch) this informational bottleneck.

Second, the process of recoding is a very important one in human psychology and deserves much more explicit attention than it has received. In particular, the kind of linguistic recoding that people do seems to me to be the very lifeblood of the thought processes. Recoding procedures are a constant concern to clinicians, social psychologists, linguists, and anthropologists and yet, probably because recoding is less accessible to experimental manipulation than nonsense syllables or T mazes, the traditional experimental psychologist has contributed little or nothing to their analysis. Nevertheless, experimental techniques can be used, methods of recoding can be specified, behavioral indicants can be found. And I anticipate that we will find a very orderly set of relations describing what now seems an uncharted wilderness of individual differences.

Third, the concepts and measures provided by the theory of information provide a quantitative way of getting at some of these questions. The theory provides us with a yardstick for calibrating our stimulus materials and for measuring the performance of our subjects. In the interests of communication I have suppressed the technical details of information measurement and have tried to express the ideas in more familiar terms; I hope this paraphrase will not lead you to think they are not useful in research. Informational concepts have already proved valuable in the study of discrimination and of language; they promise a great deal in the study of learning and memory; and it has even been proposed that they can be useful in the study of concept formation. A lot of questions that seemed fruitless twenty or thirty years ago may now be worth another look. In fact, I feel that my story here must stop just as it begins to get really interesting.

And finally, what about the magical number seven? What about the seven wonders of the world, the seven seas, the seven deadly sins, the seven daughters of Atlas in the Pleiades, the seven ages of man, the seven levels of hell, the seven primary colors, the seven notes of the musical scale, and the seven days of the week? What about the seven-point rating scale, the seven categories for absolute judgment, the seven objects in the span of attention, and the seven digits in the span of immediate memory? For the present I propose to withhold judgment. Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it. But I suspect that it is only a pernicious, Pythagorean coincidence.

REFERENCES

1. BEEBE-CENTER, J. G., ROGERS, M. S., & O'CONNELL, B. N. Transmission of information about sucrose and saline solutions through the sense of taste. J. Psychol., 1955, 39, 157-160.
2. BOUSFIELD, W. A., & COHEN, B. H. The occurrence of clustering in the recall of randomly arranged words of different frequencies-of-usage. J. gen. Psychol., 1955, 52, 83-95.
3. CARMICHAEL, L., HOGAN, H. P., & WALTER, A. A. An experimental study of the effect of language on the reproduction of visually perceived form. J. exp. Psychol., 1932, 15, 73-86.
4. CHAPMAN, D. W. Relative effects of determinate and indeterminate Aufgaben. Amer. J. Psychol., 1932, 44, 163-174.
5. ERIKSEN, C. W. Multidimensional stimulus differences and accuracy of discrimination. USAF, WADC Tech. Rep., 1954, No. 54-165.
6. ERIKSEN, C. W., & HAKE, H. W. Absolute judgments as a function of the stimulus range and the number of stimulus and response categories. J. exp. Psychol., 1955, 49, 323-332.
7. GARNER, W. R. An informational analysis of absolute judgments of loudness. J. exp. Psychol., 1953, 46, 373-380.
8. HAKE, H. W., & GARNER, W. R. The effect of presenting various numbers of discrete steps on scale reading accuracy. J. exp. Psychol., 1951, 42, 358-366.
9. HALSEY, R. M., & CHAPANIS, A. Chromaticity-confusion contours in a complex viewing situation. J. Opt. Soc. Amer., 1954, 44, 442-454.
10. HAYES, J. R. M. Memory span for several vocabularies as a function of vocabulary size. In Quarterly Progress Report, Cambridge, Mass.: Acoustics Laboratory, Massachusetts Institute of Technology, Jan.-June, 1952.
11. JAKOBSON, R., FANT, C. G. M., & HALLE, M. Preliminaries to speech analysis. Cambridge, Mass.: Acoustics Laboratory, Massachusetts Institute of Technology, 1952. (Tech. Rep. No. 13.)
12. KAUFMAN, E. L., LORD, M. W., REESE, T. W., & VOLKMANN, J. The discrimination of visual number. Amer. J. Psychol., 1949, 62, 498-525.
13. KLEMMER, E. T., & FRICK, F. C. Assimilation of information from dot and matrix patterns. J. exp. Psychol., 1953, 45, 15-19.
14. KÜLPE, O. Versuche über Abstraktion. Ber. ü. d. I Kongr. f. exper. Psychol., 1904, 56-68.
15. MILLER, G. A., & NICELY, P. E. An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Amer., 1955, 27, 338-352.
16. POLLACK, I. The assimilation of sequentially encoded information. Amer. J. Psychol., 1953, 66, 421-435.
17. POLLACK, I. The information of elementary auditory displays. J. Acoust. Soc. Amer., 1952, 24, 745-749.
18. POLLACK, I. The information of elementary auditory displays. II. J. Acoust. Soc. Amer., 1953, 25, 765-769.
19. POLLACK, I., & FICKS, L. Information of elementary multi-dimensional auditory displays. J. Acoust. Soc. Amer., 1954, 26, 155-158.
20. WOODWORTH, R. S. Experimental psychology. New York: Holt, 1938.

(Received May 4, 1955)
