
Parallel Distributed Processing Models of Memory

By James L. McClelland
The Gale Group

This article describes a class of computational models that help us understand some of the
most important characteristics of human memory. The computational models are called
parallel distributed processing (PDP) models because memories are stored and retrieved in a
system consisting of a large number of simple computational elements, all working at the
same time and all contributing to the outcome. They are sometimes also called connectionist
models because the knowledge that governs retrieval is stored in the strengths of the
connections among the elements.

The article begins with a common metaphor for human memory, and shows why it fails to
capture several key characteristics of memory that are captured by the PDP approach. Then a
brief statement of the general characteristics of PDP systems is given. Following this, two
specific models are presented that capture key characteristics of memory in slightly different
ways. Strengths and weaknesses of the two approaches are considered, and a synthesis is
presented. The article ends with a brief discussion of the techniques that have been developed
for adjusting connection strengths in PDP systems.

Characteristics of Memory
A common metaphor for human memory might be called the "computer-file" metaphor. In this metaphor, we store a copy of an idea or experience in a file, which we can later retrieve and reexamine. There are several problems with this view.

Memories are accessed by content.

First of all, the natural way of accessing records in a computer is by their address. What actually happens in human memory, however, is that we access memories by their contents. Any description that uniquely identifies a memory is likely to be sufficient for recall. Even more interesting, each individual element of the description may be nearly useless by itself if it applies to many memories; only the combination needs to be unique.
Thus
"He bet on sports. He played baseball."
is enough for many people to identify Pete Rose, even though the cues about baseball and betting on sports would not generally suffice as cues individually, since each matches too many memories.
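This conjunction-of-cues behavior is easy to illustrate. The sketch below is a deliberately simplified stand-in: the stored items and their descriptions are illustrative assumptions, and the explicit search over items only mimics the input-output behavior, not the parallel mechanism the PDP models described later use.

```python
# Access by content: each cue alone matches several stored items,
# but the conjunction of cues picks out exactly one.

memories = {
    "Pete Rose":       {"played baseball", "bet on sports"},
    "Babe Ruth":       {"played baseball"},
    "a known gambler": {"bet on sports"},
}

def recall(cues):
    """Return every stored item whose description contains all the cues."""
    return [name for name, props in memories.items() if cues <= props]

print(recall({"played baseball"}))                   # two matches: not unique
print(recall({"played baseball", "bet on sports"}))  # only Pete Rose matches
```

Either cue alone retrieves more than one memory; together they uniquely identify one.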

Memory fills in gaps.

The computer-file metaphor also misses the fact that when we recall, we often fill in
information that could not have been part of the original record. Pieces of information that
were not part of the original experience intrude on our recollections. Sometimes these
intrusions are misleading, but often enough they are in fact helpful reconstructions based on
things we know about similar memories. For example, if we are told that someone has been
shot by someone else from a distance of 300 yards, we are likely to recall later that a rifle was
used, even though this was not mentioned when we heard about the original event.

Memory generalizes over examples.

A third crucial characteristic of memory is that it allows us to form generalizations. If every apricot we see is orange, we come to treat this as an inherent characteristic of apricots. But if
cars come in many different colors, we come to treat the color as a freely varying property. So
when we are asked to retrieve the common properties of apricots, the color is a prominent
element of our recollection; but no color comes out when we are asked to retrieve the
common properties of cars.

Proponents of the computer-file view of memory deal with these issues by adding special
processes. Access by content is done by laborious sequential search. Reconstruction is done
by applying inferential processes to the retrieved record. Generalization occurs through a
process of forming explicit records for the category (e.g., car or apricot).

In PDP systems, these three characteristics of memory are intrinsic to the operation of the
memory system.

Characteristics of PDP Systems


A PDP system consists of a large number of neuron-like computing elements called units.
Each unit can take on an activation value between some maximum and minimum values,
often 1 and 0. In such systems, the representation of something that we are currently thinking
about is a pattern of activation over the computing elements. Processing occurs by the
propagation of activation from one unit to another via connections among the units. A
connection may be excitatory (positive-valued) or inhibitory (negative-valued). If the
connection from one unit to another is excitatory, then the activation of the receiving unit
tends to increase whenever the sending unit is active. If the connection is inhibitory, then the
activation of the receiving unit tends to decrease. But note that each unit may receive
connections from many other units. The actual change in activation, then, is based on the net
input, aggregated over all of the excitatory and inhibitory connections.
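As a rough sketch of this propagation rule (the weight values, update rate, and clamping scheme below are assumptions for illustration, not part of any specific published model):

```python
import numpy as np

# Each unit aggregates a net input over all incoming connections, then nudges
# its activation toward the maximum (1) when the net input is excitatory and
# toward the minimum (0) when it is inhibitory.

def step(activations, weights, rate=0.1):
    net = weights @ activations                 # net input, summed over connections
    delta = np.where(net > 0,
                     net * (1 - activations),   # excitatory: push toward 1
                     net * activations)         # inhibitory: push toward 0
    return np.clip(activations + rate * delta, 0.0, 1.0)

# Three units: unit 0 excites unit 1 and inhibits unit 2.
W = np.array([[ 0.0, 0.0, 0.0],
              [ 0.8, 0.0, 0.0],    # excitatory connection from unit 0 to unit 1
              [-0.8, 0.0, 0.0]])   # inhibitory connection from unit 0 to unit 2

a = np.array([1.0, 0.5, 0.5])
for _ in range(20):
    a = step(a, W)
    a[0] = 1.0                     # keep the sending unit clamped on
```

After a few iterations, unit 1's activation approaches the maximum while unit 2's decays toward the minimum, reflecting the aggregate effect of excitatory and inhibitory net input.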

In a system like this, the knowledge that governs processing is stored in the connections
among the units, for it is these connections that determine what pattern will result from the
presentation of an input. Learning occurs through adjustments of connection strengths.
Memory storage is just a form of learning, and also occurs by connection weight adjustment.

To make these ideas concrete, we now examine two PDP models of memory. The models
differ in a crucial way. In the first, each individual computing element (henceforth called a
unit) represents a separate cognitive unit, be it a feature (for example, the color of
something), or a whole object, or the object's name. When we are remembering events, there
is a unit for each event. Such models are called localist models. In the second type of model,
cognitive units are not separately assigned to individual computing units. Rather, the
representation of each cognitive unit is thought of as a pattern of activation over an ensemble
of computing units. Alternative objects of thought are represented by alternative patterns of
activation. This type of model is called a distributed model.
A Localist PDP Model of Memory
McClelland (1981) presented a PDP model that illustrates the properties of access by content,
filling in of gaps, and generalization. The database for the model is shown in Figure 1. The
network is shown in Figure 2.

The database consists of descriptions of a group of people who are members of two gangs, the Jets and the Sharks. Each person has a name, and the list specifies each person's age, marital status, education, occupation, and gang membership. Perusal of the list reveals that the Jets are, by and large,
younger and less well educated than the Sharks, and tend to be single rather than married.
However, these tendencies are not absolute and, furthermore, there is no single Jet who has
all of the properties that tend to be typical of Jets.

The goal of the network is to allow retrieval of general and specific information about
individuals in the database. The network consists of a unit for each person (in the center of
Figure 2) and a unit for each property (name, age, educational level, occupation, gang) that a
person can have. Units are grouped into pools by type as shown, so that all the name units are
in one pool, for instance. There is a bidirectional excitatory connection between each person's
unit and the units for each of his properties; and there are bidirectional inhibitory connections
between units that can be thought of as incompatible alternatives. Thus there is inhibition
between the different occupation units, between the different age units, and so on. There is
also inhibition between the different name units and between the units for different
individuals.

In this network, units take on activation values between 1 and -0.2. The output is equal to the
activation, unless the activation is less than 0; then there is no output. In the absence of input,
the activations of all the units are set to a resting value of -0.1.
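A minimal sketch of this update rule on a toy four-unit fragment (the labels, weight values, probe strength, and rates are illustrative assumptions; the real network has pools of mutually inhibitory units for every property type):

```python
import numpy as np

MAX, MIN, REST = 1.0, -0.2, -0.1    # activation bounds and resting value

def output(a):
    return np.maximum(a, 0.0)       # units with activation below 0 send no output

def update(a, W, ext, rate=0.1, decay=0.1):
    net = W @ output(a) + ext       # net input from other units plus the probe
    # excitatory net input drives a unit toward MAX, inhibitory toward MIN;
    # decay pulls every unit back toward its resting level
    delta = np.where(net > 0, net * (MAX - a), net * (a - MIN))
    return a + rate * (delta - decay * (a - REST))

# Toy fragment: a name unit, Lance's instance unit, and two property units,
# all linked by bidirectional excitatory connections.
labels = ["name:Lance", "instance:Lance", "occupation:burglar", "age:20s"]
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    W[i, j] = W[j, i] = 1.0

a = np.full(4, REST)                  # all units start at their resting value
ext = np.array([0.4, 0.0, 0.0, 0.0])  # probe: external input to the name unit
for _ in range(200):
    a = update(a, W, ext)
```

Probing with the name alone drives the instance unit and, through it, the property units to positive, stable activations.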

Retrieval by Name
Retrieval begins with the presentation of a probe, in the form of externally supplied input to
one or more of the property units. To retrieve the properties of Lance, for example, we need
only turn on the name unit for Lance. The activation process is gradual and builds up over
time, eventually resulting in a stable pattern that in this case represents the properties of
Lance. Activation spreads from the name unit to the property units by way of the instance unit. Feedback from the activated properties tends to activate the instance units for other individuals, but, because of the mutual inhibition, these activations are kept relatively low.

Retrieval by Content
It should be clear how we can access an individual by properties as well as by name. As long as we present a set of properties that uniquely matches a single individual, retrieval of the rest of that individual's known properties is quite good. Other, similar individuals may
become partially active, but the correct person unit will dominate the person pool, and the
correct properties will be activated.

Filling in Gaps
Suppose that we delete the connection between Lance and burglar. This creates a gap in the
database. However, the model can fill in this gap, in the following way. As the other
properties of Lance become active, they in turn feed back activation to units for other
individuals similar to Lance. Because the instance unit for Lance himself is not specifying
any activation for an occupation, the instance units for other, similar individuals conspire
together to fill in the gap. In this case it turns out that there is a group of individuals who are
very similar to Lance and who are all burglars. As a result, the network fills in burglar for
Lance as well. One may view this as an example of guilt by association. In this case, it so
happens that the model is correct in filling in burglar, but of course this kind of filling in is
by no means guaranteed to be correct. Similarly, in human memory, our reconstructions of
past events often blend in the contents of other, similar events.

Generalization
The model can be used to retrieve a generalization over a set of individuals who match a
particular probe. For example, one can retrieve the typical properties of Jets simply by
turning on the Jet unit and allowing the network to settle. The result is that the network
activates 20s, junior high, and single strongly. No name is strongly activated, and the three
occupations are all activated about equally, reflecting the fact that all three occur with equal
frequency among the Jets.

Figure 1. The database for the model: the members of the two gangs and their properties.

In summary, this simple model shows how retrieval by content, filling in gaps, and
generalization are intrinsic to the process of retrieval in the PDP approach to memory.

A Distributed PDP Model of Memory


The second model to be considered is a distributed model. Many authors (e.g., Kohonen,
1977; Anderson et al., 1977) have proposed variants of such models. The one shown in
Figure 3 is from McClelland and Rumelhart (1985). The model is called distributed because
there are no single units for individuals or for properties. Instead, the representation to be
stored is a distributed pattern over the entire set of units. Similar memories are represented by
similar patterns, as before; but now each unit need not correspond to a specific feature or
property, and there are no separate units for the item as a whole. Again, the knowledge is
stored in the connections among the units.

Methods for training such networks will be considered in more detail below. Suffice it to note one simple method, called the Hebbian method. According to this method, we increase the connection strength between two units if they are both active in a particular pattern at the same time.

Figure 2. Some of the units and interconnections needed to represent the individuals shown in Figure 1. The units connected with double-headed arrows are mutually excitatory. All the units within the same cloud are mutually inhibitory.
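The Hebbian increment can be written in one line. In the sketch below, patterns use a ±1 coding and an assumed learning rate; with this coding, the same outer-product increment also weakens the connection between an active unit and an inactive one, the variant mentioned later in the article.

```python
import numpy as np

def hebb_update(W, pattern, lrate=0.1):
    """Strengthen the connection between every pair of co-active units."""
    p = np.asarray(pattern, dtype=float)
    W = W + lrate * np.outer(p, p)   # co-active pairs get a positive increment
    np.fill_diagonal(W, 0.0)         # no unit connects to itself
    return W

W = hebb_update(np.zeros((4, 4)), [1, -1, 1, -1])
```

Units 0 and 2, active together in the stored pattern, now share a positive connection; units 0 and 1, which disagree, share a negative one.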

Distributed networks trained with this Hebbian learning rule exhibit many of the properties of
localist networks. They perform an operation, called pattern completion, that is similar to
retrieval by content. In pattern completion, any part of the pattern can be used as a cue to
retrieve the rest of the pattern, although there are limits to this that we will consider below.
Because many memories are stored using the same connection weights, they have a very
strong tendency to fill in gaps in one pattern with parts of other, similar patterns. These
models also generalize. When similar patterns are stored, what is learned about one pattern tends to transfer to the parts it shares with the others. When a set of similar patterns is stored, what is common to all of them builds up as each example is learned, while what differs cancels out.
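Pattern completion can be seen in a small simulation. The sketch below stores a few random ±1 patterns with the Hebbian outer-product rule, then settles from a half-erased cue; the network size, number of patterns, and synchronous update schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                     # number of units (assumed)
patterns = rng.choice([-1.0, 1.0], size=(3, n))

# Hebbian storage: all patterns are superimposed in one weight matrix.
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p) / n
np.fill_diagonal(W, 0.0)

def complete(cue, steps=10):
    a = cue.copy()
    for _ in range(steps):
        a = np.sign(W @ a)                 # each unit takes the sign of its net input
        a[a == 0] = 1.0
    return a

cue = patterns[0].copy()
cue[n // 2:] = 0.0                         # erase half of the stored pattern
recalled = complete(cue)
```

With the memory lightly loaded, the settled state agrees with the stored pattern on nearly every unit; as more, similar patterns are added, shared structure reinforces itself while idiosyncratic parts interfere.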

There is a final important property of distributed memory models, and that is graceful
degradation. The knowledge that governs the ability to reconstruct each pattern is distributed
throughout the network, so if some of the connections are lost, it will not necessarily be
catastrophic. In fact, the network can function quite well even when many of the units are
destroyed, especially if it is relatively lightly loaded with memories.
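Graceful degradation can be checked directly by severing a random fraction of the connections in a Hebbian distributed memory and completing a degraded cue anyway; the lesion fraction, sizes, and patterns below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
patterns = rng.choice([-1.0, 1.0], size=(2, n))

W = np.zeros((n, n))
for p in patterns:                            # Hebbian outer-product storage
    W += np.outer(p, p) / n
np.fill_diagonal(W, 0.0)

W_lesioned = W * (rng.random((n, n)) > 0.3)   # sever 30% of the connections

def settle(a, W, steps=10):
    for _ in range(steps):
        a = np.sign(W @ a)                    # sign of the net input
        a[a == 0] = 1.0
    return a

cue = patterns[0].copy()
cue[: n // 4] = 0.0                           # degrade the cue as well
recalled = settle(cue, W_lesioned)
```

Because the knowledge is spread across many connections, recall remains close to the stored pattern despite the lesion, at least while the network is lightly loaded.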
A Synthesis
Each of the two models described above has some limitations. The localist model requires a special instance unit to be devoted to each memory trace; this is inefficient, especially when different memories are redundant in which properties tend to co-occur. The distributed model, on the other hand, is limited because only a few distinct patterns can be stored in the direct connections among the members of a set of units.

The best of both worlds can be obtained in a hybrid system, in which the various parts of the
representation of a memory are bound together by a set of superordinate units, as in the
localist model, but each superordinate unit participates in the representation of many different
memories, as in the distributed model.

Learning Rules for PDP Systems


Several of the learning rules for PDP systems are reviewed in Rumelhart, Hinton, and
McClelland (1986). Here we consider two main classes, Hebbian learning rules and error-
correcting learning rules. We have already mentioned the Hebbian learning rule, which
increases the strength of the connection between two units when both units are
simultaneously active. In a common variant, the strength of the connection is decreased when
one unit is active and the other is inactive.

These Hebbian learning rules are limited in what can be learned with them. Some of these
limitations are overcome by what are called error-correcting learning rules. In such learning
rules, the idea is that the pattern to be learned is treated not only as input but also as the target
for learning. A pattern is presented, and the network is allowed to settle. Once it has done so,
the discrepancies between the resulting pattern and the input pattern are used to determine
what changes should be made in the connections. For example, if a unit is activated that
should not be active, the connection weights coming into that unit from other active units will
be reduced. Several very powerful learning procedures for adjusting connection weights that
are based on the idea of reducing the discrepancy between output and target have been
developed in recent years. The best-known is the back-propagation learning procedure
(Rumelhart, Hinton, and Williams, 1986). Another important learning rule for PDP systems is
the Boltzmann machine learning rule (Ackley, Hinton, and Sejnowski, 1985). Both work well
in training the hybrid systems described above.
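The error-correcting idea can be sketched with a simple one-layer delta rule. This is an illustration of the principle only, not back-propagation or the Boltzmann machine rule, which extend it to multilayer networks; the learning rate and pattern are assumptions.

```python
import numpy as np

def delta_update(W, pattern, lrate=0.2):
    p = np.asarray(pattern, dtype=float)
    out = W @ p                       # the network's response to the pattern
    err = p - out                     # the pattern is both input and target
    # a unit that is too active has its weights from active units reduced;
    # a unit that is not active enough has them increased
    return W + lrate * np.outer(err, p)

W = np.zeros((4, 4))
target = np.array([1.0, 0.0, 1.0, 0.0])
for _ in range(50):                   # repeated presentations drive the error down
    W = delta_update(W, target)
```

After training, presenting the pattern as input reproduces it almost exactly as output: the discrepancy between output and target has been driven toward zero.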

See also: Localization of Memory Traces; Neural Computation; Reconstructive Memory

Bibliography

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for
Boltzmann machines. Cognitive Science 9, 147-169.

Anderson, J. A., Silverstein, J. W., Ritz, S. A., and Jones, R. S. (1977). Distinctive features,
categorical perception, and probability learning: Some applications of a neural model.
Psychological Review 84, 413-451.

Hertz, J., Krogh, A., and Palmer, R. (1990). Introduction to the theory of neural computation.
Redwood City, CA: Addison-Wesley.
Hinton, G. E., and Anderson J. A., eds. (1981). Parallel models of associative memory.
Hillsdale, NJ: Erlbaum.

Kohonen, T. (1977). Associative memory: A system theoretical approach. New York: Springer-Verlag.

McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge
of specifics. Paper presented at the third annual meeting of the Cognitive Science Society.
Berkeley, CA.

McClelland, J. L., and Rumelhart, D. E. (1985). Distributed memory and the representation
of general and specific information. Journal of Experimental Psychology: General 114, 159-
188.

Rumelhart, D. E., Hinton, G. E., and McClelland, J. L. (1986). A general framework for
parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, and the PDP Research
Group, eds., Parallel distributed processing: Explorations in the microstructure of cognition,
Vol. 1. Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations
by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group,
eds., Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1.
Cambridge, MA: MIT Press.

Rumelhart, D. E., McClelland, J. L., and the PDP Research Group. (1986). Parallel
distributed processing: Explorations in the microstructure of cognition, 2 vols. Cambridge,
MA: MIT Press.

Copyright 2003-2009 The Gale Group, Inc. All rights reserved.
