
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 18, NO. 1, JANUARY/FEBRUARY 1988

Bidirectional Associative Memories


BART KOSKO, MEMBER, IEEE

Abstract - Stability and encoding properties of two-layer nonlinear feedback neural networks are examined. Bidirectionality, forward and backward information flow, is introduced in neural nets to produce two-way associative search for stored associations (A_i, B_i). Passing information through M gives one direction; passing it through its transpose M^T gives the other. A bidirectional associative memory (BAM) behaves as a heteroassociative content addressable memory (CAM), storing and recalling the vector pairs (A_1, B_1), ..., (A_m, B_m), where A_i ∈ {0,1}^n and B_i ∈ {0,1}^p. We prove that every n-by-p matrix M is a bidirectionally stable heteroassociative CAM for both binary/bipolar and continuous neurons a_i and b_j. When the BAM neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation, or resonance. The stable reverberation corresponds to a system energy local minimum. Heteroassociative information is encoded in a BAM by summing correlation matrices. The BAM storage capacity for reliable recall is roughly m < min(n, p). No more heteroassociative pairs can be reliably stored and recalled than the lesser of the dimensions of the pattern spaces {0,1}^n and {0,1}^p. The Appendix shows that it is better on average to use bipolar {-1,1} coding than binary {0,1} coding of heteroassociative pairs (A_i, B_i). BAM encoding and decoding are combined in the adaptive BAM, which extends global bidirectional stability to real-time unsupervised learning. Temporal patterns (A_1, ..., A_m) are represented as ordered lists of binary/bipolar vectors and stored in a temporal associative memory (TAM) n-by-n matrix M as a limit cycle of the dynamical system. Forward recall proceeds through M, backward recall through M^T. Temporal patterns are stored by summing contiguous bipolar correlation matrices, X_1^T X_2 + ... + X_m^T X_1, generalizing the BAM storage procedure. This temporal encoding scheme is seen to be equivalent to a form of Grossberg outstar avalanche coding for spatiotemporal patterns. The storage capacity is m = m_1 + ... + m_k < n, where m_j is the length of the jth temporal pattern and n is the dimension of the spatial pattern space. Limit cycles (A_1, ..., A_m, A_1) are shown to be stored in local energy minima of the binary state space {0,1}^n.

I. STORING PAIRED AND TEMPORAL PATTERNS

HOW CAN paired-data associations (A_i, B_i) be stored and recalled in a two-layer nonlinear feedback dynamical system? What is the minimal neural network that achieves this? We show that the introduction of bidirectionality, forward and backward associative search for stored associations (A_i, B_i), extends the symmetric unidirectional autoassociators [30] of Cohen and Grossberg [7] and Hopfield [24], [25]. Every real matrix is both a discrete and continuous bidirectionally stable associative memory. The bidirectional associative memory (BAM) is the minimal two-layer nonlinear feedback network. Information passes forward from one neuron field to the other by passing through the connection matrix M. Information passes backward through the matrix transpose M^T. All other two-layer networks require more information in the form of backward connections N different from M^T. The underlying mathematics are closely related to the properties of adjoint operators in function spaces, in particular how quadratic forms are essentially linearized by real matrices and their adjoints (transposes). Since every matrix M is bidirectionally stable, we suspect that gradual changes due to learning in M will result in stability. We show that this is so quite naturally for real-time unsupervised learning. This extends Lyapunov convergence of neural networks for the first time to learning.

The neural network interpretation of a BAM is a two-layer hierarchy of symmetrically connected neurons. When the neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation. The stable reverberation corresponds to a system energy local minimum. In the learning or adaptive BAM, the stable reverberation of a pattern (A_i, B_i) across the two fields of neurons seeps pattern information into the long-term memory connections M, allowing input associations (A_i, B_i) to dig their own energy wells in which to reverberate.

Temporal patterns are sequences of spatial patterns. Recalled temporal patterns are limit cycles. For instance, a sequence of binary vectors can represent a harmonized melody. A given note or chord of the melody is often sufficient to recollect the rest of the melody, to "name that tune." The same note or chord can be made to trigger the dual bidirectional memory to continue (recall) the rest of the melody backwards to the start, a whistling feat worthy of Mozart or Bach! Limit cycles can also be shown to be energy minimizers of simple networks of synchronous on-off neurons.

The forward and backward directionality of BAM correlation encoding naturally extends to the encoding of temporal patterns or limit cycles. The correlation encoding algorithm is a discrete approximation of Hebbian learning, in particular, a type of Grossberg outstar avalanche [9]-[12].

Manuscript received December 3, 1986; revised November 3, 1987. This work was supported in part by the Air Force Office of Scientific Research under Contract F49620-86-C-0070, and by the Advanced Research Projects Agency of the Department of Defense, ARPA Order 5794. The author is with the Department of Electrical Engineering, Systems, Signal, and Information Processing Institute, University of Southern California, Los Angeles, CA 90089. IEEE Log Number 8718862.

II. EVERY MATRIX IS BIDIRECTIONALLY STABLE

Traditional associative memories are unidirectional. Vector patterns A_1, A_2, ..., A_m are stored in a matrix memory M.



Input pattern A is presented to the memory by performing the multiplication AM and some subsequent nonlinear operation, such as thresholding, with resulting output A'. A' is either accepted as the recollection or fed back into M, which produces A'', and so on. A stable memory will eventually produce a fixed output A_f. If the memory is a proper content addressable memory (CAM), then A_f should be one of the stored patterns A_1, ..., A_m. This feedback procedure behaves as if input A were unidirectionally fed through a chain of M's:

A → M → A' → M → A'' → M → ... → A_f → M → A_f → ...

Unidirectional CAMs are autoassociative [28]-[30]. Pieces of patterns recall entire patterns. In effect, autoassociative memories store the redundant pairs (A_1, A_1), (A_2, A_2), ..., (A_m, A_m). In general, associative memories are heteroassociative. They store pairs of different data: (A_1, B_1), (A_2, B_2), ..., (A_m, B_m). A_i and B_i are vectors in different vector spaces. For instance, if A_i and B_i are binary and hence depict sets, they may come from the respective vector spaces {0,1}^n and {0,1}^p. If they are unit-interval valued and hence depict fuzzy sets [38], they may come from [0,1]^n and [0,1]^p.

Heteroassociative memories are usually used as one-shot memories. A is presented to M, B is output, and the process is finished. Hopefully, B will be closer to stored pattern B_i than to all other stored patterns B_j if the input A is closest to stored pattern A_i. Kohonen [28]-[30] has shown how to guarantee this for matrix memories by using pseudoinverses as optimal orthogonal projections. For instance, M will always recall B_i when presented with A_i if all the stored input patterns A_j are orthogonal.

What is the minimal nonlinear feedback heteroassociative memory that stores and accurately recalls binary associations (A_i, B_i)? Consider the chain A → M → B. Suppose A is closer to A_i than to all the other stored input patterns A_j. Suppose the memory M is sufficiently reliable so that the recollection B is relatively close to B_i. Suppose further that M is an n-by-p matrix memory. We would like to somehow feed back B through the memory to increase the accuracy of the final recollection. The simplest way to do this is to multiply B by some p-by-n matrix memory (then threshold, say), and the simplest such memory is the transpose (adjoint) of M, M^T. Whether the network is implemented electrically, optically, or biologically, M^T is locally available information if M is. Any other feedback scheme requires additional information in the form of a p-by-n matrix N distinct from M^T. This gives the new chain B → M^T → A', where, hopefully, A' is at least as close to A_i as A is. We can then reverse direction again and feed A' through M: A' → M → B'. Continuing this bidirectional process, we produce a sequence of paired approximations to the stored pair (A_i, B_i): (A, B), (A', B'), (A'', B''), .... Ideally, this sequence will quickly converge to some fixed pair (A_f, B_f), and this fixed pair will be (A_i, B_i) or nearly so. A bidirectional associative memory (BAM) behaves as a heteroassociative CAM if it is represented by the chain of recollection:

A  → M   → B
A  ← M^T ← B
A' → M   → B'
A' ← M^T ← B'
   ⋮
A_f → M   → B_f
A_f ← M^T ← B_f.

This BAM chain makes explicit that a fixed pair (A_f, B_f) corresponds to a stable network reverberation or resonance, in the spirit of Grossberg's adaptive resonance [5], [6], [16]-[20]. It also makes clear that a fixed point A_f of a symmetric autoassociative memory is a fixed pair (A_f, A_f) of a BAM. Conversely, a BAM, indeed any heteroassociator, can be viewed as a symmetrized augmented autoassociator with a connection matrix made up of zero block diagonal matrices, with M and M^T as the nonzero off-diagonal blocks, and with augmented state vectors C_i = [A_i | B_i].

The fixed or stable points of autoassociative (autocorrelation) memories are often described as rocks on a stretched rubber sheet. An input pattern then behaves as a ball bearing on the rubber sheet as it minimizes its potential energy subject to frictional damping. Hecht-Nielsen [21] even defines artificial neural systems or neurocomputers as programmable dissipative dynamical systems. BAM fixed points are harder to visualize. Perhaps a frictionally damped pendulum dynamical system captures the back-and-forth operations of A → M → B and B → M^T → A, or perhaps a product-space ball bearing rolling into product-space potential energy wells.

A pair (A, B) defines the state of the BAM M. We prove stability by identifying a Lyapunov or energy function E with each state (A, B). In the autoassociative case when M is symmetric and zero diagonal, Hopfield [24], [25] has identified an appropriate E by E(A) = -AMA^T (actually, Hopfield uses half this quantity). We review Hopfield's [24], [35], [37] argument to prove unidirectional stability for zero-diagonal symmetric matrices in asynchronous operation. We will then generalize this proof technique to establish bidirectional stability of arbitrary matrices. Equation (21) generalizes this proof to a spectrum of asynchronous BAM update strategies. Unidirectional stability follows since if ΔE = E_2 - E_1 is caused by the kth neuron's state change Δa_k = a_{k2} - a_{k1}, then E can be expanded as

$$E(A) = -\sum_{i \neq k}\sum_{j \neq k} a_i a_j m_{ij} - a_k \sum_{j \neq k} a_j m_{kj} - a_k \sum_{i \neq k} a_i m_{ik} \tag{1}$$

so that taking the difference E_2 - E_1 and dividing by Δa_k gives


$$\frac{\Delta E}{\Delta a_k} = -\sum_{j \neq k} a_j m_{kj} - \sum_{i \neq k} a_i m_{ik} = -AM_k^T - AM^k \tag{2}$$

where M_k is the kth row of M and M^k is the kth column. If M is symmetric, the right side of (2) is simply -2AM^k. AM^k is the input activation sum to neuron a_k. As in the classical McCulloch-Pitts [34] bivalent neuron model, a_k thresholds to +1 if AM^k > 0, and to -1 if AM^k < 0. Hence Δa_k and AM^k agree in sign, and hence their product is positive (or zero). Hence ΔE = -2Δa_k(AM^k) ≤ 0. Since E is bounded, the unidirectional procedure converges on some A_f such that E(A_f) is a local energy minimum.

The unidirectional autoassociative CAM procedure is in general unstable if M is not symmetric. For then the term AM_k^T in (2) is the output activation sum from a_k to the other neurons, and, in general, AM_k^T ≠ AM^k. If the magnitude of the output sum exceeds the magnitude of the input sum and the two sums disagree in sign, ΔE > 0 occurs. The unidirectional CAM procedure is no longer a nearest-neighbor classifier. Oscillation occurs.

We propose the potential function

$$E(A, B) = -\tfrac{1}{2} AMB^T - \tfrac{1}{2} BM^T A^T \tag{3}$$

as the BAM system energy of state (A, B). Observe that BM^T A^T = B(AM)^T = (AMB^T)^T = AMB^T. The last equality follows since, trivially, the transpose of a scalar equals the scalar. Hence (3) is equivalent to

$$E(A, B) = -AMB^T. \tag{4}$$

This establishes that the BAM system energy is a well-defined concept since E(A, B) = E(B, A) and makes clear that the Hopfield autoassociative energy corresponds to the special case when B = A. Analogously, if a two-dimensional pendulum has a stable equilibrium at the vertical, then the energy of the pendulum at a given angle is the same whether the angle is measured clockwise or counterclockwise from the vertical. Moreover, the equality E(A, B) = E(B, A) holds even though the neurons in both the A and B networks behave asynchronously.

The BAM recall procedure is a nonlinear feedback procedure. Each neuron a_i in neuron population or field A and each neuron b_j in B independently and asynchronously (or synchronously) examines its input sum from the neurons in the other population, then changes state or not according to whether the input sum exceeds, equals, or falls short of the threshold. Hence we make the neuroclassical assumption that each neuron is either on (+1) or off (0 or -1) according to whether its input sum exceeds or falls short of some numerical threshold; if the input sum equals the threshold, the neuron maintains its current state. The input sum to b_j is the column inner product

$$AM^j = \sum_i a_i m_{ij} \tag{5}$$

where M^j is the jth column of M. The input sum to a_i is

$$BM_i^T = \sum_j b_j m_{ij} \tag{6}$$

where M_i is the ith row of M (the ith column of M^T). We take 0 as the threshold for all neurons. In summary, the threshold functions for a_i and b_j are

$$a_i = \begin{cases} 1, & \text{if } BM_i^T > 0 \\ a_i, & \text{if } BM_i^T = 0 \\ 0\ (-1), & \text{if } BM_i^T < 0 \end{cases} \tag{7}$$

$$b_j = \begin{cases} 1, & \text{if } AM^j > 0 \\ b_j, & \text{if } AM^j = 0 \\ 0\ (-1), & \text{if } AM^j < 0. \end{cases} \tag{8}$$

When a paired pattern (A, B) is presented to the BAM, the neurons in populations A and B are turned on or off according to the occurrence of 1's and 0's (-1's) in the state vectors A and B. The neurons continue their asynchronous (or synchronous) state changes until a bidirectionally stable state (A_f, B_f) is reached. We now prove that such a stable state is reached for any matrix M and that it corresponds to a local minimum of (3).

E decreases along discrete trajectories in the phase space {0,1}^n × {0,1}^p. We show this by showing that changes Δa_i and Δb_j in state variables produce ΔE ≤ 0. Note that Δa_i, Δb_j ∈ {-1, 0, 1} for binary state variables and Δa_i, Δb_j ∈ {-2, 0, 2} for bipolar variables. We need only consider nonzero changes in a_i and b_j. Rewriting (4) as a double sum gives

$$E(A, B) = -\sum_i \sum_j a_i b_j m_{ij} = -\sum_{i \neq k} \sum_j a_i b_j m_{ij} - a_k \sum_j b_j m_{kj}. \tag{9}$$

Hence the energy change ΔE = E_2 - E_1 due to state change Δa_k is

$$\frac{\Delta E}{\Delta a_k} = -\sum_j b_j m_{kj} = -BM_k^T. \tag{10}$$

We recognize BM_k^T on the right side of (10) as the input sum to a_k from the threshold rule (7). Hence if Δa_k > 0, then (7) ensures that BM_k^T > 0, and thus ΔE = -Δa_k(BM_k^T) < 0. Similarly, if Δa_k < 0, then (7) again ensures that a_k's input sum agrees in sign, BM_k^T < 0, and thus ΔE = -Δa_k(BM_k^T) < 0. Similarly, the energy change due to state change Δb_k is

$$\frac{\Delta E}{\Delta b_k} = -\sum_i a_i m_{ik} = -AM^k. \tag{11}$$

Again we recognize AM^k on the right side of (11) as the input sum to b_k from the threshold rule (8). Hence Δb_k > 0 only if AM^k > 0, and Δb_k < 0 only if AM^k < 0. In either case, ΔE = -Δb_k(AM^k) < 0. When Δa_k = Δb_k = 0, ΔE = 0. Hence ΔE ≤ 0 along discrete trajectories in {0,1}^n × {0,1}^p (or in {-1,1}^n × {-1,1}^p), as claimed.


Since E is bounded below,

$$E(A, B) \geq -\sum_i \sum_j |m_{ij}| \quad \text{for all } A \text{ and all } B, \tag{12}$$

the BAM converges to some stable point (A_f, B_f) such that E(A_f, B_f) is a local energy minimum. Since the n-by-p matrix M in (3) was an arbitrary (real) matrix, every matrix is bidirectionally stable.
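The bivalent recall procedure of (7) and (8), together with the energy (4), is easy to state in code. The following is a minimal sketch, not code from the paper, assuming bipolar {-1, +1} state vectors, synchronous updates, and NumPy; all function names are illustrative.

```python
import numpy as np

def threshold(s, old):
    """Zero-threshold rule of (7)-(8): +1 above 0, -1 below 0, keep the old state at 0."""
    return np.where(s > 0, 1, np.where(s < 0, -1, old))

def bam_recall(M, A, B, max_iters=100):
    """Synchronous bipolar BAM recall: alternate A -> M -> B and B -> M^T -> A
    until the pair (A, B) stops changing, i.e., a bidirectionally stable pair."""
    A, B = np.asarray(A, dtype=int), np.asarray(B, dtype=int)
    for _ in range(max_iters):
        B_new = threshold(A @ M, B)        # update field B from field A
        A_new = threshold(B_new @ M.T, A)  # update field A from field B
        if np.array_equal(A_new, A) and np.array_equal(B_new, B):
            break
        A, B = A_new, B_new
    return A, B

def bam_energy(M, A, B):
    """BAM system energy E(A, B) = -A M B^T of (4)."""
    return -float(np.asarray(A) @ M @ np.asarray(B))
```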

III. BAM ENCODING


Suppose we wish to store the binary (bipolar) patterns (A_1, B_1), ..., (A_m, B_m) at or near local energy minima. How can these association pairs be encoded in some BAM n-by-p matrix M? In the previous section we showed how to decode an arbitrary M but not how to construct a specific M. We now develop a simple but general encoding procedure based upon familiar correlation techniques.

The association (A_i, B_i) can be viewed as a meta-rule or set-level logical implication: IF A_i, THEN B_i. However, bidirectionality implies that (A_i, B_i) also represents the converse meta-rule: IF B_i, THEN A_i. Hence the logical relation between A_i and B_i is symmetric, namely, logical implication (set equivalence). The vector analogue of this symmetric biconditionality is correlation. The natural suggestion then is to memorize the association (A_i, B_i) by forming the correlation matrix or vector outer product A_i^T B_i. The correlation matrix redundantly distributes the vector information in (A_i, B_i) in a parallel storage medium, a matrix. The next suggestion is to superimpose the m associations (A_i, B_i) by simply adding up the correlation matrices pointwise:

$$M = \sum_{i=1}^{m} A_i^T B_i \tag{13}$$

with dual BAM memory M^T given by

$$M^T = \sum_{i=1}^{m} B_i^T A_i. \tag{14}$$

The associative memory defined in (13) is the emblem of linear associative network theory. It has been exhaustively studied in this context by Kohonen [27]-[30], Nakano [36], Anderson et al. [2]-[4], and several other researchers. In the overwhelming number of cases, M is used in a simple one-shot feedforward linear procedure. Consequently, much research [22], [23], [30] has focused on preprocessing of stored input (A_i) patterns to improve the accuracy of one-iteration synchronous recall. In contrast, the BAM procedure uses (13) and (14) as system components in a nonlinear multi-iteration procedure to achieve heteroassociative content addressability. The fundamental biconditional nature of the BAM process naturally leads to the selection of vector correlation for the memorization process. However, the nonlinearity introduced by the thresholding in (7) and (8) renders the memories in (13) and (14) unsuitable for BAM storage. The candidate binary patterns (A_1, B_1), ..., (A_m, B_m) must be transformed to bipolar patterns (X_1, Y_1), ..., (X_m, Y_m) for proper memorization and superimposition. This yields the BAM memories

$$M = \sum_{i=1}^{m} X_i^T Y_i \tag{15}$$

with dual BAM memory

$$M^T = \sum_{i=1}^{m} Y_i^T X_i. \tag{16}$$

Note that (A_i, B_i) can be erased from M (M^T) by adding X_i^T Y_i^c = -X_i^T Y_i to the right side of (15), since the bipolar complement Y_i^c = -Y_i. Also note that X_i^{cT} Y_i^c = -X_i^T Y_i^c = X_i^T Y_i. Hence encoding (A_i, B_i) in memory encodes (A_i^c, B_i^c) as well, and vice versa.

The fundamental reason why (13) and (14) are unsuitable but (15) and (16) are suitable for BAM storage is that 0's in binary patterns are ignored when added, but -1's in bipolar patterns are not: 1 + 0 = 1 but 1 + (-1) = 0. If the numbers are matrix entries that represent synaptic strengths, then multiplying and adding binary quantities can only produce excitatory connections or zero-weight connections. (We note, however, that (13) and (14) are functionally suitable if bipolar state vectors are used, although the neuronal interpretation is less clear than when (15) and (16) are used.) Multiplying and adding bipolar quantities produces excitatory and inhibitory connections. The connection strengths represent the frequency of excitatory and inhibitory connections in the individual correlation matrices. If e_ij is the edge or connection strength between a_i and b_j, then e_ij is positive, zero, or negative according as the number of +1 ijth entries in the m correlation matrices X_i^T Y_i exceeds, equals, or falls short of the number of -1 ijth entries. The magnitude of e_ij measures the preponderance of 1's over -1's, or -1's over 1's, in the summed matrices.

Coding details aside, (15) encodes (A_i, B_i) in M by forming discrete reciprocal outstars [8]-[12], in the language of Grossberg associative learning; see Figs. 1 and 2. Grossberg [8] has long since shown that the outstar is the minimal network capable of perfectly learning a spatial pattern. The reciprocal outstar framework provides a fertile context in which to interpret BAM convergence.


The neurons {a_1, ..., a_n} and {b_1, ..., b_p} can be interpreted as two symmetrically connected fields [5], [6], [15], [16], [20] F_A and F_B of bivalent threshold functions. BAM convergence then corresponds to a simple type of adaptive resonance [5], [6], [16]-[20]. Adaptive resonance occurs when recurrent neuronal activity (short-term memory) and variable connection strengths (long-term memory) equilibrate or resonate. The resonance is adaptive because the connection strengths gradually change. Hence BAM convergence represents nonadaptive resonance since the connections m_ij are fixed by (15). (Later, and in Kosko [31], we allow BAMs to learn.) Since connections typically change much more slowly than neuron activations change, BAM resonance may still accurately model interesting distributed behavior.

Let us examine the synchronous behavior of BAMs when M and M^T are given by (15) and (16). Suppose we have stored (A_1, B_1), ..., (A_m, B_m) in the BAM, and we are presented with the pair (A, B). We can initiate the recall process using A first or B first, or using them simultaneously. For simplicity, suppose we present the BAM with the stored pattern A_i. Then we obtain the signal-noise expansion

$$A_i M = (A_i X_i^T) Y_i + \sum_{j \neq i} (A_i X_j^T) Y_j \tag{17}$$

or, if we use the bipolar version X_i of A_i, which, as established in the Appendix, improves recall reliability on average, then

$$X_i M = (X_i X_i^T) Y_i + \sum_{j \neq i} (X_i X_j^T) Y_j = n Y_i + \sum_{j \neq i} x_{ij} Y_j = (c_1 y_1^i, c_2 y_2^i, \ldots, c_p y_p^i), \qquad c_k > 0, \tag{18}$$

where y_k^i denotes the kth bipolar component of Y_i. Observe that the signal in (18) is given the maximum positive amplification factor n > 0. This exaggerates the bipolar features of Y_i, thus tending to produce B_i when the input sum X_i M is thresholded according to (8). The noise amplification coefficients x_{ij} = X_i X_j^T correct the noise terms Y_j according to the Hamming distances H(A_i, A_j). In particular,

$$x_{ij} \gtreqless 0 \quad \text{iff} \quad H(A_i, A_j) \lesseqgtr n/2. \tag{19}$$

This relationship holds because x_{ij} is the number of vector slots in which A_i and A_j agree, n - H(A_i, A_j), minus the number of slots in which they differ, H(A_i, A_j). Hence

$$x_{ij} = n - 2H(A_i, A_j), \tag{20}$$

which implies (19). If H(A_i, A_j) = n/2, Y_j is zeroed out of the input sum. If H(A_i, A_j) < n/2, and hence if A_i and A_j are close, then x_{ij} > 0 and Y_j is positively amplified in direct proportion to the strength of match between A_i and A_j. If H(A_i, A_j) > n/2, then x_{ij} < 0 and the complement Y_j^c is positively amplified in direct proportion to H(A_i, A_j) since Y_j^c = -Y_j. Thus the correction coefficients x_{ij} convert the additive noise vectors Y_j into a distance-weighted signal sum, thereby increasing the probability that the right side of (18) will approximate some positive multiple of Y_i and then threshold to B_i. This argument still applies when an arbitrary vector A close to A_i is presented to the BAM.

The BAM storage capacity is ultimately determined by the noise sum in (18). Roughly speaking, this sum can be expected to outweigh the signal term(s) if m > n, where m is the number of stored pairs (A_i, B_i), since n is the maximum signal amplification factor. Similarly, when presenting M^T with B_i, the maximum signal term is p X_i; so m > p can be expected to produce unreliable recall. Hence a rough estimate of the BAM storage capacity is m < min(n, p). The BAM can be confused if like inputs are associated with unlike outputs or vice versa. Continuity must hold for all i and j: (1/n)H(A_i, A_j) ≈ (1/p)H(B_i, B_j). The foregoing argument assumes continuity. It trivially holds in the autoassociative (A_i, A_i) case.

Synchronous BAM behavior produces large energy changes. Hence few vector multiplies are required per recall. This is established by denoting ΔA = A_2 - A_1 = (Δa_1, ..., Δa_n) in the energy change equation

$$\Delta E = -\Delta A\, M B^T = -\sum_i \Delta a_i \sum_j b_j m_{ij} = -\sum_i \Delta a_i\, BM_i^T, \tag{21}$$

which is the sum of pointwise energy decreases -Δa_i BM_i^T, and similarly for ΔB = B_2 - B_1. This argument also shows that simple asynchronous behavior, as required by the Hopfield model [24], can be viewed as a special case of synchronous behavior, namely, when at most one Δa_k is nonzero per iteration. More generally, this argument shows [31] that any subset of neurons in either field can be updated per iteration: subset asynchrony.
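The identity (20) behind the correction coefficients is easy to check numerically. Here is a small sketch under the assumption of random binary patterns; the length, seed, and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
A_i, A_j = rng.integers(0, 2, n), rng.integers(0, 2, n)
X_i, X_j = 2 * A_i - 1, 2 * A_j - 1          # bipolar transforms

H = int(np.sum(A_i != A_j))                  # Hamming distance H(A_i, A_j)
x_ij = int(X_i @ X_j)                        # correction coefficient X_i . X_j

# Equation (20): x_ij = n - 2 H(A_i, A_j), so x_ij > 0 exactly when H < n/2.
assert x_ij == n - 2 * H
```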


Let us examine a simple example of a BAM construction and synchronous operation. Suppose we wish to store the following four nonorthogonal associations, where m = 4, n = 15, and p = 10:

A_1 = (1 0 1 0 1 0 1 0 1 0 1 0 1 0 1)      B_1 = (1 1 1 1 0 0 0 0 1 1)
A_2 = (1 1 0 0 1 1 0 0 1 1 0 0 1 1 0)      B_2 = (1 1 1 0 0 0 1 1 1 0)
A_3 = (1 1 1 0 0 0 1 1 1 0 0 0 1 1 1)      B_3 = (1 1 0 0 1 1 0 0 1 1)
A_4 = (1 1 1 1 0 0 0 0 1 1 1 1 0 0 0)      B_4 = (1 0 1 0 1 0 1 0 1 0).

The first step is to convert these binary associations into bipolar associations:

X_1 = (1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1)      Y_1 = (1 1 1 1 -1 -1 -1 -1 1 1)
X_2 = (1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1)      Y_2 = (1 1 1 -1 -1 -1 1 1 1 -1)
X_3 = (1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 1)       Y_3 = (1 1 -1 -1 1 1 -1 -1 1 1)
X_4 = (1 1 1 1 -1 -1 -1 -1 1 1 1 1 -1 -1 -1)      Y_4 = (1 -1 1 -1 1 -1 1 -1 1 -1).

Next the four vector outer-product correlation matrices X_1^T Y_1, X_2^T Y_2, X_3^T Y_3, and X_4^T Y_4 are formed and added pointwise to form the BAM matrix M = X_1^T Y_1 + X_2^T Y_2 + X_3^T Y_3 + X_4^T Y_4:

M =
  4  2  2 -2  0 -2  0 -2  4  0
  2  0  0 -4  2  0  2  0  2 -2
  2  0  0  0  2  0 -2 -4  2  2
 -2 -4  0  0  2  0  2  0 -2 -2
  0  2  2  2 -4 -2  0  2  0  0
 -2  0  0  0 -2  0  2  4 -2 -2
  0  2 -2  2  0  2 -4 -2  0  4
 -2  0 -4  0  2  4 -2  0 -2  2
  4  2  2 -2  0 -2  0 -2  4  0
  0 -2  2 -2  0 -2  4  2  0 -4
  0 -2  2  2  0 -2  0 -2  0  0
 -2 -4  0  0  2  0  2  0 -2 -2
  2  4  0  0 -2  0 -2  0  2  2
  0  2 -2 -2  0  2  0  2  0  0
  0  2 -2  2  0  2 -4 -2  0  4

Then (A_1, B_1), ..., (A_4, B_4) are stable points in {0,1}^15 × {0,1}^10 with respective energies -56, -48, -60, and -40.

This BAM illustrates rapid convergence and accurate pattern completion. If A = (1 0 1 0 1 0 1 0 0 0 0 0 0 0 0), with H(A, A_1) = 4, then B_1 is recalled in one synchronous iteration, and thus (A_1, B_1) is retrieved from memory since (A_1, B_1) is stable. If B = (1 1 0 0 1 0 0 0 0 0), with H(B, B_3) = 3, then (A_3, B_3) is recalled in one iteration. Any of the blended pairs (A_1, B_4), (A_2, B_3), (A_3, B_2), (A_4, B_1) recollects the respective stored pairs (A_1, B_1), (A_2, B_2), (A_3, B_3), (A_4, B_4). This is expected since the A_i matches correspond to the correct specification of 15 variables, while the B_i matches only correspond to the correct specification of 10 variables.

IV. CONTINUOUS AND ADAPTIVE BIDIRECTIONAL ASSOCIATIVE MEMORIES

The BAM concepts and convergence proof discussed earlier pass over to the continuous or physical case. We prove that if the aggregate real-valued activations of the ith neuron in F_A and the jth neuron in F_B, denoted a_i and b_j, are transformed by bounded monotone-increasing signal functions S(a_i) and S(b_j), then every matrix is bidirectionally stable. Hence S' = dS(x)/dx > 0. When the signal functions take values in [0,1], the output state vectors


S(A) = (S(a_1), ..., S(a_n)) and S(B) = (S(b_1), ..., S(b_p)) are fuzzy sets [38]. Then BAM convergence often corresponds [31], [32] to minimization of a nonprobabilistic fuzzy entropy [33]. Suppose a_i and b_j are governed by the additive [16] dynamical equations

$$\dot a_i = -a_i + \sum_j S(b_j)\, m_{ij} + I_i \tag{22}$$

$$\dot b_j = -b_j + \sum_i S(a_i)\, m_{ij} + J_j. \tag{23}$$

This dynamical model is a direct generalization of the continuous Hopfield circuit model [25], [26], which is itself a special case of the Cohen-Grossberg theorem [7]. In (22) the term -a_i is a passive decay term, the constant I_i is the exogenous input to a_i, and similarly for -b_j and J_j in (23). Proportionality constants have been omitted for simplicity. The constant inputs I_i and J_j can be interpreted as sustained environmental stimuli or as patterns of stable reverberation from an adjoining neural network. The time scales are roughly such that the (short-term memory) activations a_i and b_j fluctuate orders of magnitude faster than the (long-term memory) memory traces m_ij and the applied external inputs I_i and J_j. Hence a reasonable approximation of real-time continuous BAM behavior is obtained by assuming all m_ij, I_i, and J_j constant.

As in the Cohen-Grossberg [7] framework, many more nonlinear models than (22), (23) can be used. To prove stability of the additive model (22), (23), we follow the example of the Cohen-Grossberg theorem and postulate that the dynamical system (22), (23) admits the global Lyapunov or energy function

$$E(A, B) = -\sum_i \sum_j S(a_i) S(b_j)\, m_{ij} + \sum_i \int_0^{a_i} S'(x_i)\, x_i\, dx_i + \sum_j \int_0^{b_j} S'(y_j)\, y_j\, dy_j - \sum_i S(a_i) I_i - \sum_j S(b_j) J_j. \tag{24}$$

The total time derivative of E is

$$\dot E = -\sum_i S'(a_i)\, \dot a_i \Big[-a_i + \sum_j S(b_j) m_{ij} + I_i\Big] - \sum_j S'(b_j)\, \dot b_j \Big[-b_j + \sum_i S(a_i) m_{ij} + J_j\Big] = -\sum_i S'(a_i)\, \dot a_i^2 - \sum_j S'(b_j)\, \dot b_j^2 \leq 0 \tag{25}$$

upon substituting the right sides of (22) and (23) for the terms in braces in (25). Since E is bounded and M is an arbitrary n-by-p matrix, (25) proves that every matrix is continuously bidirectionally stable. Moreover, since S' > 0, the energy function E reaches a minimum if and only if ȧ_i = ḃ_j = 0 for all i and all j.

For completeness, we summarize here recent results on adaptive BAMs [31]. Since during learning the weights m_ij change so much more slowly than the activations a_i and b_j change, and since fixed weights always produce global stability, if the weights are slowly varied in (22), (23) we can expect stability in the learning case. The minimal [31] unsupervised correlation learning law is the signal Hebb law [10], [12]:

$$\dot m_{ij} = -m_{ij} + S(a_i) S(b_j).$$

Hence the signal Hebb law learns an exponentially weighted average of sampled signals. Hence m_ij is bounded and rapidly converges. Note that if the signals S are in the bipolar interval [-1,1], this learning law asymptotically converges to the bipolar correlation learning scheme discussed in Section III. The biological plausibility of the signal Hebb law stems from its use of only locally available information. A learning synapse m_ij only sees the information locally available to it: its own strength m_ij and the signals S(a_i) and S(b_j) flowing through it. (The synapse also sees the instantaneous changes dS(a_i)/dt and dS(b_j)/dt of the signals [32].) Moreover, the synapse must, in general, learn from one or few data passes, unlike feedforward supervised schemes where thousands of data passes are often required.

A global bounded Lyapunov function for the adaptive BAM is

$$E(A, B, M) = F + \frac{1}{2}\sum_i \sum_j m_{ij}^2,$$

where F denotes the bounded energy function (24) of the continuous BAM. Then

$$\dot E = -\sum_i \sum_j \big[S(a_i)S(b_j) - m_{ij}\big]\dot m_{ij} - \sum_i S'(a_i)\, \dot a_i^2 - \sum_j S'(b_j)\, \dot b_j^2 = -\sum_i \sum_j \dot m_{ij}^2 - \sum_i S'(a_i)\, \dot a_i^2 - \sum_j S'(b_j)\, \dot b_j^2 \leq 0$$

upon substituting the signal Hebb law for the term in braces. Again, since S' > 0, Ė = 0 iff ȧ_i = ḃ_j = ṁ_ij = 0 for all i and j. Hence every signal Hebb BAM is globally stable (adaptively resonates [5], [17], [18]). (This theorem extends to any number of BAM fields interconnected with signal Hebb learning laws.) Stable reverberations across the nodes seep pattern information into the memory traces m_ij. Input associations dig their own energy wells in the network state space [0,1]^n × [0,1]^p.

The adaptive BAM is the general BAM model. The energy function (24) is only unique up to linear transformation. It already includes the sum of squared weights m_ij^2 as an additive constant. Note that in (24) memory information only enters through the quadratic form, the sum of products m_ij S(a_i) S(b_j). Time differentiation of these products leads to the terms that eliminate both the feedback terms (path-weighted sums of signals) in (22), (23) and the learning component S(a_i)S(b_j) of the signal Hebb learning law when the adaptive BAM energy function E is differentiated and rearranged. Hence for Lyapunov functions of the Cohen-Grossberg type (such as (24)) that use a quadratic form to eliminate feedback sums and do not include memory information in


its other terms, only the signal Hebb learning law is globally stable. This learning law cannot be changed without making further assumptions, in particular, without changing the activation dynamical models (22), (23). A structurally different Lyapunov function must otherwise be used. This argument holds [31] for all dynamical systems that can be written in the Cohen-Grossberg form [7].
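A crude way to explore the adaptive BAM is to Euler-integrate (22), (23) together with the signal Hebb law. The sketch below is an assumption-laden illustration, not the paper's model: the logistic signal function, the step sizes, the network sizes, and the initial values are arbitrary choices made for this example.

```python
import numpy as np

def S(x):
    """A bounded monotone-increasing signal function with S' > 0 (logistic here)."""
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_bam_step(a, b, M, I, J, dt=0.01, lr=0.001):
    """One Euler step: activations follow (22), (23); weights follow the signal
    Hebb law m_ij' = -m_ij + S(a_i) S(b_j), here on a slower time scale (lr < dt)."""
    Sa, Sb = S(a), S(b)
    da = -a + M @ Sb + I            # eq. (22)
    db = -b + M.T @ Sa + J          # eq. (23)
    dM = -M + np.outer(Sa, Sb)      # signal Hebb learning law
    return a + dt * da, b + dt * db, M + lr * dM

# Illustrative run: 4 neurons in field A, 3 in field B, zero external input.
rng = np.random.default_rng(1)
a, b = rng.normal(size=4), rng.normal(size=3)
M = 0.1 * rng.normal(size=(4, 3))
I, J = np.zeros(4), np.zeros(3)
for _ in range(5000):
    a, b, M = adaptive_bam_step(a, b, M, I, J)
```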
V. TEMPORAL ASSOCIATIVE MEMORIES (TAMs)

Temporal patterns are ordered vectors, functions from an index set to a vector space. We assume all temporal patterns are finite and discrete. We, therefore, can represent them as a list of binary or bipolar vectors, vector-valued samples. (A_1, A_2, A_3, A_1) is such a temporal pattern, where

A_1 = (1 0 0 1 0 0 1 0 0 1)
A_2 = (1 1 0 0 1 1 0 0 1 1)
A_3 = (1 0 1 0 1 1 1 0 1 0)
A_1 = (1 0 0 1 0 0 1 0 0 1).

This temporal pattern (array) might represent a musical chord progression where tag bits are added to indicate sustained tones and to discriminate repeated tones. A_1 is appended to the sequence (A_1, A_2, A_3) to convert the sequence into an infinite loop in the sense that every A_i is followed by A_{i+1}, with A_3 followed by A_1. This loop can intuitively correspond to a music box that plays the same tune over and over and over.

How can (A_1, A_2, A_3, A_1) be encoded in a parallel distributed associative matrix memory M? How can a temporal structure be stored in a static medium so that if A = A_i, then A_{i+1}, A_{i+2}, ... are sequentially recalled? Consider your favorite musical piece or motion picture. In what sense do you remember it? All at once or serially? For concreteness consider the Elizabethan song Greensleeves. How do we remember Greensleeves when we hum or play it? Suppose you are asked to hum Greensleeves starting from some small group of notes in the middle of the song. You probably would try to recollect the small group of notes that immediately precede the given notes. These contiguous groupings might enable you to "pick up the melody," enabling you to recall the next contiguous group, then the next, and so on with increasingly less mental effort. Hence we might conjecture that the temporal pattern (A_1, ..., A_m) can be memorized by memorizing the local contiguities A_1 → A_2, A_2 → A_3, .... Alternatively, this can be represented schematically as the unidirectional conjecture

A_1 → A_2 → ... → A_i → A_{i+1} → ... → A_m.

This local contiguity conjecture suggests a simple algorithm for encoding binary temporal patterns in an n-by-n matrix memory. First, as in the BAM encoding algorithm, the binary vectors A_i are converted to bipolar vectors X_i. Second, the contiguous relationship A_i → A_{i+1} is memorized as if it were the heteroassociative pair (A_i, A_{i+1}) by forming the correlation matrix X_i^T X_{i+1}. Third, the contiguous relationships are added pointwise as in the BAM algorithm to give M:

$$M = \sum_{i=1}^{m-1} X_i^T X_{i+1} + X_m^T X_1. \tag{26}$$

Suppose A_i (X_i) is presented to M. Suppose (1/n)H(A_i, A_j) ≈ (1/n)H(A_{i+1}, A_{j+1}) tends to hold as in the bidirectional case. Then

$$X_i M = (X_i X_i^T) X_{i+1} + \sum_{j \neq i} (X_i X_j^T) X_{j+1} = n X_{i+1} + \sum_{j \neq i} (X_i X_j^T) X_{j+1} = (c_1 x_1^{i+1}, c_2 x_2^{i+1}, \ldots, c_n x_n^{i+1}), \qquad c_k > 0, \tag{27}$$

as in the BAM signal-noise expansion (18). Hence in synchronous unidirectional threshold operation, X_{i+1} (A_{i+1}) tends to be recalled in the next iteration, X_{i+2} in the iteration after that, and so on until the sequence is completed or begins anew. Similarly, if A_i (X_i) is presented to the dual bidirectional memory M^T, the melody should proceed backwards to the start:

$$X_i M^T = X_i (X_{i-1}^T X_i)^T + \sum_{j \neq i} X_i (X_{j-1}^T X_j)^T = n X_{i-1} + \sum_{j \neq i} (X_i X_j^T) X_{j-1} = (d_1 x_1^{i-1}, d_2 x_2^{i-1}, \ldots, d_n x_n^{i-1}), \qquad d_k > 0. \tag{28}$$

A similar approximate argument holds in general when A is close to A_i. Since n is the maximal signal amplification factor in (27) and (28), we obtain the same rough maximal storage capacity bound for an m-length temporal pattern as we obtained in the BAM analysis: m < min(n, n) = n. More generally, the memory can store at most k temporal patterns of lengths m_1, ..., m_k, provided m_1 + m_2 + ... + m_k = m < n.

The neural network interpretation of this temporal coding and recall procedure is a simple type of Grossberg outstar avalanche. Grossberg [9]-[12] showed long ago through differential analysis that, just as an outstar is the minimal network capable of learning an arbitrary spatial pattern, an avalanche is the minimal network capable of learning an arbitrary temporal pattern. An avalanche is a cascade of outstars; see Fig. 3.
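The temporal encoding (26) and the synchronous threshold recall it supports can be sketched as follows; this is not code from the paper but an illustration assuming binary {0,1} patterns, the zero-threshold convention of (7) and (8), and NumPy, with illustrative function names.

```python
import numpy as np

def tam_encode(patterns):
    """Sum contiguous bipolar correlations X_i^T X_{i+1}, closing the loop
    with X_m^T X_1, as in (26)."""
    X = [2 * np.asarray(A, dtype=int) - 1 for A in patterns]
    m = len(X)
    return sum(np.outer(X[i], X[(i + 1) % m]) for i in range(m))

def tam_step(M, A):
    """One synchronous pass: A M thresholded at zero; a zero input sum leaves
    the corresponding component of A unchanged. Use M.T for backward recall."""
    A = np.asarray(A, dtype=int)
    s = A @ M
    return np.where(s > 0, 1, np.where(s < 0, 0, A))
```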


Outstar bursts are sequentially activated by an axonal cable. In the present case this cable is forged with the contiguous correlation matrices X_i^T X_{i+1}. When a spatial pattern A_i is presented to M, the X_i outstar sends a pattern pulse to the neurons a_1, ..., a_n. The neurons threshold this pulse into A_{i+1}. While this pulse is propagating to the neurons, the axonal cable transmits an X_i command to the X_{i+1} outstar. An X_{i+1} pattern pulse is then sent to the neurons, and an X_{i+1} command is sent along the cable to the X_{i+2} outstar, and so on until all the outstars have fired. Hence the successive synchronous states of the neurons a_1, ..., a_n replay the temporal pattern (A_i, A_{i+1}, ..., A_m). If the axonal command cable forms a closed loop, the infinite temporal pattern (A_i, ..., A_m, A_1, A_2, ..., A_m, ...) will be recalled in a music-box loop.

Grossberg [9]-[12] uses a differential model to prove that practice makes avalanches perfectly learn temporal patterns. Neurons (nodes) and synapses (edges) are continuous variables governed by differential equations, and the temporal pattern is a continuous vector-valued function approximated with arbitrary accuracy by discrete samples. The key term in the node equation for a_i is the vector dot product A_i M^i (supplemented with a passive state decay term). The key term in the edge equation for e_{i,i+1} is a lagged Hebb product a_i(t) a_{i+1}(t+1) (supplemented with a forget or passive memory decay term), where t is a time index. Equations (7) and (8) are discrete approximations to Grossberg's neuron equation. Moreover, the hyperplane-threshold behavior of (7) and (8) approximates a sigmoid or S-shaped function that is required to dynamically quench noise and enhance signals. Equation (26) is a discrete vector approximation of the lagged Hebbian law in Grossberg's edge, or learning, equation. The memory capacity bound m < n obviates a passive memory decay term and other dynamical complexities, but at the price of restricting the pattern environments to which the model can be applied.

We also note that, as Grossberg [10]-[12], [15] observes, a temporal pattern generally involves not just order or contiguity information but rhythm information as well. The simple temporal associative memory constructed by (26) ignores rhythm. Once a limit cycle is reached, for instance, it cannot stop "playing." More generally, the speed with which successive spatial patterns are read out across the neurons should vary. Grossberg shows that the simplest way to achieve this is to append a command cell atop the outstar cells. The command cell nonspecifically excites or inhibits current activation of outstar cells according to contextual cues, as if a hormone were nonspecifically released into the bloodstream. This architecture is called a context modulated outstar avalanche. By reversing the direction of arrows, the dual instar avalanche can recognize learned spatiotemporal patterns.

We now extend BAM energy convergence to temporal associative memories (TAMs). What is the energy of a temporal pattern? We cannot expect as easy an answer as in the BAM case. For we know, just by examining the encoding scheme (26), that the same memory matrix M can house limit cycles of different lengths. We must, therefore, limit our analysis to local behavior around a given limit cycle of length m. The bivalent synchronous TAM recall procedure guarantees convergence to such a limit cycle in at most 2^n iterations.

Consider the simplest temporal pattern (A_1, A_2). A natural way to define the energy of this pattern would be E(A_1, A_2) = -A_1 M A_2^T, as in the BAM case. Then E(A_1, A_2) = E(A_2, A_1). Now suppose the temporal pattern, the limit cycle, is (A_1, A_2, A_3). Then there are three two-vector energies to consider: E(A_1, A_2), E(A_2, A_3), and E(A_1, A_3). The third energy violates the contiguity assumption of temporal encoding. The energy of the sequence can then be defined as the sum of the remaining contiguous energies, taken around the cycle:

$$E(A_1, A_2, A_3) = -A_1 M A_2^T - A_2 M A_3^T - A_3 M A_1^T.$$

Then E(A_1, A_2, A_3) = E(A_3, A_2, A_1) since E(A_1, A_2) = E(A_2, A_1) and E(A_2, A_3) = E(A_3, A_2). This leads to a general definition of temporal pattern energy:

$$E(A_1, A_2, \ldots, A_m) = -A_m M A_1^T - \sum_{i=1}^{m-1} A_i M A_{i+1}^T \tag{29}$$

with the property that E(A_1, ..., A_m) = E(A_m, ..., A_1). Let P denote the temporal pattern (A_1, ..., A_m). Then we can rewrite (29) as

$$E(P) = -A_{k-1} M A_k^T - A_k M A_{k+1}^T - \sum_{i \neq k,\ i \neq k-1} A_i M A_{i+1}^T \tag{30}$$

where time slice A_k has been exhibited for analysis and the "loop" energy term -A_m M A_1^T has been omitted for convenience and without loss of generality. Observe that the input to a_i at time k, a_i^k, is A_{k-1} M^i in the forward direction and A_{k+1} M_i^T in the backward direction, just as with bidirectional networks. The serial synchronous operation of the algorithm is essential to distinguish these directions. At time k in the forward direction A_{k-1} is active but A_{k+1} is not. Similarly, at time k in the backward direction A_{k+1} is active but A_{k-1} is the null vector. One neural network interpretation is an m-level hierarchy of neuron fields or slabs. The fields are interconnected only to contiguous fields, field A_k to fields A_{k-1} and A_{k+1}, but have no intraconnections among their neurons.

Suppose the energy change E_2 - E_1 is due to changes in the kth iteration or kth field, A_{k2} - A_{k1}. Then by (30),

$$\Delta E = -A_{k-1} M\, \Delta A_k^T - \Delta A_k M A_{k+1}^T = -A_{k-1} M\, \Delta A_k^T - A_{k+1} M^T \Delta A_k^T = -\sum_i \Delta a_i^k\, A_{k-1} M^i - \sum_i \Delta a_i^k\, A_{k+1} M_i^T \leq 0.$$

This inequality follows since, first, in the forward direction Δa_i^k and A_{k-1} M^i agree in sign for all i by (8) and A_{k+1} = 0, and second, in the backward direction Δa_i^k and A_{k+1} M_i^T agree in sign for all i by (7) and A_{k-1} = 0.


Schematically, this hierarchy of contiguously connected fields corresponds to a block connection matrix T whose only nonzero blocks are copies of M linking each field to the next field in the limit cycle:

T =
  0  M  0  ...  0
  0  0  M  ...  0
  .  .  .  ...  .
  0  0  0  ...  M
  M  0  0  ...  0

For the temporal pattern (A_1, A_2, A_3, A_1) above, (26) gives the 10-by-10 TAM matrix M = X_1^T X_2 + X_2^T X_3 + X_3^T X_1:

M =
  3 -1 -1 -1  1  1  1 -3  1  1
 -1 -1  3 -1  1  1  1  1  1 -3
 -1 -1 -1  3 -3 -3  1  1 -3  1
 -1  3 -1 -1  1  1 -3  1  1  1
  1 -3  1  1 -1 -1  3 -1 -1 -1
  1 -3  1  1 -1 -1  3 -1 -1 -1
  1  1 -3  1 -1 -1 -1 -1 -1  3
 -3  1  1  1 -1 -1 -1  3 -1 -1
  1 -3  1  1 -1 -1  3 -1 -1 -1
  1  1  1 -3  3  3 -1 -1  3 -1

Then A_1 M = (4 4 -4 -4 4 4 -6 -4 4 4) → (1 1 0 0 1 1 0 0 1 1) = A_2 by the hyperplane threshold law (8). We can measure the energy of this recollection as in the BAM case by E(A_1 → A_2) = -A_1 M A_2^T = -24. Next A_2 M = (6 -10 6 -2 2 2 8 -6 2 -6) → (1 0 1 0 1 1 1 0 1 0) = A_3 with E(A_2 → A_3) = -26. Next A_3 M = (6 -10 -2 6 -6 -6 8 -6 -6 2) → (1 0 0 1 0 0 1 0 0 1) = A_1 with E(A_3 → A_1) = -22. Hence E(A_1, A_2, A_3, A_1) = -72. Hence presenting any of the patterns A_1, A_2, or A_3 to M recalls the remainder of the temporal sequence.

In this example the energy sequence (-24, -26, -22) contains an energy increase of +4 when A_3 triggers A_1. We expect this since we are traversing a limit cycle in the state space {0,1}^n. This is also consistent with the principle of temporal stability since we are only concerned with sums of contiguous energies.

Consider, for example, the bit vector A = (1 0 0 1 0 0 0 0 0 0), with H(A, A_1) = 2. Then A recalls A_2 since AM = (2 2 -2 -2 2 2 -4 -2 2 2) → (1 1 0 0 1 1 0 0 1 1) = A_2, but E(A → A_2) = -12 > -24 = E(A_1 → A_2). Suppose now A = (1 1 0 0 1 1 0 0 0 0), with H(A, A_2) = 2. Then AM = (4 -8 4 0 0 0 6 -4 0 -4) → (1 0 1 0 1 1 1 0 0 0) = A' (recalling that neurons with input sums that equal the zero threshold maintain their current on-off state), with H(A', A_3) = 1 and E(A → A') = -14 > -26 = E(A_2 → A_3). Finally, A'M = (5 -7 -3 5 -5 -5 5 -5 -5 3) → (1 0 0 1 0 0 1 0 0 1) = A_1 and E(A' → A_1) = -18 > -22 = E(A_3 → A_1).

Accessing the backward TAM memory M^T with A_1 gives A_1 M^T = (2 -4 4 -4 4 4 4 -4 4 -4) → (1 0 1 0 1 1 1 0 1 0) = A_3 with E(A_1 → A_3) = -22, as expected. Next, A_3 M^T = (4 6 -10 -2 2 2 -6 -6 2 10) → (1 1 0 0 1 1 0 0 1 1) = A_2 with E(A_3 → A_2) = -26. Next, A_2 M^T = (6 -2 -10 6 -6 -6 2 -6 -6 10) → (1 0 0 1 0 0 1 0 0 1) = A_1 with E(A_2 → A_1) = -24. Hence a backwards music-box loop (A_1, A_3, A_2, A_1) is recalled with total energy -72.

APPENDIX
BINARY VERSUS BIPOLAR CODING

The memory storage techniques discussed in this paper involve summing correlation matrices formed from bipolar vectors. Given the memory matrix M and an input vector

A = (1 0 0 1 0 1), should we vector multiply A and M, or X = (1 -1 -1 1 -1 1) and M? Should we, in general, use binary or bipolar coding of state vectors? Bipolar coding is better on average. The argument is based on the expansion

$$AM = (A X_i^T) Y_i + \sum_{j \neq i} (A X_j^T) Y_j \tag{31}$$

where H(A, A_j) is the Hamming distance between A and A_j, the number of vector slots in which A and A_j differ, H(A, A_i) = min_j H(A, A_j), and X_i is the bipolar transform of binary A_i: X_i is A_i with 0's replaced with -1's. In words, A is closest to A_i of all the stored input patterns A_j. The first term on the right side of (31) is the signal term and the second term is the noise term. The parenthetic terms are the dot products a_j = A * X_j = X_j A^T. Hence (31) can be written as a linear combination of stored output patterns:

$$AM = a_i Y_i + \sum_{j \neq i} a_j Y_j. \tag{32}$$

We want a_i to amplify Y_i and a_j to "correct" Y_j. If H(A, A_j) > n/2, then A is closer to the complement of A_j, A_j^c, than to A_j. Hence we want a_j < 0 so that Y_j will be transformed into Y_j^c. If H(A, A_j) < n/2, A is closer to A_j than to A_j^c, so we want a_j > 0. If H(A, A_j) = n/2, A is equidistant between A_j and A_j^c, so we want a_j = 0. These requirements hold without qualification if M is a sum of autocorrelation matrices, and thus Y_j = X_j. For correlation matrices we are implicitly assuming that H(A_i, A_j) ≈ H(B_i, B_j), that is, if stored inputs are close, the associated stored outputs are close.

If we vector multiply M by X, the bipolar transform of A, we get

$$XM = x_i Y_i + \sum_{j \neq i} x_j Y_j \tag{33}$$

where x_j = X X_j^T. We again require that x_j be positive, zero, or negative according as H(A, A_j) is less than, equal to, or greater than n/2.

Bipolar coding is better than binary coding in terms of strength and sign of the correction coefficients. We shall show that on average 1) x_j < a_j when H(A, A_j) > n/2; 2) x_j > a_j when H(A, A_j) < n/2; and 3) x_j = 0 always when H(A, A_j) = n/2. We show this by showing that on average X_i * X_j - A_i * X_j ⋛ 0 if and only if H(A_i, A_j) ⋚ n/2, where the asterisk "*" denotes the dot product of row vectors (X * A = X A^T). We shall let I denote the vector of 1's, I = (1 1 ... 1).

We first observe that X_i * X_j can be written as the number of slots in which the two vectors agree minus the number in which they differ. The latter number is simply the Hamming distance H(A_i, A_j); the former, n - H(A_i, A_j). Hence

$$X_i * X_j = n - 2H(A_i, A_j). \tag{34}$$

From this we obtain the sign relationship

$$X_i * X_j \gtreqless 0 \quad \text{if and only if} \quad H(A_i, A_j) \lesseqgtr n/2. \tag{35}$$

Although we shall not use the fact, it is interesting to note that the Euclidean norm of any bipolar X is √n, while the Euclidean norm of the binary vector A is the square root of |A| = A * I, the cardinality or number of 1's in A. Hence cos(θ) = correlation(X_i, X_j) = X_i * X_j / n, where θ is the angle between X_i and X_j in R^n. The denominator of this last expression can be interpreted as the product of the standard deviations of X_i and X_j. Here X_i is a zero-mean binomial random vector with standard deviation given by the Euclidean norm value √n.

Suppose that X_i and X_j are random vectors. We assume that the expected number of 1's in any random binary/bipolar vector is n/2. The only question is how those 1's are distributed throughout X (A). We use

$$A_i * X_j = A_i * (2A_j - I) \tag{36}$$

to eliminate the term 2A_i * A_j in the expansion

$$X_i * X_j - A_i * X_j = (X_i - A_i) * X_j = ((2A_i - I) - A_i) * (2A_j - I) = 2A_i * A_j + n - |A_i| - 2|A_j| = A_i * X_j + n - 2|A_j| = A_i * X_j + n - 2(n/2) = A_i * X_j. \tag{37}$$

The sign of A_i * X_j depends on the distribution of 1's in A_i and A_j. This information is summarized by the Hamming distance H(A_i, A_j). Consider the kth slots of A_i and X_j. The farther apart A_i and A_j, the greater the probability P{A_i^k = 1 and X_j^k = -1}, since this probability is equivalent to P{A_i^k = 1, A_j^k = 0}. We can model A_i^k and X_j^k as independent random variables with success/failure probabilities P{A_i^k = 1} = P{A_i^k = 0} = P{X_j^k = 1} = P{X_j^k = -1} = 1/2, but this is valid for all k only if H(A_i, A_j) = n/2. In general, we only impose conditions on the joint distribution P{A_i, X_j}. We simply require that the joint distribution obey P{A_i^k = 1, X_j^k = 1} = P{A_i^k = 0, X_j^k = -1} and P{A_i^k = 1, X_j^k = -1} = P{A_i^k = 0, X_j^k = 1}, and that it be driven by the Hamming distance H(A_i, A_j) in a reasonable way. The latter condition can be interpreted as P{A_i^k = 1, X_j^k = 1} ⋛ 1/4 if and only if H(A_i, A_j) ⋚ n/2, and P{A_i^k = 1, X_j^k = -1} ⋛ 1/4 if and only if H(A_i, A_j) ⋛ n/2. Then on average A_i * X_j ⋛ 0 if and only if H(A_i, A_j) ⋚ n/2. Hence by (37), on average X_i * X_j ⋛ A_i * X_j if and only if H(A_i, A_j) ⋚ n/2, as claimed.
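The Appendix's on-average claim can be probed with a small Monte Carlo sketch. The pattern length, trial count, and seed below are arbitrary assumptions, and the printed fraction merely illustrates the tendency; it is not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 32, 5000
consistent, counted = 0, 0
for _ in range(trials):
    A_i, A_j = rng.integers(0, 2, n), rng.integers(0, 2, n)
    X_i, X_j = 2 * A_i - 1, 2 * A_j - 1
    H = int(np.sum(A_i != A_j))
    if H == n // 2:
        continue                      # here x_ij = 0; skip the boundary case
    x_ij = int(X_i @ X_j)             # bipolar correction coefficient
    a_ij = int(A_i @ X_j)             # binary correction coefficient
    # Claim: on average x_ij exceeds a_ij when H < n/2 and falls below it when H > n/2.
    consistent += int((x_ij > a_ij) == (H < n / 2))
    counted += 1
print(consistent / counted)           # typically well above 1/2 for random patterns
```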

ACKNOWLEDGMENT

The author thanks Clark Guest, Robert Hecht-Nielsen, and Robert Sasseen for their comments on the theory and application of bidirectional associative memories.


REFERENCES
[1] S. Amari, K. Yoshida, and K. Kanatani, "A mathematical foundation for statistical neurodynamics," SIAM J. Appl. Math., vol. 33, no. 1, pp. 95-126, July 1977.
[2] J. A. Anderson, "Cognitive and psychological computation with neural models," IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 5, Sept./Oct. 1983.
[3] J. A. Anderson and M. Mozer, "Categorization and selective neurons," in Parallel Models of Associative Memory, G. Hinton and J. A. Anderson, Eds. Hillsdale, NJ: Erlbaum, 1981.
[4] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive features, categorical perception, and probability learning: Some applications of a neural model," Psych. Rev., vol. 84, pp. 413-451, 1977.
[5] G. A. Carpenter and S. Grossberg, "A massively parallel architecture for a self-organizing neural pattern recognition machine," Comput. Vis., Graphics, Image Processing, vol. 37, pp. 54-116, 1987.
[6] G. A. Carpenter and S. Grossberg, "Associative learning, adaptive pattern recognition, and cooperative-competitive decision making by neural networks," Proc. SPIE: Hybrid and Optical Systems, H. Szu, Ed., vol. 634, pp. 218-247, Mar. 1986.
[7] M. A. Cohen and S. Grossberg, "Absolute stability of global pattern formation and parallel memory storage by competitive neural networks," IEEE Trans. Syst. Man Cybern., vol. SMC-13, pp. 815-826, Sept./Oct. 1983.
[8] S. Grossberg, "Some nonlinear networks capable of learning a spatial pattern of arbitrary complexity," Proc. Nat. Acad. Sci., vol. 60, pp. 368-372, 1968.
[9] S. Grossberg, "On the serial learning of lists," Math. Biosci., vol. 4, pp. 201-253, 1969.
[10] S. Grossberg, "Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I," J. Math. Mechan., vol. 19, pp. 53-91, 1969.
[11] S. Grossberg, "On learning of spatiotemporal patterns by networks with ordered sensory and motor components, I," Stud. Appl. Math., vol. 48, pp. 105-132, 1969.
[12] S. Grossberg, "Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, II," Stud. Appl. Math., vol. 49, pp. 135-166, 1970.
[13] S. Grossberg, "Contour enhancement, short term memory, and constancies in reverberating neural networks," Stud. Appl. Math., vol. 52, pp. 217-257, 1973.
[14] S. Grossberg, "Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors," Biol. Cybern., vol. 23, pp. 121-134, 1976.
[15] S. Grossberg, "A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans," in Progress in Theoretical Biology, vol. 5, R. Rosen and F. Snell, Eds. New York: Academic, 1978.
[16] S. Grossberg, "How does a brain build a cognitive code?" Psych. Rev., vol. 1, pp. 1-51, 1980.
[17] S. Grossberg, "Adaptive resonance in development, perception, and cognition," in Mathematical Psychology and Psychophysiology, S. Grossberg, Ed. Providence, RI: Amer. Math. Soc., 1981.
[18] S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control. Boston, MA: Reidel Press, 1982.
[19] S. Grossberg and M. Kuperstein, Neural Dynamics of Adaptive Sensory-Motor Control: Ballistic Eye Movements. Amsterdam, The Netherlands: North-Holland, 1986.
[20] S. Grossberg and M. A. Cohen, "Masking fields: A massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data," Appl. Opt., to be published.
[21] R. Hecht-Nielsen, "Performance limits of optical, electro-optical, and electronic neurocomputers," Proc. SPIE: Hybrid and Optical Systems, H. Szu, Ed., pp. 277-306, Mar. 1986.
[22] Y. Hirai, "A template matching model for pattern recognition: Self-organization of template and template matching by a disinhibitory neural network," Biol. Cybern., vol. 38, pp. 91-101, 1980.
[23] Y. Hirai, "A model of human associative processor (HASP)," IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 5, pp. 851-857, Sept./Oct. 1983.
[24] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA, vol. 79, pp. 2554-2558, 1982.
[25] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci. USA, vol. 81, pp. 3088-3092, 1984.
[26] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems," Biol. Cybern., vol. 52, p. 141, 1985.
[27] T. Kohonen, "Correlation matrix memories," IEEE Trans. Comput., vol. C-21, pp. 353-359, 1972.
[28] T. Kohonen, Associative Memory: A System-Theoretical Approach. Berlin: Springer-Verlag, 1977.
[29] T. Kohonen, E. Oja, and P. Lehtio, "Storage and processing of information in distributed associative memory systems," in Parallel Models of Associative Memory, G. Hinton and J. A. Anderson, Eds. Hillsdale, NJ: Erlbaum, 1981.
[30] T. Kohonen, Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1984.
[31] B. Kosko, "Adaptive bidirectional associative memories," Appl. Opt., vol. 26, no. 23, pp. 4947-4960, Dec. 1987.
[32] B. Kosko, "Fuzzy associative memories," in Fuzzy Expert Systems, A. Kandel, Ed. Reading, MA: Addison-Wesley, 1987.
[33] B. Kosko, "Fuzzy entropy and conditioning," Inform. Sci., vol. 40, pp. 165-174, 1986.
[34] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophys., vol. 5, pp. 115-133, 1943.
[35] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Trans. Inform. Theory, vol. IT-33, pp. 1-33, July 1987.
[36] K. Nakano, "Associatron: A model of associative memory," IEEE Trans. Syst. Man Cybern., vol. SMC-2, pp. 380-388, 1972.
[37] D. Psaltis and N. Farhat, "Optical information processing based on an associative-memory model of neural nets with thresholding and feedback," Opt. Lett., vol. 10, no. 2, pp. 98-100, Feb. 1985.
[38] L. A. Zadeh, "Fuzzy sets," Inform. Contr., vol. 8, pp. 338-353, 1965.


Bart Kosko (M'85) received the B.A. degrees in philosophy and economics from the University of Southern California, Los Angeles, the M.A. degree in applied mathematics from the University of California, San Diego, and the Ph.D. degree in electrical engineering from the University of California, Irvine. He is currently with the Department of Electrical Engineering, Systems, Signal and Information Processing Institute, at the University of Southern California. Dr. Kosko is the Associate Editor and Technology News Editor of Neural Networks. He was organizing and program chairman of the IEEE First International Conference on Neural Networks (ICNN-87) in June 1987 and is program chairman of ICNN-88.
