
ANALOG IMPLEMENTATION OF THE SOFT-MAX FUNCTION

Rodolfo Zunino and Paolo Gastaldo

DIBE - Genoa University - Italy - Email: {zunino, gastaldo}@dibe.unige.it

ABSTRACT

The paper describes the analog implementation of the SoftMax function by using CMOS circuitry. First, an optimization-based strategy is presented, which fits technology requirements to the desired accuracy in the SoftMax mapping. Then circuit solutions are described for both the approximation of the exp(x) function and the normalizing ratio. These optimized design aspects improve overall effectiveness, reduce VLSI complexity by exploiting inherent parallelism, and ultimately limit overall power consumption.

1. INTRODUCTION

Soft-Max (SM) and Winner-Take-All (WTA) modules [1] provide the natural support for ranking and selection, which are basic elements in modern systems for nonlinear signal processing. SM is a usual operation in the modeling of continuous phenomena [2-3], whereas WTA mainly fits digital, discrete quantities and processes. Theory has recently proved the validity of these structures as universal function approximators [4].

Hardware SM processing seems almost unexplored in the literature. The approach proposed in [5] uses the subthreshold operation of MOS transistors. Approximating exp(x) [6] and computing an accurate normalizing ratio [7] are crucial aspects of SM VLSI design. Simplistic approaches, which plug together exponential and ratio circuits, prove ineffective in terms of mapping accuracy and power consumption.

This paper approaches hardware SM processing as an integrated problem: overall functional requirements drive the design of individual submodules. All transistors in the current-mode CMOS circuit operate in strong inversion. The basic strategy approximates the exp(x) function by a 2nd-degree polynomial to limit circuit complexity. Imposing the SoftMax characteristics on the circuit transfer function poses a quadratic-cost problem, within the constraints of ranking consistency and discrimination accuracy. Thus, the eventual MOS sizing results from an optimization process, leaving the designer with a degree of freedom to comply with technology specifications. The design methodology adopts a modular approach, and VLSI simulations confirm the satisfactory accuracy of the circuit. Architectural features limit power consumption in spite of the square-law MOST operation.

2. CIRCUIT DESIGN

The SoftMax function of a vector, Y, of n input scalar variables, $Y = \{y_1, \ldots, y_n\}$, is a vector, S, of n values computed as

$$ s_i(Y) = \frac{\exp(\gamma y_i)}{\sum_{j=1}^{n} \exp(\gamma y_j)}, \qquad i = 1, \ldots, n \qquad (1) $$

The (gain) parameter $\gamma$ rules the sharpness of the SM mapping. Small gain values lead to a "smooth" SM operation, which asymptotically yields a uniform distribution: $\gamma \to 0 \Rightarrow s_i(Y) = 1/n \;\forall i$. Large values of $\gamma$ make the SM function tend to a truly WTA behaviour: $\gamma \to \infty \Rightarrow s_i(Y) = 1$ if $y_i = \max\{y_j \in Y,\ j = 1, \ldots, n\}$, and $s_k(Y) = 0 \;\forall y_k \neq y_i$.

The ability to tune the transfer function by means of one quantity justifies the interest in SM for signal processing (e.g., interfacing analog and digital representations). For practical purposes, in the following we shall write $\gamma y_i = x_i$, with $0 \le x_i \le x_M \;\forall i$.

SM circuit design is complicated by two issues: first, the dynamics of the exp(x) function shrinks the valid input range; second, the normalizing ratio in (1) may hinder accurate analog solutions. Our approach limits the approximation of the exp(x) function to a 2nd-degree polynomial, thus allowing an acceptable tradeoff between representation accuracy and circuit complexity. Therefore, the exp(x) function is replaced with $f_2^{(P)}(x) = c\,(\alpha + \beta x + x^2)$. As the ratio in (1) leaves c as a degree of freedom, the parameter set to be optimized is $P = \{\alpha, \beta\}$. The SM approximation is given by:

$$ s_i^{(P)}(X) = \frac{f_2^{(P)}(x_i)}{\sum_{j=1}^{n} f_2^{(P)}(x_j)}, \qquad i = 1, \ldots, n \qquad (2) $$
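To make Eqs. (1) and (2) concrete, the following Python sketch evaluates the exact SoftMax and its quadratic surrogate side by side. It is an illustration only: it assumes unit gain ($\gamma = 1$, so $x_i = y_i$), borrows $\alpha = 0.198$, $\beta = 0$ from the n = 5 design reported in Section 4, and the function names are hypothetical. The scale factor c is kept as an optional argument only to show that it cancels in the ratio.

```python
import math

def softmax(y, gamma=1.0):
    """Exact SoftMax, Eq. (1), with gain gamma (gamma >= 0)."""
    m = max(y)  # shifting by max(y) avoids overflow; the ratio is unchanged
    e = [math.exp(gamma * (v - m)) for v in y]
    total = sum(e)
    return [v / total for v in e]

def f2(x, alpha, beta, c=1.0):
    """Quadratic surrogate f_2^(P)(x) = c * (alpha + beta*x + x^2)."""
    return c * (alpha + beta * x + x * x)

def softmax_approx(x, alpha, beta, c=1.0):
    """Approximate SoftMax, Eq. (2); c cancels between numerator and denominator."""
    num = [f2(v, alpha, beta, c) for v in x]
    total = sum(num)
    return [v / total for v in num]

ALPHA, BETA = 0.198, 0.0            # n = 5 design values reported in Section 4
x = [0.40, 4.00, 0.50, 1.01, 1.50]  # first input vector of Table II (gamma = 1)

print([round(s, 2) for s in softmax(x)])                      # [0.02, 0.84, 0.03, 0.04, 0.07]
print([round(s, 2) for s in softmax_approx(x, ALPHA, BETA)])  # [0.02, 0.78, 0.02, 0.06, 0.12]
```

The two printed vectors coincide with the "True SM" and "Approx. SM" columns of Table II for this input. Calling `softmax(x, gamma=1e-9)` returns an almost uniform vector, and `softmax(x, gamma=100.0)` an almost one-hot vector, matching the two limits of $\gamma$ stated above.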

0-7803-7448-7/02/$17.00 ©2002 IEEE    II-117


Fig. 1 - Modular architecture of the SoftMax circuit.
Fig. 2 - The squaring circuit for $f_2^{(P)}(x_i)$.

Consistent design looks for the set P that approximates the SM mapping by minimizing the quadratic cost:

$$ C(P) = E_X\left\{ \sum_{i=1}^{n} \left[ s_i(X) - s_i^{(P)}(X) \right]^2 \right\} \qquad (3) $$

where $s_i(X)$ denotes the exact SoftMax (1) expressed in terms of $x_i = \gamma y_i$. The above expression averages the (Euclidean) distortion between the actual and approximating output vectors over the probability distribution of input vectors X. The crucial advantage of the analytical cost (3) over a direct fitting of $f_2^{(P)}(x)$ to exp(x) is that the approximation problem is set in terms of global SM mapping behaviour.

Two constraints shape the cost function so as to preserve consistency with the SM mapping. First, $f_2^{(P)}(x)$ must be monotonically increasing in the range $[0, x_M]$ (for c > 0):

$$ \frac{d f_2^{(P)}(x)}{dx} = c\,(\beta + 2x) \ge 0 \quad \forall x \in [0, x_M] \quad \Longleftrightarrow \quad \beta \ge 0 \qquad (4) $$

Resolution-accuracy requirements in (2) set the second constraint, which imposes that, for a critical input vector $\hat{X} = (x_M, 0, \ldots, 0)$, the largest output be lower-bounded as follows:

(5) [expression missing from the source]

where q is a design parameter trimming the SM-mapping quality. Such a parameter relates to the smallest gain value for which the eventual circuit will still discriminate a critical input vector consistently. A few simple derivations from (5) yield the second constraint on P:

(6) [expression missing from the source]

Thus, the design process minimizes the quadratic cost C(P) within the linear constraints (4) and (6). An especially profitable result of the optimization process is that the coefficient $\beta$ invariably vanishes, which eliminates amplifiers from the design of $f_2^{(P)}(x)$. Moreover, the bias coefficient $\alpha$ usually lies in very narrow ranges and simplifies the supporting circuitry; in fact, in most cases, it can be dropped altogether.

3. MODULAR CIRCUIT IMPLEMENTATION

The analog VLSI implementation involves current-mode inputs and voltage-mode outputs; this choice is motivated by the aim of simplifying the basic elements that make up the overall schema. A modular approach is adopted for the circuit architecture (Fig. 1). Each i-th element of the SM vector (2) is supported by a specific pair of cells, which operate locally and compute $f_2^{(P)}(x_i)$ and the final ratio, $s_i^{(P)}(X)$. Increasing the dimension of the SM vector (2) simply requires that an additional pair of such cells be included. The presence of a valid output is signaled by the TRIGGER logic level.

The only section propagating global information is the "denom" module, which handles the normalizing denominator, $I_\Sigma$. The fact that it requires minimal wiring is a benefit of the modular architecture, which greatly facilitates the layout design of the VLSI implementation.

3.1 Implementation of the exp(x) approximation

Thanks to the optimization result $\beta = 0$, the function approximating exp(x) becomes $f_2^{(P)}(x) = c\,(\alpha + x^2)$, which can be easily implemented by joining a bias source to a current-squaring circuit. The circuit adopted for the current-squaring task (Fig. 2) exploits the translinear principle in MOS-transistor loops and is described in [8]. The transfer function of this subcircuit is given by:

(7) [expression missing from the source]
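The architectural idea of Section 3.1, a bias branch $c\alpha$ added to a squaring branch $c x^2$ in each cell, with the cell currents summed into the denominator $I_\Sigma$, can be sketched behaviourally as below. This is a hypothetical, mismatch-free model in normalized units: the scale factors c are illustrative values, not the paper's sizing, and the actual translinear squarer characteristic (7) of [8] is not modelled.

```python
# Ideal behavioural model of the exp(x)-approximation cells (Section 3.1).
# HYPOTHETICAL: mismatch-free cells, normalized units, scale factor c chosen
# freely; the real translinear squarer transfer function (7) is not modelled.

ALPHA = 0.198  # bias coefficient reported for the n = 5 design

def cell_current(x, c):
    """Output of one cell: bias branch c*alpha plus squarer branch c*x^2."""
    bias = c * ALPHA
    squared = c * x * x
    return bias + squared

def sm_from_currents(x, c):
    """Sum the cell currents (denominator I_Sigma) and form the ratios of Eq. (2)."""
    currents = [cell_current(v, c) for v in x]
    i_sigma = sum(currents)
    return i_sigma, [i / i_sigma for i in currents]

x = [2.0, 4.0, 0.1, 0.0, 0.0]  # example input used for Fig. 4 (uA)
for c in (0.5, 1.0, 2.0):      # hypothetical scale factors
    i_sigma, s = sm_from_currents(x, c)
    print(f"c = {c}: I_sigma = {i_sigma:.3f}, s = {[round(v, 3) for v in s]}")
```

In this ideal model, the printed $I_\Sigma$ doubles whenever c doubles, while the ratios s are unchanged. This is the sense in which c can be traded for current, and hence power, without altering the SM mapping.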
The squaring schema of Fig. 2 was selected for its accurate performance. The normalizing current, $I_\Sigma$, is directly available by summing, in a common node, a copy of the output current from each stage.

Fig. 3 - The denominator cell and two (numerator) output cells.

3.2 Implementation of the normalizing ratio

The novel circuit that supports the normalizing ratios in (2) exploits two features to sharply reduce the schema complexity: 1) all quantities are positive; 2) all outputs share a common denominator value. The modular multiple-divider circuit treats the numerator(s) and the denominator separately in specific subcircuits. Each numerator module simply includes a switched capacitor, whereas the single denominator module combines a capacitor with an inverter stage.

Figure 3 displays the schemata of two output cells and of the denominator cell; the voltages across the capacitors $C_i$ and $C_d$ are initially null. The denominator current, $I_\Sigma$, feeds $C_d$, whose voltage increases approximately linearly (over a limited time interval). The switches $M_{sw}$ remain "ON" as long as $v_\Sigma$ stays below the inverter's threshold voltage, $V_{LH}$. This enables $f_2^{(P)}(x_i)$ to charge $C_i$ linearly as well. The circuit status changes as soon as $v_\Sigma$ exceeds $V_{LH}$; then each $M_{sw}$ cuts off the current flow that was charging $C_i$. The switching time, T, is given by

$$ T = \frac{C_d\, V_{LH}}{K_d\, I_\Sigma} \qquad (8) $$

where $K_d$ is the static gain of the denominator current-driving stage. Disregarding the switching transient at the inverter stage, the charge stored in $C_i$ is given by $C_i\, v_i(T) = K_i\, T\, f_2^{(P)}(x_i)$. By using (8), one obtains:

$$ v_i(T) = \frac{K_i\, C_d}{K_d\, C_i}\, V_{LH}\, \frac{f_2^{(P)}(x_i)}{I_\Sigma} \qquad (9) $$

The voltage on each capacitor, $C_i$, is proportional to the ratio of the associated current, $f_2^{(P)}(x_i)$, to the denominator; since $I_\Sigma = \sum_i f_2^{(P)}(x_i)$, expression (9) implements (2). The availability of a valid result in $v_i(T)$ is flagged by the switching of the TRIGGER logic level, which is represented by the inverter status. The overall circuit is reset by activating the switches that ground all capacitors. As the output voltages $v(s_i)$ are read out on each "numerator" side, the ratios sharing the common denominator are worked out in parallel. Likewise, the largest current flows in the unique "denominator" module. Cascode mirror stages provide high impedance levels toward the storage capacitors. The designer may take advantage of the SM property $f_2^{(P)}(x_i) \le I_\Sigma$ and rescale the current amplifications in (9) by sizing the MOS aspect ratios to impose dynamic-range constraints on output-voltage levels. The overall ratio performance is sensitive to mismatches rather than to the actual parameters of the individual components.

4. EXPERIMENTAL RESULTS

The optimization-based design methodology was tested on different SM configurations, that is, for a range of SM dimensions: $n \in [3, 15]$. In all cases, the resulting circuit performance turned out to be satisfactory in terms of mapping accuracy. This section presents a sample of the obtained results by illustrating the complete design process for the case n = 5. In our experiments, the input current bound was always set to $x_M$ = 4 μA. The final VLSI implementation was realized in 0.8 μm AMS technology with $V_{cc}$ = 5 V.

A random-search algorithm [9] optimized the cost (3), yielding the parameter set $P = \{\alpha = 0.198, \beta = 0\}$. One can verify the consistency of this result: the $\alpha$ value makes the supported function return a uniform distribution, $s_i^{(P)}(X) = 1/n$, when $X \equiv 0$. The result $\beta = 0$ removes from the final schema any term proportional to the input current. The remaining degree of freedom, c, allows one to size the current-squaring stage so as to control power consumption. For n = 5, this process leads to the settings given in Table I. The circuit was simulated using HSPICE at level 13; the maximum power consumption amounted to 690 μW.
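The constrained design step can be sketched as follows. This is a generic pure random search (uniform sampling of $(\alpha, \beta)$ inside a box, keeping the best feasible point), not the specific algorithm of [9]. Several ingredients are assumptions, because the paper's expressions (5) and (6) are not available: inputs uniformly distributed on $[0, x_M]^n$, $\alpha > 0$, the sampling box, and a reading of the critical-vector bound (5) as $s_1^{(P)}(\hat{X}) \ge q$ with a hypothetical q = 0.9. Whether the sketch lands near $\alpha = 0.198$, $\beta = 0$ depends on these choices.

```python
import math
import random

def softmax(x):
    """Exact SoftMax of Eq. (1) with x_i = gamma * y_i."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    t = sum(e)
    return [v / t for v in e]

def softmax_approx(x, alpha, beta):
    """Eq. (2) with f = alpha + beta*x + x^2 (the common factor c cancels)."""
    f = [alpha + beta * v + v * v for v in x]
    t = sum(f)
    return [v / t for v in f]

def empirical_cost(p, samples):
    """Monte Carlo estimate of cost (3): mean squared Euclidean distortion."""
    alpha, beta = p
    total = 0.0
    for x in samples:
        s, sp = softmax(x), softmax_approx(x, alpha, beta)
        total += sum((a - b) ** 2 for a, b in zip(s, sp))
    return total / len(samples)

def feasible(p, n, x_max, q):
    """Constraint (4) as beta >= 0, plus an ASSUMED reading of bound (5)."""
    alpha, beta = p
    if alpha <= 0.0 or beta < 0.0:
        return False
    critical = [x_max] + [0.0] * (n - 1)  # critical vector (x_M, 0, ..., 0)
    return softmax_approx(critical, alpha, beta)[0] >= q

def random_search(n=5, x_max=4.0, q=0.9, iters=1000, n_samples=200, seed=0):
    """Pure random search over a hypothetical box for (alpha, beta)."""
    rng = random.Random(seed)
    # ASSUMPTION: p(X) uniform on [0, x_max]^n
    samples = [[rng.uniform(0.0, x_max) for _ in range(n)]
               for _ in range(n_samples)]
    best_p, best_cost = None, math.inf
    for _ in range(iters):
        p = (rng.uniform(1e-3, 2.0), rng.uniform(0.0, 2.0))
        if not feasible(p, n, x_max, q):
            continue
        cost = empirical_cost(p, samples)
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost

best_p, best_cost = random_search()
print("alpha = %.3f, beta = %.3f, cost = %.5f" % (best_p + (best_cost,)))
```

Under the assumed q = 0.9, the reported pair $(\alpha, \beta) = (0.198, 0)$ is feasible: its critical-vector output is about 0.953. Replacing `feasible` with the true constraint (6) and `samples` with the true input statistics would be the natural next step.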

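The charge-based division of Section 3.2 can be checked numerically. The following sketch compares the closed-form switching time (8) and capacitor voltages (9) with a brute-force forward-Euler integration of the charging phase. Everything here is idealized and hypothetical: perfectly linear charging, instantaneous inverter switching at $V_{LH}$, unit mirror gains $K_i = K_d = 1$, $C_i = C_d$ = 10 pF and $V_{LH}$ = 2.5 V (not the Table I values). The cell currents correspond to $\alpha = 0.198$, $\beta = 0$, c = 1 for the Fig. 4 input.

```python
# Idealized model of the charge-based multiple divider (Section 3.2).
# HYPOTHETICAL values and simplifications: linear charging, instantaneous
# inverter switching at V_LH, mirror gains K_i = K_d = 1, C_i = C_d.
# Units: amperes, farads, volts, seconds.

def closed_form(currents, c_d, c_i, v_lh, k_d=1.0, k_i=1.0):
    """Switching time from Eq. (8) and capacitor voltages from Eq. (9)."""
    i_sigma = sum(currents)
    t_switch = c_d * v_lh / (k_d * i_sigma)               # Eq. (8)
    volts = [k_i * t_switch * f / c_i for f in currents]  # C_i v_i = K_i T f
    return t_switch, volts

def time_stepped(currents, c_d, c_i, v_lh, k_d=1.0, k_i=1.0, dt=1e-10):
    """Forward-Euler charging until v_Sigma reaches V_LH (switches then open)."""
    i_sigma = sum(currents)
    t, v_sigma = 0.0, 0.0
    volts = [0.0] * len(currents)
    while v_sigma < v_lh:  # switches M_sw are ON
        v_sigma += k_d * i_sigma * dt / c_d
        volts = [v + k_i * f * dt / c_i for v, f in zip(volts, currents)]
        t += dt
    return t, volts

# Hypothetical cell currents c*(alpha + x^2) for x = (2.0, 4.0, 0.1, 0.0, 0.0)
cell_currents = [4.198e-6, 16.198e-6, 0.208e-6, 0.198e-6, 0.198e-6]
C_D = C_I = 10e-12  # hypothetical 10 pF
V_LH = 2.5          # hypothetical inverter threshold

t_cf, v_cf = closed_form(cell_currents, C_D, C_I, V_LH)
t_ts, v_ts = time_stepped(cell_currents, C_D, C_I, V_LH)
print(f"T (Eq. 8) = {t_cf * 1e6:.3f} us, time-stepped = {t_ts * 1e6:.3f} us")
print("v_i / V_LH:", [round(v / V_LH, 3) for v in v_cf])
```

With $C_i = C_d$ and unit gains, each $v_i / V_{LH}$ equals the approximate SoftMax output $s_i^{(P)}(X)$, and the voltages sum to $V_{LH}$. Changing $C_i$ or the mirror gains only rescales all outputs by $K_i C_d / (K_d C_i)$, as (9) predicts.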
Table I. Circuit parameters (n = 5)

| Subcircuit | Device | Setting |
|---|---|---|
| Squaring cell | M1 | W/L = 16/4 |
| Squaring cell | M2 | W/L = 10/10 |
| Squaring cell | I_b | 3 μA |
| Numerator cell | M_sw | W/L = 2/2 |
| Numerator cell | C_i | 5 nF |
| Denominator cell | (illegible) | W/L = 6/12 |
| Denominator cell | Inverter nMOS | W/L = 20/2 |
| Denominator cell | Inverter pMOS | W/L = 10/30 |
| Denominator cell | (illegible) | 0.01 (unit illegible) |

Fig. 4 - Sample plot of SM output voltages.

Table II. Sample experimental results

| Vector | Input X (μA) | True SM $s_i(X)$ | Approx. SM $s_i^{(P)}(X)$ | HW SM $V(s_i)/V_o$ |
|---|---|---|---|---|
| 1 | 0.40 | 0.02 | 0.02 | 0.040 |
| 1 | 4.00 | 0.84 | 0.78 | 0.775 |
| 1 | 0.50 | 0.03 | 0.02 | 0.045 |
| 1 | 1.01 | 0.04 | 0.06 | 0.080 |
| 1 | 1.50 | 0.07 | 0.12 | 0.139 |
| 2 | 0.10 | 0.02 | 0.01 | 0.040 |
| 2 | 4.00 | 0.93 | 0.95 | 0.919 |
| 2 | 0.10 | 0.02 | 0.01 | 0.040 |
| 2 | 0.10 | 0.02 | 0.01 | 0.040 |
| 2 | 0.10 | 0.02 | 0.01 | 0.040 |
| 3 | 4.00 | 0.20 | 0.20 | 0.230 |
| 3 | 4.00 | 0.20 | 0.20 | 0.230 |
| 3 | 4.00 | 0.20 | 0.20 | 0.230 |
| 3 | 4.00 | 0.20 | 0.20 | 0.230 |
| 3 | 4.00 | 0.20 | 0.20 | 0.230 |

Figure 4 exemplifies a plot of output voltages resulting from post-layout simulations of an example input X = (2.0, 4.0, 0.1, 0.0, 0.0) (currents in μA). However, the SoftMax operation requires one to test the circuit performance over a wide range of input vector configurations. To this end, a large set of experimental runs with different settings was carried out, confirming the effectiveness of the approach. Table II presents some sample results and compares, for each input component, the true SM values with the theoretical approximations predicted by (2) and with hardware results (scaled by an invariant factor $V_o$ = 1.6 V). Empirical evidence confirms the satisfactory and stable performance of the circuit over a wide range of operation.

5. CONCLUSIONS

The optimization-based strategy yields profitable results concerning both SM-representation accuracy and VLSI-realization efficiency. These advantages are achieved at the cost of an event-driven normalization schema, which involves asynchronous operation. The circuit schema also imposes the classical tradeoff between speed and power. Therefore, promising lines of research associated with the design methodology include testing different current-squaring circuits to reduce power consumption.

6. REFERENCES

[1] Liu S-C, "A Winner-Take-All circuit with controllable Soft Max property," Adv. in Neural Inf. Proc. Systems (NIPS-13), 2000, pp. 717-723.
[2] Gold S, Rangarajan A, "Softmax to softassign: neural network algorithms for combinatorial optimization," J. Artif. Neural Networks, vol. 2, no. 4, pp. 381-399, 1995.
[3] Cox DR, Miller HD, The theory of stochastic processes, Methuen, 1965.
[4] Maass W, "Neural computation with Winner-Take-All as the only nonlinear operation," Adv. in Neural Inf. Proc. Systems (NIPS-13), 2000, pp. 293-299.
[5] Baysoy C, Kinney LL, "Subthreshold MOS fuzzy max/min neuron circuits," Proc. World Congress on Neural Networks, vol. 2, pp. 500-505, 1994.
[6] Chang C-C, Liu S-I, "Pseudo-exponential function for MOSFETs in saturation," IEEE Trans. Circuits and Systems II, vol. 47, no. 11, pp. 1318-1321, Nov. 2000.
[7] Laopoulos TL, Karybakas CA, "A simple analog division scheme," IEEE Trans. Instrum. and Meas., vol. 40, no. 4, pp. 779-782, Aug. 1991.
[8] Wiegerink RJ, Analysis and synthesis of MOS translinear circuits, Kluwer Academic Publishers, 1993.
[9] Baba N, "A new approach for finding the global minimum of error function of neural networks," Neural Networks, vol. 2, pp. 367-373, 1989.

