
ASYNCHRONOUS EVENT REDIRECTING IN BIO-INSPIRED COMMUNICATION

Ph. Häfliger
Institute of Informatics, University of Oslo, Norway
e-mail: hafliger@ifi.uio.no
ABSTRACT

The paper presents the FPGA implementation of a programmable asynchronous digital circuit (henceforth called AE-map) that remaps address events. Address event representation (AER) is an event-driven communication protocol originally used in VLSI implementations of neural networks to transfer action potentials (neural voltage pulses) between neurons. More generally, it is suited to transmitting a number of analog values, coded as event frequencies, over an asynchronous digital bus. The AE-map redirects such events between an AE sender and an AE receiver, thereby, for instance, programming the connection scheme of a neural network. Earlier approaches to redirecting AEs have used synchronous digital devices such as DSPs or microcontrollers. The simpler and more dedicated asynchronous solution presented here is more energy efficient, does not impose a discretization on the time axis, and achieves a much higher throughput. In the present implementation, AEs (9-bit input, 7-bit output) can be processed at intervals of less than 84 ns per output AE.

1. INTRODUCTION

1.1. Address Event Representation

Address event representation (AER) is an event-driven communication protocol that was originally put forward within the field of neuromorphic aVLSI [6, 5, 9]. Neuromorphic engineering tries to incorporate operating principles of the nervous system into technical devices [8]. AER was first used to approach the massive connectivity of biological neural networks, but in general it is suited to conveying a large number of analog values (e.g. sensory data) through a low-capacity channel (an asynchronous digital bus).

It works as follows: AER is used to transmit events. Events are characterized by a location (address) and a time. For example, in a network of neurons the address identifies one particular neuron and the time would be the time at which that neuron fires an action potential (AP = nerve pulse). For the transmission of a number of analog values (e.g. pixels in a camera) one would code the intensity in the frequency of such events (rate coding). (This transformation of an intensity, e.g. a photodiode current, into an event rate can be achieved quite easily by placing a simple integrate-and-fire neuron circuit (6 transistors, 2 small capacitors) into the pixel.) An asynchronous digital bus is used for the actual transmission. The event's location is encoded digitally as an address, which is placed on the bus at the time of the event. On the receiver end of the bus this address is decoded into a receiving location. For neural networks that location would be a particular synapse (input site) of a particular neuron on the receiver chip, and for rate-coded analog values it could be some integrator that reconstructs the analog value (e.g. a pixel on a screen). Alternatively, the addresses can be used directly by a digital device without the effort of an AD conversion.

This event-driven strategy is more energy efficient than scanning (as for example in video connections) if the data is sparse, i.e. if only a few sender locations tend to be very active at a time.
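To make the rate-coding scheme above concrete, here is a minimal software sketch (not the 6-transistor pixel circuit itself) of how an array of analog intensities could be turned into a stream of timestamped address events with an integrate-and-fire model; all names, the threshold and the time step are illustrative assumptions.

```python
# Minimal sketch (illustrative only): rate coding of analog intensities as
# address events using an integrate-and-fire model per sender location.
# The fixed time step dt is an artefact of this simulation; the circuits
# discussed in the paper operate in continuous time.

def rate_code(intensities, duration, threshold=1.0, dt=1e-6):
    """Return a list of (time, address) events; higher intensity -> higher event rate."""
    membrane = [0.0] * len(intensities)   # one integrator per sender location
    events = []
    t = 0.0
    while t < duration:
        for addr, value in enumerate(intensities):
            membrane[addr] += value * dt  # integrate the analog input
            if membrane[addr] >= threshold:
                events.append((t, addr))  # fire: the address is placed on the bus
                membrane[addr] = 0.0      # reset the integrator
        t += dt
    return events

# Location 2 is the most intense and therefore produces the most events.
events = rate_code([100.0, 1000.0, 5000.0], duration=0.01)
```

A receiver can recover the analog values by counting events per address and unit time, while the individual event times remain available for temporal codes.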
An example of such sparse data is the output of a silicon retina [6]. This is an intelligent camera inspired by the biological retina (the photosensitive tissue in the back of the eye); it performs some processing on the image already in the recording pixel. One variant of a silicon retina, for example, is only sensitive to changes. Since natural scenes tend to be rather static, with fast changes happening only around the edges of moving objects, a scanning strategy wastes a lot of energy on reading pixels where nothing is happening. In the worst case, the detection of a change might be delayed for the time it takes to scan through the whole image, or even be missed if the changes are synchronous with the frame rate. An AER strategy, in contrast, reports a change in a pixel immediately. The drawback is a risk of over-running the bus and the need for collision handling. Other publications deal with these issues [5, 9, 1, 2, 4] (the AE-map implementation presented in this paper assumes that collisions are resolved before AEs are placed on the bus). In general, for transmitting analog data there is a trade-off between temporal resolution, intensity resolution, size of the address space, and the expected occupation of that address space. To come back to the example of the change-sensitive retina: given the timing of the AEs, not only can the rate of change (rate coding) be reconstructed, but the onset of the change is also quite evident and undisturbed by a frame rate (the first event in a burst of activity). The order of those onsets in neighbouring pixels indicates a direction of motion, and by measuring the intervals between the onsets (temporal coding) the speed of that motion becomes evident. The asynchronous, unclocked implementation of the AE bus avoids introducing a discretization error on this temporal code.

1.2. Asynchronous Devices

In asynchronous designs, as opposed to synchronous ones, each component works at its own pace. In sequential processes each component has to know when the data it is supposed to process is ready, and it has to obtain that information from the component that provides the data. Pipelining, which considerably sped up synchronous processors two decades or so ago, is a natural result of this approach. In a pipelined operation, however, the slowest component limits the overall speed of a sequence of operations. Still, an asynchronous design can get an edge on even optimally pipelined synchronous solutions for two reasons. Firstly, the slowest component (which dictates the clock rate in the synchronous approach) might not always be part of every operation; secondly, in synchronous designs the next clock cycle starts only after all local operations are completed, whereas in asynchronous designs the next step in a sequential operation ideally starts immediately when the previous operation is completed. "Ideally" refers to the fact that it is not always easy to compute locally whether a component has finished its operation; as a workaround the unit can simply indicate that it is finished after a fixed delay, in which case no real advantage is gained by the second argument. Concerning energy efficiency, asynchronous circuits have the advantage that they do not actively consume current when they are idle and that they do not need a clock, which consumes a considerable percentage of the total power in fast, highly integrated circuits. These arguments have convinced researchers to start developing even asynchronous microprocessors [7], and an increasing number of commercial asynchronous devices are nowadays available. And, as previously mentioned, if the AE-map is to be used in a system that relies on temporal codes, implementing it asynchronously avoids introducing a discretization error in the time domain.

2. ARCHITECTURE OF AN ASYNCHRONOUS ADDRESS EVENT MAP

In a neural network structure one neuron is normally connected to many other neurons. In AER this means that the sender address has to be mapped to several receiver addresses.
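Functionally, such an AE-map is a programmable one-to-many lookup from input addresses to lists of output addresses. The sketch below is a software analogue only (the class and method names are hypothetical); the example mapping is the one used later in figure 3.

```python
# Software analogue of a programmable AE-map: each incoming address event is
# redirected to a programmable list of outgoing address events.

class AEMap:
    def __init__(self):
        self.table = {}                       # input address -> list of output addresses

    def program(self, in_addr, out_addrs):
        self.table[in_addr] = list(out_addrs)

    def remap(self, in_addr):
        return self.table.get(in_addr, [])    # unprogrammed addresses produce no output

ae_map = AEMap()
ae_map.program(0x4, [0x7F, 0x7E, 0x7D, 0x7C, 0x7B, 0x7A])
ae_map.program(0x5, [0x5, 0x4, 0x3, 0x2, 0x1, 0x0])
for in_ae in (0x5, 0x4):                      # incoming AE sequence (5, 4)
    for out_ae in ae_map.remap(in_ae):
        pass                                  # here the outgoing AE would be handshaken out
```

The FPGA circuit described in this section realizes this lookup with pointer and block-size RAMs; a corresponding sketch follows the description of figure 1 below.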

Figure 2: The HS propagate circuit (used in figure 1) synchronizes a pipelining stage with its neighbours. (Schematic not reproduced: it is built from NOR2 and AND2 gates and a few inverters, with handshake signals reqin/ackout towards the previous stage, reqout/ackin towards the next stage, and internal consume/process/processed signals.)
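As a rough behavioral illustration of what one HS propagate stage does (assuming the standard 4-phase handshake described later in section 2; this is software only, not the gate-level circuit of figure 2, and the class below is hypothetical):

```python
# Behavioral sketch of a 4-phase handshake pipelining stage (software
# illustration only, not the gate-level HS_propagate circuit of figure 2).

class HandshakeStage:
    def __init__(self, do_work, downstream=None):
        self.do_work = do_work          # this stage's processing step
        self.downstream = downstream    # next stage, or None for the final receiver
        self.process = False            # 'process' signal: stage is busy

    def request(self, data):
        """Incoming request with data; returns once the stage has acknowledged,
        processed, and completed the outgoing handshake (one 4-phase cycle)."""
        assert not self.process         # a real circuit would stall the sender while busy
        latched = data                  # 'consume': latch the incoming data
        self.process = True             # acknowledge the incoming request
        result = self.do_work(latched)  # completion corresponds to 'processed'
        if self.downstream is not None:
            self.downstream.request(result)  # outgoing request/acknowledge with next stage
        self.process = False            # outgoing handshake done; accept new requests again
        return result

# Three chained stages, loosely mirroring the three stages of the AE-map.
last = HandshakeStage(lambda x: x)
mid = HandshakeStage(lambda x: x, downstream=last)
first = HandshakeStage(lambda x: x, downstream=mid)
first.request("some address event")
```

In the real circuit the stages operate concurrently on different events; the sequential calls above only illustrate the order of the request/acknowledge phases for a single event.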
Figure 3: A recording from the FPGA by a logic analyzer that illustrates the minimal output interval and the latency of processed AEs (measured values: 88 ns latency, 72 ns and 52 ns output intervals; scale: 50 ns/div).

This mapping could be hardwired on the sending and receiving chips, such that the address on the AE bus would correspond to one sending site and several receiving sites (or vice versa), or it can be handled by a separate component that maps addresses on a sender bus to addresses on a receiver bus. Such an AE-map can be made programmable, so that arbitrary network structures (mappings) can be investigated. A synchronous programmable AE map based on a DSP has been presented in [3], and others have used microcontrollers (unpublished). In the following, a much simpler asynchronous device is presented that is more dedicated to this particular task.

Figure 1 shows the block diagram of the asynchronous implementation. The sizes of the input and output address spaces were chosen to connect a particular retina chip to an array of artificial neurons. The whole design is implemented on an ALTERA Flex FPGA (EPF10K20RC2083). Communication with a Sun Ultra 5 workstation is achieved by a PCI 16D card from EDT, which provides fast 16-bit parallel handshake-controlled communication.

Figure 4: A recording from the AER map depicting the minimal output interval in the case of a one-to-one mapping (both measured intervals: 84 ns; scale: 50 ns/div).

Figure 1: The schematics of the AER map. (Block diagram not reproduced: a 9-bit input D[8..0] with decoder, RAM1 and RAM2 addressed by the input, a latch, a down counter and an adder, and RAM3 producing the 7-bit output AE_out[6..0]; three HS_propagate handshake blocks with req/ack signals, delay elements of 4 to 20 ns, and control signals such as we[3..0], we_glob, eq0, cnt_en, cnt_clk, req_in, ack_out, consume, process/processed and access_mode[1..0].)
The FPGA is placed on a simple PCB that contains additional bus drivers to connect the FPGA to the 16D bus. Some additional blocks on the FPGA (not shown) make it possible to configure the AE map via the 16D bus. AEs can be sent to the map over this 16D bus or through two other connectors on the PCB. Additional circuitry on the FPGA (not shown) performs an asynchronous arbitration between these three sources of input. AEs from the map are put out on a fourth connector.

The circuit of the AE map (figure 1) is subdivided into three pipelining stages (separated by dashed lines). The HS propagate circuits (described in figure 2) and the surrounding logic at the bottom of the figure control the timing and the sequence of events in the asynchronous computation. Each HS propagate circuit is in control of one pipelining stage. The delay elements contain an appropriate number of RS flip-flops in series to achieve the indicated delays. The first stage reads in the incoming AE, which addresses the contents of the two left-hand RAMs (RAM1 and RAM2). RAM1 contains pointers to memory blocks in RAM3, to the right of the figure. These blocks in RAM3 contain all the outgoing AEs to which the incoming AEs are to be mapped. RAM2 contains the sizes of these blocks. (Note that the blocks for different incoming AEs can overlap. This can be exploited to save memory: for instance, two incoming AEs that are supposed to produce the same outputs can simply point to the same block.) When the pointer and the block size are stable, the first HS propagate circuit issues a request to the next pipelining stage, which latches the pointer and loads the block size into a counter. The logic to the right of the second HS propagate block then generates the number of handshakes determined by the block size. The pointer to the block and the counter value are added, and the resulting pointer into the memory block is handed to the last pipelining stage, which merely controls the memory access to RAM3 and handles the handshake with the external receiver of the outgoing AEs. The signal access_mode[1..0] distinguishes between an AE input (access_mode=0) and write accesses to the three RAMs (access_mode=1, 2 or 3).

The HS propagate circuit depicted in figure 2 synchronizes a pipelining stage with the previous and the next stage using a 4-phase handshake. If the circuit is not already busy (i.e. the process signal is not active), an incoming request is acknowledged as soon as the incoming data is consumed and the process signal is set. When the processing is completed (processed), an outgoing handshake is initiated, at the completion of which process is reset. Thereafter new incoming requests are accepted again. This circuit has the important advantage that it does not hang even if the causality rules of a handshake are not followed by the providing or receiving partner, i.e. when an acknowledge or a request is withdrawn too soon.
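The data path through the three RAMs can be summarized by the following software sketch (illustrative only; the read-out order within a block and all names are assumptions, and the actual design is an asynchronous circuit rather than sequential code):

```python
# Software sketch of the AE-map data path. ram1: block pointers, ram2: block
# sizes, ram3: flat memory holding blocks of outgoing AEs. Illustrative only.

def remap_event(in_ae, ram1, ram2, ram3):
    """Yield the outgoing AEs for one incoming AE."""
    pointer = ram1[in_ae]                 # stage 1: read block pointer from RAM1
    block_size = ram2[in_ae]              # stage 1: read block size from RAM2
    for offset in range(block_size):      # stage 2: counter triggers one handshake per output
        yield ram3[pointer + offset]      # stage 3: pointer + counter value addresses RAM3

# Contents corresponding to the mapping of figure 3 (4 -> 7F..7A, 5 -> 5..0).
ram1 = {0x4: 0, 0x5: 6}                   # pointers into ram3
ram2 = {0x4: 6, 0x5: 6}                   # block sizes
ram3 = [0x7F, 0x7E, 0x7D, 0x7C, 0x7B, 0x7A, 0x5, 0x4, 0x3, 0x2, 0x1, 0x0]

print([hex(ae) for ae in remap_event(0x4, ram1, ram2, ram3)])
```

In this picture, the memory-saving trick mentioned above simply corresponds to two input addresses holding the same pointer in RAM1.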

3. PERFORMANCE

If the request and acknowledge signals of the output port are short-circuited on the PCB, the AE map circuit puts out AEs at intervals between 52 ns and 84 ns (dependent on the nature of the mapping, see figures 3 and 4) if the AER map is overrun. In order to overrun the map with varying input from the 16D bus (figure 3), it had to be programmed to map every incoming AE to at least 6 outgoing AEs, since in our setup we could only provide changing input AEs with a minimal interval of 300 ns. This minimal interval for transmission from the 16D bus to the AER map was given by the delays on the bus, in the drivers that connect the bus to the PCB, and in the arbitration circuits on the FPGA (not shown) that allow three different sources of input.

Figure 3 shows a logic analyzer recording of such a scenario. It also illustrates the latency of processed AEs. The circuit is programmed to map an incoming 4 (on bus P16OUT) to (7F, 7E, 7D, 7C, 7B, 7A) (on bus AEOUT) and a 5 to (5, 4, 3, 2, 1, 0). An incoming AE sequence of (5, 4) is processed. The latency of 88 ns of the AER map is measured between the onset of the incoming request signal (MAPREQ) and the onset of the first outgoing request (REQOUT). The signals MAPREQ and MAPACK are measured directly at the input to the AER map (nodes reqin and ackout in figure 1). The latency from off-board (not shown) was 156 ns. The interval between subsequent outputs caused by two different inputs is 72 ns, and the interval between two subsequent outputs caused by the same input is 52 ns.

In order to overrun the map when it was programmed to map one incoming AE to only one outgoing AE (figure 4), a faster sender was simulated by inverting the outgoing request and feeding it back as acknowledge. The input address was held constant, and one of the input connectors that did not go through bus drivers was used. For the recording in figure 4 the map was programmed to put out an F for an incoming 4. The delays caused by the signals going to and coming from off-chip, plus the additional circuitry that arbitrates between the three possible sources of input, use up slightly more time than the map needs to process two subsequent inputs. Therefore the output interval increases to 84 ns in this scenario.

84 ns is about two orders of magnitude faster than the published 10 µs of a DSP based solution [3], although the DSP solution offers a bigger address space, and its authors hope to optimize their software further to achieve a transmission interval on the order of 1 µs (private communication). Unfortunately the energy consumption of the FPGA could not be measured directly on the PCB, since there was no separate power line going to it and most of the power of the board goes into the bus drivers. In any case we maintain the claim that the asynchronous solution fares better than a comparable synchronous implementation on the same FPGA. A synchronous solution with the same output rate would need to be clocked at a minimum of 12 MHz (84 ns cycles) and would always consume the current necessary to drive that clock line. Especially while there are no AEs to process, the asynchronous implementation fares better.
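For orientation, the throughput comparison made above can be reproduced with a few lines of arithmetic (the intervals are the figures quoted in the text; the script itself is merely illustrative):

```python
# Throughput comparison based on the intervals quoted in the text.
ae_map_interval = 84e-9                 # s per output AE, asynchronous AE-map (worst case)
dsp_interval = 10e-6                    # s per output AE, published DSP-based solution [3]

print(1 / ae_map_interval)              # ~1.2e7 AEs/s, i.e. equivalent to a ~12 MHz clock
print(dsp_interval / ae_map_interval)   # ~119, i.e. roughly two orders of magnitude
```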

4. CONCLUSION

A simple, dedicated architecture and its implementation on an FPGA are presented that perform address event mapping. It is an asynchronous design that is simpler, faster and cheaper than systems based on DSPs or microcontrollers. The asynchronous implementation saves the power that would go into driving the clock in a synchronous design, and its current consumption is minimal when no events are processed. The implementation presented here on an FPGA can process address events in less than 84 ns per output event. Since the architecture is asynchronous, no discretization is imposed on time, and discretization errors in continuous-time computations on address events are therefore avoided.

5. REFERENCES

[1] A. Abusland, T. S. Lande, and M. Høvin. A VLSI communication architecture for stochastically pulse-encoded analog signals. ISCAS, III:401-404, 1996.
[2] K. Boahen. A throughput-on-demand address-event transmitter for neuromorphic chips. In Adv. Res. in VLSI, pages 72-86. IEEE Comp. Soc. Press, 1999.
[3] S. R. Deiss, R. J. Douglas, and A. M. Whatley. A pulse-coded communications infrastructure for neuromorphic systems. In Pulsed Neural Networks, pages 157-178. The MIT Press, 1999.
[4] P. Häfliger. A spike based learning rule and its implementation in analog hardware. PhD thesis, ETH Zürich, Switzerland, 2000. http://www.ifi.uio.no/~hafliger.
[5] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie. Silicon auditory processors as computer peripherals. IEEE Trans. on Neural Networks, 4(3):523-528, 1993.
[6] M. Mahowald. VLSI analogs of neuronal visual processing: A synthesis of form and function. PhD thesis, Cal. Inst. of Tech., Pasadena, California, 1992.
[7] A. J. Martin, A. Lines, R. Manohar, M. Nyström, P. Penez, R. Southworth, U. Cummings, and T. Kwan Lee. The design of an asynchronous MIPS R3000 microprocessor. Adv. Res. in VLSI, (17), September 1997.
[8] C. A. Mead. Neuromorphic electronic systems. Proc. IEEE, 78:1629-1636, 1990.
[9] A. Mortara and E. A. Vittoz. A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations. IEEE Trans. on Neural Networks, 5:459-466, 1994.
