Sunteți pe pagina 1din 18

Cellular Neural Network Simulation and Modeling

Oroszi Balzs Mintaalcm szerkesztse 2006.01.06.

Overview
t t t t t t t

Introduction: About the CNN in general Basic characteristics of the CNN Modeling and simulation of the CNN architecture The functional model of the CNN architecture

Handling special cases to increase performance

From theory to practice: Realization of the CNN simulator Summary Demonstration

About the CNN in general


In 1988 papers from Leon O. Chua introduced the concept of the Cellular Neural Network. CNNs can be defined as 2D or 3D arrays of mainly locally connected nonlinear dynamical systems called cells, whose dynamics are functionally determined by a small set of parameters which control the cell interconnection strength (Chua). These parameters determine the connection pattern, and are collected into the so-called cloning templates, which, once determined, define the processing of the whole structure.

Basic Characteristics of the CNN


t

t t

The CNN can be defined as an M x N type array of identical cells arranged in a rectangular grid. Each cell is locally connected to its 8 nearest surrounding neighbors. Each cell is characterized by uij, yij and xij being the input, the output and the state variable of the cell respectively. The output is related to the state by the nonlinear equation: yij = f(xij) = 0.5 (| xij + 1| |xij 1|)

Basic Characteristics of the CNN (2)


t

The state transition of neuron (i, j) is governed by the following differential equation:
xi , j = xi , j +
C ( k ,l )S r ( i , j )

A(i, j, k , l ) y

k ,l

(t ) +

C ( k ,l )S r ( i , j )

B(i, j, k , l )u

k ,l

(t ) + zi , j

Where C(i,j) represents the neuron at column i, row j, Sr(i,j) represents the neurons in the radius r of the neuron C(i,j), and zi,j is the threshold (bias) of the cell C(i,j). The coefficients A(i, j, k, l) and B(i, j, k, l) are known as the cloning templates. In general, they are nonlinear, time- and space variant operators. If they are considered linear, time- and space invariant, they can simply be represented by matrices.

Modeling and simulation of the CNN architecture

t t t t

Simulation plays an important role in the design of the CNN cloning templates. Therefore, it has to be fast enough to allow the design phase of various templates be accomplished in reasonable time. At the same time, the simulation has to be accurate enough, to reflect the behavior of the analog circuitry correctly. In practice, the simulation of the CNN involves a trade-off between accuracy and computation time.

Modeling and simulation of the CNN architecture (2)

t t t

The true processing capabilities of CNNs for high-speed parallel processing are only fully exploited by dedicated VLSI hardware realizations. Typical CNN chips may contain up to 200 transistors per pixel. At the same time, industrial applications require large enough grid sizes (around 100 x 100). Thus, CNN chip designers must confront complexity levels larger than 106 transistors, most of them operating in analogue mode.

Modeling and simulation of the CNN architecture (3)

On the one hand, high-level simulation, which is focused on emulating the functional behaviour, cannot reflect realistically the underlying electronic circuitry. Their lack of detail makes them illsuited for reliable IC simulation. On the other hand, the SPICE-type transistor-level simulators, although very accurate, are barely capable of handling more than about 105 transistors and may take several days of CPU time for circuit netlists containing about 106 transistors. Hence, these low-level tools are ill-suited for simulating large CNN chips. Therefore, it would be necessary to bridge the gap between these approaches, which would give very accurate results in reasonable (but not real-) time. However, our main concern now is fast simulation, so in the rest of this presentation we shall focus on the functional modeling of the CNN architecture.

The functional model of the CNN architecture

The output of a CNN model simulation is the final state reached by the network after evolving from an initial state under the influence of a specific input and boundary conditions. The following block diagram shows the state-transition and output of a single cell:

The functional model of the CNN architecture (2)

In the most general case, the final state of one cell can be described by the following equation:
x(t ) = x(t0) + x( )d = x(t0) + f ( x ( )) d
t0 t0 t t

t t

As a closed form for the solution of the above equation cannot be given, it must be integrated numerically. For the simulation of such equations on a digital computer, they must be mapped into a discrete-time system that

emulates the continuous-time behavior, has similar dynamics and converges to the same final state.

The error committed by this emulation depends on the choice of the method of integration, i. e. the way in which the integral is calculated.

The functional model of the CNN architecture (3)

There is a wide variety of integration algorithms that can be used to perform this task. However, only three of them are going to be considered here. These methods are:
t

the explicit Eulers formula:


t n+0

the predictor-corrector algorithm:


t n+0 tn

tn

f ( x(t ))dt = t f ( x(t ))


n

f ( x(t ))dt =

and the fourth-order Runge-Kutta method:


t n+0

t ) )(0 [ f ( x(t n )) + f ( x(t n +0 ( 0 )] where x(t n +0 = x(t n ) + t f ( x(t n )) ) ) 0


k0= t f ( x(t n ))

tn

0 f ( x(t ))dt = (k0+ 00+ 00+ k 0) k k 0

where

0 k 0 = t f ( x(t n ) + k0 ) 0 0 k0= t f ( x(t n ) + k 0) 0 0 k 0 = t f ( x(t n ) + k0 ) 0

The functional model of the CNN architecture (4)

t t

The Euler method is the fastest, but gives the least accurate convergence behaviour. Runge-Kutta gives the best results, however, much slower. In this case, four auxiliary components (k1-k4) are computed. These are auxiliary values, which are then averaged. This makes it rather illsuited for applications, that prefer speed over accuracy. If the main goal would be accuracy and robustness, undoubtedly Runge-Kutta would be the method of choice. In our case, however, as the primary target is a fast, working implementation of a CNN simulator as an image processor, we shall choose the Euler method.

Handling special cases for increasing performamce

What can be considered a special case from a programming point of view?


special input special templates (A, B)

t t t

To gain significant speed improvements, the case of special templates should be examined. It is not uncommon within templates that extract local properties of the image (like edge detectors) to use a fully zero A template. I have discovered, that revisiting the state equation when the A template is fully zero, significant improvements in speed can be achieved.

Handling special cases for increasing performamce (2)

Given A = 0, the state equation takes the following form:


X = X + BU + Z

t t

(BU + Z) is constant during the process. Let: BU + Z = C Using Euler integration:


X 0= X 0+ t X 0= X 0+ t ( X 0+ C ) = X 0(0 t ) + C t X 0 = X 00 t ) + C t = [ X 0(0 t ) + C t ](0 t ) + C t = X 0(0 t ) 0+ C t (0 t ) + C t (
0 X 0= X 0(0 t ) + C t = [ X 0(0 t ) + C t (0 t ) + C t ](0 t ) + C t = = X 0(0 t ) 0+ C t (0 t ) 0+ C t (0 t ) + C t

X 0 = X 0(0 t ) + C t = [ X 0(0 t ) 0+ C t (0 t ) 0+ C t (0 t ) + C t ](0 t ) + C t =

...
t

= X 0(0 t ) 0+ C t (0 t )0+ C t (0 t ) 0+ C t (0 t ) + C t

The pattern can clearly be seen by now.

Handling special cases for increasing performamce (3)


X 0 = X 0(0 t ) 0+ C t (0 t ) 0+ C t (0 t ) 0+ C t (0 t ) + C t
t

In each new step X0(1-t)n gets multiplied by (1-t) so its power index increases. The remaining part is a geometric series, so the general equation of calculating the n-th state is:
X n = X 0(0 t ) + C t (0 t ) k
n k =0 n 0

Using the general formula of calculating the sum of a geometric series:q n 0 S n = a0 q 0 the sum of the above geometric series turns into:
(0 t ) n 0 (0 t ) n 0 C t (0 t ) = C t = C t = C [(0 t ) n 0= C [0 (0 t ) n ] ] (0 t ) 0 t k =0
n 0 k

So the state equation using this will be:


X n = X 0(0 t ) n + C [0 (0 t ) n ] = X 0(0 t ) n + C C (0 t ) n = ( X 0 C )(0 t ) n + C X n = ( X 0 BU Z )(0 t ) n + BU + Z

Handling special cases for increasing performance (4)

X n = ( X 0 BU Z )(0 t ) n + BU + Z
t

This result is of utmost importance regarding speed, because: the number of iterations that need to be performed to get to the nth state is reduced to 1 thus we can get to the final state immediately, given the U input, the B template and Z bias As multiple iterations through an image causes lots of non-cacheable memory accesses (which is very slow), this improvement in the special case of A = 0 gives a huge boost in speed.

From theory to practice: Realization of the CNN simulator


t

Environment used: Avisynth (http://www.avisynth.org)


A powerful tool for video post-production. Special programming language, designed specifically for video processing. Its functions are implemented under-the-hood as C/C++ dynamic link libraries (DLLs), which are called plugins. Plugins expose an interface (functions) towards the scripting language, from which these functions can be called. The CNN simulator is realised as a plugin (DLL) for Avisynth, written in C++. but also make sure it behaves according to the state-equation

Primary goal: speed

Summary
t

CNN simulation:

functional modeling (mathematical calculation according to the state-equation) circuit-level modeling based on functional model using Avisynth (http://www.avisynth.org) written in C++ programming language available in my web-space at http://digitus.itk.ppke.hu/~oroba

Implementation:

S-ar putea să vă placă și