
Printed in the USA. All rights reserved. Copyright © 1989 Pergamon Press plc

ORIGINAL CONTRIBUTION

On the Approximate Realization of Continuous Mappings by Neural Networks

KEN-ICHI FUNAHASHI

ATR Auditory and Visual Perception Research Laboratories

(Received 6 May 1988; revised and accepted 14 September 1988)

Abstract--In this paper, we prove that any continuous mapping can be approximately realized by Rumelhart-Hinton-Williams' multilayer neural networks with at least one hidden layer whose output functions are sigmoid functions. The starting point of the proof for the one hidden layer case is an integral formula recently proposed by Irie-Miyake and from this, the general case (for any number of hidden layers) can be proved by induction. The two hidden layers case is proved also by using the Kolmogorov-Arnold-Sprecher theorem and this proof also gives non-trivial realizations.

Keywords--Neural network, Back propagation, Output function, Sigmoid function, Hidden layer, Unit, Realization, Continuous mapping.

1. INTRODUCTION

Since McCulloch-Pitts (1943), there have been many studies of mathematical models of neural networks. Recently, Hopfield, Hinton, Rumelhart, Sejnowski and others have tried many concrete applications such as pattern recognition, and have shown that it is possible to clarify the mechanism of human information processing by the use of these models. In particular, the back propagation algorithm (generalized delta rule) proposed by Rumelhart, Hinton, and Williams (1986) provides a learning rule for multilayer networks. Many applications of this algorithm have been shown recently. However, there has been little theoretical research on the capability of the Rumelhart-Hinton-Williams multilayer network.

On the application to pattern recognition, Lippmann (1987) asserts that arbitrary complex decision regions, including concave regions, can be formed using four-layer networks, but this is only an intuitive assertion. Wieland and Leighton (1987) showed an example of a three-layer network with thresholding units that can form complex decision regions in input spaces. Huang and Lippmann (1987) demonstrated by simulations that three-layer networks can form several complex decision regions in pattern recognition applications. However, it has been known that any piecewise-linear decision region (which is not necessarily convex) can be realized by a multilayer network (Duda & Fossum, 1966). Its learning algorithm was also proposed (Amari, 1967) based on the same principle as the generalized delta rule. There are also other applications of multilayer networks for forming mappings, such as NETtalk by Sejnowski and Rosenberg (1987).

Hecht-Nielsen (1987) pointed out that Kolmogorov's theorem (Kolmogorov, 1957) and Sprecher's refinement (Sprecher, 1965), which are both known as negative solutions of Hilbert's thirteenth problem, show that any continuous mapping can be represented by a form of four-layer neural network. Uesaka (1971) and Poggio (1983) have also pointed this out. However, the assertion has a problem in that the output function of each unit of this network is not a given sigmoid function.

Irie and Miyake (1988) obtained an integral formula which suggests the realization of functions of several variables by three-layer networks, by analogy with the principle of computerized tomography (CT). But in this integral formula, the output function ψ(x) must satisfy the condition of absolute integrability, so that it cannot be a sigmoid function. Moreover, the function to be realized is given by an integral representation and the formula does not directly give the realization theorem of functions by networks with finite units.

(The author wishes to thank Drs. Y. Tohkura, T. Inui and S. Miyake, and Mr. T. Okamoto for their valuable comments on the manuscript. The author also would like to thank anonymous reviewers whose constructive suggestions have improved the quality of this paper. Requests for reprints should be sent to Ken-ichi Funahashi, ATR Auditory and Visual Perception Research Laboratories, Twin 21 Building, MID Tower, 2-1-61 Shiromi, Higashi-ku, Osaka 540, Japan.)

In neural networks of the feed-forward type by Rumelhart-Hinton-Williams, bounded and monotone increasing differentiable functions such as the sigmoid function φ(x) = 1/(1 + e^{-x}) are used as output functions of units. This is a different point from the McCulloch-Pitts model and the perceptron, which use the Heaviside function as the output function of units, and it is the reason why it is possible to derive a learning algorithm for multilayer networks.

In a feed-forward type network, its input-output relationship defines a mapping which is called the input-output mapping of the network. We study the problem of network capabilities from the point of view of input-output mappings.

In this paper, we start from an integral formula recently proposed by Irie and Miyake (1988) and prove the theorem which guarantees, in the sense of uniform topology, the approximate realization of continuous mappings by three-layer (one hidden layer) networks whose output functions for the hidden layer are sigmoid and whose output functions for the input and output layers are linear. It is easy to prove the theorem for k(≥3)-layer networks by using the theorem for the three-layer case. But the proof of the theorem for the case k > 3 gives only trivial approximate realizations of given mappings. Therefore we show another proof for the four-layer case by using the Kolmogorov-Arnold-Sprecher theorem (Kolmogorov, 1957; Sprecher, 1965).

McCulloch-Pitts showed that any logical circuit can be designed using their model. Correspondingly, our assertion shows that any continuous mapping can be approximately represented by the Rumelhart-Hinton-Williams multilayer network. We prove the following theorems and corollaries in this paper.

2. MULTILAYER NEURAL NETWORKS

The Rumelhart-Hinton-Williams multilayer network that we consider here is a feed-forward type network with connections between adjoining layers only. Networks generally have hidden layers between the input and output layers. Each layer consists of computational units. The input-output relationship of each unit is represented by inputs xᵢ, output y, connection weights wᵢ, threshold θ, and a differentiable function φ as follows:

    y = \phi\Bigl( \sum_i w_i x_i - \theta \Bigr).

The learning rule of this network is known as the back propagation algorithm (Rumelhart, Hinton, & Williams, 1986). The back propagation algorithm uses a gradient descent method to modify weights and thresholds so that the error between the desired output and the output signal of the network is minimized. We generally use a bounded and monotone increasing differentiable function, called a sigmoid function, as each unit's output function.

If a multilayer network has n input units and m output units, then the input-output relationship defines a continuous mapping from n-dimensional Euclidean space to m-dimensional Euclidean space. We call this mapping the input-output mapping of the network. We study the problem of network capabilities from the point of view of input-output mappings. It is observed that for the study of mappings defined by multilayer networks it is sufficient to consider networks whose output functions for hidden layers are the above φ(x) and whose output functions for input and output layers are linear.

3. APPROXIMATE REALIZATION OF CONTINUOUS MAPPINGS BY NEURAL NETWORKS

We shall consider the possibility of representing continuous mappings by neural networks whose output functions in hidden layers are sigmoid, for example, φ(x) = 1/(1 + e^{-x}). It is simply noted here that general continuous mappings cannot be exactly represented by Rumelhart-Hinton-Williams' networks. For example, if a real analytic output function such as the sigmoid function φ(x) = 1/(1 + e^{-x}) is used, then an input-output mapping of this network is analytic and generally cannot represent all continuous mappings.

Let points of n-dimensional Euclidean space Rⁿ be denoted by x = (x₁, ..., xₙ) and the norm of x be defined by |x| = (Σᵢ₌₁ⁿ xᵢ²)^{1/2}.

Theorem 1.

Let φ(x) be a nonconstant, bounded and monotone increasing continuous function. Let K be a compact subset (bounded closed subset) of Rⁿ and f(x₁, ..., xₙ) be a real valued continuous function on K. Then for an arbitrary ε > 0, there exist an integer N and real constants cᵢ, θᵢ (i = 1, ..., N), wᵢⱼ (i = 1, ..., N, j = 1, ..., n) such that

    \tilde f(x_1, \ldots, x_n) = \sum_{i=1}^{N} c_i \phi\Bigl( \sum_{j=1}^{n} w_{ij} x_j - \theta_i \Bigr)

satisfies max_{x∈K} |f(x₁, ..., xₙ) − f̃(x₁, ..., xₙ)| < ε. In other words, for an arbitrary ε > 0, there exists a three-layer network whose output functions for the hidden layer are φ(x), whose output functions for the input and output layers are linear, and which has an input-output function f̃(x₁, ..., xₙ) such that

    \max_{x \in K} | f(x_1, \ldots, x_n) - \tilde f(x_1, \ldots, x_n) | < \varepsilon.
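Theorem 1 is an existence statement and gives no procedure for finding the constants. As a purely numerical illustration (not the construction used in the proof below; the target f, the set K, and all parameter ranges are arbitrary choices), the following sketch fixes random hidden weights and thresholds and picks the output coefficients cᵢ by least squares, showing the uniform error of Σᵢ cᵢφ(wᵢx − θᵢ) on a compact set shrinking as N grows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

f = lambda x: np.sin(np.pi * x) + 0.5 * x ** 2   # continuous target on K = [-1, 1]
x = np.linspace(-1.0, 1.0, 400)

rng = np.random.default_rng(1)
for N in (5, 20, 80):
    # Hidden layer: N sigmoid units with fixed random weights and thresholds.
    w = rng.uniform(-8.0, 8.0, size=N)
    theta = rng.uniform(-8.0, 8.0, size=N)
    H = sigmoid(np.outer(x, w) - theta)           # (400, N) hidden outputs
    # Output layer is linear: choose c by least squares on a dense grid.
    c, *_ = np.linalg.lstsq(H, f(x), rcond=None)
    err = np.max(np.abs(H @ c - f(x)))
    print(f"N = {N:3d}: max_K |f - f_tilde| = {err:.4f}")
```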

The above theorem easily leads to the following general theorem.

Theorem 2.

Let φ(x) be a nonconstant, bounded and monotone increasing continuous function. Let K be a compact subset (bounded closed subset) of Rⁿ and fix an integer k ≥ 3. Then any continuous mapping f: K → Rᵐ defined by x = (x₁, ..., xₙ) → (f₁(x), ..., f_m(x)) can be approximated in the sense of uniform topology on K by input-output mappings of k-layer (k−2 hidden layers) networks whose output functions for hidden layers are φ(x), and whose output functions for input and output layers are linear. In other words, for any continuous mapping f: K → Rᵐ and an arbitrary ε > 0, there exists a k-layer network whose input-output mapping f̃: K → Rᵐ satisfies max_{x∈K} d(f(x), f̃(x)) < ε, where d(·,·) is a metric which induces the usual topology of Rᵐ.

Corollary 1.

Let φ(x), K be as above and fix an integer k ≥ 3. Then any mapping f: x ∈ K → (f₁(x), ..., f_m(x)) ∈ Rᵐ, where the fᵢ(x) (i = 1, ..., m) are summable on K, can be approximated in the sense of L²-topology on K by input-output mappings of k-layer (k−2 hidden layers) networks whose output functions for hidden layers are φ(x) and whose output functions for input and output layers are linear. In other words, for an arbitrary ε > 0, there exists a k-layer network whose input-output mapping f̃: x ∈ K → (f̃₁(x), ..., f̃_m(x)) ∈ Rᵐ satisfies

    d_{L^2(K)}(f, \tilde f) = \Bigl( \sum_{i=1}^{m} \int_K | f_i(x_1, \ldots, x_n) - \tilde f_i(x_1, \ldots, x_n) |^2 \, dx \Bigr)^{1/2} < \varepsilon.

Corollary 2.

Let K be as above and fix an integer k ≥ 3. Let φ(x) be a strictly increasing continuous function such that φ((−∞, ∞)) = (0, 1). Then any continuous mapping f: K → (0, 1)ᵐ can be approximated in the sense of uniform topology on K by input-output mappings of k(≥3)-layer neural networks whose output functions for hidden and output layers are φ(x).

Proof. Set f(x) = (f₁(x), ..., f_m(x)). As φ⁻¹: (0, 1) → (−∞, ∞) is continuous, Theorem 2 is applied to the mapping x → φ⁻¹f(x) = (φ⁻¹f₁(x), ..., φ⁻¹f_m(x)) and the corollary is obtained easily. q.e.d.

Remark 1.

Usual output functions such as the sigmoid function 1/(1 + e^{-x}) used for back-propagation neural networks satisfy the condition on φ(x), that φ(x) is a nonconstant, bounded and monotone increasing continuous function.

Remark 2.

Any mapping is approximately realized by a three-layer (one hidden layer) network. However, whether a k(>3)-layer network can realize a given mapping within error ε at less cost (number of units or connections) than a three-layer network should be theoretically studied in the future.

For the application of neural networks to pattern recognition, if m is the number of recognized categories, usually m output units corresponding to these categories are used, and the system is made to learn to take values near 1 only for the units corresponding to the input categories. The corollaries show that if one uses multilayer networks with hidden layers, any decision region can be formed by a neural network. In particular, a strictly increasing continuous function can be chosen as the output function of each unit.

In this paper, we call bounded and monotone increasing continuous functions sigmoid functions. In particular, a sigmoid function φ(x) having a summable weak derivative has the property that if we set φε(x) = φ(x/ε) (ε > 0), then the derivatives φ′ε(x) = (1/ε)φ′(x/ε) converge, in the sense of generalized functions (see, e.g., Gel'fand & Shilov, 1964), to the δ function as ε → 0. That is to say, if φ(∞) − φ(−∞) = 1, then for any smooth function g(x) with compact support,

    \lim_{\varepsilon \to +0} \int_{-\infty}^{\infty} \phi'_\varepsilon(x)\, g(x) \, dx = g(0).

The following examples are included in the class of sigmoid functions considered here.

Example 1. For φ(x) = 1/(1 + e^{-x}), φ′ε(x) = (1/ε) exp(−x/ε)/(1 + exp(−x/ε))², and φ(x) is a sigmoid function.

Example 2. For Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp(−t²/2) dt, Φ′ε(x) = (1/(√(2π)ε)) exp(−x²/(2ε²)), and Φ(x) is a sigmoid function.

Example 3. For φ(x) where φ(x) = 0 (x < 0), φ(x) = x (0 ≤ x ≤ 1), and φ(x) = 1 (x ≥ 1), φ′ε(x) = 0 (x < 0 or x ≥ ε), φ′ε(x) = 1/ε (0 ≤ x < ε), and φ(x) is a sigmoid function.

In the McCulloch-Pitts neural model and perceptron, a threshold function φ(x) = 1 (x ≥ 0), = 0 (x < 0) is used as the output function.
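The limit above can be checked numerically. The sketch below (an illustration; the test function g and the grids are arbitrary choices) evaluates ∫φ′ε(x)g(x)dx for the logistic sigmoid of Example 1 and shows the value approaching g(0) as ε → +0.

```python
import numpy as np

def phi_prime(x):
    # phi'(x) = e^{-x} / (1 + e^{-x})^2 for the logistic sigmoid; phi' is even,
    # so -|x| is used inside exp to avoid overflow.
    t = np.exp(-np.abs(x))
    return t / (1.0 + t) ** 2

g = lambda x: np.exp(-x ** 2) * np.cos(x)   # smooth, effectively compact support

x = np.linspace(-60.0, 60.0, 1_200_001)
dx = x[1] - x[0]
for eps in (1.0, 0.1, 0.01):
    # phi'_eps(x) = (1/eps) * phi'(x/eps), integrated against g by quadrature.
    val = np.sum(phi_prime(x / eps) / eps * g(x)) * dx
    print(f"eps = {eps:4}: integral = {val:.6f}")
print(f"g(0)       = {g(0.0):.6f}")
```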

Sigmoid functions φ(x) where φ(−∞) = 0 and φ(∞) = 1 are appropriate as output functions in the neural model because if we set φε(x) = φ(x/ε) (ε > 0), then these converge to the threshold function of the McCulloch-Pitts neural model and perceptron as ε → +0.

McCulloch-Pitts showed that one can design any logical circuit using their model. Correspondingly, Theorem 2 above shows that any continuous mapping can be approximately represented by multilayer networks with sigmoid output functions.

4. PRELIMINARY 1 (MOLLIFIERS, FOURIER TRANSFORMS)

Fundamental matters used in this paper are reviewed here.

Let Lᵖ(Rⁿ) (p ≥ 1) denote the space of all measurable functions f(x) on Rⁿ which satisfy

    \int_{R^n} |f(x)|^p \, dx < \infty.

The norm of f ∈ Lᵖ(Rⁿ) is defined by

    \| f(x) \|_{L^p} = \Bigl( \int_{R^n} |f(x)|^p \, dx \Bigr)^{1/p},

and the convergence fₙ(x) → f(x) in Lᵖ(Rⁿ) is defined by

    \lim_{n \to \infty} \| f_n(x) - f(x) \|_{L^p} = 0.

Generally, for any measurable set K, Lᵖ(K) (p ≥ 1) is defined similarly.

Let ρ(x) be a function on Rⁿ which satisfies the following conditions:

(i) ρ(x) ≥ 0, ρ(x) has continuous partial derivatives of all orders, and its support is contained in the unit sphere |x| ≤ 1;
(ii) ∫_{Rⁿ} ρ(x) dx = 1.

Then, for ε > 0, set ρε(x) = (1/ε)ⁿρ(x/ε). If u(x) ∈ L¹_loc, that is, u(x) is locally summable, consider

    \rho_\varepsilon * u(x) = \int_{R^n} \rho_\varepsilon(x - y)\, u(y) \, dy.

Then the following assertions hold: (a) ρε*u(x) is a C∞-function, that is, it has continuous partial derivatives of all orders, and the support of ρε*u(x) is contained in the ε neighborhood of the support of u(x); (b) if u(x) is a continuous function with compact support, then ρε*u(x) → u(x) uniformly on Rⁿ as ε → +0; and (c) if u(x) ∈ Lᵖ(Rⁿ) (p ≥ 1), then ρε*u(x) → u(x) in Lᵖ as ε → +0. The operator ρε* is called a mollifier.
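The smoothing assertion (b) is easy to see numerically in one dimension. In the sketch below (an illustration; the kernel grid, the test function u, and the values of ε are arbitrary choices), a continuous compactly supported u is convolved with ρε, and the uniform error shrinks as ε → +0.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 6001)
dx = x[1] - x[0]

def bump(x):
    # C-infinity bump supported in |x| <= 1 (not yet normalized).
    out = np.zeros_like(x)
    m = np.abs(x) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

Z = np.sum(bump(x)) * dx                      # normalization: integral of rho = 1

def rho_eps(eps):
    # Mollifier kernel rho_eps(x) = (1/eps) * rho(x/eps) on the grid.
    return bump(x / eps) / (Z * eps)

u = np.maximum(0.0, 1.0 - np.abs(x))          # continuous, compact support

for eps in (0.5, 0.1, 0.02):
    smoothed = np.convolve(u, rho_eps(eps), mode="same") * dx  # rho_eps * u
    print(f"eps = {eps:4}: max |rho_eps*u - u| = {np.max(np.abs(smoothed - u)):.4f}")
```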

For f(x) ∈ L¹(Rⁿ), the Fourier transform

    \mathcal{F} f(\xi) = \int_{R^n} e^{-i \langle x, \xi \rangle} f(x) \, dx,   (1)

where ⟨x, ξ⟩ = Σᵢ₌₁ⁿ xᵢξᵢ, can be defined, and we set f̂(ξ) = F f(ξ). If f(x) satisfies the additional condition that f(x) has continuous partial derivatives of order up to n, then f(x) can be represented at each point by the inverse Fourier transform of f̂(ξ) as follows:

    f(x) = \frac{1}{(2\pi)^n} \int_{R^n} e^{i \langle x, \xi \rangle} \hat f(\xi) \, d\xi.

The Plancherel theorem especially asserts that F can be extended to a one-to-one onto mapping F: L²(Rⁿ) → L²(Rⁿ), and for f(x) ∈ L¹ ∩ L²(Rⁿ), F f(ξ) is equal to the one defined by (1). Furthermore, for f(x) ∈ L²(Rⁿ),

    \int_{|x| \le A} e^{-i \langle x, \xi \rangle} f(x) \, dx \;\to\; \mathcal{F} f(\xi) \quad (A \to +\infty) \ \text{in } L^2.

5. PRELIMINARY 2 (IRIE-MIYAKE'S INTEGRAL FORMULA)

The following theorem is the starting point for the proof of Theorem 1.

Theorem (Irie-Miyake)

Let ψ(x) ∈ L¹(R), that is, let ψ(x) be absolutely integrable, and f(x₁, ..., xₙ) ∈ L²(Rⁿ). Let Ψ(ξ) and F(w₁, ..., wₙ) be the Fourier transforms of ψ(x) and f(x₁, ..., xₙ) respectively. If Ψ(1) ≠ 0, then

    f(x_1, \ldots, x_n) = \frac{1}{(2\pi)^n \Psi(1)} \int_{-\infty}^{\infty} \!\cdots\! \int_{-\infty}^{\infty} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) F(w_1, \ldots, w_n)\, e^{i w_0} \, dw_0 \, dw_1 \cdots dw_n.

Remark

This formula precisely asserts that if we set

    I_{\infty,A}(x_1, \ldots, x_n) = \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} \Bigl[ \int_{-\infty}^{\infty} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) \frac{1}{(2\pi)^n \Psi(1)} F(w_1, \ldots, w_n)\, e^{i w_0} \, dw_0 \Bigr] dw_1 \cdots dw_n,

then

    \lim_{A \to \infty} \| I_{\infty,A}(x_1, \ldots, x_n) - f(x_1, \ldots, x_n) \|_{L^2} = 0.

Connecting this formula with three-layer networks, Irie and Miyake (1988) assert that arbitrary functions can be represented by a three-layer network with an infinite number of computational units. In this formula, w₀ corresponds to a threshold, wᵢ corresponds to a connection weight, and ψ(x) corresponds to the output function of the units. However, the sigmoid function 1/(1 + e^{-x}) does not satisfy the condition of this formula that ψ(x) be absolutely integrable, and so the formula does not directly give the realization theorem of functions by networks.

6. LEMMAS

We prepare several lemmas for the proof of our Theorem 1.

Lemma 1.

Let φ(x) be a nonconstant, bounded and monotone increasing continuous function. For α > 0, if we set

    g(x) = \phi(x + \alpha) - \phi(x - \alpha),

then g(x) ∈ L¹(R), that is, ∫_{−∞}^{∞} |g(x)| dx < ∞. Furthermore, for some δ > 0, if we set

    g_\delta(x) = \phi(x/\delta + \alpha) - \phi(x/\delta - \alpha),

then the value of the Fourier transform G_δ(ξ) of g_δ(x) at ξ = 1 is non-zero.

Proof. Let |φ(x)| ≤ M. For L > α,

    \int_{-L}^{L} |g(x)| \, dx = \int_{-L}^{L} g(x) \, dx = \int_{-L+\alpha}^{L+\alpha} \phi(x) \, dx - \int_{-L-\alpha}^{L-\alpha} \phi(x) \, dx = \int_{L-\alpha}^{L+\alpha} \phi(x) \, dx - \int_{-L-\alpha}^{-L+\alpha} \phi(x) \, dx \le 4\alpha M.

Therefore

    \lim_{L \to \infty} \int_{-L}^{L} |g(x)| \, dx < \infty.

We show that for some δ > 0, G_δ(1) ≠ 0. If this were not the case, then

    \int_{-\infty}^{\infty} \bigl( \phi(x/\delta + \alpha) - \phi(x/\delta - \alpha) \bigr) e^{-ix} \, dx = 0 \quad (\text{for any } \delta > 0),

that is, changing the variable,

    \int_{-\infty}^{\infty} \bigl( \phi(x + \alpha) - \phi(x - \alpha) \bigr) e^{-i\delta x} \, dx = 0 \quad (\text{for any } \delta > 0).   (1)

Taking the complex conjugate of the above equation (1),

    \int_{-\infty}^{\infty} \bigl( \phi(x + \alpha) - \phi(x - \alpha) \bigr) e^{i\delta x} \, dx = 0 \quad (\text{for any } \delta > 0).   (2)

Since the Fourier transform G₁(ξ) of g₁(x) = φ(x + α) − φ(x − α) ∈ L¹(R) is continuous, it follows from (1) and (2) that G₁(ξ) is identically zero. Therefore g₁(x) = 0, that is, φ(x + α) = φ(x − α) for all x. This is a contradiction, because φ(x) is not constant. q.e.d.

Remark

Lemma 1 holds for φ(x) which is locally summable.
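Both assertions of Lemma 1 can be checked numerically for the logistic sigmoid. In this sketch (an illustration; α, the δ values, and the quadrature grid are arbitrary choices), the integral of |g| comes out near 2α(φ(∞) − φ(−∞)) = 2α, and G_δ(1) is visibly non-zero.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic sigmoid.
    t = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + t), t / (1.0 + t))

alpha = 2.0
x = np.linspace(-200.0, 200.0, 400_001)
dx = x[1] - x[0]

g = sigmoid(x + alpha) - sigmoid(x - alpha)        # g >= 0 since phi is increasing
print("integral of |g|:", np.sum(np.abs(g)) * dx)  # approximately 2 * alpha = 4

for delta in (0.5, 1.0, 2.0):
    g_delta = sigmoid(x / delta + alpha) - sigmoid(x / delta - alpha)
    G1 = np.sum(g_delta * np.exp(-1j * x)) * dx    # Fourier transform at xi = 1
    print(f"delta = {delta}: |G_delta(1)| = {abs(G1):.4f}")
```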

Lemma 2.

Let K be a compact subset (bounded closed subset) of Rⁿ and h(x₁, ..., x_m, t₁, ..., tₙ) be a continuous function on [−A₁, A₁] × ··· × [−A_m, A_m] × K. Then the function defined by the integral

    H(t) = \int_{-A_1}^{A_1} \!\cdots\! \int_{-A_m}^{A_m} h(x_1, \ldots, x_m, t_1, \ldots, t_n) \, dx_1 \cdots dx_m

can be approximated uniformly on K by the Riemann sum

    H_N(t) = \frac{2A_1 \cdots 2A_m}{N^m} \sum_{k_1, \ldots, k_m = 0}^{N-1} h\Bigl( -A_1 + \frac{k_1 \cdot 2A_1}{N}, \ldots, -A_m + \frac{k_m \cdot 2A_m}{N}, t_1, \ldots, t_n \Bigr).

In other words, for an arbitrary ε > 0, there exists a natural number N₀ such that for N ≥ N₀,

    \max_{t \in K} | H(t) - H_N(t) | < \varepsilon.

Proof. The function h(x, t) is continuous on the compact set [−A₁, A₁] × ··· × [−A_m, A_m] × K, so h(x, t) is uniformly continuous there.

Therefore we can take an integer N₀ such that if N ≥ N₀ and |xᵢ − yᵢ| ≤ 2Aᵢ/N (i = 1, ..., m), then

    | h(x_1, \ldots, x_m, t_1, \ldots, t_n) - h(y_1, \ldots, y_m, t_1, \ldots, t_n) | < \frac{\varepsilon}{2A_1 \cdots 2A_m}.

The assertion of the lemma is obvious from this inequality. q.e.d.
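As a numerical illustration of Lemma 2 (a sketch with m = n = 1; the function h, the interval [−A, A], and the parameter set K are arbitrary choices), the Riemann sum H_N is compared below against a very fine reference sum; the maximum deviation over K shrinks as N grows.

```python
import numpy as np

A = 3.0
h = lambda x, t: np.cos(x * t) * np.exp(-x ** 2)  # continuous on [-A, A] x K
t = np.linspace(-2.0, 2.0, 201)                    # grid on K = [-2, 2]

def riemann_sum(N):
    # H_N(t) = (2A/N) * sum_{k=0}^{N-1} h(-A + 2A*k/N, t)
    x = -A + 2.0 * A * np.arange(N) / N
    return (2.0 * A / N) * h(x[:, None], t).sum(axis=0)

H_ref = riemann_sum(20_000)                        # stands in for H(t)
for N in (10, 100, 1000):
    err = np.max(np.abs(riemann_sum(N) - H_ref))
    print(f"N = {N:5d}: max over K of |H - H_N| = {err:.2e}")
```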

7. PROOF OF THEOREMS

We will prove our theorems of Section 3 under the above preliminaries.

Proof of Theorem 1

Step 1. Because f(x) (x = (x₁, ..., xₙ)) is a continuous function on a compact subset K of Rⁿ, f(x) can be extended to a continuous function on Rⁿ with compact support. We also denote this extension by f(x). If we operate the mollifier ρα* on f(x), ρα*f(x) is a C∞-function with compact support. Furthermore, ρα*f(x) → f(x) (α → +0) uniformly on Rⁿ. Therefore we may suppose f(x) is a C∞-function with compact support for proving Theorem 1. By the Paley-Wiener theorem (see, e.g., Yosida, 1968), the Fourier transform F(w) (w = (w₁, ..., wₙ)) of f(x) is real analytic and, for any integer N, there exists a constant C_N such that

    | F(w) | \le C_N (1 + |w|)^{-N}.   (3)

In particular, F(w) ∈ L¹ ∩ L²(Rⁿ).

We define I_A(x₁, ..., xₙ), I_{∞,A}(x₁, ..., xₙ) and J_A(x₁, ..., xₙ) as follows:

    I_A(x_1, \ldots, x_n) = \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) \frac{1}{(2\pi)^n \Psi(1)} F(w_1, \ldots, w_n)\, e^{i w_0} \, dw_0 \, dw_1 \cdots dw_n,   (4)

    I_{\infty,A}(x_1, \ldots, x_n) = \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} \Bigl[ \int_{-\infty}^{\infty} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) \frac{1}{(2\pi)^n \Psi(1)} F(w_1, \ldots, w_n)\, e^{i w_0} \, dw_0 \Bigr] dw_1 \cdots dw_n,   (5)

    J_A(x_1, \ldots, x_n) = \frac{1}{(2\pi)^n} \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} F(w_1, \ldots, w_n) \exp\Bigl( i \sum_{i=1}^{n} x_i w_i \Bigr) dw_1 \cdots dw_n,   (6)

where we set ψ(x) = φ(x/δ + α) − φ(x/δ − α) for some α and δ so that ψ(x) satisfies Lemma 1 in Section 6.

The essential part of the proof of Irie-Miyake's integral formula is the equality

    I_{\infty,A}(x_1, \ldots, x_n) = J_A(x_1, \ldots, x_n),   (7)

and this is derived from

    \int_{-\infty}^{\infty} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) e^{i w_0} \, dw_0 = \exp\Bigl( i \sum_{i=1}^{n} x_i w_i \Bigr) \Psi(1).   (8)

Since F(w) is summable, by the inversion formula of the Fourier transform we can prove

    \lim_{A \to \infty} J_A(x_1, \ldots, x_n) = f(x_1, \ldots, x_n)

uniformly on Rⁿ. Therefore

    \lim_{A \to \infty} I_{\infty,A}(x_1, \ldots, x_n) = f(x_1, \ldots, x_n)

uniformly on Rⁿ. That is to say, we can state that for any ε > 0 there exists A > 0 such that

    \max_{x \in R^n} | I_{\infty,A}(x_1, \ldots, x_n) - f(x_1, \ldots, x_n) | < \varepsilon/2.   (i)

Step 2. We will approximate I_{∞,A} by finite integrals on K. For ε > 0, fix A which satisfies (i). For A′ > 0, set

    I_{A',A}(x_1, \ldots, x_n) = \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} \Bigl[ \int_{-A'}^{A'} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) \frac{1}{(2\pi)^n \Psi(1)} F(w_1, \ldots, w_n)\, e^{i w_0} \, dw_0 \Bigr] dw_1 \cdots dw_n.

We will show that, for ε > 0, we can take A′ > 0 so that

    \max_{x \in K} | I_{A',A}(x_1, \ldots, x_n) - I_{\infty,A}(x_1, \ldots, x_n) | < \varepsilon/2.   (ii)
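The identity (8) behind (7) is a direct substitution (t = Σᵢxᵢwᵢ − w₀), and it can also be verified numerically in the one-variable case. The sketch below (an illustration; α, δ, the point x, the weight w, and the quadrature range are arbitrary choices) compares the two sides of (8) for ψ(x) = φ(x/δ + α) − φ(x/δ − α) built from the logistic sigmoid.

```python
import numpy as np

def sigmoid(x):
    t = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + t), t / (1.0 + t))

alpha, delta = 2.0, 1.0
psi = lambda x: sigmoid(x / delta + alpha) - sigmoid(x / delta - alpha)

w0 = np.linspace(-300.0, 300.0, 600_001)
dw0 = w0[1] - w0[0]

Psi1 = np.sum(psi(w0) * np.exp(-1j * w0)) * dw0   # Psi(1): FT of psi at xi = 1

x, w = 0.7, 1.3                                    # an arbitrary point and weight
lhs = np.sum(psi(x * w - w0) * np.exp(1j * w0)) * dw0
rhs = np.exp(1j * x * w) * Psi1
print(abs(lhs - rhs))                              # ~ 0 up to quadrature error
```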

Using the following equation

    \int_{-A'}^{A'} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) e^{i w_0} \, dw_0 = \Bigl( \int_{\sum_i x_i w_i - A'}^{\sum_i x_i w_i + A'} \psi(t)\, e^{-it} \, dt \Bigr) \exp\Bigl( i \sum_{i=1}^{n} x_i w_i \Bigr),

the fact F(x) ∈ L¹, and the compactness of [−A, A]ⁿ × K, we can take A′ so that

    \Bigl| \int_{-A'}^{A'} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) e^{i w_0} \, dw_0 - \int_{-\infty}^{\infty} \psi\Bigl( \sum_{i=1}^{n} x_i w_i - w_0 \Bigr) e^{i w_0} \, dw_0 \Bigr| < \frac{\varepsilon (2\pi)^n |\Psi(1)|}{2 \int_{-A}^{A} \cdots \int_{-A}^{A} |F(x)| \, dx}

on K. Therefore,

    \max_{x \in K} | I_{A',A}(x_1, \ldots, x_n) - I_{\infty,A}(x_1, \ldots, x_n) | \le \frac{1}{(2\pi)^n |\Psi(1)|} \int_{-A}^{A} \!\cdots\! \int_{-A}^{A} \frac{\varepsilon (2\pi)^n |\Psi(1)|}{2 \int_{-A}^{A} \cdots \int_{-A}^{A} |F(x)| \, dx}\, |F(x)| \, dx < \varepsilon/2.

Step 3. From (i) and (ii), we can say that for any ε > 0, there exist A, A′ > 0 such that

    \max_{x \in K} | f(x_1, \ldots, x_n) - I_{A',A}(x_1, \ldots, x_n) | < \varepsilon.   (iii)

That is to say, f(x) can be approximated by the finite integral I_{A′,A}(x) uniformly on K. The integrand of I_{A′,A}(x) can be replaced by its real part and is continuous on [−A′, A′] × [−A, A]ⁿ × K, so by Lemma 2, I_{A′,A}(x) can be approximated by its Riemann sum uniformly on K. Since

    \psi\Bigl( \sum_{i=1}^{n} w_i x_i - w_0 \Bigr) = \phi\Bigl( \sum_{i=1}^{n} \frac{w_i}{\delta} x_i - \frac{w_0}{\delta} + \alpha \Bigr) - \phi\Bigl( \sum_{i=1}^{n} \frac{w_i}{\delta} x_i - \frac{w_0}{\delta} - \alpha \Bigr),

the Riemann sum can be represented by a three-layer network. Therefore f(x) can be represented approximately by three-layer networks. q.e.d.

Proof of Theorem 2

If k = 3, set f: x = (x₁, ..., xₙ) → (f₁(x), ..., f_m(x)) and apply Theorem 1 to each fᵢ(x). For the general case, we first remark that a k(>3)-layer network can be represented by the composition of k−2 three-layer networks; using this and the realization of the identity mapping by a three-layer network, the theorem is obtained. q.e.d.

Proof of Corollary 1

In the expression f: x → (f₁(x), ..., f_m(x)), we extend the fᵢ(x) to functions which take the value zero on Rⁿ − K. We also denote these by fᵢ(x) (i = 1, ..., m). We can approximate fᵢ(x) (i = 1, ..., m) by C∞-functions with compact support by operating the mollifier ρε* on fᵢ, and apply Theorem 2 to ρε*fᵢ. q.e.d.

The above proof of Theorem 2 for the case k > 3 gives only trivial approximate realizations of given mappings by k-layer networks. Therefore, we shall give a different proof for the case k = 4 by using the Kolmogorov-Arnold-Sprecher theorem, which gives nontrivial realizations of continuous mappings.

8. KOLMOGOROV-ARNOLD-SPRECHER'S THEOREM

Let I = [0, 1] denote the closed unit interval, Iⁿ = [0, 1]ⁿ (n ≥ 2) the Cartesian product of I.

In his famous thirteenth problem, Hilbert conjectured that there are analytic functions of three variables which cannot be represented as a finite superposition of continuous functions of only two arguments. Kolmogorov (1957) and Arnold refuted this conjecture, and Kolmogorov proved the following theorem.

Theorem (Kolmogorov)

Any continuous function f(x₁, ..., xₙ) of several variables defined on Iⁿ (n ≥ 2) can be represented in the form

    f(x) = \sum_{j=1}^{2n+1} \chi_j \Bigl( \sum_{i=1}^{n} \phi_{ij}(x_i) \Bigr),

where χⱼ, φᵢⱼ are continuous functions of one variable and the φᵢⱼ are monotone functions which are not dependent on f.

Sprecher (1965) refined the above theorem as follows.

Theorem (Sprecher)

For each integer n ≥ 2, there exists a real, monotone increasing function ψ(x), ψ([0, 1]) = [0, 1], dependent on n and having the following property: for each preassigned number δ > 0 there is a rational number ε, 0 < ε < δ, such that every real continuous

function of n variables, f(x), defined on Iⁿ, can be represented as

    f(x) = \sum_{j=1}^{2n+1} \chi\Bigl( \sum_{i=1}^{n} \lambda^i \psi(x_i + \varepsilon(j-1)) + j - 1 \Bigr),

where the function χ is real and continuous and λ is a constant independent of f.

This theorem means that any continuous mapping f: x ∈ Iⁿ → (f₁(x), ..., f_m(x)) ∈ Rᵐ is represented by a form of four-layer neural network with hidden units whose output functions are ψ, χᵢ (i = 1, ..., m), where ψ is used for the first hidden layer, χᵢ is given by Sprecher's theorem for fᵢ(x), and the χᵢ (i = 1, ..., m) are used for the second hidden layer.

9. ALTERNATIVE PROOF OF THEOREM 2 FOR THE CASE k = 4

In Section 8, we reviewed Kolmogorov's theorem and its refinement from the point of view of neural networks. The Kolmogorov-Arnold-Sprecher theorem and the following proposition are used to prove our Theorem 2 for the case k = 4. This proposition is a special case (the one variable case) of Theorem 1 in Section 3.

Proposition

Let g(x) be a continuous function on R and φ(x) a bounded and monotone increasing continuous function. For an arbitrary compact subset (bounded closed subset) K of R and an arbitrary ε > 0, there are an integer N and real constants aᵢ, bᵢ, cᵢ (i = 1, ..., N) such that

    \Bigl| g(x) - \sum_{i=1}^{N} c_i \phi(a_i x + b_i) \Bigr| < \varepsilon

holds on K.

In the Appendix, we shall state a direct proof of the above proposition by a different method, without using Fourier transforms, under the additional condition that φ(x) has a weak derivative which is summable.

Next we prove Theorem 2 for the case k = 4 by using the Kolmogorov-Arnold-Sprecher theorem and the above proposition.

Proof. The fₚ(x) (p = 1, ..., m) can be extended to continuous functions with compact supports. We apply Sprecher's theorem to fₚ(x) (p = 1, ..., m) and represent fₚ(x) in the form

    f_p(x) = \sum_{j=1}^{2n+1} \chi_p \Bigl( \sum_{i=1}^{n} \lambda^i \psi(x_i + \varepsilon(j-1)) + j - 1 \Bigr) \quad (p = 1, \ldots, m),

where λ and ε are constants. We apply our proposition to the functions χₚ, ψ, and approximate these functions using a sigmoid function φ.

Let Kⱼ (j = 1, ..., 2n+1) be the images of [0, 1]ⁿ by the mappings

    \tau_j : x \mapsto \sum_{i=1}^{n} \lambda^i \psi(x_i + \varepsilon(j-1)) + j - 1,

and set K = ∪ⱼ Kⱼ. Take δ > 0 and the closure K_δ of the δ neighborhood of K. The continuous functions χₚ (p = 1, ..., m) are approximated by

    \chi_{p,N}(x) = \sum_{i=1}^{N} c_{i,N}\, \phi(a_{i,N} x + b_{i,N})   (9)

so that

    | \chi_p(x) - \chi_{p,N}(x) | < \bar\varepsilon/(4n + 2) \quad (p = 1, \ldots, m)   (10)

on K_δ. As the χₚ,N(x) are uniformly continuous on K_δ, a sufficiently small η can be taken so that if |x − y| < η (x, y ∈ K_δ), then |χₚ,N(x) − χₚ,N(y)| < ε̄/(4n + 2) (p = 1, ..., m).

We approximate τⱼ on [0, 1]ⁿ by τⱼ,N′ so that

    | \tau_j(x) - \tau_{j,N'}(x) | < \min(\eta, \delta),   (11)

where the τⱼ,N′(x) (j = 1, ..., 2n+1) are defined as follows. We approximate ψ(x) by

    \psi_{N'}(x) = \sum_{i=1}^{N'} e_i\, \phi(a'_i x + b'_i)   (12)

on the 2nε neighborhood of [0, 1] and set

    \tau_{j,N'}(x) = \sum_{i=1}^{n} \lambda^i \psi_{N'}(x_i + \varepsilon(j-1)) + j - 1

so that the above inequality (11) is satisfied. Using the transformation

    \sum_{j=1}^{2n+1} \chi_p(\tau_j(x)) - \sum_{j=1}^{2n+1} \chi_{p,N}(\tau_{j,N'}(x)) = \sum_{j=1}^{2n+1} \bigl( \chi_p(\tau_j(x)) - \chi_{p,N}(\tau_j(x)) \bigr) + \sum_{j=1}^{2n+1} \bigl( \chi_{p,N}(\tau_j(x)) - \chi_{p,N}(\tau_{j,N'}(x)) \bigr),

it is seen that the fₚ(x) (p = 1, ..., m) are approximated by

    \sum_{j=1}^{2n+1} \chi_{p,N}[\tau_{j,N'}(x)] \quad (p = 1, \ldots, m)

on [0, 1]ⁿ so that the errors are less than ε̄. Looking at the form of this approximation, the theorem is obtained. q.e.d.

10. NEURAL NETWORK AND INFORMATION PROCESSING IN THE BRAIN

In the Rumelhart-Hinton-Williams multilayer neural network, the input and output values of each unit correspond to pulse frequencies in a neuron, and thus each unit, disregarding time characteristics, is a very simple model of the neuron. When a neural network is implemented for pattern recognition in engineering fields, output units correspond to gnostic cells in the brain. The approximate realization of continuous mappings using neural networks, which are simple models of the neural system, suggests that there are several gnostic cells in the brain. It also shows the possibility of revealing information processing in the brain through neural network approaches.

11. SUMMARY

We proved the approximate realization theorem of continuous functions by three-layer networks. This theorem leads to the approximate realization theorem of continuous mappings by k(≥3)-layer networks, and we showed that any mapping whose components are summable on a compact subset can be approximately represented by k(≥3)-layer networks in the sense of the L²-norm. We also showed an alternative proof of the theorem for the case k = 4 by using the Kolmogorov-Arnold-Sprecher theorem and a proposition which is a special case of the three-layer case. We consider that one of the problems of analyzing neural network capabilities is solved in the form of an existence theorem of networks which are approximately capable of representing any given mapping.

Presently, for the application of neural networks to pattern recognition and related engineering fields, up to four-layer networks are used (Waibel, Hanazawa, Hinton, Shikano, & Lang, 1988; Tamura & Waibel, 1988). The theorems proved here provide the mathematical base, and their use would be fundamental in further discussions of neural network system theory.

REFERENCES

Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16, 299-307.

Duda, R. O., & Fossum, H. (1966). Pattern classification by iteratively determined linear and piecewise linear discriminant functions. IEEE Transactions on Electronic Computers, EC-15, 220-232.

Gel'fand, I. M., & Shilov, G. E. (1964). Generalized functions (Vol. 1, Chap. 1). New York: Academic Press.

Hecht-Nielsen, R. (1987). Kolmogorov mapping neural network existence theorem. IEEE First International Conference on Neural Networks, 3, 11-13.

Huang, W. Y., & Lippmann, R. P. (1987). Neural net and traditional classifiers. In D. Z. Anderson (Ed.), Neural information processing systems, Denver, Colorado, 1987 (pp. 387-396). New York: American Institute of Physics.

Irie, B., & Miyake, S. (1988). Capabilities of three-layered Perceptrons. IEEE International Conference on Neural Networks, 1, 641-648.

Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 144, 679-681; American Mathematical Society Translation, 28, 55-59 [1963].

Lippmann, R. P. (1987, April). An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4-22.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.

Poggio, T. (1983). Visual algorithms. In O. J. Braddick & A. C. Sleigh (Eds.), Physical and biological processing of images (pp. 128-135). New York: Springer-Verlag.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), Parallel distributed processing (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press.

Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.

Sprecher, D. A. (1965). On the structure of continuous functions of several variables. Transactions of the American Mathematical Society, 115, 340-355.

Tamura, S., & Waibel, A. (1988). Noise reduction using connectionist models. 1988 International Conference on Acoustics, Speech, and Signal Processing, pp. 553-556.

Uesaka, Y. (1971). Analog perceptrons: On additive representation of functions. Information and Control, 19, 41-65.

Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. (1988). Phoneme recognition: Neural networks vs. hidden Markov models. 1988 International Conference on Acoustics, Speech, and Signal Processing, pp. 107-110.

Wieland, A., & Leighton, R. (1987). Geometric analysis of neural network capabilities. IEEE First International Conference on Neural Networks, 3, 385-392.

Yosida, K. (1968). Functional analysis. New York: Springer-Verlag.

APPENDIX (DIRECT PROOF OF THE PROPOSITION IN SECTION 9 BY A DIFFERENT METHOD)

Proof. There is a continuous function g̃(x) on R which has compact support and such that g̃(x) = g(x) on K. We may prove the proposition for g̃(x), and so we may initially suppose that g(x) has compact support. We may also suppose that φ(∞) − φ(−∞) = 1. For the arbitrary ε > 0, we will approximate g(x) on K by a summation of sigmoid functions whose variables are shifted and scaled.

Initially, we can approximate g(x) by a simple function (step function) c(x) with compact support so that

    | g(x) - c(x) | < \varepsilon/2   (A.1)

on R and whose step variances are less than ε/4. Here c(x) is represented using the Heaviside function H(x) as follows:

    c(x) = \sum_{i=1}^{n} c_i H(x - x_i).

For a sigmoid function φ(x), set φ_α(x) = φ(x/α) (α > 0). Then φ′_α(x) = (d/dx)φ_α(x) converges to the delta function as α → 0. We consider the convolution c*φ′_α(x) of c(x) and φ′_α(x). We set 2ε′ = "minimum width of steps" and obtain

    c(x) - c * \phi'_\alpha(x) = \int_{-\infty}^{\infty} \phi'_\alpha(y) \bigl[ c(x) - c(x - y) \bigr] \, dy.

Divide the integrand of the right term into (−∞, −ε′), [−ε′, ε′], (ε′, ∞) and estimate these parts using the properties of sigmoid functions. For example, the integral over [−ε′, ε′] is bounded by the maximum step variance of c(x), and the other terms will be arbitrarily small for a sufficiently small α. Therefore we obtain

    | c(x) - c * \phi'_\alpha(x) | < \varepsilon/4.

As c*φ′_α(x) = c′*φ_α(x), and c′(x) is given by

    c'(x) = \sum_{i=1}^{n} c_i\, \delta(x - x_i),

c*φ′_α(x) is represented as follows:

    c * \phi'_\alpha(x) = \sum_{i=1}^{n} c_i\, \phi_\alpha(x - x_i) = \sum_{i=1}^{n} c_i\, \phi(x/\alpha - x_i/\alpha),

and so

    \Bigl| c(x) - \sum_{i=1}^{n} c_i\, \phi_\alpha(x - x_i) \Bigr| < \varepsilon/2.   (A.2)

Using (A.1) and (A.2) we obtain

    \Bigl| g(x) - \sum_{i=1}^{n} c_i\, \phi_\alpha(x - x_i) \Bigr| < \varepsilon,

and the proposition is proved. q.e.d.
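The appendix proof is constructive and easy to mimic numerically: approximate g by a step function and replace each Heaviside step H(x − xᵢ) by the scaled sigmoid φ((x − xᵢ)/α). In the sketch below (an illustration; g, the grids, and the values of α are arbitrary choices), the uniform error decreases with α down to the resolution of the underlying step function, which is set by the spacing of the jump points xᵢ.

```python
import numpy as np

def sigmoid(x):
    t = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + t), t / (1.0 + t))

g = lambda x: np.exp(-x ** 2) * np.sin(3.0 * x)   # continuous, ~compact support

# Step 1: a step function c(x) = sum_i c_i H(x - x_i) interpolating g at the
# jump points x_i; the c_i are the increments of g between consecutive points.
x_i = np.linspace(-4.0, 4.0, 161)
c_i = np.diff(np.concatenate(([0.0], g(x_i))))

# Step 2: replace each Heaviside step by the sigmoid phi((x - x_i)/alpha).
x = np.linspace(-4.0, 4.0, 2001)
for alpha in (0.2, 0.05, 0.01):
    approx = (c_i * sigmoid((x[:, None] - x_i) / alpha)).sum(axis=1)
    err = np.max(np.abs(approx - g(x)))
    print(f"alpha = {alpha:4}: max |g - sum_i c_i phi((x - x_i)/alpha)| = {err:.4f}")
```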
