

DESIGN ENHANCEMENT OF COMBINATIONAL NEURAL NETWORKS USING HDL BASED FPGA FRAMEWORK FOR PATTERN RECOGNITION
By PRIYANKA MEKALA * JEFFREY FAN **

* Research Assistant and PhD Candidate, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA. ** Assistant Professor, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA.

ABSTRACT

Fast-emerging, highly integrated multimedia devices require complex video/image processing tasks, leading to a very challenging design process that demands more efficient, higher-throughput processing systems. Neural networks are used in many of these imaging applications to represent complex input-output relationships. Software implementations of these networks attain accuracy, with tradeoffs among processing performance (achieving specified frame rates on large image data sets), power and cost constraints. The current trend is for conventional processors to be replaced by field-programmable gate array (FPGA) systems, owing to their high performance when processing large amounts of data. The goal is to design Combinational Neural Networks (CNN) for pattern recognition on an FPGA-based platform for accelerated performance; the enhancement in speed and computation from the hardware is compared to a software (MATLAB) model. The use of an HDL on the FPGA enables operations to be performed in parallel, allowing the exploitation of the vast parallelism found in many real-world applications such as robotics, controller-free gaming and sign/gesture recognition. As a validation of the CNN hardware model, a case study in pattern recognition is explored and implemented on a Xilinx Spartan 3E FPGA board. Mean squared error is used to measure the quality of learning in the trained network. The processing performance of this non-linear stochastic tool is determined by comparing the HDL simulations (parallel model) with the MATLAB design (sequential model), and the gains in training time and processing memory are derived.

Keywords: VHDL, Combinational Neural Networks, Back Propagation, Pattern Recognition.

INTRODUCTION

Neuroscience, the study of the human brain, is thousands of years old. This fascination with the human brain has led to the development of Artificial Neural Networks (ANNs), made possible by advances in electronics. ANNs have been used successfully in a broad spectrum of applications such as pattern recognition, data classification, control systems, signal processing and functional approximation. Much work in these fields relies on software simulations, and the capabilities of ANN models have been investigated using both analog and digital implementations (Torres-Huitzil, Girau, & Gauffriau, 2007). Digital implementations are more popular for their basic advantages of higher accuracy, lower noise sensitivity, more flexibility and compatibility with different types of processors. A digital implementation can use a digital signal processor, an FPGA or other programmable logic. An FPGA-based implementation is the best choice among the previously mentioned platforms, since it can work in parallel just as ANNs do (Cantrell & Wurtz, 1993)(Baker & Hammerstrom, 1989)(Blais & Mertz, 2001)(Vargas, Barba, Torres & Mattos, 2011). Previous research on implementing various kinds of neural networks on the HDL platform (Ali & Mohammed, 2010)(Omondi & Rajapakse, 2006)(Izeboudjen, Farah, Bessalah, Bouridane & Chikhi, 2008)(Schemmel, Meier & Schurmann, 2001) has focused on developing the neuron models and their validation.

The computations involved mostly fixed-point integers rather than floating point, which results in some false outputs; this can be fixed by introducing libraries that define float-type variables and vectors. Pattern recognition using neural networks was addressed recently in (Vargas, Barba, Torres & Mattos, 2011), where the pixel values of an image frame were used as inputs for recognition, causing increased memory usage and computation. We choose instead to perform the recognition on the bitmapped (depth 4) image rather than the grayscale (8-bit) image, gaining bandwidth in terms of memory storage. Also, once the image is preprocessed, the features are extracted and used as inputs in the proposed architecture.

In this paper, the authors present a new generic design of the Combinational Neural Network (CNN) proposed in earlier research (Mekala, Erdogan & Fan, 2010) for pattern recognition, based on the Xilinx Spartan 3E board and using a VHDL model called the HDL-CNN. The simulation of the VHDL models is facilitated by the use of stimulus sequences and checkers (e.g., VHDL test benches, mean square error). The training time and the variation in computations (which depend on global parameters defined by the user) are analyzed and displayed in later sections of this paper, and a comparison is made to establish the speed performance of the proposed model.

The rest of the paper is organized as follows: Section 1 presents the design progression using HDL and FPGA logistics, Section 2 explains the Combinational Neural Networks (CNN) and the HDL-CNN, Section 3 explains the sign/gesture recognition model, Section 4 presents the results, and the Conclusion closes the paper.

1. Design Progression using HDL

The first question that comes to mind is: why use a high-level design methodology (such as HDL) for CNN implementation as opposed to other object-oriented simulations? The answer is that high-speed processing can be achieved through dedicated hardware working in parallel, which can be implemented on FPGAs using HDL. ANNs are powerful systems capable of modeling complex input-output relationships; information is processed via mathematical models over interconnections of neurons. Some interesting features displayed by the network engine are adaptation, parallelism, classification, optimization and generalization.

The debate over whether to build a generic system that can be reprogrammed on user demand or a single specialized system dedicated to one application with high-speed performance still prevails (Omondi & Rajapakse, 2006). The researchers propose an HDL-based design methodology to trade off the high-level application requirements against the low-level FPGA hardware for pattern recognition. The HDL description has the advantages of being generic, flexible and dynamically reconfigurable on user demand, and it is useful for gaining more control over parallel processes. Efficient reusability and performance are derived by providing the characteristics of entities in a model library (Izeboudjen, Farah, Bessalah, Bouridane & Chikhi, 2008). Table 1 compares VHDL with the procedural languages and outlines the advantages of characterizing digital hardware with a hardware description language based on entity connections, concurrent operations, propagation delay and timing information (Omondi & Rajapakse, 2006)(Berry, 2002)(Schemmel, Meier & Schurmann, 2001)(Short, 2009)(Ashenden, 1995).

VHDL (hardware description) | Procedural languages (C, C++, MATLAB)
Contains components that are concurrent, i.e., run in parallel/simultaneously. | Traditional software languages like C, C++ and MATLAB are sequential.
Provides ways to describe propagation of time and signal dependencies. | No way to describe time and signal dependency.
Hardware oriented: digital logic design (operations and structure described at gate level and RT level in a hierarchical design). | Software oriented: binary executable (data-flow language and non-hierarchical design).
Supports unsynthesizable constructs useful in writing high-level models, test benches and other non-hardware artifacts needed in hardware logic design. | Explicit constructs and assignments are not supported by the procedural languages.
Static type checking: many errors can be caught before synthesis and/or simulation (though synthesis errors are hard to debug). | Errors can be analyzed only after debugging.
A rich collection of data types and a well-defined standard, with a full-featured language and module system (libraries and packages). | Object-oriented programs are written with pure logical or algorithmic thinking; inherently procedural (single-threaded), with limited syntactical and semantic support for concurrency.

Table 1. VHDL vs. procedural languages
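To make the concurrency row of Table 1 concrete, the minimal sketch below shows two concurrent signal assignments; the entity and signal names are illustrative, not from the paper. Both statements are (re)evaluated in parallel whenever their inputs change, regardless of the order in which they appear in the source.

```vhdl
-- Illustrative only: concurrent signal assignments in VHDL.
library ieee;
use ieee.std_logic_1164.all;

entity concur_demo is
  port (a, b, c : in  std_logic;
        x, y    : out std_logic);
end entity;

architecture rtl of concur_demo is
begin
  x <= a and b;   -- runs concurrently with the line below
  y <= b or c;    -- textual order in the file is irrelevant
end architecture;
```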

2. Hardware Descriptive Language - Combinational Neural Networks (HDL-CNN)

The CNN is a special class of ANN, described as follows. The design resembles a tree structure in addition to the generic architecture of a neural network. The previous software CNN design proposed in (Mekala, Erdogan & Fan, 2010) is based on address searches of the virtual memory of a CPU. This paper examines an alternative implementation of the CNN on a hardware platform, called the HDL-CNN, which modifies the architecture through a VHDL design on an FPGA to improve performance. A basic neural network engine, and the extension of the back-propagation network onto the HDL-CNN model, are described below.

2.1 Generic Neuron Model in HDL

To model an artificial neuron from a biological neuron, three basic components are used: inputs to the neuron, synaptic weights and an activation (threshold) function. The synapses of the biological neuron (the interconnections of the neural network, which give the strength of each connection) are modeled as synaptic weights. Mathematically, the neuron can be considered as three functions: two linear and one non-linear. All inputs are modified by the weights and summed together; this activity is referred to as a linear combination. The linear combination of the input stage and the aggregation are modeled as a simple MAC (multiply-and-accumulate) function, and the output of the MAC is passed through a non-linear activation threshold to determine the output. The activation function could be a step function (the simplest non-linear function), a ramp function or a sigmoid function (Mehrotra, Chilukuri & Ranka, 1997)(Caudill & Butler, 1992)(Stergiou & Siganos, 1996)(Dreyfus, 2005). Figure 1 shows the neural network engine with its three-layer structure (Fausett, 1994).
Figure 1. Generic neural network model: input signals x0, x1, ..., xp; synaptic weights wk0, ..., wkp with bias bk = wk0; a summing junction producing vk (the MAC layer of linear functions); and a non-linear activation function φ(·) with threshold θk yielding the output yk
Each neuron receives several inputs x_i and generates a pre-output v_k (where k indexes the neuron producing the output) through the linear multiply-and-accumulate function:

v_k = Σ_{i=0}^{p} w_{ki} x_i    (1)

where w_{ki} is the synaptic weight of the connection and p is the number of incoming inputs to the neuron. The model output y_k is obtained by passing the pre-output through the activation function φ(·) (the sigmoid function defined in Eq. 3):

y_k = φ(v_k)    (2)
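As a rough illustration of Eqs. (1) and (2), the sketch below writes such a neuron in VHDL as a streamed multiply-and-accumulate followed by an activation stage. The entity, port names and bit widths are illustrative assumptions, not the paper's actual design, and a simple step activation stands in for the sigmoid approximation discussed in Section 2.2.

```vhdl
-- Minimal sketch (assumed names/widths): a neuron accumulating
-- w_ki * x_i over p+1 streamed inputs (Eq. 1), then thresholding.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity neuron_mac is
  generic (P : natural := 15);              -- inputs are x_0 .. x_P
  port (clk, rst : in  std_logic;
        x_valid  : in  std_logic;           -- a new (x_i, w_ki) pair is valid
        x_i      : in  signed(15 downto 0); -- streamed input sample
        w_ki     : in  signed(15 downto 0); -- matching synaptic weight
        y_k      : out std_logic);          -- thresholded neuron output
end entity;

architecture rtl of neuron_mac is
  signal acc : signed(35 downto 0) := (others => '0');  -- accumulates v_k
  signal cnt : natural range 0 to P := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');
        cnt <= 0;
      elsif x_valid = '1' then
        if cnt = 0 then
          acc <= resize(x_i * w_ki, acc'length);  -- first term of a new v_k
        else
          acc <= acc + (x_i * w_ki);              -- multiply-accumulate step
        end if;
        if cnt = P then cnt <= 0; else cnt <= cnt + 1; end if;
      end if;
    end if;
  end process;

  -- Step activation standing in for phi(.): fire when v_k > 0
  y_k <= '1' when acc > 0 else '0';
end architecture;
```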

2.2 HDL-CNN Architecture Model

The CNN is built on the basic back-propagation network described in previous research (Mekala, Erdogan & Fan, 2010). The features extracted by the prior module block are divided into classes or stages, where each set of features carries some information on the probability of the recognition decision. Hence, from a set of M actual features to be extracted, the probabilistic decision is made on three levels, with the first level monitoring the other two. Each level is fed with a vector V of variable length, so there exist vectors of sizes L1, L2 and L3. Since the platform is designed to serve as a generic model, flexible to user demands, the value of M varies from application to application depending on the linearity of the output classes. A parallel communication bus is provided between the feature extraction layer and the three-level CNN recognition model to allow the flow of the three vector sets, as well as an initialization clock signal to choose either level 2 or level 3 once the decision at level 1 is made (Mekala, Erdogan & Fan, 2010).

The time and memory consumed in computing the feature vector depend on its length. Thus, instead of deriving all the values of the vector at once, three levels are used to improve the speed and performance of the HDL-CNN model.

The activation function used in modeling the CNN architecture is the sigmoid function. Each node in the network receives several input values and combines them to produce an output value; the node's activation function determines how these values are combined. The activation function must combine the input values in a non-linear manner so as to fit a wide range of task applications. Since each stage of the CNN is constructed as a back-propagation network (three layers: input, hidden and output), it is important that the activation function be continuously differentiable. Several functions meet these criteria, but the most commonly used is the sigmoid, described by Equation 3 (Kwan, 1992):

φ(x) = 1 / (1 + e^(-x))    (3)

The sigmoid is not easy to represent in digital design since it contains an exponential series. In object-oriented, software-based models it is defined with the help of a look-up table, consuming more memory resources. The HDL-CNN design instead implements a piecewise second-order approximation of the function, φ(x) ≈ c2·x² + c1·x + c0 on each segment, where c0, c1 and c2 are the coefficients of the quadratic (Tommiska, 2003). This requires two adders and three multipliers, redesigned as two MAC operations plus a shift register for calculating the square.

Each level's vector-based back-propagation module is evaluated as described below. Assume H is the vector of hidden-layer neurons, I the vector of input-layer neurons, W1 the weight matrix between the input and hidden layers, W2 the matrix of synapses connecting the hidden and output layers, th1 and th2 the biases applied to the computed activations (set to 1 in this design), T the target activation of the output layer, μ the momentum factor used to allow the previous weight change to influence the current one, and α the learning rate adopted for the training. The governing equations for each level of the CNN are given in Table 2 (Fausett, 1994).

Hidden-layer neuron activations:  H = φ(I·W1 + th1)
Output-layer neuron activations:  O = φ(H·W2 + th2)
Output-layer error:               D = O(1 - O)(O - T)
Hidden-layer error:               E = H(1 - H)·W2·D
Weight adjustment, second layer:  W2 = W2 + ΔW2_t, where ΔW2_t = α·H·D + μ·ΔW2_{t-1}
Weight adjustment, first layer:   W1 = W1 + ΔW1_t, where ΔW1_t = α·I·E + μ·ΔW1_{t-1}

Table 2. Equations governing each level of the CNN (back-propagation neural network)

Generally, error-threshold adjustment and learning-rate variations (the learning rate is kept between 0 and 0.5) add little to the process, so the idea of momentum is used to boost performance: on each pass through the layers, the weight change of a matrix of synapses is influenced by the previous pass's weight change, to a degree determined by the momentum term (which generally varies between 0 and 1). Weight adjustments are made as epoch-based training, and at the end of each epoch the cumulative error is tracked. All the factors, namely the sizes L1, L2 and L3 of the three level vectors, each BP module's input, hidden and output layers, and the learning-rate parameters, are user defined and set in a configuration file. Figure 2 shows the block diagram of the HDL-CNN model.

Figure 2. HDL-CNN recognition model block diagram
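To make the piecewise second-order idea concrete, the simulation-level sketch below implements one possible four-segment quadratic approximation of the sigmoid. The segment bounds and the coefficients c0, c1, c2 are illustrative choices (picked so the pieces join continuously, saturate outside [-4, 4], and match φ(0) = 0.5), not the values used in the paper or in (Tommiska, 2003); a synthesizable version would replace the real arithmetic with the fixed-point MAC and shift operations described above.

```vhdl
package sigmoid_pkg is
  function sigmoid_approx (x : real) return real;
end package;

package body sigmoid_pkg is
  -- phi(x) ~ c2*x^2 + c1*x + c0 on each segment (coefficients assumed).
  function sigmoid_approx (x : real) return real is
  begin
    if x <= -4.0 then
      return 0.0;                                -- saturate low
    elsif x < 0.0 then
      return 0.03125 * x * x + 0.25 * x + 0.5;   -- segment [-4, 0)
    elsif x < 4.0 then
      return -0.03125 * x * x + 0.25 * x + 0.5;  -- segment [0, 4)
    else
      return 1.0;                                -- saturate high
    end if;
  end function;
end package body;
```

At x = 0 both middle segments give 0.5 with slope 0.25, matching the true sigmoid's value and slope at the origin, and at x = ±4 they meet the saturated values exactly.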

To generate the random weights, a linear shift register module is used (weights between -1 and 1 are generated). When the asynchronous RESET is set high, the internal finite state machine of the CNN is reset to its initial state. During the initialization phase, the CNN randomizes all connection weights using the shift register module and, when this completes, it enters the idle state. Training and testing are done in two different modes, called TRAIN and TEST: when the mode is set to TRAIN, the CNN enters the training state from idle, and during TEST the CNN enters the run state, with the corresponding flags set.

2.2.1 Computational Analysis

The number of computations involved, and the gain acquired by shifting the architecture to the HDL platform, is modeled below. The feature vector V of variable length is a function of the number of patterns p considered for recognition and the number of features extracted n:

|V| = f(p, n)    (4)

A general CNN level is considered to have n input neurons, h hidden neurons and l output neurons, where n ≤ h ≤ 2n+1; MAC denotes a multiply-and-accumulate, A an adder, M a multiplier and S a shifter operation. For a sigmoid operation, the software solution uses a look-up table (LUT), where the time taken for one LUT access depends on the speed of the processor. In the HDL-CNN, the quadratic approximation uses one MAC and one shifter (MAC + S) to calculate a single neuron's activation. Furthermore, in the HDL-CNN the matrix operations involved in the weight-layer updates and error calculations are performed by dedicated adders, multipliers, shifters and MAC units, and are therefore done concurrently (in one complete clock cycle) for all neurons, rather than through the for-loop control used in software models. Each level of the CNN involves a different number of computations, listed in Table 3, where the values of n, h and l (the input-, hidden- and output-layer neurons) differ between level 1, level 2 and level 3. The average gain in training time, plotted and discussed in the Results section, supports this analysis.
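Before turning to the computation counts of Table 3, the following minimal VHDL skeleton illustrates the control flow just described (asynchronous RESET, weight randomization, IDLE, and the TRAIN/RUN modes). The entity, ports and handshake signals are illustrative assumptions, not the paper's actual interface.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cnn_ctrl is
  port (clk        : in  std_logic;
        reset      : in  std_logic;   -- asynchronous, active high
        mode_train : in  std_logic;   -- '1' selects TRAIN, '0' selects TEST
        start      : in  std_logic;
        init_done  : in  std_logic;   -- weight randomization finished
        busy       : out std_logic);
end entity;

architecture rtl of cnn_ctrl is
  type state_t is (S_INIT, S_IDLE, S_TRAIN, S_RUN);
  signal state : state_t := S_INIT;
begin
  process (clk, reset)
  begin
    if reset = '1' then
      state <= S_INIT;                -- back to the initial state
    elsif rising_edge(clk) then
      case state is
        when S_INIT =>                -- weights randomized by the shift register
          if init_done = '1' then
            state <= S_IDLE;
          end if;
        when S_IDLE =>
          if start = '1' then
            if mode_train = '1' then
              state <= S_TRAIN;
            else
              state <= S_RUN;
            end if;
          end if;
        when S_TRAIN => null;         -- epoch-based training elided
        when S_RUN   => null;         -- recognition mode elided
      end case;
    end if;
  end process;

  busy <= '0' when state = S_IDLE else '1';
end architecture;
```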
Computation | CNN (MATLAB) | HDL-CNN
Hidden-layer neuron activations | {nhM + (1 + h(n-1))A} + h(LUT) | {MAC}h + (MAC + S)h
Output-layer neuron activations | {hlM + (1 + l(h-1))A} + l(LUT) | {MAC}l + (MAC + S)l
Output-layer error | 2lA + 2lM | (l + 1)MAC
Hidden-layer error | (h(l-1))A + (lh + 2h)M | h(MAC + M)
Weight adjustment of second layer | (2hl)A + (2lh)M | hMAC + 2A
Weight adjustment of first layer | (nh)A + (3nh)M | nMAC + 2(A + M)

Table 3. Comparison of the number of computations involved in each level between the MATLAB CNN and the HDL-CNN

Figure 3. System overview of the sign recognition model

3. Sign/Gesture Recognition Model

A recognition model is shown in Figure 3. Sign recognition using neural networks is based on training the network with a database of signs/gestures (Vargas, Barba, Torres & Mattos, 2011). The architecture is designed around a camera-based recognition methodology. Once the video/image is obtained from the acquisition unit, the image (256×256 pixels) is processed in various stages and data is extracted to implement the recognition model. The first step after image acquisition is pre-processing. In general, raw image data consumes considerable memory and other resources owing to spatial and temporal redundancy. Pre-processing involves filtering and background subtraction, done in MATLAB, to account for environmental factors such as illumination, unwanted noise and other scene conditions. The pre-processed frames are taken as input (bitmapped images) into the FPGA for LoG (Laplacian of Gaussian) edge detection.

The feature extraction block and the CNN block operate in parallel over a dual-bus communication interface. The feature extraction layer extracts the necessary size, shape and state attributes of the hand (described in detail in (Mekala, Gao, Fan & Davari, 2011)). Since it is time consuming for the processor to wait until all 55 features have been extracted, the CNN layer is initiated upon the arrival of the first 15 feature elements at level 1, with the remaining 40 sent to level 2 or level 3 depending on the decision of the level 1 network (Mekala, Gao, Fan & Davari, 2011). The CNN is trained using the sign-language patterns from A to Z (excluding J and Z, which involve motion). To test the ability and performance of the network, a set of independent examples is used so the network generalizes to example sets not present in the training set.

The case study of American Sign Language (ASL) recognition proceeds step by step as follows:

- Image acquisition via camera and generation of still image-frame data (video-to-frames conversion with background subtraction), done in MATLAB and stored as .coe files (see the sample .coe sketch below).
- Transfer of the image data to the Xilinx Spartan 3E FPGA (field-programmable gate array) board from a PC via USB 2.0.
- Saving the data to the onboard SRAM (static random-access memory) so that image-processing functions can be performed on the image.
- Running the edge detection and feature extraction algorithms on the image and storing the feature vector back in the SRAM.
- Recognition via the CNN model, interacting in parallel with the feature extraction unit.
- Display of the input frame and the processed frame on the PC via the VGA controller, with the recognized sign shown on the LCD segment.

The model schematic of the sign recognition system is shown in Figure 4. It contains the SRAM module, the preprocessor module, the feature extraction module and the CNN recognition module. There are three main hardware components in this case-study realization of the CNN using HDL for sign-language recognition.
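For reference, a Xilinx .coe memory-initialization file, as mentioned in the first step above, has the following general shape; the radix and the data values here are placeholders, not the paper's actual image data.

```text
; illustrative .coe file: comments start with ';'
memory_initialization_radix=16;
memory_initialization_vector=
0F, 3A, 27, 81, 00,
FF, 12, 4C, 9E, 07;
```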
Figure 4. System overview of the sign recognition model: preprocessing, feature extraction and CNN engine

Figure 5. Xilinx Spartan 3E kit hardware connections

Figure 5 shows the connections between the Xilinx Spartan 3E FPGA board, the USB-to-peripheral communications module and a monitor with a VGA connection used to display the recognized output sign.

The authors adopt the Xilinx Integrated Software Environment (ISE 10.1), a powerful, flexible integrated design environment for designing Xilinx FPGA devices from basic modules up to complete microprocessor architectures. Project Navigator is the user interface that manages the entire design process, including design entry, simulation, synthesis, implementation and, finally, downloading the configuration to the FPGA device. PACE is responsible for placing and routing the code for optimization; iMPACT then generates the programming files and downloads the code to the hardware (Xilinx, 2009).

4. Results

Most components of the architecture perform in parallel, so the potentially very long training times are reduced considerably. The training time is the time taken to train the network for a given number of patterns p without duplicate input frames. Figure 6 plots the training time against the number of patterns trained. It clearly shows that the HDL-CNN model saves on average about 13× the time involved in training, compared with the software-based CNN model. The curve also shows that the time saved increases exponentially as the number of patterns increases, i.e., as the recognition task becomes more non-linear, which is why an average is used for comparison. The adjustments of the weight matrices and the neuron activation vectors are all parallel procedures; hence the gain in speed.

On average (mean over different numbers of patterns), the same design took around 381.36 seconds in MATLAB, while the VHDL design took 29.34 seconds of CPU time (30 seconds of real time), as listed in Table 4. The proposed HDL solution thus proved about 13× better in terms of network-training time and speed.

The images after background subtraction are converted from grayscale to bitmaps of depth 4 (sent as input to the ISE in the form of .coe files). Grayscale uses 8 bits to represent each pixel, whereas the bitmap uses 4; processing a 256×256 image therefore saves 262,144 bits (256×256×4), i.e., 256 Kb of bandwidth, as listed in Table 4.

To validate the performance of the HDL-CNN, the authors generate a mean-square-error test bench reflecting the actual operating conditions of the neural network. The test bench adopts the three-level feature vectors as input signals, with the weight coefficients of the hidden and output layers stored in RAMs. Both the mean square error and the evolution of the weights are written to text files and plotted in MATLAB. Epoch-based updating of the weights is performed; the mean square error decreases at an exponential rate and settles to an almost constant value, as shown in Figure 7. An epoch is one presentation of the entire training set to the neural network; to reach the minimum threshold error the training is repeated many times, counted as the number of epochs. The maximal weight change in each epoch decreases, finally reaching the least value possible. The best, intermediate and worst case scenarios are shown in Figure 7, where in the best case the evolution of the weights settles to constant values at the end of 1815 epochs, obtained by optimizing the global parameters (learning rate, momentum and error threshold).
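The sketch below illustrates the flavor of the MSE checker described above: a test bench that accumulates the squared output error over an epoch and appends the mean to a text file for later plotting in MATLAB. Everything here (the entity name, the number of output neurons, the stubbed outputs and targets) is an assumed placeholder, not the authors' actual test bench.

```vhdl
use std.textio.all;

entity mse_tb is
end entity;

architecture sim of mse_tb is
  constant N_OUT    : natural := 24;     -- output neurons (assumed count)
  constant N_EPOCHS : natural := 2000;   -- upper bound on training epochs
begin
  process
    file     log_f  : text open write_mode is "mse_log.txt";
    variable l      : line;
    variable sum_sq : real;
    variable y, t   : real;              -- network output and target
  begin
    for epoch in 1 to N_EPOCHS loop
      sum_sq := 0.0;
      for k in 1 to N_OUT loop
        -- y and t would be read from the design under test; stubbed here
        y := 0.5;
        t := 1.0;
        sum_sq := sum_sq + (y - t) * (y - t);
      end loop;
      write(l, real'image(sum_sq / real(N_OUT)));  -- one MSE value per epoch
      writeline(log_f, l);
      -- a real checker would exit once MSE falls below the error threshold
    end loop;
    wait;
  end process;
end architecture;
```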

Figure 6. HDL-CNN vs. CNN architecture: training time (in seconds) versus the number of patterns to be trained (6 to 11); the annotated speedups range from about 9× at the low end to 15× at the high end, around 13× on average

Metric | HDL-CNN (hardware solution) | CNN (MATLAB solution)
Average training time | 29.34 s | 381.36 s
Single-pattern recognition time | 43.45 ms | 0.52 s
Average performance | 95% | 92.8%
Average noise immunity | 51% | 48%
Epochs (best-case scenario) | 1815 | 1832
Limitations | J, Z (signs with motion) | -
Gain in bandwidth | 256 Kb per frame | -

Table 4. HDL-CNN recognition model vs. CNN recognition model (software vs. hardware architecture)


Figure 7. Simulations of the best, intermediate and worst case scenarios for each level of HDL-CNN training (15 input neurons):
- Best case: 11 hidden-layer neurons, learning rate 0.1, momentum 0.4, threshold 0.0001
- Intermediate case: 11 hidden-layer units, learning rate 0.1, momentum 0.7, threshold 0.0001
- Worst case: 11 hidden-layer units, learning rate 0.01, no momentum, threshold 0.001

The inclusion of momentum proves useful with training sets that include a few patterns very different from the rest (e.g., patterns B, W, Y are completely different from patterns A, C, O, where the finger tips are not present), as demonstrated by the worst case, which uses no momentum, compared with the best and intermediate cases. Normally, such patterns upset the convergence towards the minimum defined by the majority of the patterns. One could counter this with a very small learning rate (< 0.1), but convergence would then be very slow. Instead, the study keeps a moderate learning rate (0.1) and involves the previous weight change, in addition to the current one, in defining the weight update. This gives the training a certain inertia, which minimizes the disruption of the convergence caused by strange patterns.
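As a small worked instance of the second-layer update in Table 2, take the best-case settings α = 0.1 and μ = 0.4, and suppose the previous weight change was ΔW2_{t-1} = 0.05 while the current gradient term H·D = -0.02 (both values hypothetical):

```latex
\Delta W2_{t} = \alpha \,(H \cdot D) + \mu \,\Delta W2_{t-1}
             = 0.1 \times (-0.02) + 0.4 \times 0.05 = 0.018
```

Even though the current pattern pushes the weight in the opposite direction, the inherited momentum term keeps the update moving along the established trajectory, which is exactly the inertia described above.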

Figure 8 summarizes the results for a few test patterns, showing the LoG edge operator output and the sign recognized. A few noisy patterns are also tested to evaluate the accuracy of the architecture. Although the network is trained using different test patterns, the noise immunity levels vary for each sign involved; noise immunity is the level of noise under which a pattern can still be recognized accurately, and the correlation between signs plays a role in the inconsistency observed. The performance is calculated as the ratio of correctly recognized patterns to the total number of test patterns. On average, a performance of 95% is achieved, and retrieving one image pattern takes around 43.45 ms: given an input frame for testing, the time taken by the network architecture to process and recognize the sign is the single-pattern recognition time.


Figure 8. Sign-language alphabets recognized by the HDL-CNN recognition model. Each row pairs a LoG (Laplacian of Gaussian) edge-detected image with the sign alphabet recognized, and a noisy LoG edge-detected image with the sign recognized; the examples shown include the letters B, I, C, Y, V and L.
The signs involving motion (J and Z) are the limitation of this architecture compared with the software solution. The number of epochs needed to reach steady state and the noise immunity achieved are approximately equal in both cases. The inclusion of an SRAM vector bank to store the motion vectors of adjacent frames could be considered in future research to eliminate this limitation.

Conclusion

Combinational neural networks are among the most powerful tools for recognition/identification applications. The VHDL-based design of the sign recognition model performs well in identifying the static images of the American Sign Language alphabet, with implementation on the Xilinx Spartan 3E FPGA. Performance is achieved because the expensive operations are optimized in VHDL through the matrix-vector multiplication performed during each layer and level of the data flow. Dedicated adders and multipliers perform the weight-layer updates, which are therefore done concurrently for all neurons rather than through the for-loop control used in object-oriented languages. Arithmetic precision is also achieved through the use of floating-point libraries. Moving to this parallel hardware provided speedups of an order of magnitude (13× in this case). Many advanced families of FPGAs are now manufactured (Vargas, Barba, Torres & Mattos, 2011) that contain more logic blocks as well as video input controllers, which clearly implies that the design can be optimized towards different goals of area, power and speed. The use of VHDL for the architectural design is a very practical option when dealing with complex systems; FPGAs thus constitute a very powerful platform for implementing CNNs, since their parallel processing capabilities can genuinely be exploited to improve performance. To progress the research, the algorithm needs to be extended to recognize words or sentences, which involves processing a set of images (i.e., video frames) at a time with the help of a vector bank. The HDL-CNN architecture is also generic, in that it could be used for other pattern recognition tasks (such as objects or faces), provided the training sequences are varied accordingly.

References

[1]. Torres-Huitzil, C., Girau, B., and Gauffriau, A. (2007). Hardware/Software Co-design for Embedded Implementation of Neural Networks. Reconfigurable Computing: Architectures, Tools and Applications, Lecture Notes in Computer Science, 4419, 167-178.

[2]. Cantrell, C., and Wurtz, L. (1993). A Parallel Bus Architecture for Artificial Neural Networks. Southeastcon '93 Proceedings, IEEE (pp. 5). doi: 10.1109/SECON.1993.465674.

[3]. Baker, T., and Hammerstrom, D. (1989). Characterization of Artificial Neural Network Algorithms. IEEE International Symposium on Circuits and Systems, Vol. 1, 78-81. doi: 10.1109/ISCAS.1989.100296.

[4]. Blais, A., and Mertz, D. (2001, July). An Introduction to Neural Networks: Pattern Learning with the Back Propagation Algorithm. Retrieved from http://www.ibm.com/developerworks/library/l-neural/.

[5]. Vargas, L. P., Barba, L., Torres, C. O., and Mattos, L. (2011). Sign Language Recognition System using Neural Network for Digital Hardware Implementation. Journal of Physics: Conference Series, 274(1). doi: 10.1088/1742-6596/274/1/012051.

[6]. Ali, H. K., and Mohammed, E. Z. (2010, August). Design Artificial Neural Network using FPGA. International Journal of Computer Science and Network Security, 10(8), 88-92.

[7]. Omondi, A. R., and Rajapakse, J. C. (2006, July). FPGA Implementations of Neural Networks. Springer.

[8]. Izeboudjen, N., Farah, A., Bessalah, H., Bouridane, A., and Chikhi, N. (2008, July). High Level Design Approach for FPGA Implementation of ANNs. Encyclopedia of Artificial Intelligence, IGI Global Publishers. doi: 10.4018/978-1-59904-849-9.

[9]. Berry, D. L. (2002). VHDL Programming by Examples. McGraw-Hill, fourth edition.

[10]. Schemmel, J., Meier, K., and Schurmann, F. (2001, October). A VLSI Implementation of an Analog Neural Network Suitable for Genetic Algorithms. ICES '01 Proceedings of the 4th International Conference on Evolvable Systems: From Biology to Hardware. Springer-Verlag London, 50-61.

[11]. Short, Kenneth L. (2009). VHDL for Engineers. NJ: Pearson Prentice Hall.

[12]. Ashenden, Peter J. (1995). The Designer's Guide to VHDL. San Francisco: Morgan Kaufmann Publishers.

[13]. Mekala, P., Erdogan, S., and Fan, J. (2010, November). Automatic Object Recognition using Combinational Neural Networks in Surveillance Networks. IEEE 3rd International Conference on Computer and Electrical Engineering (ICCEE '10), Chengdu, China, Vol. 8, pp. 387-391.

[14]. Mehrotra, K., Chilukuri, K. M., and Ranka, S. (1997). Elements of Artificial Neural Networks. The MIT Press, p. 12.

[15]. Caudill, M., and Butler, C. (1992). Understanding Neural Networks: Computer Explorations. MIT Press.

[16]. Stergiou, C., and Siganos, D. (1996). Report: Neural Networks. Surprise 96 Journal, Vol. 4. Retrieved from http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html.

[17]. Dreyfus, G. (2005). Neural Networks: Methodology and Applications. Berlin, New York: Springer.

[18]. Kwan, H. K. (1992, July). Simple Sigmoid-like Activation Function Suitable for Digital Hardware Implementation. Electronics Letters, 28(15), 1379-1380. doi: 10.1049/el:19920877.

[19]. Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice Hall.

[20]. Mekala, P., Gao, Y., Fan, J., and Davari, A. (2011, March). Real-time Sign Language Recognition Based on Neural Network Architecture. Joint IEEE International Conference on Industrial Technology & 43rd Southeastern Symposium on System Theory (SSST '11), Auburn, AL, pp. 197-201.

[21]. Xilinx (2009). XST User Guide. Xilinx Inc. Retrieved from http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_2/xst.pdf.

[22]. Tommiska, M. T. (2003, November). Efficient Digital Implementation of the Sigmoid Function for Reprogrammable Logic. IEE Proceedings, Computers and Digital Techniques, 150(6). doi: 10.1049/ip-cdt:20030965.

ABOUT THE AUTHORS


Priyanka Mekala received her M.S. degree in Electrical Engineering from Arizona State University and her B.E. degree in Electronics and Communications from Osmania University, India, in May 2009 and June 2007, respectively. She began working on her Ph.D. degree in Electrical Engineering at FIU in fall 2009 and is currently a Ph.D. candidate. Her research interests include signal processing, real-time image/video processing and VLSI design/testing. She is also a student member of IEEE.

Dr. Fan is currently working as an Assistant Professor in Electrical and Computer Engineering at Florida International University. His research interests include very-large-scale integrated (VLSI) circuit simulation, modeling and optimization, bio-electronics, embedded real-time operating systems applied to robotic control, and wireless communications in sensor networks. Prior to his academic career, he served as Vice President of Vivavr Technology, Inc., and General Manager/co-founder of Musica Technologies, Inc. From 1988 to 2002, he held various senior technical positions in California at Western Digital, Emulex Corporation, Adaptec Inc., and Toshiba America. His product lines of research and development include Virtual Reality (VR) 3-D animation, MP3 players, hard drives, fiber channel adapters, SCSI/ATAPI adapters, RAID disk arrays, PCMCIA cards and laser printer controllers. He received his Ph.D. degree in Electrical Engineering from the University of California, Riverside in 2007, and his Master of Science degree in Electrical Engineering from the State University of New York at Buffalo in 1987. He also holds a Bachelor of Science degree in Electronics Engineering from National Chiao Tung University in Taiwan, R.O.C. He has served as a steering committee member of SSST, a technical program committee member for ICESS, CAMAD, ISQED and ISCAS, and an invited tutorial speaker for ASICON '07. He is a Senior Member of IEEE.

