
Deep Learning Algorithms: Sparse AutoEncoders

Gabriel Broadwin Nongsiej (13CS60R02) Under the guidance of Prof. Sudeshna Sarkar

1. Introduction

Supervised learning is one of the most powerful tools of AI, and has led to a number of innovative applications over the years. Despite its significant successes, supervised learning today is still severely limited. Specifically, most applications of it still require that we manually specify the input features x given to the algorithm. Once a good feature representation is given, a supervised learning algorithm can do well, but such features are not easy to identify and formulate, and the difficult feature-engineering work does not scale well to new problems. This seminar report describes the sparse autoencoder learning algorithm, which is one approach to automatically learning features from unlabeled data. The features produced by a sparse autoencoder have been found to do surprisingly well: they are competitive with, and sometimes superior to, the best hand-engineered features.

2. Artificial Neural Networks (ANN)

Consider a supervised learning problem where we have access to labeled training examples (x^{(i)}, y^{(i)}). Neural networks give a way of defining a complex, non-linear form of hypotheses h_{W,b}(x), with parameters W, b that we can fit to our data. An artificial neural network (ANN) consists of one or more neurons. A neuron is a computational unit that takes a set of inputs, computes a linear combination of these inputs, and outputs

h_{W,b}(x) = f(W^T x + b) = f\left(\sum_{i=1}^{3} W_i x_i + b\right),

where f : R → R is called the activation function.

Figure 1 : Simple Neuron
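To make the computation above concrete, here is a minimal sketch of a single neuron in Python with NumPy; the weight, bias, and input values are illustrative and not taken from the report.

import numpy as np

def sigmoid(z):
    # Logistic activation function f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(W, b, x):
    # h_{W,b}(x) = f(W^T x + b): weighted sum of the inputs plus bias, then activation
    return sigmoid(np.dot(W, x) + b)

# A neuron with 3 inputs, as in Figure 1 (illustrative values)
W = np.array([0.5, -0.3, 0.8])
b = 0.1
x = np.array([1.0, 2.0, 0.5])
print(neuron_output(W, b, x))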

A neural network is put together by hooking together many of our simple neurons, so that the output of a neuron can be the input of another. For example, here is a small neural network:

Figure 2 : Artificial Neural Network

In this figure, we have used circles to also denote the inputs to the network. The circles labelled +1 are called bias units and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. We also say that our example neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit. ANNs may contain one or more hidden layers between the input layer and the output layer. The most common choice is an n_l-layered network in which layer 1 is the input layer, layer n_l is the output layer, and each layer l is densely connected to layer l + 1. This is one example of a feedforward neural network, since the connectivity graph does not have any directed loops or cycles. Neural networks can also have multiple output units.

Figure 3 : Feedforward Network
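As an illustrative sketch (not part of the original report), the forward pass of such a feedforward network can be written as a loop over layers computing a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)}); the layer sizes and random weights below are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, x):
    # Propagate activations layer by layer: a^(l+1) = f(W^(l) a^(l) + b^(l))
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# 3 input units, 3 hidden units, 1 output unit (as in Figure 2)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
print(feedforward(weights, biases, np.array([1.0, 0.5, -0.2])))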

The parameters of any ANN are given below:
- the interconnection pattern between the different layers of neurons;
- the learning process for updating the weights of the interconnections;
- the activation function that converts a neuron's weighted input to its output activation.
Suppose we have a fixed training set {(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})} of m training examples. We can train our neural network using batch gradient descent. In detail, for a single training example (x, y), we define the cost function with respect to that single example to be

J(W, b; x, y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.

3. BackPropagation Algorithm

Some input-output patterns can be easily learned by single-layer neural networks (i.e. perceptrons). However, single-layer perceptrons cannot learn some relatively simple patterns, such as those that are not linearly separable. A single-layer network must learn a function that outputs a label using only the raw features of the data; there is no way for it to learn any abstract features of the input, since it is limited to having only one layer. A multi-layered network overcomes this limitation, as it can create internal representations and learn different features in each layer. Each higher layer learns more and more abstract features that can be used to describe the data. Each layer finds patterns in the layer below it, and it is this ability to create internal representations that are independent of outside input that gives multi-layered networks their power. The goal and motivation for developing the backpropagation algorithm is to find a way to train multi-layered neural networks so that they can learn the appropriate internal representations and thus learn an arbitrary mapping of input to output.

Algorithm for a 3-layer network (only one hidden layer):

  initialize network weights (often small random values)
  do
    for each training example x
      run a feedforward pass to predict what the ANN will output (activations)
      compute the error (prediction - actual) at the output units
      compute the updates \Delta w_h^{(l)} for all weights from the hidden layer to the output layer
      compute the updates \Delta w_i^{(l)} for all weights from the input layer to the hidden layer
      update the network weights
  until stopping criterion satisfied
  return the network

(A code sketch of this procedure is given at the end of this section.) The backpropagation algorithm, however, has several problems which can cause it to give sub-optimal results. Some of these problems are:
- the gradient progressively becomes diluted as it is propagated back through many layers;
- training can get stuck in local minima;
- in the usual setting, only labelled data can be used.
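The following is a minimal sketch of the above procedure for a 3-layer network in Python with NumPy, using sigmoid units and per-example (stochastic) updates; the hidden-layer size, learning rate, and XOR example are illustrative assumptions, not details from the report.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_3layer(X, Y, n_hidden=4, lr=1.0, epochs=5000, seed=0):
    # Stochastic-gradient backpropagation for one hidden layer.
    # X: (m, n_in) inputs, Y: (m, n_out) targets in [0, 1].
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    # initialize network weights (small random values)
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden)); b2 = np.zeros(n_out)
    for _ in range(epochs):                      # fixed epoch count as the stopping criterion
        for x, y in zip(X, Y):                   # for each training example
            # feedforward pass to get the activations
            a1 = sigmoid(W1 @ x + b1)
            a2 = sigmoid(W2 @ a1 + b2)
            # error terms at the output and hidden units
            d2 = (a2 - y) * a2 * (1 - a2)
            d1 = (W2.T @ d2) * a1 * (1 - a1)
            # update network weights by gradient descent
            W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
            W1 -= lr * np.outer(d1, x);  b1 -= lr * d1
    return W1, b1, W2, b2

# Illustrative use: learn XOR, a pattern that is not linearly separable.
# (Depending on the random start, training can still get stuck in a local
# minimum, one of the problems listed above.)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_3layer(X, Y)
print(sigmoid(W2 @ sigmoid(W1 @ X.T + b1[:, None]) + b2[:, None]).T.round(2))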

4. Deep Learning

Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations. The underlying assumption is that observed data is generated by the interactions of many different factors on different levels. Some of the reasons to use deep learning are:
- it performs far better than its predecessors;
- it is simple to construct;
- it allows abstraction to develop naturally;
- it helps the network to initialize with good parameters;
- it allows refining of the features so that they become more relevant to the task;
- it trades space for time: more layers but less hardware.

5. AutoEncoders

An autoencoder is an artificial neural network used for learning efficient codings. The aim of an autoencoder is to learn a compressed, distributed representation (encoding) for a set of data, which means it is used for dimensionality reduction. An autoencoder is trained to encode the input in some representation from which the input can be reconstructed; the target output is the input itself. The autoencoder thus tries to learn a function h_{W,b}(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so as to output a y that is similar to x (see Fig. 4). The first hidden layer is trained to replicate the input. After the error has been reduced to an acceptable range, the next layer can be introduced: the output of the first hidden layer is treated as the input for the next hidden layer. So we are always training one hidden layer while keeping all previous layers intact. For each layer, we try to minimize a cost function so that the output does not deviate too much from the input. Here, h_{W,b}(x) = f(W^T x + b), where f : R → R is the activation function; f can be a sigmoid or tanh function.

Figure 4 : AutoEncoder
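As an illustrative sketch (assuming sigmoid activations, squared reconstruction error, and made-up layer sizes, none of which are fixed by the report), the layer-wise training described above might look as follows; the output of the trained hidden layer is then used as the input for the next layer.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=500, seed=0):
    # Train one hidden layer so that h_{W,b}(x) is approximately x.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_hidden, n)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n, n_hidden)); b2 = np.zeros(n)
    for _ in range(epochs):
        for x in X:
            a1 = sigmoid(W1 @ x + b1)          # encoding (compressed representation)
            a2 = sigmoid(W2 @ a1 + b2)         # decoding (reconstruction of x)
            d2 = (a2 - x) * a2 * (1 - a2)      # the target output is the input itself
            d1 = (W2.T @ d2) * a1 * (1 - a1)
            W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
            W1 -= lr * np.outer(d1, x);  b1 -= lr * d1
    return W1, b1

# Greedy layer-wise use: the output of a trained hidden layer becomes
# the input of the next hidden layer, which is trained the same way.
X = np.random.default_rng(1).random((50, 8))       # toy training set
W1, b1 = train_autoencoder(X, n_hidden=4)
H = sigmoid(X @ W1.T + b1)                         # features from the first layer
W1b, b1b = train_autoencoder(H, n_hidden=2)        # train the next layer on them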

6. Sparsity

The number of hidden units in an ANN can vary, but even when the number of hidden units is large, we can impose a sparsity constraint on them: we constrain the average activation of each hidden neuron to be close to 0. This constraint makes most of the neurons inactive most of the time. This technique of constraining the autoencoder so that only a few of the units are activated at any given time is called sparsity. We introduce a sparsity parameter \rho, the desired average activation of each hidden unit, and we choose \rho close to 0 (for example, \rho = 0.05). Let a_j^{(l)}(x) denote the activation of hidden unit j when the network is given a specific input x, and let

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a_j^{(2)}\left(x^{(i)}\right) \right]

be the average activation of hidden unit j (averaged over the training set). A penalty will be used to penalize those hidden units whose average activation deviates from the sparsity parameter \rho. A penalty term that can be used is the Kullback-Leibler divergence KL(\rho \| \hat{\rho}_j).

So our penalty term in the equations becomes

\sum_{j=1}^{s_2} KL(\rho \| \hat{\rho}_j) = \sum_{j=1}^{s_2} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right],

where s_2 is the number of hidden units. Our overall cost function becomes

J_{sparse}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} KL(\rho \| \hat{\rho}_j),

where \beta controls the weight of the sparsity penalty. Our derivative calculation for the hidden-layer error terms changes from

\delta_i^{(2)} = \left( \sum_{j=1}^{s_3} W_{ji}^{(2)} \delta_j^{(3)} \right) f'(z_i^{(2)})

to

\delta_i^{(2)} = \left( \left( \sum_{j=1}^{s_3} W_{ji}^{(2)} \delta_j^{(3)} \right) + \beta \left( -\frac{\rho}{\hat{\rho}_i} + \frac{1 - \rho}{1 - \hat{\rho}_i} \right) \right) f'(z_i^{(2)}).

We will need to know \hat{\rho}_i to compute this term, so we run a forward pass on all the training examples first to compute the average activations on the training set, before computing backpropagation on any example. A sparse representation uses more features, but at any given time a significant number of the features will have a value close to 0. This leads to more localized features, where a particular node (or small group of nodes) with value 1 signifies the presence of a feature. A sparse autoencoder uses more hidden nodes. It can also learn from corrupted training instances, decoding only the uncorrupted instances and learning conditional dependencies.
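A minimal sketch of how the sparsity penalty and the extra delta term can be computed, assuming a single sigmoid hidden layer; the values of rho and beta and the random weights and data are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_divergence(rho, rho_hat):
    # KL(rho || rho_hat) between two Bernoulli distributions with means rho and rho_hat
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparsity_terms(W1, b1, X, rho=0.05, beta=3.0):
    # Forward pass over the whole training set first, to get the
    # average activation rho_hat of every hidden unit.
    A1 = sigmoid(X @ W1.T + b1)
    rho_hat = A1.mean(axis=0)
    penalty = beta * np.sum(kl_divergence(rho, rho_hat))               # added to the cost
    extra_delta = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))  # added to delta^(2)
    return penalty, extra_delta

# Illustrative use with random weights and data
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(4)
X = rng.random((50, 8))
print(sparsity_terms(W1, b1, X))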

We have shown how to train one layer of the autoencoder. Using the output of the first hidden layer as input, we can add another hidden layer after it and train it in the same fashion. This addition of layers can be repeated many times to create a deep network. Finally, supervised training (backpropagation) is performed on the last layer using the final features, followed by supervised training of the entire network to fine-tune all the weights.

7. Visualization

Let us take the example of an image processor.

Figure 5 : Sample Image

Given a picture, the autoencoder selects a grid of pixels to encode; let the grid be 640 x 480 pixels. Each pixel is used as an input x_i. The autoencoder tries to output the pixels so that they look similar to the original pixels. To do this, the autoencoder tries to learn the features of the image in each layer.
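As an illustrative sketch (the random image below merely stands in for the real picture), the pixel grid can be flattened into the autoencoder's input vector as follows.

import numpy as np

height, width = 480, 640
rng = np.random.default_rng(0)
image = rng.random((height, width))   # stand-in for the 640 x 480 grid of grey values

x = image.reshape(-1)                 # each pixel becomes one input x_i
print(x.shape)                        # (307200,) inputs for the first layer

# In practice many such vectors (often smaller patches of the grid) form the
# training set, and the autoencoder is trained to reconstruct each of them.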

Figure 6 : Image in the selected grid after whitening.

On passing the image grid through the autoencoder, the output obtained is similar to the image shown below:

Figure 7 : Output of the autoencoder for image grid

Each square in the figure shows the input image that maximally activates one of the many hidden units. We see that the different hidden units have learned to detect edges at different positions and orientations in the image.
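Such pictures are commonly produced by computing, for each hidden unit, the norm-constrained input that maximally activates it. The sketch below assumes a first-layer weight matrix W1 with one row per hidden unit, as in the earlier sketches; the sizes are illustrative.

import numpy as np

def maximally_activating_inputs(W1, patch_shape):
    # For hidden unit i, the norm-bounded input that maximally activates it is
    # x_j = W1[i, j] / ||W1[i, :]||; reshaped into patches, these rows give
    # images like the squares in Figure 7.
    norms = np.linalg.norm(W1, axis=1, keepdims=True)
    return (W1 / norms).reshape((-1,) + patch_shape)

# Illustrative use: 25 hidden units trained on 8 x 8 pixel patches
rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, 64))
patches = maximally_activating_inputs(W1, (8, 8))
print(patches.shape)   # (25, 8, 8)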

8. Applications

- Image Processing
- Computer Vision
- Automated Systems
- Natural Language Processing
- Sound Processing
- Tactile Recognition
- Data Processing

References

Deep Learning and Unsupervised Feature Learning, Winter 2011, Stanford University, Stanford, California 94305. https://www.stanford.edu/class/cs294a/
Stanford's Unsupervised Feature Learning and Deep Learning (UFLDL) tutorial, Stanford University, Stanford, California 94305. http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
Nando de Freitas, Deep Learning with Autoencoders, Department of Computer Science, University of British Columbia, Vancouver, Canada, March 2013 (lecture for CPSC 540).
Tom M. Mitchell, Machine Learning, McGraw-Hill, March 1997, ISBN 0070428077.
Honglak Lee, Tutorial on Deep Learning and Applications, NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning, University of Michigan, USA.
Itamar Arel, Derek C. Rose, and Thomas P. Karnowski, Deep Machine Learning: A New Frontier in Artificial Intelligence Research, IEEE Computational Intelligence Magazine, November 2010, The University of Tennessee, USA.
Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009).
