UNIVERSITY
Deep Learning
Submitted To:
Associate Prof. Yidong Li
Submitted By:
Name: Muhammad Waqas Moin Sheikh
Student Id: 15129145
University: Beijing Jiaotong University
Department: Computer Application Technology
Table of Contents
1. Abstract
2. Introduction
5. II. Unsupervised Learning
6. I. Clustering Algorithms
15. Conclusion
BJTU
1. Abstract
Deep learning research aims at discovering learning algorithms that discover
multiple levels of distributed representations, with higher levels representing more
abstract concepts. Although the study of deep learning has already led to impressive
theoretical results, learning algorithms and breakthrough experiments, several
challenges lie ahead. This report describes some of the key features and basic
concepts behind deep learning algorithms.
2. Introduction
Deep learning uses deep neural networks (DNNs), which are many layers deep, together
with large datasets to teach computers how to solve perceptual problems, such as
detecting recognizable concepts in data, translating or understanding natural
languages, and interpreting information from input data. Deep learning is used in the
research community and in industry to help solve many big data problems in areas such
as computer vision, speech recognition and natural language processing. Practical
examples include vehicle, pedestrian and landmark identification for driver
assistance; image recognition; speech recognition; natural language processing;
neural machine translation; and cancer detection.
Deep learning is one of the fastest-growing areas of machine learning. It uses
multi-layer neural networks, convolutional neural networks among them, to learn many
levels of abstraction. The levels of abstraction range from simple concepts to
complex ones, and the more complex concepts require more layers in the network. Each
layer categorizes some kind of information, refines it and passes it along to the
next. These many layers are what put the "deep" into deep learning.
Deep learning enables a machine to use this process to build a hierarchical
representation. The first layer might look for simple edges. The next might look for
collections of edges that form simple shapes like rectangles, or circles. The third
might identify features like eyes and noses. After five or six layers, the neural
network can put these features together. The result: a machine that can recognize
faces.
GPUs are ideal for training neural networks; a process that could otherwise take
months now takes just weeks or days. That's because GPUs perform many
calculations at once, or in parallel. And once a system is trained with GPUs,
scientists and researchers can put that learning to work.
That work involves tasks once thought impossible. Speech recognition is one
application. So is real-time voice translation from one language to another. Other
researchers are building systems that analyse the sentiment in social media
conversations.
Figure: an example of an ensemble of lines of best fit. Weak members are grey; the
combined prediction is red.
I. Supervised Learning
o Input data is called training data and has a known label or result such as
spam/not-spam or a stock price at a time.
o A model is prepared through a training process where it is required to make
predictions and is corrected when those predictions are wrong. The training
process continues until the model achieves a desired level of accuracy on the
training data.
o Example problems are classification and regression.
o Example algorithms include Logistic Regression and the Back Propagation Neural
Network.
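The training process described above (predict, then correct the model when the prediction is wrong) can be sketched with Logistic Regression, one of the example algorithms named. This is only a minimal illustration: the toy data, the learning rate, the number of epochs and the function names are assumptions made for the sketch, not part of the report.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, labels, lr=0.5, epochs=200):
    """Fit weights and a bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # correction signal: close to zero when the prediction is right
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class label (0 or 1) for one input vector."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Made-up labelled training data: the label is 1 when both features are large.
train_x = [[0.0, 0.0], [0.2, 0.3], [0.4, 0.1], [0.9, 0.8], [0.7, 0.9], [1.0, 0.6]]
train_y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(train_x, train_y)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(train_x, train_y)) / len(train_y)
```

Here `accuracy` is measured on the training data itself, which matches the stopping criterion in the text; in practice a held-out set would normally be used as well.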
II. Unsupervised Learning
o Input data is not labelled and does not have a known result.
o A model is prepared by deducing structures present in the input data. This may
be to extract general rules. It may be through a mathematical process that
systematically reduces redundancy, or it may be to organize data by similarity.
o Example problems are clustering, dimensionality reduction and association rule
learning.
o Example algorithms include the Apriori algorithm and k-Means.
III. Semi-Supervised Learning
I. Clustering Algorithms
o Clustering, like regression, describes both the class of problem and the class of
methods.
o Clustering methods are typically organized by their modelling approach, such as
centroid-based and hierarchical. All methods are concerned with using the
inherent structures in the data to best organize the data into groups of maximum
commonality.
o The most popular clustering algorithms are:
a. k-Means
b. k-Medians
c. Expectation Maximisation (EM)
d. Hierarchical Clustering
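As an illustration of the centroid-based approach, k-Means can be sketched as two alternating steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a simplified version; taking the first k points as initial centroids, the fixed number of iterations and the sample points are assumptions chosen to keep it short and deterministic.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # naive, deterministic initialisation
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Made-up 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]
centroids, assignments = kmeans(points, k=2)
```

No labels are used anywhere: the grouping comes only from the distances between the points, which is what makes this an unsupervised method.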
II.
o Artificial Neural Networks are models that are inspired by the structure and/or
function of biological neural networks.
o They are a class of pattern-matching methods that are commonly used for regression
and classification problems, but they really form an enormous subfield comprising
hundreds of algorithms and variations for all manner of problem types.
o Note that I have separated out Deep Learning from neural networks because of
the massive growth and popularity in the field. Here we are concerned with the
more classical methods.
o The most popular artificial neural network algorithms are:
a. Perceptron
b. Back-Propagation
c. Hopfield Network
d. Radial Basis Function Network (RBFN)
III.
o Deep Learning methods are a modern update to Artificial Neural Networks that
exploit abundant cheap computation.
o They are concerned with building much larger and more complex neural
networks, and as commented above, many methods are concerned with
semi-supervised learning problems where large datasets contain very little labelled
data.
o Some of the reasons to use deep learning are:
i. Performs far better than its predecessors
ii. Simple to construct
iii. Allows abstraction to develop naturally
iv. Helps the network to initialize with good parameters
v. Allows refining of the features so that they become more relevant to the
task
vi. Trades space for time: more layers but less hardware
o The most popular deep learning algorithms are:
a. Deep Belief Networks (DBN)
b. Deep/Restricted Boltzmann Machine (RBM)
c. Convolutional Neural Network (CNN)
d. Stacked Auto-Encoders
7. Auto-Encoder
An auto encoder is a simple 3-layer neural network whose output units are tied back
to its input units: output[i] is trained to reproduce input[i] for every i.
Typically, the number of hidden units is much less than the number of visible
(input/output) ones. As a result, when you pass data through such a network, it first
compresses (encodes) the input vector to "fit" in a smaller representation, and then
tries to reconstruct (decode) it back. The task of training is to minimize the
reconstruction error, i.e. to find the most efficient compact representation
(encoding) for the input data.
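The encode, decode and minimize-the-error cycle can be sketched in a few lines. The following is a rough illustration under several assumptions that do not come from the report: a sigmoid hidden layer, a linear output layer, no bias terms, squared reconstruction error, plain stochastic gradient descent, and a hypothetical `AutoEncoder` class with made-up toy data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoEncoder:
    """Tiny 3-layer auto encoder: n_visible inputs -> n_hidden units -> n_visible outputs."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_visible)] for _ in range(n_hidden)]
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_visible)]

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w_enc]

    def decode(self, h):
        return [sum(w * hj for w, hj in zip(row, h)) for row in self.w_dec]

    def loss(self, data):
        """Mean squared reconstruction error over a list of input vectors."""
        total = 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
        return total / len(data)

    def train_step(self, x, lr):
        h = self.encode(x)
        r = self.decode(h)
        err = [ri - xi for ri, xi in zip(r, x)]  # gradient of 0.5 * squared error w.r.t. r
        # Backpropagate to the hidden layer before the decoder weights change.
        grad_h = [sum(err[i] * self.w_dec[i][j] for i in range(len(x))) * h[j] * (1 - h[j])
                  for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_dec[i][j] -= lr * err[i] * h[j]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w_enc[j][i] -= lr * grad_h[j] * x[i]


# Four made-up 4-dimensional inputs squeezed through 2 hidden units.
data = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
model = AutoEncoder(n_visible=4, n_hidden=2)
initial_loss = model.loss(data)
for _ in range(500):
    for x in data:
        model.train_step(x, lr=0.1)
final_loss = model.loss(data)
code = model.encode(data[0])  # the compressed, 2-value representation of the first input
```

Note that the training target is the input itself, so no labels are needed; comparing `initial_loss` with `final_loss` shows how much the reconstruction has improved.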
Auto encoders are a family of unsupervised neural networks. There are quite a lot of
them, e.g. deep auto encoders or those having different regularisation tricks
attached, such as de-noising, contractive, and sparse. There even exist probabilistic
ones, such as generative stochastic networks or the variational auto encoder. In
their most abstract form, they minimize a distance between an input and its
reconstruction, obtained by passing the input through an encoder and then a decoder.
For natural image data, regularized auto encoders and sparse coding tend to yield
very similar weight matrices W. However, auto encoders are much more efficient and
are easily generalized to much more complicated models. For instance, the decoder can
be highly nonlinear, e.g. a deep neural network. Furthermore, one is not tied to the
squared loss (on which the estimation of W for the sparse coding loss L_sc depends).
Also, the different methods of regularisation yield representations with different
characteristics. De-noising auto encoders have also been shown to be equivalent to a
certain form of RBMs.
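The comparison above mentions W and the sparse coding loss without spelling them out. A common way of writing the two objectives is given below, as an assumption about the intended notation rather than a quotation: X is the data matrix, W the weight (dictionary) matrix, H the sparse codes, \lambda a weighting factor for the sparsity term, e and d the encoder and decoder, and D any distance function.

```latex
% Sparse coding: squared reconstruction error plus an L1 sparsity penalty on the codes
L_{sc} = \lVert W H - X \rVert_2^2 + \lambda \lVert H \rVert_1

% Auto encoder: distance between an input x and its decoded encoding
L_{ae} = D\bigl(d(e(x)),\, x\bigr)
```

Written this way, the difference is visible directly: sparse coding is fixed to a squared reconstruction term, whereas D in the auto encoder loss can be chosen freely.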
(propagate: sample hidden given visible; reconstruct: sample visible given hidden;
repeat) and adjusting the weights to minimize reconstruction error.
The intuition behind RBMs is that there are some visible random variables (e.g. film
reviews from different users) and some hidden variables (like film genres or other
internal features), and the task of training is to find out how these two sets of
variables are actually connected to each other.
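The propagate, reconstruct and adjust cycle described for RBMs resembles one step of contrastive divergence (CD-1), so the sketch below assumes that this is the intended procedure. It is only an outline: bias terms are left out, the units are binary, and the function names, learning rate and toy data are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_hidden(visible, weights, rng):
    """Propagate: activation probabilities and sampled states of the hidden units."""
    probs = [sigmoid(sum(v * weights[i][j] for i, v in enumerate(visible)))
             for j in range(len(weights[0]))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def sample_visible(hidden, weights, rng):
    """Reconstruct: activation probabilities and sampled states of the visible units."""
    probs = [sigmoid(sum(h * weights[i][j] for j, h in enumerate(hidden)))
             for i in range(len(weights))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd1_update(visible, weights, rng, lr=0.1):
    """One CD-1 weight update in place; returns the squared reconstruction error."""
    h0_probs, h0 = sample_hidden(visible, weights, rng)
    v1_probs, v1 = sample_visible(h0, weights, rng)
    h1_probs, _ = sample_hidden(v1, weights, rng)
    for i in range(len(weights)):
        for j in range(len(weights[0])):
            # Data-driven statistics minus reconstruction-driven statistics.
            weights[i][j] += lr * (visible[i] * h0_probs[j] - v1[i] * h1_probs[j])
    return sum((a - b) ** 2 for a, b in zip(visible, v1_probs))


rng = random.Random(0)
# weights[i][j] connects visible unit i to hidden unit j (4 visible, 2 hidden).
weights = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(4)]
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # made-up binary "review" vectors
for _ in range(100):
    for v in data:
        last_error = cd1_update(v, weights, rng)
hidden_probs, hidden_states = sample_hidden(data[0], weights, rng)
```

Because the same weight matrix is used in both directions, the learned weights describe how the visible and hidden variables are connected, which is the intuition given above.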
The goal of convolutional neural networks is not to use one of the predefined
kernels, but instead to learn data-specific kernels. The idea is the same as with
auto encoders or RBMs: translate many low-level features (e.g. user reviews or image
pixels) to a compressed high-level representation (e.g. film genres or edges). The
difference is that the weights are now learned only from neurons that are spatially
close to each other.
All three models have their use cases, pros and cons, but probably the most
important properties are:
o Auto encoders are the simplest ones. They are intuitively understandable, and
easy to implement and to reason about (e.g. it's much easier to find good
meta-parameters for them than for RBMs).
o RBMs are generative. That is, unlike auto encoders, which only discriminate
some data vectors in favour of others, RBMs can also generate new data from
the learned joint distribution. They are also considered more feature-rich and
flexible.
o CNNs are a very specific model that is mostly used for a very specific (though
pretty popular) task. Most of the top-level algorithms in image
recognition are somehow based on CNNs today, but outside that niche they
are much less often applicable (e.g. what's the reason to use convolution for
film review analysis?).
10. Deep Architectures
The image above shows an example of such a deep network. We start with
ordinary pixels, proceed with simple filters, then with face elements and finally end
up with entire faces! This is the essence of deep learning.
Now note that in this example we worked with image data and sequentially took
larger and larger areas of spatially close pixels. Doesn't it sound familiar? Yes,
because it's an example of a deep convolutional network. Be it based on auto encoders
or RBMs, it uses convolution to stress the importance of locality. That's why CNNs
are somewhat distinct from auto encoders and RBMs.
11.
The number of hidden units in an ANN (Artificial Neural Network) can vary. But even
when the number of hidden units is large, we can impose a sparsity constraint on the
hidden units. We can set a constraint such that the average activation of each
hidden neuron is close to 0. This constraint makes most of the neurons inactive
most of the time. This technique of putting constraints on the auto encoder such
that only a few of the units are active at any given time is called sparsity. Sparse
coding, in turn, minimizes an objective that combines a reconstruction error with a
penalty that keeps the code sparse.
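One widely used way to express such a sparsity constraint is a Kullback-Leibler penalty between a small target activation and the measured average activation of each hidden unit. The report does not say which penalty is meant, so this choice, the target value of 0.05 and the function names are assumptions for illustration.

```python
import math


def mean_activations(hidden_activations):
    """Average activation of each hidden unit over a batch (one row per example)."""
    n = len(hidden_activations)
    return [sum(col) / n for col in zip(*hidden_activations)]


def kl_sparsity_penalty(hidden_activations, target=0.05):
    """Sum over hidden units of KL(target || average activation).

    Assumes activations lie strictly between 0 and 1, e.g. sigmoid outputs.
    """
    penalty = 0.0
    for rho_hat in mean_activations(hidden_activations):
        penalty += (target * math.log(target / rho_hat)
                    + (1 - target) * math.log((1 - target) / (1 - rho_hat)))
    return penalty


# Made-up activations of 2 hidden units over 2 examples.
sparse_batch = [[0.05, 0.05], [0.05, 0.05]]  # units are mostly inactive
dense_batch = [[0.5, 0.6], [0.4, 0.5]]       # units are active about half the time
sparse_penalty = kl_sparsity_penalty(sparse_batch)
dense_penalty = kl_sparsity_penalty(dense_batch)
```

The penalty is zero when the average activation equals the target and grows as the units become more active, so adding it to the reconstruction error pushes most neurons towards being inactive most of the time.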
12.
Given a picture, the auto encoder selects a grid of pixels to encode. Let the grid be
640 x 480 pixels. Each pixel is used as an input x_i. The auto encoder tries to
output pixels that look similar to the original pixels. To do this, the auto encoder
tries to learn the features of the image in each layer.
On passing the image grid through the auto encoder, the output of the auto encoder
is similar to the image shown below:
13.
14.
Image Processing
Computer Vision
Automated systems
Natural Language Processing
Sound Processing
Tactile Recognition
Data Processing
In some Big Data domains, the input corpus consists of a mix of both labelled and
unlabelled data, e.g., cyber security [59], fraud detection [60], and computer vision
[45]. In such cases, Deep Learning algorithms can incorporate semi-supervised
training methods towards the goal of defining criteria for good data representation
learning. For example, following learning representations and patterns from the
unlabelled/unsupervised data, the available labelled/supervised data can be
exploited to further tune and improve the learnt representations and patterns for a
specific analytics task, including semantic indexing or discriminative modelling.
Active learning methods, a variation of semi-supervised learning in data mining,
could also be applicable towards obtaining improved data representations: input
from crowdsourcing or human experts can be used to obtain labels for some data
samples, which can then be used to better tune and improve the learnt data
representations.
15. Conclusion