
BEIJING JIAOTONG UNIVERSITY

Deep Learning
Submitted To:
Associate Prof. Yidong Li

Submitted By:
Name: Muhammad Waqas Moin Sheikh
Student Id: 15129145
University: Beijing Jiaotong University
Department: Computer Application Technology

Table of Contents
1. Abstract
2. Introduction
3. Quick Overview of Deep Learning Algorithm
4. Types of Famous Learning Algorithms
5. Algorithms Grouped by Learning Style
   I. Supervised Learning
   II. Unsupervised Learning
   III. Semi-Supervised Learning
6. Algorithms Grouped by Similarity
   I. Clustering Algorithms
   II. Artificial Neural Network Algorithm
   III. Deep Learning Algorithm
7. Auto-Encoder
8. Restricted Boltzmann Machine (RBM)
9. Convolutional Neural Network (CNN)
10. Deep Architectures
11. Sparsity in Deep Learning
12. Visualization in Deep Learning
13. Deep Learning Applications
14. Deep Learning in Big Data Analytics
15. Conclusion


1. Abstract
Deep learning research aims at discovering learning algorithms that find multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms and breakthrough experiments, several challenges lie ahead. This report describes some of the key features and basic concepts behind deep learning algorithms.

2. Introduction
Deep learning uses deep neural networks (DNNs) with many layers and large datasets to
teach computers how to solve perceptual problems, such as detecting recognizable
concepts in data, translating or understanding natural languages, interpreting
information from input data, and more. Deep learning is used in the research
community and in industry to help solve many big data problems such as computer
vision, speech recognition and natural language processing. Practical examples
include vehicle, pedestrian and landmark identification for driver assistance; image
recognition; speech recognition; natural language processing; neural machine
translation and cancer detection.
Deep learning is one of the fastest growing areas of machine learning. It uses deep
neural networks to learn many levels of abstraction. These levels range from simple
concepts to complex ones; the more complex a concept, the more layers the network
needs. Each layer categorizes some kind of information, refines it and passes it along
to the next. These many layers are what put the "deep" into deep learning.
Deep learning enables a machine to use this process to build a hierarchical
representation. The first layer might look for simple edges. The next might look for
collections of edges that form simple shapes like rectangles, or circles. The third
might identify features like eyes and noses. After five or six layers, the neural
network can put these features together. The result: a machine that can recognize
faces.
GPUs are ideal for training neural networks; a process that could otherwise take
months now takes just weeks or days. That's because GPUs perform many
calculations at once, in parallel. And once a system is trained with GPUs,
scientists and researchers can put that learning to work.
That work involves tasks once thought impossible. Speech recognition is one
application. So is real-time voice translation from one language to another. Other
researchers are building systems that analyse the sentiment in social media
conversations.


3. Quick Overview of Deep Learning Algorithm.


The central concept behind all deep learning methodology is the automated
discovery of abstraction, with the belief that more abstract representations of data
such as images, video and audio signals tend to be more useful: they represent the
semantic content of the data, divorced from the low-level features of the raw data
(e.g., pixels, voxels, or waveforms). Deep architectures lead to abstract
representations because more abstract concepts can often be constructed in terms
of less abstract ones. Deep learning algorithms are special cases of representation
learning with the property that they learn multiple levels of representation. Deep
learning algorithms often employ shallow (single-layer) representation learning
algorithms as subroutines.

4. Types of Famous Learning Algorithms


There are two ways to think about and categorize the algorithms you may come
across in the field.
o The first is a grouping of algorithms by the learning style.
o The second is a grouping of algorithms by similarity in form or function (like
grouping similar animals together).
Both approaches are useful, but we will focus on the grouping of algorithms by
similarity and go on a tour of a variety of different algorithm types.

Figure: An ensemble of lines of best fit. Weak members are grey, the combined prediction is red.


5. Algorithms Grouped by Learning Style


I. Supervised Learning

o Input data is called training data and has a known label or result such as
spam/not-spam or a stock price at a time.
o A model is prepared through a training process where it is required to make
predictions and is corrected when those predictions are wrong. The training
process continues until the model achieves a desired level of accuracy on the
training data.
o Example problems are classification and regression.
o Example algorithms include Logistic Regression and the Back Propagation Neural
Network; a minimal logistic regression sketch follows this list.
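To make the supervised learning loop concrete, the following is a minimal sketch of logistic regression trained by gradient descent in plain NumPy. The tiny two-feature dataset, the learning rate and the iteration count are illustrative assumptions, not part of the original material.

```python
import numpy as np

# Toy labelled training data: two features per example, binary labels (illustrative only).
X = np.array([[0.2, 0.1], [0.4, 0.3], [0.8, 0.9], [0.9, 0.7]])
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the logistic (cross-entropy) loss: the model is corrected
# whenever its predictions disagree with the known labels.
for epoch in range(1000):
    p = sigmoid(X @ w + b)           # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of the loss w.r.t. the weights
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

print("predictions:", (sigmoid(X @ w + b) > 0.5).astype(int))
```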

II. Unsupervised Learning

o Input data is not labelled and does not have a known result.
o A model is prepared by deducing structures present in the input data. This may
be to extract general rules, it may be through a mathematical process to
systematically reduce redundancy, or it may be to organize data by similarity.
o Example problems are clustering, dimensionality reduction and association rule
learning.
o Example algorithms include the Apriori algorithm and k-Means; a minimal k-Means sketch follows this list.
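As an illustration of unsupervised learning, here is a minimal k-Means sketch in plain NumPy; the toy two-blob data, the choice of k = 2 and the fixed number of iterations are assumptions made only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy unlabelled data: two loose blobs in 2-D (illustrative only).
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]  # random initial centroids

for _ in range(10):
    # Assignment step: attach each point to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print("centroids:\n", centroids)
```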

III. Semi-Supervised Learning

o Input data is a mixture of labelled and unlabelled examples.


o There is a desired prediction problem but the model must learn the structures to
organize the data as well as make predictions.
o Example problems are classification and regression.
o Example algorithms are extensions to other flexible methods that make
assumptions about how to model the unlabelled data; a minimal self-training sketch follows this list.
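To make the semi-supervised idea concrete, here is a hedged self-training (pseudo-labelling) sketch: a simple classifier is fitted on the labelled portion, and its most confident predictions on unlabelled points are added back as pseudo-labels. The nearest-centroid classifier, the toy data and the confidence threshold are assumptions chosen only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
# A few labelled examples and many unlabelled ones (illustrative only).
X_lab = np.array([[0.0, 0.0], [3.0, 3.0]])
y_lab = np.array([0, 1])
X_unl = np.vstack([rng.normal(0, 0.4, (15, 2)), rng.normal(3, 0.4, (15, 2))])

def centroids(X, y):
    # Nearest-centroid "classifier": one mean per class.
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

for _ in range(3):  # a few self-training rounds
    c = centroids(X_lab, y_lab)
    d = np.linalg.norm(X_unl[:, None] - c[None], axis=2)
    pred = d.argmin(axis=1)
    conf = np.abs(d[:, 0] - d[:, 1])   # distance margin as a crude confidence
    keep = conf > 1.0                  # only trust confident pseudo-labels
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, pred[keep]])
    X_unl = X_unl[~keep]

print("labelled set grew to", len(y_lab), "examples")
```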

6. Algorithms Grouped by Similarity


I. Clustering Algorithms

o Clustering, like regression, describes both a class of problem and a class of
methods.
o Clustering methods are typically organized by their modelling approach, such as
centroid-based and hierarchical. All methods are concerned with using the
inherent structures in the data to best organize it into groups of maximum
commonality.
o The most popular clustering algorithms are:


a. k-Means
b. k-Medians
c. Expectation Maximisation (EM)
d. Hierarchical Clustering

II. Artificial Neural Network Algorithm

o Artificial Neural Networks are models that are inspired by the structure and/or
function of biological neural networks.
o They are a class of pattern-matching methods commonly used for regression and
classification problems, but they are really an enormous subfield comprising
hundreds of algorithms and variations for all manner of problem types.
o Note that I have separated out Deep Learning from neural networks because of
the massive growth and popularity in the field. Here we are concerned with the
more classical methods.
o The most popular artificial neural network algorithms are listed below (a minimal perceptron sketch follows the list):
a. Perceptron
b. Back-Propagation
c. Hopfield Network
d. Radial Basis Function Network (RBFN)
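As a minimal illustration of the classical methods above, here is a perceptron sketch in plain NumPy; the toy AND problem, the learning rate and the epoch count are illustrative assumptions.

```python
import numpy as np

# Toy linearly separable problem: learn the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

# Classical perceptron rule: nudge the weights whenever a prediction is wrong.
for epoch in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```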

III. Deep Learning Algorithm

o Deep Learning methods are a modern update to Artificial Neural Networks that
exploit abundant cheap computation.
o They are concerned with building much larger and more complex neural
networks and, as commented above, many methods address semi-supervised
learning problems where large datasets contain very little labelled data.
o Some of the reasons to use deep learning are:
i. It performs far better than its predecessors
ii. It is simple to construct
iii. It allows abstraction to develop naturally
iv. It helps the network initialize with good parameters
v. It allows refining of the features so that they become more relevant to the task
vi. It trades space for time: more layers but less hardware
o The most popular deep learning algorithms are:
a. Deep Belief Networks (DBN)
b. Deep/Restricted Boltzmann Machine (RBM)
c. Convolutional Neural Network (CNN)
d. Stacked Auto-Encoders

7. Auto-Encoder
An auto-encoder is a simple 3-layer neural network in which the output units are
directly connected back to the input units, i.e. output[i] has an edge back to input[i]
for every i. Typically, the number of hidden units is much smaller than the number of
visible (input/output) units. As a result, when you pass data through such a network,
it first compresses (encodes) the input vector to "fit" into a smaller representation,
and then tries to reconstruct (decode) it back. The task of training is to minimize the
reconstruction error, i.e. to find the most efficient compact representation (encoding)
for the input data. A minimal sketch follows.
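The following is a hedged sketch of such a 3-layer auto-encoder in plain NumPy, trained by gradient descent on the squared reconstruction error; the tied weights (the decoder reuses the transposed encoder matrix), the random toy data and the hyper-parameters are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))          # 100 toy input vectors with 8 "visible" units
n_hidden = 3                      # far fewer hidden units than visible ones

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(0, 0.1, (8, n_hidden))   # tied weights: the decoder uses W.T

for step in range(2000):
    H = sigmoid(X @ W)            # encode: compress each input to 3 numbers
    X_hat = sigmoid(H @ W.T)      # decode: try to reconstruct the input
    err = X_hat - X               # reconstruction error to be minimized
    # Back-propagate the squared error through the tied weights.
    d_out = err * X_hat * (1 - X_hat)
    d_hid = (d_out @ W) * H * (1 - H)
    grad_W = X.T @ d_hid + d_out.T @ H
    W -= 0.05 * grad_W / len(X)

print("mean squared reconstruction error:", np.mean(err ** 2))
```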
Auto-encoders are a family of unsupervised neural networks. There are quite a lot of
them, e.g. deep auto-encoders or those with different regularisation tricks attached,
e.g. de-noising, contractive and sparse auto-encoders. There even exist probabilistic
ones, such as generative stochastic networks or the variational auto-encoder. Their
most abstract form is a reconstruction loss $L = D(d(e(x)), x)$ for some encoder $e$,
decoder $d$ and discrepancy measure $D$, but we will go along with a much simpler
one for now: $L_{ae} = \sum \lVert W\,\sigma(W^{T}x) - x \rVert^{2}$, where $\sigma$ is
a nonlinear function such as the logistic sigmoid.

For natural image data, regularized auto-encoders and sparse coding tend to yield a
very similar W. However, auto-encoders are much more efficient and are easily
generalized to much more complicated models; e.g. the decoder can be highly
nonlinear, such as a deep neural network. Furthermore, one is not tied to the squared
loss (on which the estimation of W for $L_{sc}$ depends).
Also, the different methods of regularisation yield representations with different
characteristics. De-noising auto-encoders, for example, have been shown to be
equivalent to a certain form of RBM.

8. Restricted Boltzmann Machine (RBM)


RBMs share a similar idea, but use a stochastic approach. Instead of deterministic
units (e.g. logistic or ReLU), they use stochastic units with a particular (usually binary
or Gaussian) distribution. The learning procedure consists of several steps of Gibbs
sampling (propagate: sample hidden given visible; reconstruct: sample visible given
hidden; repeat) and of adjusting the weights to minimize the reconstruction error; a
minimal sketch of one such step follows.
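Below is a hedged sketch of one such learning step, contrastive divergence with a single Gibbs step (CD-1) on binary units, in plain NumPy; the toy data and the hyper-parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V = (rng.random((50, 6)) > 0.5).astype(float)  # 50 toy binary visible vectors

n_vis, n_hid = 6, 4
W = rng.normal(0, 0.1, (n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # Propagate: sample hidden units given the visible data.
    p_h = sigmoid(V @ W + b_hid)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Reconstruct: sample visible units given the hidden sample.
    p_v = sigmoid(h @ W.T + b_vis)
    v_recon = (rng.random(p_v.shape) < p_v).astype(float)
    p_h_recon = sigmoid(v_recon @ W + b_hid)
    # Adjust weights towards the data statistics and away from the reconstruction's.
    lr = 0.05 / len(V)
    W += lr * (V.T @ p_h - v_recon.T @ p_h_recon)
    b_vis += lr * (V - v_recon).sum(axis=0)
    b_hid += lr * (p_h - p_h_recon).sum(axis=0)

print("mean reconstruction error:", np.mean((V - p_v) ** 2))
```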

The intuition behind RBMs is that there are some visible random variables (e.g. film
reviews from different users) and some hidden variables (e.g. film genres or other
internal features), and the task of training is to find out how these two sets of
variables are actually connected to each other.

9. Convolutional Neural Network (CNN)


Convolutional Neural Networks are somewhat similar to the two models above, but
instead of learning a single global weight matrix between two layers, they aim to find
a set of locally connected neurons. CNNs are mostly used in image recognition. Their
name comes from the "convolution" operator, or simply "filter". In short, filters are an
easy way to perform a complex operation by means of a simple change of convolution
kernel: apply a Gaussian blur kernel and the image is smoothed; apply an
edge-detection kernel and you'll see all the edges; apply a Gabor kernel to get gradient
features. A sketch of fixed kernels in action follows.
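To make the idea of a filter concrete, here is a hedged sketch that applies two fixed kernels, a small box blur and a Laplacian-style edge detector, to a toy image using plain NumPy; both kernels and the image are illustrative assumptions (the loop-based convolution is written for clarity, not speed).

```python
import numpy as np

def convolve2d(img, kernel):
    """Valid 2-D convolution of a single-channel image with a small kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy "image": a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

blur = np.ones((3, 3)) / 9.0                      # smoothing kernel
edge = np.array([[0, -1, 0],
                 [-1, 4, -1],
                 [0, -1, 0]], dtype=float)        # Laplacian-style edge kernel

print("blurred:\n", convolve2d(img, blur).round(2))
print("edges:\n", convolve2d(img, edge).round(2))
```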

The goal of convolutional neural networks is not to use one of the predefined kernels,
but instead to learn data-specific kernels. The idea is the same as with auto-encoders
or RBMs: translate many low-level features (e.g. user reviews or image pixels) into a
compressed high-level representation (e.g. film genres or edges), but now the weights
are learned only from neurons that are spatially close to each other.

All three models have their use cases, pros and cons, but probably the most
important properties are:
o Auto-encoders are the simplest ones. They are intuitively understandable, easy to
implement and easy to reason about (e.g. it is much easier to find good
meta-parameters for them than for RBMs).
o RBMs are generative. That is, unlike auto-encoders, which only discriminate
some data vectors in favour of others, RBMs can also generate new data from
the learnt joint distribution. They are also considered more feature-rich and
flexible.
o CNNs are a very specific model that is mostly used for a very specific (though
pretty popular) task. Most of the top-level algorithms in image recognition are
somehow based on CNNs today, but outside that niche they are hardly applicable
(e.g. what is the reason to use convolution for film review analysis?).


10. Deep Architectures

So, if we already had PCA, why did we come up with auto-encoders and RBMs? It
turns out that PCA only allows a linear transformation of data vectors. That is, having
m principal components $c_1, \dots, c_m$, you can represent only vectors of the form
$x = \sum_{i=1}^{m} w_i c_i$. This is pretty good already, but not always enough. No
matter how many times you apply PCA to the data, the relationship will always stay
linear.

Auto-encoders and RBMs, on the other hand, are non-linear by nature and thus can
learn more complicated relations between visible and hidden units. Moreover, they
can be stacked, which makes them even more powerful. E.g. you train an RBM with n
visible and m hidden units, then you put another RBM with m visible and k hidden
units on top of the first one and train it too, and so on; exactly the same works with
auto-encoders (see the sketch at the end of this section).

But you don't just add new layers. On each layer you try to learn the best possible
representation of the data coming from the previous one.


The canonical example of such a deep network, trained on images, starts with
ordinary pixels, proceeds with simple filters, then with face elements, and finally ends
up with entire faces. This is the essence of deep learning.
Now note that in this example we worked with image data and sequentially took
larger and larger areas of spatially close pixels. Doesn't that sound familiar? Yes,
because it is an example of a deep convolutional network. Whether it is based on
auto-encoders or RBMs, it uses convolution to stress the importance of locality. That
is why CNNs are somewhat distinct from auto-encoders and RBMs.
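To make the stacking idea concrete, here is a hedged sketch of greedy layer-wise training of two auto-encoder layers in plain NumPy: the first layer is trained on the raw data, and its hidden codes then become the "visible" data for the second layer. The layer sizes, toy data and hyper-parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(X, n_hidden, steps=2000, lr=0.05):
    """Train one tied-weight auto-encoder layer on X; return (weights, codes)."""
    W = rng.normal(0, 0.1, (X.shape[1], n_hidden))
    for _ in range(steps):
        H = sigmoid(X @ W)                 # encode
        X_hat = sigmoid(H @ W.T)           # decode
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W) * H * (1 - H)
        W -= lr * (X.T @ d_hid + d_out.T @ H) / len(X)
    return W, sigmoid(X @ W)

X = rng.random((200, 16))                  # toy raw input (e.g. pixel vectors)

# Greedy layer-wise stacking: layer 2 is trained on the codes of layer 1.
W1, H1 = train_autoencoder_layer(X, n_hidden=8)
W2, H2 = train_autoencoder_layer(H1, n_hidden=4)

print("raw input dims:", X.shape[1], "-> layer 1 codes:", H1.shape[1],
      "-> layer 2 codes:", H2.shape[1])
```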

11. Sparsity in Deep Learning

The number of hidden units in an ANN (Artificial Neural Network) can vary, but even
when the number of hidden units is large, we can impose a sparsity constraint on
them. For example, we can constrain the average activation of each hidden neuron to
be close to 0, which makes most of the neurons inactive most of the time. This
technique of putting constraints on the auto-encoder so that only a few of the links
are activated at any given time is called sparsity. Sparse coding minimizes the
objective

$L_{sc} = \lVert WH - X \rVert_2^2 + \lambda \lVert H \rVert_1$,

where W is a matrix of bases, H is a matrix of codes and X is a matrix of the data we
wish to represent; $\lambda$ implements a trade-off between sparsity and
reconstruction. Note that if we are given H, estimation of W is easy via least squares.
In the beginning, however, we do not have H. Yet many algorithms exist that can
solve the objective above with respect to H. In fact, this is how we do inference: we
need to solve an optimisation problem if we want to know the codes h belonging to an
unseen x; a minimal sketch of one such solver follows.
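One standard way to solve that optimisation for H is iterative shrinkage-thresholding (ISTA); the sketch below uses plain NumPy, and the random bases W, the toy data and the choice of lambda are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 10, 20, 5                      # data dim, number of bases, number of samples
W = rng.normal(size=(d, k))              # matrix of bases (assumed given / already learnt)
X = rng.normal(size=(d, n))              # data we wish to represent
lam = 0.5                                # sparsity / reconstruction trade-off (lambda)

def soft_threshold(A, t):
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

# ISTA: a gradient step on the reconstruction term, then a shrink towards zero
# to enforce the L1 sparsity penalty on the codes H.
step = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)   # 1 / Lipschitz constant
H = np.zeros((k, n))
for _ in range(300):
    grad = 2.0 * W.T @ (W @ H - X)
    H = soft_threshold(H - step * grad, step * lam)

print("fraction of zero codes:", np.mean(H == 0.0))
print("reconstruction error:", np.linalg.norm(W @ H - X) ** 2)
```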


12. Visualization in Deep Learning

Let us take an example of an image processor.

Figure 1: Sample Image

Given a picture, the auto-encoder selects a grid of pixels to encode; let the grid be
640 x 480. Each pixel is used as an input x_i. The auto-encoder tries to output the
pixels so that they look similar to the original ones. To do this, the auto-encoder tries
to learn the features of the image in each layer.


Figure 2: Image in the selected grid, after whitening.

On passing the image grid through the auto-encoder, its output is similar to the image
shown below:

Figure 3: Output of the auto-encoder for the image grid


Each square in the figure shows the input image that maximally actives one of the
many hidden units. We see that the different hidden units have learned to detect
edges at different positions and orientations in the image.
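For a hidden unit with activation sigmoid(w . x), the input of bounded norm that maximally activates it is simply the normalised weight vector w / ||w||, so producing these squares reduces to reshaping each column of the learnt weight matrix into an image patch. A hedged sketch follows; the 8 x 8 patch size and the random weights (standing in for a trained auto-encoder) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = 8                                        # assume the auto-encoder sees 8x8 pixel patches
n_hidden = 16
W = rng.normal(size=(patch * patch, n_hidden))   # stand-in for learnt encoder weights

# For hidden unit j with activation sigmoid(w_j . x), the input of bounded norm
# that maximally activates it is the normalised weight vector itself.
filters = W / np.linalg.norm(W, axis=0, keepdims=True)

# Reshape each column into an 8x8 "image"; these are the squares one would plot.
tiles = [filters[:, j].reshape(patch, patch) for j in range(n_hidden)]
print(len(tiles), "filter images of shape", tiles[0].shape)
```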


13. Deep Learning Applications

o Image Processing
o Computer Vision
o Automated Systems
o Natural Language Processing
o Sound Processing
o Tactile Recognition
o Data Processing

14. Deep Learning in Big Data Analytics

In some Big Data domains the input corpus consists of a mix of labelled and
unlabelled data, e.g. cyber security [59], fraud detection [60] and computer vision
[45]. In such cases, Deep Learning algorithms can incorporate semi-supervised
training methods towards the goal of defining criteria for good data representation
learning. For example, after learning representations and patterns from the
unlabelled/unsupervised data, the available labelled/supervised data can be exploited
to further tune and improve the learnt representations and patterns for a specific
analytics task, such as semantic indexing or discriminative modelling. Active learning,
a variation of semi-supervised learning in data mining, could also be applicable: input
from crowdsourcing or human experts is used to obtain labels for some data samples,
which can then be used to better tune and improve the learnt data representations.

15. Conclusion

In contrast to more conventional machine learning and feature engineering
algorithms, Deep Learning has the advantage of potentially providing a solution to the
data analysis and learning problems found in massive volumes of input data. More
specifically, it aids in automatically extracting complex data representations from
large volumes of unsupervised data. This makes it a valuable tool for Big Data
Analytics, which involves analysing very large collections of raw data that are
generally unlabelled and uncategorized. The hierarchical learning and extraction of
different levels of complex data abstractions in Deep Learning provides a certain
degree of simplification for Big Data Analytics tasks, especially for analysing massive
volumes of data, semantic indexing, data tagging, information retrieval, and
discriminative tasks such as classification and prediction.
