support vector machine, decision tree

Sep 24, 2016

© All Rights Reserved

1. INTRODUCTION:

A Radial Basis Function Network (RBFN) is a particular type of neural network. The term Artificial Neural Network generally refers to the Multilayer Perceptron (MLP). Each neuron in an MLP takes the weighted sum of its input values. That is, each input value is multiplied by a coefficient, and the results are all summed together. A single MLP neuron is a simple linear classifier, but complex non-linear classifiers can be built by combining these neurons into a network.
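
The weighted sum computed by a single neuron can be sketched in a few lines of Python. This is only an illustration: the bias term, the threshold at zero and the weight values are assumptions chosen so that the neuron behaves like a logical AND, and none of them come from the text.

```python
def neuron_output(inputs, weights, bias=0.0):
    """Multiply each input by its coefficient, sum the results, then threshold.

    The bias and the hard threshold at zero are assumptions made for this sketch.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


# Hypothetical weights: the neuron fires only when both binary inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron_output([a, b], and_weights, and_bias))
```

Because the decision depends only on which side of the line w1*x1 + w2*x2 + bias = 0 the input falls, a single neuron of this kind can only separate classes linearly.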

The RBFN approach is more intuitive than the MLP. An RBFN performs classification by measuring the input's similarity to examples from the training set. Each RBFN neuron stores a prototype, which is just one of the examples from the training set. When we want to classify a new input, each neuron computes the Euclidean distance between the input and its prototype. Roughly speaking, if the input more closely resembles the class A prototypes than the class B prototypes, it is classified as class A.

2. RBF NETWORK ARCHITECTURE:

The above illustration shows the typical architecture of an RBF Network. It consists of an input vector, a layer of RBF neurons, and an output layer with one node per category or class of data.

2.1 The Input Vector:

The input vector is the n-dimensional vector that you are trying to classify. The entire input vector is shown to each of the RBF neurons.

2.2 The RBF Neurons:

Each RBF neuron stores a prototype vector, which is just one of the vectors from the training set. Each RBF neuron compares the input vector to its prototype and outputs a value between 0 and 1, which is a measure of similarity. If the input is equal to the prototype, the output of that RBF neuron will be 1. As the distance between the input and the prototype grows, the response falls off exponentially towards 0. The shape of the RBF neuron's response is a bell curve, as illustrated in the network architecture diagram.

The neuron's response value is also called its activation value.

2.3 The Output Nodes:

The output of the network consists of a set of nodes, one per category that we are trying to classify. Each output node computes a sort of score for the associated category. Typically, a classification decision is made by assigning the input to the category with the highest score.

The score is computed by taking a weighted sum of the activation values from every RBF neuron.
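
The three layers described above can be sketched as a small forward pass. This is a minimal illustration, not a trained network: the Gaussian-style activation with a width coefficient `beta`, the prototypes, the output weights and the class labels are all assumptions made up for the example.

```python
import math


def rbf_activation(x, prototype, beta=1.0):
    """Similarity in (0, 1]: equals 1 when x is the prototype, decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-beta * sq_dist)


def rbfn_scores(x, prototypes, output_weights, beta=1.0):
    """One score per class: a weighted sum of every RBF neuron's activation."""
    activations = [rbf_activation(x, p, beta) for p in prototypes]
    return [sum(w * a for w, a in zip(row, activations)) for row in output_weights]


def rbfn_classify(x, prototypes, output_weights, labels, beta=1.0):
    """Assign the input to the category with the highest score."""
    scores = rbfn_scores(x, prototypes, output_weights, beta)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Hypothetical network: two prototypes near class A, one near class B.
prototypes = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
output_weights = [
    [1.0, 1.0, -1.0],   # weights of the class A output node
    [-1.0, -1.0, 1.0],  # weights of the class B output node
]
labels = ["A", "B"]

print(rbfn_classify((0.2, 0.3), prototypes, output_weights, labels))
print(rbfn_classify((4.5, 5.2), prototypes, output_weights, labels))
```

In this sketch each output node gives positive weight to the prototypes of its own class and negative weight to the others, so an input close to the class A prototypes receives the higher class A score.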

3. RBF NEURON ACTIVATION FUNCTION:

Each RBF neuron computes a measure of the similarity between the input and its prototype vector (taken from the training set). Input vectors which are more similar to the prototype return a result closer to 1.

The RBF neuron activation function is typically written as:
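
The equation itself is not reproduced in this copy. A commonly used form that matches the behaviour described above (a response of 1 at the prototype, falling off exponentially as the distance grows) is the Gaussian-style function below. The symbols are assumptions rather than notation taken from the text: x is the input vector, \mu is the neuron's prototype vector and \beta > 0 is a coefficient that controls the width of the bell curve.

```latex
\varphi(x) = e^{-\beta \, \lVert x - \mu \rVert^{2}}
```

With this form, \varphi(\mu) = e^{0} = 1, and \varphi(x) approaches 0 as the Euclidean distance \lVert x - \mu \rVert increases.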

1. INTRODUCTION TO LEARNING ALGORITHMS:

KNN, decision trees and neural nets are all supervised learning algorithms. Their general goal is to make accurate predictions about unknown data after being trained on known data.

Data comes in the form of examples with the general form (x1, ..., xn, y), where x1, ..., xn are known as features, inputs or dimensions and y is the output or class label.

Both the xi and y can be discrete (taking on specific values, e.g. {0, 1}) or continuous (taking on a range of values, e.g. [0, 1]).

In training we are given (x1, ..., xn, y) tuples. In testing (classification), we are given only (x1, ..., xn) and the goal is to predict y with high accuracy.

Training error is the classification error measured when the training data itself is used for testing.

Testing error is the classification error on data not seen in the training phase.
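
The difference between the two error measures can be illustrated with a deliberately naive classifier. The sketch below is an assumption-laden toy: the "memorizer" model, its default label and the tiny data sets are all hypothetical and only serve to show how each error is computed.

```python
def error_rate(predict, examples):
    """Fraction of (features, label) examples that predict() gets wrong."""
    wrong = sum(1 for features, label in examples if predict(features) != label)
    return wrong / len(examples)


def make_memorizer(training, default=0):
    """Toy model: look up the stored label, fall back to a default for unseen inputs."""
    table = {features: label for features, label in training}
    return lambda features: table.get(features, default)


# Hypothetical (x1, x2, y) tuples, stored as ((x1, x2), y).
training = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1)]
testing = [((1, 1), 1), ((0, 0), 0)]

predict = make_memorizer(training)
print("training error:", error_rate(predict, training))
print("testing error:", error_rate(predict, testing))
```

The memorizer is perfect on the data it was trained on but misclassifies the unseen point (1, 1), which is why a low training error alone says little about accuracy on new data.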

2. K NEAREST NEIGHBOURS:

K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the beginning of the 1970s.

1-NN:

Given an unknown point, pick the closest 1 neighbor by some distance measure.

The class of the unknown point is the 1-nearest neighbor's label.

K-NN:

Given an unknown point, pick the k closest neighbors by some distance function.

The class of the unknown point is the mode of the k nearest neighbors' labels.

k is usually chosen to be odd so that a vote between two classes cannot end in a tie.
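
The 1-NN and K-NN rules above can be sketched as follows. The training points and labels are made up for illustration, and Euclidean distance (via `math.dist`, available from Python 3.8) is an assumed choice of distance function.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Return the mode of the labels of the k training points closest to query."""
    neighbors = sorted(training, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set of ((x1, x2), label) examples.
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 7.0), "B"),
]

print(knn_classify((2.0, 2.0), training, k=1))  # 1-NN: label of the single closest point
print(knn_classify((6.5, 6.5), training, k=3))  # 3-NN: majority of the three closest
```

Note that nothing is learned up front: the algorithm simply stores the cases and does all of its work at classification time.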

3. ALGORITHM:

A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function.

If K = 1, then the case is simply assigned to the class of its nearest neighbor.

It should also be noted that all three distance measures are only valid for continuous variables. For categorical variables the Hamming distance must be used.
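
The three distance measures are not listed in this copy. The sketch below assumes they are the Euclidean, Manhattan and Minkowski distances, which are the usual choices for continuous variables, and adds the Hamming distance for categorical ones. The function names and sample values are illustrative only.

```python
def minkowski(a, b, q):
    """Minkowski distance of order q between two numeric vectors."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)


def euclidean(a, b):
    """Minkowski distance with q = 2."""
    return minkowski(a, b, 2)


def manhattan(a, b):
    """Minkowski distance with q = 1."""
    return minkowski(a, b, 1)


def hamming(a, b):
    """Number of positions at which two categorical vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(hamming(("red", "small", "yes"), ("red", "large", "no")))
```

The Hamming distance only asks whether two values are equal, so it does not need the values to have any numeric meaning or ordering.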

Choosing the optimal value for K is best done by first inspecting the data. In general, a larger K value is more precise as it reduces the overall noise, but there is no guarantee. Cross-validation is another way to retrospectively determine a good K value, by using an independent dataset to validate it.
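
One simple way to validate K on independent data is sketched below. It uses a single hold-out validation set, which is the simplest variant of the idea (full k-fold cross-validation would repeat this over several splits). The data, the candidate values and the helper names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(query, training, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(training, key=lambda ex: math.dist(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(training, validation, k):
    """Share of validation examples classified correctly for a given K."""
    hits = sum(1 for x, y in validation if knn_predict(x, training, k) == y)
    return hits / len(validation)


def choose_k(training, validation, candidates):
    """Pick the candidate K with the best validation accuracy (first one on ties)."""
    return max(candidates, key=lambda k: validation_accuracy(training, validation, k))


# Hypothetical data: ((2, 2), "B") is a noisy point inside the "A" cluster.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "B"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"),
]
validation = [((1.8, 1.8), "A"), ((6.5, 6.5), "B")]

for k in (1, 3, 5):
    print(k, validation_accuracy(training, validation, k))
print("chosen K:", choose_k(training, validation, (1, 3, 5)))
```

With K = 1 the noisy point decides the vote for the first validation case, while larger values of K outvote it, which illustrates how a larger K can reduce the effect of noise.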

Example:

Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors) and Default is the target.

We can now use the training set to classify an unknown case (Age=48 and Loan=$142,000) using Euclidean distance. If K=1, then the nearest neighbor is the last case in the training set, with Default=Y.

D = Sqrt[(48-33)^2 + (142000-150000)^2] = 8000.01 >> Default=Y

With K=3, there are two Default=Y and one Default=N among the three closest neighbors. The prediction for the unknown case is again Default=Y.
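
The distance in the example can be checked with a few lines of Python. Only the two points that appear in the calculation above are used; the rest of the training table is not reproduced here, and the variable names are illustrative.

```python
import math

unknown = (48, 142000)   # (Age, Loan) of the case to classify
nearest = (33, 150000)   # (Age, Loan) of the nearest training case, Default=Y

age_term = (unknown[0] - nearest[0]) ** 2
loan_term = (unknown[1] - nearest[1]) ** 2
d = math.sqrt(age_term + loan_term)

print(age_term, loan_term)
print(round(d, 2))
```

The squared Loan difference (64,000,000) is far larger than the squared Age difference (225), so with unscaled variables the distance is determined almost entirely by Loan.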

SUPPORT VECTOR MACHINE (SVM)

1. INTRODUCTION:

Support Vector Machine (SVM) is a classification and regression prediction tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fitting to the data. Support Vector Machines can be defined as systems which use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory.

SVM was first introduced in 1992. It became popular because of its success in handwritten digit recognition, where it achieved a 1.1% test error rate.

2. SVM AND CLASSIFICATION:

An SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

For a linearly separable set of 2D points which belong to one of two classes, find a separating straight line.

In the above picture you can see that there exist multiple lines that offer a solution to the problem. Is any of them better than the others? We can intuitively define a criterion to estimate the worth of the lines:

A line is bad if it passes too close to the points, because it will be sensitive to noise and will not generalize correctly. Therefore, our goal should be to find the line passing as far as possible from all points.

The operation of the SVM algorithm is then based on finding the hyperplane that gives the largest minimum distance to the training examples. Twice this distance is known as the margin in SVM theory. Therefore, the optimal separating hyperplane maximizes the margin of the training data.
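
The "largest minimum distance" criterion can be made concrete by comparing two candidate lines on a toy data set. This sketch only evaluates given lines, it does not train an SVM. The points, the two lines and the representation of a line as w1*x1 + w2*x2 + b = 0 are assumptions made for illustration.

```python
import math


def distance_to_line(point, w, b):
    """Perpendicular distance from a 2D point to the line w . x + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])


def min_distance(points, w, b):
    """Smallest distance from the line to any training point."""
    return min(distance_to_line(p, w, b) for p in points)


# Hypothetical linearly separable points: two from each class.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Two candidate separating lines, both of which split the classes correctly.
line_a = ((1.0, 1.0), -5.5)  # x1 + x2 = 5.5, midway between the classes
line_b = ((1.0, 1.0), -7.0)  # x1 + x2 = 7, passes close to (4, 4)

for name, (w, b) in (("line_a", line_a), ("line_b", line_b)):
    d = min_distance(points, w, b)
    print(name, "min distance:", round(d, 3), "margin:", round(2 * d, 3))
```

Both lines separate the data, but line_a keeps a larger minimum distance to the points and therefore has the larger margin, which is the line the criterion above prefers.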

Boundaries

The decision boundary should be as far away from the data of both classes as possible.

The distance between the origin and the line w^T x = k is |k|/||w||.

Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi. The decision boundary should classify all points correctly.

The decision boundary can be found by solving the following constrained optimization problem
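
The problem statement is not reproduced in this copy. The standard hard-margin formulation, which fits the definitions above, is sketched below. The bias term b is an assumption of this sketch, since the text names only w, the xi and the yi.

```latex
\begin{aligned}
\min_{w,\,b} \quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to} \quad & y_i \,\bigl(w^{T} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```

The constraints express that every point is classified correctly and lies on or outside the lines w^T x + b = 1 and w^T x + b = -1. By the distance formula above, those two lines are 2/||w|| apart, so minimizing ||w|| maximizes the margin.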

DECISION TREES

1. INTRODUCTION:

A decision tree is a simple representation for classifying examples. Decision tree learning is one of the most successful techniques for supervised classification learning. Each element of the domain of the classification is called a class.

A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with a feature are labeled with each of the possible values of the feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes.

To classify an example, filter it down the tree, as follows:

Example:

The above figure shows two possible decision trees. Each decision tree can be used to classify examples according to the user's action. To classify a new example using the tree on the left, first determine the length. If it is long, predict skips. Otherwise, check the thread. If the thread is new, predict reads. Otherwise, check the author and predict reads only if the author is known.

The tree on the right makes probabilistic predictions when the length is short. In this case, it predicts reads with probability 0.82 and so skips with probability 0.18.
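
The two trees described above can be written down and traversed as follows. This is a sketch under stated assumptions: the nested-tuple representation, the value names "follow_up" and "unknown" (the text only says "otherwise"), and the "long" branch of the right-hand tree predicting skips are not given in the text.

```python
# An internal node is (feature, {value: subtree}); a leaf is a class name
# or a probability distribution over the classes.
left_tree = ("length", {
    "long": "skips",
    "short": ("thread", {
        "new": "reads",
        "follow_up": ("author", {"known": "reads", "unknown": "skips"}),
    }),
})

right_tree = ("length", {
    "long": "skips",  # assumed: the text only describes the short branch
    "short": {"reads": 0.82, "skips": 0.18},
})


def classify(tree, example):
    """Filter the example down the tree: follow the arc matching each tested feature."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[example[feature]]
    return tree


example = {"length": "short", "thread": "follow_up", "author": "known"}
print(classify(left_tree, example))
print(classify(right_tree, example))
```

Only the features on the path actually taken are inspected, so an example whose length is long is classified without looking at its thread or author.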

A deterministic decision tree, in which all of the leaves are classes, can be mapped into a set of rules, with each leaf of the tree corresponding to a rule. The example has the classification at the leaf if all of the conditions on the path from the root to the leaf are true.
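
The mapping from leaves to rules can be sketched with a short recursive walk. The tree below encodes the deterministic tree from the example, and, as an assumption of this sketch, uses the made-up value names "follow_up" and "unknown" for the branches the text describes only as "otherwise".

```python
# An internal node is (feature, {value: subtree}); a leaf is a class name.
tree = ("length", {
    "long": "skips",
    "short": ("thread", {
        "new": "reads",
        "follow_up": ("author", {"known": "reads", "unknown": "skips"}),
    }),
})


def tree_to_rules(node, conditions=()):
    """Return one (conditions, class) pair per leaf, following each root-to-leaf path."""
    if not isinstance(node, tuple):
        return [(list(conditions), node)]
    feature, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((feature, value),)))
    return rules


for conditions, label in tree_to_rules(tree):
    body = " and ".join(f"{feature} = {value}" for feature, value in conditions)
    print(f"if {body} then {label}")
```

Each printed rule lists the conditions on one path from the root to a leaf, so the tree with four leaves yields four rules.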

2. ISSUES IN LEARNING DECISION TREES:

is hard to obtain, it might be possible to

extrapolate or use unknown.

If the data set is too large, one might use bagging to select a sample from the training set. Or, one can use boosting to assign a weight showing importance to each instance. Or, one can divide the sample set into subsets and train on one, and test on the others.
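
Two of these options can be sketched with the standard library alone: drawing a sample with replacement (the sampling step that bagging relies on) and dividing the data into a training and a testing subset. The function names, the seed and the 70/30 split are assumptions made for illustration, and boosting weights are not shown.

```python
import random


def bootstrap_sample(data, size, rng):
    """Draw `size` examples from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(size)]


def split_train_test(data, train_fraction, rng):
    """Shuffle a copy of data and cut it into a training and a testing subset."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
data = list(range(100))       # stand-in for 100 training examples

sample = bootstrap_sample(data, 10, rng)
train, test = split_train_test(data, 0.7, rng)
print(len(sample), len(train), len(test))
```

Because the bootstrap sample is drawn with replacement, the same example may appear in it more than once, whereas the train/test split places every example in exactly one subset.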
