
AI-Unit5

Machine Learning I
- Bayes networks = reasoning with known models
- Machine learning = learning models from data
Supervised Learning - Unsupervised Learning
Quiz
What companies are famous for using machine learning?
(*) Google - WEB Mining
(*) Netflix - DVD Recommendations
(*) Amazon - Product Placement
( ) None of the above
Machine Learning
- What?
Parameters
Structure
Hidden Concepts
- What from?
Supervised
Unsupervised
Reinforcement
- What for?
Prediction
Diagnostics
Summarization
- How?
Passive
Active
Online
Offline
- Outputs?
Classification
Regression
- Details
Generative
Discriminative
Supervised learning
Data: m training examples, n features each

x11  x12  x13  ...  x1n  ->  y1
x21  x22  x23  ...  x2n  ->  y2
 .    .    .          .       .
 .    .    .          .       .
 .    .    .          .       .
xm1  xm2  xm3  ...  xmn  ->  ym

f(xm) = ym on the training data; more generally, f(x) = y
Quiz
Which function is preferable?
(*) a
( ) b
( ) Neither
Occam's Razor (Ockham's Razor)
Everything else being equal,
choose the less complex hypothesis.
Fit <-------------> Low Complexity
[Plot: error vs. complexity - training data error falls as model complexity grows, while generalization error eventually rises; the gap between them is the overfitting error.]
Overfitting
Example e-mails (the first two are spam, the third is ham):

"Dear Sir. First, I must solicit your confidence in this transaction, this is by virtue of its nature as being utterly confidential and top secret ..."

"To be removed from future mailings, simply reply to this message and put "remove" in the subject. 99 million email addresses for only $99"

"OK, I know this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power, nothing happened."
Spam Detection
                 |--- Spam
E-mail ----------|
                 |--- Ham

[Figure: a pile of training e-mails, some marked "Is Spam", the rest marked "Is Ham".]
Bag of Words
Message: "Hello I will say Hello"
Dictionary [hello, i, will, say] -> counts [2, 1, 1, 1]
Dictionary [hello, good-bye]     -> counts [2, 0]
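A minimal Python sketch of this counting (the function name is illustrative, not from the lecture):

```python
from collections import Counter

def bag_of_words(message, dictionary):
    """Count how often each dictionary word occurs in the message."""
    counts = Counter(message.lower().split())
    return [counts[word] for word in dictionary]

print(bag_of_words("Hello I will say Hello", ["hello", "i", "will", "say"]))  # [2, 1, 1, 1]
print(bag_of_words("Hello I will say Hello", ["hello", "good-bye"]))          # [2, 0]
```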
Spam
- Offer is secret
- Click secret link
- Secret sports link
Ham
- Play sports today
- Went play sports
- Secret sports event
- Sport is Today
- Sport costs money
Quiz -> Size of vocabulary = ?
Size of vocabulary = 12:
offer, is, secret, click, link, sports, play, today, went, event, costs, money
Quiz -> P(Spam) = ?
P(Spam) = 3/8 = 0.375
Maximum Likelihood
Data: S S S H H H H H   (S = Spam, H = Ham), coded as yi: 1 1 1 0 0 0 0 0
P(S) = pi
         | pi       if yi = S
P(yi) = -|
         | 1 - pi   if yi = H
so P(yi) = pi^yi * (1-pi)^(1-yi)
P(data) = prod_i P(yi) = pi^count(yi=1) * (1-pi)^count(yi=0)
count(yi=1) = 3, count(yi=0) = 5
P(data) = pi^3 * (1-pi)^5
log P(data) = 3*log(pi) + 5*log(1-pi)
d/dpi log P(data) = 3/pi - 5/(1-pi) = 0
3/pi = 5/(1-pi)
3*(1-pi) = 5*pi
3 = 8*pi
pi = 3/8
Maximum likelihood estimate: pi = 3/8
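As a numeric sanity check (my own sketch, not lecture code), the log-likelihood can be evaluated on a grid; it peaks at pi = 3/8:

```python
import math

def log_likelihood(pi):
    # log P(data) = 3*log(pi) + 5*log(1-pi) for the S S S H H H H H data
    return 3 * math.log(pi) + 5 * math.log(1 - pi)

best = max((i / 1000 for i in range(1, 1000)), key=log_likelihood)
print(best)  # 0.375, i.e. pi = 3/8
```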
Quiz -> Maximum Likelihood
ML solutions for:
P("secret"|Spam) = ?
P("secret"|Ham) = ?
P("secret"|Spam) = 3/9 = 1/3 = 0.333
P("secret"|Ham) = 1/15 = 0.0667
Bayes Network
          |- w1
Spam -----|- w2
          |- w3
Each word wi in a message is generated conditioned on the Spam/Ham class, e.g.
P("secret"|Spam) = 3/9 = 1/3
Quiz
Dictionary has 12 words.
How many parameters? 23
P(Spam)    ~  1 parameter
P(wi|Spam) ~ 11 parameters (12 probabilities that must sum to 1)
P(wi|Ham)  ~ 11 parameters
Total: 23 parameters
Quiz: message M = "Sports"
P(Spam|M) = (P(M|Spam)*P(Spam)) / (P(M|Spam)*P(Spam) + P(M|Ham)*P(Ham))
P(M|Spam) = 1/9
P(Spam)   = 3/8
P(M|Ham)  = 5/15 = 1/3
P(Ham)    = 5/8
P(Spam|M) = (1/9 * 3/8) / ((1/9 * 3/8) + (1/3 * 5/8)) = (3/72)/(18/72) = 3/18 = 0.1667
Quiz: M = "Secret is secret"
P(Spam|M) = ?
P(Spam|M) = (3/8 * 1/3 * 1/9 * 1/3) / ((3/8 * 1/3 * 1/9 * 1/3) + (5/8 * 1/15 * 1/15 * 1/15))
          = (1/216) / (1/216 + 1/5400) = 25/26 = 0.9615
Quiz: M = "Today is secret"
P(Spam|M) = ?
P(Spam|M) = (3/8 * 0 * 1/9 * 1/3) / ((3/8 * 0 * 1/9 * 1/3) + (5/8 * 2/15 * 1/15 * 1/15))
          = 0 / (1/2700) = 0
"today" never appears in the spam training set, so the zero likelihood P("today"|Spam) = 0 wipes out the entire product - the motivation for the smoothing below.
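The three quiz answers can be reproduced in a few lines of Python; the likelihood tables are hand-counted from the corpora above, and the helper name is mine:

```python
p_spam, p_ham = 3/8, 5/8
like_spam = {"secret": 3/9, "is": 1/9, "today": 0/9, "sports": 1/9}
like_ham  = {"secret": 1/15, "is": 1/15, "today": 2/15, "sports": 5/15}

def p_spam_given(words):
    # Bayes rule with a naive (word-by-word) likelihood product
    num, den = p_spam, p_ham
    for w in words:
        num *= like_spam[w]
        den *= like_ham[w]
    return num / (num + den)

print(p_spam_given(["sports"]))                  # 1/6   ~ 0.1667
print(p_spam_given(["secret", "is", "secret"]))  # 25/26 ~ 0.9615
print(p_spam_given(["today", "is", "secret"]))   # 0.0
```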
Laplace Smoothing
Maximum likelihood: P(x) = count(x)/N
Laplace smoothing (k): P(x) = (count(x) + k) / (N + k*|x|)
where |x| is the number of possible values of x (2 classes here, or 12 dictionary words).
Laplace Smoothing, k = 1
1 message, 1 spam:      P(spam) = (1+1)/(1+2)    = 2/3    = 0.667
10 messages, 6 spam:    P(spam) = (6+1)/(10+2)   = 7/12   = 0.583
100 messages, 60 spam:  P(spam) = (60+1)/(100+2) = 61/102 = 0.598
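In code, the smoothed estimate is a one-liner (a sketch; |x| = 2 classes here):

```python
def laplace(count, n, k=1, num_values=2):
    # P(x) = (count(x) + k) / (N + k * |x|)
    return (count + k) / (n + k * num_values)

print(laplace(1, 1), laplace(6, 10), laplace(60, 100))  # 2/3, 7/12, 61/102
```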
Quiz, k = 1
P(Spam) = (3+1)/(8+2) = 4/10 = 2/5
P(Ham) = 3/5
P("today"|Spam) = (0+1)/(9+12)  = 1/21
P("today"|Ham)  = (2+1)/(15+12) = 3/27 = 1/9
Quiz: M = "Today is secret", k = 1
P(Spam|M) = ?
P(Spam|M) = (2/5 * 1/21 * 2/21 * 4/21) / ((2/5 * 1/21 * 2/21 * 4/21) + (3/5 * 3/27 * 2/27 * 2/27)) = 0.4858
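A sketch reproducing this k = 1 quiz end to end (12 dictionary words, 9 spam words, 15 ham words; the helper is my own, not lecture code):

```python
V = 12  # dictionary size

def smoothed(count, total, k=1):
    return (count + k) / (total + k * V)

p_spam, p_ham = 2/5, 3/5  # Laplace-smoothed priors (k=1)
spam = smoothed(0, 9)  * smoothed(1, 9)  * smoothed(3, 9)   # today, is, secret | Spam
ham  = smoothed(2, 15) * smoothed(1, 15) * smoothed(1, 15)  # today, is, secret | Ham
print(p_spam * spam / (p_spam * spam + p_ham * ham))        # ~0.4858
```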
Summary Naive Bayes
       |---(w1)
(y)----|---(w2)
       |---(w3)
- Bag of words
- Maximum likelihood
- Laplace smoothing
- Bayes rule
Advanced Spam Filters
- Known spamming IP?
- Have you emailed the person before?
- Have other people received the same message?
- Is the email header consistent?
- All caps?
- Do inline URLs point to where they say they do?
- Are you addressed by name?
Handwritten digit recognition
[Figure: 16x16 pixel images of the digits 0, 1, 2, 1]
Input vector = pixel values
Overfitting Prevention
Occam's Razor: how do we choose k?
Cross Validation: split the training data
80% Train            -> fit the parameters
10% Cross Validation -> choose k
10% Test             -> verify once --------> report
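A minimal split sketch, assuming the data arrives as one list (the helper name and seed are mine):

```python
import random

def split_data(data, seed=0):
    data = data[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    n = len(data)
    return (data[:int(0.8 * n)],              # 80% train: fit the parameters
            data[int(0.8 * n):int(0.9 * n)],  # 10% cross validation: pick k
            data[int(0.9 * n):])              # 10% test: used once for the report
```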
Supervised Learning
- Classification: yi in {0, 1} (discrete labels)
- Regression: yi in [0, 1] (continuous values)
House price in $1,000: 300 - 1,000
House size in square feet: 500 - 3,500
What does a 2,500 sq ft house cost?
( ) 400k
( ) 600k
(*) 800k
( ) 1,000k
Linear Regression
Data
x11  x12  ...  x1n  ->  y1
 .    .         .        .
 .    .         .        .
 .    .         .        .
xm1  xm2  ...  xmn  ->  ym

y = f(x)
f(x) = w1*x + w0   (one input feature)
f(x) = w*x + w0    (vector form)
Quiz
f(x) = w1*x + w0

x:  3   6   4   5
y:  0  -3  -1  -2

w0 = 3
w1 = -1
(check: f(3) = 3 - 3 = 0, f(6) = 3 - 6 = -3)

Loss = sum_j (yj - w1*xj - w0)^2
w* = argmin_w Loss
Minimizing Quadratic Loss
min_w L,  L = sum_i (yi - w1*xi - w0)^2

dL/dw0 = -2 * sum_i (yi - w1*xi - w0) = 0
  -> sum_i yi - w1 * sum_i xi = M*w0
  -> w0 = (1/M) * sum_i yi - (w1/M) * sum_i xi

dL/dw1 = -2 * sum_i (yi - w1*xi - w0) * xi = 0
  -> sum_i xi*yi - w0 * sum_i xi = w1 * sum_i xi^2
  -> substituting w0:
     w1 = (M * sum xi*yi - sum xi * sum yi) / (M * sum xi^2 - (sum xi)^2)

For the quiz data (M = 4, sum x = 18, sum y = -6, sum xy = -32, sum x^2 = 86):
w1 = (4*(-32) - 18*(-6)) / (4*86 - 18^2) = (-128 + 108) / (344 - 324) = -20/20 = -1
w0 = (1/4)*(-6) - ((-1)/4)*18 = -1.5 + 4.5 = 3
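The two closed-form expressions translate directly into plain Python (a sketch; fit_line is my name, not the lecture's). It reproduces the quiz answer:

```python
def fit_line(xs, ys):
    # Closed-form least squares for f(x) = w1*x + w0
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    w1 = (m * sxy - sx * sy) / (m * sxx - sx ** 2)
    w0 = sy / m - w1 * sx / m
    return w0, w1

print(fit_line([3, 6, 4, 5], [0, -3, -1, -2]))  # (3.0, -1.0)
```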
Quiz
x:  2   4   6   8
y:  2   5   5   8

w0 = ?
w1 = ?

M = 4, sum x = 20, sum y = 20, sum xy = 118, sum x^2 = 120
w1 = (4*118 - 20*20) / (4*120 - 20^2) = (472 - 400) / (480 - 400) = 72/80 = 0.9
w0 = (1/4)*(2+5+5+8) - (0.9/4)*(2+4+6+8) = 5 - 4.5 = 0.5
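The same fit_line sketch from above reproduces this answer:

```python
print(fit_line([2, 4, 6, 8], [2, 5, 5, 8]))  # (0.5, 0.9)
```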
Problems with Linear Regression
Logistic Regression: squash the linear output f(x) through a sigmoid
z = 1 / (1 + e^(-f(x)))
Quiz
z = 1 / (1 + e^(-f(x)))
What is the range of z?
(*) ]0, 1[
( ) ]-1, 1[
( ) ]-1, 0[
( ) ]-2, 2[
( ) None of the above

As f(x) -> +infinity, z -> 1; as f(x) -> -infinity, z -> 0.
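A quick check of the squashing behavior (plain Python, my own sketch):

```python
import math

def sigmoid(fx):
    # z = 1/(1 + e^(-f(x))): strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-fx))

print(sigmoid(-10), sigmoid(0), sigmoid(10))  # ~0.000045, 0.5, ~0.999955
```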
Regularization
Loss = Loss(data) + Loss(parameters)
     = sum_j (yj - w1*xj - w0)^2 + sum_i |wi|^p
p = 1: L1 regularization
p = 2: L2 regularization
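A sketch of the combined loss; the weighting constant lam is my addition (the notes leave the trade-off factor implicit):

```python
def regularized_loss(w0, w1, xs, ys, lam=0.1, p=2):
    # Data term plus an L^p penalty on the weights (p=1 or p=2)
    data_term = sum((y - w1 * x - w0) ** 2 for x, y in zip(xs, ys))
    penalty = lam * (abs(w0) ** p + abs(w1) ** p)
    return data_term + penalty
```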
Minimizing More Complicated Loss Functions
Gradient descent on the loss function L(w):
start with an initial guess w^0, then iterate
w^(i+1) <- w^i - alpha * dL/dw(w^i)    (alpha = learning rate)
[Figure: a loss curve with three marked points a, b, c]
a -> gradient about zero
b -> gradient negative
c -> gradient positive
Quiz
[Figure: a loss curve with points a, b, c]
Which gradient is largest?
( ) a
( ) b
(*) c
( ) all equal
Quiz
[Figure: a bowl-shaped loss function]
Will gradient descent likely reach the global minimum?
(*) Yes
( ) No
Gradient Descent
L = sum_j (yj - w1*xj - w0)^2 -> min
dL/dw1 = -2 * sum_j (yj - w1*xj - w0) * xj
dL/dw0 = -2 * sum_j (yj - w1*xj - w0)
Start from initial guesses w1^0, w0^0, then iterate:
w1^m <- w1^(m-1) - alpha * dL/dw1(w^(m-1))
w0^m <- w0^(m-1) - alpha * dL/dw0(w^(m-1))
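Putting the two update rules in a loop (a sketch; alpha and the step count are hand-picked, not from the lecture):

```python
def gradient_descent(xs, ys, alpha=0.005, steps=20000):
    w0, w1 = 0.0, 0.0  # initial guess w^0
    for _ in range(steps):
        g0 = -2 * sum(y - w1 * x - w0 for x, y in zip(xs, ys))
        g1 = -2 * sum((y - w1 * x - w0) * x for x, y in zip(xs, ys))
        w0, w1 = w0 - alpha * g0, w1 - alpha * g1
    return w0, w1

print(gradient_descent([3, 6, 4, 5], [0, -3, -1, -2]))  # converges near (3, -1)
```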
Perceptron Algorithm
Linear equations -> linear separator
         | 1 if w1*x + w0 >= 0
f(x) = --|
         | 0 if w1*x + w0 <  0
w1*x + w0 -> linear function
Start with a random guess for w1, w0, then update example by example:
wi^m <- wi^(m-1) + alpha*(yj - f(xj))
alpha -> learning rate
(yj - f(xj)) -> error
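A sketch of the update on a tiny 1-D dataset (my data and alpha; note the usual extra factor of x on the slope weight, which the compressed note above omits):

```python
def perceptron_step(w0, w1, x, y, alpha=0.1):
    # f(x) = 1 if w1*x + w0 >= 0 else 0; move weights by alpha * error
    fx = 1 if w1 * x + w0 >= 0 else 0
    error = y - fx
    return w0 + alpha * error, w1 + alpha * error * x

w0, w1 = 0.0, 0.0  # initial guess
for x, y in 4 * [(-2, 0), (-1, 0), (1, 1), (2, 1)]:
    w0, w1 = perceptron_step(w0, w1, x, y)
print(w0, w1)  # ends with w1 > 0: positive inputs map to class 1
```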
Quiz: which separator is preferable?
( ) a
(*) b
( ) c
( ) None
[Figure: separator b leaves the largest gap (margin) to the nearest + and - points.]
Margin -> distance from the separator to the closest training points
Maximum Margin Algorithms
- Support Vector Machines (SVMs) -> solved as a quadratic program
- Boosting
"Kernel Trick" - x1,x2,x3
x2
- + x1
x3 = ((x1)^2 + (x2)^2)
|++++ | ----|_____|_______ x3
|
|
|++++ | ----Linear Methods
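A sketch of the feature map (math.hypot would also work; the point is only that x3 makes the radius explicit):

```python
import math

def add_radius_feature(x1, x2):
    # Map (x1, x2) -> (x1, x2, x3) with x3 = sqrt(x1^2 + x2^2)
    return (x1, x2, math.sqrt(x1 ** 2 + x2 ** 2))

print(add_radius_feature(0.5, 0.5))  # small x3: near the origin
print(add_radius_feature(3.0, 4.0))  # x3 = 5.0: far from the origin
```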
Linear Methods
- Regression vs. Classification
- Exact solutions vs. iterative solutions
- Smoothing
- Non-linear problems
Supervised Learning
- Parametric
# of parameters independent of training set size
- Non-Parametric
# of parameters can grow
1-Nearest Neighbor
[Figure: labeled + and - points; a query point takes the label of its single nearest neighbor.]
K-Nearest Neighbors
Learning: memorize all data
To label a new example (see the sketch below):
- Find the "k" nearest neighbors
- Return the majority class label
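A sketch of the procedure for 1-D points (my helper names; vote ties are broken arbitrarily):

```python
from collections import Counter

def knn_label(query, data, k):
    # data: list of (x, label) pairs; vote among the k closest points
    nearest = sorted(data, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(1, '+'), (2, '+'), (3, '-'), (4, '-'), (5, '-'), (6, '-'),
         (7, '+'), (8, '+'), (9, '+')]
# Larger k smooths over the local '-' region around the query:
print([knn_label(5.6, train, k) for k in (1, 3, 9)])  # ['-', '-', '+']
```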
k = Regularizer (larger k -> smoother decision)
Quiz: classify query points in the 1-D dataset + + - - - - + + + + - + using k = 1, 3, 5, 7, 9 nearest neighbors.
Voronoi Graph (decision regions for k = 1 and k = 3)
Problems of KNN
- Very large data sets
  -> k-d trees for fast neighbor lookup
- Very large feature spaces
  -> the edge length of a neighborhood grows rapidly with the number of input dimensions