SS 2005
Website
http://cig.in.tu-clausthal.de/
Visit regularly!
1. Introduction
2. Solving Problems by Searching
3. Learning from Observations
4. Neural Networks
5. Knowledge Engineering
6. More about Logic
7. Planning
8. Reasoning under Uncertainty
1. Introduction
1.1 Overview
1.2 Basics
1.3 History of AI
1.4 Intelligent agents
“The exciting new effort to make computers think . . . machines with minds, in the full and literal sense” (Haugeland, 1985)
“The study of mental faculties through the use of computational models” (Charniak and McDermott, 1985)
“[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning . . .” (Bellman, 1978)
“The study of the computations that make it possible to perceive, reason, and act” (Winston, 1992)
“The art of creating machines that perform functions that require intelligence when performed by people” (Kurzweil, 1990)
“A field of study that seeks to explain and emulate intelligent behavior in terms of computational processes” (Schalkoff, 1990)
“The study of how to make computers do things at which, at the moment, people are better” (Rich and Knight, 1991)
“The branch of computer science that is concerned with the automation of intelligent behavior” (Luger and Stubblefield, 1993)
1. Cognitive science
3. Turing Test :
http://cogsci.ucsd.edu/~asaygin/tt/ttest.html
http://www.loebner.net/Prizef/loebner-prize.html
Question
Does the room understand the Chinese
language?
Artificial Intelligence – Prof. Dr. Jürgen Dix (13/554)
1.2 Basics
1596–1650 : R. Descartes:
Mind = physical system
1646–1716 : G. W. Leibniz:
Materialism, uses some ideas of Ars Magna
1862–1943 : D. Hilbert:
Famous talk 1900 in Paris: 23 problems
23rd problem: The Entscheidungsproblem
1872–1970 : B. Russell:
1910: Principia Mathematica
Logical positivism, Vienna Circle (1920–40)
1906–1978 : K. Gödel:
Completeness theorem (1930)
Incompleteness theorem (1930/31)
Unprovability of theorems (1936)
1912–1954 : A. Turing:
Turing-machine (1936)
Computability
1903–1995 : A. Church:
λ-Calculus, Church-Turing-thesis
1910–1995 : K. Zuse:
Z3 (1941): first programmable computer,
with floating-point-arithmetic.
Plankalkül: First high-level programming language
1.3 History of AI
Idea:
Combine knowledge about automata theory, neural
nets and the studies of intelligence (10 participants)
[Figure: geometric analogy problem — figure 1 is to figure 2 as figure 3 is to one of figures 4, 5]
Convergence theorem
An algorithm exists that can adjust the connection
strengths of a perceptron to match any input data.
Unacceptable mistake
The spirit is willing but the flesh is weak.
was translated into
The vodka is good but the meat is rotten.
Hope:
Faster hardware and more memory will solve
everything!
NP-Completeness, S. Cook/R. Karp (1971/72),
P ≠ NP?
The result:
Definition 1 (Agent a)
An agent a is anything that can be viewed as
perceiving its environment through sensors and
acting upon that environment through effectors.
[Figure: an agent receives percepts from the environment through sensors and acts on it through effectors]
Attention:
A rational agent is in general not omniscient!
Question
What is the right thing and what does it depend on?
Mappings:
set of percept sequences ↦ set of actions
Hint:
Internally an agent is
Agent type | Percepts | Actions | Goals | Environment
Part-picking robot | Pixels of varying intensity | Pick up parts and sort into bins | Place parts in correct bins | Conveyor belt with parts
Taxi driver | Cameras, speedometer, GPS, sonar, microphone | Steer, accelerate, brake, talk to passenger | Safe, fast, legal, comfortable trip, maximize profits | Roads, other traffic, pedestrians, customers
Question:
How can agent programs be designed?
Some examples:
1. Production rules: If the driver in front hits the
brakes, then hit the brakes too.
function SIMPLE-REFLEX-AGENT(percept) returns action
  static: rules, a set of condition-action rules
  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action
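The pseudocode above can be turned into a minimal Python sketch. The rule representation (condition functions paired with actions) and the toy driving rules are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of SIMPLE-REFLEX-AGENT. INTERPRET-INPUT is trivial here:
# the percept already serves as the state description (an assumption).

def interpret_input(percept):
    return percept

def simple_reflex_agent(percept, rules):
    """Return the action of the first rule whose condition matches the state."""
    state = interpret_input(percept)
    for condition, action in rules:
        if condition(state):
            return action
    return None  # no rule matched

# Example condition-action rules for a toy driving domain:
rules = [
    (lambda s: s.get("car_in_front_brakes"), "brake"),
    (lambda s: True, "keep-going"),  # default rule
]

print(simple_reflex_agent({"car_in_front_brakes": True}, rules))   # brake
print(simple_reflex_agent({"car_in_front_brakes": False}, rules))  # keep-going
```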
env : S × A → 2^S
[Figure: simple reflex agent — sensors feed condition-action rules, which decide “what action I should do now” and drive the effectors]
Question:
Why not do the same for env?
Definition 6 (Environment)
An environment Env is a triple ⟨S, s0, τ⟩ consisting of
1. the set S of states,
2. the initial state s0 ∈ S,
3. a function τ : R^act → 2^S, which describes how
the environment changes when an action is
performed (given the whole history).
Definition 7 (Agent a)
An agent a is determined by a function
action : R^state → A,
Important:
An agent system is then a pair a = ⟨action, Env⟩
consisting of an agent and an environment.
We denote by R(a, Env) the set of runs of agent a in
environment Env.
Equivalence
Two agents a, b are called behaviourally equivalent
wrt. environment Env, if R(a, Env) = R(b, Env).
Two agents a, b are called behaviourally
equivalent, if they are behaviourally equivalent
wrt. all possible environments Env.
Question:
What about purely reactive agents? They can be
described much more simply:
action : S → A,
instead of
action : P* → A
together with
see : S → P.
Definition 11 (Indistinguishable)
Two different states s, s′ are called indistinguishable
for an agent a, if see(s) = see(s′).
where
p0 = see(s0),
ai = action(⟨p0, . . . , pi⟩),
pi = see(si), where si ∈ τ(s0, a0, s1, a1, . . . , si−1, ai−1).
see : S → P,
and next : I × P → I.
[Figure: state-based reflex agent — internal state and a model of “what my actions do” combine with condition-action rules between sensors and effectors]
aj = action(ij),
sj ∈ τ(s0, a0, s1, a1, . . . , sj−1, aj−1),
pj = see(sj)
Theorem 16 (Equivalence)
Standard agents and state-based agents are
equivalent with respect to their environmental
behaviour.
More precisely: For each state-based agent a_state with
its next and storage functions there exists a standard
agent a which has the same environmental
behaviour, and vice versa.
[Figure: goal-based agent — state, a model of what its actions do, a prediction of “what it will be like if I do action A”, and explicit goals determine the action]
Intention:
We want to tell our agent what to do, without telling it
how to do it!
We need a utility measure.
u : S → ℝ
Question:
What is the problem with this definition?
Better idea:
u : R → ℝ (defined on runs, not on single states)
[Figure: utility-based agent — predicts what it will be like if action A is done and evaluates the predicted outcomes with a utility measure]
Example:
Tileworld
[Figure: three Tileworld situations (a)–(c) — an agent, tiles T and holes H on a grid]
Question:
How do properties of the environment influence the
design of a fitting agent?
2. Searching
Formulate:
Goal: a set of states,
Problem: states, actions (mappings from states
into states), transitions,
Search: Which sequence of actions is helpful?
Execute: The resulting sequence of actions is
applied to the initial state.
function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  inputs: percept, a percept
  static: seq, an action sequence, initially empty
          state, some description of the current world state
          goal, a goal, initially null
          problem, a problem formulation
  state ← UPDATE-STATE(state, percept)
  if seq is empty then do
    goal ← FORMULATE-GOAL(state)
    problem ← FORMULATE-PROBLEM(state, goal)
    seq ← SEARCH(problem)
  action ← FIRST(seq)
  seq ← REST(seq)
  return action
[Figure: the eight states 1–8 of the simple vacuum world]
Definition 17 (1-state-problem)
A 1-state-problem consists of:
a set of states (incl. the initial state),
a set of n actions (operators), which – applied to a
state – lead to another state:
Operator_i : States → States, i = 1, . . . , n.
We use a function Successor-Fn : S → A × S,
a set of goal states or a goal test, which – applied
to a state – determines whether it is a goal-state or not,
a cost function g, which assesses every path in
the state space (set of reachable states).
g is usually additive.
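Definition 17 can be written down as a small Python class. The road fragment below is taken from the slides' Romania map; the class layout itself is an illustrative assumption.

```python
class OneStateProblem:
    """A 1-state-problem per Definition 17 (illustrative sketch)."""
    def __init__(self, initial, successor_fn, goal_test):
        self.initial = initial
        self.successor_fn = successor_fn  # state -> list of (action, state', cost)
        self.goal_test = goal_test        # state -> bool

# Toy instance: a fragment of the Romania map (distances from the slides).
roads = {
    "Arad": [("Sibiu", 140), ("Timisoara", 118), ("Zerind", 75)],
    "Sibiu": [("Fagaras", 99), ("Rimnicu Vilcea", 80)],
}

def successor_fn(state):
    return [(f"go({city})", city, cost) for city, cost in roads.get(state, [])]

problem = OneStateProblem("Arad", successor_fn, lambda s: s == "Bucharest")
print(problem.successor_fn("Arad"))
```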
Game-example:
[Figure: 8-puzzle — a start state and the goal state]
Game-example:
[Figure: search tree for the vacuum world with actions L(eft), R(ight) and S(uck)]
Real-world-problems:
travelling-salesman-problem
VLSI-layout
labelling maps
robot-navigation
Map of Romania
[Figure: simplified road map of Romania with driving distances between cities]
loop do
  if there are no candidates for expansion then return failure
  choose a leaf node for expansion according to the strategy
  if the node contains a goal state then return the corresponding solution
  else expand the node and add the resulting nodes to the search tree
Important:
State-space versus search-tree:
The search-tree can be countably infinite even when the
state-space is finite.
Important:
The recursive dependency between node and parent
is important. If the depth is left out then a special
node root has to be introduced.
Conversely, the root can be defined via the depth: root
is its own parent with depth 0.
[Figure: a search-tree node for the 8-puzzle with fields STATE, PARENT-NODE, ACTION = right, DEPTH = 6, PATH-COST = 6]
Design-decision: Queue
The nodes-to-be-expanded could be described as a
set. But we will use a queue instead.
The fringe is the set of generated nodes that are not
yet expanded.
function TREE-SEARCH(problem, fringe) returns a solution, or failure
  fringe ← INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
  loop do
    if EMPTY?(fringe) then return failure
    node ← REMOVE-FIRST(fringe)
    if GOAL-TEST[problem] applied to STATE[node] succeeds
      then return SOLUTION(node)
    fringe ← INSERT-ALL(EXPAND(node, problem), fringe)

function EXPAND(node, problem) returns a set of nodes
  successors ← the empty set
  for each ⟨action, result⟩ in SUCCESSOR-FN[problem](STATE[node]) do
    s ← a new NODE
    STATE[s] ← result
    PARENT-NODE[s] ← node
    ACTION[s] ← action
    PATH-COST[s] ← PATH-COST[node] + STEP-COST(STATE[node], action, result)
    DEPTH[s] ← DEPTH[node] + 1
    add s to successors
  return successors
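A minimal executable sketch of tree search with an explicit Node datatype; the fringe is a FIFO queue here, which makes the strategy breadth-first. The toy successor table is an assumption for illustration.

```python
from collections import deque

class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0, depth=0):
        self.state, self.parent = state, parent
        self.action, self.path_cost, self.depth = action, path_cost, depth

def solution(node):
    """Read off the action sequence by following parent links to the root."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))

def tree_search(initial, successor_fn, goal_test):
    fringe = deque([Node(initial)])          # FIFO queue: breadth-first
    while fringe:
        node = fringe.popleft()
        if goal_test(node.state):
            return solution(node)
        for action, result, cost in successor_fn(node.state):
            fringe.append(Node(result, node, action,
                               node.path_cost + cost, node.depth + 1))
    return None  # failure

# Toy successor table (assumed for illustration):
succ = {"A": [("right", "B", 1)], "B": [("right", "C", 1)], "C": []}
print(tree_search("A", lambda s: succ[s], lambda s: s == "C"))  # ['right', 'right']
```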
Datatype Node
Question:
What are interesting requirements for
search-strategies?
completeness
time-complexity
space complexity
optimality (w.r.t. path-costs)
We distinguish:
Uninformed vs. informed search.
Complete? Yes.
Optimal? Yes, if all operators are equally expensive.
[Figure: breadth-first search on a binary tree, shown after 0–3 node expansions]
Uniform-cost
Uniform-cost search = breadth-first search, if all step costs are equal.
[Figure: depth-first search on a binary tree — the frontier descends along the leftmost branch; black nodes can be dropped from memory]
function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
  return RECURSIVE-DLS(MAKE-NODE(INITIAL-STATE[problem]), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
  cutoff-occurred? ← false
  if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
  else if DEPTH[node] = limit then return cutoff
  else for each successor in EXPAND(node, problem) do
    result ← RECURSIVE-DLS(successor, problem, limit)
    if result = cutoff then cutoff-occurred? ← true
    else if result ≠ failure then return result
  if cutoff-occurred? then return cutoff else return failure
In total: 1 + b + b² + . . . + b^(d−1) + b^d + (b^(d+1) − b)
many nodes.
Space-complexity: b × l,
Time-complexity: b^l.
Two different sorts of failures!
[Figure: iterative deepening search on a binary tree with limits 1, 2 and 3]
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
  for depth ← 0 to ∞ do
    result ← DEPTH-LIMITED-SEARCH(problem, depth)
    if result ≠ cutoff then return result
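The same idea in executable form. This simplified sketch returns the state path and does not distinguish cutoff from failure (it just caps the limit); the toy graph is an assumption.

```python
# Depth-limited DFS, restarted with a growing limit (iterative deepening).

def depth_limited(state, successor_fn, goal_test, limit, path=None):
    path = path or [state]
    if goal_test(state):
        return path
    if limit == 0:
        return None  # cutoff reached
    for _, nxt, _ in successor_fn(state):
        found = depth_limited(nxt, successor_fn, goal_test, limit - 1, path + [nxt])
        if found is not None:
            return found
    return None

def iterative_deepening(state, successor_fn, goal_test, max_depth=50):
    for depth in range(max_depth + 1):
        found = depth_limited(state, successor_fn, goal_test, depth)
        if found is not None:
            return found
    return None

succ = {"A": [("r", "B", 1)], "B": [("r", "C", 1), ("r", "D", 1)], "C": [], "D": []}
print(iterative_deepening("A", lambda s: succ[s], lambda s: s == "D"))  # ['A', 'B', 'D']
```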
b + b² + . . . + b^d + (b^(d+1) − b)-many.
Attention:
Iterative-deepening-search is faster than
breadth-first-search because the latter also generates
nodes at depth d + 1 (even if the solution is at depth
d).
a limit.
Bidirectional search:
[Figure: two search frontiers growing from Start and Goal meet in the middle]
What about loops?
[Figure: repeated states — a small loopy state space turns into an exponentially larger search tree]
function GRAPH-SEARCH(problem, fringe) returns a solution, or failure
  closed ← an empty set
  fringe ← INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
  loop do
    if EMPTY?(fringe) then return failure
    node ← REMOVE-FIRST(fringe)
    if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
    if STATE[node] is not in closed then
      add STATE[node] to closed
      fringe ← INSERT-ALL(EXPAND(node, problem), fringe)
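A runnable sketch of graph search: identical to tree search except for the closed set, which makes it terminate on the loopy graph below (an assumed toy example) where plain tree search would run forever.

```python
from collections import deque

def graph_search(initial, successor_fn, goal_test):
    closed = set()
    fringe = deque([(initial, [initial])])   # (state, path so far), FIFO
    while fringe:
        state, path = fringe.popleft()
        if goal_test(state):
            return path
        if state not in closed:              # expand each state at most once
            closed.add(state)
            for _, nxt, _ in successor_fn(state):
                fringe.append((nxt, path + [nxt]))
    return None

# Graph with a loop A <-> B; the closed set breaks the cycle.
succ = {"A": [("r", "B", 1)], "B": [("l", "A", 1), ("r", "C", 1)], "C": []}
print(graph_search("A", lambda s: succ[s], lambda s: s == "C"))  # ['A', 'B', 'C']
```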
Idea:
Use problem-specific knowledge to improve the
search.
Example 21 (path-finding)
[Figure: map of Romania, annotated with straight-line distances to Bucharest]
Straight-line distance to Bucharest: Arad 366, Bucharest 0, Craiova 160,
Dobreta 242, Eforie 161, Fagaras 178, Giurgiu 77, Hirsova 151, Iasi 226,
Lugoj 244, Mehadia 241, Neamt 234, Oradea 380, Pitesti 98,
Rimnicu Vilcea 193, Sibiu 253, Timisoara 329, Urziceni 80, Vaslui 199,
Zerind 374.
Questions:
1. Is greedy-search optimal?
2. Is greedy-search complete?
2.4 A∗ search
[Figure: A∗ contours on the Romania map — f-cost bands at 380, 400 and 420 around Arad]
Question:
How many nodes does A∗ store in memory?
Answer:
Virtually always exponentially-many with respect to
the length of the solution.
Important:
A∗ ’s problem is space not time.
Important:
A∗ is even optimally efficient: no other algorithm
(which expands search-paths beginning with an initial
node) expands fewer nodes than A∗.
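A compact A∗ sketch on a fragment of the Romania problem; the edge costs and straight-line distances are the ones appearing on the slides, the priority-queue encoding is an implementation choice.

```python
import heapq

def astar(initial, successor_fn, goal_test, h):
    frontier = [(h(initial), 0, initial, [initial])]   # entries: (f, g, state, path)
    best_g = {initial: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)    # lowest f = g + h first
        if goal_test(state):
            return path, g
        for _, nxt, cost in successor_fn(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

roads = {"Arad": [("go", "Sibiu", 140), ("go", "Timisoara", 118)],
         "Sibiu": [("go", "Fagaras", 99), ("go", "Rimnicu Vilcea", 80)],
         "Fagaras": [("go", "Bucharest", 211)],
         "Rimnicu Vilcea": [("go", "Pitesti", 97)],
         "Pitesti": [("go", "Bucharest", 101)],
         "Timisoara": []}
sld = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Fagaras": 178,
       "Rimnicu Vilcea": 193, "Pitesti": 98, "Bucharest": 0}

path, cost = astar("Arad", lambda s: roads.get(s, []),
                   lambda s: s == "Bucharest", lambda s: sld[s])
print(path, cost)  # route via Rimnicu Vilcea and Pitesti, cost 418
```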
2.5 Heuristics
[Figure: an 8-puzzle instance — start state and goal state]
Question:
Which branching-factor?
Answer:
Approx. 3 (more exactly 8/3).
Question:
How many states have to be considered?
Answer:
3^g ≈ 10^10
Question:
Which heuristic-functions come in handy?
Question:
How to determine the quality of a heuristic-function?
(In a single value)
N + 1 = 1 + b* + (b*)² + . . . + (b*)^d
Attention:
b∗ depends on h and on the special
problem-instance.
But for many classes of problem-instances b∗ is quite
constant.
Question:
Is the Manhattan-distance better than the
Hamming-distance?
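Both heuristics can be computed in a few lines; the state encoding (tile → position dict, tile 0 as the blank) is an illustrative assumption. Note that the Manhattan distance is always at least the Hamming distance, which is why it is the better (dominating) heuristic.

```python
# Hamming vs. Manhattan distance for the 8-puzzle.

def hamming(state, goal):
    """Number of tiles (ignoring the blank) not in their goal position."""
    return sum(1 for t in state if t != 0 and state[t] != goal[t])

def manhattan(state, goal):
    """Sum of horizontal + vertical distances of each tile to its goal."""
    return sum(abs(state[t][0] - goal[t][0]) + abs(state[t][1] - goal[t][1])
               for t in state if t != 0)

goal = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (1, 0), 4: (1, 1),
        5: (1, 2), 6: (2, 0), 7: (2, 1), 8: (2, 2)}
state = dict(goal)
state[1], state[2] = goal[2], goal[1]    # swap tiles 1 and 2

print(hamming(state, goal), manhattan(state, goal))  # 2 2
```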
Question:
A∗ is memory-intensive. What if the memory is
limited? What to do if the queue is restricted in its
length?
function IDA*(problem) returns a solution sequence
  inputs: problem, a problem
  static: f-limit, the current f-COST limit
          root, a node
  root ← MAKE-NODE(INITIAL-STATE[problem])
  f-limit ← f-COST(root)
  loop do
    solution, f-limit ← DFS-CONTOUR(root, f-limit)
    if solution is non-null then return solution
    if f-limit = ∞ then return failure
  end

function DFS-CONTOUR(node, f-limit) returns a solution sequence and a new f-COST limit
  inputs: node, a node
          f-limit, the current f-COST limit
  static: next-f, the f-COST limit for the next contour, initially ∞
Question:
What about complexity?
" "
E XPAND( ,
) $
if
, "
[s]
$ * , - / 0 / 2 4 6 / 2 7
9
: 2
repeat
the lowest -value node in
$
" "
if then return , [ ] 9
: =
" "
, [ ] RBFS(
,
)
$
* F H / 7
B
2
if then return
L
K
447
Sibiu Timisoara Zerind
393
447 449
415
Arad Fagaras Oradea Rimnicu Vilcea
526 413
646 415
447
Sibiu Timisoara Zerind
393 447 449
417
Arad Fagaras Oradea Rimnicu Vilcea
646 415 526 413 417
Sibiu Bucharest
591 450
447
Sibiu Timisoara Zerind
393 447 449
447
Arad Fagaras Oradea Rimnicu Vilcea
646 415 450 526 417
447
Craiova Pitesti Sibiu
526 417 553
RBFS uses only linear memory.
[Figure: SMA∗ example — a search tree labelled with g + h = f at each node, and the first eight steps of SMA∗ with memory for only three nodes; forgotten values are shown in parentheses]
function SMA*(problem) returns a solution sequence
  static: Queue, a queue of nodes ordered by f-cost
  Queue ← MAKE-QUEUE({MAKE-NODE(INITIAL-STATE[problem])})
  loop do
    if Queue is empty then return failure
    n ← deepest least-f-cost node in Queue
    if GOAL-TEST(n) then return success
    s ← NEXT-SUCCESSOR(n)
    if s is not a goal and is at maximum depth then
      f(s) ← ∞
    else
      f(s) ← MAX(f(n), g(s) + h(s))
    if all of n's successors have been generated then
      update n's f-cost and those of its ancestors if necessary
    if SUCCESSORS(n) all in memory then remove n from Queue
    if memory is full then
      delete shallowest, highest-f-cost node in Queue
      remove it from its parent's successor list
      insert its parent on Queue if necessary
    insert s on Queue
  end
Idea:
For certain problems only the current state is
important, not the path leading to it.
[Figure: one-dimensional state-space landscape — evaluation plotted against the current state]
function HILL-CLIMBING(problem) returns a solution state
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    next ← a highest-valued successor of current
    if VALUE[next] < VALUE[current] then return current
    current ← next
  end
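A runnable sketch on a tiny assumed value landscape; it illustrates the main weakness — the search stops at the first local maximum.

```python
def hill_climbing(start, neighbours, value):
    current = start
    while True:
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):
            return current          # local (possibly not global) maximum
        current = best

# One-dimensional toy landscape with a local maximum at x = 2
# and the global maximum at x = 5.
value = {0: 1, 1: 3, 2: 4, 3: 2, 4: 5, 5: 6}.get
nbrs = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 5]
print(hill_climbing(0, nbrs, value))  # 2  (stuck at the local maximum)
```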
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  current ← MAKE-NODE(INITIAL-STATE[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← VALUE[next] − VALUE[current]
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
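The same scheme in Python, on the toy landscape used for hill-climbing; the cooling schedule and seed are illustrative choices, and the final state is random in general.

```python
import math, random

def simulated_annealing(start, neighbours, value, schedule):
    current = start
    t = 1
    while True:
        T = schedule(t)
        if T == 0:
            return current
        nxt = random.choice(neighbours(current))
        dE = value(nxt) - value(current)
        # always accept improvements; accept worsening with prob. e^(dE/T)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt
        t += 1

random.seed(0)
value = {0: 1, 1: 3, 2: 4, 3: 2, 4: 5, 5: 6}.get
nbrs = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 5]
schedule = lambda t: 1.0 * 0.9 ** t if t <= 100 else 0   # geometric cooling
result = simulated_annealing(0, nbrs, value, schedule)
print(result)
```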
function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  repeat
    new-population ← empty set
    for i = 1 to SIZE(population) do
      x ← RANDOM-SELECTION(population, FITNESS-FN)
      y ← RANDOM-SELECTION(population, FITNESS-FN)
      child ← REPRODUCE(x, y)
      if (small random probability) then child ← MUTATE(child)
      add child to new-population
    population ← new-population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
  n ← LENGTH(x)
  c ← random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c + 1, n))
[Figure: a small grid world — finding a path from start S to goal G]
Hill-Climbing + Memory=LRTA∗
[Figure: five iterations (a)–(e) of LRTA∗ — the heuristic estimates of the visited states are updated step by step]
3. Supervised Learning
3.1 Basics
3.2 Decision Trees
3.3 Ensemble Learning
3.4 PL1 formalisations
3.5 PAC learning
3.6 Noise and overfitting
Learning in general
Hitherto: An agent's intelligence is in its program; it
is hard-wired.
Now: We want a more autonomous agent, which
should learn through percepts
(experiences) to know its environment.
Important: This matters when the domain in which the agent acts can't be
described completely.
3.1 Basics
[Figure: general model of a learning agent — the performance element chooses actions, a critic (using a performance standard) gives feedback, the learning element changes the performance element, and a problem generator suggests exploratory actions]
Question:
What does learning (the design of the learning
element) really depend on?
3. Which feedback?
Supervised learning: The execution of an incorrect
action leads to the “right” solution as feedback (e.g.
How intensively should the brakes be used?).
Driving instructor
Reinforcement learning: Only the result is perceived.
The critic tells whether it was good or bad, but not what would
have been right.
Unsupervised learning: No hints about the right
actions.
Important:
All these components are – from a mathematical
point of view – mappings.
[Figure: a set of example points and candidate functions fitting them — inductive learning as function approximation of a global function f]
function REFLEX-PERFORMANCE-ELEMENT(percept) returns an action
[Figure: part of a decision tree for the restaurant problem, testing Patrons? and WaitEstimate?]
Question:
Can every boolean function be represented by a
decision tree?
Answer:
Yes! Each row of the table describing the function
belongs to one path in the tree.
Attention
Decision trees can be much smaller! But there are
boolean function which can only be represented by
trees with an exponential size:
Parity function: par(x1, . . . , xn) := 1, if ∑_{i=1}^n x_i is even; 0, else.
PAC-Learning
[Figure: a decision list — it tests Patrons(x, Some), then Patrons(x, Full) ∧ Fri/Sat(x); each test answers Yes if it succeeds, and the list ends with No]
Obviously:
n-DL(n) = set of all boolean functions,
card(n-DL(n)) = 2^(2^n).
Disadvantage:
New cases cannot be considered.
Idea:
Choose the simplest tree (or rather the most
general) which is compatible with all examples.
Example 34 (Guess!)
A computer program encodes triples of numbers with respect to
a certain rule. Find out that rule.
Your task:
Make more enquiries (approx. 10) and try to find out the rule.
function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
    best ← CHOOSE-ATTRIBUTE(attributes, examples)
    tree ← a new decision tree with root test best
    for each value v_i of best do
      examples_i ← {elements of examples with best = v_i}
      subtree ← DECISION-TREE-LEARNING(examples_i, attributes − best,
                                       MAJORITY-VALUE(examples))
      add a branch to tree with label v_i and subtree subtree
    end
    return tree
[Figure: the decision tree induced from the 12 restaurant examples, testing Patrons?, Hungry? and Type?]
Empiric approach:
1. Choose a set of examples MEx.
2. Divide it into two sets: MEx = MTrai ∪ MTest.
3. Apply the learning algorithm to MTrai and get a
   hypothesis H.
4. Calculate the proportion of correctly classified
   elements of MTest.
5. Repeat 1.–4. for many splits MTrai ∪ MTest with randomly
   generated MTrai.
Attention: peeking!
[Figure: learning curve — proportion correct on the test set (0.4–1.0) against training-set size (0–100)]
Information theory
Question:
How to choose the “best” attribute? The best attribute is the one
that delivers the highest amount of information.
Question:
For each attribute A: If this attribute is evaluated with
respect to the actual training-set, how much
information will be gained this way?
[Figure: splitting the 12 examples (a) by Type?, an uninformative attribute, versus (b) by Patrons?, an informative one]
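The information-gain computation behind CHOOSE-ATTRIBUTE can be sketched in a few lines; the counts below are the 12 restaurant examples (6 positive, 6 negative) split by Patrons? and Type? as on the slides.

```python
import math

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for p in (pos / total, neg / total):
        if p > 0:
            h -= p * math.log2(p)
    return h

def gain(pos, neg, splits):
    """Information gain; splits is a list of (pos_i, neg_i) per attribute value."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(pos, neg) - remainder

# Patrons?: None (0+, 2-), Some (4+, 0-), Full (2+, 4-)
print(round(gain(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))               # 0.541
# Type?: French (1, 1), Italian (1, 1), Thai (2, 2), Burger (2, 2)
print(round(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))       # 0.0
```

So Patrons? delivers information while Type? delivers none, which is exactly why the tree tests Patrons? first.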
So far:
A single hypothesis is used to make predictions.
Idea:
Let's take a whole bunch of them (an ensemble).
Motivation:
Among several hypotheses, use majority voting.
The misclassified examples get higher weights!
[Figure: an ensemble of three linear separators — their majority vote classifies the + and − examples]
[Figure: boosting — successively reweighted training sets produce hypotheses h1–h4, which are combined into h]
[Figure: proportion correct on the test set against training-set size — boosted decision stumps outperform a single decision stump]
[Figure: training and test accuracy against the number of hypotheses M in the ensemble — training error keeps falling while test error levels off]
3.4 PL1 formalisations
Goal:
A more general framework.
Idea:
To learn means to search in the hypotheses
space
(cf. planning).
Goal-predicate:
Q (x ), one-dimensional (hitherto: Will_Wait)
Hi : ∀x (Q(x) ↔ Ci(x))
Definition 39 (Hypotheses-Space H )
The hypotheses-space H of a learning-algorithm is
the set of those hypotheses the algorithm can create.
The extension of a hypothesis H with respect to the
goal-predicate Q is the set of examples for which H
holds.
Attention:
The combination of hypotheses with different
extensions leads to inconsistency.
Example | Alt Bar Fri Hun Pat Price Rain Res Type Est (attributes) | WillWait (goal)
X1 Yes No No Yes Some $$$ No Yes French 0–10 Yes
X2 Yes No No Yes Full $ No No Thai 30–60 No
X3 No Yes No No Some $ No No Burger 0–10 Yes
X4 Yes No Yes Yes Full $ No No Thai 10–30 Yes
X5 Yes No Yes No Full $$$ No Yes French >60 No
X6 No Yes No Yes Some $$ Yes Yes Italian 0–10 Yes
X7 No Yes No No None $ Yes No Burger 0–10 No
X8 No No No Yes Some $$ Yes Yes Thai 0–10 Yes
X9 No Yes Yes No Full $ Yes No Burger >60 No
X10 Yes Yes Yes Yes Full $$$ No Yes Italian 10–30 No
X11 No No No No None $ No No Thai 0–10 No
X12 Yes Yes Yes Yes Full $ No No Burger 30–60 Yes
Question:
Under which conditions is a hypothesis H inconsistent with an
example X ?
Attention:
Inductive learning in logic-based domains means to restrict the
set of possible hypotheses with every example.
[Figure: (a)–(e) possible relations between the extension of a hypothesis and the positive (+) and negative (−) examples — false negatives force generalisation, false positives force specialisation]
Question:
How to generalize/specialize?
H1 : ∀x (Q (x ) ↔ C1 (x ))
H2 : ∀x (Q (x ) ↔ C2 (x ))
H1 generalises H2 , if C2 (x ) → C1 (x ), H1 specialises H2 , if
C1 (x ) → C2 (x ).
generalisation means: leave out ∧-elements in a
conjunction, add ∨-elements to a disjunction.
specialisation means: add ∧-elements to a conjunction,
leave out ∨-elements in a disjunction.
Problem:
H is a big disjunction H1 ∨ . . . ∨ Hn . How to represent this?
Reminder:
How is the set of real numbers between 0 and 1 represented?
Through the interval [0, 1], i.e. by its two boundaries.
[Figure: the version space represented by its boundary sets — the most general hypotheses G1, . . . , Gm and the most specific hypotheses S1, . . . , Sn enclose all consistent hypotheses]
Question:
What could happen?
1. only one hypothesis remains: Hooray!
2. the space collapses: G = ∅ or S = ∅.
3. no examples left but several hypotheses: i.e.
   ending with a big disjunction.
Question:
How big is the distance between the hypothesis H
calculated by the learning algorithm and the real
function f ?
computational learning theory: PAC-learning –
Probably Approximately Correct.
Idea:
If a hypothesis is consistent with a big training set
then it cannot be that wrong.
Question:
How are the training set and test set related?
We assume:
Definition 40 (error(h))
Let h ∈ H be a hypothesis and f the target (i.e.
to-be-learned) function. We are interested in the set
Question:
ε > 0 is given. How many examples must the training
set contain to make sure that the hypothesis created
by a learning algorithm is ε approximatively correct?
Question is wrongly stated!
[Figure: the hypothesis space H, with the subset H_bad of hypotheses whose error exceeds ε around the target function f]
Question:
What does the last result mean?
Answer:
The complexity depends on log(|H |). What does this mean
for boolean functions with n arguments?
Answer:
We have log(|H|) = 2^n. Thus one needs
exponentially-many examples even if one is satisfied
with ε-approximative correctness under a certain
probability!
Proof (continuation):
We want P′ ≤ δ:
|H| (1 − ε)^m ≤ δ
After some transformations:
(1 − ε)^m ≤ δ/|H|
m ln(1 − ε) ≤ ln(δ) − ln(|H|)
m ≥ −(1/ln(1 − ε)) (ln(1/δ) + ln(|H|))
m ≥ (1/ε) (ln(1/δ) + ln(|H|))
The last line holds because of ln(1 − ε) < −ε.
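The final bound is easy to evaluate numerically. The sketch below plugs in the case of boolean functions over n attributes, |H| = 2^(2^n), to show the exponential blow-up; the chosen ε and δ are illustrative.

```python
import math

def pac_sample_bound(eps, delta, hypothesis_space_size):
    """m >= (1/eps) * (ln(1/delta) + ln|H|) examples suffice (PAC bound)."""
    return math.ceil((1 / eps) *
                     (math.log(1 / delta) + math.log(hypothesis_space_size)))

# All boolean functions over n attributes: |H| = 2^(2^n), so ln|H| = 2^n ln 2
# and the required sample size grows exponentially in n.
for n in (5, 10):
    print(n, pac_sample_bound(0.1, 0.05, 2 ** (2 ** n)))
```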
Essential result:
Learning is never better than looking up in
a table!
return a decision list with initial test t and outcome o
  and remaining elements given by DECISION-LIST-LEARNING(examples − examples_t)
[Figure: learning curves for decision-tree and decision-list learning on the restaurant data — proportion correct on the test set against training-set size]
3.6 Noise and overfitting
Noise:
examples are inconsistent (Q (x ) together with
¬Q (x )),
no attributes left to classify more examples,
makes sense if the environment is
nondeterministic.
Other examples:
the pyramids,
astrology,
“Mein magisches Fahrrad”.
4. Learning in networks
[Figure: a neuron — nucleus, soma, dendrites, axon with axonal arborization, and synapses]
A neuron consists of
the soma: the body of the cell,
the nucleus: the core of the cell,
the dendrites,
the axon: 1 cm - 1 m in length.
Question:
What does the building process of the network of
neurons look like?
Question
Long-term changes in the strength of the connections
are in response to the pattern of stimulation.
Idea:
A unit i receives an input via links to other units j. The
input function is
in_i := ∑_j W_{j,i} a_j
Notation Meaning
ai Activation value of unit i (also the output of the unit)
ai Vector of activation values for the inputs to unit i
g Activation function
g′ Derivative of the activation function
Erri Error (difference between output and target) for unit i
Erre Error for example e
Ii Activation of a unit i in the input layer
I Vector of activations of all input units
Ie Vector of inputs for example e
ini Weighted sum of inputs to unit i
N Total number of units in the network
O Activation of the single output unit of a perceptron
Oi Activation of a unit i in the output layer
O Vector of activations of all units in the output layer
t Threshold for a step function
T Target (desired) output for a perceptron
T Target vector when there are several output units
Te Target vector for example e
Wj,i Weight on the link from unit j to unit i
Wi Weight from unit i to the output in a perceptron
Wi Vector of weights leading into unit i
W Vector of all weights in the network
[Figure: a unit — input links carry activations a_j weighted by W_{j,i}; the input function Σ computes in_i; the activation function g yields the output a_i = g(in_i), sent along the output links]
Simple units:
[Figure: three activation functions — step_t, sign and sigmoid]
Standardisation:
We consider step_0 instead of step_t.
If a unit i uses the activation function step_t(x), then
we bring in an additional input link “0” which adds a
constant value of a_0 := −1. This value is weighted with
W_{0,i} := t. Now we can use step_0 for the activation
function:
step_t(∑_{j=1}^n W_{j,i} a_j) = step_0(−t + ∑_{j=1}^n W_{j,i} a_j) = step_0(∑_{j=0}^n W_{j,i} a_j)
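The bias trick above can be checked directly in code; the AND-gate weights are an illustrative choice.

```python
# A unit with threshold t and activation step_t is equivalent to a unit
# with threshold 0 that has an extra input a_0 = -1 weighted with W_0 = t.

def step(x, t=0):
    return 1 if x >= t else 0

def unit_with_threshold(weights, inputs, t):
    return step(sum(w * a for w, a in zip(weights, inputs)), t)

def unit_with_bias(weights, inputs, t):
    # prepend the constant input a_0 = -1 with weight W_0 = t
    return step(sum(w * a for w, a in zip([t] + weights, [-1] + inputs)))

# AND of two inputs: weights (1, 1), threshold 1.5 — both formulations agree.
table = [(x, y, unit_with_threshold([1, 1], [x, y], 1.5),
          unit_with_bias([1, 1], [x, y], 1.5))
         for x in (0, 1) for y in (0, 1)]
print(table)  # only the input (1, 1) yields output 1
```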
Networks can be
recurrent, i.e. the links may form cycles,
feed-forward, i.e. they form an acyclic graph.
Usually networks are partitioned into layers:
units in one layer have only links to units of the next layer.
E.g. multi-layer feed-forward networks: without internal states
(no short-term memory).
[Figure: a feed-forward network — inputs I1, I2, hidden units H3, H4 and output O5, with weights w13, w14, w23, w24, w35, w45]
Important:
The output of the input-units is determined by the environment.
Question 1:
Which function does the figure describe?
Question 2:
Why can non-trivial functions be represented at all?
Question 3:
How many units do we need?
Hopfield networks:
1. bidirectional links with symmetrical weights,
2. activation function: sign,
3. units are input- and output-units,
4. can store up to 0.14 N training examples.
Question:
How to find the optimal network structure?
Answer:
Perform a search in the space of network structures
(e.g. with genetic algorithms).
Definition 46 (Perceptron)
A perceptron is a feed-forward network with only one
layer based on the activation function step0 .
Question:
Can every boolean function be represented by a
feed-forward network?
What about
f(x1, . . . , xn) := 1, if ∑_{i=1}^n x_i > n/2; 0, else.
Solution:
Every boolean function can be composed using
AND, OR and NOT (or even only NAND).
The combination of the respective perceptrons
is not a perceptron!
[Figure: output of a two-input perceptron as a function of x1 and x2 — a step along a straight line: perceptrons are linear separators]
Question:
What about XOR?
[Figure: (a) I1 and I2, (b) I1 or I2 are linearly separable; (c) I1 xor I2 is not]
Output = step_0(∑_{j=1}^n W_j I_j)
[Figure: a perceptron over inputs I1, I2, I3, each with weight W = −1 and threshold t = −1.5 — it outputs 1 iff at most one input is 1]
Learning algorithm:
Similar to current-best-hypothesis (cf. chapter on
learning).
hypothesis: network with the current weights
(initially randomly generated)
UPDATE: make it consistent through small
changes.
Important: UPDATE is applied repeatedly to all
examples; each such pass is called an epoch.
Wj := Wj + α × Ij × Error
Wj := Wj + α × g′(in) × Ij × Error
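The first update rule in runnable form, trained on the majority function of three inputs (linearly separable, so the convergence theorem below applies). Epoch count and learning rate are illustrative choices.

```python
# Perceptron learning rule W_j := W_j + alpha * I_j * Error.

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def train_perceptron(examples, n, alpha=0.1, epochs=50):
    w = [0.0] * (n + 1)                  # w[0] is the threshold weight
    for _ in range(epochs):
        for x, target in examples:
            xb = [-1] + list(x)          # bias input a_0 = -1 (see above)
            error = target - predict(w, xb)
            w = [wi + alpha * xi * error for wi, xi in zip(w, xb)]
    return w

# Majority of 3 bits: 1 iff at least two inputs are 1.
examples = [((a, b, c), int(a + b + c >= 2))
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
w = train_perceptron(examples, 3)
print(all(predict(w, [-1] + list(x)) == t for x, t in examples))  # True
```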
function PERCEPTRON-LEARNING(examples, network) returns a perceptron hypothesis
  inputs: examples, a set of examples, each with input x and output y
          network, a perceptron with weights W_j and activation function g
  repeat
    for each e in examples do
      in ← ∑_j W_j x_j[e]
      Err ← y[e] − g(in)
      W_j ← W_j + α × Err × g′(in) × x_j[e]
  until some stopping criterion is satisfied
  return network
Proof:
Let ŵ be a solution, i.e. (why? exercise)
ŵ · I > 0 for all I ∈ I_pos ∪ −I_neg.

Proof (continued):
Consider the possibility that not all w_i are different from their
predecessor (or successor) (this is the case if the new example
is not consistent with the current weights). Let k1, k2, . . . , kj be
the indices of the changed weights (where the error is non-zero), i.e.
w_{kj} · I_{kj} ≤ 0,   w_{kj+1} = w_{kj} + α I_{kj},
with I_{kj} being the kj-th tested example in I₀ (which is not
consistent (wrt the definition of kj)).
Then we have
w_{kj+1} = w_{k1} + α I_{k1} + α I_{k2} + . . . + α I_{kj}

Proof (continued):
We use this to show that j cannot become arbitrarily big. We
consider
cos ω = (ŵ · w_{kj+1}) / (‖ŵ‖ ‖w_{kj+1}‖)

Proof (continued):
How do we get the estimation above?
The scalar product ŵ · w_{kj+1} leads to
ŵ · w_{k1} + α ŵ · (I_{k1} + . . . + I_{kj}) ≥ ŵ · w_{k1} + α m j.
Furthermore,
‖w_{kj+1}‖² = ‖w_{kj} + α I_{kj}‖²
            = ‖w_{kj}‖² + 2α I_{kj} · w_{kj} + α² ‖I_{kj}‖²
            ≤ ‖w_{kj}‖² + α² ‖I_{kj}‖²,
since I_{kj} · w_{kj} ≤ 0; this is how we have chosen kj.
Now let M := max {‖I‖² : I ∈ I₀}. Then
‖w_{kj+1}‖² ≤ ‖w_{k1}‖² + α² ‖I_{k1}‖² + . . . + α² ‖I_{kj}‖² ≤ ‖w_{k1}‖² + α² M j
holds.
[Figure: proportion correct on the test set against training-set size — perceptron versus decision tree]
[Figure: a second learning-curve comparison of perceptron and decision tree]
4.4 Multi-layer feed-forward networks
Problem:
What does the error-function of the hidden units look
like?
[Figures: outputs h_W(x1, x2) of multi-layer networks — a soft ridge and a localized bump; combinations of such surfaces can approximate complicated functions]
[Figure: a two-layer feed-forward network — input units I_k, weights W_{k,j}, hidden units a_j, weights W_{j,i}, output units O_i]
Answer:
Four!
The perceptron's error is easily determined because there
was only one W_j between input and output. Now we
have several layers.
How should the error be distributed?
With E = ½ ∑_i (T_i − O_i)², we get for the output-layer weights:
∂E/∂W_{j,i} = ½ · 2 (T_i − g(. . .)) · (−g′(in_i)) · a_j
            = −a_j (T_i − O_i) g′(in_i)
            = −a_j Δ_i
Now the W_{k,j}:
∂E/∂W_{k,j} = ½ ∑_i 2 (T_i − g(. . .)) (−g′(in_i)) (W_{j,i} g′(in_j) I_k)
            = ∑_i (Δ_i W_{j,i} g′(in_j) I_k)
            = g′(in_j) I_k ∑_i W_{j,i} Δ_i
Idea:
We perform two different updates: one for the
weights W_{j,i} into the output units and one for the
weights W_{k,j} coming from the input units.
W_{k,j} := W_{k,j} + α × I_k × Δ_j
with Δ_j := g′(in_j) ∑_i W_{j,i} Δ_i.
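The two update rules can be exercised on a tiny 2-2-1 sigmoid network; the initial weights, the single training example and the learning rate are illustrative assumptions.

```python
import math

def g(x):
    return 1 / (1 + math.exp(-x))        # sigmoid, with g' = g(1 - g)

def backprop_step(W_in, W_out, x, target, alpha=0.5):
    # forward pass
    in_h = [sum(W_in[j][k] * x[k] for k in range(len(x))) for j in range(2)]
    a_h = [g(v) for v in in_h]
    in_o = sum(W_out[j] * a_h[j] for j in range(2))
    out = g(in_o)
    # Delta_i = g'(in_i) (T_i - O_i)
    d_out = out * (1 - out) * (target - out)
    # Delta_j = g'(in_j) W_{j,i} Delta_i  (single output unit, so no sum)
    d_h = [a_h[j] * (1 - a_h[j]) * W_out[j] * d_out for j in range(2)]
    # W_{j,i} += alpha a_j Delta_i ;  W_{k,j} += alpha I_k Delta_j
    W_out = [W_out[j] + alpha * a_h[j] * d_out for j in range(2)]
    W_in = [[W_in[j][k] + alpha * x[k] * d_h[j] for k in range(len(x))]
            for j in range(2)]
    return W_in, W_out, out

W_in, W_out = [[0.5, -0.5], [0.3, 0.3]], [0.2, -0.4]
for _ in range(2000):
    W_in, W_out, out = backprop_step(W_in, W_out, [1.0, 0.0], 1.0)
print(round(out, 3))                     # close to the target 1.0
```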
function BACK-PROP-LEARNING(examples, network) returns a neural network
  inputs: examples, a set of examples with input vector x and output vector y
          network, a multilayer network with L layers, weights W_{j,i},
                   activation function g
  repeat
    for each e in examples do
      for each node j in the input layer do a_j ← x_j[e]
      for ℓ = 2 to L do
        in_i ← ∑_j W_{j,i} a_j
        a_i ← g(in_i)
      for each node i in the output layer do
        Δ_i ← g′(in_i) × (y_i[e] − a_i)
      for ℓ = L − 1 to 1 do
        for each node j in layer ℓ do
          Δ_j ← g′(in_j) ∑_i W_{j,i} Δ_i
          for each node i in layer ℓ + 1 do
            W_{j,i} ← W_{j,i} + α × a_j × Δ_i
  until some stopping criterion is satisfied
  return network
[Figure: total error on the training set against the number of epochs (0–400) — the error falls towards zero]
[Figure: proportion correct on the test set against training-set size — multi-layer network versus decision tree]
Important:
Back propagation is gradient search!
[Figure: the error surface over the weights; back-propagation performs gradient descent on it from a towards b.]
General remarks:
expressibility: neural networks are suitable for continuous inputs and outputs (noise).
To represent all boolean functions of n attributes, 2^n/n hidden units suffice.
Often many fewer suffice: the art lies in determining the topology of the network.
[Figure: a two-dimensional data set whose positive and negative examples are separated by a circle, hence not linearly separable in (x1, x2).]
4. Learning in networks 5. Kernel Machines
F : ℝ² → ℝ³; ⟨x1, x2⟩ ↦ ⟨x1², x2², √2·x1·x2⟩
Note that F(x) · F(y) = (x · y)².
[Figure: the same data mapped by F into ℝ³, where it becomes linearly separable.]
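The usefulness of this particular F can be sketched in a few lines: its feature-space dot product equals the squared dot product in the original space, so computations with the separator in ℝ³ need only dot products in ℝ² (the kernel trick).

```python
import math

def F(x1, x2):
    """The slide's feature map from R^2 into R^3."""
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Kernel identity: F(x) . F(y) = (x . y)^2 for all x, y in R^2.
```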
Overfitting!!!
Support Vector Machines
Kernel (or Support Vector-) machines were invented
to compute the optimal linear separator.
∑i αi − ½ ∑i,j (αi αj yi yj (xi · xj))
[Figure: the optimal linear separator found in the transformed space, drawn in the coordinates x1², x2².]
4. Learning in networks 6. Applications
4.6 Applications
Driving:
(1993) ALVINN. Actions: steer, accelerate, brake. Driving
straight.
input: color stereo video, radar.
Picture is a 30 × 32 pixel-array (input units).
output layer: 30 units (representing the steering direction).
The output unit with the highest activation is chosen.
We need a function that maps each image to a steering direction.
hidden units: one layer with 5 units
training: a human drives for 5 minutes, the network
observes.
back-propagation: 10 minutes
problems:
the human drives too well
brightness: depends on cloud cover and weather.
4. Learning in networks 7. Genetic Algorithms
Remember:
Darwin, Lamarck
survival of the fittest
mutation, selection, reproduction.
Algorithm:
Works on a population, reproduces and selects
according to a fitness-function and returns
individuals.
Question:
What does this have to do with learning?
Question:
How to apply this learning algorithm?
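One way to apply it: encode candidate solutions as bit-strings and let a fitness function score them. A minimal sketch (the representation, parameters, and operator choices are ours; the slides leave them abstract):

```python
import random

def genetic_algorithm(fitness, population, generations=40,
                      mutation_rate=0.05, rng=random):
    """Minimal GA on bit-strings: tournament selection, one-point
    crossover, mutation, and elitism (the best individual survives)."""
    pop = [list(ind) for ind in population]
    best = max(pop, key=fitness)
    length = len(best)
    for _ in range(generations):
        def pick():  # binary tournament: the fitter of two random parents
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b
        children = []
        for _ in range(len(pop) - 1):
            p, q = pick(), pick()
            cut = rng.randint(1, length - 1)           # one-point crossover
            child = p[:cut] + q[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                   # mutation
            children.append(child)
        pop = children + [best]                        # elitism
        best = max(pop, key=fitness)
    return best
```

Because of elitism the fitness of the returned individual can never be worse than the best individual of the initial population.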
5. Knowledge Engineering
Definition (continued)
v̄(ϕ → γ) = true, if v̄(ϕ) = false or (v̄(ϕ) = true and v̄(γ) = true); false, else.
Thus each valuation v uniquely defines a v̄. We call v̄ an L-structure or model Av. From now on we will speak of models and valuations.
A model determines for each formula if it is true or false.
The process of mapping a set of L-formulae into {true, false} is
called semantics.
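The mapping v̄ can be sketched as a recursive evaluator (the tuple encoding of formulae is our choice):

```python
def vbar(v, phi):
    """Extend a valuation v (dict: atom -> bool) to arbitrary formulae.
    Formulae are nested tuples, e.g. ('imp', ('atom', 'p'), ('atom', 'q'))."""
    op = phi[0]
    if op == 'atom':
        return v[phi[1]]
    if op == 'not':
        return not vbar(v, phi[1])
    if op == 'and':
        return vbar(v, phi[1]) and vbar(v, phi[2])
    if op == 'or':
        return vbar(v, phi[1]) or vbar(v, phi[2])
    if op == 'imp':  # exactly the clause from the definition above
        return (not vbar(v, phi[1])) or (vbar(v, phi[1]) and vbar(v, phi[2]))
    raise ValueError(op)
```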
Lemma 57 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a valuation v with v ⊨ T and v̄(ϕ) = false.
Attention:
Do not mix up this last condition with the property of a valuation
v (or a model): each valuation is complete in the above sense.
Definition (continued)
The only inference rule in SL is modus ponens: from ϕ and ϕ → ψ deduce ψ, or short
(MP)  ϕ, ϕ → ψ / ψ
(ϕ, ψ are arbitrary complex formulae).
Definition 63 (Proof)
Show that:
1. A ⊢ A ∨ B,
2. ⊢ A ∨ ¬A,
3. the rule
   (R)  A → ϕ, ¬A → ψ / ϕ ∨ ψ
   can be derived.
Theorem 64 (Completeness)
T ⊨ ϕ if and only if T ⊢ ϕ
Theorem 65 (Compactness)
A formula semantically follows from a theory T if and only if it already follows from a finite subset of T.
Question:
What is the difference between an inference rule ϕ / ψ and the implication ϕ → ψ?
Suppose we have a set T of formulae and we choose two constants p, q ∈ L. We could either consider
(1) T together with MP and the additional rule p/q, or
(2) T ∪ {p → q} together with MP.
1. Case: Cn^{p/q}(T),
2. Case: Cn(T ∪ {p → q}).
If T = {¬q}, then we have in (2): ¬p ∈ Cn(T ∪ {p → q}), but not in (1).
A clause A ∨ ¬B ∨ C ∨ … ∨ ¬E is identified with the set of its literals {A, ¬B, C, …, ¬E}.
We define the following inference rule:
Definition 68 (SL resolution)
Deduce the clause C1 ∪ C2 from C1 ∪ {A} and C2 ∪ {¬A}.
Question:
Is this calculus correct and complete?
Answer:
No, not complete as it stands. It is, however, refutation-complete: instead of showing T ⊢ φ directly, one shows that
“T ∪ {¬φ} is unsatisfiable”,
or rather that
T ∪ {¬φ} ⊢ □ (the empty clause is derivable).
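The refutation procedure can be sketched for propositional clauses (the clause representation and the naive saturation loop are ours):

```python
from itertools import combinations

def resolvents(c1, c2):
    """All SL-resolvents of two clauses. A clause is a frozenset of literals;
    a literal is a string, negation marked by a leading '~'."""
    out = set()
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def refutes(clauses):
    """Saturate under resolution; True iff the empty clause is derivable."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            new |= resolvents(c1, c2)
        if frozenset() in new:
            return True
        if new <= known:       # nothing new: saturated without refutation
            return False
        known |= new
```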
5.3 Wumpus in SL
[Figure: the 4×4 Wumpus world: pits (with breezes in the adjacent squares), the Wumpus (with stenches in the adjacent squares), the gold, and the agent starting at square (1,1).]
5. Knowledge Engineering 3. Wumpus in SL
[Figure: the agent's knowledge after the first moves: (a) visited squares marked OK, (b) after perceiving a breeze B, possible pits marked P?.]
Language definition:
Si,j: there is a stench at (i,j)
Bi,j: there is a breeze at (i,j)
Piti,j: (i,j) is a pit
Gli,j: (i,j) glitters
Wi,j: (i,j) contains the Wumpus
General knowledge:
¬S1,1 −→ (¬W1,1 ∧ ¬W1,2 ∧ ¬W2,1 )
¬S2,1 −→ (¬W1,1 ∧ ¬W2,1 ∧ ¬W2,2 ∧ ¬W3,1 )
¬S1,2 −→ (¬W1,1 ∧ ¬W1,2 ∧ ¬W2,2 ∧ ¬W1,3 )
Question:
Can we deduce that the wumpus is located at (1,3)?
Answer:
Yes. Either via resolution or using our Hilbert-calculus.
Problem:
We want more: given a certain situation we want to choose the
best action. This is impossible in SL.
Disadvantages
actions can only be guessed
database must be changed continuously
the set of rules becomes very big because there
are no variables
Using an appropriate formalisation (additional
axioms) we can check if
KB ⊢ ¬action or KB ⊢ action
But it can happen that neither one nor the other is
deducible.
Observation:
Polynomial algorithms are fast, exponential ones are
slow!
To show the unsatisfiability of a PL-formula using the truth-table method we need to test 2^n assignments in the worst case (n attributes).
Question:
Can this problem be solved in polynomial time with
other methods?
Obvious:
The satisfiability of PL-formulae is in NP, and P is a subset of NP.
Question:
Which are the most difficult problems in NP?
Definition 73 (NP-completeness)
A problem in NP is called NP-complete if every problem in NP can be reduced to it in polynomial time.
Definition (continued)
The 0-ary function symbols are called individual constants – we leave out the parentheses. In general we will
need – as in propositional logic – only a certain subset of the
predicate or function symbols.
These define a language L ⊆ LFOL (analogously to definition 52
on page 2). The used set of predicate and function symbols is
also called signature Σ.
Definition (continued)
The concept of an L -term t and an L -formula ϕ are defined
inductively:
Term: L-terms t are defined as follows:
1. each variable is an L-term.
2. if f^k is a k-ary function symbol from L and t1, …, tk are L-terms, then f^k(t1, …, tk) is an L-term.
The set of all L-terms that one can create from the set X ⊆ Var is called TermL(X) or TermΣ(X). Using X = ∅ we get the set of ground terms TermL(∅), short: TermL.
Definition (continued)
Formula: L-formulae ϕ are also defined inductively:
1. if P^k is a k-ary predicate symbol from L and t1, …, tk are L-terms, then P^k(t1, …, tk) is an L-formula.
2. for all L-formulae ϕ, (¬ϕ) is an L-formula.
3. for all L-formulae ϕ and ψ, (ϕ ∧ ψ) and (ϕ ∨ ψ) are L-formulae.
4. if x is a variable and ϕ an L-formula, then (∃x ϕ) and (∀x ϕ) are L-formulae.
Definition (continued)
Atomic L-formulae are those which are composed according to 1.; we call them AtL(X) (X ⊆ Var). The set of all L-formulae with respect to X is called FmlL(X).
Positive formulae (FmlL+(X)) are those which are composed using only 1., 3. and 4.
If an L-formula ϕ occurs within another L-formula ψ, then ϕ is called a sub-formula of ψ.
An illustrative example
Definition (continued)
The range of IA consists of the predicates and functions on UA .
We write:
IA (P ) = P A , IA (f ) = f A .
Definition (continued)
Now we define inductively the logical value of a formula ϕ in A:
1. if ϕ =def P^k(t1, …, tk) with the terms t1, …, tk and the k-ary predicate symbol P^k, then
   A(ϕ) =def true, if (A(t1), …, A(tk)) ∈ P^A; false, else.
2. if ϕ =def ¬ψ, then
   A(ϕ) =def true, if A(ψ) = false; false, else.
3. if ϕ =def (ψ ∧ η), then
   A(ϕ) =def true, if A(ψ) = true and A(η) = true; false, else.
Definition (continued)
4. if ϕ =def (ψ ∨ η), then
   A(ϕ) =def true, if A(ψ) = true or A(η) = true; false, else.
5. if ϕ =def ∀x ψ, then
   A(ϕ) =def true, if for all d ∈ UA: A[x/d](ψ) = true; false, else.
6. if ϕ =def ∃x ψ, then
   A(ϕ) =def true, if there is a d ∈ UA with A[x/d](ψ) = true; false, else.
In cases 5. and 6. the notation [x/d] was used. It is defined as follows: for d ∈ UA let A[x/d] be the structure A′ which is identical to A except for the definition of x^{A′}: x^{A′} =def d (independent of whether IA is defined for x or not).
5. Knowledge Engineering 5. First Order Logic (FOL)
Definition (continued)
We write:
A ⊨ ϕ[ρ] for A(ϕ) = true: A is a model for ϕ with respect to ρ.
If ϕ does not contain free variables, then A ⊨ ϕ[ρ] is independent of ρ. We simply leave out ρ.
If there is at least one model for ϕ, then ϕ is called satisfiable or consistent.
Definition 79 (Tautology)
1. A theory is a set of formulae without free variables: T ⊆ FmlL. A satisfies T if A ⊨ ϕ holds for all ϕ ∈ T. We write A ⊨ T.
2. An L-formula ϕ is called an L-tautology if for all matching L-structures A the following holds: A ⊨ ϕ.
From now on we mostly leave out the language L, because it is clear from the context. Nevertheless it always has to be defined.
Lemma 82 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a structure A with A ⊨ T and A ⊨ ¬ϕ.
MOD(T ) =def {A : A |= T }.
If U is a set of structures, then we can consider all sentences which are true in all structures of U. We call this set Cn(U):
Cn(U) =def {ϕ : A ⊨ ϕ for all A ∈ U}.
An illustrative example
Example 86 (Natural numbers in different languages)
NPr = (ℕ, 0^N, +^N, =^N) (“Presburger arithmetic”),
NPA = (ℕ, 0^N, +^N, ·^N, =^N) (“Peano arithmetic”),
NPA′ = (ℕ, 0^N, 1^N, +^N, ·^N, =^N) (variant of NPA).
These structures each describe the natural numbers, but in different languages.
Question:
If the language is bigger, then we can express more. Is LPA′ more expressive than LPA?
Answer:
No, because one can replace 1^N by an LPA-formula: there is an LPA-formula φ(x) such that for each variable assignment ρ: ρ(x) = 1^N if and only if NPA ⊨ φ[ρ] (take, e.g., φ(x) := ∀y x · y = y).
Question:
Is LPA perhaps more expressive than LPr , or can the
multiplication be defined somehow?
By the completeness theorem, T ⊢ ϕ if and only if T ⊨ ϕ; hence Cn(T) is recursively enumerable (for recursively enumerable T).
Question:
Is T ⊨Herb φ not much easier, because we have to consider only Herbrand models? Is it perhaps decidable? (More about this in the next section.)
Then
M ∪ { ¬∃x1 … xn ⋀i φi }
p(x, a) = q(y, b) is not unifiable: the predicate symbols differ.
p(g(a), f(x)) = p(g(y), z) is unifiable. Ground substitutions are:
[y/a, x/a, z/f(a)], [y/a, x/f(a), z/f(f(a))], …
The mgU is: [y/a, z/f(x)].
{ x = x, g(h(y), y) = g(z, a) }   (remove the trivial x = x)
{ g(h(y), y) = g(z, a) }          (decompose)
{ h(y) = z, y = a }               (orient)
{ z = h(y), y = a }               (apply y = a)
{ z = h(a), y = a }
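The computation of the mgU can be sketched as follows (the term encoding is ours: variables are strings, compound terms and constants are tuples):

```python
def walk(t, s):
    """Resolve a variable through the substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(var, t, s):
    t = walk(t, s)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, u, s) for u in t[1:])

def unify(t1, t2, s=None):
    """mgU of two terms: variables are strings, terms are tuples
    ('f', arg1, ...), constants are 0-ary tuples like ('a',).
    Returns a substitution dict, or None if not unifiable."""
    s = {} if s is None else s
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str):                       # bind a variable
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, str):
        return None if occurs(t2, t1, s) else {**s, t2: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None                               # clash of symbols
    for u, v in zip(t1[1:], t2[1:]):
        s = unify(u, v, s)
        if s is None:
            return None
    return s
```

On the slide's example p(g(a), f(x)) = p(g(y), z) this yields exactly the mgU [y/a, z/f(x)].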
Factorization:
From C1 ∪ {L1, L2} deduce (C1 ∪ {L1}) mgU(L1, L2).
Consider for example M = {r (x ) ∨ ¬p(x ), p(a), s(a)}
and the question M |= ∃x (s(x ) ∧ r (x ))?
5. Knowledge Engineering 7. Robinson’s Resolution
Question:
Why do we need factorization?
Answer:
Consider
{s(x1 )} ∪ {¬s(y1 )}
or variants of it.
Resolving this new clause with one in M only leads to
variants of the respective clause in M.
Answer (continued):
can never be derived (using resolution only).
Question:
How do we axiomatize the Wumpus-world in FOL?
Idea:
In order to describe actions and their effects consistently we consider the world as a sequence of situations (snapshots of the world). Therefore we have to extend each predicate by an additional argument.
We use the function symbol result(action, situation)
as the term for the situation which emerges when the action action is executed in the situation situation.
Actions: Turn_right, Turn_left, Forward, Shoot, Grab, Release, Climb.
[Figure: a sequence of situations S0, S1, S2, S3 of the Wumpus world; the actions Forward and Turn(Right) lead from each situation to the next via result.]
Location_toward ([x , y ], 0) = [x + 1, y ]
Location_toward ([x , y ], 90) = [x , y + 1]
Location_toward ([x , y ], 180) = [x − 1, y ]
Location_toward ([x , y ], 270) = [x , y − 1]
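These axioms translate directly into code. A small sketch (the function names are ours):

```python
def location_toward(loc, d):
    """Location_toward from the slide: the square one step ahead of loc
    when facing direction d (degrees; 0 = east, 90 = north)."""
    x, y = loc
    return {0: (x + 1, y), 90: (x, y + 1),
            180: (x - 1, y), 270: (x, y - 1)}[d]

def orientation_after(d, action):
    """The orientation axiom as a function: turning changes d by 90 degrees."""
    if action == 'Turn_right':
        return (d - 90) % 360
    if action == 'Turn_left':
        return (d + 90) % 360
    return d  # any other action leaves the orientation unchanged
```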
Orient.(Agent, S0) = 90
Orient.(p, result(a, s)) = d ↔
((a = Turn_right ∧ d = mod(Orient.(p, s) − 90, 360))
∨ (a = Turn_left ∧ d = mod(Orient.(p, s) + 90, 360))
∨ (Orient.(p, s) = d ∧ ¬(a = Turn_right ∨ a = Turn_left)))
The goal is not only to find the gold but also to return safely. We
need the additional axioms Holding (Gold , s) → Go_back (s)
etc.
Knowledge:
1. Jack owns a dog.
2. Dog owners are animal lovers.
3. No animal lover kills an animal.
4. Either Jack or Bill killed a cat named Tuna.
Question:
Who killed the cat?
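One possible formalization, with the additional axiom that cats are animals (all predicate names are our choice; the slides do not fix a signature):

```latex
\exists d\,(Dog(d) \land Owns(Jack,d)) \\
\forall x\,\bigl(\exists d\,(Dog(d) \land Owns(x,d)) \rightarrow AnimalLover(x)\bigr) \\
\forall x\,\bigl(AnimalLover(x) \rightarrow \forall a\,(Animal(a) \rightarrow \neg Kills(x,a))\bigr) \\
Cat(Tuna) \land \bigl(Kills(Jack,Tuna) \lor Kills(Bill,Tuna)\bigr) \\
\forall x\,(Cat(x) \rightarrow Animal(x))
```

From the first two formulae Jack is an animal lover, so he kills no animal; Tuna is a cat and hence an animal, so Jack did not kill Tuna; by the disjunction, Bill killed the cat.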
Question:
How do we represent the proposition “every animal
has a brain”?
What about ∀x animal (x ) → has_brain(x )?
Or using a unary function symbol brain_of (·)? E.g.
∀x animal (x ) → ∃b b = brain_of (x )
Question:
Can Modus Ponens be simulated using resolution?
Answer:
No! (Why?)
Question:
Can resolution be simulated using Modus Ponens?
Only because there are no axioms?
Answer:
Yes! (Why?)
Question:
Why is the resolution calculus not complete?
Answer:
Not only because of the missing axioms. Even axioms would not help, because resolution rules operate only on clauses, and we cannot derive (in the calculus!) clauses from the implications (axioms). This works only with Modus Ponens.
6.2 Equality
Answer:
We can introduce formulae Ψn so that in all structures A with universe UA the following holds: A ⊨ Ψn if and only if UA contains at least n elements.
Question:
Can ≐ be simulated in “PL1 without ≐”?
Answer:
We could try to axiomatize ≐ (like the Wumpus world). We consider PL1 without ≐, with an additional binary predicate eq, and demand the following axioms:
∀x eq(x, x),
for each function symbol f with adequate arity:
∀x1 … xn y1 … yn ((eq(x1, y1) ∧ … ∧ eq(xn, yn)) → eq(f(x1, …, xn), f(y1, …, yn))),
for each predicate symbol P with adequate arity:
∀x1 … xn y1 … yn ((eq(x1, y1) ∧ … ∧ eq(xn, yn)) → (P(x1, …, xn) → P(y1, …, yn))).
Important:
This set of axioms depends on the underlying
language L , we call it EQL .
Question:
We consider the following equivalence, with φ a formula in PL1 with ≐ (φ[≐/eq] is obtained from φ by replacing each occurrence of ≐ with eq):
T ⊨≐ φ iff T[≐/eq] ∪ EQL ⊨ φ[≐/eq] ?
Does it hold?
Answer:
Unfortunately not, because:
1. the axioms EQL leave open the possibility that the interpretation of eq in a structure A is all of UA × UA;
2. models may contain elements which cannot be represented by terms of the considered language; we can make only limited statements about these elements.
It can be shown that we are not able to completely
axiomatize the identity even with additional axioms.
6.3 (Un-)Decidability
∀x ∀y ∃!z ⊕(x, y, z)
∀x ⊕(x, 0, x)
∀x ∀y ∀z (⊕(x, y, z) → ⊕(x, s(y), s(z)))
∀x ∀y ∃!z ⊗(x, y, z)
∀x ⊗(x, 0, 0)
∀x ∀y ∀z ∃z′ (⊗(x, y, z) → (⊗(x, s(y), z′) ∧ ⊕(z, x, z′)))
Obviously N := (ℕ, 0^N, s^N, ⊕^N, ⊗^N, eq^N) is a model of all these axioms: 0^N is the “right” 0, s^N is the successor function, ⊕^N is the addition and ⊗^N is the multiplication of natural numbers (each considered as a relation), and eq^N is the identity. The following holds:
The set {φ : PAfin |= φ} is recursively
enumerable but not recursive (decidable) .
The set {φ : N |= φ} is not even recursively
enumerable .
M ⊢ ∃x (s(x) ∧ r(x)):
resolving the negated query ¬s(x) ∨ ¬r(x) with s(a) yields ¬r(a);
resolving ¬r(a) with r(x) ∨ ¬p(x) yields ¬p(a);
resolving ¬p(a) with p(a) yields the empty clause.
Linear resolution: in each resolution step one of the two
parent clauses must either be from the input or
must be a successor of the other parent clause.
Idea:
Maybe input resolution is complete for a restricted
class of formulae.
← A1 (t1 ), . . . , An (tn ).
Question:
Is SLD resolution complete?
Note:
PROLOG always uses the leftmost atom.
[Figures: SLD trees for the goal ← p(x, b); edges are labelled with the program clause used (1, 2, 3). Some branches end in “Success” (e.g. with the answer [x/b]), others in “Failure”, and one branch is infinite.]
6. More about Logic 5. SLD resolution
Question:
How do we find successful branches in an SLD tree?
Note:
PROLOG uses depth-first-search.
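PROLOG's strategy, leftmost atom first, clauses in program order, depth-first, can be sketched for propositional definite clauses (the representation and the depth bound are ours; real SLD resolution additionally performs unification):

```python
def sld(goals, program, depth=20):
    """Depth-first SLD search on propositional definite clauses.
    program: list of (head, body) pairs, tried in order; goals: list of atoms.
    Returns True iff the leftmost-atom, depth-first search finds a success
    branch within the depth bound."""
    if not goals:
        return True                      # empty goal clause: success
    if depth == 0:
        return False                     # cut off infinite branches
    first, rest = goals[0], goals[1:]
    for head, body in program:           # PROLOG tries clauses top to bottom
        if head == first:
            if sld(body + rest, program, depth - 1):
                return True
    return False
```

Without the depth bound, a looping clause such as r ← r would send plain depth-first search down an infinite branch, which is exactly the danger of PROLOG's strategy.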
if ϕ =def ∃X^k ψ, then
A(ϕ) =def true, if there is an R^k ⊆ UA × ··· × UA with A[X^k/R^k](ψ) = true; false, else.
We can quantify over arbitrary n-ary relations, not just over elements (as in first-order logic).
Question
What do the following two 2nd order sentences mean?
∀x ∀y (x ≐ y ⇐⇒ ∀X (X(x) ⇐⇒ X(y))),
∀X ( (∀x ∃!y X(x, y) ∧ ∀x ∀y ∀z ((X(x, z) ∧ X(y, z)) → x ≐ y)) → ∀x ∃y X(y, x) )
Answer:
The first sentence shows that equality can be defined in 2nd OL
(in contrast to FOL).
The second sentence holds in a structure iff it is finite. Note that
this cannot be expressed in PL1.
7. Planning
Motivation:
problem-solving agent: The effects of a static
sequence of actions are determined.
knowledge-based agent: Actions can be chosen.
Question:
How does a problem-solving agent handle this?
[Figure: part of the search tree of a problem-solving agent over concrete actions such as Talk to Parrot, Go To School, Go To Class.]
∀s Result′([], s) := s
∀a, p, s Result′([a|p], s) := Result′(p, result(a, s))
Problems:
T ` φ is only semi-decidable.
If p is a plan, then so are [Empty_Action|p] and [A, A⁻¹|p].
Result:
We should not resort to a general prover, but to one specially designed for our case. We should also restrict the language.
7.2 STRIPS
Op( Action: Go(there),
    Prec: At(here),
    Effect: At(there) ∧ ¬At(here) )
[Slides with further operator schemata (PRECOND / EFFECT lists); garbled beyond recovery in extraction.]
[Figure: (a) the plan step Fly(P1, A, B) leading from At(P1, A) to At(P1, B); (b) the step Fly(P2, A, B) with the corresponding At literals.]
Op( Action: Go(there), Prec: At(here), Effect: At(there) ∧ ¬At(here) )
Idea:
Perform a search in plan-space!
Question:
How do we represent plans?
Answer:
We have to consider two things:
instantiation of variables: instantiate only if
necessary, i.e. always choose the mgU
partial order: refrain from fixing the exact ordering
(reduces the search space)
[Figure: partial-order plans for putting on shoes and socks: (a) the empty Start/Finish plan, (b) a partial-order plan with the four Sock/Shoe steps, ordered only where necessary (each sock before the corresponding shoe).]
Question:
What is a solution?
Answer:
Considering only fully instantiated, linearly ordered plans:
checking is easy.
More precisely:
“Si achieves c for Sj” means
c ∈ Precondition(Sj),
Si ≺ Sj,
¬∃ Sk : ¬c ∈ Effect(Sk) with Si ≺ Sk ≺ Sj in any linearization of the plan.
“no contradictions” means:
neither (Si ≺ Sj and Sj ≺ Si) nor (v = A and v = B for different constants A, B).
Note: these propositions may be derivable, because
≺ and = are transitive.
[Figures: refining the shopping plan: the empty Start/Finish plan; then Go(HWS) and Go(SM) added with open preconditions At(x); finally the preconditions linked to At(Home) from Start.]
[Figure: a step S3 whose effect ¬c threatens the causal link S1 →c S2 (a); the threat is resolved by ordering S3 before S1 (b) or after S2 (c).]
Question:
We have to introduce a Go-step in order to ensure the last
precondition. But how can we ensure the precondition of the
Go-step?
[Figure: the completed shopping plan: Start (At(Home)), Go(HWS), Buy(Drill) (using Sells(HWS, Drill)), Go(SM), Buy(Milk), Buy(Ban.), Go(Home), Finish, with causal links for the At preconditions.]
procedure CHOOSE-OPERATOR(plan, operators, S_need, c)
  choose a step S_add from operators or STEPS(plan) that has c as an effect
  if there is no such step then fail
  add the causal link S_add →c S_need to LINKS(plan)
  add the ordering constraint S_add ≺ S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)
  for each S_threat that threatens a link S_i →c S_j in LINKS(plan) do
    choose either
      Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
      Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
    if not CONSISTENT(plan) then fail
  end
We call such a threat possible and ignore it for the time being,
but keep it in mind. But if x is instantiated with home then a real
threat emerges which has to be resolved.
[Figure: a step S3 with effect ¬At(x) possibly threatening the causal link S1 →At(home) S2.]
choose a step S_add from operators or STEPS(plan) that has c_add as an effect
  such that u = UNIFY(c, c_add, BINDINGS(plan))
if there is no such step then fail
add u to BINDINGS(plan)
add S_add →c S_need to LINKS(plan)
add S_add ≺ S_need to ORDERINGS(plan)
if S_add is a newly added step from operators then
  add S_add to STEPS(plan)
  add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)
  for each S_i →c S_j in LINKS(plan) do
    for each S_threat in STEPS(plan) do
      for each c′ in EFFECT(S_threat) do
        if SUBST(BINDINGS(plan), c) = SUBST(BINDINGS(plan), ¬c′) then
          choose either
            Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
            Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
          if not CONSISTENT(plan) then fail
      end
    end
  end
[Figure: Shakey's world: rooms 1–4 with doors Door1–Door4 along a corridor, light switches Ls1–Ls4, boxes Box1–Box4 in room 1, and Shakey in room 3.]
[Figures: partial plans for the flat-tire problem: Start/Finish with the open precondition At(Spare, Axle); adding Remove(Spare, Trunk) and PutOn(Spare, Axle); the LeaveOvernight step, whose effects ¬At(·,·) threaten the causal links.]
Question:
What to do if the world is not fully accessible? Where do we get milk? The price of milk has doubled and we do not have enough money.
Idea:
Introduce new actions to query certain conditions and to react accordingly. (How much does milk cost?)
A flat tire
Op( Action : Remove(x ),
Prec : On(x ),
Effect : Off (x ) ∧ Clear _Hub ∧ ¬On(x )
)
Op( Action : Put_on(x ),
Prec : Off (x ) ∧ Clear _Hub,
Effect : On(x ) ∧ ¬Clear _Hub ∧ ¬Off (x )
)
Op( Action : Inflate(x ),
Prec : Intact (x ) ∧ Flat (x ),
Effect : Inflated (x ) ∧ ¬Flat (x )
)
goal: On(x ) ∧ Inflated (x ),
initial state: Inflated (Spare) ∧ Intact (Spare) ∧ Off (Spare) ∧
On(Tire1 ) ∧ Flat (Tire1 ).
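For this fully ground problem a simple progression (forward state-space) search already finds the plan; the following is a sketch of that idea, not of POP (the string encoding of literals is ours):

```python
from collections import deque

def ops():
    """The slide's operators, instantiated for x in {Tire1, Spare}:
    (name, preconditions, add list, delete list)."""
    for x in ('Tire1', 'Spare'):
        yield ('Remove(%s)' % x, {'On(%s)' % x},
               {'Off(%s)' % x, 'Clear_Hub'}, {'On(%s)' % x})
        yield ('Put_on(%s)' % x, {'Off(%s)' % x, 'Clear_Hub'},
               {'On(%s)' % x}, {'Clear_Hub', 'Off(%s)' % x})
        yield ('Inflate(%s)' % x, {'Intact(%s)' % x, 'Flat(%s)' % x},
               {'Inflated(%s)' % x}, {'Flat(%s)' % x})

def plan(init, goal_check):
    """Breadth-first search over states reachable by the operators."""
    frontier, seen = deque([(frozenset(init), [])]), set()
    while frontier:
        state, steps = frontier.popleft()
        if goal_check(state):
            return steps
        if state in seen:
            continue
        seen.add(state)
        for name, pre, add, dele in ops():
            if pre <= state:
                frontier.append((frozenset((state - dele) | add),
                                 steps + [name]))
    return None
```

On the initial state from the slide this finds [Remove(Tire1), Put_on(Spare)]: since Intact(Tire1) does not hold, Inflate(Tire1) is never applicable.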
7. Planning 4. Conditional Planning
Question:
What does POP deliver?
Answer:
Since Intact(Tire1) is not present, POP delivers the plan
[Remove(Tire1), Put_on(Spare)].
Question:
Would you also do it like that?
Answer:
A better way would be a conditional plan:
[Figures: development of the conditional plan: the Start/Finish skeleton with context (True); adding Inflate(Tire1), whose precondition Intact(Tire1) is supplied by the sensing step Check(Tire1), so the Finish context becomes (Intact(Tire1)); then a second Finish step On(x) ∧ Inflated(x) with context (¬Intact(Tire1)) is added.]
Question:
What if new contexts emerge in the second case?
Answer:
Then we have to introduce new copies of the FINISH
step: there must be a complete distinction of cases in
the end.
Question:
How can we make Inflated (x ) true in the FINISH
step? Adding the step Inflate(Tire1 ) would not make
sense, because the preconditions are inconsistent in
combination with the context.
[Figure: the plan with the second Finish step, now with the goal On(Spare) ∧ Inflated(Spare) in context (¬Intact(Tire1)).]
At last On(Spare).
[Figure: the complete conditional plan: after Check(Tire1), in context (Intact(Tire1)) Inflate(Tire1) is used; in context (¬Intact(Tire1)) Remove(Tire1) and Puton(Spare) achieve On(Spare) ∧ Inflated(Spare).]
Attention:
At first the context “True” is located under Remove(Tire1) and Put_on(Spare). This leads to a threat with the topmost link.
More precisely:
Search for a conditional step whose outcomes make the contexts incompatible and thereby resolve threats.
Finish-step addition:
  if the plans for existing finish steps are complete and have contexts C1, …, Cn then
    add a new finish step with the context ¬(C1 ∨ … ∨ Cn);
    this becomes the current context
Subgoal selection and addition:
  find a plan step S_need with an open precondition c
Action selection:
  choose a step S_add from operators or STEPS(plan) that adds c or knowledge of c
    and has a context compatible with the current context
  if there is no such step then fail
  add S_add →c S_need to LINKS(plan)
  add S_add < S_need to ORDERINGS(plan)
  if S_add is a newly added step then
    add S_add to STEPS(plan)
    add Start < S_add < Finish to ORDERINGS(plan)
Threat resolution:
  for each step S_threat that potentially threatens any causal link S_i →c S_j
      with a compatible context do
    choose one of
      Promotion: add S_threat < S_i to ORDERINGS(plan)
      Demotion: add S_j < S_threat to ORDERINGS(plan)
      Conditioning:
        find a conditional step S_cond possibly before both S_threat and S_j, where
          1. the context of S_cond is compatible with the contexts of S_threat and S_j;
          2. the step has outcomes consistent with S_threat and S_j, respectively
        add conditioning links for the outcomes from S_cond to S_threat and S_j
        augment and propagate the contexts of S_threat and S_j
    if no choice is consistent then fail
  end
end
7.5 Extensions