MIT 6.S191
Alexander Amini
January 30, 2019
AlphaGo video
Data: $(x, y)$
$x$ is data, $y$ is label
Apple example:
Goal: Learn function to map $x \to y$
Goal: Learn underlying structure
Goal: Maximize future rewards over many time steps
[Figure: AGENT and ENVIRONMENT]
Reward: feedback that measures the success or failure of the agent’s action.
$R_t = \sum_{i=t}^{\infty} r_i$
6.S191 Introduction to Deep Learning
1/30/19
introtodeeplearning.com
Reinforcement Learning (RL): Key Concepts
OBSERVATIONS
State changes: $s_{t+1}$
Reward: $r_t$
Total reward, $R_t$, is the discounted sum of all rewards obtained from time $t$
$Q(s, a) = \mathbb{E}[R_t]$
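The discounted total reward described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function name `discounted_return`, the default `gamma`, and the convention that the k-th future reward is weighted by `gamma**k` are all assumptions made for the example.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_k gamma**k * rewards[k], where rewards[0] plays the role of r_t.

    Assumption: the k-th future reward is weighted by gamma**k.
    """
    total = 0.0
    # Walk backwards so each step is one multiply-add: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # 1 + 0.5 + 0.25 = 1.75
```

Averaging this quantity over many sampled episodes that start from the same state and action is one simple way to approximate the expectation that defines the Q-function.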
Ultimately, the agent needs a policy $\pi(s)$ to infer the best action to take at its state, $s$
Strategy: the policy should choose an action that maximizes future reward
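One way to read this strategy is as a greedy choice over estimated Q-values. The sketch below assumes the Q-values for the current state are already available as a dictionary; the action names and numbers are made up for illustration.

```python
def greedy_action(q_values):
    """Pick the action whose estimated Q-value is largest.

    q_values maps each action (hypothetical names here) to an estimate of Q(s, a).
    """
    return max(q_values, key=q_values.get)


q_for_state = {"left": 0.2, "stay": 0.5, "right": 1.3}  # made-up estimates
best = greedy_action(q_for_state)
print(best)  # right
```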
Deep Q Networks (DQN): Training
How can we use deep neural networks to model Q-functions?
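[Figure: two Deep NN designs for the Q-function. One takes the state, $s$, and an action, $a$ (e.g. “move right”), and outputs $Q(s, a)$. The other takes only the state, $s$, and outputs $Q(s, a_1), Q(s, a_2), \ldots, Q(s, a_n)$.]
$\mathcal{L} = \mathbb{E}\left[ \left\| \left( r + \gamma \max_{a'} Q(s', a') \right) - Q(s, a) \right\|^2 \right]$
target: $r + \gamma \max_{a'} Q(s', a')$
predicted: $Q(s, a)$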
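The loss above compares a target (reward plus discounted best next Q-value) with the predicted Q-value. The following is a rough sketch in plain Python, with the network replaced by hand-written numbers. The names `q_target` and `dqn_loss`, the batch layout, and the `done` flag for terminal transitions are assumptions for illustration and do not come from the slides.

```python
def q_target(reward, next_q_values, gamma=0.99, done=False):
    """Target value: r + gamma * max_a' Q(s', a').

    Assumption: when the episode has ended (done=True) the target is just r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def dqn_loss(batch, gamma=0.99):
    """Mean squared difference between the target and the predicted Q(s, a).

    Each batch item is (predicted_q, reward, next_q_values, done).
    """
    errors = []
    for predicted_q, reward, next_q_values, done in batch:
        target = q_target(reward, next_q_values, gamma, done)
        errors.append((target - predicted_q) ** 2)
    return sum(errors) / len(errors)


batch = [
    (1.0, 0.0, [2.0, 4.0], False),  # target = 0 + 0.5 * 4 = 2, squared error = 1
    (3.0, 1.0, [0.0, 0.0], True),   # target = 1 (terminal),    squared error = 4
]
print(dqn_loss(batch, gamma=0.5))  # (1 + 4) / 2 = 2.5
```

In an actual DQN the predicted and next-state Q-values would come from a neural network and the loss would be minimized by gradient descent; here the numbers are fixed so the arithmetic can be checked by hand.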
DQN Atari Results
Surpass human-level
Below human-level
[Figure: Deep NN taking the state, $s$, and outputting $Q(s, a_1), Q(s, a_2), \ldots, Q(s, a_n)$.]
[Figure: Deep NN taking the state, $s$, and outputting $p(a_1 \mid s), p(a_2 \mid s), \ldots, p(a_n \mid s)$.]
$\sum_{a_i \in A} p(a_i \mid s) = 1$
$p(a \mid s) = P(\text{action} \mid \text{state})$
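A network that outputs action probabilities needs those outputs to be positive and to sum to 1. A softmax over the raw outputs is a common way to achieve this; the slides do not say which normalization is used, so treat the sketch below as one possible choice. The logits are hypothetical values standing in for a network's output.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities p(a_i | s) that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Draw an action index according to the distribution p(a | s)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits for three actions
action = sample_action(probs, random.Random(0))
print(probs, sum(probs), action)
```

Sampling from the distribution, rather than always taking the most probable action, is what makes such a policy stochastic.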
Board size (n×n)   Positions (3^(n²))     % Legal   Legal positions
1×1                3                      33.33%    1
2×2                81                     70.37%    57
3×3                19,683                 64.40%    12,675
4×4                43,046,721             56.49%    24,318,165
5×5                847,288,609,443        48.90%    414,295,148,741
9×9                4.434264882×10^38      23.44%    1.03919148791×10^38
13×13              4.300233593×10^80      8.66%     3.72497923077×10^79
19×19              1.740896506×10^172     1.20%     2.08168199382×10^170
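The position counts in the table follow from each of the n×n points being empty, black, or white. The short check below recomputes the "Positions" and "% Legal" columns for the small boards, using the legal-position counts copied from the table. The function names are made up for the example.

```python
def total_positions(n):
    """Each of the n*n points is empty, black, or white: 3**(n*n) colorings."""
    return 3 ** (n * n)


def percent_legal(n, legal_positions):
    """Share of all colorings of an n x n board that are legal, in percent."""
    return 100.0 * legal_positions / total_positions(n)


# Legal-position counts copied from the table above.
legal_counts = {1: 1, 2: 57, 3: 12_675, 4: 24_318_165, 5: 414_295_148_741}
for n, legal in legal_counts.items():
    print(f"{n}x{n}: {total_positions(n):,} positions, "
          f"{percent_legal(n, legal):.2f}% legal")
```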