
Introduction to Deep Learning
Week 01 Session 01

Sunday, 11 November 2018 Deep Learning TMP


Topics
• Course introduction
  • Pre-requisite skill-sets
  • Software & hardware requirements
• Binary Classification
• Logistic Regression
• Logistic Regression Cost Function

Course Introduction (1)
• cis.del.ac.id:
  • Syllabus
  • Weekly schedule: lecture materials
  • Announcements
• Main textbook:
  • Goodfellow, I., Bengio, Y. and Courville, A., 2016. Deep Learning. Cambridge: MIT Press.
• A few reminders:
  • Actively ask questions and share your views in class
  • Be in the classroom before class starts
  • Be honest when working on assignments and exams
  • Prepare well before class starts: read the material on cis and other sources related to the upcoming topic
  • Submit paper assignments on time; LATE = 0

Pre-requisite skill-sets
• Linear algebra
• Scalars, Vectors, Matrices and Tensors
• Matrix and vector multiplication
• Identity and Inverse Matrices
• Eigen-decomposition
• Singular Value Decomposition
• Probability and Information Theory
• Random Variables
• Probability Distributions
• Conditional Probability
• Expectation, Variance and Covariance
• Bayes’ Rule

Software & Hardware
• Software
  • Python 3.6
  • TensorFlow or Keras
  • Pandas
  • SciPy
  • Matplotlib, etc.
• Hardware
  • Min. 8 GB of memory
  • NVIDIA GeForce GTX 980 (optional but recommended)
  • A dedicated server with an NVIDIA GeForce GTX 1060 will be announced once it is ready for having FUN

Binary Classification

[Figure: a 64 × 64 RGB cat image shown as three 64 × 64 channel matrices (red, green, blue), unrolled into a single feature vector]

• Task: given an image, predict the label $y$: 1 (cat) or 0 (non-cat)
• The pixel intensities of all three colour channels are stacked into one column feature vector, e.g. $x = (35, 19, 9, 7, 4, 6, \dots)^T$
• Input dimension: $n = n_x = 64 \times 64 \times 3 = 12288$ (the number of input features)
• Goal: learn a classifier $x \mapsto y$
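A minimal sketch of this unrolling in NumPy (the random array below is a hypothetical stand-in for a real 64 × 64 RGB image):

```python
import numpy as np

# Hypothetical stand-in for a real 64x64 RGB image (values 0-255 per channel).
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Unroll all pixel intensities into a single column feature vector x.
x = image.reshape(-1, 1).astype(np.float64)

print(x.shape)  # (12288, 1), since 64 * 64 * 3 = 12288
```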
Binary Classification – Basic Notation
• A single example: $(x, y)$ with $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$
• e.g. $m$ training examples: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$
• Stack the inputs column-wise into a matrix with $n_x$ rows and $m$ columns:

$$X = \begin{bmatrix} | & | & & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & & | \end{bmatrix}, \qquad X \in \mathbb{R}^{n_x \times m} \Rightarrow \text{in Python } \texttt{X.shape} = (n_x, m)$$

• Likewise stack the labels into a row vector:

$$Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}, \qquad Y \in \mathbb{R}^{1 \times m} \Rightarrow \texttt{Y.shape} = (1, m)$$
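A minimal NumPy sketch of this layout (the sizes match the cat example above; the data are random placeholders):

```python
import numpy as np

n_x, m = 12288, 100  # features per example, number of training examples

# Random placeholders standing in for real training data.
X = np.random.rand(n_x, m)           # column i is the example x^(i)
Y = np.random.randint(0, 2, (1, m))  # row vector of labels y^(i) in {0, 1}

print(X.shape)  # (12288, 100) -> (n_x, m)
print(Y.shape)  # (1, 100)     -> (1, m)
```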
Logistic Regression (1)
• Given $x$, we want to predict $\hat{y} = P(y = 1 \mid x)$
• $x \in \mathbb{R}^{n_x}$; the logistic regression parameters are $w \in \mathbb{R}^{n_x}$ and $b \in \mathbb{R}$
• A first attempt at the output: $\hat{y} = w^T x + b$
  This doesn't work! Why?
  Because this is the linear regression function.
Logistic Regression (2)
• Since we need to classify cat vs. non-cat (1 or 0), we need $\hat{y} = P(y = 1 \mid x)$ with $0 \leq \hat{y} \leq 1$, while the linear regression function may produce $\hat{y} > 1$ or $\hat{y} < 0$.
• So, how do we solve that problem?
Logistic Regression (3)
• Output: $\hat{y} = \sigma(w^T x + b)$, where

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

[Figure: the S-shaped sigmoid curve $\sigma(z)$, rising from 0 towards 1 as $z$ increases]

1. If $z$ is large, then $\sigma(z) \approx \frac{1}{1 + 0} = 1$
2. If $z$ is a large negative number, then $\sigma(z) \approx \frac{1}{1 + \text{big number}} \approx 0$
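A minimal sketch of the sigmoid and the resulting prediction in NumPy (w, b, and x below are untrained placeholders):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number (or array) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

n_x = 12288
x = np.random.rand(n_x, 1)    # one flattened input image (placeholder)
w = np.zeros((n_x, 1))        # weights (placeholder initialization)
b = 0.0                       # bias

y_hat = sigmoid(w.T @ x + b)  # sigma(w^T x + b) always lies in (0, 1)
print(y_hat.item())           # 0.5 here, since w = 0 and b = 0 give z = 0

print(sigmoid(100.0))   # ~1.0 for large z
print(sigmoid(-100.0))  # ~0.0 for large negative z
```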
Logistic Regression Cost Function (1)
• $\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)$, where $\sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$
• To obtain $z^{(i)}$: $z^{(i)} = w^T x^{(i)} + b$
• Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$, we want $\hat{y}^{(i)} \approx y^{(i)}$, i.e. the output $\hat{y}^{(i)}$ should roughly equal the ground truth $y^{(i)}$
• Note: the superscript $(i)$ refers to the $i$-th training example
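A minimal vectorized sketch of this forward computation for all m examples at once (w, b, and X are placeholders; sigmoid is as defined earlier):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 12288, 100
X = np.random.rand(n_x, m)  # columns are the examples x^(i) (placeholder data)
w = np.zeros((n_x, 1))      # placeholder parameters
b = 0.0

Z = w.T @ X + b             # shape (1, m): entry i is z^(i) = w^T x^(i) + b
Y_hat = sigmoid(Z)          # shape (1, m): entry i is y_hat^(i)
print(Y_hat.shape)          # (1, 100)
```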
Logistic Regression Cost Function (2)
• A first candidate for the loss (error) function is the square error:

$$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$$

• This doesn't work! Why? Combined with the sigmoid output, the square error gives a non-convex optimization problem, so Gradient Descent may not find the global optimum.
Logistic Regression Cost Function (3)
• So, how do we deal with that? Use the following loss instead:

$$\mathcal{L}(\hat{y}, y) = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$$

• Remember! We minimise the error as much as possible; as the next slide shows, for $y = 1$ this means making $\hat{y}$ as large as possible.
Logistic Regression Cost Function (4)
• E.g.

$$\mathcal{L}(\hat{y}, y) = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$$

• If $y = 1 \Rightarrow \mathcal{L}(\hat{y}, y) = -\log \hat{y}$: we want $\log \hat{y}$ large, which means $\hat{y}$ must be large
• If $y = 0 \Rightarrow \mathcal{L}(\hat{y}, y) = -\log(1 - \hat{y})$: we want $\log(1 - \hat{y})$ large, which means $\hat{y}$ must be small

$$\therefore \text{Cost function: } J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \,\right]$$
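A minimal NumPy sketch of this cost function (Y_hat and Y are placeholder arrays of shape (1, m); the small eps clipping guards against log(0) and is an implementation detail, not part of the slide's formula):

```python
import numpy as np

def cost(Y_hat, Y, eps=1e-12):
    """Cross-entropy cost J(w, b), averaged over the m training examples."""
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m

# Placeholder example: three labels and three predictions.
Y = np.array([[1, 0, 1]])
Y_hat = np.array([[0.9, 0.2, 0.7]])
print(cost(Y_hat, Y))  # ~0.23: small, since the predictions roughly match the labels
```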
To be continued
• Next Tuesday, 13 Nov 2018 …