Introduction
APPLIED MACHINE
LEARNING IN PYTHON
Speech Recognition
However, not all problems lend themselves to being solved
effectively by writing a handcrafted program or set of rules.
[Figure: the spoken question "How do I get to Ann Arbor?" is fed into hand-written speech recognition rules]
• Human speech is subtle and complex, with a huge variety of different
pronunciations, vocabulary, accents, and so forth.
• Writing a set of program rules by hand that could recognize portions of an audio
signal and decide which words were in the signal would be a gargantuan task.
• Even then, the result would likely be inflexible and not very robust at recognizing
different types of speech.
• User feedback (clicks on a search page)
• Surrounding environment (self-driving cars)
[Figure: a credit card transaction ($$$), described by its time, location, amount, and user history, passes through fraud rules that produce a notification]
[Figure: speech recognition pipeline with preprocessing, feature extraction, decoder, and lexicon components, producing the text "How do I get to Ann Arbor?"]
O'Reilly Media
[Figure: labeled examples (x, y) and a learned function f, with example labels "Lemon", "cat", "dog", and "house"]
Clicking and reading the "Mackinac Island" result can be
an implicit label for the search engine to learn that
"Mackinac Island" is especially relevant for the query
[vacations in michigan] for that specific user.
• Categorize users (to offer them discounts) based on their usage of an
e-commerce site.
[Figure: users plotted by time on site against pages accessed, with groups labeled "Power users" and "Quick browsers"]
• Detecting abnormal server access patterns (unsupervised outlier
detection). For example, to detect a cyber attack.
[Figure: server accesses plotted over time]
Choose:
• A feature representation
• Type of classifier to use
e.g. image pixels, with k-nearest neighbor classifier

Choose:
• What criterion distinguishes good vs. bad classifiers?
e.g. % correct predictions on test set

Choose:
• How to search for the settings/parameters that give the best classifier for
this evaluation criterion
e.g. try a range of values for "k" parameter in k-nearest neighbor classifier
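
The three choices above can be sketched together in a few lines. This is a minimal illustration rather than the course's own code: it assumes scikit-learn is installed, uses the bundled iris dataset as a stand-in for real data, and the range of k values (1 to 10) is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search: try a range of values for the "k" parameter.
scores = {}
for k in range(1, 11):
    # Representation: raw feature values with a k-nearest neighbor classifier.
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Evaluation criterion: fraction of correct predictions on the test set.
    scores[k] = knn.score(X_test, y_test)

# Keep the k with the highest test-set accuracy.
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```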
Feature Representations
Email

  To: Chris Brooks
  From: Daniel Romero
  Subject: Next course offering
  Hi Daniel,
  Could you please send the outline for the next course offering? Thanks!
  -- Chris

Feature representation: a list of words with their frequency counts

  Feature    Count
  to         1
  chris      2
  brooks     1
  from       1
  daniel     2
  romero     1
  the        2
  ...

Sea Creatures

Feature representation: a set of attribute values

  Feature         Value
  DorsalFin       Yes
  MainColor       Orange
  Stripes         Yes
  StripeColor1    White
  StripeColor2    Black
  Length          4.3 cm
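
The word-count representation of the email can be reproduced with a short sketch. The tokenizer here (lowercase the text, keep runs of letters) and the function name `word_counts` are assumptions made for illustration; a real system would likely use a more careful tokenizer.

```python
import re
from collections import Counter

email = """To: Chris Brooks
From: Daniel Romero
Subject: Next course offering
Hi Daniel,
Could you please send the outline for the next course offering? Thanks!
-- Chris"""

def word_counts(text):
    # Lowercase the text and treat each run of letters as one word
    # (a deliberately simple tokenizer, assumed for this sketch).
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)

counts = word_counts(email)
for word in ["to", "chris", "brooks", "from", "daniel", "romero", "the"]:
    print(word, counts[word])
```

The resulting `Counter` maps each word (feature) to its frequency (count), which is the same shape as the table above.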
Evaluation
[Figure: workflow loop between evaluation and feature and model refinement]
[Figure: fruit dataset samples (lemon, apple, orange) with the height measurement marked]
Credit: Original version of the fruit dataset created by Dr. Iain Murray, Univ. of Edinburgh
• We could choose a fruit sample, called a test sample, for which we already
have a label.
• We could then feed the features of that piece of fruit into the classifier and
compare the label that the classifier predicts with the true label of the fruit
type.
• A very important point: if we use one of our labeled fruit examples in the
data that we use to train the classifier, we can't also use that same fruit
sample later as a test sample to evaluate the classifier.
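
Comparing predicted labels with true labels on held-out test samples amounts to computing the fraction of correct predictions. A minimal sketch follows; the function name `accuracy` and the four labels are made up for illustration.

```python
def accuracy(predicted_labels, true_labels):
    # Fraction of test samples whose predicted label matches the true label.
    if not true_labels or len(predicted_labels) != len(true_labels):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Hypothetical labels for four test samples that were NOT used for training.
true_labels = ["apple", "lemon", "orange", "lemon"]
predicted_labels = ["apple", "lemon", "lemon", "lemon"]

print(accuracy(predicted_labels, true_labels))  # 0.75
```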
• A key ability that our classifier needs is to work well on any input sample,
that is, on any new pieces of fruit that we might see in the future, not just
on the ones that we have in our training set.
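…
from sklearn.model_selection import train_test_split

# Split the labeled data into a training set and a test set
# (by default, 75% of the samples go to training and 25% to testing).
X_train, X_test, y_train, y_test = train_test_split(X, y)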
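
To see what the split produces, here is a small self-contained sketch. The arrays are placeholder values invented for illustration, and `random_state=0` is an added assumption that only makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 20 samples with 2 features each, plus 20 labels.
X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 2

# With the default settings, 25% of the samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)

# No sample appears in both sets, so the test set stays unseen during training.
train_rows = set(map(tuple, X_train))
test_rows = set(map(tuple, X_test))
print(train_rows.isdisjoint(test_rows))  # True
```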
• For example, you might discover that the data set you got has a single
column with a person's name that still needs to be split into two separate first
and last name columns.
• For example, if you're using the name as one of the prediction features, that
might be important.
• Let's say you're doing a health application with a patient record for each row.
• So inspecting and visualizing the data will help you detect and understand
these potential sources of noise or errors.
A pairwise feature scatterplot visualizes the data using all possible pairs of features,
with one scatterplot per feature pair, and histograms for each feature along the
diagonal.
[Figure: pairwise scatterplot of the features height, width, mass, and color_score, with points colored by fruit class (apple, mandarin, orange, lemon). Callout: individual scatterplot plotting all fruits by their height and color_score. Colors represent different fruit classes.]
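
A plot of this kind can be sketched with pandas, assuming pandas, NumPy, and matplotlib are installed. The values below are randomly generated placeholders, not the real fruit dataset; only the four column names come from the slide.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Randomly generated placeholder values (assumption), NOT the real fruit data.
rng = np.random.default_rng(0)
features = ["height", "width", "mass", "color_score"]
df = pd.DataFrame(rng.random((40, 4)), columns=features)
labels = rng.integers(0, 4, size=40)  # four made-up classes, used only for coloring

# One scatterplot per feature pair, with a histogram of each feature on the diagonal.
axes = scatter_matrix(df, c=labels, marker="o", figsize=(8, 8),
                      hist_kwds={"bins": 15})
fig = axes[0, 0].figure
```

`axes` is a 4 x 4 grid of matplotlib axes, one per feature pair; calling `fig.savefig(...)` with a file name of your choice writes the plot to disk.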
• Instance-based learning methods work by memorizing the labeled examples
that they see in the training set.
• They then use those memorized examples to classify new objects later.
• The k in k-NN refers to the number of nearest neighbors the classifier will
retrieve and use in order to make its prediction.
1. Find the most similar instances (let's call them X_NN) to x_test
that are in X_train.
2. Get the labels y_NN for the instances in X_NN
3. Predict the label for x_test by combining the labels y_NN
e.g. simple majority vote
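
The three steps above can be sketched in plain Python. This is an illustrative sketch, not the library implementation: it assumes "most similar" means smallest Euclidean distance, the function name `knn_predict` is hypothetical, and the measurements are made up.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # 1. Find the k instances in X_train most similar to x_test
    #    (similarity is assumed here to be Euclidean distance).
    distances = [(math.dist(x, x_test), i) for i, x in enumerate(X_train)]
    nearest = sorted(distances)[:k]
    # 2. Get the labels y_NN for those nearest instances.
    y_nn = [y_train[i] for _, i in nearest]
    # 3. Predict the label by simple majority vote over y_NN.
    return Counter(y_nn).most_common(1)[0][0]

# Made-up (height, width) measurements, for illustration only.
X_train = [(7.0, 7.2), (7.5, 7.4), (9.5, 6.0), (10.0, 6.2), (9.8, 5.8)]
y_train = ["apple", "apple", "lemon", "lemon", "lemon"]

print(knn_predict(X_train, y_train, (9.6, 6.1), k=3))  # lemon
print(knn_predict(X_train, y_train, (7.2, 7.3), k=1))  # apple
```

Note that "training" here is nothing more than keeping `X_train` and `y_train` around, which is what memorizing the labeled examples means in practice.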
Fruit dataset
Decision boundaries
with k = 1
1. A distance metric
2. How many 'nearest' neighbors to look at?
3. Optional weighting function on the neighbor points
4. Method for aggregating the classes of neighbor points
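
As one possible illustration, the four items above map onto parameters of scikit-learn's `KNeighborsClassifier`. The parameter values shown are that class's defaults, and the tiny training set is made up for the example.

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    metric="minkowski",  # 1. distance metric; with p=2 this is Euclidean distance
    p=2,
    n_neighbors=5,       # 2. how many nearest neighbors to look at
    weights="uniform",   # 3. optional weighting; "distance" favors closer neighbors
)
# 4. Aggregation: predict() returns the class with the most (weighted) votes
#    among the retrieved neighbors.

# Made-up (height, width) measurements, for illustration only.
X_train = [[7.0, 7.2], [7.5, 7.4], [7.2, 7.0],
           [9.5, 6.0], [10.0, 6.2], [9.8, 5.8]]
y_train = ["apple", "apple", "apple", "lemon", "lemon", "lemon"]

knn.fit(X_train, y_train)
print(knn.predict([[9.6, 6.1]]))
```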
Fruit dataset
with 75%/25%
train-test split