Amazon SageMaker
Pedro Paez
Specialist Solutions Architect, AWS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Pre-configured environments to quickly build deep learning applications
AWS is framework agnostic
Choose from popular frameworks
Deep learning on Amazon SageMaker
• Built-in support for TensorFlow, PyTorch, Chainer, MXNet
• Fully customizable Bring Your Own (BYO) container option
• Distributed GPU training
• Code portability
• Experiment tracking
• One-click training and one-click deployment
Client application
You provide:
1. Your data in Amazon S3
2. A base Docker image, Amazon SageMaker built-in or BYO
3. Your source code in Amazon S3 or local file system
4. The Amazon EC2 instance type and count
Model training (on EC2)
Client application
Training logs are captured in Amazon CloudWatch. Amazon SageMaker also captures training infrastructure and performance metrics in Amazon CloudWatch.
Client application
Training produces model artifacts, which are stored in Amazon S3 and paired with your inference code at deployment.
Client application
Amazon SageMaker launches the Docker image from Amazon ECR on the Amazon EC2 instance type you choose, downloads the model from Amazon S3 to the container host, and automatically scales instances based on the scaling policy.
Client application
Inference metrics such as latency and #invocations, along with logs / print results, are captured in Amazon CloudWatch.
TensorFlow
Fastest training time for TensorFlow: 14m, down from 30m
Hyperparameters and environment variables
main_train.py
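Inside the training container, SageMaker passes the estimator's hyperparameters to main_train.py as command-line flags and exposes the data channels and model directory as SM_* environment variables. A minimal sketch of how the script might read them (the flag names mirror the hyperparameters dict used later in this deck; everything else is illustrative):

```python
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, one per dict key
    parser.add_argument('--traindata', type=str, default='abalone_train.csv')
    parser.add_argument('--validationdata', type=str, default='abalone_test.csv')
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=32)
    # Data channels and the model dir arrive as SM_* environment variables
    parser.add_argument('--train', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str,
                        default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR'))
    return parser.parse_args(argv)
```

The channel names ('train', 'validation') come from the dict you pass to fit(); each becomes an SM_CHANNEL_<NAME> variable pointing at the local download directory.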
Hyperparameters and environment variables
Submit your training job

from sagemaker.tensorflow import TensorFlow
from time import gmtime, strftime

s3_model_path = "s3://{}/models".format(sagemaker_session.default_bucket())

abalone_estimator = TensorFlow(entry_point='main_train.py',
                               source_dir="./source",
                               role=role,
                               py_version="py3",
                               framework_version="1.11.0",
                               hyperparameters={'traindata': 'abalone_train.csv',
                                                'validationdata': 'abalone_test.csv',
                                                'epochs': 10,
                                                'batch-size': 32},
                               model_dir=s3_model_path,
                               metric_definitions=[{"Name": "mean_squared_error",
                                                    "Regex": "## validation_metric_mse ##: (\d*[.]?\d*)"},
                                                   {"Name": "mean_absolute_error",
                                                    "Regex": "## validation_metric_mae ##: (\d*[.]?\d*)"},
                                                   {"Name": "mean_absolute_percentage_error",
                                                    "Regex": "## validation_metric_mape ##: (\d*[.]?\d*)"}],
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge')

abalone_estimator.fit({'train': s3_input_prefix,
                       'validation': s3_input_prefix},
                      job_name="abalone-age-py3-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime())))

• The entry point is main_train.py; its source directory is uploaded to Amazon S3 and available during inference.
• Pin the TensorFlow framework version (1.11.0).
• Specify the instance type for your training. Increase the instance count if your code supports distributed training.
• fit() maps an Amazon S3 prefix to a local download directory. The key name, e.g., 'train', matches the suffix TRAIN of the environment variable SM_CHANNEL_TRAIN, for the train data directory.
Model metrics
main_train.py
Model metrics
Submit your training job

Model metrics regexes let you trace and visualize your model performance:

metric_definitions=[{"Name": "mean_squared_error",
                     "Regex": "## validation_metric_mse ##: (\d*[.]?\d*)"},
                    {"Name": "mean_absolute_error",
                     "Regex": "## validation_metric_mae ##: (\d*[.]?\d*)"},
                    {"Name": "mean_absolute_percentage_error",
                     "Regex": "## validation_metric_mape ##: (\d*[.]?\d*)"}]
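For these regexes to match, the training script has to print each validation metric in the corresponding format, which SageMaker then scrapes from the CloudWatch logs. A minimal sketch of that logging convention (the helper name is illustrative):

```python
import re

def log_metric(name, value):
    # Print in the "## validation_metric_<name> ##: <value>" format that the
    # metric_definitions regexes scrape from the CloudWatch logs
    print("## validation_metric_{} ##: {}".format(name, value))

# The same pattern SageMaker applies to the log stream
mse_pattern = re.compile(r"## validation_metric_mse ##: (\d*[.]?\d*)")
sample_line = "## validation_metric_mse ##: 5.103"
match = mse_pattern.search(sample_line)
```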
PyTorch training with Amazon SageMaker
main_train.py
The key name, e.g., 'train', matches the suffix TRAIN of the environment variable SM_CHANNEL_TRAIN, for the train data directory.
[[.34,5.6],[4,56]…]

# Deploy my estimator to an Amazon SageMaker endpoint and get a Predictor
predictor = abalone_estimator.deploy(instance_type='ml.m4.xlarge',
                                     initial_instance_count=1)

Note: Your model can be deployed on the CPU or GPU instance type you choose when you deploy the endpoint. E.g., ml.m4.xlarge is a CPU instance type; ml.p3.2xlarge is a GPU instance type. This code supports both.
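Once the endpoint is up, the client sends rows of features like the sample payload above. A hedged sketch of the client side (`predictor.predict` appears only in a comment because it needs a live endpoint; the values are illustrative):

```python
import json

# Rows of features to score; values are illustrative
payload = [[0.34, 5.6], [4, 56]]

# Against a live endpoint the call would be:
#     result = predictor.predict(payload)
# Under the hood the Predictor serializes the payload, e.g. to JSON:
body = json.dumps(payload)
```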
Apache MXNet
MXNet training with Amazon SageMaker
Specify source code: the entry point file and source code dir
Submit your training job

from sagemaker.mxnet import MXNet
from time import gmtime, strftime

s3_model_path = "s3://{}/models".format(sagemaker_session.default_bucket())

abalone_estimator = MXNet(entry_point='main_train.py',
                          source_dir="./source",
                          role=role,
                          py_version="py3",
                          framework_version="1.3.0",
                          hyperparameters={'traindata': 'abalone_train.csv',
                                           'validationdata': 'abalone_test.csv',
                                           'epochs': 10,
                                           'batch-size': 32},
                          model_dir=s3_model_path,
                          metric_definitions=[{"Name": "mean_squared_error",
                                               "Regex": "## validation_metric_mse ##: (\d*[.]?\d*)"},
                                              {"Name": "mean_absolute_error",
                                               "Regex": "## validation_metric_mae ##: (\d*[.]?\d*)"},
                                              {"Name": "mean_absolute_percentage_error",
                                               "Regex": "## validation_metric_mape ##: (\d*[.]?\d*)"}],
                          train_instance_count=1,
                          train_instance_type='ml.c4.xlarge')

abalone_estimator.fit({'train': s3_input_prefix,
                       'validation': s3_input_prefix},
                      job_name="abalone-age-py3-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime())))
MXNet inference with Amazon SageMaker

# Deploy your MXNet estimator to an Amazon SageMaker endpoint
abalone_estimator.deploy(instance_type='ml.m4.xlarge',
                         initial_instance_count=1)

# Deserialize the Invoke request body into an object we can perform prediction on
input_object = input_fn(request_body, request_content_type, model)

# Serialize the prediction result into the desired response content type
output = output_fn(prediction, response_content_type)
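A minimal sketch of what JSON-based input_fn / output_fn hooks might look like. This assumes JSON payloads and ignores the model argument; exact hook signatures depend on the serving container version, so treat it as illustrative rather than the container's definitive interface:

```python
import json

def input_fn(request_body, request_content_type, model=None):
    # Deserialize the invoke request body into an object we can predict on
    if request_content_type == 'application/json':
        return json.loads(request_body)
    raise ValueError("Unsupported content type: {}".format(request_content_type))

def output_fn(prediction, response_content_type):
    # Serialize the prediction result into the desired response content type
    return json.dumps(prediction)
```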
Learn from AWS experts. Advance your skills and
knowledge. Build your future in the AWS Cloud.
Why work with an APN Partner?

APN Partners are uniquely positioned to help your organization at any stage of your cloud adoption journey, and they:
• Share your goals, focused on your success
• Help you take full advantage of all the business benefits that AWS has to offer
• Provide services and solutions to support any AWS use case across your full customer life cycle

APN Partners with deep expertise in AWS services:
• AWS Managed Service Provider (MSP) Partners: APN Partners with cloud infrastructure and application migration expertise
• AWS Competency Partners: APN Partners with verified, vetted, and validated specialized offerings
• AWS Service Delivery Partners: APN Partners with a track record of delivering specific AWS services to customers
aws-apac-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws