2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Thursday, November 21, 13
Everything fails all the time
Werner Vogels, CTO, Amazon.com
Avoid single points of failure.
Elastic Load Balancing and Amazon Route 53 are
critical components when building scalable and
highly-available applications.
Load Balancer: Elastic, Secure, Integrated, Cost-Effective
[ Diagram: Client, Availability Zone 1a, Availability Zone 1b ]
3 Levels of Availability
1. Instance Availability
2. Zonal Availability
3. Regional Availability
Level 1: Instance Availability
First step in increasing the availability
of a system or application.
[ Instance Redundancy ]
Load balancer used to route incoming requests to multiple EC2 instances.
[ Diagram: Client, Elastic Load Balancing, EC2 instances ]
Incoming request load shared
by all instances behind the load balancer.
[ Request Routing ]
Equal utilization on each instance.
Targets instances with fewest outstanding requests.
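The request routing idea above (send each request to the instance with the fewest outstanding requests) can be sketched as follows. This is a minimal illustration and not Elastic Load Balancing's actual implementation; `Backend`, `pick_backend`, `route`, and `complete` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical stand-in for an EC2 instance behind the load balancer."""
    name: str
    outstanding: int = 0  # requests sent but not yet answered


def pick_backend(backends):
    """Return the backend with the fewest outstanding requests.

    On a tie, min() returns the earliest backend in the list.
    """
    return min(backends, key=lambda b: b.outstanding)


def route(backends):
    """Route one request: choose a backend and count the request against it."""
    chosen = pick_backend(backends)
    chosen.outstanding += 1
    return chosen


def complete(backend):
    """Mark one request on the backend as finished."""
    backend.outstanding -= 1


backends = [Backend("i-a"), Backend("i-b"), Backend("i-c")]
for _ in range(6):
    route(backends)
# With no completions in between, the six requests spread evenly.
counts = [b.outstanding for b in backends]
```

An instance that answers quickly frees up capacity sooner and so receives the next request, which is how this rule tends toward equal utilization.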
Instances that fail can be replaced seamlessly
while other instances continue to operate.
[ Health Checks ]
Application-level health checks ensure that request traffic is shifted away from a failed instance.
Failure detected, traffic shifted.
TCP and HTTP health checks: used to determine the health of the instance and application.
Customize frequency and failure thresholds.
Consider the depth and accuracy of your health checks.
Healthy instances carry additional request load.
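To make "frequency and failure thresholds" concrete, here is a small sketch of threshold-based health tracking: an instance changes state only after several consecutive check results disagree with its current state. The class name and the threshold values are assumptions for illustration, not settings taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance's health from a stream of check results.

    The threshold defaults are illustrative values only.
    """
    unhealthy_threshold: int = 2  # consecutive failures before marking unhealthy
    healthy_threshold: int = 3    # consecutive successes before marking healthy
    healthy: bool = True
    _streak: int = 0              # consecutive results that disagree with the state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
results = [True, False, False, True, True, True]
states = [tracker.record(r) for r in results]
```

Lower thresholds react faster but are more sensitive to a single dropped check; higher thresholds are steadier but leave a failed instance in rotation longer.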
Auto Scaling can be used to automatically adjust
instance capacity up or down depending on
conditions you define.
[ ELB & Auto Scaling ]
Load increases: instances added for increased load.
Load decreases: instances removed as load decreases.
Automatically scales instances up or down.
Custom scaling metrics.
Reduces costs.
Automatically replaces failed instances.
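The scaling behavior above can be sketched as a simple threshold rule. This is only an illustration of "conditions you define"; it does not model the Auto Scaling API, and the function name, thresholds, and capacity limits are hypothetical.

```python
def desired_capacity(current, metric, scale_out_above=70.0, scale_in_below=30.0,
                     minimum=2, maximum=10):
    """Return the new instance count after one evaluation of a scaling rule.

    metric is any utilization-style number (for example average CPU percent).
    All thresholds and limits here are made-up example values.
    """
    if metric > scale_out_above:
        target = current + 1   # load increased: add an instance
    elif metric < scale_in_below:
        target = current - 1   # load decreased: remove an instance
    else:
        target = current       # within the band: no change
    # Never go below the minimum or above the maximum group size.
    return max(minimum, min(maximum, target))
```

Keeping the minimum at two or more preserves instance redundancy even when load is low.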
Level 2: Zonal Availability
Availability Zones are distinct geographical
locations that are engineered to be insulated from
failures in other zones.
[ Diagram: Region, Availability Zone ]
It is important to run application
stacks in more than one zone.
Avoid unnecessary dependencies
between zones.
Load balancer used to balance across instances in multiple Availability Zones.
[ Diagram: Client, Elastic Load Balancing, EC2 instances in Zone 1a and Zone 1b ]
Each load balancer will contain one or more
DNS records, one for each load balancer node.
[ Understanding DNS ]
Each load balancer domain name may contain multiple A records.
[ Diagram: Client, Elastic Load Balancing nodes at 192.0.2.1 and 192.0.2.2, EC2 instances ]
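Because a load balancer name may resolve to several A records, a client can look up all of them and move on to the next address if one does not answer. The sketch below shows that idea with the standard `socket.getaddrinfo` call; `resolve_a_records` and `first_reachable` are hypothetical helper names, and the reachability check is a stand-in supplied by the caller.

```python
import socket


def resolve_a_records(hostname, port=80):
    """Return the distinct IPv4 addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def first_reachable(addresses, can_connect):
    """Return the first address for which can_connect(address) is true, else None."""
    for address in addresses:
        if can_connect(address):
            return address
    return None


# Example using the documentation addresses from the slide. Which node is
# reachable is made up for illustration.
records = ["192.0.2.1", "192.0.2.2"]
reachable = {"192.0.2.2"}
chosen = first_reachable(records, lambda ip: ip in reachable)
```

In real use, `resolve_a_records` would be called with the load balancer's DNS name and `can_connect` would attempt a TCP connection with a short timeout.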
Using multiple Availability Zones does
bring a few challenges.
[ Chart: requests / minute over time ]
An unequal number of instances per zone can lead to over-utilization of instances in a zone.
[ Diagram: Client, Elastic Load Balancer, 2 EC2 instances in Zone 1a, 3 EC2 instances in Zone 1b ]
Problem solved.
Cross-Zone Load Balancing distributes
traffic across all healthy instances,
regardless of Availability Zone.
[ Diagram: EC2 instances in Zone 1a and Zone 1b ]
[ Chart: requests / minute over time ]
Eliminates imbalances in instance utilization.
No bandwidth charge for cross-zone traffic.
Availability Zones may see traffic imbalances due to clients caching DNS records.
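A quick worked example shows why the 2-instance and 3-instance zones from the earlier diagram end up unevenly loaded, and what cross-zone load balancing changes. It assumes clients split requests evenly between the zones and that every zone has at least one instance; both are simplifications, and `per_instance_share` is a hypothetical helper.

```python
def per_instance_share(instances_per_zone, cross_zone):
    """Return, per zone, the fraction of total traffic each instance handles.

    Assumes traffic arrives evenly split across zones, which is an
    idealization, and that every zone has at least one instance.
    """
    zones = len(instances_per_zone)
    total_instances = sum(instances_per_zone.values())
    shares = {}
    for zone, count in instances_per_zone.items():
        if cross_zone:
            # Every instance is a candidate, regardless of zone.
            shares[zone] = 1 / total_instances
        else:
            # Each zone's slice of traffic stays inside that zone.
            shares[zone] = (1 / zones) / count
    return shares


zones = {"1a": 2, "1b": 3}
without_cross_zone = per_instance_share(zones, cross_zone=False)
with_cross_zone = per_instance_share(zones, cross_zone=True)
```

Under these assumptions, each instance in Zone 1a handles a quarter of all traffic and each instance in Zone 1b a sixth; with cross-zone load balancing every instance handles a fifth.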
Level 3: Regional Redundancy
Elastic Load Balancing and Amazon Route 53 have
been integrated to support a single application
across multiple regions.
[ Diagram: Region, Availability Zone ]
Amazon Route 53
AWS's authoritative Domain Name System (DNS) service
Health checking service
Amazon Route 53 improves availability by:
health checking load balancer nodes and rerouting traffic to avoid failures
supporting multi-region and backup architectures for high availability
Health Checks + Failover
Health Checks: Automated requests sent over the Internet to your application to verify that your application is reachable, available, and functional.
Failover: Only returns answers for resources that are healthy and reachable from the outside world, so end users are routed away from a failed application.
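The failover rule described above (answer only with resources that are healthy) can be sketched as a priority-ordered lookup. This is a simplified model and not Route 53's implementation; the record layout, the host names, and the choice to return `None` when nothing is healthy are all assumptions made for the example.

```python
def answer(records, is_healthy):
    """Return the target of the first healthy record in priority order.

    records is ordered primary first, then secondary. Returning None when
    no record is healthy is a simplification chosen for this sketch.
    """
    for record in records:
        if is_healthy(record["target"]):
            return record["target"]
    return None


# Hypothetical records: a primary load balancer and a secondary static site.
records = [
    {"role": "PRIMARY", "target": "elb.example.com"},
    {"role": "SECONDARY", "target": "static.example.com"},
]

all_up = {"elb.example.com", "static.example.com"}
primary_down = {"static.example.com"}

normal = answer(records, lambda t: t in all_up)
failed_over = answer(records, lambda t: t in primary_down)
```

The health state is supplied as a function so the same lookup works whether results come from a live checker or from a test fixture.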
Work on Failure: When nothing is failing, volume of API calls is zero. When failure occurs, volume of API calls spikes.
Constant Work: Health checkers and edge locations perform the same volume of activity whether endpoints are healthy or unhealthy.
[ Charts: system activity over time, with time to react, for each approach ]
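The difference between the two approaches can be illustrated by counting how many health updates are sent per interval. The model below is a toy: one update per changed endpoint for work on failure, and the full health state every interval for constant work. The function names and the four-endpoint scenario are assumptions for the example.

```python
def work_on_failure(health_by_tick):
    """Count updates per tick when only state changes are propagated."""
    updates = []
    previous = None
    for current in health_by_tick:
        if previous is None:
            changed = 0  # nothing to compare against on the first tick
        else:
            changed = sum(1 for a, b in zip(previous, current) if a != b)
        updates.append(changed)
        previous = current
    return updates


def constant_work(health_by_tick):
    """Count updates per tick when the full health state is sent every tick."""
    return [len(current) for current in health_by_tick]


# Four endpoints: healthy for two ticks, then all fail at once.
ticks = [[True] * 4, [True] * 4, [False] * 4, [False] * 4]
spiky = work_on_failure(ticks)
steady = constant_work(ticks)
```

The constant-work system does more work in total, but its load during a large failure is the same as on a quiet day, so the failure itself cannot overload it.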
Amazon Route 53 conducts health checks from within each AWS region.
[ Diagram: network partition ]
150 seconds vs. manual failover
Manual failover: operator receives an alarm, operator manually configures DNS update, wait for DNS changes to propagate.
No control plane involvement required for failover to occur.
Edge locations pull health results directly from globally distributed health checker fleet.
Don't have to wait for API requests to succeed and then propagate.
[ Diagram: within a Region, Route 53 health checks a primary Elastic Load Balancing endpoint and a secondary S3 endpoint; when the primary fails, Route 53 fails over to the secondary ]
Static site: static vs. dynamic content
Provides your globally-distributed end users
with faster performance
[ LBR Benefits ]
Better performance than running in a single region
Easier implementation than traditional DNS solutions
Much lower prices than traditional DNS solutions
"Our customers bid on video ad inventory in real time and our system must evaluate the content they're sponsoring and respond with a decision in less than 50ms, or they'll lose the auction. Route 53's Latency Based Routing lets us easily run multiple stacks of our whole targeting platform in each AWS region so we can meet our customers' latency needs."
Jonathan Dodson, Vice President of Engineering at Affine
[ Multi-Region Failover ]
example.com wants faster page load for customers.
Launches application stack in additional AWS regions.
[ Diagram: Route 53 health checks primary Elastic Load Balancing endpoints in Region 1 and Region 2; when one region fails, traffic is routed to the other; S3 is also shown ]
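Latency Based Routing combined with health checks, as in the multi-region slides above, amounts to picking the lowest-latency region among those that are currently healthy. The sketch below illustrates that selection only; the function name, region names, and latency figures are hypothetical, and the real service measures latency itself rather than taking it as input.

```python
def lowest_latency_region(latency_ms, healthy=None):
    """Return the region with the lowest latency among healthy regions.

    latency_ms maps region name to a measured latency in milliseconds.
    healthy is an optional set of region names; None means all are healthy.
    Returns None if no candidate region remains.
    """
    candidates = {
        region: ms
        for region, ms in latency_ms.items()
        if healthy is None or region in healthy
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Made-up latencies from one client's point of view.
latencies = {"us-east-1": 80, "eu-west-1": 20, "ap-southeast-1": 200}

best = lowest_latency_region(latencies)
after_failure = lowest_latency_region(
    latencies, healthy={"us-east-1", "ap-southeast-1"}
)
```

When the nearest region fails its health check, the client is sent to the next-best region instead of receiving an answer that points at the failed stack.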
[ Configuring DNS Failover ]
AWS & InfoSpace
Elastic Load Balancing & Amazon Route 53 for High-Availability
InfoSpace Search
Search Sites
Types of Users
Search Site Users: 400 million queries per month; broad geographical distribution
Search API Partners: 150+ partners worldwide; located primarily in US and EU; 2 billion queries/month
Click Users: 6.5 billion clicks/month; broad geographical distribution
Global Distribution of Traffic
[ Map: Availability Zones (AZ) grouped in threes ]
Key Statistics
4.5 billion requests/month
Migrated from 2 data centers to AWS in 5 months
Deployed in 4 regions
Approximately 500 EC2 instances
Approximately 50 load balancers
Approximately 70 Amazon Route 53 zones
AWS Infrastructure
[ Diagram: Route 53, Public Subnet, Private Subnet, Supporting Services ]
Fire and Forget
Asynchronous
[ Diagram: LBR ]
Results
Regional failover in 150 seconds consistently
Decreased latency: 25% less latent worldwide
Can easily reroute individual partners to a different region to avoid routing problems
Replaced expensive network gear from datacenter
What next?
Expanding to additional regions
Integration of monitoring data with traffic routing
3 Levels of Availability
1. Instance Availability
2. Zonal Availability
3. Regional Availability
Please give us your feedback on this
presentation
CPN104
As a thank you, we will select prize winners daily for completed surveys!
Thank You