
Red Hat Storage and AWS

Building Gluster clusters on AWS

Craig Carl, AWS Solutions Architect


crcarl@amazon.com
Gluster is now Red Hat Storage!
!   Gluster was acquired by Red Hat in October 2011
•  Gluster remains an OSS product
•  http://download.gluster.com

•  A supported version of Gluster is available for AWS via Red Hat Cloud Access

•  Contact Red Hat sales for more information


•  Danielle Cleveland, dclevela@redhat.com
What is Gluster?
!   Gluster is a distributed file system that exports multiple protocols:
•  GlusterFS, an NFS-like interface
•  Transparent to your application
•  NFS
•  CIFS
•  Object store

!   Gluster is the only distributed file system that exports NFS and CIFS and does not have a metadata server
•  This makes it ideal for deployment on AWS
Storage options in AWS
!   Simple Storage Service (S3)
•  Object storage with RESTful and SOAP interfaces
•  > 1 trillion objects
•  Growing at 40,000 objects/sec

!   Elastic Block Store (EBS)

•  iSCSI-like block storage for EC2 instances
•  Now with Provisioned IOPS!
•  http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html

!   Databases
•  RDS
•  Oracle, SQL Server, MySQL
•  DynamoDB, SimpleDB
Why build a RHS cluster in AWS?

!   AWS does NOT offer an NFS/CIFS interface to storage.

!   You need an NFS/CIFS interface to your data!


What does a RHS cluster in AWS deliver?
!   Reliability
•  Build a storage platform that is redundant across availability zones.
•  Synchronous replication

!   Scalability
•  Build clusters that scale to petabytes.
•  Performance scales with storage.

!   Sharable
•  Supports IO from hundreds of clients simultaneously.

!   Multi-region replication
•  Asynchronous replication
Finding your Gluster performance bottleneck
!   Gluster performance is a function of multiple variables:
•  File size
•  Access size
•  Access patterns
•  Replicated v. distributed
•  Number of clients

!   And the resources dedicated to Gluster:


•  Network bandwidth and/or packets per second
•  Disk I/O
•  CPU
AWS and Gluster
!   EBS
•  Provisioned IOPS and Gluster are an incredible combination!
•  Massive improvements in small file performance

!   EC2 Instances
•  Inter-instance network:
•  10 Gb/sec on cc* instance types
•  1 Gb/sec otherwise

! CloudWatch for instance monitoring


•  Including alarms
Evaluating performance
!   The entire dataset
•  Is the total aggregated I/O available to all clients.
•  Generally equal to the sum of network bandwidth across the cluster.
•  Not useful as a benchmark but it’s a great marketing number.

!   Per file performance


•  How much I/O is available to any single file.
•  Much more useful benchmark.

!   Use the LLNL IOR tool to measure performance


•  https://github.com/chaos/ior
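The "sum of network bandwidth" figure for the entire dataset can be sketched as simple arithmetic. This is a rough upper bound only: it assumes the link speeds quoted in this deck (1 Gb/sec, or 10 Gb/sec on cc* instances), decimal units, and no protocol or replication overhead. The function name is hypothetical.

```python
def aggregate_bandwidth_mb_per_sec(node_count: int, link_gbit_per_sec: float) -> float:
    """Upper bound on whole-cluster throughput: the sum of every node's link.

    Assumption: each node can fill its network link, which real workloads
    rarely do. Per-file throughput is usually far lower than this number.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # 1 Gb/s = 1000 Mb/s = 125 MB/s (decimal units, no overhead).
    return node_count * link_gbit_per_sec * 1000 / 8


if __name__ == "__main__":
    for nodes, link in [(4, 1.0), (4, 10.0)]:
        total = aggregate_bandwidth_mb_per_sec(nodes, link)
        print(f"{nodes} nodes at {link:g} Gb/s: at most {total:.0f} MB/s aggregate")
```

Measure per-file performance with IOR rather than relying on this estimate; the calculation only shows where the aggregate number comes from.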
Pro-tips
!   Build your cluster in a VPC
•  Doesn’t expose your cluster to the Internet, adds another layer of security
•  Use ENIs with private IPs

!   In EC2 ‘classic’ always use Elastic IPs


•  Guarantee instance names don’t change after a stop/start.
•  Use the EIP external name for consistent DNS results.
•  Resolves to internal IP!
•  Script EIP attach at instance start.
•  http://blog.cloudreach.co.uk/2011/01/elastic-ip-on-boot-not-too-much-of_17.html

!   Use arrays of Elastic Block Store volumes


•  Use Provisioned IOPS where necessary
•  Smaller files, small I/O
•  Can be expensive
•  Dramatically improves performance.
•  Consider pre-warming the array
•  Arrays of 8 volumes seem to be the sweet spot.
•  Use ec2-consistent-snapshot for snapshots.
•  Be aware of the recovery implications of an array and always test your setup!
•  https://github.com/jsmartin/raidformer
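To get a feel for what an array of Provisioned IOPS volumes can deliver, the sketch below multiplies it out. It assumes a striped array in which IOPS add up linearly across volumes, and it assumes 1000 IOPS per volume so that 8 volumes match the 8000 IOPS/instance figure quoted with this deck's EBS benchmark. Both function names are hypothetical.

```python
def array_iops(volume_count: int, iops_per_volume: int) -> int:
    """Aggregate IOPS of a striped array, assuming IOPS add up linearly."""
    if volume_count < 1 or iops_per_volume < 0:
        raise ValueError("need at least one volume and non-negative IOPS")
    return volume_count * iops_per_volume


def throughput_mb_per_sec(iops: int, io_size_kb: int) -> float:
    """Throughput implied by an IOPS rate at a fixed I/O size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


if __name__ == "__main__":
    total = array_iops(volume_count=8, iops_per_volume=1000)  # assumed per-volume rate
    for size_kb in (4, 16, 64):
        mb_s = throughput_mb_per_sec(total, size_kb)
        print(f"{total} IOPS at {size_kb} KB per I/O: about {mb_s:.2f} MB/s")
```

At small I/O sizes the array is IOPS-bound; at larger sizes the instance's EBS and network bandwidth become the limit well before the IOPS figure does, which is why testing with the real access size matters.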
Pro-tips
!  Use the largest instance size in each class
•  Limits the potential impact of a noisy neighbor.
•  m1.xlarge, m2.4xlarge
•  Test with EBS-optimized instances and Provisioned IOPS
•  m1.xlarge and m2.4xlarge have 1000 Mbps of dedicated EBS bandwidth

!  Always use the GlusterFS client


•  Required for redundancy
•  Improves performance
•  Transparent to your application
•  Requires a 64-bit OS
•  EC2 now has 64-bit ubiquity!

!  Set up detailed CloudWatch monitoring


•  Watch your CPU : Network : EBS Disk Wait time ratios
•  Set up alarms!
•  Disk wait, Network utilization

!  Build replicated Gluster volumes


•  Reduces write performance by ~50%, improves read performance by ~100%
•  Any single AZ is subject to failure at any time

!  Snapshot your EBS volumes


•  Increases data durability from 99.9% (EBS) to 99.999999999% (S3)
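The replication trade-off quoted above (writes down ~50%, reads up ~100%) can be turned into a quick planning estimate. This is a rule-of-thumb sketch using only the deck's approximate factors; the names are hypothetical and real results depend on the workload.

```python
from typing import Tuple

WRITE_FACTOR = 0.5  # deck's estimate: replication reduces writes by ~50%
READ_FACTOR = 2.0   # deck's estimate: replication improves reads by ~100%


def replicated_estimate(write_mb_s: float, read_mb_s: float) -> Tuple[float, float]:
    """Estimate (write, read) MB/s for a replicated volume from an
    unreplicated baseline, using the approximate factors above."""
    return write_mb_s * WRITE_FACTOR, read_mb_s * READ_FACTOR


if __name__ == "__main__":
    write, read = replicated_estimate(write_mb_s=100.0, read_mb_s=100.0)
    print(f"estimated replicated write: {write:.0f} MB/s, read: {read:.0f} MB/s")
```

Treat the output as a starting point for a benchmark, not as a prediction.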
Deployment steps
1)  Start x instances

2)  Attach EIPs/ENIs, create startup scripts

3)  Attach EBS volumes to each instance

4)  Build mdadm array(s)

5)  Create a filesystem and mount it
•  ext3, ext4 or XFS

6)  “gluster peer probe”
•  Creates the cluster; can only be run as root from within the cluster

7)  “gluster volume create”
•  Choose distributed v. replicated
•  One set of server nodes can support multiple, different types of volumes

8)  “gluster volume start”
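Steps 4 through 8 can be sketched as a small Python helper that only builds the command lines, so they can be reviewed before anything is run as root. Everything specific here is an assumption for illustration: the host names, device names, mount point and volume name are hypothetical, RAID 0 is one possible array level (the deck does not name one), and XFS is just one of the listed filesystems.

```python
import shlex
from typing import List, Sequence, Tuple

Command = List[str]


def deployment_commands(
    hosts: Sequence[str],
    devices: Sequence[str],
    volume_name: str = "gv0",          # hypothetical volume name
    replica: int = 2,
    md_device: str = "/dev/md0",
    mount_point: str = "/export/md0",  # hypothetical mount point
) -> Tuple[List[Command], List[Command]]:
    """Return (per_node, cluster) command lists; nothing is executed."""
    if not hosts or not devices:
        raise ValueError("need at least one host and one device")
    if replica < 1 or len(hosts) % replica != 0:
        raise ValueError("host count must be a multiple of the replica count")
    brick_dir = f"{mount_point}/brick"
    # Steps 4-5, run on every node: build the array, make a filesystem, mount.
    per_node = [
        ["mdadm", "--create", md_device, "--level=0",  # RAID 0 is an assumption
         f"--raid-devices={len(devices)}", *devices],
        ["mkfs.xfs", md_device],
        ["mkdir", "-p", mount_point],
        ["mount", md_device, mount_point],
        ["mkdir", "-p", brick_dir],
    ]
    # Step 6, run as root on the first node: probe every other node.
    cluster = [["gluster", "peer", "probe", host] for host in hosts[1:]]
    # Step 7: consecutive bricks form a replica set, so order the hosts so
    # that neighbours sit in different availability zones.
    bricks = [f"{host}:{brick_dir}" for host in hosts]
    cluster.append(["gluster", "volume", "create", volume_name,
                    "replica", str(replica), *bricks])
    # Step 8.
    cluster.append(["gluster", "volume", "start", volume_name])
    return per_node, cluster


if __name__ == "__main__":
    node_cmds, cluster_cmds = deployment_commands(
        hosts=["gfs1", "gfs2"], devices=["/dev/xvdf", "/dev/xvdg"])
    for cmd in node_cmds + cluster_cmds:
        print(shlex.join(cmd))
```

Building lists instead of running commands keeps the sketch safe to execute anywhere; check the exact options against the mdadm and Gluster documentation for the versions in use.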


Scaling Gluster in AWS
scaling UP v scaling OUT
!   Scaling UP
•  Adding storage to existing cluster nodes
•  Appropriate when the instances have free:
•  network bandwidth
•  disk I/O
•  CPU cycles
•  memory

!   Scaling OUT
•  Adding nodes to an existing cluster
•  Appropriate when the instances are resource bound
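The up-versus-out rule above can be written down as a tiny decision helper. This is a sketch only: the 80% "resource bound" threshold and the utilization keys are assumptions, not values from this deck, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

RESOURCES = ("network", "disk_io", "cpu", "memory")


def scaling_advice(utilization: Dict[str, float],
                   bound_threshold: float = 0.8) -> Tuple[str, List[str]]:
    """Return ("out", bound_resources) when any resource is at or above the
    threshold, otherwise ("up", []).

    utilization maps each name in RESOURCES to a fraction between 0 and 1.
    The 0.8 threshold is an assumed cut-off for "resource bound".
    """
    missing = [name for name in RESOURCES if name not in utilization]
    if missing:
        raise ValueError(f"missing utilization for: {', '.join(missing)}")
    bound = [name for name in RESOURCES if utilization[name] >= bound_threshold]
    return ("out", bound) if bound else ("up", [])


if __name__ == "__main__":
    idle = {"network": 0.3, "disk_io": 0.4, "cpu": 0.2, "memory": 0.5}
    busy = dict(idle, network=0.95)
    print(scaling_advice(idle))
    print(scaling_advice(busy))
```

Feeding it averages from CloudWatch over a representative period is one way to use it; a short spike should not trigger a scale-out on its own.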
Scaling UP

[Diagram: six RHS servers forming a single RHS namespace, spread across AZ-1 and AZ-2 within one region]

!   No performance improvement when scaling up.


!   Improves storage density.
!   Reduces cost/GB.
!   You can also change your instance type, for example growing from an m1.xlarge to an m2.4xlarge!
•  If you are running a replicated Gluster cluster you can change your instance type with no downtime!
Scaling OUT

[Diagram: many RHS nodes (R) forming a single RHS namespace, spread across AZ-1 and AZ-2 within one region]

!   Can improve per file performance.


•  If scaling out relieves a bottleneck
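Scaling out an existing volume comes down to probing the new nodes, adding their bricks and rebalancing. The sketch below builds those Gluster commands without running them. The volume name and brick paths are hypothetical, and it assumes a replicated volume, where bricks have to be added in multiples of the replica count.

```python
from typing import List, Sequence

Command = List[str]


def scale_out_commands(volume_name: str, new_bricks: Sequence[str],
                       replica: int = 2) -> List[Command]:
    """Build the commands to grow a replicated volume; nothing is executed.

    new_bricks are "host:/path" strings. Consecutive bricks form a replica
    set, so pair hosts from different availability zones.
    """
    if not new_bricks or len(new_bricks) % replica != 0:
        raise ValueError("add bricks in multiples of the replica count")
    # Probe each new host once, keeping the order in which it first appears.
    hosts = list(dict.fromkeys(brick.split(":", 1)[0] for brick in new_bricks))
    commands = [["gluster", "peer", "probe", host] for host in hosts]
    commands.append(["gluster", "volume", "add-brick", volume_name, *new_bricks])
    # Spread existing data onto the new bricks.
    commands.append(["gluster", "volume", "rebalance", volume_name, "start"])
    return commands


if __name__ == "__main__":
    for cmd in scale_out_commands(
            "gv0", ["gfs3:/export/md0/brick", "gfs4:/export/md0/brick"]):
        print(" ".join(cmd))
```

Whether the extra nodes improve per-file performance still depends on which resource was the bottleneck, so benchmark before and after.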
Performance

TEST, TEST, TEST!
!   Every workload is different; your results will vary!
•  AWS makes testing easy: create and destroy clusters in minutes
•  Test with real-world load
Performance
Single file, Gluster replicated
[Chart: distributed reads on a mirrored pair of ephemeral storage. Y-axis: MB/sec (0 to 400); X-axis: block size (KB), 2 to 1024.]
[Chart: replicated single-file writes on an EBS array of 8 volumes (8000 IOPS/instance). Y-axis: MB/sec (0 to 200); X-axis: block size (KB), 2 to 1024.]
Performance
Finding the perfect ratio of Gluster storage nodes : clients is complex. Test, test, test!
[Chart: clients, Gluster servers and throughput (MB/s) for m2.4xlarge servers and clients. Client and server counts shown: 2, 4, 8 and 16; throughput labels range from 438 to 2448 MB/s.]
Thanks!

http://www.slideshare.net/AmazonWebServices

Craig Carl
crcarl@amazon.com
