Red Hat Storage

Red Hat Storage and AWS
Building Gluster clusters on AWS
Craig Carl, AWS Solutions Architect

crcarl@amazon.com
Gluster is now Red Hat Storage!
!   Gluster was acquired by Red Hat in October 2011
•  Gluster remains an OSS product
•  http://download.gluster.com
•  A supported version of Gluster is available for AWS via Red Hat Cloud Access
•  Contact Red Hat sales for more information
•  Danielle Cleveland, dclevela@redhat.com
What is Gluster?
!   Gluster is a distributed file system that exports multiple protocols:
•  GlusterFS, an NFS-like interface
•  Transparent to your application
•  NFS
•  CIFS
•  Object store
!   Gluster is the only distributed file system that:
•  Exports NFS and CIFS and does not have a metadata server
•  This makes it ideal for deployment on AWS
Storage options in AWS
!   Simple Storage Service (S3)
•  Object storage with RESTful and SOAP interfaces
•  > 1 trillion objects
•  Growing @ 40,000 objects/sec
!   Elastic Block Store (EBS)
•  iSCSI-like block storage for EC2 instances
•  Now with Provisioned IOPS!
•  http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
!   Databases
•  RDS
•  Oracle, SQL Server, MySQL
•  DynamoDB, SimpleDB
Why build a RHS cluster in AWS?
!   AWS does NOT offer an NFS/CIFS interface to storage.
!   You need an NFS/CIFS interface to your data!
What does a RHS cluster in AWS deliver?
!   Reliability
•  Build a storage platform that is redundant across availability zones.
•  Synchronous replication
!   Scalability
•  Build clusters that scale to petabytes.
•  Performance scales with storage.
!   Sharable
•  Supports IO from hundreds of clients simultaneously.
!   Multi-region replication
•  Asynchronous replication
Finding your Gluster performance bottleneck
!   Gluster performance is a function of multiple variables –
•  File size
•  Access size
•  Access patterns
•  Replicated v. distributed
•  Number of clients
!   And the resources dedicated to Gluster -

•  Network bandwidth and/or packets per second
•  Disk I/O
•  CPU
AWS and Gluster
!   EBS
•  Provisioned IOPS and Gluster are an incredible combination!
•  Massive improvements in small file performance
!   EC2 Instances
•  Inter-instance network:
•  On cc* instance types @ 10 Gb/sec
•  Otherwise 1 Gb/sec
! CloudWatch for instance monitoring

•  Including alarms
Evaluating performance
!   The entire dataset
•  Is the total aggregated I/O available to all clients.
•  Generally equal to the sum of network bandwidth across the cluster.
•  Not useful as a benchmark but it’s a great marketing number.
!   Per file performance

•  How much I/O is available to any single file.
•  Much more useful benchmark.
!   Use LLNL IOR tool to measure performance

•  https://github.com/chaos/ior
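A sweep over transfer sizes, matching the block-size axis used in the performance charts in this deck, might be scripted as follows. This is a minimal sketch, not a tuned benchmark: the mount point `/mnt/gluster`, the 1 GiB per-task block size, and the helper name `ior_sweep` are assumptions. `-a`, `-w`, `-r`, `-t`, `-b` and `-o` are standard IOR options, but check `ior -h` against your build. The script only prints the command lines; it does not run them.

```python
import shlex


def ior_sweep(test_file, sizes_kb=(2, 4, 8, 16, 32, 64, 128, 256, 512, 1024), block="1g"):
    """Return one IOR command line per transfer size (given in KB).

    All tasks write to and then read from the same file, so the result
    reflects per-file rather than aggregate performance.
    """
    commands = []
    for kb in sizes_kb:
        argv = [
            "ior",
            "-a", "POSIX",    # plain POSIX I/O through the mounted volume
            "-w", "-r",       # write the file, then read it back
            "-t", f"{kb}k",   # transfer (access) size for this run
            "-b", block,      # data written per task (assumed value)
            "-o", test_file,  # test file on the Gluster mount (assumed path)
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in ior_sweep("/mnt/gluster/ior.dat"):
        print(line)
```

Run the printed commands from a client that mounts the volume, ideally with a file size well above the client's RAM so the page cache does not flatter the read numbers.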
Pro-tips
!   Build your cluster in a VPC
•  Doesn’t expose your cluster to the Internet, adds another layer of security
•  Use ENIs with private IPs
!   In EC2 ‘classic’ always use Elastic IPs

•  Guarantee instance names don’t change after a stop/start.
•  Use the EIP external name for consistent DNS results.
•  Resolves to internal IP!
•  Script EIP attach at instance start.
•  http://blog.cloudreach.co.uk/2011/01/elastic-ip-on-boot-not-too-much-of_17.html
!   Use arrays of Elastic Block Store volumes
•  Use Provisioned IOPS where necessary
•  Smaller files, small I/O
•  Can be expensive
•  Dramatically improves performance.
•  Consider pre-warming the array
•  Arrays of 8 volumes seem to be the sweet spot.
•  Use ec2-consistent-snapshot for snapshots.
•  Be aware of the recovery implications of an array and always test your setup!
•  https://github.com/jsmartin/raidformer
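To make the recovery implications concrete: if the array is striped (RAID 0, which is an assumption, since the deck does not name a RAID level), losing any one volume loses the whole array. The sketch below treats the 99.9% EBS figure quoted elsewhere in this deck as a per-volume annual survival probability and assumes volumes fail independently. Both are simplifications, so read the result as an order-of-magnitude estimate only.

```python
def striped_array_survival(per_volume_survival, volumes):
    """Probability that every volume in a striped array survives.

    Assumes independent failures and that one lost volume loses the array.
    """
    if not 0.0 <= per_volume_survival <= 1.0:
        raise ValueError("per_volume_survival must be between 0 and 1")
    if volumes < 1:
        raise ValueError("volumes must be at least 1")
    return per_volume_survival ** volumes


if __name__ == "__main__":
    p = striped_array_survival(0.999, 8)
    print(f"8-volume array survival: {p:.4%}")
    print(f"8-volume array loss:     {1 - p:.4%}")
```

Under these assumptions an 8-volume stripe is roughly eight times as likely to be lost as a single volume, which is the case for replicating across nodes and snapshotting regularly.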
Pro-tips
!  Use the largest instance size in each class
•  Limits the potential impact of a noisy neighbor.
•  m1.xlarge, m2.4xlarge
•  Test with EBS-optimized instances and Provisioned IOPS
•  m1.xlarge and m2.4xlarge have 1000 Mbps dedicated EBS bandwidth
!  Always use the GlusterFS client

•  Required for redundancy
•  Improves performance
•  Transparent to your application
•  Requires 64-bit OS
•  EC2 now has 64-bit ubiquity!
!  Set up detailed CloudWatch monitoring
•  Watch your CPU : Network : EBS Disk Wait time ratios
•  Set up alarms!
•  Disk wait, Network utilization
!  Build replicated Gluster volumes

•  Reduces write performance by ~50%, improves read performance by ~100%
•  Any single AZ is subject to failure at any time
!  Snapshot your EBS volumes

•  Increases data durability from 99.9% (EBS) to 99.999999999% (S3)
Deployment steps
1)  Start x instances
2)  Attach EIP/ENIs, create startup scripts
3)  Attach EBS volumes to each instance
4)  Build mdadm array(s)
5)  Create a filesystem, mount
•  ext3, ext4, or XFS
6)  “gluster peer probe”
•  Creates the cluster, can only be run as root from within the cluster
7)  “gluster volume create”
•  Choose distributed v. replicated
•  One set of server nodes can support multiple, different types of volumes
8)  “gluster volume start”
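Steps 4 through 8 can be sketched as a small generator of shell commands. Everything concrete here is an assumption for illustration: the hostnames, the device names, the `/dev/md0` array, RAID level 0, XFS with `-i size=512`, the mount point `/export/brick1`, and the volume name `gv0`. The script only prints commands; review them before running anything as root.

```python
import shlex


def deployment_commands(hosts, devices, volume="gv0",
                        mount_point="/export/brick1", replica=2):
    """Return (per_node, cluster) lists of shell command strings.

    per_node commands run on every server; cluster commands run once,
    as root, on the first host in `hosts`.
    """
    if replica > 1 and len(hosts) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    brick_dir = f"{mount_point}/data"
    per_node = [
        # Step 4: combine the attached EBS volumes into one md device (RAID 0 assumed).
        ["mdadm", "--create", "/dev/md0", "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        # Step 5: create a filesystem and mount it (XFS assumed).
        ["mkfs.xfs", "-i", "size=512", "/dev/md0"],
        ["mkdir", "-p", mount_point],
        ["mount", "/dev/md0", mount_point],
        ["mkdir", "-p", brick_dir],
    ]
    # Step 6: probe every other node from the first one.
    cluster = [["gluster", "peer", "probe", host] for host in hosts[1:]]
    # Step 7: consecutive bricks form a replica set, so order `hosts`
    # so that each set spans availability zones.
    create = ["gluster", "volume", "create", volume]
    if replica > 1:
        create += ["replica", str(replica)]
    create += [f"{host}:{brick_dir}" for host in hosts]
    cluster.append(create)
    # Step 8: start the volume.
    cluster.append(["gluster", "volume", "start", volume])
    return ([shlex.join(c) for c in per_node],
            [shlex.join(c) for c in cluster])


if __name__ == "__main__":
    node_cmds, cluster_cmds = deployment_commands(
        ["gl-az1-a", "gl-az2-a"], [f"/dev/xvd{c}" for c in "fghijklm"])
    print("\n".join(node_cmds + cluster_cmds))
```

Passing `replica=1` omits the `replica` argument and yields a purely distributed volume.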

Scaling Gluster in AWS
Scaling UP v. scaling OUT
!   Scaling UP
•  Adding storage to existing cluster nodes
•  Appropriate when the instances have free:
•  network bandwidth
•  disk I/O
•  CPU cycles
•  memory
!   Scaling OUT
•  Adding nodes to an existing cluster
•  Appropriate when the instances are resource bound
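The rule of thumb above can be written as a tiny decision helper. This is only a sketch: the 80% threshold, the resource names, and the function name `scaling_advice` are hypothetical, and real utilization figures would come from your own monitoring.

```python
def scaling_advice(utilization, threshold=0.8):
    """Suggest 'scale up' or 'scale out' from per-resource utilization.

    utilization maps a resource name to the fraction in use (0.0 to 1.0).
    Any resource at or above the threshold counts as a bottleneck.
    """
    bound = sorted(name for name, used in utilization.items() if used >= threshold)
    if bound:
        # Instances are resource bound: add nodes to the cluster.
        return "scale out", bound
    # Instances have headroom: add storage to the existing nodes.
    return "scale up", []


if __name__ == "__main__":
    print(scaling_advice({"network": 0.9, "disk_io": 0.4, "cpu": 0.3, "memory": 0.2}))
    print(scaling_advice({"network": 0.5, "disk_io": 0.4, "cpu": 0.3, "memory": 0.2}))
```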
Scaling UP
[Diagram: six RHS servers forming a single RHS namespace, split across AZ-1 and AZ-2 within one region]
!   No performance improvement when scaling up.

!   Improves storage density.
!   Reduces cost/GB.
!   You can also change your instance type, for example from an m1.xlarge to an m2.4xlarge!
•  If you are running a replicated Gluster cluster you can change your instance type with no downtime!
Scaling OUT
[Diagram: twelve RHS nodes forming a single RHS namespace, split across AZ-1 and AZ-2 within one region]
!   Can improve per file performance.

•  If scaling out relieves a bottleneck
Performance
TEST, TEST, TEST!
!   Every workload is different; your results will vary!
•  AWS makes testing easy: create and destroy clusters in minutes
•  Test with real-world load
Performance
[Chart: single file, distributed reads on a mirrored pair of ephemeral storage. Y axis: MB/sec (0 to 400). X axis: block size (KB), 2 to 1024.]
[Chart: Gluster replicated, single-file writes on an EBS array of 8 volumes (8000 IOPS/instance). Y axis: MB/sec (0 to 200). X axis: block size (KB), 2 to 1024.]
Performance
!   Finding the perfect ratio of Gluster storage nodes : clients is complex. Test, test, test!
[Chart: throughput (MB/s, 0 to 3000) for varying numbers of clients and Gluster servers; m2.4xlarge servers and clients.]
Thanks!
http://www.slideshare.net/AmazonWebServices
Craig Carl
crcarl@amazon.com
