
EMC SCALEIO PERFORMANCE REPORTS
Detailed Performance Results

ABSTRACT

This white paper provides detailed results of performance tests done in the EMC
Performance Engineering lab. The tests were carried out on ScaleIO systems in various
configurations and hardware selections to evaluate individual components.

July 2015

EMC WHITE PAPER


To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local
representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store

Copyright © 2015 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other
trademarks used herein are the property of their respective owners.

TABLE OF CONTENTS

INTRODUCTION
TEST CONFIGURATIONS
AUDIENCE

PERFORMANCE RESULTS

SDS
SDC
NETWORK
RAID CONTROLLER
ESX
RESPONSE TIME
READ CACHE

PERFORMANCE BEST PRACTICES

OTHER LIMITATIONS
CONCLUSION

INTRODUCTION
ScaleIO is an industry-leading Software-defined Storage solution that offers hyper-convergence, scalability, elasticity, and
performance. In a nutshell, ScaleIO aggregates the IOPS of the various servers into one high-performing virtual SAN. All servers
participate in servicing I/O requests using massively parallel processing. The architecture allows companies to deploy storage and
compute resources in a single architecture while remaining flexible enough to scale storage and compute independently. ScaleIO can
scale out from as few as three servers to thousands simply by adding servers (nodes) to the environment. Capacity can be increased
and decreased "on the fly" without impact to users or applications. In addition, ScaleIO is hardware agnostic, so customers are not
limited in the hardware they can use.

Selecting the right hardware components and properly configuring the ScaleIO system is a crucial stage of a successful
implementation. The goals of this white paper are to:

• Help evaluate ScaleIO technology for either a capacity or a performance tier

• Provide parameters and guidelines for future implementations

• Describe performance data to enable informed judgments about overall solution capabilities

• Explain the considerations needed to plan for operation and scaling

• Help pitch ScaleIO to customers or other end-user audiences

Although different applications use various block sizes and different read/write combinations, this paper focuses mainly on an 8k
block size in 100/0, 70/30, 50/50 and 0/100 read/write combinations. Most tests were done in Linux environments to achieve
maximum performance. The performance numbers listed in this report are considered "realistic normal case scenario" results,
meaning that in most cases they reflect the actual performance capabilities the user receives (or come close to them). Some tests
were done using a Null device because there is a need to separate ScaleIO's software performance from the limitations of real
devices.

TEST CONFIGURATIONS
The hardware configuration is listed below:

Platform       Cisco UCS C240 M3S (BIOS C240M3.2.0.3.0)

Processor      Dual-socket Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 10 cores per socket with Hyper-Threading enabled

Memory         64 GB DDR3, 4 x 16GB DIMMs @ 1866MHz

Network        Intel 2x10Gb onboard (firmware 0x61C10001-1.446.1)

Storage SSD    800GB HGST, firmware revision C118

Storage HDD    1TB Seagate (ST91000640NS), 7200 RPM, 6Gb, firmware revision CC03

The software deployed in the tests is listed below:

ScaleIO           EMC ScaleIO Version: R1_31.216.1

Load Generator    FIO 2.1.11

The workload for the tests was generated using FIO with the parameters below (an illustrative command line follows the table):

FIO Parameter         Value

rw                    randrw
bs                    8k
iodepth               256
num_jobs              4
runtime               Varied
sleep_between_jobs    20
invalidate            1
group_reporting       1
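
As a rough illustration of how these parameters translate into a load-generator invocation, the Python sketch below assembles an fio command line from the table above. The target device path, the 70/30 read mix, the runtime value, and the direct/libaio settings are assumptions for the example, not taken from the actual test harness; the 20-second pause between jobs listed in the table is assumed to be handled by the harness rather than by an fio flag.

    import subprocess

    # Hypothetical target volume and runtime; the real test harness is not published.
    TARGET = "/dev/scinia"      # example device name for a mapped ScaleIO volume
    RUNTIME_SEC = 300           # "runtime" was varied between tests

    # Map the FIO parameters from the table onto standard fio command-line options.
    cmd = [
        "fio",
        "--name=scaleio-8k-randrw",
        f"--filename={TARGET}",
        "--rw=randrw",            # rw
        "--rwmixread=70",         # example 70/30 read/write mix (100/0, 50/50 and 0/100 were also run)
        "--bs=8k",                # bs
        "--iodepth=256",          # iodepth
        "--numjobs=4",            # num_jobs
        f"--runtime={RUNTIME_SEC}",
        "--time_based",           # run for the full runtime (assumption)
        "--invalidate=1",         # invalidate
        "--group_reporting",      # group_reporting
        "--ioengine=libaio",      # assumption: asynchronous IO engine on Linux
        "--direct=1",             # assumption: bypass the page cache
    ]
    subprocess.run(cmd, check=True)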

AUDIENCE
This white paper is intended for internal EMC audiences such as System Engineers, Technology Consultants, Professional Services
Consultants and Solution Engineers who desire a deeper understanding of the performance aspects of EMC ScaleIO and/or the
planning considerations for the future growth of their ScaleIO environment(s). This document outlines how each ScaleIO component
performs based on several environment variables, and how to apply best practices through basic guidelines and troubleshooting
techniques as uncovered by ScaleIO performance engineering. The audience should already be familiar with the ScaleIO system and
architecture at a 201 level in order to consume the information in this white paper.

PERFORMANCE RESULTS
To improve performance, we need to understand the factors causing performance bottlenecks. In a ScaleIO system, IO
performance depends on these components:

• SDC: The client component from which data is sent

• SDS: The component that carves out local storage for the ScaleIO system

• Operating System: The system on which the SDC and SDS run

• RAID Controllers: The path IO must go through before hitting the HDD or SSD

• Network: Determines how much data can be transferred in a given time period

ScaleIO scales to thousands of servers. Given a 100% read workload with a 4k block size using a Null device, a ScaleIO system with
128 SDSs can achieve around 31M IOPS. In theory, the system would reach ~180M IOPS with 1024 nodes. ScaleIO can outperform
alternative solutions many times over in terms of performance while using a fraction of the existing application servers, which
minimizes the cost of the storage.

Below is a GUI screenshot of a test done by internal EMC engineers.

Let's take a look at the performance characteristics of various ScaleIO components and environmental factors.

SDS
The SDS owns local storage that contributes to the ScaleIO Storage Pools. An instance of the SDS runs on every server that
contributes some or all of its local storage space (HDDs, SSDs, or PCIe flash cards) to the aggregated pool of storage within the
ScaleIO virtual SAN. Local storage may be disks, disk partitions, or even files. The role of the SDS is to perform the back-end
IO operations requested by an SDC.

The following graph shows the scalability of the IOPS per physical core for the 8KB read-only workload. The number of read IOPS
scales linearly with the number of cores.

The following graph shows the scalability of the IOPS per physical core for the 8KB write-only workload. Similar to the read workload,
the performance of the write operations scales linearly with the cores.

However, the number of IOPS for the write operations is almost half that of the read operations. This is due to the fact that, for
every write operation, an IO is generated for the primary and for the corresponding secondary SDS, which results in 2 IO operations per
write. For read operations, the IOs are served directly from the primary SDS, which results in twice the number of IOPS for reads as
for writes. The sketch below illustrates the relationship.
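
A minimal sketch of that relationship, assuming exactly one back-end IO per read and two back-end IOs per write (primary plus secondary copy); the back-end budget figure is purely illustrative:

    def backend_ios_per_sec(frontend_iops: float, write_fraction: float) -> float:
        """Back-end IOs generated per second for a given frontend IOPS and write mix.

        Reads are served by the primary SDS only (1 back-end IO); writes go to the
        primary and the secondary SDS (2 back-end IOs) because of the 2-copy mirroring.
        """
        reads = frontend_iops * (1.0 - write_fraction)
        writes = frontend_iops * write_fraction
        return reads * 1 + writes * 2

    # With a fixed back-end budget, a 100% write workload sustains roughly half the
    # frontend IOPS of a 100% read workload.
    backend_budget = 200_000                 # hypothetical back-end capability
    print(backend_budget / 1)                # max frontend IOPS at 100% reads
    print(backend_budget / 2)                # max frontend IOPS at 100% writes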

SDC
The SDC is a lightweight block device driver that exposes ScaleIO shared block volumes to applications. The SDC runs on the same
server as the application. This enables the application to issue an IO request, which the SDC fulfills regardless of where the particular
blocks physically reside. The SDC communicates with other nodes (beyond its own local server) over a TCP/IP-based protocol, so it is
fully routable.

The following graph shows the scalability of the IOPS per physical core for the read-only workload. The results show that the
scalability levels off at 3 threads and 3 cores; having more cores in the system doesn't help. ScaleIO consumes minimal CPU
resources, usually around 5% and rarely going above 30%.

The graph also indicates that the number of read IOPS for the SDC is higher than the read IOPS from a single SDS. This is because
the IOs served to the SDC are spread across multiple SDSs, all of which serve them in parallel, as the sketch below illustrates.
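
A simple way to picture why one SDC can report more read IOPS than a single SDS, assuming its volume is spread evenly across the SDSs and each SDS serves its share in parallel (the per-SDS figure and SDC ceiling below are purely illustrative):

    def sdc_read_iops(per_sds_read_iops: float, num_sds: int, sdc_ceiling: float) -> float:
        """Aggregate read IOPS seen by one SDC whose volume is striped over num_sds SDSs.

        The SDC issues IOs to all SDSs in parallel, so its throughput is the sum of the
        per-SDS contributions, capped by the SDC's own ceiling (it levels off at ~3 cores).
        """
        return min(per_sds_read_iops * num_sds, sdc_ceiling)

    print(sdc_read_iops(per_sds_read_iops=60_000, num_sds=3, sdc_ceiling=500_000))  # 180000.0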

The following graph shows the scalability of the IOPS per physical core for the write-only workload. Similar to the read workload, the
scalability levels off at 3 threads and 3 cores; having more cores in the system doesn't help.
It should be noted that the SDC has similar performance for both reads and writes, because for both read and write operations the
SDC issues a single IO against the SDS.

The performance of both the SDS and the SDC levels off after a certain number of cores. This is because ScaleIO is very light on
compute resources and assumes that other applications, which require more resources, are running in parallel with ScaleIO.

NETWORK
To evaluate ScaleIO's capability to saturate the network, two tests were done to look at IO size and maximum IOPS:

1. 3-node (all SDC/SDS) with a 2x10Gb network

2. 6-node (2-tier model using 3 SDCs and 3 SDSs separately) with a 4x10Gb network

The chart below from the first test shows that although the network bandwidth is limited to ~2GB/sec, the system can still reach
3GB/sec because 1/3 of the IOs are local to the SDC. When the SDC and SDS are both in the same node, the SDC will serve
approximately 1/N of its IO from the local server (SDS node), where N is the number of nodes. This also means that the more nodes
there are, the less locality there is. A back-of-the-envelope sketch of this effect follows.
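
The sketch below models the locality effect described above, assuming IO is spread uniformly so that 1/N of it is served by the local SDS and only the remainder crosses the wire:

    def max_throughput_gbps(network_limit_gbps: float, num_nodes: int, two_tier: bool = False) -> float:
        """Aggregate read throughput achievable before the network saturates.

        In a hyper-converged layout ~1/N of the IO is served by the local SDS and never
        touches the network; in a 2-tier layout (separate SDC and SDS nodes) all of it does.
        """
        local_fraction = 0.0 if two_tier else 1.0 / num_nodes
        return network_limit_gbps / (1.0 - local_fraction)

    print(max_throughput_gbps(2.0, num_nodes=3))                  # ~3.0 GB/sec, as in the first test
    print(max_throughput_gbps(4.0, num_nodes=6, two_tier=True))   # 4.0 GB/sec cap, as in the second test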

Also in the first test, the write bandwidth is only half of the read bandwidth. This is because ScaleIO uses a 2x mirroring scheme.

The table below is for the second test with the 6-node configuration. Since this is a 2-tier deployment, the SDC cannot take
advantage of a local SDS to go beyond 4GB/sec. This test was done using a Null device. The conversion between the IOPS and
MB/sec columns is sketched after the table.

4-NIC non-converged          Reads                     Writes

I/O size (bytes)       IOPs        MB/sec        IOPs        MB/sec

512                    249,406     122           109,265     53
4,096                  237,173     926           102,969     402
8,192                  234,170     1,829         94,888      741
16,384                 198,158     3,096         78,118      1,221
32,768                 124,514     3,891         59,070      1,846
65,536                 63,069      3,942         33,222      2,076
131,072                33,617      4,202         15,506      1,938
262,144                16,288      4,072         7,860       1,965
524,288                7,743       3,872         3,442       1,721
1,048,576              3,543       3,543         1,573       1,573
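
The MB/sec column follows directly from the IOPS column; for example, the 8,192-byte row converts as shown below (treating MB as 2^20 bytes, which matches the table):

    def mb_per_sec(iops: float, io_size_bytes: int) -> float:
        """Convert IOPS at a given IO size to MB/sec (2^20 bytes per MB, as in the table)."""
        return iops * io_size_bytes / 2**20

    print(round(mb_per_sec(234_170, 8_192)))   # ~1,829 MB/sec, matching the 8KB read row
    print(round(mb_per_sec(94_888, 8_192)))    # ~741 MB/sec, matching the 8KB write row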

RAID CONTROLLER
There are several types of RAID controllers:

With protected DRAM (ROC): This option is better for HDD-only configurations. This type of controller usually has 1-4GB of DRAM,
which is enough to help improve write response time and drive IOPS by using the elevator algorithm.

Pass-Through (IOC): This is used for SSD-only configurations to achieve a better maximum IOPS. There are two main vendors: Avago
(LSI) and PMC (Adaptec). Both have 6Gb and 12Gb cards and Read/Write caching SSD extensions, marketed as CacheCade (LSI) and
MaxCache (PMC).

To evaluate ScaleIO's capability to leverage a high-performance Flash card, a P320 PCI Flash card was used in one of the tests, as it is
a high-performance drive. The graph below compares the Null device with the P320 to show how well ScaleIO leverages the Flash
card without introducing much of a performance bottleneck. Both are very close in terms of IOPS.

A similar test was done to compare RAID controllers as well. In general, HBA/IOC configurations deliver slightly higher IOPS than JBOD.

The test scenario below uses 6Gb SAS in a 3-node configuration with an IO size of 8KB. The RAID controller limits are clear:
120K reads, 30K writes (Write-Through), and 15K writes (Write-Back). A Sunset Cove SSD drive can only perform 60K reads and 20K
writes. The results below are per node; a rough per-cluster conversion is sketched after the table.

READ:WRITE @ 8K            1xSSD       2xSSD       4xSSD

100:0                      59,393      112,839     120,571
70:30                      37,112      38,650      39,217
50:50                      25,455      26,336      26,841
0:100 (Write-Back)         13,860      14,725      15,021
0:100 (Write-Through)      21,000      28,333      29,333
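
Since the table reports per-node numbers, a rough estimate for the whole 3-node cluster simply multiplies by the node count; this assumes the load stays evenly balanced, which is a simplification:

    NODES = 3

    # Per-node IOPS for the 4xSSD column of the table above.
    per_node_4x_ssd = {
        "100:0": 120_571,
        "70:30": 39_217,
        "50:50": 26_841,
        "0:100 (Write-Back)": 15_021,
        "0:100 (Write-Through)": 29_333,
    }

    for mix, iops in per_node_4x_ssd.items():
        print(f"{mix:>22}: ~{iops * NODES:,} IOPS across the 3-node cluster")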

Another test was done using CacheCade as a Read/Write cache on SSD. CacheCade is a great solution for increasing the performance
of HDD configurations in a ScaleIO system. The chart below shows IOPS growing over time as more of the IOs are served from the
SSD.

If CacheCade is used for 100% random reads, the IOPS will increase quickly once CacheCade detects the pattern and warms up.
Keep in mind the following limitations (a simple sizing check is sketched after the list):

• Read: When the address space is larger than 256GB, caching doesn't work for 8KB read IOs.

• Write: When the address space is larger than 512GB, caching doesn't work in the current version of CacheCade. The next version
will go up to 2TB.
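
A quick way to check a device against the address-space limits above; the 256GB and 512GB figures come from the bullets, while the example sizes are hypothetical:

    READ_CACHE_LIMIT_GB = 256    # 8KB read caching stops working beyond this address space
    WRITE_CACHE_LIMIT_GB = 512   # write caching limit in the CacheCade version tested

    def cachecade_coverage(address_space_gb: float) -> str:
        """Report which CacheCade functions still apply to a device of this size."""
        if address_space_gb <= READ_CACHE_LIMIT_GB:
            return "read and write caching available"
        if address_space_gb <= WRITE_CACHE_LIMIT_GB:
            return "write caching only (8KB read caching disabled)"
        return "neither read nor write caching effective"

    print(cachecade_coverage(200))     # read and write caching available
    print(cachecade_coverage(400))     # write caching only (8KB read caching disabled)
    print(cachecade_coverage(1000))    # neither read nor write caching effective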

There are several factors that may influence caching performance:

• Workload: If the workload has some skewness, where most of the IOs hit a small address space and that address space fits in the
cache, the cache hit ratio will be high.

• Persistency: This depends on how often the workloads hit the same address space. For example, if they hit a different
100GB every minute, caching will not perform as expected.

In general, if enough IOs go to the same place, the code decides this is a good candidate for caching and brings it into the cache.
That is why it takes a while to reach good performance. A rough model of the hit ratio follows.
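
A minimal model of the skewness point, assuming a two-region workload: a hot region that receives most of the IOs and a cold region that receives the rest. If the hot region fits in cache, the steady-state hit ratio approaches the hot fraction of the IOs; the numbers below are illustrative only.

    def steady_state_hit_ratio(hot_region_gb: float, hot_io_fraction: float, cache_gb: float) -> float:
        """Rough steady-state cache hit ratio for a simple hot/cold two-region workload.

        If the hot region fits entirely in cache, hits approach the fraction of IOs that
        target it; otherwise only the cached portion of the hot region produces hits.
        """
        cached_fraction_of_hot = min(1.0, cache_gb / hot_region_gb)
        return hot_io_fraction * cached_fraction_of_hot

    print(steady_state_hit_ratio(hot_region_gb=100, hot_io_fraction=0.9, cache_gb=150))  # 0.9
    print(steady_state_hit_ratio(hot_region_gb=600, hot_io_fraction=0.9, cache_gb=150))  # 0.225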

ESX
It’s highly recommended to use ScaleIO 1.31+ for higher performance in ESX environments. The main reason is that iSCSI was used
in 1.30 and became the performance bottleneck for ScaleIO. In version 1.31, ScaleIO deploys the SDC inside the ESX kernel to
eliminate this bottleneck. Compared to a bare-metal Linux deployment, version 1.31 in ESX is very close in terms of performance.

This test uses various IO sizes in a 3-node configuration with a 2x10Gb network on a Null device.

When comparing VMFS and RDM, there is very little difference. See chart below:

RESPONSE TIME
ScaleIO Read Response Time is excellent compared to competitive solutions. The Write Response Time is good in the Write Enabled
(WE) case, but not when write caching is disabled.

READ CACHE
This section describes the impact of cache on the performance of read operations in ScaleIO. The tests were performed on a 3-node
configuration with the default of 50GB of Read Cache per node, for a total of 150GB of cache across the 3 nodes' HDDs. The test was
performed with 4KB reads against a total working set of 70GB, so ScaleIO was able to fit the entire workload in the 150GB of cache,
as the arithmetic below shows.
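
The arithmetic behind that statement, using the 50GB-per-node default and the 70GB working set from the test setup:

    NODES = 3
    READ_CACHE_PER_NODE_GB = 50       # default Read Cache used in the test
    WORKING_SET_GB = 70               # total size of the 4KB read workload

    total_cache_gb = NODES * READ_CACHE_PER_NODE_GB              # 150 GB across the cluster
    print(total_cache_gb, WORKING_SET_GB <= total_cache_gb)      # 150 True -> the workload becomes cache-resident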

The following graph shows the IOPs over time as the cache started to become hot.

With Read Cache, ScaleIO was able to:

• Achieve better response times for read operations

• Reduce the overhead on the maximum read IOPS

• Reduce the read IO traffic on the disk drives

• Use the whole cache with any read workload of any IO size

Cache considerations:

• In a converged infrastructure, using DRAM for the read cache may take away the benefits of the application cache.

• Read cache only improves performance up to the write limit.

• In ScaleIO version 1.31 and later, write operations are buffered by default. However, this only benefits read operations
that access the same address as previous writes.

• Flash-only configurations don't need Read Cache.

PERFORMANCE BEST PRACTICES


ScaleIO is very flexible, allowing various configurations for different capacity and performance use cases. Below are some common
setups for configuring the ScaleIO system and what needs to be taken into consideration:

• HDD-only: This is not recommended for high-performance workloads. To improve performance, it's possible to use many
smaller drives to distribute the IO instead of fewer large drives. There is also the option to use a RAID controller with protected DRAM
and define the HDDs as "Write-Back" to achieve better response times for mixed workloads. "Write-Back" cache is beneficial because
a write IO goes to the cache first and is acknowledged right away. This is also the reason to use battery-backed cache, to avoid data
loss in the event of a power failure.

• Mixed SSD and HDD tiers: Customers may define multiple pools as SSD pools and HDD pools. The HDD pools should use
"Write-Back" for performance, while it's OK to configure the SSD pools as "Write-Through".

• SSD as cache for HDD: There are a few options in the market for SSD caching. XtremeCache is EMC's solution for Read
Cache. CacheCade (with an LSI controller) and MaxCache (with a PMC controller) are good for Read/Write cache. For a storage-only
configuration (2-tier deployment), ScaleIO only needs 2 cores to deliver significant performance on low-cost servers.

• SSD-only: This configuration performs well even with a basic IO controller. There is no need for any DRAM buffering functionality.
It's highly recommended to leverage an SSD tier for high-performance applications, because that is a huge advantage of ScaleIO
compared to other solutions.

• Flash PCI: In general, Flash does not need a RAID controller. The Flash can be used as a Flash tier or as a cache for XtremeCache.
Today's market has many Flash drive options with different write performance.

Besides the drive configurations, there are many other “knobs” to turn:

• Pool: It's good practice not to mix drive types (e.g. SSD with HDD) in a pool. Although ScaleIO allows it, the achieved
performance is usually that of the slowest drive.

• Read Cache: The Read Cache is only used for “Read Hits.” What may be confusing is the option to buffer the writes in cache as
well, but that only benefits reads that access the same address as a previous write (which we call, in short, Read after Write). In
1.31 write buffering became ON by default, which should improve the Read Hits from regular workloads. Real write buffering
that improves the write response time and writes/sec on the drives is usually achieved in the RAID controller, so always
recommend a RAID controller with protected DRAM to help get better write performance.

• RAID Level: Every disk should be created as a separate RAID-0, letting ScaleIO handle failures.

OTHER LIMITATIONS
Beyond performance considerations, there are system limits to take into account when sizing a ScaleIO deployment. The table below
lists the main limits and the considerations for each (a simple sanity check against these limits is sketched after the table):

Item                               Limit              Considerations

Volume Size                        8 GB – 1 PB        Thick or thin. Volumes can be expanded but not shrunk
Snapshots                          31 per Volume      All snapshots are writable
Maximum Capacity per SDS           64 TB              Raw capacity
SDSs per System                    1024               Could create multiple systems
SDSs per Protection Domain         128                Contact EMC for more
Devices (Disks) per SDS            64                 Add more servers for “scale out”
Devices (Disks) per Storage Pool   3 min / 300 max    Avoid mixing HDD and SSD in a single pool
SDCs per System                    1024               Reduce by the number of RPAs if using RecoverPoint
Protection Domains per System      256                Possible to create more systems
Networks per System                8                  For Management & Data; SDS-SDC read/write/rebuild/rebalance
RAM Cache                          128 MB – 128 GB    Hyper-converged deployments will use application RAM
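
As an illustration of how these limits can be used when planning, the sketch below checks a hypothetical configuration against a few of the values in the table; the limit values come from the table, while the planned-configuration numbers are made up for the example.

    # Selected system limits from the table above.
    LIMITS = {
        "sds_per_system": 1024,
        "sds_per_protection_domain": 128,
        "devices_per_sds": 64,
        "devices_per_storage_pool": (3, 300),   # (minimum, maximum)
        "sdcs_per_system": 1024,
        "protection_domains_per_system": 256,
    }

    def check_plan(sds_per_pd: int, devices_per_sds: int, devices_per_pool: int) -> list:
        """Return a list of violations for a planned Protection Domain layout."""
        problems = []
        if sds_per_pd > LIMITS["sds_per_protection_domain"]:
            problems.append("too many SDSs in one Protection Domain")
        if devices_per_sds > LIMITS["devices_per_sds"]:
            problems.append("too many devices on one SDS")
        lo, hi = LIMITS["devices_per_storage_pool"]
        if not lo <= devices_per_pool <= hi:
            problems.append("Storage Pool device count outside the 3-300 range")
        return problems

    # Hypothetical plan: 20 SDSs in the domain, 24 disks each, one pool holding all 480 devices.
    print(check_plan(sds_per_pd=20, devices_per_sds=24, devices_per_pool=480))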

CONCLUSION
In summary, ScaleIO is an excellent storage solution that offers greater choice, as customers have the option of deploying ScaleIO
using existing storage, new storage, or a mix of both for capacity and performance tiers. Although this paper mainly covers Linux
and ESX environments, ScaleIO also supports OpenStack and Hyper-V. ScaleIO is a great solution for Service Providers, or Service
Provider-like enterprises, to deliver Infrastructure-as-a-Service (IaaS) to customers or internal users. Customers not only achieve a
lower Total Cost of Ownership (TCO) but also gain complete control over performance, capacity, and data location.

The ScaleIO architecture is designed so that there is no bottleneck or single point of failure. In other storage systems, a
virtualization layer that keeps track of data (e.g. an index or journal) usually results in massive failure and disruption when that layer
becomes unavailable. A ScaleIO cluster uses many-to-many communication in a mesh network, which enables large parallelism and
high I/O performance.

When architecting a ScaleIO system for performance and scaling, there are several considerations:

1. To maximize ScaleIO TCO, it should be deployed in hyper-converged mode, where storage and compute reside on the same
hardware. ScaleIO doesn't affect application performance, as it consumes very minimal CPU resources. The challenge may be to
explain this and get buy-in from end-users in order to eliminate concerns about application availability.

2. If the majority of workloads require high performance, customers can leverage an all-SSD configuration or use SSD as a Read/Write
cache (with a RAID controller). It's also ideal to leverage RAM cache, as this is a great differentiator of ScaleIO versus traditional
SAN or competing Software-defined Storage solutions. RAM cache can be configured up to 128 GB in a single node, which should be
plenty for heavy read workloads.

3. For customers who prefer to deploy ScaleIO for capacity use cases with less emphasis on performance (e.g. test/dev, internal
Tier-2 apps, ROBO storage…), ScaleIO is a solution of choice because the system can start with a small cluster of three nodes.
Customers can scale as the workloads require more capacity or IOPS. With such requirements, deploying ScaleIO on servers
with HDD-only or HDD plus SSD cache can be an optimized solution that balances cost and capacity.
