
Oracle, the TPC-C and RamSan®

by Michael R. Ault, Oracle Guru

Texas Memory Systems


August 2010
Introduction
Often you need to document how a particular system configuration will perform under a particular type of load. This process is called benchmarking. A benchmark provides a performance metric that can then be used to judge future performance, or the performance of other system configurations, against the baseline.

If your system will be performing many small transactions with numerous inserts, updates, and deletes, then a test that measures online transaction processing (OLTP) performance is the proper benchmark. In this series of tests we use the TPC-C benchmark. This paper demonstrates the use of TPC-C to contrast and compare HDD-based and SSD-based I/O subsystems.

TPC-C Structure
For our tests the 1,000-warehouse size was selected, corresponding to 42.63 gigabytes of raw data. Once the needed undo, redo, and temporary tablespaces and the indexes are added, this grows to over 150 gigabytes of actual database size. Table 1 shows the beginning row counts for the TPC-C schema used.

Table Name Row Count


C_WAREHOUSE 1,000
C_DISTRICT 10,000
C_CUSTOMER 3,000,000
C_HISTORY 3,000,000
C_NEW_ORDER 900,000
C_ORDER 3,000,000
C_ORDER_LINE 30,000,000
C_ITEM 500,000
C_STOCK 10,000,000

Table 1: Row Counts for TPC-C Tables
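
As a quick sanity check after the load, the counts in Table 1 can be compared against the data dictionary. This is a minimal sketch; it assumes the RamSan copy of the schema is owned by the TPCC user described later in this paper and that optimizer statistics have already been gathered (otherwise substitute a COUNT(*) per table).

-- Compare loaded row counts against Table 1 (NUM_ROWS is populated by
-- DBMS_STATS, so gather statistics first or count each table directly).
SELECT table_name, num_rows
  FROM dba_tables
 WHERE owner = 'TPCC'
 ORDER BY table_name;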


In this test we utilize a ramp from 5 to 500 users spread over several client machines, with the load generated by the Quest Software Benchmark Factory tool. There are two copies of the database: one on RamSan devices and one on two JBOD arrays of 45 15K RPM, 144 gigabyte hard drives each. This lets us isolate the effect on transaction throughput of placing the data and indexes on RamSan versus placing them on disk arrays. We will also run a RAC scaling test with this setup.

The System Setup


The system under test consists of four dual-socket, quad-core 2 GHz Dell servers with 16 GB of memory each, using InfiniBand interconnects for an Oracle 11g (11.1.0.7) RAC database. The I/O subsystem is connected through a Fibre Channel switch to two RamSan-400s, a single RamSan-500, and the two 45-disk JBOD arrays. This setup is shown in Figure 1.
Figure 1: Test Rack
The database was built on the Oracle 11g Real Application Clusters (RAC) platform and consists of four major tablespaces: DATA, INDEXES, HDDATA, and HDINDEXES. The DATA and INDEXES tablespaces were placed on the RamSan-500, and the HDDATA and HDINDEXES tablespaces were placed on two sets of 45 disk drives configured via ASM as a single diskgroup with a failure group (RAID10). The RamSan-400s were used for the undo tablespaces, redo logs, and temporary tablespaces.

In this dual configuration, one set of tables and indexes owned by schema/user TPCC resides in the DATA and INDEXES tablespaces, and an identical set owned by HDTPCC resides in the HDDATA and HDINDEXES tablespaces. This lets us test the effect of placing the database on RamSan assets versus disk drive assets while keeping memory and all other internal settings of the database identical. By placing the redo, undo, and temporary tablespaces on the RamSan-400s, we remove undo, redo, and temporary activity as a factor and isolate the results to the relative placement of data and indexes.
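
A minimal sketch of how the four tablespaces might have been laid out follows. The ASM diskgroup names (+RS500DG for the RamSan-500, +HDDG for the disk arrays) and the sizes are illustrative assumptions; the actual names and sizes used in the test rack are not given here.

-- Diskgroup names and sizes are illustrative only.
CREATE BIGFILE TABLESPACE data      DATAFILE '+RS500DG' SIZE 60G;
CREATE BIGFILE TABLESPACE indexes   DATAFILE '+RS500DG' SIZE 40G;
CREATE BIGFILE TABLESPACE hddata    DATAFILE '+HDDG'    SIZE 60G;
CREATE BIGFILE TABLESPACE hdindexes DATAFILE '+HDDG'    SIZE 40G;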

Testing Software
To facilitate the database build and the actual TPC-C protocol, we used the Benchmark Factory software from Quest Software to build the database and run the tests. Using a standard software package assured that test execution and data gathering would be identical for the two sets of data.

Building the Database


Utilizing a manual script to create the needed partitioned tables and the Benchmark Factory (BMF) application with a custom build script, we built and loaded the RamSan-based tables. As each table finished loading, a simple "INSERT INTO HDTPCC.table_name SELECT * FROM TPCC.table_name" command was used to populate the HDD-based tables. Once all of the tables were built in both locations, the indexes were built using a custom script. Following the database build, statistics were gathered on each schema using the Oracle-provided DBMS_STATS.GATHER_SCHEMA_STATS() PL/SQL procedure.
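
The copy-and-analyze step looked essentially like the following sketch for each table; the APPEND hint is an illustrative choice for a faster direct-path copy and is not mandated by the procedure described above.

-- Copy one RamSan-resident table to its HDD-resident twin, then gather
-- statistics on both schemas.
INSERT /*+ APPEND */ INTO hdtpcc.c_customer
  SELECT * FROM tpcc.c_customer;
COMMIT;

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'TPCC');
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'HDTPCC');
END;
/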

The normal BMF TPC-C build wizard looks for existing tables; if they exist, it assumes they are already loaded and jumps straight to test execution. Since we had prebuilt the tables as partitioned tables, this would have resulted in the test being run against an empty set of tables. Instead of the provided wizard, Quest supplied a custom BMF script that loads the TPC-C tables in 9 parallel streams against existing tables. The custom script could also be edited to load any subset of tables. This subsetting of the load process is critical when, for example, a table such as C_ORDER_LINE runs out of room; by allowing a single table to be reloaded when needed, the custom script made it much easier to recover from errors. The custom BMF load script did not create indexes. Instead, a separate custom script was used to build the TPC-C indexes along with additional performance-related indexes.
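
The index script itself is not reproduced here; the following is only a hypothetical example of the kind of index it created, with assumed column names, to show the general form (each index was built in the appropriate INDEXES or HDINDEXES tablespace).

-- Hypothetical example; column names and parallel degree are assumptions.
CREATE UNIQUE INDEX tpcc.c_order_line_i1
  ON tpcc.c_order_line (ol_w_id, ol_d_id, ol_o_id, ol_number)
  TABLESPACE indexes
  PARALLEL 4 NOLOGGING;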

Scalability Results – 1 to 4 Instances


The first set of TPC-C tests was designed to measure the scalability of the RamSan-based environment against that of the disk-based environment. In this set of tests the size of the SGA was kept constant on each node while additional nodes were added and the number of users was cycled from 5 to 300. The transactions per second (TPS) and the user count at peak TPS were noted for both the disk-based and RamSan-based systems.
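
Keeping the SGA constant on every node can be done with instance-wide spfile settings such as the sketch below; the 12 GB figure is illustrative only and is not the value used in these runs.

-- Pin an identical SGA on every RAC instance (size is illustrative).
ALTER SYSTEM SET sga_max_size = 12G SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET sga_target   = 12G SCOPE = SPFILE SID = '*';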

Highest TPS and Users at Maximum


Of concern to all IT managers, DBAs, and users is the point at which a system peaks in its ability to process transactions and the number of users it supports at that peak. The first results we examine concern these two key statistics. The results for both the RamSan and HDD tests are shown in Figure 2.

[Chart: HD and SSD TPC-C results; TPS (0 to 4,000) versus number of clients (0 to 350) for SSD 1-4 server and HD 1-4 server configurations]
Figure 2: RamSan and HDD Scalability Results


As can be seen from the graph, the HDD results peak at 1051 TPS and 55 users, while the RamSan results peak at 3775 TPS and 245 users. Going from 4 servers down to 1, the HDD results fall from 1051 TPS at 55 users to 549 TPS at 15 users, and the RamSan results fall from 3775 TPS at 245 users to 1778 TPS at 15 users. However, the 1778 TPS value appears to be a transitory spike in the single-server RamSan data; the sustained single-server peak occurs at 1718 TPS and 40 users.

Notice that, with the exception of 3 data points, even at 1 node the RamSan-based system outperforms the HDD-based system running 4 full nodes across the entire range from 5 to 295 users. In the full data set out to 500 users, the final TPS for the 4-node RamSan run is 3027, while the 4-node HDD run manages a paltry 297 TPS.

Key Wait Events During Tests


Of course, part of the test is to determine why the HDD and RamSan numbers differ so greatly. To do this we look at the key wait events for the 4-node runs in each set of tests. Listing 1 shows the top five wait events for the RamSan 4-node run.
Top 5 Timed Foreground Events
-----------------------------
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ---------
gc buffer busy acquire 82,716,122 689,412 8 47.5 Cluster
gc current block busy 3,211,801 148,974 46 10.3 Cluster
DB CPU 120,745 8.3
log file sync 15,310,851 70,492 5 4.9 Commit
resmgr:cpu quantum 6,986,282 58,787 8 4.0 Scheduler

Listing 1: Top 5 Wait Events During 4 Node RAMSAN Run


The only physical I/O-related event in the RamSan 4-node top 5 is log file sync, at 5 ms per event. Listing 2 shows the top 5 events for the HDD 4-node run.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
----------------------------- ------------ ----------- ------ ------ ---------
db file sequential read 15,034,486 579,571 39 37.4 User I/O
free buffer waits 28,041,928 433,183 15 28.0 Configurat
gc buffer busy acquire 4,064,135 150,435 37 9.7 Cluster
gc buffer busy release 116,550 103,679 890 6.7 Cluster
db file parallel read 191,810 102,082 532 6.6 User I/O

Listing 2: Top 5 Wait Events During 4 Node HDD Run


Note in Listing 2 how the predominant event is db file sequential read, with 15,034,486 waits and an average wait of 39 ms; this equates to 579.3 read requests per second per instance. If we examine the full wait event list for the RamSan test, we see that it performed substantially more db file sequential reads, as shown
in Listing 3.
Avg
%Time Total Wait wait Waits % DB
Event Waits -outs Time (s) (ms) /txn time
------------------------- ------------ ----- ---------- ------- -------- ------
gc buffer busy acquire 82,716,122 0 689,412 8 5.1 47.5
gc current block busy 3,211,801 0 148,974 46 0.2 10.3
log file sync 15,310,851 0 70,492 5 1.0 4.9
resmgr:cpu quantum 6,986,282 0 58,787 8 0.4 4.0
db file sequential read 49,050,918 0 53,826 1 3.0 3.7

Listing 3: Extended Wait List for RAMSAN 4-Node Run


Listing 3 shows that while over 3 times as many db file sequential reads were performed (since peak TPS was 3 times higher), the total wait time for those events was less than 10 percent of the total wait time in the HDD test (only 53,826 seconds for the RamSan versus 579,571 seconds for the HDD). The RamSan-based system performed 2,510.5 read requests per second per instance at 1.1 ms per wait event.

In addition, the HDD test shows heavy write stress: the number two wait event is free buffer waits, which indicates that users waited for buffers to be written to disk before they could be reused. In comparison, the AWR report from the SSD 4-node test shows no entry at all for free buffer waits, indicating there was little to no write stress on the RamSan during the test.

The statistics above show that the main problem for the HDD runs was latency. Since the RamSan system has roughly one tenth the latency of the HDD system, its physical I/O waits are lower by at least a factor of 10 overall. This reduction in latency allows more processing to be accomplished and more transactions and users to be supported.
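
The wait-event figures quoted above come from full AWR reports, but the same latency comparison can be made on a running instance with a simple query against V$SYSTEM_EVENT, as in this sketch (cumulative since instance startup rather than for a single test interval).

-- Compare cumulative single-block read latency and related waits.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6)                    AS time_waited_s,
       ROUND(time_waited_micro / 1000 / total_waits, 1)  AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('db file sequential read', 'free buffer waits', 'log file sync')
 ORDER BY time_waited_micro DESC;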

Transaction and Response Times


Of course, to get raw throughput (IOPS) we could simply add disks to the disk array and eventually get latency down to between 2 and 5 ms. However, we would probably have to increase the number of disks from 90 to 900 in a RAID10 setup to achieve the minimum latency possible. Purchasing that many disks, HBAs, cabinets, and controllers, and providing the power and cooling for them, to support a database of less than 100 gigabytes would be ludicrous, yet many of the TPC-C reports on the www.tpc.org website show just such configurations. Throughput is also reflected in the transaction and response times for a given configuration. You must look at three statistics for both transactions and responses: minimum time, maximum time, and 90th percentile time. Transaction time is the time to complete the full transaction; response time is the time needed to get the first response back from the system for a transaction. Figure 3 shows the plot of transaction and response times for the 4-node HDD TPC-C run.

[Chart: HD transaction and response times; seconds (log scale, 0.0001 to 1,000) versus users (0 to 600), plotting average, minimum, maximum, and 90th percentile times for both transactions and responses]
Figure 3: HDD Transaction and Response Times


As you can see, the transaction times and the response times for the HDD system track each other closely. Once you pass roughly 150 users, the 90th percentile times begin to exceed 1 second and soon reach several seconds as the user load continues to increase. The maximum response and transaction times exceed 200 seconds at around 200 users and hover there for the remainder of the test.
The results for the 4-node RamSan run are shown in Figure 4.

[Chart: SSD transaction and response times; seconds (log scale, 0.0001 to 10) versus users (0 to 600), plotting average, minimum, maximum, and 90th percentile times for both transactions and responses]

Figure 4: RAMSAN Transaction and Response Times


For the RamSan 4-node run, at no time, even at 500 users, do the average, 90th percentile, or maximum transaction or response times ever exceed 6 seconds. In fact, the average and 90th percentile transaction and response times for the RamSan run never exceed 0.4 seconds.

The RamSan serves more users, at a higher transaction rate, and delivers results anywhere from 5 to 10 times
faster, even at maximum load.

Summary of Scalability Results


In every phase of scalability testing the RamSan outperformed the HDD-based database. In total transactions, in the maximum number of users served at peak transaction rate, and in the lowest transaction and response times, the RamSan-based system showed a factor of 5 to 10 times better performance and scalability. In fact, a single-node RamSan test outperformed the 4-node HDD test.

These tests show that a single server with 8 CPU cores and 16 GB of memory connected to a RamSan-500 can outperform a 4-node RAC system with 8 CPU cores and 16 GB of memory per node running against a 90-disk RAID10 array, with identical Oracle database configurations.

It should also be noted that the I/O load from the SYSTEM, UNDO, TEMP, and SYSAUX tablespaces, as well as from the control files and redo logs, was offloaded to the RamSan-400s. Had this additional load been placed on the disk array, performance in the HDD tests would have been much worse.

Memory Restriction Results: 1 to 9 Gigabyte Cache Sizes


In the next set of TPC-C runs, the number of 4-gigabit Fibre Channel links to the RamSan arrays was maximized. Since the disk arrays (two arrays, each with a single 2 Gbps FC link) were not bandwidth constrained, no additional FC links were added for them. The Fibre Channel and InfiniBand layout is shown in Figure 5.

Figure 5: Fibre and InfiniBand Connections


In order to properly utilize the InfiniBand interconnect, the Oracle executable must be relinked to use the RDS (Reliable Datagram Sockets) protocol.
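
Once the instances are restarted, the interface each one has picked up for the interconnect can be confirmed from the data dictionary, as in the sketch below; the IPC protocol actually in use is reported in each instance's alert log at startup.

-- Confirm the interconnect interface on each RAC instance.
SELECT inst_id, name, ip_address, is_public, source
  FROM gv$cluster_interconnects
 ORDER BY inst_id;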

With one 2 Gbps interface per disk array, this translates to about 512 MB/s of total bandwidth, or 64K IOPS at an 8K I/O size. Because the database is configured with an 8K block size and this is an OLTP workload with mostly single-block reads, this should provide sufficient bandwidth. With 90 disk drives, the maximum expected IOPS would be about 18,000 if every disk could be accessed simultaneously at its maximum random I/O rate.
Red Hat multipathing software was used to unify the multiple ports presented by each device into a single virtual port on the servers. The RamSans were presented with the names shown in the diagram in Figure 5. The HDD arrays were formed into a single ASM diskgroup with one array as the primary and the other as its failure group. The HDDATA and HDINDEXES tablespaces were then placed into that ASM diskgroup.

The memory sizes for the various buffer caches (default, keep, and recycle) were then reduced proportionately to observe the effects of memory stress on the four-node cluster. The results of the memory stress test are shown in Figure 6.
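
The reductions were made with instance-wide parameter changes along the lines of the following sketch; the values shown correspond to a 4.5 GB total and are illustrative only, as the exact split between the default, keep, and recycle pools used in the tests is not reproduced here.

-- Illustrative settings for one reduced-cache run (4.5 GB total), with
-- automatic SGA sizing disabled so the pool sizes are exact.
ALTER SYSTEM SET sga_target            = 0     SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET db_cache_size         = 2600M SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET db_keep_cache_size    = 800M  SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET db_recycle_cache_size = 1100M SCOPE = SPFILE SID = '*';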

[Chart: SSD and HD TPS versus memory, 4 instances; TPS (0 to 4,000) versus users (0 to 600) for SSD cache sizes of 9, 6.75, 4.5, 2.25, and 1.05 GB and HD cache sizes of 9, 6.85, 4.5, and 3 GB]
Figure 6: RAMSAN and HDD Results from Memory Stress Tests


Figure 6 clearly shows that the RamSan handles memory stress better, by a factor ranging from 3 at the top end (a 9 GB total cache) to a huge 7.5 at the low end (comparing a 1.05 GB cache on the RamSan run to a 4.5 GB cache on the HDD run). The HDD runs were limited to 3 GB at the lower end by time constraints; however, as performance would only get worse as the cache was further reduced, further testing was deemed redundant.

Of course, the reason for the wide range between the upper and lower memory results, from 3 to 7.5 times better performance on the RamSan, can be traced to the increase in physical I/O, and therefore physical IOPS, that resulted from being unable to cache results. This is shown in Figure 7.
[Chart: IOPS for high and low memory; IOPS (0 to 100,000) per 15-minute interval for SSD and HDD at high-memory (HM) and low-memory (LM) settings]
Figure 7: IOPS for RAMSAN and HDD with Memory Stress


Figure 7 shows that the physical IOPS for the RamSan ranged from 86,354 at 1.05 GB down to 33,439 at 9 GB. The HDD was only able to achieve a maximum of 14,158 IOPS at 1.05 GB, falling to 13,500 at 9 GB of memory, using the 90-disk RAID10 ASM-controlled disk array. The relatively flat response curve of the HDD tests indicates that the HDD array was saturated with I/O requests and had reached its maximum IOPS.

Figure 7 also shows that in the RamSan-based tests the IOPS were allowed to reach the natural peak demanded by the data requirements, while in the HDD tests they were artificially capped by the limits of the hardware.

The timings for the db file sequential read waits in the RamSan and HDD runs are also indicative of the latency difference between the RamSan and the HDD array. For the RamSan runs, the time spent on db file sequential reads over the entire measurement period varied from a high of 169,290 seconds at 1.05 GB down to 53,826 seconds for the 9 GB run. In contrast, the HDD runs required 848,389 seconds at 3 GB down to 579,571 seconds at 9 GB.

Memory Stress Results Summary


The tests show that the RamSan array handles a reduction in available memory much better than the HDD array. Even at a little over 1 GB of total cache per node, the 4-node RAC environment on the RamSan outperformed the HDD array running with 9 GB of total cache per node on identical servers and database parameters. Unfortunately, due to a bug in the production release of Oracle 11g (11.1.0.7), we were unable to test Oracle's Automatic Memory Management feature; the bug limits total SGA size to less than 3-4 gigabytes per server.
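
For reference, Automatic Memory Management would have been enabled with settings along these lines; the 12 GB value is illustrative and, as noted, the bug prevented such runs from completing.

-- Illustrative AMM settings we were unable to test because of the bug.
ALTER SYSTEM SET memory_max_target = 12G SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET memory_target     = 12G SCOPE = SPFILE SID = '*';
-- With MEMORY_TARGET set, SGA_TARGET and PGA_AGGREGATE_TARGET can be
-- left at 0 so Oracle manages the SGA/PGA split automatically.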

Summary
These tests demonstrate that the RamSan array outperforms the HDD array in every aspect of the OLTP environment tested. In scalability, in raw performance, and under memory stress, the RamSan array achieved a minimum of 3 times the performance of the HDD array, in many cases giving a 10-fold or better performance boost for the same user load and memory configuration.

In a situation where the choice is between buying additional servers, more memory, and an HDD SAN versus fewer servers, less memory, and a RamSan array, these results show that you will get better performance from the RamSan system in every configuration tested. The only way to approach the performance of the RamSan array would be to over-purchase disks by almost two orders of magnitude to obtain the required IOPS through increased spindle counts.
When the cost of the additional servers and memory, along with the floor space, disk cabinets, controllers, and HBAs, is combined with the ongoing energy and cooling costs of a large disk array, it should be clear that RamSans are the better choice.
