
Real life experiences of RAC scalability with a 6 node cluster

Eric Grancher
eric.grancher@cern.ch
CERN IT

Anton Topurov
anton.topurov@cern.ch
openlab, CERN IT
Outline

• CERN computing challenge


• Oracle RDBMS and Oracle RAC @ CERN
• RAC scalability – what, why and how?
• Real life scalability examples
• Conclusions
• References

LHC gets ready …

(Jürgen Knobloch / CERN)
The LHC Computing Challenge

• Signal/Noise: 10⁻⁹
• Data volume
  – High rate × large number of channels × 4 experiments
  → 15 PetaBytes of new data each year
• Compute power
  – Event complexity × number of events × thousands of users
  → 100 k of (today's) fastest CPUs
• Worldwide analysis & funding
  – Computing funded locally in major regions & countries
  – Efficient analysis everywhere
  → GRID technology

(Jürgen Knobloch / CERN)
WLCG Collaboration
• The Collaboration
– 4 LHC experiments
– ~250 computing centres
– 12 large centres
(Tier-0, Tier-1)
– 38 federations of smaller
“Tier-2” centres
• Growing to ~40 countries
– Grids: EGEE, OSG, Nordugrid
• Technical Design Reports
– WLCG, 4 Experiments: June 2005
• Memorandum of Understanding
– Agreed in October 2005
• Resources
– 5-year forward look
(Jürgen Knobloch / CERN)
Centers around the world form a Supercomputer

• The EGEE and OSG projects are the basis of the Worldwide LHC Computing Grid project (WLCG)
• Inter-operation between Grids is working!

(Jürgen Knobloch / CERN)
CPU & Disk Requirements 2006

[Charts: CPU (MSI2000) and disk (PB) requirements for 2007–2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2); CERN represents ~10% of the total.]

(Jürgen Knobloch / CERN)
Oracle databases at CERN

• 1982: CERN starts using Oracle
• 1996: OPS (Oracle Parallel Server) on Sun SPARC Solaris
• 2000: Use of Linux x86 for Oracle RDBMS
• Today: Oracle RAC for the most demanding services:
  – CASTOR mass storage system (15 PB / year)
  – Administrative applications (AIS)
  – Accelerators and controls, etc.
  – LHC Computing Grid (LCG)
(our view of) RAC basics

• Shared disk infrastructure: all disk devices have to be accessible from all servers
• Shared buffer cache (with coherency!)

[Diagram: Clients – DB Servers – Storage]
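The coherency traffic between instances shows up as “gc” (global cache) wait events. A minimal sketch of how to see how much time each instance spends on them (standard GV$ view, nothing specific to this setup assumed):

  SELECT inst_id, event, total_waits,
         ROUND(time_waited / 100) AS seconds_waited   -- time_waited is in centiseconds
  FROM   gv$system_event
  WHERE  event LIKE 'gc%'
  ORDER  BY time_waited DESC;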
Linux RAC deployment example

• RAC for the Administrative Services move (2006)
  – 4 nodes × 4 CPUs, 16 GB RAM/node
  – RedHat Enterprise Linux 4 / x86_64
  – Oracle RAC 10.2.0.3

• Consolidation of administrative applications, first:
  – CERN Expenditure Tracking
  – HR management, planning and follow-up, as well as a self-service application
  – Computer resource allocation, central repository of information (“foundation”)…

• Reduce the number of databases; profit from HA and from information sharing between the applications (same database).
Linux RAC deployment for
administrative applications
• Use of Oracle 10g services
– One service per application OLTP workload
– One service per application batch workload
– One service per application external access
• Use of Virtual IPs
• Raw devices only for OCR and quorum devices
• ASM disk management (2 disk groups)
• RMAN backup to TSM

• Things became much easier and more stable over time (GC_FILES_TO_LOCKS, shared raw device volumes…); good experience with consolidation on RAC
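A minimal sketch of how such per-workload services can be created (service names are hypothetical; on RAC the services are normally also registered with the clusterware, e.g. via srvctl, to assign preferred and available instances):

  BEGIN
    -- one service per application workload (OLTP, batch, external access)
    DBMS_SERVICE.CREATE_SERVICE(service_name => 'ais_oltp',  network_name => 'ais_oltp');
    DBMS_SERVICE.CREATE_SERVICE(service_name => 'ais_batch', network_name => 'ais_batch');
    DBMS_SERVICE.CREATE_SERVICE(service_name => 'ais_ext',   network_name => 'ais_ext');
    DBMS_SERVICE.START_SERVICE('ais_oltp');
    DBMS_SERVICE.START_SERVICE('ais_batch');
    DBMS_SERVICE.START_SERVICE('ais_ext');
  END;
  /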
RAC Scalability

(Illustration: Frits Ahlefeldt-Laurvig / http://downloadillustration.com/)
RAC Scalability (1)

Two ways of increasing database performance:

• Upgrading the hardware (scale up)
  – Expensive and inflexible

• Adding more hardware (scale out)
  – Less expensive hardware and more flexible
RAC Scalability (2)

Adding more hardware (scale out)

Pros:
– Cheap and flexible
– Getting more popular
– Oracle RAC is there to help you

Cons:
– More is not always better
– Achieving the desired scalability is not easy

Why?
Reason 1

• Shared disk infrastructure: all disk devices have to be accessible from all servers
• Shared buffer cache (with coherency!)

[Diagram: Clients – DB Servers – Storage]
Reason 2

Reason 3

Examples

Two real-life RAC scalability examples:

• CASTOR Name Server

• PVSS

CASTOR Name Server

• CASTOR
  – CERN Advanced STORage manager
  – Stores physics production files and user files

• CASTOR Name Server
  – Implements the hierarchical view of the CASTOR name space
  – Multithreaded software
  – Uses an Oracle database for storing file metadata
Stress Test Application

• Multithreaded
• Used with up to 40 threads
• Each thread loops 5000 times on:
  – Creating a file
  – Checking its parameters
  – Changing the size of the file

• Tests made:
  – Single instance vs. 2-node RAC
  – No changes in schema or application code
Result

[Chart “CNS: Single instance vs RAC”: throughput in ops/s for 1–40 client threads (1, 2, 5, 7, 10, 12, 14, 16, 20, 25, 30, 35, 40), comparing a single instance with a 2-instance RAC; the 2-instance RAC delivers lower throughput than the single instance.]
Analysis (1/2)

Problem:
• Contention on the CNS_FILE_METADATA table

Change:
• Hash partitioning with a local primary key index (sketch below)

Result:
• 10% gain, but still worse than the single instance
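A sketch of the hash-partitioning change; the actual CNS_FILE_METADATA definition is not shown here, so column names and partition count are hypothetical:

  CREATE TABLE cns_file_metadata (
    fileid   NUMBER        NOT NULL,
    name     VARCHAR2(255),
    filesize NUMBER
  )
  PARTITION BY HASH (fileid) PARTITIONS 16;

  -- primary key enforced through a LOCAL (per-partition) index
  CREATE UNIQUE INDEX pk_cns_file_metadata ON cns_file_metadata (fileid) LOCAL;

  ALTER TABLE cns_file_metadata
    ADD CONSTRAINT pk_cns_file_metadata PRIMARY KEY (fileid)
    USING INDEX pk_cns_file_metadata;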

Analysis (2/2)

• Top event:
  – enq: TX - row lock contention
  – Again on CNS_FILE_METADATA

• Findings:
  – The application logic causes the row lock contention
  – Table structure reorganization cannot help

• Follow-up:
  – No simple solution
  – Work in progress
PVSS II

• Commercial SCADA application
  – critical for the LHC and the experiments

• Archiving in an Oracle database

• Out-of-the-box performance:
  100 “changes” per second

• CERN needs:
  150 000 changes per second = 1500 times faster!
The Tuning Process

1. Run the workload; gather ASH/AWR information, 10046 traces… (example below)
2. Find the top event that slows down the processing
3. Understand why time is spent on this event
4. Modify the client code, database schema, database code or hardware configuration, and iterate from step 1
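One way to capture this information in a 10g environment (a sketch; trace level and snapshot handling depend on the test setup):

  -- extended SQL trace with wait events for the session under test (event 10046, level 8)
  ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

  -- manual AWR snapshots just before and just after the run;
  -- the report is then generated with ?/rdbms/admin/awrrpt.sql
  EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;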

PVSS Tuning (1/6)

[Diagram: 150 clients – DB servers – storage. Each client issues “merge (…)” into table events_history; a trigger on events_history runs “update eventlastval set …” to keep the latest-value table up to date.]

• Shared resource: EVENTS_HISTORY (ELEMENT_ID, VALUE…)
• Each client “measures” input and registers the history with a “merge” operation in the EVENTS_HISTORY table (sketch below)

Performance:
• 100 “changes” per second
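A sketch of the original per-change pattern, with hypothetical column names (the real PVSS schema is richer):

  -- one MERGE per measured change, sent by each client
  MERGE INTO events_history h
  USING (SELECT :element_id AS element_id, :value AS value, :ts AS ts FROM dual) n
  ON (h.element_id = n.element_id AND h.ts = n.ts)
  WHEN MATCHED     THEN UPDATE SET h.value = n.value
  WHEN NOT MATCHED THEN INSERT (element_id, value, ts)
                        VALUES (n.element_id, n.value, n.ts);

  -- trigger keeping the "latest value" table up to date on every change
  CREATE OR REPLACE TRIGGER trg_update_eventlastval
  AFTER INSERT OR UPDATE ON events_history
  FOR EACH ROW
  BEGIN
    UPDATE eventlastval
       SET value = :NEW.value, ts = :NEW.ts
     WHERE element_id = :NEW.element_id;
  END;
  /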
PVSS Tuning (2/6)

Initial state observation:

• The database is waiting on the clients:
  “SQL*Net message from client”
• Use of a generic C++/DB library
• Individual inserts (one statement per entry)
• Update of a table which keeps the “latest state”, through a trigger
PVSS Tuning (3/6)

Changes:
• Bulk insert into a temporary table with OCCI, then call PL/SQL to load the data into the history table (sketch below)

Performance:
• 2000 changes per second
• Now top event: “db file sequential read”

awrrpt_1_5489_5490.html
Event                      Waits   Time(s)   % Total DB Time   Wait Class
db file sequential read   29,242       137             42.56   User I/O
enq: TX - contention          41       120             37.22   Other
CPU time                                61             18.88
log file parallel write    1,133        19              5.81   System I/O
db file parallel write     3,951        12              3.73   System I/O
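A sketch of the change, assuming a global temporary table events_temp and hypothetical column names; the OCCI clients array-insert a batch into the temporary table and then call the procedure:

  CREATE OR REPLACE PROCEDURE load_events_batch AS
  BEGIN
    -- one multi-row insert instead of one statement per change
    INSERT INTO events_history (element_id, value, ts)
    SELECT element_id, value, ts
    FROM   events_temp;
    COMMIT;   -- with ON COMMIT DELETE ROWS the temporary table is emptied here
  END;
  /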
PVSS Tuning (4/6)

Changes:
• Index usage analysis and reduction
• Table structure changes: IOT (sketch below)
• Replacement of the merge by an insert
• Use of “direct path load” with ETL

Performance:
• 16 000 “changes” per second
• Now top events: cluster-related waits

test5_rac_node1_8709_8710.html
Event                     Waits   Time(s)   Avg Wait(ms)   % Total Call Time   Wait Class
gc buffer busy           27,883       728             26                31.6   Cluster
CPU time                               369                              16.0
gc current block busy     6,818       255             37                11.1   Cluster
gc current grant busy    24,370       228              9                 9.9   Cluster
gc current block 2-way  118,454       198              2                 8.6   Cluster
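For illustration, a minimal index-organized table; the slides do not say which table was converted, so the table and column names below are hypothetical:

  -- an IOT stores the rows inside the primary key index itself,
  -- removing the separate table segment and the extra index-to-table lookup
  CREATE TABLE eventlastval_iot (
    element_id NUMBER    NOT NULL,
    value      NUMBER,
    ts         TIMESTAMP,
    CONSTRAINT pk_eventlastval_iot PRIMARY KEY (element_id)
  ) ORGANIZATION INDEX;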
PVSS Tuning (5/6)

Changes:
• Each “client” receives a unique number
• Partitioned table (sketch below)
• Use of “direct path load” into the partition, with ETL

Performance:
• 150 000 changes per second
• Now top event: “freezes” once in a while

rate75000_awrrpt_2_872_873.html
Event                             Waits   Time(s)   Avg Wait(ms)   % Total Call Time   Wait Class
row cache lock                      813       665            818                27.6   Concurrency
gc current multi block request    7,218       155             22                 6.4   Cluster
CPU time                                      123                                5.1
log file parallel write           1,542       109             71                 4.5   System I/O
undo segment extension          785,439        88              0                 3.6   Configuration
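A sketch of the per-client partitioning (list partitioning on a hypothetical client_id column; only two partitions shown):

  CREATE TABLE events_history_part (
    client_id  NUMBER    NOT NULL,
    element_id NUMBER    NOT NULL,
    value      NUMBER,
    ts         TIMESTAMP
  )
  PARTITION BY LIST (client_id) (
    PARTITION p_client_1 VALUES (1),
    PARTITION p_client_2 VALUES (2)
  );

  -- each client direct-path loads only its own partition,
  -- avoiding cross-instance contention on the same blocks
  INSERT /*+ APPEND */ INTO events_history_part PARTITION (p_client_1)
  SELECT 1, element_id, value, ts FROM events_temp;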
PVSS Tuning (6/6)
Problem investigation:
• Link between the foreground process and the ASM processes
• Difficult to interpret the ASH report and the 10046 trace

Problem identification:
• ASM space allocation is blocking some operations

Changes:
• Space pre-allocation by a background task (sketch below)

Result:
• Stable 150 000 “changes” per second
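A sketch of the pre-allocation idea, with hypothetical object names (the real fix was a background task doing this regularly):

  -- allocate space for a partition ahead of time, so that ASM/file extension
  -- never happens on the critical insert path
  ALTER TABLE events_history_part MODIFY PARTITION p_client_1
    ALLOCATE EXTENT (SIZE 100M);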

PVSS Tuning Schema

[Diagram — before: 150 clients – DB servers – storage; each client issues “update eventlastval set …” / “merge (…)” against table events_history, and a trigger updates table eventlastval.
After: the clients bulk-insert into a temporary table; PL/SQL then runs “insert /*+ APPEND */ into eventh (…) select … from temp” to load the target partition (PARTITION (1), …) of table events_history.]
PVSS Tuning Summary

Conclusion:
• From 100 changes per second to 150 000 “changes” per second
• 6-node RAC (dual CPU, 4 GB RAM per node), 32 SATA disks with an FCP link to the hosts
• 4 months of effort:
  – Re-writing part of the application, with interface changes (C++ code)
  – Changes to the database code (PL/SQL)
  – Schema changes
  – Numerous work sessions, joint work with other CERN IT groups
Scalability Conclusions

• ASM / cluster filesystem / NAS allow
  – a much easier deployment
  – far less complexity and risk of human error
• 10g RAC is much easier to tune
• RAC can boost your application performance, but it will also expose the weak design points and magnify their impact
• Proper application design is the key to almost linear scalability for a “non-read-only” application
Recommendations

• Understand the extra features
• Take advantage of RAC connection management
  – Use « services » with resource management
  – Load balancing: client side, server side, client + server side…
• Use connection failover mechanisms (sketch below):
  – « Transparent Application Failover »: re-connection; modifications not yet committed are lost
  – « Fast Connection Failover » (10g): notification; the client side has to be implemented
  – Minimum: re-connection
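A sketch of setting such attributes on a service with DBMS_SERVICE (10gR2); the service name is hypothetical, and TAF can equally be configured on the client side in tnsnames.ora:

  BEGIN
    DBMS_SERVICE.MODIFY_SERVICE(
      service_name        => 'pvss_writer',
      goal                => DBMS_SERVICE.GOAL_THROUGHPUT,        -- server-side load balancing advice
      clb_goal            => DBMS_SERVICE.CLB_GOAL_SHORT,
      aq_ha_notifications => TRUE,                                -- FAN events for Fast Connection Failover
      failover_method     => DBMS_SERVICE.FAILOVER_METHOD_BASIC,  -- Transparent Application Failover
      failover_type       => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
      failover_retries    => 180,
      failover_delay      => 5);
  END;
  /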
The Message to You

• RAC can scale, even for write-intensive applications
• As you know, RAC scalability tuning is challenging
• Create RAC-aware applications from the beginning
  – Application design is key

• The effort is paid back:
  – with better application performance
  – with a highly available system
  – with flexibility and a lower price for hardware
References

• “Pro Oracle Database 10g RAC on Linux”, Julian Dyke and Steve Shaw
• “Connection management in RAC”, James Morle
• “RAC awareness SQL”, pythian.com
• “ETL 10gR2 loading”, Tom Kyte
• CERN IT-DES group web site
Q&A
