Developer: PHP
By John Lim
Published June 2009 Running a software application that is able to work reliably through hardware and software failures is incredibly hard. Our client, a Malaysian bank, commissioned us to develop a Web-based financial application for them that would be used in an intranet setting and was considered mission-critical. This meant that they that wanted a system that was reliable and had automatic failover to another server whenever any component failed. In this article, I will cover the network, architecture and design of our RAC application. Then I will discuss the real-world experiences and problems we experienced. The experiences in general are also relevant even if you are using Java or .NET or your application and not PHP.
In our network, we have two load balancers in active-passive mode to distribute HTTP queries to our two application servers. The application servers run Squid 3, Apache 2.2, and PHP 5, and use Oracle's client-side load balancing and failover to connect to the Oracle database cluster. Squid 3 acts as a front-end web server on port 80. Squid caches and dispenses to web browsers static files such as JavaScript, CSS, and GIF images. When Squid detects that a .php file is being requested, it passes the request to Apache listening on a different port; this ensures that Apache can concentrate on executing PHP scripts without distraction.

The Oracle RAC is a cluster of independent database servers that share the same data and cooperate as a single system. Our RAC is a two-node cluster using a multipath connection to a Hitachi Storage Area Network (SAN). We use cluster-aware file systems on the SAN to ensure that both RAC nodes can write to the same files simultaneously without corrupting the data. There is also a private interconnect between the two RAC nodes, which is just a 1 Gbit private LAN for data synchronization purposes.
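The Squid-in-front-of-Apache arrangement can be sketched in a few lines of squid.conf. This is a minimal sketch, not our production configuration; the Apache port (8080), the site name, and the acl name are assumptions:

```
# Run Squid 3 as an HTTP accelerator on port 80, with Apache behind it on 8080
http_port 80 accel defaultsite=intranet.example.com
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=apache

# Never cache .php responses; always pass them through to Apache
acl php_scripts urlpath_regex \.php$
cache deny php_scripts
```

Static files (JavaScript, CSS, images) are then served from Squid's cache, while every .php request is forwarded to the origin server.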
Oracle ASM
Oracle ASM is a raw storage format highly optimized for database use. If you are familiar with tablespaces and data files inside an Oracle database, then you will see similarities in the way ASM is organized. In an Oracle database, we store tables in virtual tablespaces made up of multiple physical data files. Similarly, in Oracle ASM, we store these data files in virtual diskgroups, which are made up of multiple physical disks. Oracle ASM allows you to create mirrored groups of raw disks, or use external raw devices that already provide RAID. In our case, the SAN is already RAID 1, so we configured ASM to use external redundancy.

Oracle ASM is managed through a database instance. This is not the same database instance where your production data resides. This means you actually have two database instances running on each RAC node, with separate SIDs (Oracle database identifiers):

- An Oracle ASM instance to manage your raw devices, normally with an SID of +ASMx, where x is the server node number (starting from 1)
- A production database instance with an SID that you define yourself, e.g. DATAx, where x is the server node number (starting from 1)

To connect to the Oracle ASM instance from the command line on server 1, you need to set the ORACLE_SID environment variable to connect to the right instance before running sqlplus:
su - oracle
ORACLE_SID=+ASM1; export ORACLE_SID
sqlplus / as sysdba
You can then view and modify the configuration of the disks using the standard ASM views:
v$asm_diskgroup
v$asm_disk
v$asm_file
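For example, a quick health check of the diskgroups can be run from the ASM instance. A sketch; the columns used are from the standard v$asm_diskgroup view:

```sql
SELECT name, state, type, total_mb, free_mb
  FROM v$asm_diskgroup;
```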
Oracle Clusterware
Oracle Clusterware manages the startup, shutdown, and monitoring of database instances, listeners, and other services. When a service goes down, Clusterware automatically restarts the service for you. The cluster communicates through a private interconnect, in our case a 1 Gbit private LAN; this means you need at least two network interface cards on each database node: one for the bank network, and another for the private interconnect. If both the private and bank networks go down, the cluster can still communicate through the cluster filesystem. Special voting files were created in the OCFS2 file system to allow messages to be exchanged between the cluster nodes as a health check, and to arbitrate cluster ownership among the instances in case of network failures.
RAC is also responsible for synchronizing data buffered in memory across multiple instances. Oracle uses a technology called Cache Fusion that uses a high-speed interconnect to manage and synchronize the buffer caches. The speed of this synchronization is a good measure of the effectiveness of the RAC cluster. We will discuss this further when we talk about Oracle Enterprise Manager.
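One gotcha during the ASMLib setup: with multipath storage, ASMLib must be told to scan the multipath devices before the underlying /dev/sd* paths, or it may attach to the wrong device path. This is typically configured in /etc/sysconfig/oracleasm. A sketch, assuming Hitachi multipath device names beginning with 1HITACHI:

```
# /etc/sysconfig/oracleasm
# Prefixes are matched in order: multipath devices first, plain SCSI devices last
ORACLEASM_SCANORDER="1HITACHI sd"
```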
This tells ASM to load all 1HITACHI* devices before sd* devices. For more info, see this ASMLib multipath link.

Another gotcha we encountered was that we were running 64-bit Linux, so we only installed the required 64-bit RPMs. Mysterious errors occurred. It turns out that some 32-bit RPMs need to be installed too. Based on MetaLink document 339510.1, we installed both the 32-bit and 64-bit versions of the following RPMs:

- libaio-0.3.103-3
- glibc-devel-2.3.4-2.9

The remaining parts of the installation were relatively straightforward. It just required following the instructions carefully and not making any typing mistakes! One final word of advice: most of the difficulties were caused by problems interfacing with non-Oracle software and hardware. Once we overcame these issues, the setup was simply a matter of following the documentation. For troubleshooting, we found My Oracle Support (formerly MetaLink; support contract required) more helpful than Google, and would highly recommend it.
Client-side Setup
On the application servers, make sure you are running the latest release of your Oracle client libraries to ensure that Oracle Transparent Application Failover (TAF) is supported. In our case, we downloaded
the Oracle 11g Instant Client (yes, the Oracle 11g Instant Client works fine with our 10gR2 database) and the PHP OCI8 1.3 extension source code. We then compiled the extension by hand. For instructions, see this guide.
RAC1-VIP and RAC2-VIP are the host names (which must be defined in /etc/hosts on both the database and application servers) of the virtual IPs used by the RAC servers. These virtual IPs are configured during installation and are used for fast connection failover. When one server goes down, the other server takes up both virtual IPs on the public interface, ensuring there is no delay in failover.

TYPE=SELECT means that if Oracle is executing a SELECT statement when a server fails, on failover that SELECT statement is re-executed and the cursor is positioned so that the client can seamlessly continue fetching rows. Transactions that INSERT or UPDATE are still rolled back, however. You can also configure TYPE=SESSION, meaning that if the first connection fails, all work in progress at that point is irrevocably lost.

METHOD=BASIC means connections to the failover server are only made when a server fails. METHOD=PRECONNECT means connections are made to all servers even before any server connection fails; this avoids the overhead of reconnection, but means that additional unused resources are taken up by each pre-connect connection.
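The TAF parameters discussed above live in the connect descriptor in tnsnames.ora on the application servers. A minimal sketch, not our exact production entry; the service name RACSERVICE matches the load-balancing example later in the article, and the hosts are the virtual IPs described above:

```
RACSERVICE =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = RAC1-VIP)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = RAC2-VIP)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = RACSERVICE)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
      )
    )
  )
```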
Oracle client libraries and database must both be 10gR2 or later for this to work.
Load Balancing
You can also tune load balancing on the server side. Internally, Oracle has a load-monitoring module called the Load Balancing Advisory (LBA) that selects the node to run your connection on. In PHP, load balancing only occurs on connection (Connection Load Balancing in Oracle terminology) and is tuned for long-running connections such as PHP persistent connections. Alternatively, you can configure RAC to be tuned for short-running sessions (e.g. PHP non-persistent connections) using:
execute dbms_service.modify_service(
    service_name => 'RACSERVICE',
    clb_goal     => dbms_service.clb_goal_short);  -- or clb_goal_long
PHP Application Integration

Our PHP application contains 2,900 source code files and took over 20 man-years to develop. The financial software we built is based on our proprietary PhpLens Application Server technology and comes with a workflow engine, integrated graphical screen designer, and report writer. It uses the open source ADOdb library to connect to Oracle. Integrating our PHP application was very straightforward: we did not have to make any PHP code changes to integrate with Oracle RAC. In terms of error handling, we have a transaction manager built into ADOdb that keeps track of all errors, and will auto-rollback a transaction whenever an error is detected. Something we are considering for a future release of ADOdb is the ability to restart failed transactions caused by failover, as these are not automatically handled by Oracle. Instructions on how to do so are described in this Oracle white paper on PHP Scalability and High Availability.

We did have to make some changes on the Oracle side of our application to accommodate RAC behavior. In our application we use sequences extensively to generate surrogate primary keys. In Oracle RAC, a global lock is required to ensure that sequences are generated in order across the nodes, and this degrades database performance. So we recreated our surrogate primary key sequences using the following command:
CREATE SEQUENCE seqname START WITH nextvalue CACHE 100 NOORDER
This prevents Oracle from invoking the global lock on every invocation of a new sequence number. The disadvantage of this approach is that although the sequence numbers are guaranteed to be unique, they are no longer in ascending order and gaps might occur in the numbering.
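Recreating one such sequence might look like this. A sketch; the sequence name SEQ_INVOICE and the starting value are hypothetical:

```sql
-- Find the next value the old sequence would have produced
SELECT last_number FROM user_sequences
 WHERE sequence_name = 'SEQ_INVOICE';

-- Recreate it with a per-instance cache and no global ordering
DROP SEQUENCE seq_invoice;
CREATE SEQUENCE seq_invoice START WITH 100001 CACHE 100 NOORDER;
```

With CACHE 100 NOORDER, each RAC node pre-allocates its own range of 100 numbers, which is why values stay unique but are no longer strictly ascending across nodes.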
Both application servers were configured to run Apache in pre-fork mode with MaxClients set to 150. Note that if we had not been running Squid to manage the static files, the safety margin would have to be higher, as at least half of all Apache child processes would be involved in downloading static files. Each Apache child process runs one PHP persistent connection, so if only one RAC node was up, we would need 300 Dedicated Server connections configured in the database to handle both application servers. After including other processes, dblinks from other Oracle servers, and a safety factor, we configured the maximum number of Dedicated Server processes (i.e. sessions) to 500. From this we derived the following settings:

Oracle parameter  Value  Explanation                                              Memory Used
processes         500    Number of dedicated server processes; equivalent to      500 x ~4MB = 2GB
                         the number of connections you can make to the
                         database. For easy calculation, we estimated that
                         each process would take 4MB.
sga_target        4GB    Memory for the Shared Global Area, used for data         4GB
                         buffers and other shared data structures.
I have to admit that workload estimation prior to rollout is more an art than a science. On rollout, our estimates met reality: it turned out that the processes parameter was too low, so we increased it to 750 and reduced sga_target to 3GB.
Failover Testing
We tested failover of the Oracle RAC cluster by running SQL*Plus 11g (with its -F option) from the application server. We tested the following scenarios:

- Shutting down Oracle on one server instance using the command shutdown abort
- Rebooting a server

The failover is not instantaneous; the client needs time to detect that the server is down and perform the switch. In testing both scenarios, after a pause of a few seconds, SQL*Plus running on the application server detects the change and switches over to the other database instance. You can determine which instance you are running on using:
select instance_name from v$instance
Oracle Enterprise Manager

Oracle Enterprise Manager makes many administration tasks much easier. For example, originally I was maintaining the ASM setup using SQL*Plus; I was flabbergasted when I found how easy it was within Enterprise Manager. You can start Oracle Enterprise Manager by running:
emctl start dbconsole
Then you can access Oracle Enterprise Manager Database Control through the URL http://serveraddress:1158/em.

Oracle RAC performance statistics are another important feature of Oracle Enterprise Manager. A measure of how efficiently RAC is performing is global cache block access latency. This metric measures how long it takes Oracle RAC to synchronize data blocks between the nodes. If the measure is consistently over 5 ms, you will need to look into ways of reducing the latency, such as database sharding (partitioning the cluster so that each node manages a subset of the data). You can view this statistic from the Performance tab.
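The same cache-synchronization statistics can also be queried directly from SQL*Plus. A sketch using the gv$sysstat view; the statistic names are from the standard Oracle global cache statistics set:

```sql
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc cr blocks received',
                'gc cr block receive time',
                'gc current blocks received',
                'gc current block receive time');
```

Dividing the receive times by the block counts gives the average latency per block, which is the figure Enterprise Manager plots.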
We then run a PHP script that reads the contents of the asmlogfile.txt and writes it to the production database instance.
Managing Jobs
Management of batch jobs in a high-availability environment is challenging, because you need infrastructure to fail over your batch jobs. Oracle provides tools to manage this with the DBMS_SCHEDULER package, which allows you to associate a job with a database service that supports failover. First you need to define a service that uses failover in tnsnames.ora. In the following example, FAILOVER is defined, but not LOAD_BALANCE: this means all jobs will run on RAC2-VIP by default, unless that server is down:
FAILOVER_SERVICE =
  (DESCRIPTION =
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = RAC2-VIP)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = RAC1-VIP)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = BATCH_SERVICE)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 1)
      )
    )
  )
Then define a job class for all jobs that uses this service:
DBMS_SCHEDULER.create_job_class(
    job_class_name => 'MY_JOB_CLASS',
    service        => 'FAILOVER_SERVICE');
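Jobs created in this class then inherit the failover behavior of the service. A sketch; the job name, schedule, and stored procedure are hypothetical:

```sql
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'NIGHTLY_INTEREST_RUN',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'BATCH_PKG.CALC_INTEREST',
    repeat_interval => 'FREQ=DAILY; BYHOUR=1',
    job_class       => 'MY_JOB_CLASS',
    enabled         => TRUE);
END;
/
```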
Conclusion
Once we got past the learning hurdle, our Oracle RAC experience has been very positive. Uptime is high and performance is good. At peak load, there are 650 users logged in, running 70 transactions per second. For our customer, what is most important to them is peace of mind that the data is managed reliably, safely, and with no interruption to business. Oracle RAC gives them that assurance.

John Lim is based in Malaysia. He is the developer of ADOdb, a popular database library for PHP. John has also been eating, drinking, and sleeping professionally with Oracle since 1995.