
MySQL: A Guide to High Availability

A MySQL Strategy Whitepaper

Copyright 2016. Oracle and/or its affiliates. All rights reserved.


Table of Contents

1. Executive Summary
2. Understanding the Causes and Effects of Downtime
2.1. Calculating the Cost & Impact of Downtime
3. Determining High Availability Requirements
Establishing Service Level Agreements (SLAs)
3.1. Mapping Application Needs to HA Architectures
4. Database Replication
4.1. Replication Modes and Data Consistency
4.2. Implementing Master/Slave Replication in MySQL
4.3. MySQL 5.7 Replication Enhancements
4.4. Monitoring MySQL Replication
5. Shared-Nothing, Failover Clusters
5.1. MySQL (NDB) Cluster
5.2. MySQL Group Replication
6. Comparing MySQL HA Solutions
7. General and Third-Party HA Technologies
8. MySQL Application Client Failover
8.1. MySQL Connectors
8.2. MySQL Router
9. Operational Best Practices
10. Conclusion
11. Additional Resources



1. Executive Summary
Data is the currency of today's web, mobile, social, enterprise and cloud applications. Ensuring data is always
available is a top priority for any organization: minutes of downtime can result in a significant loss of revenue and
reputation.

There is no one-size-fits-all approach to delivering High Availability (HA). Unique application attributes, business
requirements, operational capabilities and legacy infrastructure can all influence HA technology selection. Moreover,
technology is only one element in delivering HA: people and processes are just as critical as the technology
itself.

This Guide is designed to assist Developers, Architects, and DBAs in navigating the complex waters of HA. It
presents:

- A methodology for selecting the right HA solution to meet Service Level Agreements
- A tour of the leading certified HA solutions for MySQL
- Operational best practices to implement and support HA

As the world's leading open source database, MySQL offers many HA options, scaling all the way up to
99.999% uptime. This guide discusses your options and shows you how to get the best levels of
availability for your application.

2. Understanding the Causes and Effects of Downtime


In developing a strategy to make services highly available, it is important to understand the different causes of
downtime and the impact they can have on your organization. As shown below, downtime can generally be attributed
to one of four events:

- System Failures: server faults, software bugs or crashes, networking errors
- Physical Disasters: events causing failures of an entire data center, including fire, flood, hurricanes
- Scheduled Maintenance: hardware and software upgrades, patches, hot-fixes
- Operator or User Errors: accidental or malicious activities such as file deletion, malware, and poor
operational procedures

For obvious reasons, organizations are reluctant to share details on outages they experience, but anecdotal
evidence suggests the following:

- 50% of all outages are the result of failure and/or disaster events
- 30% of all outages are the result of scheduled maintenance operations
- 20% of all outages are the result of operator or user errors

2.1. Calculating the Cost & Impact of Downtime

The starting point in identifying the appropriate HA strategy for an application is usually the calculation of revenue
losses arising from downtime over a given time period, for example because:

- Orders can't be placed
- Financial trades can't be completed
- Subscribers can't be billed, etc.

However, direct revenue loss is only one aspect of the true impact of downtime. To gain a complete perspective, it
is also necessary to factor in other, often less quantifiable costs that collectively can dwarf any immediate direct
revenue loss, including:

- Damage to brand image
- Impact to customer relationships, satisfaction and loyalty
- Loss in employee productivity
- Potential regulatory issues if an essential service is unavailable or important data (e.g. customer records,
financial transactions) is corrupted
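To make the direct-revenue part of this calculation concrete, the Python sketch below multiplies an assumed hourly revenue by the outage duration and optionally adds a flat allowance for the harder-to-quantify costs listed above. The function name, the `indirect_multiplier` parameter and the figures in the example are hypothetical; in practice indirect costs are rarely a simple multiple of the direct loss.

```python
def downtime_cost(revenue_per_hour: float, outage_minutes: float,
                  indirect_multiplier: float = 0.0) -> float:
    """Estimate the cost of one outage.

    Direct loss is the revenue that could not be earned while the service was down.
    indirect_multiplier is a hypothetical stand-in for indirect costs (brand damage,
    lost productivity, regulatory exposure), expressed as a fraction of direct loss.
    """
    if revenue_per_hour < 0 or outage_minutes < 0 or indirect_multiplier < 0:
        raise ValueError("inputs must be non-negative")
    direct = revenue_per_hour * outage_minutes / 60.0
    return direct * (1.0 + indirect_multiplier)


# Hypothetical example: a store earning 12,000 per hour suffers a 30-minute outage.
print(downtime_cost(12_000, 30))        # direct loss only: 6000.0
print(downtime_cost(12_000, 30, 0.5))   # with a 50% indirect allowance: 9000.0
```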

The impact of downtime varies by application, and is dependent on factors such as the affected number of internal
and external users, the value and volume of transactions, regulatory and/or competitive pressures, etc. As an
example, internal procurement systems will not incur the same cost of downtime as a web-based content
management system, which in turn is nowhere near as damaging as the outage of an eCommerce engine or
telecoms service.

To help refine cost analysis and guide technology selection, it is important to understand the amount of time an IT
service can be unavailable before the organization suffers a material loss. This is a factor often used in Business
Continuity Planning, and is referred to as the Recovery Time Objective (RTO), which is discussed in the next
section of the Guide.

The length of acceptable downtime will again vary across business processes. For example, a high-volume
eCommerce web site, where users expect rapid response times and customer switching costs are very low, is
likely to have a very low tolerance for downtime of any length. However, systems that support back-end
operations such as shipping and billing can tolerate longer downtime without materially affecting the
business.

Understanding both the direct and indirect costs of downtime for each application or service is essential because
this will determine:

a. The SLAs required by the business
b. The HA technology chosen to meet those SLAs
c. The operational processes to implement, monitor and manage the HA technology

3. Determining High Availability Requirements


Implementing HA for applications can be complex and costly. It is therefore critical that an organization perform a
thorough analysis of its business requirements.

It is important to set expectations with business users. While avoiding any form of downtime is always highly
desirable, it is largely impractical. Higher levels of availability are typically achieved by deploying systems with
increasing levels of redundancy and fault-tolerance. However, greater redundancy will also increase the total cost
and complexity of the system due to requirements for more hardware and software, as well as demanding a larger
investment in IT staff, processes, and services. Change management becomes more rigidly defined, which can
impact business agility.

To guide analysis and technology selection, RTO and RPO are two important considerations.

Recovery Time Objective (RTO):

The availability level of an HA solution only defines application uptime over a specified period, e.g. 99.99% per year
or 99.9% per week. To choose the right architecture for high availability it is also important to define the maximum
acceptable downtime per incident in order to avoid a break in business continuity. This measure is defined as the
Recovery Time Objective.
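The distinction between an availability percentage and a per-incident RTO can be checked with simple arithmetic. The following Python sketch, assuming a 365-day year and using hypothetical helper names, converts an availability level into a downtime budget for a period and then tests a list of incident durations against both that budget and an RTO.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; leap years ignored (assumption)
MINUTES_PER_WEEK = 7 * 24 * 60     # 10,080


def downtime_budget_minutes(availability_pct: float, period_minutes: int) -> float:
    """Maximum total downtime allowed over a period at a given availability level."""
    return period_minutes * (1 - availability_pct / 100.0)


def meets_sla(incident_minutes, availability_pct, period_minutes, rto_minutes):
    """True only if every incident is within the RTO AND the total fits the budget."""
    budget = downtime_budget_minutes(availability_pct, period_minutes)
    return (all(m <= rto_minutes for m in incident_minutes)
            and sum(incident_minutes) <= budget)


for pct in (99.9, 99.99, 99.999):
    print(pct, round(downtime_budget_minutes(pct, MINUTES_PER_YEAR), 2), "minutes/year")
```

For example, two incidents of 3 and 4 minutes each satisfy a 5-minute RTO, but together exceed the roughly 5.26 minutes per year allowed at 99.999%, so `meets_sla([3, 4], 99.999, MINUTES_PER_YEAR, rto_minutes=5)` returns False.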

Recovery Point Objective (RPO):

The Recovery Point Objective is the point in time to which data must be recovered when a service is re-established.
RPO is typically determined by considering the type of application. For example, financial transactions will demand
a different RPO from clickstream data. The RPO allows an organization to define the maximum window of time prior
to a disaster during which data loss is acceptable. The RPO can be anything from microseconds to days.
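One way to relate RPO to a replicated MySQL deployment is to treat the replication lag observed on a slave as an approximation of how much recently committed data a failover to that slave could lose. The sketch below makes that assumption explicit; `LagSample` and the sample values are hypothetical, and the real exposure also depends on exactly when the master fails and which changes had already reached the slave.

```python
from dataclasses import dataclass


@dataclass
class LagSample:
    timestamp_s: float  # when the sample was taken, in seconds
    lag_s: float        # hypothetical measured replication delay, in seconds


def worst_case_loss_window(samples):
    """Largest observed lag: an approximation (assumption) of the data-loss window."""
    return max(s.lag_s for s in samples)


def rpo_violations(samples, rpo_s):
    """Timestamps at which the approximate data-loss window exceeded the RPO."""
    return [s.timestamp_s for s in samples if s.lag_s > rpo_s]


samples = [LagSample(0, 0.2), LagSample(60, 4.5), LagSample(120, 1.0)]
print(worst_case_loss_window(samples), rpo_violations(samples, rpo_s=2.0))
```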

Analysis of the business requirements for application availability, including RTO and RPO, coupled with an
understanding of the associated costs, enables an optimal solution to be developed that is balanced to meet the
needs of the organization, within its financial and resource constraints.

Establishing Service Level Agreements (SLAs)


Using the cost and impact analysis described above, the organization can start to define its applications' SLAs.
Many organizations categorize their applications into four tiers, and then select the HA solution that best achieves
the defined SLAs:

- Tier 1 applications are mission-critical and incur maximum disruption if they are unavailable. They have the
most stringent HA requirements, with systems needing to be available on a continuous or near-continuous
basis. Good examples include emergency service systems, utilities (including telecommunications),
eCommerce, and market trading platforms.
- Tier 2 applications are typically business-critical, but do not need to maintain the 99.999% availability
demanded by Tier 1, mission-critical applications. Examples include web content management systems,
user authentication, session management, Customer Relationship Management (CRM) systems, corporate-wide
email, etc.
- Tier 3 business applications still require some type of HA mechanism to reduce downtime, but are not as
critical as the applications above. They would typically serve specific web functions such as feeds, blogs or
wikis, or internal Line of Business processes such as Procurement, Human Resources, Data Marts, etc.
- Tier 4 applications may be related to internal development or small departmental deployments. Systems
supporting these processes usually do not have the HA requirements of the higher tiers.

Once the uptime requirements of the application have been agreed, the next step is to evaluate the capabilities of
various HA architectures and select those that best meet the SLA requirements of the business.

3.1. Mapping Application Needs to HA Architectures


There are multiple architectures that can be used to achieve highly available database services, each differentiated
by the levels of uptime they offer. These architectures can be grouped into three main categories:
- Database Replication
- Tightly Coupled Clusters & Virtualized Systems
- Shared-Nothing, Geographically-Replicated Clusters

As illustrated below, each of these architectures offers progressively greater levels of uptime, but this needs to be
balanced against the potentially higher cost and complexity each incurs. Simply deploying a high availability
architecture is not a guarantee of actually delivering HA. In fact, a poorly implemented and managed
shared-nothing cluster could easily deliver lower levels of availability than a simple data replication solution.

By understanding the availability requirements of each application, it is possible to map the database deployment
model to the appropriate HA architecture. As the world's most popular open source database, MySQL can be made
highly available through many different approaches. The following sections of the Guide discuss the HA
architectures certified and supported by Oracle.

4. Database Replication
Replication is the most common approach to delivering high availability for MySQL. Replication is a native feature of
MySQL, available out of the box without any complex add-ons or options.

Replication enables MySQL to copy changes from one instance to others. This is used to increase the availability
and scalability of a database, enabling MySQL to scale out beyond the read capacity constraints of a single system.

When deployed for HA, database updates are replicated so that the service can fail over to another instance in the
event of an outage, whether due to a failure or a maintenance event. MySQL is able to replicate both within and
across multiple geographically dispersed data centers, which also enables disaster recovery.

4.1. Replication Modes and Data Consistency


There are three modes of replication: asynchronous master/slave, semi-synchronous master/slave, and group
replication. All three types can be mixed and matched to handle varied and complex replication topologies.

Asynchronous Master/Slave Replication

By default, MySQL replication is asynchronous. Updates are committed to the database on the master and then
relayed to the slave where they are also applied. The master does not wait for the slave to receive the update, and
so is able to continue processing further write operations without being blocked, waiting for acknowledgement from
the slave(s).

When using asynchronous replication, there are no guarantees that all updates have been replicated to the slave in
the event of an outage of the master.

Any delay (lag) in propagating committed updates to the slaves is most noticeable with highly transactional
applications that perform an abundance of write operations.

With the correct components and tuning, replication can appear to be almost instantaneous to the application.

Using asynchronous replication, slaves also need not be connected permanently to receive updates from the
master. This means that updates can occur over long-distance connections and even over temporary or intermittent
connections. Depending on the configuration, you can replicate all databases, selected databases, or even selected
tables within a database.
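The following toy Python model illustrates why asynchronous replication offers no guarantee when the master fails: the master acknowledges each commit as soon as it is in its own log, and the slave fetches changes whenever it next connects. It is a simplified in-memory sketch with hypothetical class names, not a description of how MySQL's replication threads are implemented.

```python
class AsyncMaster:
    """Toy model: commits are acknowledged without waiting for any slave."""

    def __init__(self):
        self.binlog = []  # committed transactions, in commit order

    def commit(self, txn):
        self.binlog.append(txn)  # recorded on the master...
        return "OK"              # ...and acknowledged before any slave has it


class AsyncSlave:
    """Toy model of a slave that pulls changes whenever it is connected."""

    def __init__(self, master):
        self.master = master
        self.applied = []

    def pull(self):
        """Fetch and apply everything not yet seen; returns how many were applied."""
        new = self.master.binlog[len(self.applied):]
        self.applied.extend(new)
        return len(new)


master = AsyncMaster()
slave = AsyncSlave(master)
master.commit("t1")
slave.pull()          # the slave catches up to t1
master.commit("t2")   # acknowledged to the client immediately
# If the master failed now and the slave were promoted, t2 would be lost:
lost = [t for t in master.binlog if t not in slave.applied]
print(lost)
```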

Semi-Synchronous Master/Slave Replication

Semi-synchronous replication can be used as an alternative to MySQL's default asynchronous replication, serving
to enhance data integrity and consistency.

Using semi-synchronous replication, a commit is returned to the client only when a slave has received the update,
or when a timeout occurs. Therefore, unless the timeout is reached, the data is assured to exist on the master and at
least one slave. Note that when the commit is returned to the client, the slave will have received the update and
stored it in its local log, but has not necessarily applied it.
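The semi-synchronous commit rule above can be summarized as a small decision function. This Python sketch assumes the acknowledgement latency of each slave is known in advance; its `timeout_ms` and `wait_for_slave_count` parameters are modeled on the MySQL 5.7 server variables `rpl_semi_sync_master_timeout` (10000 ms by default) and `rpl_semi_sync_master_wait_for_slave_count`, but the function itself is hypothetical and ignores details such as how the master later switches back from asynchronous mode.

```python
def semisync_commit(ack_latencies_ms, timeout_ms=10_000, wait_for_slave_count=1):
    """Decide how a semi-synchronous commit completes (simplified model).

    ack_latencies_ms: for each connected slave, the time it takes to receive the
    transaction and write it to its local log, or None if it never acknowledges.
    Returns (mode, wait_ms): "semisync" if enough slaves acknowledged in time,
    otherwise "async" after waiting for the full timeout.
    """
    if wait_for_slave_count < 1:
        raise ValueError("wait_for_slave_count must be at least 1")
    acks = sorted(t for t in ack_latencies_ms if t is not None and t <= timeout_ms)
    if len(acks) >= wait_for_slave_count:
        # The commit returns once the N-th fastest acknowledgement arrives.
        return "semisync", acks[wait_for_slave_count - 1]
    # Too few acknowledgements in time: the commit returns after the timeout,
    # without the guarantee that any slave holds the transaction.
    return "async", timeout_ms


print(semisync_commit([5, 40]))
print(semisync_commit([None, 20_000]))
```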

Group Replication: Native MySQL High Availability

Group Replication (see section 5.2, MySQL Group Replication) delivers active/active, write-anywhere replication
clusters, with support for automatic conflict detection and resolution.

Group Replication takes care of membership, consistency, and other management-related functions without the
need for manual intervention or custom tooling. Easy High Availability for MySQL has arrived!

4.2. Implementing Master/Slave Replication in MySQL

MySQL master/slave replication is implemented by configuring one instance as a master, with one or more
additional instances configured as slaves. The master logs changes to the database, which are then sent to the
slave(s) and applied either immediately or after a set time interval (time-delayed replication, a feature added in
MySQL 5.6).
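As a sketch of the slave-side configuration step, the Python function below assembles a `CHANGE MASTER TO` statement, optionally using GTID auto-positioning and the `MASTER_DELAY` option for time-delayed replication. The host name and credentials are placeholders, the function name is hypothetical, and preparing the master itself (binary logging, a unique server ID, a replication user) is not shown. The statement would be run on the slave, followed by `START SLAVE`.

```python
def change_master_sql(host, user, password, port=3306, delay_seconds=0, use_gtid=True):
    """Build a CHANGE MASTER TO statement for a slave (hypothetical helper)."""

    def q(value):
        # Quote a string literal by doubling any embedded single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    options = [
        f"MASTER_HOST={q(host)}",
        f"MASTER_PORT={int(port)}",
        f"MASTER_USER={q(user)}",
        f"MASTER_PASSWORD={q(password)}",
    ]
    if use_gtid:
        options.append("MASTER_AUTO_POSITION=1")  # requires GTID-based replication
    if delay_seconds:
        options.append(f"MASTER_DELAY={int(delay_seconds)}")  # time-delayed replication
    return "CHANGE MASTER TO " + ", ".join(options)


# Placeholder host and credentials; a one-hour delay for a "safety net" slave.
print(change_master_sql("db-master.example.com", "repl", "secret", delay_seconds=3600))
```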

Beyond HA, MySQL replication is often employed to scale out the database across a farm of servers, as illustrated
below. All write operations (and any reads that need to include the most recent changes) are directed to the
master, or to any node participating in Group Replication, while other SELECT statements are directed to the
slave(s). Query routing is implemented either via the appropriate MySQL connector (e.g. the Connector/J JDBC,
Connector/NET .NET, or PHP drivers) or via MySQL Router (see section 8).
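The routing policy described above can be sketched in a few lines of Python: writes, locking reads and reads that must see the latest data go to the master, while other SELECT statements rotate across the slaves. The class and server names are hypothetical, and classifying statements with regular expressions is a deliberate simplification of what a connector or MySQL Router actually does.

```python
import itertools
import re

# Statements that only read and may tolerate slightly stale data (simplified).
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW)\b", re.IGNORECASE)
# Locking reads must run on the master even though they start with SELECT.
_LOCKING = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


class ReadWriteSplitter:
    """Route writes and fresh reads to the master; other reads round-robin to slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql, needs_latest=False):
        if (needs_latest or self._slaves is None
                or not _READ_ONLY.match(sql) or _LOCKING.search(sql)):
            return self.master
        return next(self._slaves)


router = ReadWriteSplitter("master", ["slave1", "slave2"])
print(router.route("UPDATE orders SET status = 'shipped' WHERE id = 7"))
print(router.route("SELECT * FROM orders"))
```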

MySQL replication can be deployed in a range of topologies to support diverse scaling and HA requirements. It
represents a mature and well proven approach to scaling workloads while providing a foundation for HA.

4.3. MySQL 5.7 Replication Enhancements

Multi-Source Replication

MySQL Multi-Source Replication enables a replication slave to receive transactions from multiple sources
simultaneously. Multi-source replication can be used to:

- Consolidate data from multiple servers to a single server.
- Back up multiple servers to a single server.
- Merge table shards.
- Combine master/slave replication with Group Replication for disaster recovery and multi-datacenter
solutions.
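With multi-source replication, each source is configured as a named replication channel on the slave. The Python sketch below generates the per-channel statements using the MySQL 5.7 `FOR CHANNEL` clause; the channel and host names are hypothetical, credentials are omitted, and names are assumed to be trusted values that need no escaping. MySQL 5.7 also requires the slave's replication repositories to be stored in tables (`master_info_repository=TABLE`, `relay_log_info_repository=TABLE`) before multiple channels can be configured.

```python
def multi_source_sql(sources):
    """Build per-channel statements for a multi-source slave (hypothetical helper).

    sources: mapping of channel name -> master host. In practice each
    CHANGE MASTER TO would also carry MASTER_USER / MASTER_PASSWORD.
    """
    statements = [
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_AUTO_POSITION=1 "
        f"FOR CHANNEL '{channel}'"
        for channel, host in sources.items()
    ]
    # Start each channel independently once it has been configured.
    statements += [f"START SLAVE FOR CHANNEL '{channel}'" for channel in sources]
    return statements


for stmt in multi_source_sql({"shard1": "db1.example.com", "shard2": "db2.example.com"}):
    print(stmt)
```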

The monitoring interfaces now also provide details for each replication channel, and we have introduced new
performance_schema tables to easily monitor your entire replication topology.

Transaction Based Parallel Replication

MySQL 5.7 adds intra-schema multi-threaded slaves. With this implementation (slave-parallel-type=LOGICAL_CLOCK),
the slave is able to apply transactions in parallel, even within a single database or schema, as long as they have
disjoint read and write sets. This allows the slave to keep up with the master, addressing the most common cause
of slave lag.

Figure: Slave throughput vs. a 96-thread master, for 1, 8, 24 and 48 slave threads (throughput axis from 0% to 250%).

8-10x higher performance with multi-threaded slaves.
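The condition described above (transactions with disjoint read and write sets may be applied concurrently) can be illustrated with the following conceptual Python sketch, which splits an ordered stream of transactions into batches of mutually non-conflicting work. It is not MySQL's implementation: with `LOGICAL_CLOCK`, the slave derives parallelism from dependency information that the master records in the binary log. The dictionary-of-sets transaction format is a hypothetical representation.

```python
def conflicts(a, b):
    """Two transactions conflict if either writes an item the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"])
                or b["writes"] & (a["reads"] | a["writes"]))


def parallel_batches(transactions):
    """Split an ordered transaction stream into batches that could run concurrently.

    A new batch starts whenever the next transaction conflicts with one already in
    the current batch, so dependent transactions keep their original order.
    """
    batches, current = [], []
    for txn in transactions:
        if any(conflicts(txn, other) for other in current):
            batches.append(current)
            current = []
        current.append(txn)
    if current:
        batches.append(current)
    return batches
```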

Online Replication Changes



You can now enable Global Transaction ID (GTID) based replication as an online operation, allowing you to take
advantage of the next generation replication features without incurring downtime in your MySQL production
environments.
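The online switch to GTIDs proceeds through intermediate `GTID_MODE` values, one step at a time. The Python sketch below produces the sequence of statements for the procedure described in the MySQL 5.7 reference manual (roughly: enforce GTID consistency, step `GTID_MODE` from OFF through OFF_PERMISSIVE and ON_PERMISSIVE, wait until no anonymous transactions remain, then set ON). The helper names are hypothetical, each statement must be applied on every server in the topology, and the manual should be consulted for the exact checks before running anything in production.

```python
GTID_MODES = ["OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"]


def gtid_mode_steps(current="OFF", target="ON"):
    """SET statements that move GTID_MODE one step at a time toward the target."""
    i, j = GTID_MODES.index(current), GTID_MODES.index(target)
    step = 1 if j >= i else -1
    return [f"SET @@GLOBAL.GTID_MODE = {GTID_MODES[k]};"
            for k in range(i + step, j + step, step)]


def enable_gtid_online_plan():
    """Ordered plan for enabling GTIDs online (hypothetical helper)."""
    plan = [
        "SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN;",  # then check the error log
        "SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;",
    ]
    mode_steps = gtid_mode_steps("OFF", "ON")
    plan += mode_steps[:-1]
    # Before the final step, SHOW STATUS LIKE 'ONGOING_ANONYMOUS_TRANSACTION_COUNT'
    # should report 0 on every server.
    plan.append("-- wait: ONGOING_ANONYMOUS_TRANSACTION_COUNT = 0 on all servers")
    plan.append(mode_steps[-1])
    return plan


print("\n".join(enable_gtid_online_plan()))
```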

You can now also change replication filters online (CHANGE REPLICATION FILTER), providing a variety of ways to
configure data replication within your MySQL farm. Finally, you can perform master failover operations (CHANGE
MASTER) without stopping replication execution on slaves.

Semi-Sync Replication Enhancements



The Semi-Sync replication plugin has improved semantics that provide better performance and reliability.

Enhanced Monitoring

In addition to the legacy SHOW commands, we have added a variety of new Performance Schema tables that offer
unprecedented insights into what is happening and how things are performing. This allows you to easily ensure that
your replication topology is healthy and performing well, while also providing the information needed to debug any
issues that may occur.

4.4. Monitoring MySQL Replication


In order to achieve high availability it is crucial to monitor systems and receive automatic notifications of issues or
potential problems before they impact performance or availability of the application. Therefore, comprehensive
management and monitoring tools should be regarded as mandatory in any HA installation.

Many MySQL customers use the MySQL Enterprise Monitor (discussed in more depth in Section 9) with its GUI
dashboard to manage their replication topologies. MySQL Enterprise Monitor makes it easier to scale-out and
achieve high availability using the MySQL Replication Monitor, providing auto-detection, grouping, documenting,
and monitoring of your entire replication topology. Changes and additions to existing replication topologies are also
auto-detected and displayed, providing DBAs with instant visibility into newly implemented updates.

When the Replication Advisor identifies a problem and sends out an alert, the DBA can use the alert content along with
the Replication Monitor to drill into the status of the affected instance or group. Using the Replication Monitor and
the expert advice from the Replication Advisor, they can review the current status and metrics relevant to diagnosing
and correcting any problems, such as Slave I/O and Slave applier thread status, seconds behind master, GTID
metadata, and the last seen error.

The Replication Monitor is designed to save DevOps teams the time spent writing and maintaining scripts that
collect, consolidate, and monitor similar MySQL Replication status and diagnostic data.
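For comparison with the scripted approach the Replication Monitor is meant to replace, here is a minimal Python sketch of such a check. It takes one row of `SHOW SLAVE STATUS` output as a dictionary (fetching the row through a MySQL driver is omitted) and reports stopped threads, excessive or unknown lag, and recorded errors. The function name and the 30-second threshold are hypothetical.

```python
def replication_alerts(status, max_lag_s=30):
    """Alert messages for one slave, given a SHOW SLAVE STATUS row as a dict."""
    alerts = []
    if status.get("Slave_IO_Running") != "Yes":
        alerts.append(f"I/O thread not running: {status.get('Slave_IO_Running')}")
    if status.get("Slave_SQL_Running") != "Yes":
        alerts.append(f"SQL (applier) thread not running: {status.get('Slave_SQL_Running')}")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        alerts.append("replication lag unknown (Seconds_Behind_Master is NULL)")
    elif lag > max_lag_s:
        alerts.append(f"slave is {lag}s behind master (threshold {max_lag_s}s)")
    for field in ("Last_IO_Error", "Last_SQL_Error"):
        if status.get(field):
            alerts.append(f"{field}: {status[field]}")
    return alerts
```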

5. Shared-Nothing, Failover Clusters
The methods of achieving high availability discussed above satisfy the uptime requirements of many applications,
but there are classes of services that are highly transactional and update-intensive, demanding near-continuous
availability. Examples include eCommerce and financial transactions, billing, user access and authentication,
network infrastructure applications, and telecommunications services. Other services such as social gaming and
online marketing campaign tracking also have to deal with increasingly demanding SLAs dictating 99.999% uptime.

MySQL (NDB) Cluster delivers in-memory real-time performance and auto-sharding (partitioning) for high write
performance and predictable low latency. MySQL Cluster is proven in environments demanding the highest levels
of availability, continuing to deliver service in the event of failures, disasters, and planned maintenance operations.

MySQL Group Replication leverages proven and familiar MySQL features (InnoDB, GTIDs, binary logs,
multi-threaded slave execution, multi-source replication, and Performance Schema) to provide native high
availability for standard MySQL databases, with built-in group membership management, data consistency
guarantees, conflict detection and handling, node failure detection, and database failover operations, all without
the need for manual intervention or custom tooling.

5.1. MySQL (NDB) Cluster


MySQL (NDB) Cluster is an open source, ACID-compliant transactional database designed to deliver real-time in-
memory performance and 99.999% availability. It powers the subscriber databases of many major communications
service providers and is used in global fraud detection for financial transactions. Designed around a distributed,
multi-master architecture with no single point of failure, MySQL Cluster scales horizontally on commodity hardware
with auto-sharding to serve read and write intensive workloads, while being accessible via SQL and NoSQL
interfaces.

MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to service millions
of operations per second. Support for in-memory and disk-based data, automatic data partitioning (sharding) with
load balancing, and the ability to add nodes to a running cluster with zero downtime allow linear database
scalability to handle even the most demanding and unpredictable of workloads.
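Automatic sharding works because a row's location can be computed from its key, so any node can route a request without a separate lookup service. The Python sketch below illustrates that idea with an MD5-based hash and a round-robin assignment of partitions to node groups; both choices are assumptions made for illustration and do not reproduce NDB's exact partitioning function.

```python
import hashlib


def partition_for(primary_key: str, num_partitions: int) -> int:
    """Map a primary key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def node_group_for(partition: int, num_node_groups: int) -> int:
    """Assume partitions are spread round-robin across node groups."""
    return partition % num_node_groups


p = partition_for("user:42", 8)
print("partition", p, "-> node group", node_group_for(p, 2))
```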

MySQL Cluster comprises three node types, which collectively provide high availability to the application. Because
the cluster presents a single namespace, the different nodes are transparent to the application, which can connect
to any node, and queries are routed automatically:

- Data nodes manage the storage of and access to data. Tables are automatically sharded across the data
nodes, which also transparently handle load balancing, synchronous replication, failover, and self-healing.
There is no need for any type of additional heartbeating or resource management middleware; all of these
functions are integrated directly into MySQL Cluster.
- Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs are
presented to the application. MySQL Server provides a standard SQL interface, including connectivity to all of
the leading web development languages and frameworks. There is also a range of NoSQL interfaces,
including JavaScript in Node.js, Memcached, REST/HTTP, C++ (NDB-API), Java, and JPA.
- Management nodes are used to configure the cluster and provide arbitration in the event of a network
partition, to avoid a split brain that could otherwise lead to data inconsistency.

Resilience to Failures with Self-Healing Recovery

The distributed, shared-nothing architecture of MySQL Cluster has been carefully designed to ensure resilience to
failures, with automated, self-healing recovery:

- The data within a data node is synchronously replicated to a neighboring node. If a data node fails, then
there is always at least one other data node storing the same information.
- In the event of a data node failure, the MySQL Server or application node will automatically use any other
data node in the node group to execute transactions. The application simply retries the transaction, and the
remaining data nodes will successfully satisfy the request.
- MySQL Cluster detects any failures instantly, and control is automatically failed over to other active nodes in
the cluster without interrupting service to the clients.
- In the event of a failure, the data nodes are able to self-heal by automatically restarting, recovering, and
re-synchronizing themselves with the rest of the cluster, all of which is completely transparent to the
application.
- Duplicate management server nodes can be deployed so that no management or arbitration functions are
lost if a single management server fails.
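Because the application is expected to retry a transaction that was aborted by a data node failure, client code typically wraps each transaction in a retry loop. This is a minimal Python sketch; `TemporaryError` is a hypothetical exception type, and how a particular driver's errors map onto it is an assumption left to the application. Retrying is only safe if the whole transaction is re-executed from the start.

```python
import time


class TemporaryError(Exception):
    """Hypothetical exception for errors the cluster reports as temporary,
    such as a transaction aborted because a data node failed."""


def run_with_retry(txn, attempts=3, backoff_s=0.05):
    """Run a transaction callable, retrying it after temporary errors."""
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except TemporaryError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_s * attempt)  # brief, growing pause before retrying
```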

Designing the cluster in this way makes the system reliable and highly available since single points of failure have
been completely eliminated. Any node can be lost without it affecting the system as a whole.

MySQL Cluster continues to deliver service, even in the event of catastrophic failures. As long as one data node
from each node group and an application server remain available, the cluster will remain operational.
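The survival rule above can be expressed directly. The following Python sketch assumes node groups are given as lists of data node IDs; it checks only that each node group still has a live data node and that some application node is available, and does not model arbitration after a network partition.

```python
def cluster_operational(node_groups, alive_data_nodes, app_nodes_alive):
    """Apply the rule stated above (simplified; arbitration is not modeled).

    node_groups: list of lists of data node IDs, one inner list per node group.
    alive_data_nodes: set of data node IDs that are currently up.
    app_nodes_alive: number of application (SQL/API) nodes currently available.
    """
    every_group_covered = all(
        any(node in alive_data_nodes for node in group) for group in node_groups
    )
    return every_group_covered and app_nodes_alive > 0


groups = [[1, 2], [3, 4]]          # hypothetical: two node groups of two data nodes
print(cluster_operational(groups, {1, 3}, 1))  # one survivor per group
print(cluster_operational(groups, {1, 2}, 1))  # second node group lost entirely
```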

In addition to the site-level high availability achieved through its redundant architecture, MySQL Cluster also
supports geographic distribution between datacenters:

- Geographic Replication mirrors complete clusters between geographically remote sites.
- Multi-site clustering enables a single cluster to be split across remote data centers, with synchronous
replication between sites. (Note: this requires a very high quality WAN!)

Whichever mode is chosen, all sites are Active/Active, and therefore able to accept write operations, with MySQL
Cluster handling conflict detection and resolution. This ensures organizations do not have to carry the overhead of
provisioning and maintaining systems that are idle for most of the time.

Maintaining Availability During Scheduled Maintenance Activities

As discussed earlier, around 30% of all downtime is attributable to scheduled maintenance activities. MySQL
Cluster supports all of the following events as online operations, ensuring the database always continues to provide
service:

- Scaling the cluster by adding new nodes.
- Updating the schema with new columns, tables, and indexes.
- Re-sharding of tables across data nodes to allow better data distribution.
- Performing back-up operations.
- Upgrading or patching the underlying hardware and operating system.
- Upgrading or patching MySQL Cluster, with full online upgrades between releases.

Through the HA capabilities described above, MySQL Cluster is able to eliminate both planned maintenance and
unplanned downtime in order to deliver the 99.999% availability required by the most critical applications.

5.2. MySQL Group Replication

Group Replication implements both a single-primary mode with automatic leader election and a multi-master,
update-everywhere mode. Using a powerful new group communication system, which provides an in-house
implementation of the popular Paxos algorithm, the group automatically coordinates data replication,
consistency, and membership. This provides all of the built-in mechanisms necessary for making your MySQL
databases highly available.

Elasticity

With Group Replication, a set of servers coordinate together to form a group. The group membership is dynamic,
and servers can leave, either voluntarily or involuntarily, and join at any time. The group will automatically
reconfigure itself as needed and ensure that any joining member is synchronized with the group. This makes it
easy to quickly scale your total database capacity up and down as needed.

Failure Detection

Group Replication implements a distributed failure detector to find and report servers that have failed or are no
longer participating in the group; the remaining members in the group then coordinate to reconfigure the
membership.

Fault Tolerance

Group Replication builds on an in-house implementation of the popular Paxos distributed algorithm to provide
distributed coordination between servers. In order for a group to continue to function, it requires a majority of the
members to be online and for them to form an agreement on every change. This allows your MySQL databases to
safely continue to operate without manual intervention when failures occur, without the risk of data loss or data
corruption.
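The majority requirement determines how many failures a group can tolerate: a group of n members stays operational while at least floor(n/2) + 1 members are online, so it can lose up to floor((n - 1)/2) of them. The small Python sketch below, with hypothetical function names, shows why a four-member group tolerates no more failures than a three-member one. (Group Replication supports groups of up to nine members.)

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a majority of a group of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while the remainder still forms a majority."""
    return (n - 1) // 2


def has_quorum(group_size: int, online: int) -> bool:
    return online >= majority(group_size)


for size in range(1, 10):
    print(size, "members: majority", majority(size), "- tolerates", tolerated_failures(size))
```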

Self-Healing

If a server joins the group, it will automatically bring itself up to date by synchronizing its state from an existing
member. If a server leaves the group, for instance because it was taken down for maintenance, the remaining
servers will see that it has left and will reconfigure the group automatically. When that server later rejoins the group,
it will automatically re-synchronize with the group again.

Monitoring

Performance Schema tables provide clear and detailed information and statistics on individual members and for the
group as a whole.

For more information:


- Documentation
- Presentation
- Forum
- Team Blog

6. Comparing MySQL HA Solutions

As the above table demonstrates, users have a wide range of options to achieve optimum levels of availability for
their applications:

- MySQL master/slave replication, MySQL Cluster, and MySQL Group Replication support a range of
  operating systems, while the other solutions are limited to specific platforms.
- All of the HA solutions, with the exception of MySQL Cluster, use the general-purpose InnoDB storage
  engine, which is also the default engine of the MySQL server. MySQL Cluster uses its own NDB storage
  engine to support capabilities such as automatic sharding, failover, and recovery. Therefore, users
  migrating from InnoDB will need to optimize their queries and schema to achieve the best possible
  performance from MySQL Cluster.
- All of the HA solutions support application-level failover, with the exception of MySQL Replication, which
  requires MySQL Router (discussed later in this document), failover support in the Connector (both
  Connector/J and Connector/NET support this, for example), or custom scripts that provide this
  functionality.
- Failover times vary by technology. MySQL Cluster is designed for sub-second failover, while the other
  solutions require one second or more to complete the process.
- HA solutions such as Solaris Cluster and Windows Failover Cluster rely on shared storage to maintain
  data consistency, while MySQL (NDB) Cluster and MySQL Group Replication use shared-nothing local
  storage with consistent data replication.
- MySQL Replication uses asynchronous replication by default, so if the master fails, any updates that have
  not yet been propagated to a slave can be lost. Configuring semi-synchronous replication can mitigate this
  risk, and using Group Replication eliminates it entirely.
- MySQL Cluster and MySQL Group Replication are the only HA solutions supporting a fully active/active,
  multi-master architecture. MySQL Cluster is the only solution that supports auto-sharding
  (partitioning), enabling users to scale both read and write operations linearly across the cluster.
- Each solution delivers a progressively higher level of availability, from MySQL master/slave replication,
  designed for 99.9% uptime, up to MySQL Group Replication and MySQL Cluster, designed for 99.999%
  uptime.
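To put these uptime figures into concrete terms, the short Python sketch below converts an availability percentage into the maximum downtime it permits per year. It assumes a 365-day year and counts all downtime, planned or unplanned, against the budget. `allowed_downtime_minutes` is a hypothetical helper, not part of any MySQL tool. Under those assumptions, 99.9% allows roughly 8.76 hours of downtime per year, 99.99% about 52.6 minutes, and 99.999% only about 5.3 minutes.

```python
# Convert an availability target (percent uptime) into the downtime budget
# it allows. Assumption: a 365-day year; leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, permitted over the period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the period is (100 - availability) / 100.
    return period_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime: {minutes:.1f} minutes/year "
              f"({minutes / 60:.2f} hours)")
```

Each extra "nine" cuts the yearly budget by a factor of ten. This is why the jump from 99.9% to 99.999% usually calls for a different architecture rather than simple tuning.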

7. General and Third-Party HA Technologies
Oracle certifies and supports the following general HA solutions:
- Oracle VM -- http://www.oracle.com/us/technologies/virtualization/oraclevm/index.html
- Oracle Clusterware XAG Agent for MySQL -- Oracle Grid Infrastructure Agents Reference Guide
- Solaris Cluster -- http://www.oracle.com/technetwork/server-storage/solaris-cluster/overview/index.html

In addition, a range of third-party technologies can be used to increase the uptime of MySQL deployments.
Examples include VMware vSphere, Red Hat Cluster Suite, and Windows Failover Cluster.

Support for any third-party HA products must be obtained from the respective vendors. Oracle provides support for
MySQL on supported platforms[1], even when used with third-party HA technologies, as long as any issues can be
recreated in standalone environments.

8. MySQL Application Client Failover

8.1. MySQL Connectors


Both the Connector/J (JDBC)[2] and Connector/NET (.NET)[3] drivers support application connection failover when
multiple hosts are specified during connection initialization. The MySQL Native Driver for PHP also has a plugin[4]
that supports multiple hosts and connection failover.
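Connector/J, for example, accepts several hosts in a single connection URL, such as `jdbc:mysql://db-primary:3306,db-secondary:3306/appdb`. The host and database names here are hypothetical. The Python sketch below models the general idea of ordered client-side failover. It is not the Connector's actual algorithm. The `connect` callable is a stand-in for whatever driver call your application uses, and a real driver raises its own error types, so the `except` clause would need adjusting.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")
Host = Tuple[str, int]


def connect_with_failover(hosts: Sequence[Host],
                          connect: Callable[[str, int], Conn]) -> Tuple[Host, Conn]:
    """Try each (host, port) in the listed order; return the first that connects.

    `connect` is a placeholder for a real driver call. Any OSError it raises
    (refused connection, socket timeout, unreachable host) moves on to the next host.
    """
    errors: List[str] = []
    for host, port in hosts:
        try:
            return (host, port), connect(host, port)
        except OSError as exc:
            errors.append(f"{host}:{port} ({exc})")
    raise ConnectionError("no host accepted the connection: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_connect(host: str, port: int) -> str:
        # Simulate a failed primary and a healthy secondary.
        if host == "db-primary":
            raise ConnectionRefusedError("primary is down")
        return f"session on {host}:{port}"

    chosen, session = connect_with_failover(
        [("db-primary", 3306), ("db-secondary", 3306)], fake_connect)
    print(chosen, session)
```

Production connectors layer further policy on top of this loop, such as retry counts, connect timeouts, and rules for when to return to the primary. The sketch omits all of these.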

8.2. MySQL Router


MySQL Router is lightweight middleware that provides transparent routing between your application and any
backend MySQL Servers. It can be used for a wide variety of use cases, such as providing high availability and
scalability by effectively routing database traffic to appropriate backend MySQL Servers. The pluggable architecture
also enables developers to extend MySQL Router for custom use cases.

The MySQL Router allows you to hide the High Availability implementation and the resulting complexities from your
application and the application developers by providing a simple, standard, and consistent application entry point
into your MySQL topology. It provides several core features:

- Application Connection Failover: your application connections are transparently redirected to the
  right MySQL endpoint after a database failover event.
- Load Balancing: application load is balanced across the available MySQL instances in an HA group.
- Transparent Access to MySQL Group Replication and InnoDB cluster[5]: transparent client connection
  routing and failover let your applications leverage the native MySQL HA delivered by Group Replication.
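The first two features correspond to two common routing behaviors. Failover sends connections to the first healthy server, while load balancing rotates connections across the healthy servers. The toy Python model below illustrates both. `ToyRouter` and its methods are hypothetical and do not reflect MySQL Router's internals or configuration syntax. Health state is also set manually through `mark()`, whereas a real router detects failures itself.

```python
import itertools
from typing import Dict, List, Optional


class ToyRouter:
    """Hypothetical illustration of two routing policies; not MySQL Router code."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self.healthy: Dict[str, bool] = {b: True for b in self.backends}
        self._rotation = itertools.cycle(self.backends)

    def mark(self, backend: str, healthy: bool) -> None:
        """Record a backend's health (a real router would probe for this)."""
        self.healthy[backend] = healthy

    def first_available(self) -> Optional[str]:
        """Failover policy: the earliest healthy backend in the configured list."""
        for backend in self.backends:
            if self.healthy[backend]:
                return backend
        return None

    def round_robin(self) -> Optional[str]:
        """Load-balancing policy: rotate through backends, skipping unhealthy ones."""
        for _ in range(len(self.backends)):
            backend = next(self._rotation)
            if self.healthy[backend]:
                return backend
        return None


if __name__ == "__main__":
    router = ToyRouter(["mysql-a", "mysql-b", "mysql-c"])
    router.mark("mysql-a", False)  # simulate a failed server
    print("writes go to:", router.first_available())
    print("reads go to:", [router.round_robin() for _ in range(4)])
```

First-available routing suits write traffic, which should reach a single primary. Round-robin suits reads, which any healthy replica can serve.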

[1] MySQL support is dependent on a valid MySQL subscription or support agreement.
[2] https://dev.mysql.com/doc/connector-j/en/connector-j-config-failover.html
[3] https://blogs.oracle.com/MySqlOnWindows/entry/how_to_using_replication_load
[4] http://php.net/manual/en/book.mysqlnd-ms.php
[5] https://dev.mysql.com/doc/mysql-innodb-cluster/en/

9. Operational Best Practices
High availability is a function not only of the underlying technology but also of well-established, tested operating
procedures managed by a highly skilled operations team. As discussed earlier in this guide, industry analysts
estimate that 80% of downtime is the result of people and process, so the importance of operational best practices
cannot be overstated.

Oracle offers a range of tools and services to enable MySQL users to achieve operational excellence and deliver
against their committed SLAs.

Oracle University

Training operational and administrative teams reduces the risk of human error that can result in accidental
system outages. Oracle University offers an extensive range of MySQL training, from introductory courses (e.g.
MySQL Essentials, MySQL DBA) through to advanced certifications such as MySQL High Availability and
MySQL Cluster Administration. It is also possible to define custom training plans for delivery at customer sites.
You can learn more about MySQL training from Oracle University at http://www.mysql.com/training/.

MySQL Consulting

To ensure adherence to best practices from the initial design phase of a project through implementation and
sustaining, users can engage Oracle's MySQL Professional Services consultants. Delivered remotely or onsite,
these engagements help optimize the architecture and increase operational efficiency.

Oracle offers a full range of consulting services, from Architecture and Design through to High Availability,
Replication, and Clustering. You can learn more at http://www.mysql.com/consulting/.

MySQL Enterprise Edition and MySQL Cluster Carrier Grade Edition (CGE)

The commercial editions of MySQL deliver the most comprehensive set of advanced features, management tools,
and technical support so organizations can achieve the highest levels of MySQL availability, performance, and
security.

Key components of MySQL Enterprise Edition and MySQL Cluster CGE are discussed below.

24x7 Global Support

MySQL offers 24x7x365 access to Oracle's MySQL Support team, which is staffed by seasoned database experts
ready to help with the most complex technical issues, with direct access to the MySQL development team. Oracle's
Premier Support provides you with:

- 24x7x365 phone and online support.
- Rapid diagnosis and solution of complex issues.
- Unlimited incidents.
- Emergency hot fix builds.
- Access to Oracle's MySQL Knowledge Base.
- Consultative support services.
- Technical support in 29 languages.

The Support team partners with customers in the analysis and remediation of issues that are causing outages,
leading to faster problem resolution, and if needed, generates hot fixes to restore service. This level of assistance
offers significant benefits for HA over community or self-supported environments.

Access to the best practices Knowledge Base is also included within MySQL support agreements. The Knowledge
Base offers great insights into how to configure, provision, and manage highly available MySQL environments.

You can learn more at http://www.mysql.com/support/.

MySQL Enterprise Monitor

During normal operations, monitoring of the infrastructure is key to maintaining high availability and can help you
detect potential issues before they become problems. MySQL Enterprise Monitor provides at-a-glance views of
the health of your databases, continuously monitoring your MySQL Servers and alerting you to potential problems
before they impact your system.

MySQL Enterprise Monitor automatically tracks hundreds of MySQL variables to analyse current status. A
sophisticated rules-based engine alerts administrators whenever parameters exceed defined thresholds so
that DevOps and DBA teams can proactively avoid downtime or performance degradation.
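A rules-based engine of this kind can be pictured as a set of predicates evaluated against a snapshot of server metrics. The Python sketch below is a minimal illustration and does not use MySQL Enterprise Monitor's rule format. The 80% and 60-second thresholds are arbitrary assumptions. The snapshot is a plain dictionary keyed by the names MySQL uses for the `Threads_connected` status variable, the `max_connections` system variable, and the `Seconds_Behind_Master` field of `SHOW SLAVE STATUS`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Metrics = Dict[str, Any]


@dataclass
class Rule:
    name: str
    fires: Callable[[Metrics], bool]  # True when the rule should raise an alert


def connections_high(m: Metrics) -> bool:
    # Assumed threshold: more than 80% of max_connections in use.
    return m["Threads_connected"] / m["max_connections"] > 0.80


def replica_lagging(m: Metrics) -> bool:
    # Only applies to replicas. Seconds_Behind_Master can be NULL (None here),
    # for example when replication threads are stopped; that also alerts.
    if "Seconds_Behind_Master" not in m:
        return False
    lag = m["Seconds_Behind_Master"]
    return lag is None or lag > 60  # assumed threshold: 60 seconds


RULES: List[Rule] = [
    Rule("connection usage above 80%", connections_high),
    Rule("replication lagging or stopped", replica_lagging),
]


def evaluate(metrics: Metrics, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules whose condition holds for this snapshot."""
    return [rule.name for rule in rules if rule.fires(metrics)]


if __name__ == "__main__":
    # Hypothetical snapshot values, not real server output.
    snapshot = {"Threads_connected": 140, "max_connections": 151,
                "Seconds_Behind_Master": 300}
    for alert in evaluate(snapshot):
        print("ALERT:", alert)
```

Because each rule is just a named predicate, new checks can be added without touching the evaluation loop. Storing each snapshot alongside its alerts would also support the kind of historical analysis described below.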

Administrators are alerted immediately should an outage occur, and are presented with diagnostic information and
suggestions to speed remediation of the issue and quickly restore service availability.

MySQL Enterprise Monitor also stores historical MySQL status data so that post-mortem analysis of issues is
greatly simplified.

You can learn more at: http://www.mysql.com/products/enterprise/monitor.html.

MySQL Enterprise Backup

Database backups are well-established processes in production environments. Depending on the technique used,
backup operations can affect on-going services in several ways:

- Increased server load, impacting performance of production queries
- Blocking of write operations, limiting the service to read-only queries during the backup process
- Complete (planned) downtime during backup

Of course, for HA services none of these are acceptable. A full online backup that does not consume excessive
MySQL Server resources is therefore the right choice to achieve HA.

MySQL Enterprise Backup performs online ("hot"), non-blocking backups of your MySQL databases. Full backups
can be performed on all InnoDB data[1] while MySQL is online, without interrupting queries or updates. In addition,
incremental backups are supported, in which only the data that has changed is backed up. Partial backups are also
supported when only certain tables or tablespaces need to be captured.

MySQL Enterprise Backup restores your data from a full backup with full backward compatibility. Consistent
Point-in-Time Recovery (PITR) also enables DBAs to restore a database to a specific point in time.
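The restore sequence behind point-in-time recovery can be sketched as a planning step with three stages. First, pick the most recent full backup taken before the target time. Next, apply the incremental backups taken after it. Finally, replay the binary log from the last applied backup up to the target. The Python sketch below assumes exactly that model and also assumes the incrementals form an unbroken chain. `Backup` and `restore_plan` are hypothetical names. The code only computes the plan and does not invoke MySQL Enterprise Backup or `mysqlbinlog`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Backup:
    kind: str           # "full" or "incremental"
    taken_at: datetime


def restore_plan(backups: List[Backup], target: datetime) -> Tuple[List[Backup], datetime]:
    """Return (backups to apply in order, time from which to replay the binary log).

    Assumes every incremental builds on the backup taken just before it.
    """
    usable = sorted((b for b in backups if b.taken_at <= target),
                    key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = fulls[-1]  # most recent full backup not after the target
    incrementals = [b for b in usable
                    if b.kind == "incremental" and b.taken_at > base.taken_at]
    plan = [base] + incrementals
    return plan, plan[-1].taken_at


if __name__ == "__main__":
    # Hypothetical backup schedule: full at midnight and noon, incrementals between.
    day = datetime(2016, 6, 1)
    history = [
        Backup("full", day.replace(hour=0)),
        Backup("incremental", day.replace(hour=6)),
        Backup("full", day.replace(hour=12)),
        Backup("incremental", day.replace(hour=18)),
    ]
    steps, replay_from = restore_plan(history, day.replace(hour=20, minute=30))
    for step in steps:
        print("apply", step.kind, "backup from", step.taken_at)
    print("then replay the binary log from", replay_from, "to the target time")
```

Starting from the newest usable full backup keeps both the number of incrementals and the length of binary log to replay as small as possible. This shortens recovery time.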

You can learn more at: http://www.mysql.com/products/enterprise/backup.html

10. Conclusion
High availability is a critical concern for any organization looking to deliver services to users and customers. As this
whitepaper has demonstrated, there is a range of HA technologies available for MySQL delivering 99.9% to
99.999% uptime.

Before selecting a technology, it is important to assess the actual requirements of the application: not everything
needs 99.999% uptime, however desirable that may first appear. Combining operational best practices with
technology solutions is essential to delivering true HA.

[1] MySQL Cluster CGE has its own online backup tool for NDB tables.

This paper has presented a methodology to enable you to determine application requirements, and from there,
select the right HA solution for your MySQL environment coupled with tools and services that reduce risk, cost, and
complexity.

11. Additional Resources

MySQL Whitepapers:
http://www.mysql.com/why-mysql/white-papers/

MySQL Webinars:
Live: http://www.mysql.com/news-and-events/web-seminars/index.html
On Demand: http://www.mysql.com/news-and-events/on-demand-webinars/

MySQL Enterprise Edition Demo:
https://www.youtube.com/watch?v=IYcsc9g2mdI

MySQL Cluster Demo:
https://www.youtube.com/watch?v=A7dBB8_yNJI

MySQL Enterprise Edition Trial:
http://www.mysql.com/trials/

MySQL Case Studies:
http://www.mysql.com/why-mysql/case-studies/

MySQL TCO Savings Calculator:
http://mysql.com/tco

To contact an Oracle MySQL Representative:
http://www.mysql.com/about/contact/

Copyright 2016 Oracle Corp. MySQL is a registered trademark of Oracle Corp. in the U.S. and in other countries. Other products mentioned
may be trademarks of their companies.
