
Software preview MySQL Scriptable Replication

Fig. 1 MySQL per-row replication filtering


A MySQL software preview is available which allows you to write Lua scripts to control replication on
a statement-by-statement basis. Note that this is prototype functionality and is not supported, but
feedback on its usefulness would be gratefully received. The final version would allow much greater
functionality, but this preview lets you implement filters on either the master or the slave to examine
the statements being replicated and decide whether or not to continue processing each one.
After reading this article, you may be interested in trying this out for yourself and want to create your
own script(s). You can get more information on the functionality and download the special version of
MySQL from http://forge.mysql.com/wiki/ReplicationFeatures/ScriptableReplication
To understand how this feature works, you first need to understand the very basics of how MySQL
replication works. Changes that are made to the master MySQL Server are written to a binary log.
Any slave MySQL Servers that subscribe to this master are sent the data from the master's binary log;
the slave(s) then copy this data to their own relay log(s). The slave(s) then work through all of the
updates in their relay logs and apply them to their local database(s). The implementation is a little more
complex when using MySQL Cluster, as the master's updates may come through multiple MySQL
Servers or directly from an application through the NDB API, but all of the changes still make it
into the binary log.
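If you want to watch the two logs in action while experimenting, the standard status commands are enough (shown here as a sketch; the exact columns vary by MySQL version):

-- On the master: the current binary log file and position
SHOW MASTER STATUS;
-- On a slave: the relay log coordinates and how far the updates have been applied
SHOW SLAVE STATUS\G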
MySQL Replication supports both statement-based and row-based replication (as well as mixed), but this
software preview is restricted to statement-based replication. As MySQL Cluster must use row-based
replication, this preview cannot be used with Cluster, but the final implementation should work with all
storage engines.
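Because of that restriction, it is worth confirming the binary log format on the master before testing; as a minimal sketch (these are standard MySQL settings, not specific to the preview):

SET GLOBAL binlog_format = 'STATEMENT';   -- or set binlog-format=STATEMENT in my.cnf
SHOW VARIABLES LIKE 'binlog_format';      -- confirm the active setting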
As shown in Fig. 1, there are 4 points where you can choose to filter statements being replicated:
1. Before the update is written to the binary log
2. After the update has been read from the binary log
3. Before the update is written to the relay log
4. After the update has been read from the relay log
The final 2 interest me most, as they allow us to have multiple slaves which apply different filters; this
article includes a worked example of how that could be exploited.
Fig. 2 Details for each filtering point

The filters are written as Lua scripts. The script file name, module name and function name vary
depending on which of these filtering points is to be used. Fig. 2 shows these differences. In all
cases, the scripts are stored in the following folder: <mysql-base-directory>/ext/replication.
This article creates 2 different scripts, one for each of 2 slave servers. In both cases the filter script is
executed after an update is read from the relay log. Both filters search each statement (for example,
INSERT INTO <table-name> SET sub_id = 401) for the substring sub_id = X; one slave will discard
any statement where X is even, while the second slave will discard any where X is odd. Any statement
that doesn't include this pattern will be allowed through.
Fig. 3 Implementation of odd/even sharded replication
If a script returns TRUE then the statement is discarded; if it returns FALSE then the replication
process continues. Fig. 3 shows the architecture and pseudo code for the odd/even replication sharding.

The actual code for the two slaves is included here:


slave-odd: <mysql-base-directory>/ext/replication/relay_log.lua

function after_read(event)
    -- Only statement-based events carry a query string
    local query = event.query
    if not query then
        return false
    end
    -- Match "sub_id = <digits>"; the %s* covers both the spaced and unspaced
    -- forms that were previously handled by two separate patterns
    local id = string.match(query, "sub_id%s*=%s*(%d+)")
    if id and tonumber(id) % 2 == 0 then
        return true   -- even sub_id: discard on the "odd" slave
    end
    return false      -- odd sub_id, or no sub_id present: continue replicating
end

slave-even: <mysql-base-directory>/ext/replication/relay_log.lua

function after_read(event)
    -- Only statement-based events carry a query string
    local query = event.query
    if not query then
        return false
    end
    -- Match "sub_id = <digits>"; the %s* covers both the spaced and unspaced
    -- forms that were previously handled by two separate patterns
    local id = string.match(query, "sub_id%s*=%s*(%d+)")
    if id and tonumber(id) % 2 == 1 then
        return true   -- odd sub_id: discard on the "even" slave
    end
    return false      -- even sub_id, or no sub_id present: continue replicating
end

Replication can then be set up as normal, as described in Setting up MySQL Asynchronous Replication
for High Availability, with the exception that we use 2 slaves rather than 1.
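For completeness, the per-slave setup is the standard one; a hedged sketch (the host name, account and log coordinates below are placeholders, not values from the referenced article), run on each of the two slaves, looks like:

mysql-slave> CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='repl_user', MASTER_PASSWORD='repl_password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
mysql-slave> START SLAVE;
mysql-slave> SHOW SLAVE STATUS\G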
Once replication has been started on both of the slaves, the database and tables should be created; note
that for some reason, the creation of the tables isn't replicated to the slaves when using this preview
build, and so the tables actually need to be created 3 times:
mysql-master> CREATE DATABASE clusterdb;
mysql-master> USE clusterdb;
mysql-master> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR (30)) engine=innodb;
mysql-master> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;

mysql-slave-odd> USE clusterdb;
mysql-slave-odd> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR (30)) engine=innodb;
mysql-slave-odd> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;

mysql-slave-even> USE clusterdb;
mysql-slave-even> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR (30)) engine=innodb;
mysql-slave-even> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;

The data can then be added to the master and then the 2 slaves can be checked to validate that it
behaved as expected:
mysql-master> INSERT INTO sys1 SET code=33, country="France";
mysql-master> INSERT INTO sys1 SET code=44, country="UK";
mysql-master> INSERT INTO subs1 SET sub_id=401, code=44;
mysql-master> INSERT INTO subs1 SET sub_id=402, code=33;
mysql-master> INSERT INTO subs1 SET sub_id=976, code=33;
mysql-master> INSERT INTO subs1 SET sub_id=981, code=44;

mysql-slave-odd> SELECT * FROM sys1;


+------+---------+
| code | country |
+------+---------+
| 33 | France |
| 44 | UK |
+------+---------+

mysql-slave-odd> SELECT * FROM subs1;


+--------+------+
| sub_id | code |
+--------+------+
| 401 | 44 |
| 981 | 44 |
+--------+------+
Fig. 4 Results of partitioned replication
mysql-slave-even> SELECT * FROM sys1;
+------+---------+
| code | country |
+------+---------+
| 33 | France |
| 44 | UK |
+------+---------+
mysql-slave-even> SELECT * FROM subs1;
+--------+------+
| sub_id | code |
+--------+------+
| 402 | 33 |
| 976 | 33 |
+--------+------+

Fig. 4 illustrates this splitting of data between the 2 slaves: all rows from the system table are stored in
both databases (as well as in the master), while the data in the subscriber table (and it would work for
multiple subscriber tables too) is partitioned between the 2 databases, odd values in one and even in the
other. Obviously, this could be extended to more slaves by changing the checks in the scripts.
As an illustration of how this example could be useful, all administrative data could be provisioned into
and maintained by the master, covering both system and subscriber data. Each slave could then serve a
subset of the subscribers, providing read access to the administrative data and read/write access to the
more volatile subscriber data (which is mastered on the slave). In this way, there can be a central point
to manage the administrative data while being able to scale out to multiple databases to provide
maximum capacity and performance to the applications. For example, in a telco environment, you might
filter rows by comparing a subscriber's phone number to a set of area codes so that local
subscribers are accessed from the local database, minimising latency.
From a data integrity perspective, this approach is safe if (and only if) the partitioning rules ensure that
all related rows end up on the same slave (in our example, all rows from all tables for a particular
subscriber will be on the same slave, so as long as we don't need transactional consistency between
different subscribers this should be safe).
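To make that concrete, here is a hedged sketch (the subs_profile table and its values are invented for illustration and are not part of the worked example above): because its statements also contain the sub_id = X pattern, the same relay-log filters route a given subscriber's rows to the same slave as their subs1 row. As with the other tables in this preview, the CREATE TABLE would need to be run on the master and on both slaves.

mysql-master> CREATE TABLE subs_profile (sub_id INT NOT NULL PRIMARY KEY, email VARCHAR(100)) engine=innodb;
mysql-master> INSERT INTO subs_profile SET sub_id=401, email="bob@example.com";   -- odd: kept by slave-odd only
mysql-master> INSERT INTO subs_profile SET sub_id=402, email="ann@example.com";   -- even: kept by slave-even only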
Fig. 5 Partitioned replication for MySQL Cluster
As mentioned previously, this software preview doesn't work with MySQL Cluster, but looking forward
to when it does, the example could be extended by having each of the slave servers be part of the same
Cluster. In this case, the partitioned data would be consolidated back into a single database (for this
scenario, you would likely configure just one server to act as the slave for the system data). On the face
of it, this would be a futile exercise, but in cases where the performance bottleneck is the throughput
of a single slave server, this might be a way to horizontally scale replication performance for
applications that make massive numbers of database writes.

What is the difference between MySQL Replication and MySQL Cluster?

Discussion
MySQL currently supports two different solutions for creating a high availability environment and
achieving multi-server scalability.

MySQL Replication
The first form is replication, which MySQL has supported since MySQL version 3.23. Replication in
MySQL is currently implemented as an asynchronous master-slave setup that uses a logical log-shipping
backend.
A master-slave setup means that one server is designated to act as the master. It is required to
receive all of the write queries. The master then executes and logs the queries, which are then shipped to
the slaves to execute, keeping the same data across all of the replication members.
Replication is asynchronous, which means that the slave server is not guaranteed to have the data when
the master performs the change. Normally, replication will be as real-time as possible. However, there
is no guarantee about the time required for the change to propagate to the slave.
Replication can be used for many reasons. Some of the more common reasons include scalability,
server failover, and backup solutions.
Scalability can be achieved because you can run SELECT queries against any of
the slaves. Write statements, however, are generally not improved, because writes have to
occur on each of the replication members.
Failover can be implemented fairly easily using an external monitoring utility that uses a heartbeat or
similar mechanism to detect the failure of a master server. MySQL does not currently do automatic
failover as the logic is generally very application dependent. Keep in mind that, because replication is
asynchronous, it is possible that not all of the changes made on the master will have
propagated to the slave.
MySQL replication works very well even across slower connections, and with connections that aren't
continuous. It can also be used across different hardware and software platforms, and it is possible to
use replication with most storage engines, including MyISAM and InnoDB.

MySQL Cluster
MySQL Cluster is a shared-nothing, distributed partitioning system that uses synchronous replication
in order to maintain high availability and performance.
MySQL Cluster is implemented through a separate storage engine called NDB Cluster. This storage
engine will automatically partition data across a number of data nodes. The automatic partitioning of
data allows for parallelization of queries that are executed. Both reads and writes can be scaled in this
fashion since the writes can be distributed across many nodes.
Internally, MySQL Cluster also uses synchronous replication in order to remove any single point of
failure from the system. Since two or more nodes are always guaranteed to have the data fragment, at
least one node can fail without any impact on running transactions. Failure detection is handled
automatically, with the dead node being removed transparently to the application. Upon node restart, it
will automatically be re-integrated into the cluster and begin handling requests as soon as possible.
There are a number of limitations that currently exist and have to be kept in mind while deciding if
MySQL Cluster is the correct solution for your situation.
Currently all of the data and indexes stored in MySQL Cluster are stored in main memory across the
cluster. This does restrict the size of the database based on the systems used in the cluster. Work is
underway to allow data to be stored on disk and will most likely appear in MySQL version 5.1.
MySQL Cluster is designed to be used on an internal network as latency is very important for response
time. As a result, it is not possible to run a single cluster across a wide geographic distance. In addition,
while MySQL Cluster will work over commodity network setups, in order to attain the highest
performance possible, special clustering interconnects can be used.
What is MySQL Cluster?

MySQL Cluster is a technology which provides shared-nothing clustering capabilities for the
MySQL database management system. It was first included in the production release of MySQL
4.1 in November 2004. en.wikipedia.org/wiki/MySQL_Cluster

Fig. MySQL Cluster flow


This guide assumes that you already know about MySQL Cluster, at least how it works, so I will
give you some very useful tips about MySQL Cluster.

1) How to convert an existing schema to MySQL Cluster


For example, to convert an InnoDB schema to a MySQL Cluster schema: the MySQL Cluster storage
engine is called NDBCLUSTER, and importing a schema is as simple as changing the InnoDB engine
to the NDBCLUSTER engine.
# Migrating the DDL is as simple as changing the engine of the tables to NDBCLUSTER:
cat schema.james.sql | sed 's/InnoDB/NDBCLUSTER/gi' > ndb_schema.james.sql
cat schema.jamesjara.sql | sed 's/InnoDB/NDBCLUSTER/gi' > ndb_schema.jamesjara.sql
mysql -u root -e "CREATE DATABASE james"
mysql -u root -e "CREATE DATABASE jamesjara"
mysql -u root james < ndb_schema.james.sql
mysql -u root jamesjara < ndb_schema.jamesjara.sql

2) Scaling across multiple machines with MySQL Cluster


First, I recommend reading these links to understand data distribution 100%:
http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-and-sharding.html
http://messagepassing.blogspot.com/2011/03/data-distribution-in-mysql-cluster.html
The data and indexes must fit into memory. Each cluster has node groups. Each node group holds a
fragment of the data, and each node group has a number of replicas. So a cluster with 2 node groups
with 4 replicas each has the data split in half, with 4 machines redundantly storing one half of the data,
and another 4 redundantly storing the other half. Therefore, the data doesn't have to fit entirely into the
RAM of a single node, but it does have to fit in (RAM of 1 node) * (# of node groups).
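As a purely illustrative worked example (the figures are mine, not taken from the posts above): with 8 data nodes and NoOfReplicas=2 you get 8 / 2 = 4 node groups, so if each node has roughly 64 GB of memory available for data, the cluster can hold on the order of 4 x 64 GB = 256 GB of data and indexes, with every row stored twice for redundancy.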
(From the blog:) Tables are horizontally fragmented into table fragments, each containing a disjoint
subset of the rows of the table. The union of rows in all table fragments is the set of rows in the table.
Rows are always identified by their primary key. Tables with no primary key are given a hidden
primary key by MySQLD.

3) How to get the database status of a MySQL Cluster ecosystem


DATABASE STATUS
TABLES MEMORY: SHOW TABLE STATUS LIKE '%'; (run in mysqld)
NODES MEMORY: ndb_mgm -e "ALL REPORT MEMORYUSAGE" (run in the management client)
To get the exact status of the MySQL Cluster tables we use:
SELECT TABLE_NAME, ROW_FORMAT, TABLE_ROWS, AVG_ROW_LENGTH,
DATA_LENGTH, MAX_DATA_LENGTH, INDEX_LENGTH, DATA_FREE,
AUTO_INCREMENT FROM information_schema.tables WHERE table_schema =
DATABASE();

Also, with the ALL REPORT MEMORYUSAGE command, we can check how the data is spread.

4) Benchmarking in MySQL Cluster


Executing a performance test (or another kind of test) is as simple as:
10 threads, each thread looping 10000 times, with 1000 rows in the visitor table:
Command: bencher -s/var/lib/mysql/mysql.sock -t10 -l10000 -d visitor -q "select * from visitor"
Or something more complex:
10 threads, each thread looping 10000 times, with 1000 rows in the visitor table:
Command: bencher -s/var/lib/mysql/mysql.sock -t10 -l1000 -d visitor -q "select c.* from visitor_a c
join visitor a on c.count_request = a.count_request join visitor_accesso b on c.count_requested =
b.request_dura_secs where a.count_reest between 0 and 15000 and b.http between 777 and 7777"
Well, that's all for now. These are only a few topics, but there is more; in my next post I will talk about:
auto-sharding, rebalancing, internal programs of MySQL Cluster, nice & useful commands and
others.

MySQL Cluster and Sharding


Sharding is here defined as the ability to partition the data into partitions defined by a condition on a set
of fields. This ability is central to the workings of MySQL Cluster. Within a Cluster we automatically
partition the tables into fragments (shards in the internet world). By default there is a fixed number of
fragments per node. As mentioned, we also use replication inside a Cluster; the replication happens per
fragment. We define the number of replicas we want in the Cluster, and then the MySQL Cluster
software maintains this number of fragment replicas per fragment. These fragment replicas are all kept
in sync. Thus, for MySQL Cluster, sharding is automatic and happens inside the Cluster, even on
commodity hardware.

One of the defining features of MySQL Cluster is that it keeps the fragments up and running at all times
and restores them after a Cluster crash. MySQL Cluster also supports adding nodes to the
Cluster while it is operational; this means that we can add nodes to a running Cluster and repartition
the tables during normal operation. This is part of standard MySQL Cluster and is used by many users
and customers to increase the size of their clusters in production.
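As a hedged sketch of what that looks like in practice (the node IDs and table name are illustrative, and the exact syntax depends on the MySQL Cluster version; older releases use ALTER ONLINE TABLE ... REORGANIZE PARTITION instead): after the new data nodes have been added to config.ini and started, you create a node group from them in the management client and then repartition existing tables online from any SQL node.

ndb_mgm> CREATE NODEGROUP 5,6
mysql> ALTER TABLE mytable ALGORITHM=INPLACE, REORGANIZE PARTITION;
mysql> OPTIMIZE TABLE mytable;   -- reclaims space freed on the original nodes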

Data distribution in MySQL Cluster


MySQL Cluster distributes rows amongst the data nodes in a cluster, and also provides data replication.
How does this work? What are the trade offs?

Table fragments

Tables are 'horizontally fragmented' into table fragments each containing a disjoint subset of the rows
of the table. The union of rows in all table fragments is the set of rows in the table. Rows are always
identified by their primary key. Tables with no primary key are given a hidden primary key by
MySQLD.

By default, one table fragment is created for each data node in the cluster at the time the table is
created.

Node groups and Fragment replicas

The data nodes in a cluster are logically divided into Node groups. The size of each Node group is
controlled by the NoOfReplicas parameter. All data nodes in a Node group store the same data. In other
words, where the NoOfReplicas parameter is two or greater, each table fragment has a number of
replicas, stored on multiple separate data nodes in the same nodegroup for availability.

One replica of each fragment is considered primary, and the other(s) are considered backup replicas.
Normally, each node contains a mix of primary and backup fragments for every table, which
encourages system balance.

Which replica to use?

The primary fragment replica is used to serialise locking between transactions concurrently accessing
the same row. Write operations update all fragment replicas synchronously, ensuring no committed data
loss on node failure. Read operations normally access the primary fragment replica, ensuring
consistency. Reads with a special lock mode can access the backup fragment replicas.

Primary key read protocol

When an NdbApi client (for example a MySQLD process) wants to read a row by primary key, it sends
a read request to a data node acting as a Transaction Coordinator (TC).
The TC node will determine which fragment the row would be stored in from the primary key, decide
which replica to access (usually the primary), and send a read request to the data node containing that
fragment replica. The data node containing the fragment replica then sends the row's data (if present)
directly back to the requesting NdbApi client, and also sends a read acknowledgement or failure
notification back to the TC node, which also propagates it back to the NdbApi client.

Minimising inter data node hops

The 'critical path' for this protocol in terms of potential inter-data-node hops is four hops :
Client -> TC -> Fragment -> TC -> Client

To minimise remote client experienced latency, ideally two inter-node hops can be avoided by having
the TC node and the Fragment replica(s) on the same node. This requires controlling the choice of node
for TC based on the primary key of the data which will be read. Where a transaction only reads rows
stored on the same node as its TC, this can improve latency and system efficiency.

Distribution awareness

From NdbApi, users can specify a table and key when starting a transaction. The transaction will then
choose a TC data node based on where the corresponding row's primary fragment replica is located in
the system. This mechanism is sometimes referred to as 'transaction hinting'.

The Ndb handler in MySQLD generally waits for the first primary key lookup in a user session before
starting an NdbApi transaction, so that it can choose a TC node based on this. This is a best-effort
attempt at having the data node acting as TC colocated with the accessed data. This feature is usually
referred to as 'Distribution Awareness'.

Write operations also benefit from distribution awareness, but not to the same extent in systems with
NoOfReplicas > 1. Write operations must update all fragment replicas, which must be stored on
different nodes, in the same nodegroup, so for NoOfReplicas > 1, distribution awareness avoids inter-
node-group communication, and some intra-node-group communication, but some inter-data-node
communication is always required. In a system with good data partitioning and distribution awareness,
most read transactions will access only one data node, and write transactions will result in messaging
between the data nodes of a single node group. Messaging between node groups will be minimal.
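As a hedged illustration from the SQL side (using the user_accounts and user_prefs example tables defined below; the column values are invented): a transaction whose first statement is a primary-key lookup lets MySQLD pick a TC on the data node holding that row, so the remaining statements for the same user_id stay local to that node.

BEGIN;
SELECT state FROM user_accounts WHERE user_id = 22 AND account_type = 'Twitter';
UPDATE user_prefs SET value = 'Black, no sugar' WHERE user_id = 22 AND type = 'Coffee';
COMMIT;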
Distribution keys

By default, the whole of a table's primary key is used to determine which fragment replica will store a
row. However, any subset of the columns in the primary key can be used. The key columns used to
determine the row distribution are called the 'distribution key'.

Where a table's primary key contains only one column, the distribution key must be the full primary
key. Where the primary key has more than one column, the distribution key can be different to (a subset
of) the primary key.

From MySQLD, a distribution key can be set using the normal PARTITION BY KEY() syntax. The
effect of using a distribution key which is a subset of the primary key is that rows with different
primary key values, but the same distribution key values are guaranteed to be stored in the same table
fragment.

For example, if we create a table:

CREATE TABLE user_accounts (user_id BIGINT,
                            account_type VARCHAR(255),
                            username VARCHAR(60),
                            state INT,
                            PRIMARY KEY (user_id, account_type))
engine = ndb partition by key (user_id);

Then insert some rows:

INSERT INTO user_accounts VALUES (22, "Twitter", "Bader", 2),
                                 (22, "Facebook", "Bd77", 2),
                                 (22, "Flickr", "BadB", 3),
                                 (23, "Facebook", "JJ", 2);

Then we know that all rows with the same value(s) for the distribution key (user_id), will be stored on
the same fragment. If we know that individual transactions are likely to access rows with the same
distribution key value then this will increase the effectiveness of distribution awareness. Many schemas
are 'partitionable' like this, though not all.

Note that partitioning is a performance hint in Ndb - correctness is not affected in any way, and
transactions can always span table fragments on the same or different data nodes. This allows
applications to take advantage of the performance advantages of distribution awareness without
requiring that all transactions affect only one node etc as required by simpler 'sharding' mechanisms.

Correlated distribution keys across tables

A further guarantee from Ndb is that two tables with the same number of fragments, and the same
number and type of distribution keys will have rows distributed in the same way.
For example, if we add another table:

CREATE TABLE user_prefs (user_id BIGINT,
                         type VARCHAR(60),
                         value VARCHAR(255),
                         PRIMARY KEY (user_id, type))
engine = ndb partition by key (user_id);

Then insert some rows:

INSERT INTO user_prefs VALUES (22, "Coffee", "Milk + 6 sugars"),
                              (22, "Eggs", "Over easy"),
                              (23, "Custard", "With skin");

Then we know that the rows with the same user_id in the user_prefs and user_accounts tables will be
stored on the same data node. Again, this helps with distribution awareness. In this example, we are
ensuring that rows related to a single user, as identified by a common user_id, will be located on one
data node, maximising system efficiency, and minimising latency.
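A hedged example of the benefit: because both tables are partitioned by user_id, a query such as the one below touches rows that all live on the same data node, so (with distribution awareness) it can be satisfied without messaging other node groups.

SELECT a.account_type, a.username, p.type, p.value
FROM user_accounts a
JOIN user_prefs p ON p.user_id = a.user_id
WHERE a.user_id = 22;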

Ordered index scan pruning

MySQL Cluster supports arbitrary ordered indexes. Ordered indexes are defined on one or more
columns and support range scan operations. Range scans are defined by supplying optional lower and
upper bounds. All rows between these bounds are returned.

Each Ndb ordered index is implemented as a number of in memory tree structures (index fragments),
distributed with the fragments of the indexed table. Each index fragment contains the index entries for
the local table fragment. Having ordered indexes local to the table fragments makes index maintenance
more efficient, but means that there may not be much locality to exploit when scanning as rows in a
range may be spread across all index fragments of an index.

The only case where an ordered index scan does not need to scan all index fragments is where it is
known that all rows in the range will be found in one table fragment.
This is the case where both :
1. The ordered index has all of the table's distribution keys as a prefix
2. The range is contained within one value of the table's distribution keys

NdbApi detects this case when a range scan is defined, and 'prunes' the scan to one index fragment
(and therefore one data node). For all other cases, all index fragments must be scanned.

Continuing the example above, assuming an ordered index on the primary key, the following ordered
index scans can be pruned:

SELECT * FROM user_accounts WHERE user_id = 22;
SELECT * FROM user_accounts WHERE user_id = 22 AND account_type LIKE 'F%';

However, the following ordered index scans cannot be pruned, as matching rows are not guaranteed to
be stored in one table fragment:

SELECT * FROM user_accounts WHERE account_type = "Facebook";


SELECT * FROM user_accounts WHERE user_id > 20 AND user_id < 30;
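As a sketch, you can also ask MySQLD which partitions a statement would touch; in MySQL versions of this era, EXPLAIN PARTITIONS lists them (a single partition for the prunable query, all partitions for the non-prunable one; the partition names shown depend on the fragment count):

EXPLAIN PARTITIONS SELECT * FROM user_accounts WHERE user_id = 22;
EXPLAIN PARTITIONS SELECT * FROM user_accounts WHERE account_type = "Facebook";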

MySQLD partitioning variants and manually controlling distribution

Since MySQL 5.1, table partitioning has been supported. Tables can be partitioned based on functions
of the distribution keys such as:
KEY
LINEAR KEY
HASH
RANGE
LIST

For engines other than Ndb, partitioning is implemented in the Server, with each partition implemented
as a separate table in the Storage engine. Ndb implements these partition functions natively, using them
to control data distribution across table fragments in a single table.

From Ndb's point of view, KEY and LINEAR KEY are native partitioning functions. Ndb knows how
to determine which table fragment to use for a row from a table's distribution key, based on an MD5
hash of the distribution key.

HASH, RANGE and LIST are not natively supported by Ndb. When accessing tables defined using
these functions, MySQLD must supply information to NdbApi to indicate which fragments to access.
For example before primary key insert, update, delete and read operations, the table fragment to
perform the operation on must be supplied. From MySQLD, the partitioning layer supplies this
information.

Any NdbApi application can use the same mechanisms to manually control data distribution across
table fragments. At the NdbApi level this is referred to as 'User Defined' partitioning. This feature is
rarely used. One downside of using User Defined partitioning is that online data redistribution is not
supported. I'll discuss Online data redistribution in a future post here.
MySQL Cluster Database FAQ - General
1. What is MySQL Cluster?
A: MySQL Cluster is built on the NDB storage engine and provides a highly scalable, real-time, ACID-
compliant transactional database, combining 99.999% availability with the low TCO of open source.
Designed around a distributed, multi-master architecture with no single point of failure, MySQL
Cluster scales horizontally on commodity hardware to serve read and write intensive workloads,
accessed via SQL and NoSQL interfaces.
MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to
service millions of operations per second. Support for in-memory and disk-based data, automatic data
partitioning (sharding) with load balancing and the ability to add nodes to a running cluster with zero
downtime allows linear database scalability to handle the most unpredictable web-based workloads.
2. What is MySQL Cluster Carrier Grade Edition?
A: MySQL Cluster Carrier Grade Edition (CGE) includes tools for the management, monitoring,
security and auditing of the MySQL Cluster database, coupled with access to Oracle Premier Support.
MySQL Cluster CGE is available under a choice of subscription or commercial license and support.
Learn more about MySQL Cluster CGE
3. Who are the customer references for MySQL Cluster?
A: See https://www.mysql.com/customers/cluster/.
4. What is the current version of MySQL Cluster?
A: The current GA version is MySQL Cluster 7.5. MySQL 5.7 is integrated and bundled with MySQL
Cluster.
5. Does MySQL Cluster require any special hardware or software?
A: No, MySQL Cluster is designed to run on commodity hardware. Using specialized hardware such as
InfiniBand network interconnects, one can achieve even higher levels of performance, especially over
large clusters with many nodes.
6. What are the system requirements for MySQL Cluster?
A:
OS: See current list of Supported Platforms
CPU: Intel Xeon E5-2600 v4 (20+ cores/socket)
Memory: 64GB RAM
Storage: 512GB SSD
Network: 1+ nodes (Gigabit Ethernet - TCP/IP)
7. How do I qualify a workload as being a good fit for MySQL Cluster?
A: If you answer "YES" to any of the following questions, then you should consider MySQL Cluster as
an option for your application's database:
Do you need to shard your database to meet growing volumes of write (UPDATE, INSERT,
DELETE) operations?
Do you need to ensure results from SELECT operations are consistent, regardless of which
node they are returned from?
Would a failure of your database result in application downtime that would cause business
disruption, including loss of revenue, loss of reputation, etc?
Would data loss (even if just several seconds worth) during a failover cause business disruption?
Is your user experience sensitive to response times?
Do you need to replicate your database across geographic regions, with each region serving
both read and write operations?
Are you running a diverse range of applications that would benefit from direct access to data,
without always relying on SQL (ie JavaScript with node.js, memcached API, Java and JPA
applications, HTTP/REST web services and C++ apps)?
Does your application primarily consist of "short" transactions (ie 10s of operations per
transaction versus 1000s) executed in parallel?
Does your application mainly consist of:
Primary key database access, with some JOIN operations
versus
Regular execution of full table scans and JOINs returning 10s of thousands of rows?
Please also review our Evaluation Guide to learn more about MySQL Cluster.
8. What are the ideal applications for MySQL Cluster?
A: Ideal applications include:
High volume OLTP
Real time analytics
Ecommerce and financial trading with fraud detection
Mobile and micro-payments
Session management & caching
Feed streaming, analysis and recommendations
Content management and delivery
Massively Multiplayer Online Games
Communications and presence services
Subscriber/user profile management and entitlements
See a full list of MySQL Cluster user case studies and applications.
9. Is MySQL Cluster supported on Virtual Machine environments?
A: Yes. MySQL Cluster is tested and certified on Oracle VM.
10. What are the typical performance metrics for MySQL Cluster?
A:
Availability
99.999% (<5 min downtime/year)
Performance
Response Time: sub 5 millisecond (with synchronous replication and access via SQL).
Faster response times can be achieved using one of the NoSQL access methods.
Throughput of 600,000+ replicated UPDATE operations/sec on a dual-socket Intel
server equipped with 64GB RAM. 1 Billion UPDATE operations per minute across a
cluster of 30 x Intel Servers. See full benchmarks.
Failover
Sub-second failover enables you to deliver service without interruption
Scalability
Scale out, scale up and scale dynamically
For cost-effective scale-out:
Add more application and data nodes per cluster, or
Add more CPU threads (16, 32, 64, etc.) or
Add more Memory (32GB, 64GB, etc.) per data node
11. How many physical servers are needed to create a minimum Cluster configuration?
A: For evaluation and development purposes, you can run all nodes on a single host. For full redundancy
and fault tolerance, you would need a minimum of 6 x physical hosts:
2 x data nodes
2 x SQL/NoSQL Application Nodes
2 x Management Nodes
Many users co-locate the Management and Application nodes which reduces the number of nodes to
four.
12. Can Data Nodes be split geographically?
A: Yes, as long as your network has the characteristics discussed here.
MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to
reduce the effects of geographic latency by pushing data closer to the user, as well as providing a
capability for disaster recovery.
Geographic replication is asynchronous and can be implemented as an Active / Active or Active /
Passive cluster.
Geographic replication is the recommended deployment model for cross data center deployments.
13. What data access APIs exist for MySQL Cluster?
A: Applications can be developed using any MySQL Connectors. MySQL Cluster additionally
provides native NoSQL connectivity via JavaScript, Memcached, C++, Java, JPA and HTTP/REST.
14. Are the interfaces for 32-bit applications different from 64-bit?
A: No, the interfaces are the same.
15. Is MySQL Cluster suitable as an embedded database?
A: Yes, MySQL Cluster is often used as an embedded database by ISVs and Network Equipment
Providers (NEPs). Customer List
16. What is Geographic Replication for MySQL Cluster?
A: Geographic replication enables asynchronous replication across active / active geographically
separate clusters. This is often used for the scale-out of global services, data locality and disaster
recovery.
17. Is Replication bi-directional?
A: Yes, unidirectional and bi-directional replication are supported in MySQL Cluster. Transaction
conflict detection and resolution is provided when using bi-directional geographic replication.
18. When using MySQL Cluster as an in-memory database, does MySQL Cluster risk losing data?
A: MySQL Cluster configurations will typically have at least 2 copies of all data, held on different
hosts. To cover total system failure, transaction logs and checkpoint files are persisted on disk with
configurable frequency. Additionally, non-indexed data may be stored on disk.
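As a hedged sketch of that disk-based option (names and sizes below are illustrative): non-indexed columns can be placed on disk by creating a logfile group and tablespace and then declaring the table with STORAGE DISK; indexed columns remain in memory.

CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'undo1.log' INITIAL_SIZE 128M ENGINE NDBCLUSTER;
CREATE TABLESPACE ts1 ADD DATAFILE 'data1.dat' USE LOGFILE GROUP lg1 INITIAL_SIZE 256M ENGINE NDBCLUSTER;
CREATE TABLE session_store (id BIGINT NOT NULL PRIMARY KEY, payload VARBINARY(4096))
TABLESPACE ts1 STORAGE DISK ENGINE NDBCLUSTER;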
19. Does MySQL Cluster include a diskless option?
A: MySQL Cluster has a diskless option as well as a no-logging option.
For the Diskless option the following restrictions apply:
1. No disk data
2. Loss of data in case of Cluster failure
3. No backup
For the no logging option, Cluster will still create log files, but data will not be checkpointed to disk.

MySQL Cluster Manager FAQ


20. Is MySQL Cluster Manager open source software?
A: No. MySQL Cluster Manager is available only as a part of the commercial MySQL Cluster Carrier
Grade Edition (CGE) database. To purchase subscriptions or licenses for MySQL Cluster CGE, please
contact the MySQL Sales Team.
21. What is MySQL Cluster Manager?
A: MySQL Cluster Manager is software which simplifies the creation and management of the MySQL
Cluster database by automating common management tasks.
22. What are the benefits of MySQL Cluster Manager?
A: By using MySQL Cluster Manager, Database Administrators (DBAs) and Systems Administrators
are more productive, enabling them to focus on strategic IT initiatives and respond more quickly to
changing user requirements. At the same time, the risks of database downtime that previously resulted
from manual configuration errors are significantly reduced.
23. Can you give me a practical example of where MySQL Cluster Manager would help with
productivity and reduce risk of downtime?
A: As an example, management operations requiring rolling restarts of a MySQL Cluster database, which
previously demanded 46 manual commands and consumed 2.5 hours of DBA time, can now be
performed with a single command and are fully automated with MySQL Cluster Manager, serving to
reduce:
Management complexity and overhead;
Risk of downtime through the automation of configuration and change management processes;
Custom scripting of management commands or developing and maintaining in-house
management tools.
24. What sort of management functionality does MySQL Cluster Manager provide?
A: Administrators are able to create and delete entire clusters and start, stop and restart the cluster with
a single command, as well as add nodes on-line. As a result, administrators no longer need to manually
restart each data node in turn, in the correct sequence, or to create custom scripts to automate the
process.
MySQL Cluster Manager automates on-line management operations, including the upgrade, downgrade
and reconfiguration of running clusters, without interrupting applications or clients accessing the
database. Administrators no longer need to manually edit configuration files and distribute them to all
other cluster nodes, or to determine if rolling restarts are required. MySQL Cluster Manager handles all
of these tasks, thereby enforcing best practices and making on-line operations significantly simpler,
faster and less error-prone.
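To make the single-command claim concrete, here is a hedged sketch of MySQL Cluster Manager client commands (host names, package and cluster names are placeholders, and the surrounding site/package setup is omitted; consult the MCM manual for the exact grammar of your release). The upgrade command performs the rolling restart of every node automatically.

mcm> create cluster --package=cluster_7_5 --processhosts=ndb_mgmd@host1,ndbd@host2,ndbd@host3,mysqld@host4,mysqld@host5 mycluster;
mcm> start cluster mycluster;
mcm> upgrade cluster --package=cluster_7_6 mycluster;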
25. Does MySQL Cluster Manager manage the entire cluster or just individual nodes within it?
A: It can do both. MySQL Cluster Manager provides the ability to control the entire cluster as a single
entity, while also supporting very granular control down to individual processes within the cluster
itself.
26. What sort of monitoring functionality does MySQL Cluster Manager provide?
A: MySQL Cluster Manager is able to monitor cluster health at both an Operating System and per-
process level by automatically polling each node in the cluster. It can detect if a process or server host
is alive, dead or has hung, allowing for faster problem detection, resolution and recovery.
27. Many of the capabilities of MySQL Cluster Manager are already available or can be scripted, so
what is the benefit?
A: MySQL Cluster Manager integrates and extends existing management functionality by automating
tasks that were previously performed manually by an administrator. As demonstrated in the example
above, a process that required 46 manual commands is now reduced to a single command, with each
process step being fully automated.
In terms of scripting or even developing a custom management system, it is time consuming, costly and
potentially error-prone to manually develop, test and maintain such projects. For many maintenance
activities, the need for this type of activity is eliminated with MySQL Cluster Manager.
Through automation, MySQL Cluster Manager simplifies cluster management, while reducing cost,
risk and effort.
28. Can MySQL Cluster Manager recover failed nodes in a cluster?
A: Yes. MySQL Cluster itself has the capability to self-heal from failures by automatically restarting
failed Data Nodes, without manual intervention. MySQL Cluster Manager extends this functionality by
also monitoring and automatically recovering SQL and Management Nodes. This supports a more
seamless and complete self-healing of the Cluster to fully restore operations and capacity to
applications.
29. So MySQL Cluster Manager can manage, monitor and recover all nodes within a cluster?
A: Yes, with the exception of application nodes using the native NDB API (i.e. nodes accessing the
Cluster directly via the C++ API, Cluster Connector for Java, OpenLDAP, and other direct interfaces).
30. Will the failure of a MySQL Cluster Manager agent impact the availability of the MySQL Cluster
database?
A: No. To ensure high availability operation, MySQL Cluster Manager is decoupled from the actual
database processes, so if a management agent stops or is upgraded, it does not impact the running
database in any way. MySQL Cluster Manager continues to operate across surviving nodes when any
given agent or the associated host is not available.
31. How is MySQL Cluster Manager implemented with the MySQL Cluster database?
A: MySQL Cluster Manager is implemented as a set of agents, one running on each physical host that
will contain MySQL Cluster nodes (processes) to be managed. The administrator connects the regular
mysql client to any one of these agents, and the agents then communicate and work with each other to
perform operations across the nodes making up the Cluster.
32. How does MySQL Cluster Manager impact previous approaches to managing MySQL Cluster?
A: When using MySQL Cluster Manager to manage a MySQL Cluster deployment, the administrator
no longer edits the configuration files (for example config.ini and my.cnf); instead, these files are
created and maintained by the agents. In fact, if those files are manually edited, the changes will be
overwritten by the configuration information which is held within the agents.
All processes making up the MySQL Cluster deployment are started, restarted and stopped by MySQL
Cluster Manager. This includes the data nodes, management nodes and MySQL Server nodes.
Similarly, when using MySQL Cluster Manager, management actions must not be performed by the
administrator using the ndb_mgm command (which connects directly to the management node, meaning
that the agents themselves would not have visibility of any operations performed with it).
33. Do I still need management nodes within my cluster?
A: The introduction of MySQL Cluster Manager does not remove the need for management nodes; in
particular, they continue to perform a number of critical roles:
When data nodes start up (or are restarted) they connect to the management node(s) to retrieve
their configuration data (the management node in turn fetches that data from the configuration
files created by the agents);
When stopping or restarting a data node through MySQL Cluster Manager, the state change is
actually performed by the management node;
The management node(s) can continue to act as arbitrators (avoiding a split-brain scenario). For
this reason, it is important to continue to run those processes on separate hosts from the data
nodes;
Some reporting information (for example, memory usage) is not yet available in MySQL
Cluster Manager; such reports can still be obtained using the ndb_mgm tool.
34. Can MySQL Cluster Manager automatically restart failed agents?
A: There is no angel process for the agents themselves, so for the highest levels of availability the
administrator may choose to use a process monitor to detect the failure of an agent and automatically
restart it, for example by creating a script in /etc/init.d.
35. Can recovered MySQL Cluster Manager agents automatically resynchronize with other agents?
A: Yes. As management agents restart, they are automatically re-synchronized with the other running
management agents to ensure configuration consistency across the entire cluster, without administrator
intervention.
36. Can MySQL Cluster Manager persist configuration data across restarts?
A: Yes. All MySQL Cluster configuration information and process identifiers are persisted to disk,
enabling them to survive system failures or re-starts of the MySQL Cluster Manager.
37. How does MySQL Cluster Manager ensure the cluster configuration remains consistent across all
nodes in the cluster?
A: MySQL Cluster Manager supports asynchronous communication between each management agent
in order to reliably propagate reconfiguration requests. As a result, configurations remain consistent
across all nodes in the cluster.
Any changes are only committed when all nodes confirm they have received the re-configuration
request. If one or more nodes fail to receive the request, then an error is reported back to the client. By
automating the coordination of re-configuration requests, opportunities for errors resulting from
manually distributing configuration files are eliminated.
38. Which platforms are supported by MySQL Cluster Manager?
A: Please refer to the Supported Platforms page.
39. Which releases of the MySQL Cluster database are supported by MySQL Cluster Manager?
A: MySQL Cluster 6.3 and above.
40. Where can I learn more about MySQL Cluster Manager?
A: From the following resources:
MySQL Cluster Manager whitepaper
