The filters are written as Lua scripts. The names of the script file, the module, and the functions
vary depending on which of these filtering points is to be used. Fig. 2 shows these differences. In all
cases, the scripts are stored in the following folder: <mysql-base-directory>/ext/replication.
This article creates two scripts, one for each of two slave servers. In both cases the filter script is
executed after an update is read from the relay log. One slave will discard any statement of the form
INSERT INTO <table-name> SET sub_id = 401 by searching for the substring sub_id = X and
discarding the statement where X is even, while the second slave will discard any where X is odd.
Any statement that doesn't include this pattern will be allowed through.
Fig. 3 Implementation of odd/even sharded replication
If a script returns TRUE, the statement is discarded; if it returns FALSE, the replication process
continues. Fig. 3 shows the architecture and pseudo-code for the odd/even replication sharding.
slave-odd: <mysql-base-directory>/ext/replication/relay_log.lua

function after_read(event)
    local m = event.query
    if m then
        -- match "sub_id = N" or "sub_id=N" and capture the numeric value
        local id = string.match(m, "sub_id%s*=%s*(%d+)")
        if id then
            -- discard (return true) when the subscriber id is even
            return tonumber(id) % 2 == 0
        end
    end
    -- no sub_id in the statement: let it replicate
    return false
end
slave-even: <mysql-base-directory>/ext/replication/relay_log.lua
function after_read(event)
    local m = event.query
    if m then
        -- match "sub_id = N" or "sub_id=N" and capture the numeric value
        local id = string.match(m, "sub_id%s*=%s*(%d+)")
        if id then
            -- discard (return true) when the subscriber id is odd
            return tonumber(id) % 2 == 1
        end
    end
    -- no sub_id in the statement: let it replicate
    return false
end
Replication can then be set up as normal, as described in Setting up MySQL Asynchronous Replication
for High Availability, with the exception that we use two slaves rather than one.
Once replication has been started on both of the slaves, the database and tables should be created; note
that, for some reason, the creation of the tables isn't replicated to the slaves when using this preview
load, and so the tables actually need to be created three times:
mysql-master> CREATE DATABASE clusterdb;
mysql-master> USE clusterdb;
mysql-master> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR(30)) engine=innodb;
mysql-master> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;

mysql-slave-odd> USE clusterdb;
mysql-slave-odd> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR(30)) engine=innodb;
mysql-slave-odd> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;

mysql-slave-even> USE clusterdb;
mysql-slave-even> CREATE TABLE sys1 (code INT NOT NULL PRIMARY KEY, country VARCHAR(30)) engine=innodb;
mysql-slave-even> CREATE TABLE subs1 (sub_id INT NOT NULL PRIMARY KEY, code INT) engine=innodb;
The data can then be added on the master, and the two slaves can then be checked to validate that the
filters behaved as expected:
mysql-master> INSERT INTO sys1 SET code=33, country="France";
mysql-master> INSERT INTO sys1 SET code=44, country="UK";
mysql-master> INSERT INTO subs1 SET sub_id=401, code=44;
mysql-master> INSERT INTO subs1 SET sub_id=402, code=33;
mysql-master> INSERT INTO subs1 SET sub_id=976, code=33;
mysql-master> INSERT INTO subs1 SET sub_id=981, code=44;
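Based on the filter logic above, we would expect the odd sub_id rows (401, 981) on one slave and the even rows (402, 976) on the other, while the sys1 rows, which contain no sub_id pattern, reach both slaves. A quick check along these lines (a sketch; the expected results assume the filters work as described):

```sql
-- On the slave that discards even sub_id values:
mysql-slave-odd> SELECT sub_id FROM subs1 ORDER BY sub_id;   -- expect 401 and 981
-- On the slave that discards odd sub_id values:
mysql-slave-even> SELECT sub_id FROM subs1 ORDER BY sub_id;  -- expect 402 and 976
-- The system table is not filtered, so both slaves hold all of it:
mysql-slave-odd> SELECT COUNT(*) FROM sys1;                  -- expect 2
```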
Fig. 4 illustrates this splitting of data between the two slaves: all rows from the system table are stored
in both databases (as well as in the master), while the data in the subscriber table (and it would work
for multiple subscriber tables too) is partitioned between the two databases, odd values in one and even
in the other. Obviously, this could be extended to more slaves by changing the checks in the scripts.
As an illustration of how this example could be useful, all administrative data could be provisioned into
and maintained by the master (both system and subscriber data). Each slave could then serve a subset
of the subscribers, providing read access to the administrative data and read/write access for the more
volatile subscriber data (which is mastered on the slave). In this way, there can be a central point to
manage the administrative data while being able to scale out to multiple databases to provide
maximum capacity and performance to the applications. For example, in a telco environment, you
might filter rows by comparing a subscriber's phone number to a set of area codes so that local
subscribers are accessed from the local database, minimising latency.
From a data integrity perspective, this approach is safe if (and only if) the partitioning rules ensure that
all related rows are on the same slave (in our example, all rows from all tables for a particular
subscriber will be on the same slave, so as long as we don't need transactional consistency between
different subscribers, this should be safe).
Fig. 5 Partitioned replication for MySQL Cluster
As mentioned previously, this software preview doesn't work with MySQL Cluster, but looking forward
to when it does, the example could be extended by having each of the slave servers be part of the same
Cluster. In this case, the partitioned data would be consolidated back into a single database (for this
scenario, you would likely configure just one server to act as the slave for the system data). On the face
of it, this would be a futile exercise, but in cases where performance bottlenecks on the throughput
of a single slave server, this might be a way to horizontally scale the replication performance for
applications that make massive numbers of database writes.
Discussion
MySQL currently supports two different solutions for creating a high availability environment and
achieving multi-server scalability.
MySQL Replication
The first form is replication, which MySQL has supported since version 3.23. Replication in
MySQL is currently implemented as an asynchronous master-slave setup that uses a logical
log-shipping backend.
A master-slave setup means that one server is designated to act as the master and is then required to
receive all of the write queries. The master executes and logs the queries, which are then shipped to
the slaves to execute, keeping the same data across all of the replication members.
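As a concrete sketch of this setup, a slave is pointed at its master and then started with statements like the following (the hostname, credentials, and binary log coordinates are placeholders and would be taken from SHOW MASTER STATUS on the master):

```sql
-- Run on the slave; all values below are placeholders
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```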
Replication is asynchronous, which means that the slave server is not guaranteed to have the data when
the master performs the change. Normally, replication will be as real-time as possible. However, there
is no guarantee about the time required for the change to propagate to the slave.
Replication can be used for many reasons. Some of the more common reasons include scalability,
server failover, and backup.
Scalability can be achieved because SELECT queries can be run against any of the slaves. Write
statements, however, are generally not improved, because writes have to occur on each of the
replication members.
Failover can be implemented fairly easily using an external monitoring utility that uses a heartbeat or
similar mechanism to detect the failure of a master server. MySQL does not currently do automatic
failover, as the logic is generally very application-dependent. Keep in mind that because replication is
asynchronous, it is possible that not all of the changes made on the master will have propagated to the
slave.
MySQL replication works very well even across slower connections, and with connections that aren't
continuous. It can also be used across different hardware and software platforms, and with most
storage engines, including MyISAM and InnoDB.
MySQL Cluster
MySQL Cluster is a shared nothing, distributed, partitioning system that uses synchronous replication
in order to maintain high availability and performance.
MySQL Cluster is implemented through a separate storage engine called NDB Cluster. This storage
engine will automatically partition data across a number of data nodes. The automatic partitioning of
data allows for parallelization of queries that are executed. Both reads and writes can be scaled in this
fashion since the writes can be distributed across many nodes.
Internally, MySQL Cluster also uses synchronous replication in order to remove any single point of
failure from the system. Since two or more nodes are always guaranteed to have the data fragment, at
least one node can fail without any impact on running transactions. Failure detection is handled
automatically, with the dead node being removed transparently to the application. Upon restart, a node
will automatically be re-integrated into the cluster and begin handling requests as soon as possible.
There are a number of limitations that currently exist and have to be kept in mind while deciding if
MySQL Cluster is the correct solution for your situation.
Currently, all of the data and indexes stored in MySQL Cluster are held in main memory across the
cluster. This restricts the size of the database based on the systems used in the cluster. Work is
underway to allow data to be stored on disk, and this will most likely appear in MySQL version 5.1.
MySQL Cluster is designed to be used on an internal network as latency is very important for response
time. As a result, it is not possible to run a single cluster across a wide geographic distance. In addition,
while MySQL Cluster will work over commodity network setups, in order to attain the highest
performance possible special clustering interconnects can be used.
What is MySQL Cluster?
MySQL Cluster is a technology which provides shared-nothing clustering capabilities for the
MySQL database management system. It was first included in the production release of MySQL
4.1 in November 2004. (en.wikipedia.org/wiki/MySQL_Cluster)
The management client command ALL REPORT MEMORYUSAGE can also be used to check how the
data is spread across the data nodes.
One of the defining features of MySQL Cluster is that it keeps the fragments up and running at all
times and restores them after a Cluster crash. MySQL Cluster also supports adding nodes to the
Cluster while it is operational, which means that we can add nodes to a running Cluster and repartition
the tables during normal operation. This is part of standard MySQL Cluster and is used by many users
and customers to increase the size of their production Clusters.
Table fragments
Tables are 'horizontally fragmented' into table fragments each containing a disjoint subset of the rows
of the table. The union of rows in all table fragments is the set of rows in the table. Rows are always
identified by their primary key. Tables with no primary key are given a hidden primary key by
MySQLD.
By default, one table fragment is created for each data node in the cluster at the time the table is
created.
The data nodes in a cluster are logically divided into Node groups. The size of each Node group is
controlled by the NoOfReplicas parameter. All data nodes in a Node group store the same data. In other
words, where the NoOfReplicas parameter is two or greater, each table fragment has a number of
replicas, stored on multiple separate data nodes in the same nodegroup for availability.
One replica of each fragment is considered primary, and the other(s) are considered backup replicas.
Normally, each node contains a mix of primary and backup fragments for every table, which
encourages system balance.
The primary fragment replica is used to serialise locking between transactions concurrently accessing
the same row. Write operations update all fragment replicas synchronously, ensuring no committed data
loss on node failure. Read operations normally access the primary fragment replica, ensuring
consistency. Reads with a special lock mode can access the backup fragment replicas.
When an NdbApi client (for example a MySQLD process) wants to read a row by primary key, it sends
a read request to a data node acting as a Transaction Coordinator (TC).
The TC node will determine which fragment the row would be stored in from the primary key, decide
which replica to access (usually the primary), and send a read request to the data node containing that
fragment replica. The data node containing the fragment replica then sends the row's data (if present)
directly back to the requesting NdbApi client, and also sends a read acknowledgement or failure
notification back to the TC node, which also propagates it back to the NdbApi client.
The 'critical path' for this protocol, in terms of potential inter-data-node hops, is four hops:
Client -> TC -> Fragment -> TC -> Client
To minimise the latency experienced by the client, ideally two inter-node hops can be avoided by
having the TC node and the fragment replica(s) on the same node. This requires controlling the choice
of TC node based on the primary key of the data which will be read. Where a transaction only reads
rows stored on the same node as its TC, this can improve latency and system efficiency.
Distribution awareness
From NdbApi, users can specify a table and key when starting a transaction. The transaction will then
choose a TC data node based on where the corresponding row's primary fragment replica is located in
the system. This mechanism is sometimes referred to as 'transaction hinting'.
The Ndb handler in MySQLD generally waits for the first primary key lookup in a user session before
starting an NdbApi transaction, so that it can choose a TC node based on this. This is a best-effort
attempt at having the data node acting as TC colocated with the accessed data. This feature is usually
referred to as 'Distribution Awareness'.
Write operations also benefit from distribution awareness, but not to the same extent in systems with
NoOfReplicas > 1. Write operations must update all fragment replicas, which must be stored on
different nodes, in the same nodegroup, so for NoOfReplicas > 1, distribution awareness avoids inter-
node-group communication, and some intra-node-group communication, but some inter-data-node
communication is always required. In a system with good data partitioning and distribution awareness,
most read transactions will access only one data node, and write transactions will result in messaging
between the data nodes of a single node group. Messaging between node groups will be minimal.
Distribution keys
By default, the whole of a table's primary key is used to determine which fragment replica will store a
row. However, any subset of the columns in the primary key can be used. The key columns used to
determine the row distribution are called the 'distribution key'.
Where a table's primary key contains only one column, the distribution key must be the full primary
key. Where the primary key has more than one column, the distribution key can be different to (a subset
of) the primary key.
From MySQLD, a distribution key can be set using the normal PARTITION BY KEY() syntax. The
effect of using a distribution key which is a subset of the primary key is that rows with different
primary key values, but the same distribution key values are guaranteed to be stored in the same table
fragment.
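The example table this passage refers to is not shown in this copy; a definition consistent with the queries used later in the text might look like the following (the balance column and the exact types are assumptions):

```sql
CREATE TABLE user_accounts (
  user_id INT NOT NULL,
  account_type VARCHAR(16) NOT NULL,
  balance INT,
  PRIMARY KEY (user_id, account_type)
) ENGINE=ndb
PARTITION BY KEY (user_id);  -- distribution key is a subset of the primary key
```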
Then we know that all rows with the same value(s) for the distribution key (user_id) will be stored on
the same fragment. If we know that individual transactions are likely to access rows with the same
distribution key value, then this will increase the effectiveness of distribution awareness. Many
schemas are 'partitionable' like this, though not all.
Note that partitioning is a performance hint in Ndb - correctness is not affected in any way, and
transactions can always span table fragments on the same or different data nodes. This allows
applications to take advantage of the performance advantages of distribution awareness without
requiring that all transactions affect only one node etc as required by simpler 'sharding' mechanisms.
A further guarantee from Ndb is that two tables with the same number of fragments, and the same
number and type of distribution keys will have rows distributed in the same way.
For example, if we add another table:
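The second table definition is also missing from this copy; one consistent with the surrounding text (the column names other than user_id are assumptions) would be:

```sql
CREATE TABLE user_prefs (
  user_id INT NOT NULL,
  pref_name VARCHAR(32) NOT NULL,
  pref_value VARCHAR(32),
  PRIMARY KEY (user_id, pref_name)
) ENGINE=ndb
PARTITION BY KEY (user_id);  -- same number and type of distribution keys as user_accounts
```

Both tables have a single INT column as their distribution key, so rows sharing a user_id value are distributed in the same way.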
Then we know that the rows with the same user_id in the user_prefs and user_accounts tables will be
stored on the same data node. Again, this helps with distribution awareness. In this example, we are
ensuring that rows related to a single user, as identified by a common user_id, will be located on one
data node, maximising system efficiency, and minimising latency.
MySQL Cluster supports arbitrary ordered indexes. Ordered indexes are defined on one or more
columns and support range scan operations. Range scans are defined by supplying optional lower and
upper bounds. All rows between these bounds are returned.
Each Ndb ordered index is implemented as a number of in memory tree structures (index fragments),
distributed with the fragments of the indexed table. Each index fragment contains the index entries for
the local table fragment. Having ordered indexes local to the table fragments makes index maintenance
more efficient, but means that there may not be much locality to exploit when scanning as rows in a
range may be spread across all index fragments of an index.
The only case where an ordered index scan does not need to scan all index fragments is where it is
known that all rows in the range will be found in one table fragment.
This is the case when both of the following hold:
1. The ordered index has all of the table's distribution keys as a prefix
2. The range is contained within one value of the table's distribution keys
NdbApi detects this case when a range scan is defined, and 'prunes' the scan to one index fragment
(and therefore one data node). For all other cases, all index fragments must be scanned.
Continuing the example above, assuming an ordered index on the primary key, the following ordered
index scans can be pruned:
SELECT * FROM user_accounts WHERE user_id = 22;
SELECT * FROM user_accounts WHERE user_id = 22 AND account_type LIKE 'F%';
However, the following ordered index scans cannot be pruned, as matching rows are not guaranteed to
be stored in one table fragment :
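The examples of non-prunable scans are missing from this copy; assuming the same user_accounts table with user_id as its distribution key, queries of this shape could not be pruned:

```sql
-- No constraint on the distribution key: every index fragment must be scanned
SELECT * FROM user_accounts WHERE account_type = 'SAVINGS';
-- The range spans more than one distribution key value
SELECT * FROM user_accounts WHERE user_id BETWEEN 22 AND 30;
```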
Table partitioning has been supported since MySQL 5.1. Tables can be partitioned based on functions
of the distribution keys such as:
KEY
LINEAR KEY
HASH
RANGE
LIST
For engines other than Ndb, partitioning is implemented in the Server, with each partition implemented
as a separate table in the Storage engine. Ndb implements these partition functions natively, using them
to control data distribution across table fragments in a single table.
From Ndb's point of view, KEY and LINEAR KEY are native partitioning functions. Ndb knows how
to determine which table fragment to use for a row from a table's distribution key, based on an MD5
hash of the distribution key.
HASH, RANGE and LIST are not natively supported by Ndb. When accessing tables defined using
these functions, MySQLD must supply information to NdbApi to indicate which fragments to access.
For example before primary key insert, update, delete and read operations, the table fragment to
perform the operation on must be supplied. From MySQLD, the partitioning layer supplies this
information.
Any NdbApi application can use the same mechanisms to manually control data distribution across
table fragments. At the NdbApi level this is referred to as 'User Defined' partitioning. This feature is
rarely used. One downside of using User Defined partitioning is that online data redistribution is not
supported. I'll discuss Online data redistribution in a future post here.
MySQL Cluster Database FAQ - General
1. What is MySQL Cluster?
A: MySQL Cluster is built on the NDB storage engine and provides a highly scalable, real-time, ACID-
compliant transactional database, combining 99.999% availability with the low TCO of open source.
Designed around a distributed, multi-master architecture with no single point of failure, MySQL
Cluster scales horizontally on commodity hardware to serve read and write intensive workloads,
accessed via SQL and NoSQL interfaces.
MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to
service millions of operations per second. Support for in-memory and disk-based data, automatic data
partitioning (sharding) with load balancing and the ability to add nodes to a running cluster with zero
downtime allows linear database scalability to handle the most unpredictable web-based workloads.
2. What is MySQL Cluster Carrier Grade Edition?
A: MySQL Cluster Carrier Grade Edition (CGE) includes tools for the management, monitoring
security and auditing of the MySQL Cluster database, coupled with access to Oracle Premier Support.
MySQL Cluster CGE is available under a choice of subscription or commercial license and support.
Learn more about MySQL Cluster CGE
3. Who are the customer references for MySQL Cluster?
A: See https://www.mysql.com/customers/cluster/.
4. What is the current version of MySQL Cluster?
A: The current GA version is MySQL Cluster 7.5. MySQL 5.7 is integrated and bundled with MySQL
Cluster.
5. Does MySQL Cluster require any special hardware or software?
A: No, MySQL Cluster is designed to run on commodity hardware. Using specialized hardware, such
as InfiniBand network interconnects, one can achieve even higher levels of performance, especially
over large clusters with many nodes.
6. What are the system requirements for MySQL Cluster?
A:
OS: See the current list of Supported Platforms
CPU: Intel Xeon E5-2600 v4 (20+ cores/socket)
Memory: 64GB RAM
Storage: 512GB SSD
Network: 1+ nodes (Gigabit Ethernet - TCP/IP)
7. How do I qualify a workload as being a good fit for MySQL Cluster?
A: If you answer "YES" to any of the following questions, then you should consider MySQL Cluster as
an option for your application's database:
Do you need to shard your database to meet growing volumes of write (UPDATE, INSERT,
DELETE) operations?
Do you need to ensure results from SELECT operations are consistent, regardless of which
node they are returned from?
Would a failure of your database result in application downtime that would cause business
disruption, including loss of revenue, loss of reputation, etc?
Would data loss (even if just several seconds worth) during a failover cause business disruption?
Is your user experience sensitive to response times?
Do you need to replicate your database across geographic regions, with each region serving
both read and write operations?
Are you running a diverse range of applications that would benefit from direct access to data,
without always relying on SQL (i.e. JavaScript with node.js, the memcached API, Java and JPA
applications, HTTP/REST web services and C++ apps)?
Does your application primarily consist of "short" transactions (i.e. tens of operations per
transaction versus thousands) executed in parallel?
Does your application mainly consist of:
Primary key database access, with some JOIN operations
versus
Regular execution of full table scans and JOINs returning tens of thousands of rows?
Please also review our Evaluation Guide to learn more about MySQL Cluster.
8. What are the ideal applications for MySQL Cluster?
A: Ideal applications include:
High volume OLTP
Real time analytics
Ecommerce and financial trading with fraud detection
Mobile and micro-payments
Session management & caching
Feed streaming, analysis and recommendations
Content management and delivery
Massively Multiplayer Online Games
Communications and presence services
Subscriber/user profile management and entitlements
See a full list of MySQL Cluster user case studies and applications.
9. Is MySQL Cluster supported on Virtual Machine environments?
A: Yes. MySQL Cluster is tested and certified on Oracle VM.
10. What are the typical performance metrics for MySQL Cluster?
A:
Availability
99.999% (<5 min downtime/year)
Performance
Response Time: sub 5 millisecond (with synchronous replication and access via SQL).
Faster response times can be achieved using one of the NoSQL access methods.
Throughput of 600,000+ replicated UPDATE operations/sec on a dual-socket Intel
server equipped with 64GB RAM. 1 Billion UPDATE operations per minute across a
cluster of 30 x Intel Servers. See full benchmarks.
Failover
Sub-second failover enables you to deliver service without interruption
Scalability
Scale out, scale up and scale dynamically
For cost-effective scale-out:
Add more application and data nodes per cluster, or
Add more CPU threads (16, 32, 64, etc.) or
Add more Memory (32GB, 64GB, etc.) per data node
11. How many physical servers are needed to create a minimum Cluster configuration?
A: For evaluation and development purposes, you can run all nodes on a single host. For full
redundancy and fault tolerance, you would need a minimum of 6 physical hosts:
2 x data nodes
2 x SQL/NoSQL Application Nodes
2 x Management Nodes
Many users co-locate the Management and Application nodes which reduces the number of nodes to
four.
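As a sketch, a cluster configuration file for the fully redundant six-host layout above might look like this (the hostnames are placeholders, and many more parameters are available):

```ini
# config.ini - read by the management nodes
[ndbd default]
NoOfReplicas=2               # two copies of each table fragment

[ndb_mgmd]
HostName=mgm1.example.com

[ndb_mgmd]
HostName=mgm2.example.com

[ndbd]
HostName=data1.example.com

[ndbd]
HostName=data2.example.com

[mysqld]
HostName=sql1.example.com

[mysqld]
HostName=sql2.example.com
```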
12. Can Data Nodes be split geographically?
A: Yes, as long as your network has the characteristics discussed here.
MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to
reduce the effects of geographic latency by pushing data closer to the user, as well as providing a
capability for disaster recovery.
Geographic replication is asynchronous and can be implemented as an Active / Active or Active /
Passive cluster.
Geographic replication is the recommended deployment model for cross data center deployments.
13. What data access APIs exist for MySQL Cluster?
A: Applications can be developed using any MySQL Connectors. MySQL Cluster additionally
provides native NoSQL connectivity via JavaScript, Memcached, C++, Java, JPA and HTTP/REST.
14. Are the interfaces for 32-bit applications different from 64-bit?
A: No, the interfaces are the same.
15. Is MySQL Cluster suitable as an embedded database?
A: Yes, MySQL Cluster is often used as an embedded database by ISVs and Network Equipment
Providers (NEPs). Customer List
16. What is Geographic Replication for MySQL Cluster?
A: Geographic replication enables asynchronous replication across active / active geographically
separate clusters. This is often used for the scale-out of global services, data locality and disaster
recovery.
17. Is Replication bi-directional?
A: Yes, unidirectional and bi-directional replication are supported in MySQL Cluster. Transaction
conflict detection and resolution is provided when using bi-directional geographic replication.
18. When using MySQL Cluster as an in-memory database, does MySQL Cluster risk losing data?
A: MySQL Cluster configurations will typically have at least 2 copies of all data, held on different
hosts. To cover total system failure, transaction logs and checkpoint files are persisted on disk with
configurable frequency. Additionally, non-indexed data may be stored on disk.
19. Does MySQL Cluster include a diskless option?
A: MySQL Cluster has a diskless option as well as a no-logging option.
For the Diskless option the following restrictions apply:
1. No disk data
2. Loss of data in case of Cluster failure
3. No backup
For the no logging option, Cluster will still create log files, but data will not be checkpointed to disk.