Monday, 13 April 2015

MySQL 5.7.6: It is easier to switch master now!

Introduction
One of the primary objectives of MySQL replication is providing an easy failover process i.e. switching to a redundant system if the primary MySQL server fails. In MySQL this translates to switching to the most appropriate slave server in the eventuality of a failure of the master server.
The promotion of a candidate slave to become a new master is based on a lot of factors, undoubtedly the most important factor being that the chosen slave should be most up to date with the primary server (the old master) before we lost it! This blog explains how to use new features in MySQL 5.7.4 and later to make failover easier.
To find the most up to date slave server, a failover script looks at the set of transactions received by the slave and compares it with every other slave to find the one that has received the biggest set of transactions. There could be more sophisticated ways of doing this, for instance you could choose one preferred slave that you want to promote (perhaps it has a better hardware configuration, it has no filters, physical location etc) and make sure it receives every transaction that has been received by all the other slaves. For simplicity though, let’s narrow down our definition of the most appropriate slave to promote to be the one that is most up to date with the lost master.
How to failover using GTID based replication
To denote a set of transactions, MySQL uses GTID sets (a set of global transaction identifiers). To read more about GTIDs, you can refer to our official documentation or developer blogs. To find the set of transactions received by a MySQL slave server, you simply execute:
mysql> SELECT RECEIVED_TRANSACTION_SET FROM peformance_schema.replication_connection_status;
+------------------------------------------+
| RECEIVED_TRANSACTION_SET                 |
+------------------------------------------+
| 4D8B564F-03F4-4975-856A-0E65C3105328:1-4 |
+------------------------------------------+
Execute this on every slave and compare the sets to find the slave with largest received transaction set- that’s your candidate slave for promotion. Let us call this slave the new master. Before you switch a slave to replicate from the new master, you earlier had to make sure all the transactions the slave has received are executed. In versions of MySQL prior to 5.7.4, you needed to do the following steps:
  1. stop slave.
  2. start slave to replicate until all received transactions are executed.
  3. wait until all received transactions are executed.
  4. switch master to redirect slaves to replicate from new master.
  5. start slave
This would translate to the following MySQL commands:
  1. STOP SLAVE;
  2. START SLAVE UNTIL SQL_AFTER_GTIDS= <received_transaction_set>;
  3. SELECT WAIT_FOR_EXECUTED_GTID_SET(<received_transaction_set>);
  4. CHANGE MASTER TO <new_master_def>;
  5. START SLAVE;
However, in MySQL-5.7.4, we introduced a feature which allows one to selectively stop only that component of replication which requires a change. This means that in the present context, to switch to a new master we only need to stop the receiver module (or in technical terms the I/O thread). The applier threads can continue applying transactions, if there are any pending, while we switch master. Building on this infrastructure we can now reduce the above steps to the following:
  1. Stop the receiver module (STOP SLAVE IO_THREAD).
  2. Switch master (CHANGE MASTER TO <new_master_def>).
  3. Start the receiver module (START SLAVE IO_THREAD).
Note the removal of the wait function (wait_for_gtid_executed_set) to ensure that the received transactions are executed before you switch master. There is no need for that step anymore!
How to failover using non-GTID based replication
If you are not using GTIDs, though we highly recommend you should, you can still take advantage of the current improvements. This means you can change the following part of your failover script:
  1. STOP SLAVE
  2. SHOW SLAVE STATUS to get coordinates (Read_Master_Log_Pos, Master_Log_File)
  3. START SLAVE UNTIL coordinates saved in step (2).
  4. SELECT MASTER_POS_WAIT (coordinates,...).
  5. CHANGE MASTER TO <new_master_def>.
  6. START SLAVE.
to the following simpler steps:
  1. STOP SLAVE IO_THREAD;
  2. CHANGE MASTER TO <new_master_def>;
  3. START SLAVE IO_THREAD.
Conclusion
Using the improvements in the newer versions of MySQL 5.7, failover becomes easier. Our effort to improve MySQL high availability continues and we remain committed to easing the processes. Please do try this, and as always let us know your feedback. You can also use mysqlfailover or MySQL Fabric that automate the failover process.

Sunday, 22 March 2015

MySQL-5.7.6: Introducing Multi-source replication

On March 10, 2015, we released MySQL-5.7.6 and among many other things it includes multi-source replication which provides the ability for a MySQL slave server to replicate from more than one MySQL master. We have been very careful with multi-source replication in terms of what exactly our users want and we have tried our best to incorporate as much feedback as possible. In the process, we released the feature twice under MySQL Labs asking users to try it out and tell us:
  1. If this caters to all their use cases,
  2. Plays well with the existing applications using single source replication so our users don’t have to change their old scripts at all if they do not intend to use multi-source and instead stick to single source replication.
  3. The user interface is in sync with our naming conventions to date, easy to understand and intuitive etc.
Note that the look and feel changed as we moved from one lab release to another and finally to the latest version as MySQL-5.7.6. In this post, I aim to introduce the released feature, the commands and how you can monitor the multi-source replication internals. Let's start with the following figure that best illustrates the core of multi-source replication.
multi-source-basic
In the figure above we have three MySQL sources (independent MySQL servers- master 1, master 2 and master 3) replicating to the only slave acting as a sink to collect all the data from all the three sources.
The use cases of multi-source, as you have probably already guessed, are related to data aggregation. Note that there is no conflict detection or resolution built into multi-source replication. We expect the application to make sure that data coming from different sources are non-conflicting. A typical setup could be something like this:
multi-source-with-databases
 
The same concept could be extended to shards (instead of databases). So another use case of multi-source replication is to join shards and make a full table on the sink (aka slave). .
To understand how to configure and monitor multi-source replication, we introduced the notion of channels. A channel is an abstraction of the internals of MySQL replication's finer details. It hides the machinery underneath while providing the level of detail that helps the user manage and understand multi-source replication. From a user perspective you can imagine a channel as a pipe between a master and a slave. If there are multiple masters, there are the same number of channels (or pipes) emerging out of the slave server as the number of sources as shown in the picture below:
multiple-channels
 
If you understand MySQL internals already and want to know exactly what constitutes a channel, look at the pink strip in the following picture. The replication channel documentation has all the details. But if you don’t know these details already, ignore this figure and move ahead. After all that is what we wanted to achieve with the concept of a channel.
channel
 
With the concept of channels established, you can now follow steps described in the tutorial section of our official documentation to work with multi-source replication. Note how the FOR CHANNEL <channel_name> clause now allows you to take each master-slave instance individually and work with them as if you were working with a single source replication topology.
Having set up multi-source replication and making sure there are no conflicts you can expect it to just work out of the box. But if you want more details you could look at our monitoring interfaces to provide you the details on every channel. In MySQL-5.7.2, we introduced performance_schema tables to monitor replication, the good news is that these tables were always designed with multi-source replication in mind so they should work seamlessly with multi-source replication. All six replication performance schema tables now have a “channel_name” field added to them to individually access the configuration and status on each channel. As an example, we have described performance_schema.replication_connection status in our manual. Given that you are working with multiple channels, lets walk through this table again and see how it presents the internals. Try out the following query to look at the receiver module:
ps-table3
We can see quite a few things here:
  1. Which master does channel1 replicate from?
    The one whose UUID is as given in the “source_uuid” column.
  2. Which thread is responsible for receiving transaction through channel1?
    Thread number 13. Note that this thread number is same as the one in performance_schema.threads table. You can use this information to now look into other performance_schema tables to mine more statistics using the joins.
  3. Is the receiver module of channel1 active or down at the moment?
    The channel is active and receiving transactions because its service state is ON.
  4. What transactions have been received via channel1?
    Transactions 1-4. See the global transaction identifiers in the field "received_transaction_set".
  5. What if there was an error?
    The last three fields in each row give an idea of the error on that channel, if any.
    See  the example below where channel1 has lost connection:
    ps-table4
  6. How about my network?
    Look at the last two columns to get an idea of this. The last time a heartbeat signal was sent is shown in the “last_heartbeat_timestamp” column.
    The "count_received_heartbeat" indicates how frequently heartbeats are being sent out. A big number here would mean that either the connection is not stable or this channel was idle having nothing to replicate.
Likewise there are more replication performance_schema tables that you can use to get to finer details. If there is anything else that you think should be available in these tables per channel, leave us a comment and we would be happy to take it forward. We truly believe in working with our users and look forward to your experience with this feature. There is a lot more to explore in multi-source replication so go try out the feature and if there is anything that you have suggestions for, please do leave a comment. Don't forget that MySQL 5.7.6 is a development milestone release (DMR)and therefore not yet declared generally available.

Monday, 9 March 2015

FOSSASIA 2015: MySQL Group replication preview

I will be presenting group replication on March 14, 2015 in the FOSSASIA conference. It is a 25 minute session and you can find the schedule here.  Here is an abstract of the talk:

MySQL Replication provides a solution for High Availability and Read
Scale-Out. Replication ensures that data written on one MySQL server
is made available on other MySQL servers at runtime in a fast,
consistent and fault tolerant manner with minimal impact to the
overall performance of the server.

Traditionally, MySQL Replication supports a single master and many
slaves, and it is either asynchronous or semi-synchronous. Recently, a
preview of a new replication plugin for MySQL was released and this is
named MySQL Group Replication. This plugin provides multi-master
update everywhere capabilities, making it possible to update data,
concurrently, on any server in a group. While not a fully synchronous
replication solution such as MySQL Cluster, it does provide additional
synchronization related to message exchanged between servers in a
group.

This talk explains how MySQL Group Replication facilitates improves
High Availability and simplifies replication and application
management - it will also include a demo.

Takeaways:
- What's in the MySQL Group Replication MySQL 5.7 labs release.
- Understanding the architecture of MySQL Group Replication.
See you there!