Monday, May 08, 2017

Getting to know MySQL InnoDB cluster, the new kid in the block

Innodb cluster3

InnoDB Cluster was released as GA a few weeks ago. I remember the initial announcement of the product at OOW 2016, promising a seamless solution for replication and high availability with great ease of use. I was a bit disappointed to see that, at GA release time, the InnoDB Cluster is a patchwork of three separate products (Group Replication, MySQL Router, MySQL Shell) which the users have to collect and install separately.

Given this situation, I was very pleased when Matthew Lord published Docker-InnoDB-Cluster, an image for Docker that contains everything you need to get the system up and running. The associated scripts make the experience even easier: not only we don't have to hunt for components, but the cluster deployment procedure is completely automated.

Installation

The process is painless. After cloning the repository the start script takes care of everything. It will create a network, deploy three database nodes, and fire up the router.

$ ./start_three_node_cluster.sh
Creating dedicated grnet network...
# network grnet already exists
NETWORK ID          NAME                DRIVER              SCOPE
8fa365076198        grnet               bridge              local
Bootstrapping the cluster...
12fb4bd975c2fb2e7152ed64e12d2d212bbc9f1d3b39d715ea0c73eeb37fed45
Container mysqlgr1 is up at Sun May  7 22:02:38 CEST 2017
Starting mysqlgr1 container...
Starting mysqlgr1 container...
MySQL init process done. Ready for start up.
Getting GROUP_NAME...
Adding second node...
a2b504ea1920d35b1555f65de24cd364fc1bc7a6ac87ca4eb32f4c02f5afce7c
Container mysqlgr2 is up at Sun May  7 22:02:48 CEST 2017
Starting mysqlgr2 container...
Starting mysqlgr2 container...
MySQL init process done. Ready for start up.
Adding third node...
393d46b9a1795531d99f68645087393a54b2463ef88b9b3c4cbe735c1527fe57
Container mysqlgr3 is up at Sun May  7 22:02:58 CEST 2017
Starting mysqlgr3 container...
Starting mysqlgr3 container...
MySQL init process done. Ready for start up.
Sleeping 10 seconds to give the cluster time to sync up
Adding a router...
830c3125bad70b09b057cee370ee490bcb88b1d4a1bfec347cda847942f3b56e
Container mysqlrouter1 is up at Sun May  7 22:03:17 CEST 2017
Done!
Connecting to the InnoDB cluster...

Most of the configuration (which has been simplified thanks to the usage of MySQL shell to add nodes) is handled inside the container initialization script. Just a few details are needed in the cluster deployment script to get the result.

The deployment script will also invoke the mysql shell in one of the nodes to show the status of the cluster:

Creating a Session to 'root@mysqlgr1:3306'
Classic Session successfully established. No default schema selected.
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

The above status is the result of dba.getCluster().status(), which is a convenient way of collecting a bunch of information about the cluster and then present them in a compact JSON structure. If you enable the general log prior to running this command, you would see something like this:

select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
SELECT cluster_id, cluster_name, default_replicaset, description, options, attributes FROM mysql_innodb_cluster_metadata.clusters WHERE attributes->'$.default' = true;
show databases like 'mysql_innodb_cluster_metadata';
SELECT replicaset_name, topology_type FROM mysql_innodb_cluster_metadata.replicasets WHERE replicaset_id = 7;
select count(*) from performance_schema.replication_group_members where MEMBER_ID = @@server_uuid AND MEMBER_STATE IS NOT NULL AND MEMBER_STATE != 'OFFLINE';
select count(*) from mysql_innodb_cluster_metadata.instances where mysql_server_uuid = @@server_uuid;
SELECT @@server_uuid, VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID = '0030396b-3300-11e7-a8b6-0242ac170002';
SELECT CAST(SUM(IF(member_state = 'UNREACHABLE', 1, 0)) AS SIGNED) AS UNREACHABLE,  COUNT(*) AS TOTAL FROM performance_schema.replication_group_members;
SELECT @@group_replication_single_primary_mode;
SHOW STATUS LIKE 'group_replication_primary_member';
SELECT mysql_server_uuid, instance_name, role, MEMBER_STATE, JSON_UNQUOTE(JSON_EXTRACT(addresses, "$.mysqlClassic")) as host FROM mysql_innodb_cluster_metadata.instances LEFT JOIN performance_schema.replication_group_members ON `mysql_server_uuid`=`MEMBER_ID` WHERE replicaset_id = 7;

In short, these commands check that the cluster is resilient, summarized in the statusText field, which says that we can lose up to one node and the cluster will keep working.

High Availability

What we have after deployment is a system that is highly available:

  • Group replication with one primary node;
  • Access to the cluster through the router, which provides one port for Read/write and one for Read-Only;
  • Automatic failover. When the primary node fails, another one is promoted on the spot, without any manual labor.

Let's start a test. We can check whether the data inserted from the R/W port is then retrieved by other nodes using the R/O port.

$ docker exec -it mysqlrouter1 /opt/ic/tests/test_router.sh
Server ID of current master
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                100 |
+--------------------+
Create content using router
--------------
create schema if not exists test
--------------

--------------
create table t1(id int not null primary key, name varchar(50))
--------------

--------------
insert into t1 values (1, "aaa")
--------------

The first part of the test will show the server ID of the primary node, by using the router R/W port (6446.) Then it will create a table and insert one record.

Server ID of a RO node
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                200 |
+--------------------+
retrieving contents using router
--------------
SELECT * from test.t1
--------------

+----+------+
| id | name |
+----+------+
|  1 | aaa  |
+----+------+

Using the read-only port (6447), we get a different node, and we retrieve the data created in the primary node.

Now we can test the high availability. Since we are using Docker, instead of simply kill the MySQL service, we can simulate an anvil falling on the server, by wiping away the container:

$ docker rm -f -v mysqlgr1
mysqlgr1

The primary node is gone for good. Let's see what the cluster status says now:

$ ./tests/check_cluster.sh 2
Creating a Session to 'root@mysqlgr2:3306'
Classic Session successfully established. No default schema selected.
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr2:3306",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

There are a few differences compared to the initial report:

  • The primary is now node 2 (mysqlgr2);
  • Node 1 is marked as MISSING;
  • The cluster has lost its resilience. Unless we add another node, no further failures will be handled automatically.

We can run the router test again, and it will work just as well, with the differences reported below:

Server ID of current master
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                200 |
+--------------------+
Create content using router
--------------
create schema if not exists test
--------------

--------------
drop table if exists t1
--------------

--------------
create table t1(id int not null primary key, name varchar(50))
--------------

--------------
insert into t1 values (1, "aaa")
--------------

Server ID of a RO node
--------------
SELECT @@global.server_id
--------------

+--------------------+
| @@global.server_id |
+--------------------+
|                300 |
+--------------------+

We see that the primary has now ID 200, and the R/O node is 300 (the only other node that has survived.)

Summarizing

  • The good

    • I can see that some of the ease of use promised in San Francisco is already available. We can create a cluster with little effort.
    • The recovery from the master failure is transparent.
    • The cluster status gives clear information about the system.
  • The bad

    • MySQL shell is difficult to use. The command line help is insufficient: some options require trial and error to work correctly. It also does not use an options file like other MySQL clients.
    • Adding a node after the primary has become unavailable is harder than it should be, and the manual does not contemplate this case. It only mentions a server that can be restarted.
    • Restarting the router after the primary died is impossible with the current configuration.
    • The metadata for replication is now in three different schemas: mysql, performance_schema, and mysql_innodb_cluster_metadata. I understand the reasons, but I believe that a simplification would be possible.
  • The bottom line: quite good to start a cluster, but not enough to deal effectively with simple HA cases. Possibly released too early.

2 comments:

Unknown said...

I just pushed a patch that leverages a shared --network-alias value for all three GR nodes. That alias is then used when bootstrapping a router. Since that alias will automatically resolve to one of the available GR nodes, then we can add routers at any time and regardless of topology changes (ignore the first file, as I accidentally added that mysqlsh log file):
https://github.com/mattlord/Docker-InnoDB-Cluster/commit/0e7afc12e50ecdda9177f188f07372f8e9e5f3fa

So this should eliminate one of the noted items in the Bad list:
"Restarting the router after the primary died is impossible with the current configuration."

Of course that's not an issue with persistent volumes as the host list is stored in the router's persistent config file.

Thank you so much for testing out InnoDB Cluster, providing feedback, and helping to make MySQL better!

Giuseppe Maxia said...

@Matt,
Thanks for the update. I don't understand why you think the problem is in the router. When a node dies, the router should just point at the new primary, and operations continue.
I have also seen that you added a test_failure script that creates a new router after the node failure to get a working cluster. While this works, it is not high availability, as the connection with clients would break.