Sunday, September 25, 2016

MySQL at Oracle Open World 2016

MySQL is a growing presence at Oracle Open World. While most of the headlines belong to the main products, where Oracle services are aiming at world domination, MySQL shared the spotlight, as it was announced to be part of Oracle database cloud. It seems a logical move for Oracle: after all the effort to make MySQL 5.7 the biggest release ever, it stands to reason that it is offered as a competitive feature in its own database as a service.

With this offer, Oracle is applying enterprise pricing and methodologies to a target of emerging companies. MySQL in the Oracle cloud differs from the competition by a few key points:

  • It's only MySQL 5.7. While this is the most advanced MySQL server available, users with existing deployments may exercise caution before adopting this cloud service. Companies that are starting now, instead, may take advantage of using the latest and greatest.
  • It's MySQL Enterprise edition, with a wealth of additional monitoring and management tools that are missing in other cloud offerings.
  • Unlike some popular competitors, it includes access to the server host, allowing DBAs to enable plugins, fine tune the deployment, and explore the operating system in addition to the database.
  • It includes Oracle support.

Time will tell if this offering will be successful. It may not appeal to all the users, but there is surely a category of power users who can take advantage of these features.

The surprises of the MySQL keynote did not stop at the cloud. We had already seen, one week before the conference, that MySQL 8.0 was released, with many juicy features. What came unexpected are two announcements:

  • MySQL group replication, a product that has been living in MySQL Labs for quite a while, was declared "release candidate" despite lacking documentation and being released in a format that discourages adoption, except from diehard hackers.
  • Another product is been released, again in the labs, with an ambitious mission. The MySQL InnoDB cluster is based on group replication and wants to be a 15-minute deployment of high-availability and scalable system, thanks to an enhanced version of MySQL Shell (the same used for the document store. Its feature set are exciting, but what we have seen in the demos suggests that the product is still in the early stages of development.

With these moves, Oracle is showing two paths of MySQL development:

  • in the main path, which has produced the GA of MySQL 5.5, 5.6, and 5.7, the MySQL team is showing the positive influence of Oracle engineering, with focus on security, performance, and stability.
  • in a parallel course, which started last April with the announcement of MySQL document store and its related shell, the team wants to introduce new features to a GA release as plugins, with the reasoning that the main release will not be touched (thus avoiding the taboo of altering a stable product) but users are free to enable plugins and unleash new functionalities.

The mix of traditional and agile releases are provoking exciting thoughts, albeit moderated by the fear of using together experimental code in a GA deployment.

The methodology of these releases is also baffling. It is unclear how mature is the document store. The plugin comes with the server, and it is accompanied by an huge set of documentation, which implies that it has been designed extensively and tested internally for a while, but the main tool for the feature, mysql shell is labeled as development preview: not very encouraging. On the other hand, the latest plugin addition, the MySQL group replication, which has been declared of release candidate quality, is still in the labs (no plugin in the server release), and without documentation.

All considered, while it is clear that Oracle is putting an enormous engineering effort into growing MySQL, I have the feeling that the replication features have been neglected and the announcement of group replication mixed fortunes confirms me in this belief.

The conference was useful to me. I had the chance of meeting many Oracle engineers and users, and discuss technical and community matters at length. My own presentation, a two-hour tutorial on MySQL operations in Docker was highly satisfactory, as it proved to be an interesting topic that was actively discussed by the audience.

Wednesday, September 21, 2016

MySQL team: make it easy to give you feedback!

There was a bold announcement during the MySQL Keynote at Oracle Open World. A new product that will mix up with the existing GA server, called MySQL InnoDB Cluster. This is an evolution of MySQL group replication, which has been in the labs for long time, and the MySQL shell, which was introduced as a side feature last April. The boldness I mentioned before is on account of wanting to add to a GA server something that was defined as release candidate despite never having been out of the labs. The product is interesting as it promises to be a quick and painless cluster deployment, with built-in high availability and scalability.

What surprised me most was a heartfelt and urgent request to test this new product and provide feedback, hinting that it would be GA soon.

Here are some thoughts on this matter:

  • A product in the labs is perceived as pre-release, i.e. less than beta quality. This is what happened with previous releases on labs: GTID, multi-source replication, and data dictionary were all released in labs before eventually being integrated in the main project.
  • Putting a product in labs again and declaring it release candidate feels odd.
  • The problem with labs is that the previews are distributed with a limited set of packages, and without documentation. The brave souls that test these packages need to find information about the new software in blog posts or dig in the source code, without any assurance that this package would ever become officially supported.

There is some confusion about which package is of which quality. From the keynote it looked like MySQL InnoDB Cluster (MIC) was the one being RC, but when I asked for clarifications it seems that group replication is RC (from its niche in the labs) while MIC is still of unknown quality. From what I saw in the demos it seems quite alpha to me.

Back to the main topic. MySQL want feedback, but provides software in the labs, in a way that is not easy to use. Specifically:

  • There is an OSX package that contains .dmg files, implying that I should install those in my main computer. Given that the perceived quality is low, I'd say "No, thanks," as I don't want to risk my laptop with alpha quality installed as root. Besides, this is cluster software, so I would need at least three nodes to make it work. There is a "sandbox mode" that allows you to simulate three nodes on a single server, but this still requires a main installation, with all the risks involved. No, thanks, again.
  • There are only .rpm files for Linux, which means that I need to have either servers or VMs where to install software as root. I have the same concerns as I have for the Mac: while VMs can be thrown away and remade, it is still a big investment in time and resources to test something new.
  • Missing are generic .tar.gz binaries, which would allow users to install in user space, without affecting the operating system or other MySQL servers.
  • Missing are also Docker packages, which would allow users to test quickly and painlessly without any risk.
  • Finally, and probably most importantly, there is no documentation. If this is RC software, there should be at least a couple of workloads that could be included in the labs packages for reference.

Summing up, I have a message for the MySQL team product managers and developers: if the software is meant to be usable, i.e. more than a proof of concept as other things in the labs, move it to the downloads section, same as it happened with the MySQL Shell and the document store early this year. Also, provide Docker images early on, so that people can test without many risks. This exercise alone would discover bugs just while you are doing it. And please add documentation for the feature you want feedback for. If the manual is not ready, don't limit the docs to a skinny blog post, but add the specifications used to create the feature (workloads) or even an alpha version of the docs. In short, if the software is worth giving feedback, it should be treated with more respect than it is shown right now. And the same respect goes for the users whom you are asking feedback from.

Improving the design of MySQL replication

Now that MySQL 8.0 has been revealed, it's time to take a deep look at replication features in the latest releases, and review its overall design.

Server UUID vs Server-ID

At the beginning of replication, there was the server_id variable that identified uniquely a node in a replication system. The variable is still here, but in MySQL 5.6 it was joined by another value, which is created during the server initialisation, regardless of its involvement in a replication system. The server_uuid is a string of hexadecimal characters that is the basis for global transaction identifiers:

select @@server_id, @@server_uuid;
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc |
+-------------+--------------------------------------+

If this server is a master, its slave will report:

        Master_Server_ID: 101 
             Master_UUID: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc
       [...]
       Retrieved_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42
        Executed_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42

The serverid is still indispensable. We can't use the same value for two nodes in the replication system, or we get errors. But instead of using the serverid as the identifier for global transaction identifiers (GTID) we use the serverUUID, thus being subjected to the inhuman treatment of dealing with unreadable values to track our valuable transactions. Yet, server UUIDs have not replaced everything: if we want to exclude one or more servers from being applied, we still refer to them by serverid, as in the IGNORE_SERVER_IDS clause of CHANGE MASTER TO.

How should it be instead

While UUIDs guarantee that identifiers are unique in the the whole planet, the number of nodes in a replication system are finite, and for practical a single system may not go beyond the thousands. I don't care if my serverid is the same as a serverid in another continent: all I want is to be unique within my system. And since we have to use unique values for every node server_id, it is unnecessarily cruel to force users to deal with long strings instead of plain numbers.

VARIABLES vs STATUS

In general, MySQL behavior gets changed by setting variables to a given value either in the options file or using SET GLOBAL dynamically. To see the setting of a given value, we use SHOW VARIABLES:

show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 15    |
+-----------------+-------+

Here max_connections was changed from its default value to a much lower one. To see how many connections we have consumed, we use SHOW STATUS:

show status like '%connections%';
+-----------------------------------+---------------------+
| Variable_name                     | Value               |
+-----------------------------------+---------------------+
| Connection_errors_max_connections | 0                   |
| Connections                       | 7                   |
| Max_used_connections              | 1                   |
| Max_used_connections_time         | 2016-09-20 18:01:40 |
+-----------------------------------+---------------------+

That's the general theory. Sometimes, in replication, MySQL follows the same path. For example, in semi-synchronous replication, we have the settings of the functionality using variables:

 show variables like 'rpl%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
+-------------------------------------------+------------+

and the running results of how the feature is performing using status:

set status like 'rpl%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 4     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 431   |
| Rpl_semi_sync_master_tx_wait_time          | 863   |
| Rpl_semi_sync_master_tx_waits              | 2     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 2     |
+--------------------------------------------+-------+

However, when it comes to GTID, the paradigm is broken:

slave1 [localhost] {msandbox} ((none)) > show global VARIABLES like '%gtid%';
+----------------------------------+-------------------------------------------+
| Variable_name                    | Value                                     |
+----------------------------------+-------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                        |
| enforce_gtid_consistency         | ON                                        |
| gtid_executed                    | 00011808-1111-1111-1111-111111111111:1-42 |
| gtid_executed_compression_period | 1000                                      |
| gtid_mode                        | ON                                        |
| gtid_owned                       |                                           |
| gtid_purged                      | 00011808-1111-1111-1111-111111111111:1-42 |
| session_track_gtids              | OFF                                       |
+----------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > show global STATUS like '%gtid%';
Empty set (0.01 sec)

Here we have variables like gtid_mode and enforce_gtid_consistency that are used to enable the behavior, but the monitoring of the result (gtid_executed and gtid_purged) should have been in STATUS, not in VARIABLES. The same usage of VARIABLES can be seen in MySQL group replication.

How should it be instead

The values that monitor the functionality should be in STATUS rather than in VARIABLES. However, in this case, it would be better if the values were tracked in performance_schema tables, as noted below.

GTID executed

When GTIDs are enabled, we can see its value in SHOW MASTER STATUS

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 6325
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
1 row in set (0.00 sec)

The information here has a precise meaning: the transaction set 1-22 comes from a master identified by UUID 00007801-1111-1111-1111-111111111111. The latest transaction corresponds to binary log mysql-bin.000002 at position 6325.

This is the identification of a source, and as such is the basis for monitoring. When we want to check if the slave service is running fine, we need to compare the slave progress against the information reported by the master. So far, so good.

However, when we use multiple masters, the matter becomes less clear. A slave that was replicating from this master and from two more, will show the following:

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is completely wrong. What we see here has nothing to do with being a master, as the data shown in the above statement was originated in different servers. The data is also wrong because it is thrown together in the same field. Using the same server, we can see the same information in different ways.

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is the same info seen before. In this context, it may seem less wrong, as it is only data coming from other servers.

Here again the information will become much more confused when the slave is also a master and produces its own data sets. If we create something in this server, we end up with the following:

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2

Now we have in the same field the data that was produced in this server and the data that came through replication. The matter looks even more embarrassing if we look at SHOW SLAVE STATUS, which reports distinct data for every channel, but the executed_gtid_set is reported all mixed up, in every channel, unchanged and confused, master data and slave data alike.

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

How should it be instead

First off, the data should not be in a single variable, because it is composed of multiple values. But even if it were to be kept in the environment (as STATUS items, not variables, as we have seen above,) we should have a distinction:

SHOW STATUS LIKE 'gtid_applied'
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

SHOW STATUS LIKE 'gtid_generated'
*************************** 1. row ***************************
00007804-4444-4444-4444-444444444444:1-2

The SHOW SLAVE STATUS should report the right line for each channel. For example:

*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

Missing information in performance_schema tables

The idea of having replication information in performance schema is to replace SHOW SLAVE STATUS with a set of tables that report the same info. The theory is good, but the implementation is lacking.

First of all, the tables are misnamed. replication_applier_configuration has only settings about the applier delay, while replication_applier_status again reports only info about the remaining delay. The information worth looking at for monitoring is in a table named replication_connection_status, while it should refer to applier status or applier progress.

select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

What is missing here is the applied data, i.e. what appears in @@global.gtid_executed, which, as said above, should be split by server. Moreover, we miss the corresponding binary logs and positions, which we find in some neglected tables in mysql databases and that should be removed, with their contents integrated here. Another thing that's missing are the schema and the timestamps of the latest transaction, both the creation and the replication times. All this information is valuable to the DBA when troubleshooting. Rather than opening binary logs to find out the details, they should be all collected in this table for ease of use.

How should it be instead

The monitoring tables should include information about:

  • transaction timestamp;
  • binary log and position (or what a future implementation would use to transport data);
  • default schema.

Here's an example:

select * from replication_applier_progress\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:24:13
          APPLY_TIMESTAMP: 2016-09-20 22:24:15
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
  APPLIED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6325
           Relay_Log_File: mysql-relay-node1.000002
            Relay_Log_Pos: 6538
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db1
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:25:15
          APPLY_TIMESTAMP: 2016-09-20 22:25:18
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
  APPLIED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6039
           Relay_Log_File: mysql-relay-node2.000002
            Relay_Log_Pos: 6252
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db2
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
    TRANSACTION_TIMESTAMP: 2016-09-20 22:28:16
          APPLY_TIMESTAMP: 2016-09-20 22:28:19
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
  APPLIED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 1177
           Relay_Log_File: mysql-relay-node3.000002
            Relay_Log_Pos: 1390
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db3
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

The same level of detail should be used for replication_applier_status_by_worker, which, by the way, has a more appropriate name than replication_connection_status.

Monitoring tables are in different places

MySQL 5.6 introduced replication tables in the mysql database. Unfortunately, this inclusion happened before GTID was implemented. As a result, we have tables in mysql database reporting log names and positions and tables in performance_schema reporting GTIDs, but without correlations.

How should it be instead

The replication tables in mysql should be removed, and integrated in the performance_schema as shown above.

There is no table for SHOW MASTER STATUS

The information in SHOW SLAVE STATUS has been translated almost entirely in performance_schema tables. No such thing happened to SHOW MASTER STATUS. To make monitoring more efficient, all the data used for monitoring should be in tables, and the information should not contain more than one piece of information per field, as we have seen in previous cases.

How should it be instead

There should be a table containing the information from SHOW MASTER STATUS, except the filter info, which should be in a different table:

select * from replication_master_status\G
SHOW MASTER STATUS\G
*************************** 1. row ***************************
                 File: mysql-bin.000002
             Position: 154
    Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
Transaction_timestamp: 2016-09-20 22:24:13

select * from replication_filters\G
*************************** 1. row ***************************
     Filter_Name: binlog_do_db
     Filter_Yype: binlog
    Filter_Value: personnel.*
   Filter_Status: ACTIVE

Call for action

Dear MySQL product managers: please take action to make replication more consistent and usable. I may have forgotten something, but I think the requests in this article are a good start to improve the design of MySQL replication.

Monday, September 12, 2016

MySQL 8.0 first impressions

Mysql8

MySQL 8.0.0 was released today. It has been some time in the making, shrouded in a veil of secrecy for over one year. We knew, from listening to the gossip and looking at the few available previews, some of what was going to bring. So, for the observant users, its main features may not come as a surprise. For the rest of you, here's a quick roundup:

Notable features

  • No MyISAM tables anymore! The grant tables are now InnoDB, meaning that grant operations are now atomic.
  • A real data dictionary. This change is less visible than the previous one. The data dictionary tables are hidden and only a subset of the data is available through information_schema views. The reason for the hidden tables is to allow a stable interface through several versions. I am not happy about this choice, but I understand the reason.
  • The .frm files are gone. If you create a table using an engine other than InnoDB, the system creates a JSON file (.SDI) containing serialised data dictionary info.
  • Roles! You can now define a set of privileges as a role, and then assign the role to a user. This feature greatly simplifies user management.
  • You can set persistent variables from SQL (set persist variable_name=something) and they will survive a restart (except when they don't.
  • There is a new plugin infrastructure of which I haven't seen any example yet.

Gotchas

  • Some of the tables in the mysql database are gone. If you have tests that look for them (e.g. proc, event) they will fail, as these tables are now hidden in the data dictionary.
  • Only InnoDB tables support partitioning! If you create MyISAM or Archive tables and try to partition them, you will get an error.
  • The installation requires a new library (libnuma1 in Debian/Ubuntu, numactl-libs in CentOS). You need to install it before starting the MySQL server.

Unmatched expectations

  • The installation still issues the same warnings reported for MySQL 5.7. I was thinking that by now the developers know how to set the initialisation without triggering a warning about TIMESTAMPS. I was mistaken.

[Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
++ 2016-09-11T17:42:06.471261Z 1 [Warning] InnoDB: New log files created, LSN=49311
++ 2016-09-11T17:42:06.548747Z 1 [Warning] InnoDB: Creating foreign key constraint system tables.
++ 2016-09-11T17:42:07.593428Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 0f4a8ceb-7847-11e6-bd60-0242ac110002.
++ 2016-09-11T17:42:07.597075Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
  • The monitoring features for replication are unchanged. There is nothing new about reporting GTID execution in performance_schema tables, except some minor additions for RBR operations.

Taking MySQL 8.0 for a spin

MySQL-Sandbox 3.2.01 can install the new version from a tarball.

If you are on Linux, you can even try the newest MySQL-Sandbox feature, which gets you reduced binaries for a given MySQL version with a single command. Just install MySQL::Sandbox 3.2.02 and run:


make_sandbox GET:8.0 -- --no_show
2016-09-12 17:38:40 URL:https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/available.txt [98/98] -> "/tmp/available.txt" [1]
wget -nv -O 8.0.0.tar.gz 'https://github.com/datacharmer/mysql-docker-minimal/blob/master/dbdata/8.0.0.tar.gz?raw=true'
    The MySQL Sandbox,  version 3.2.02
    (C) 2006-2016 Giuseppe Maxia
# Starting server
... sandbox server started
# Loading grants
Your sandbox server was installed in $HOME/sandboxes/msb_8_0_0

$  ~/sandboxes/msb_8_0_0/use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 8.0.0-dmr MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} ((none)) >  show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
5 rows in set (0.00 sec)

At first sight, it looks like the usual MySQL. Same schemas, and almost the same tables. You will see a few tables related to roles in mysql, and a dozen new tables between information_schema and performance_schema. However, where the database has a different aspect is on the outside. Let's start with the data directory:


total 188468
-rw-r----- 1 root root       56 Sep 11 18:14 auto.cnf
-rw-r----- 1 root root     1813 Sep 11 18:14 ib_buffer_pool
-rw-r----- 1 root root 50331648 Sep 11 18:15 ib_logfile0
-rw-r----- 1 root root 50331648 Sep 11 18:15 ib_logfile1
-rw-r----- 1 root root 79691776 Sep 11 18:15 ibdata1
-rw-r----- 1 root root 12582912 Sep 11 18:15 ibtmp1
-rw-r----- 1 root root     4396 Sep 11 18:39 msandbox.err
drwxr-x--- 2 root root     4096 Sep 11 18:14 mysql
-rw-r----- 1 root root        4 Sep 11 18:15 mysql_sandbox8000.pid
-rw-rw---- 1 root root        3 Sep 11 18:15 mysqld_safe.pid
-rw-r----- 1 root root      225 Sep 11 18:14 performance_sche_3.SDI
drwxr-x--- 2 root root     4096 Sep 11 18:14 performance_schema
drwxr-x--- 2 root root     4096 Sep 11 18:14 sys
-rw-r----- 1 root root      210 Sep 11 18:14 sys_4.SDI
drwxr-x--- 2 root root     4096 Sep 11 18:15 test
-rw-r----- 1 root root      210 Sep 11 18:15 test_5.SDI

Notice that, for each schema except mysql, we have a .SDI file. For example:


cat performance_sche_3.SDI
{
    "sdi_version": 1,
    "dd_version": 1,
    "dd_object_type": "Schema",
    "dd_object": {
        "name": "performance_schema",
        "default_collation_id": 33,
        "created": 0,
        "last_altered": 0
    }
}

For tables other than innodb, we get a similar file, only more complex:


mysql [localhost] {msandbox} (test) > create table t1 (i int not null auto_increment primary key, c char(10) ) engine=myisam;
Query OK, 0 rows affected (0.01 sec)

####

ls -l ./test
total 8
-rw-r----- 1 root root    0 Sep 11 18:49 t1.MYD
-rw-r----- 1 root root 1024 Sep 11 18:49 t1.MYI
-rw-r----- 1 root root 2706 Sep 11 18:49 t1_326.SDI

cat test/t1_326.SDI
{
    "sdi_version": 1,
    "dd_version": 1,
    "dd_object_type": "Table",
    "dd_object": {
        "name": "t1",
        "mysql_version_id": 80000,
        "created": 20160911185434,
        "last_altered": 20160911185434,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
            {
                "name": "i",
                "type": 4,
                "is_nullable": false,
                "is_zerofill": false,
                "is_unsigned": false,
                "is_auto_increment": true,
                "is_virtual": false,
                "hidden": false,
                "ordinal_position": 1,
                "char_length": 11,
                "numeric_precision": 10,
                "numeric_scale": 0,
                "datetime_precision": 0,
                "has_no_default": false,
                "default_value_null": false,
                "default_value": "AAAAAA==",
                "default_option": "",
                "update_option": "",
                "comment": "",
                "generation_expression": "",
                "generation_expression_utf8": "",
                "options": "interval_count=0;",
                "se_private_data": "",
                "column_key": 2,
                "column_type_utf8": "int(11)",
                "elements": [],
                "collation_id": 8
            },
            {
                "name": "c",
                "type": 29,
                "is_nullable": true,
                "is_zerofill": false,
                "is_unsigned": false,
                "is_auto_increment": false,
                "is_virtual": false,
                "hidden": false,
                "ordinal_position": 2,
                "char_length": 10,
                "numeric_precision": 0,
                "numeric_scale": 0,
                "datetime_precision": 0,
                "has_no_default": false,
                "default_value_null": true,
                "default_value": "",
                "default_option": "",
                "update_option": "",
                "comment": "",
                "generation_expression": "",
                "generation_expression_utf8": "",
                "options": "interval_count=0;",
                "se_private_data": "",
                "column_key": 1,
                "column_type_utf8": "char(10)",
                "elements": [],
                "collation_id": 8
            }
        ],
        "schema_ref": "test",
        "hidden": false,
        "se_private_id": 18446744073709551615,
        "engine": "MyISAM",
        "comment": "",
        "se_private_data": "",
        "row_format": 1,
        "partition_type": 0,
        "partition_expression": "",
        "default_partitioning": 0,
        "subpartition_type": 0,
        "subpartition_expression": "",
        "default_subpartitioning": 0,
        "indexes": [
            {
                "name": "PRIMARY",
                "hidden": false,
                "is_generated": false,
                "ordinal_position": 1,
                "comment": "",
                "options": "flags=0;",
                "se_private_data": "",
                "type": 1,
                "algorithm": 2,
                "is_algorithm_explicit": false,
                "is_visible": true,
                "engine": "MyISAM",
                "elements": [
                    {
                        "ordinal_position": 1,
                        "length": 4,
                        "order": 2,
                        "column_opx": 0
                    }
                ]
            }
        ],
        "foreign_keys": [],
        "partitions": [],
        "collation_id": 8
    }
}

If the table has many columns, the structure becomes quite large.

Let's try some new features:


mysql [localhost] {root} ((none)) > create role role1;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant all on test.* to role1;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > create role role2;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > grant select on *.* to role2;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > select host, user from mysql.user;
+-----------+-------------+
| host      | user        |
+-----------+-------------+
| %         | role1       |
| %         | role2       |
| 127.%     | msandbox    |
| 127.%     | msandbox_ro |
| 127.%     | msandbox_rw |
| 127.%     | rsandbox    |
| localhost | msandbox    |
| localhost | msandbox_ro |
| localhost | msandbox_rw |
| localhost | mysql.sys   |
| localhost | root        |
+-----------+-------------+
11 rows in set (0.00 sec)

From this, we see that roles are users. In fact, you can assign a user to another user, as if it were a role.


mysql [localhost] {root} ((none)) > show grants for role1;
+-------------------------------------------------+
| Grants for role1@%                              |
+-------------------------------------------------+
| GRANT USAGE ON *.* TO `role1`@`%`               |
| GRANT ALL PRIVILEGES ON `test`.* TO `role1`@`%` |
+-------------------------------------------------+
2 rows in set (0.00 sec)

mysql [localhost] {root} ((none)) > create user user1 identified by 'msandbox';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > grant role1 to user1;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {root} ((none)) > alter user user1 default role   role1;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > show grants for user1;
+-----------------------------------+
| Grants for user1@%                |
+-----------------------------------+
| GRANT USAGE ON *.* TO `user1`@`%` |
| GRANT `role1`@`%` TO `user1`@`%`  |
+-----------------------------------+
2 rows in set (0.00 sec)

mysql [localhost] {root} ((none)) > show grants for user1 using role1;
+-------------------------------------------------+
| Grants for user1@%                              |
+-------------------------------------------------+
| GRANT USAGE ON *.* TO `user1`@`%`               |
| GRANT ALL PRIVILEGES ON `test`.* TO `user1`@`%` |
| GRANT `role1`@`%` TO `user1`@`%`                |
+-------------------------------------------------+
3 rows in set (0.00 sec)

 create user user2 identified by 'msandbox';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > grant role2 to user2;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > show grants for user2;
+-----------------------------------+
| Grants for user2@%                |
+-----------------------------------+
| GRANT USAGE ON *.* TO `user2`@`%` |
| GRANT `role2`@`%` TO `user2`@`%`  |
+-----------------------------------+
2 rows in set (0.00 sec)

mysql [localhost] {root} ((none)) > show grants for user2 using role2;
+------------------------------------+
| Grants for user2@%                 |
+------------------------------------+
| GRANT SELECT ON *.* TO `user2`@`%` |
| GRANT `role2`@`%` TO `user2`@`%`   |
+------------------------------------+
2 rows in set (0.01 sec)

Now we connect with user1


mysql [localhost] {user1} ((none)) > use mysql
ERROR 1044 (42000): Access denied for user 'user1'@'%' to database 'mysql'
mysql [localhost] {user1} ((none)) > use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql [localhost] {user1} (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
+----------------+
1 row in set (0.00 sec)

That seems to be correct, as user1 can only access the test database. Let's try user2:


mysql [localhost] {user2} ((none)) > show tables from test;
ERROR 1044 (42000): Access denied for user 'user2'@'%' to database 'test'
mysql [localhost] {user2} ((none)) > show grants;
+-----------------------------------+
| Grants for user2@%                |
+-----------------------------------+
| GRANT USAGE ON *.* TO `user2`@`%` |
| GRANT `role2`@`%` TO `user2`@`%`  |
+-----------------------------------+
2 rows in set (0.00 sec)

mysql [localhost] {user2} ((none)) > set role role2;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {user2} ((none)) > show tables from test;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
+----------------+
1 row in set (0.01 sec)

Here we have something that, unless we read the documentation carefully, we may miss. A role is ineffective if is not assigned. For user1, we did an ALTER USER and set the default role. Since we didn't do it for user2, we need to set the role explicitly before using its powers.

We can also assign to a user several roles, and activate them all:


mysql [localhost] {root} ((none)) > create user user3 identified by 'msandbox';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > grant role1 to user3;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > grant role2 to user3;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {root} ((none)) > alter user user3 default role   role1,role2;
Query OK, 0 rows affected (0.00 sec)

Now, when we connect with user3 we can see all tables (SELECT grant from role2) and create or modify tables in test (all privileges on test database from role1.)

One final observation, just because it became crucial when I was starting to test these new binaries. Look at the size of the latest major versions of MySQL:


$ ls -lh mysql-5.5.52-linux2.6-x86_64.tar.gz
-rw-rw-r-- 1 root root 178M Aug 26 15:38 mysql-5.5.52-linux2.6-x86_64.tar.gz

$ du -sh mysql-5.5.52-linux2.6-x86_64/
690M     mysql-5.5.52-linux2.6-x86_64/

$ ls -lh mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz
-rw-rw-r-- 1 root root 300M Aug 26 12:52 mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz

$ du -sh mysql-5.6.33-linux-glibc2.5-x86_64/
1.1G     mysql-5.6.33-linux-glibc2.5-x86_64/

$ ls -lh mysql-5.7.15-linux-glibc2.5-x86_64.tar.gz
-rw-rw-r-- 1 root root 612M Aug 25 13:08 mysql-5.7.15-linux-glibc2.5-x86_64.tar.gz

$ du -sh mysql-5.7.15-linux-glibc2.5-x86_64/
2.5G     mysql-5.7.15-linux-glibc2.5-x86_64/

$ ls -lh mysql-8.0.0-dmr-linux-glibc2.12-x86_64.tar.gz
-rwxr-xr-x 1 root root 961M Aug 26 17:37 mysql-8.0.0-dmr-linux-glibc2.12-x86_64.tar.gz

$ du -sh mysql-8.0.0-dmr-linux-glibc2.12-x86_64/
3.6G     mysql-8.0.0-dmr-linux-glibc2.12-x86_64/

Monday, August 08, 2016

Lightning talks at Percona Live Amsterdam: come on stage!

Noun speech bubble 181713 Noun theater 32561

The schedule to Percona Live Europe 2016 is being finalized. It should be published soon, and it will include 3 hour and 6 hour tutorials, regular 50-minute talks, and shorter 25-minutes talks. It will not include, however, the lightning talks, a session comprising several 5-minute talks, and they are missing from the initial schedule for two reasons:

  • We want to offer a chance to all attendees to propose a lightning talk for some time. We would like to have more quality material to choose from.
  • They will not be in the conference venue. Instead, they will be hosted at Booking, during the community dinner.

Unlike in previous editions, though, the lightning talks will be kept separated from the party. We want the talks to be enjoyable for both the speakers and the attendees, free from distracting noise. Booking will offer its auditorium for this purpose, and this means a chance to speak in a very prestigious place.

Accepted lightning talks speakers will receive a 50 EUR discount on the conference registration. Additionally, LT speakers will enjoy an further benefit, as they will receive a free ticket for the community dinner hosted at Booking headquarters.

You have still some time left. I will accept proposals until the and of August. Don't wait! Submit a proposal for a lightning talk and claim your spot in the limelight (and a free dinner!)

Sunday, July 10, 2016

The fastest MySQL Sandbox setup ever!

MySQL-Sandbox 3.1.11 introduces a new utility, different from anything I have put before in the MySQL Sandbox toolkit.

make_sandbox_from_url downloads a tiny MySQL tarball from a repository and install it straight away.

As of today, the following packages are available

Major release versions package size
(what you download)
expanded size
(storage used)
original size
(not included)
5.0 5.0.96 20M 44M 371M
5.1 5.1.72 23M 59M 485M
5.5 5.5.50 15M 49M 690M
5.6 5.6.31 18M 61M 1.1G
5.7 5.7.13 33M 108M 2.5G

The sizes of the tarballs mentioned in the table above are much smaller than the original packages. The binaries have been stripped of debug info, compressed whenever possible, and purged of all binaries that are not needed for sandbox operations. This means that:

  • You can download the needed tarball very fast;
  • The storage needed for the binaries is reduced immensely.

Noun archive 8572

Here is an example of the script in action. We download and install mySQL 5.0.96 in one go:

$ make_sandbox_from_url 5.0 -- --no_show
wget -O 5.0.96.tar.gz
'http://github.com/datacharmer/mysql-docker-minimal/blob/master/dbdata/5.0.96.tar.gz?raw=true'
URL transformed to HTTPS due to an HSTS policy
--2016-07-10 17:59:33--
https://github.com/datacharmer/mysql-docker-minimal/blob/master/dbdata/5.0.96.tar.gz?raw=true
Resolving github.com (github.com)... 192.30.253.112
Connecting to github.com (github.com)|192.30.253.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location:
https://github.com/datacharmer/mysql-docker-minimal/raw/master/dbdata/5.0.96.tar.gz
[following]
--2016-07-10 17:59:33--
https://github.com/datacharmer/mysql-docker-minimal/raw/master/dbdata/5.0.96.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location:
https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/5.0.96.tar.gz
[following]
--2016-07-10 17:59:34--
https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/5.0.96.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)...
151.101.12.133
Connecting to raw.githubusercontent.com
(raw.githubusercontent.com)|151.101.12.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20052235 (19M) [application/octet-stream]
Saving to: ‘5.0.96.tar.gz’

5.0.96.tar.gz
100%[=================================================================================>]
19.12M  15.2MB/s    in 1.3s

2016-07-10 17:59:37 (15.2 MB/s) - ‘5.0.96.tar.gz’ saved [20052235/20052235]

    The MySQL Sandbox,  version 3.1.11
    (C) 2006-2016 Giuseppe Maxia
# Starting server
. sandbox server started
# Loading grants
Your sandbox server was installed in $HOME/sandboxes/msb_5_0_96

If you call the same command twice, you will get a message saying that you can now use make_sandbox x.x.xx to install your sandbox.

The script is doing what I should probably have done from the beginning by default: expands the tarball in $SANDBOX_BINARY (by default $HOME/opt/mysql) from where it is easy to reuse with minimum typing.

As of today, the binaries are Linux ONLY. I made this repository to use it with Docker (I will write about it soon) and that means using Linux. This is still part of an experiment that so far is working well. The project can either evolve in smarter directions or merge with clever containers. It's early to say. For now, enjoy the fastest set-up that MySQL Sandbox can offer!

Sunday, June 05, 2016

MySQL 5.7 ghost users

Several months ago I reported on Default Users in MySQL 5.7. With the addition of the sys schema the server needs an extra user to secure operations. The problem with extra users is that, if you are not aware of their existence and why they are needed, you may easily mismanage them. For example, you may have a cleanup routine that you run when you use a new server, and the routine may have a command like

DELETE FROM mysql.user WHERE user != 'root';

This was good up to MySQL 5.6. Then the sys schema was added, and with it the user mysql.sys, which may cause errors if you try to re-create views in the sys schema.

The latest user sneaking below the radar is mysqlxsys. Like its predecessor, it comes well documented, and if only I had read all the manual I'd had known that this user must exist, or things won't work.

Noun bug 9935

Here is how they did not work for me:

$ make_sandbox 5.7.12 -- -c plugin-load=mysqlx=mysqlx.so --no_show
    The MySQL Sandbox,  version 3.1.05
    (C) 2006-2016 Giuseppe Maxia
loading grants
.. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_12

Check that the plugin is loaded

$ ~/sandboxes/msb_5_7_12/use -e "show plugins" |grep mysqlx
mysqlx    ACTIVE    DAEMON    mysqlx.so    GPL

Check that the port is open

$ sudo netstat -atn  |grep LISTEN|grep 33060
tcp4       0      0  *.33060                *.*                    LISTEN

Try a connection:

$ mysqlsh -h 127.0.0.1 -umsandbox -pmsandbox test
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating an X Session to msandbox@127.0.0.1:33060/test
MySQL Error (1045): Unable to switch context to user mysqlxsys

This was my fault, of course. MySQL-Sandbox uses an aggressive cleanup routine, which was necessary up to MySQL 5.6, and less so in MySQL 5.7, but just for the sake of completeness it's still there. The routine removes all users that are neither root or mysql.sys. In this case the installation failed because I was removing an user that was needed.

I am not alone in this affair. Also the MySQL Docker image does the same thing.

A fix is ready for both these issues. MySQL-Sandbox 3.1.08 is already available.

Monday, May 16, 2016

Custom commands during MySQL Sandbox installation

MySQL Sandbox 3.1.07 adds several options to execute shell or SQL commands during the sandbox installation.

Mysql sandbox states and hooks

Figure 1: MySQL Sandbox states and where you can run the hooks

There are 3 options to run shell commands, 2 to run SQL queries, and 2 to run SQL files.

## Shell commands
--pre_start_exec=command  : runs 'command' after the installation, before the server starts
--pre_grants_exec=command : runs 'command' after the server starts, before loading the grants.
--post_grants_exec=command : runs 'command' after the loading the grants.

## SQL statements
--pre_grants_sql=query : runs 'query' before loading the grants.
--pre_grants_file=filename : runs SQL file 'filename' before loading the grants.
--post_grants_sql=query : runs 'query' after the loading the grants.
--post_grants_file=filename : runs SQL file 'filename' before loading the grants.

Moreover, a new option --load_plugin=plugin[:plugin_file_name] loads a given plugin before the grants file is loaded.

What can you use this for?

The need arose when I needed general log activated at start-up, but did not want the option to stay in the configuration file. Then I also found that running a SQL command to install a plugin and check its status would be nice to have. When enough cases piled up, I implemented a few changes that allow MySQL Sandbox to accept commands at specific stages of the deployment. Figure 1 shows where these hooks are available.

Some use cases for SQL:

  • activate general log;
  • install plugins (although there is a specialized option for that)
  • enable superreadonly : this will only work after loading grants. If enabled before, the user creation commands will fail.
  • show the status of database variables, status, and objects.

Some use case for shell commands:

  • See the composition of the sandbox at various stages;
  • Run diagnostic tools, backups, synchronization tasks;
  • Perform dangerous tasks to test the server responsiveness;
  • General purpose hacks.

When the shell command runs, MySQL Sandbox provides several environment variables that facilitate interaction with the database that has been installed.

SANDBOX_DIR   =  sandbox directory;
BASEDIR       =  base directory for the sandbox binaries
DB_DATADIR    =  data directory
MY_CNF        =  configuration file
DB_PORT       =  database port
DB_USER       =  database user
DB_PASSWORD   =  database password
DB_SOCKET     =  database socket
MYSQL_VERSION =  MySQL version (e.g. 5.7.12)
MYSQL_MAJOR   =  Major part of the version (e.g 5)
MYSQL_MINOR   =  Minor part of the version (e.g 7)
MYSQL_REV     =  Revision part of the version (e.g 12)
EXEC_STAGE    =  Stage of the execution (pre_start_exec, pre_grants_exec, post_grants_exec)

Mysql sandbox order of execution

Figure 2: MySQL Sandbox order of execution

You can combine shell and SQL calls for all stages. They will run in the order shown in Figure 2.

Examples

Let's install the XPlugin. And using the shell commands we can check the status of the system from the outside.

$ make_sandbox 5.7.12 -- --load_plugin=mysqlx \
  --pre_grants_exec='echo "##open ports";sudo netstat -atn  |grep LISTEN|grep 33060; echo "##"' \
  --post_grants_exec='echo "open ports";sudo netstat -atn  |grep LISTEN|grep 33060;echo "##"'
[...]
# Starting server
.. sandbox server started
# [pre_grants_exec] system "echo "##open ports";sudo netstat -atn  |grep LISTEN|grep 33060; echo "##""
##open ports
##
--------------
INSTALL PLUGIN mysqlx soname 'mysqlx.so'
--------------

--------------
select plugin_name, plugin_version, plugin_status from information_schema.plugins where plugin_name = 'mysqlx'
--------------

+-------------+----------------+---------------+
| plugin_name | plugin_version | plugin_status |
+-------------+----------------+---------------+
| mysqlx      | 1.0            | ACTIVE        |
+-------------+----------------+---------------+
# Loading grants
# [post_grants_exec] system "echo "open ports";sudo netstat -atn  |grep LISTEN|grep 33060;echo "##""
open ports
tcp4       0      0  *.33060                *.*                    LISTEN
##
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_12

Here, the load_plugin option loads the plugin before the grants. Before this happens, the shell command runs netstat to check the status of port 33060, which is the one that the MySQL XProtocol uses by default. We can see that the port is not available at that stage. The same command running after the grants are loaded shows that the port is open.

Another example. This time we will use a shell script that uses most of the variables listed above:

$ cat t/test_init_exec.sh
#!/bin/bash
echo "----------------------------------------------------------------"
echo "Stage: $EXEC_STAGE"
if [ "$EXEC_STAGE" == "pre_start_exec" ]
then
    echo "PWD <$PWD> "
    echo "VER <$MYSQL_VERSION> "
    echo "DIR <$SANDBOX_DIR> "
    echo "DATADIR <$DB_DATADIR> "
    echo "BASEDIR <$BASEDIR> "
    echo "SOCKET <$DB_SOCKET> "
    echo "MY_CNF <$MY_CNF>"
    echo "USER/PASSWORD/PORT <$DB_USER> <$DB_PASSWORD> <$DB_PORT> "
    echo "Version components <$MYSQL_MAJOR> <$MYSQL_MINOR> <$MYSQL_REV>"
fi
cd $SANDBOX_DIR
# ls
echo '## Data directory'
ls  data
echo "----------------------------------------------------------------"

The script shows the value of the variables for the first stage only and the contents of the data directory:

$ make_sandbox 5.7.12 --  --pre_start_exec=./t/test_init_exec.sh --post_grants_exec=./t/test_init_exec.sh
[...]
# [pre_start_exec] system "./t/test_init_exec.sh"
----------------------------------------------------------------
Stage: pre_start_exec
PWD <$HOME/workdir/git/mysql-sandbox>
VER <5.7.12>
DIR <$HOME/sandboxes/msb_5_7_12>
DATADIR </Users/gmax/sandboxes/msb_5_7_12/data>
BASEDIR <$HOME/opt/mysql/5.7.12>
SOCKET </tmp/mysql_sandbox5712.sock>
MY_CNF <$HOME/sandboxes/msb_5_7_12/my.sandbox.cnf>
USER/PASSWORD/PORT <msandbox> <msandbox> <5712>
Version components <5> <7> <12>
## Data directory
auto.cnf        ib_buffer_pool        ibdata1            mysql            performance_schema    sys
----------------------------------------------------------------
# Starting server
.. sandbox server started
# Loading grants
# [post_grants_exec] system "./t/test_init_exec.sh"
----------------------------------------------------------------
Stage: post_grants_exec
## Data directory
auto.cnf        ib_logfile0        ibdata1            msandbox.err        mysql_sandbox5712.pid    performance_schema    test
ib_buffer_pool        ib_logfile1        ibtmp1            mysql            mysqld_safe.pid        sys
----------------------------------------------------------------
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_12

We can see that the data directory, before the server starts, contains only the files created by mysqld --initialize-insecure, while after the start and the grant load we get the InnoDB log files, the .pid files, and the test database (created by the grants script).

Wednesday, May 04, 2016

Docker for Mac beta and MySQL - First impressions

Using Docker for development is a great way of ensuring that what you develop will be the same that you deploy in production. This is true for almost everything. If you develop on Linux, the above statement holds. If you develop on a different operating system (OSX or Windows) there are several restrictions.

I showed one of those issues in a recent article (MySQL and Docker on a Mac: networking oddity.) When you want to export a port from a service running in the container, the exported port is not available in your mac, but in the virtual machine that runs Docker services. This happens with any application that listens to a port.

The second limitation I found affects only MySQL, and it is related to using volumes. The proper way of achieving data persistence with containers is through volumes, i.e. telling the container to run the data directory in a virtual path that refers to some safe place in the host computer. That can't be done on a Mac, because the host computer is a virtual machine, and even though Docker can access a folder in your Mac, the server installation fails for lack of permissions.

Both the above restrictions are lifted if you use the beta release of Docker for Mac and Windows. It's a private beta: you need to apply and wait to be given an operational token, but once you are in, you notice the differences between the beta and the "old" Docker-Toolbox:

Docker beta

  • The Docker app is a native app, which you install by copying its icon to the /Application folder;
  • You don't need Virtualbox or VMware Fusion. It comes with its own lightweight VM based on xhyve.
  • There is no need to run docker-machine start xxx and eval $(docker-machine env xxx). The new app is fully integrated with the OS.
  • Ports exported from a container are available in your Mac.
  • You can keep both the Docker Toolbox and the new Docker app in the same host, provided that you don't run them both in the same terminal session.

Back to our claim of lifted limitations: let's try a full installation on a Mac as we would do it on Linux.

$ docker run --name mybox -e MYSQL_ROOT_PASSWORD=secret -d \
    -v ~/docker/mysql/single:/var/lib/mysql \
    -p 5000:3306 mysql/mysql-server
72ca99918076ff0e5702514311cc706ffcc27f98917f211e98ed187dfda3b47b

$ ls  ~/docker/mysql/single/
auto.cnf    client-key.pem  ibdata1     mysql.sock.lock    server-cert.pem
ca-key.pem  ib_buffer_pool  ibtmp1      performance_schema server-key.pem
ca.pem      ib_logfile0     mysql       private_key.pem    sys
client-cert.pem             ib_logfile1 mysql.sock         public_key.pem

We create a MySQL server container with the internal port 3306 exposed to the external port 5000, and the data directory running in the host directory $HOME/docker/mysql/single. It seems that the data directory was created correctly. Now we use a MySQL client on the Mac to connect to the container, using port 5000 on the local network address (Note: there is NO database server running on my mac. Only in the container).

$ sudo netstat -atn  |grep LISTEN | grep 5000
tcp4       0      0  *.5000                 *.*                    LISTEN

$ ~/opt/mysql/5.7.12/bin/mysql -h 127.0.0.1  -u root -psecret -P 5000
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.12 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create schema hello_docker;
Query OK, 1 row affected (0.01 sec)

mysql> use hello_docker
Database changed
mysql> create table i_am_here(id int);
Query OK, 0 rows affected (0.03 sec)

mysql> exit
Bye

$ ls  ~/docker/mysql/single/hello_docker/
db.opt        i_am_here.frm    i_am_here.ibd

This is full integration! Using a Mac client we connected to the container, where we created a table, which then appeared inside the data directory in the Mac host!

It's still early to say if this beta is ready for more serious work, but the first impressions are really good!

Monday, May 02, 2016

Taking the MySQL document store for a spin

This is not a comprehensive review, nor an user guide. It's a step-by-step account of my initial impressions while trying the new MySQL XProtocol and the document store capabilities. In fact, I am barely scratching the surface here: more articles will come as time allows.

MySQL 5.7 has been GA for several months, as it was released in October 2015. Among the many features and improvements, I was surprised to see the MySQL team emphasizing the JSON data type. While it is an interesting feature per se, I failed to see the reason why so many articles and conference talks were focused around this single feature. Everything became clear when, with the release of MySQL 5.7.12, the MySQL team announced a new release model.

Overview

In MySQL 5.7.12, we get the usual MySQL server, which shouldn't have new features. However, in an attempt to combine the stability of the server with a more dynamic release cycle, the server ships with a new plugin, unimaginatively named X-Plugin, which supports an alternative communication protocol, named X-Protocol.

In short, the X-Protocol extends and replaces the traditional client/server protocol, by allowing asynchronous communication to the server, using different API calls, which are available, as of today, in Javascript, Python, C#, and Java, with more languages to come.

The reason for this decision is easy to see. Many developers struggle with relational tables and SQL, while they understand structures made of arrays and associative arrays (maps.) This is also one of the reasons for the recent popularity of NoSQL databases, where schemas and tables are replaced by collections of documents or similar schema-less structures. With this new release, MySQL wants to offer the best of two worlds, by allowing developers to use the database with the tools they feel most comfortable with.

To use the new plugin, you need two components:

  • The plugin itself, which ships with the server package, but is not enabled by default;
  • The MySQL shell, a new command line tool that you have to download and install separately and will allow you to use Javascript or Python with the database.

As a QA professional, I am a bit concerned about this mix of GA and alpha features (The MySQL shell is defined as alpha software. and the shell itself says development preview in its help). Theoretically, the two worlds should be separated. If you don't install the plugin, the server should work as usual. But practice and experience tell me that there are dangers waiting for a chance to destroy our data. If you want a single piece of advice to summarize this article, DON'T USE the new MySQL shell with a production server. That said, let's start a quick tour.

Installation

You need to install the shell, which comes in a package that is different from the rest of MySQL products. The manual shows how to install it on OSX or Linux. The only mention that this product could be dangerous to use is a note reminding the user to enable the MySQL Preview Packages when installing from a Linux repository. The procedure, on any operating system, will install library and executables globally. Unlike the server package, it is not possible to install it in a user-defined directory, like you install the server with MySQL Sandbox. In this context, the standard Oracle disclaimer may have a meaning that goes beyond a regular CYA.

Next, you need to enable the plugin. You can do it in three ways:

(1)

$ mysqlsh --classic -u youruser -p --dba enableXProtocol
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Classic Session to youruser@localhost:3306
Enter password:
No default schema selected.

enableXProtocol: Installing plugin mysqlx...
enableXProtocol: done

(2)

Start the server with --plugin-load=mysqlx=mysqlx.so. This will enable the plugin, although it does not seem to work the way it should.

(3)

Enable the plugin with a SQL command.

mysql> install plugin mysqlx soname 'mysqlx.so';

I prefer method #3 because is the only one that does not have side effects or misunderstanding. The issue that hit me when I tried method #1 for the first time is that calling mysqlsh --classic uses the client/server protocol on port 3306 (or the port that you defined for the database) while subsequent calls will use the X-Protocol on port 33060.

Alternatives. Using Docker

If what I said previously has made you cautious and you have decided not to use the shell in your main computer (as you should), there are alternative ways. If you have a data center at your disposal, just fire a virtual machine and play with it. However, be aware that the MySQL shell does not install in Ubuntu 15.04 and 16.04.

A lightweight method to try on the new shell without endangering your production server is to use a Docker image for MySQL, or a combination of MySQL Sandbox and Docker.

In Docker, the MySQL shell does not ship together with the server. It requires a separate image. A quick guide is available in a recent article. I don't like the current approach: having two images is a waste of space. It would be acceptable if the images were based on a slim Linux distribution, such as Alpine. Since they run on OracleLinux, instead, you need to download two beefy images to start testing. With a fast internet connection this should not be a problem, but if you live in a place where 3 MBPS is the norm or if you are traveling, this could become an annoyance. Once you have pulled the images, you can use them at will, even without internet connection.

The above mentioned quick guide suggests using docker run --link to connect the two containers. I recommend a different approach, as the link option is now considered legacy.

$ docker network create mynet
edcc36be21e54cdb91fdc91f2c320efabf62d36ab9d31b0142e901da7e3c84e9
$ docker network ls
NETWORK ID          NAME                DRIVER
a64b55fb7c92        bridge              bridge
0b8a52002dfd        none                null
cc775ec7edab        host                host
edcc36be21e5        mynet               bridge

$ docker run --name mybox  -e MYSQL_ROOT_PASSWORD=secret -d --net mynet mysql/mysql-server:5.7.12 \
    --plugin-load=mysqlx=mysqlx.so
ecbfc322bb17ec0b1511ea7321c2b10f9c7b5091baee4240ab51b7bf77c1e424

$ docker run -it --net mynet mysql/shell -u root -h mybox -p
Creating an X Session to root@mybox:33060
Enter password:
No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

The first command creates a network (called mynet).

The second command creates the server container, which is launched using the network mynet and with the plugin-load option (which seems to work well with the docker image). When you use a docker network, the container name is recognized by the network as an host name, and can be called by other members of the network. This is much cleaner than using --link.

The third command runs the MySQL shell, using the same network. This allows us to use the container name (mybox) without any other options.

Running the MySQL Javascript shell

My favorite setup for this test is a mix of MySQL Sandbox for the server and Docker for the shell. This way I can use the alpha shell without polluting my Linux host and use a feature rich MySQL Sandbox to control the server.

Here is what I do:

$ make_sandbox 5.7.12 -- --no_show -c general_log=1 -c general_log_file=general.log

I start a sandbox with MySQL 5.7.12 (tarball expanded and renamed into /opt/mysql/5.7.12), with the general log enabled. We need this to peek under the hood when we use the document store.

Next, we load the sample world_x database from the MySQL documentation page.

$ ~/sandboxes/msb_5_7_12/use  -e 'source world_x.sql'

Finally, we enable the plugin.

$ ~/sandboxes/msb_5_7_12/use  -e "install plugin mysqlx soname 'mysqlx.so'"

Now we can connect the shell:

$ docker run -it --net host mysql/shell -u msandbox -pmsandbox world_x
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

What have we done? We use the network named 'host', which is a standard Docker protocol that lets a container use the host environment. We don't need to specify a port, since the shell assumes 33060 (enabled by the X-Plugin). The username and password are the usual ones for a sandbox. We enter inside a Javascript shell, where we can communicate with the database server using an alternative syntax. Let's see what we have:

  • We have an "X-Session" using port 33060 and working on database world_x;
  • There is a help, same as in the MySQL client;
  • The database world_x is accessible through the variable db.
  • Note: all the commands used below are the same for Python and Javascript. There are differences only when using the language extensively.

With the above elements, we can try getting data from the database.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

mysql-js> db.tables
{
    "City": <Table:City>,
    "Country": <Table:Country>,
    "CountryLanguage": <Table:CountryLanguage>
}

What does it mean? Let's abandon the Javascript shell and look at the traditional client:

mysql [localhost] {msandbox} (world_x) > show tables;
+-------------------+
| Tables_in_world_x |
+-------------------+
| City              |
| Country           |
| CountryInfo       |
| CountryLanguage   |
+-------------------+
4 rows in set (0.00 sec)

Here we see 4 tables, while the Javascript console lists only 3. However, the fourth table has the same name as the "collection." Let's have a look:

mysql [localhost] {msandbox} (world_x) > desc CountryInfo;
+-------+-------------+------+-----+---------+------------------+
| Field | Type        | Null | Key | Default | Extra            |
+-------+-------------+------+-----+---------+------------------+
| doc   | json        | YES  |     | NULL    |                  |
| _id   | varchar(32) | YES  |     | NULL    | STORED GENERATED |
+-------+-------------+------+-----+---------+------------------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (world_x) > show create table CountryInfo\G
*************************** 1. row ***************************
       Table: CountryInfo
Create Table: CREATE TABLE `CountryInfo` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Look what we got! A JSON column with a dynamic index implemented as a virtual column. Now we can appreciate why the JSON data type was such an important thing.

Back to the Javascript shell, let's get something from the database. (You can get all the commands I am using, and much more, from the manual.)

mysql-js> db.collections.CountryInfo.find("_id='USA'")
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush"
        }
    }
]
1 document in set (0.00 sec)

Apart from the feeling of being back in the good old times when MySQL was still playing with IPO dreams (look at the HeadOfState field in the above data), this record is a straightforward JSON document, where data that should belong to different normalized tables are bundled together in this unified view. So, we are really querying a Table that contains JSON data associated with an _id. We know because the general log lists what happens after our simple query:

SELECT doc FROM `world_x`.`CountryInfo` WHERE (`_id` = 'USA')

Let's try a more complex query. We want all countries in Oceania with a population of more than 150,000 people, and whose Head of State is Elisabeth II. The query is a bit intimidating, albeit eerily familiar:

mysql-js> db.collections.CountryInfo.find("government.HeadOfState='Elisabeth II' AND geography.Continent = 'Oceania' AND demographics.Population > 150000").fields(["Name", "demographics.Population","geography.Continent"])
[
    {
        "Name": "Australia",
        "demographics.Population": 18886000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "New Zealand",
        "demographics.Population": 3862000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Papua New Guinea",
        "demographics.Population": 4807000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Solomon Islands",
        "demographics.Population": 444000,
        "geography.Continent": "Oceania"
    }
]
4 documents in set (0.00 sec)

Here is the corresponding SQL query recorder in the general log:

SELECT JSON_OBJECT(
    'Name', JSON_EXTRACT(doc,'$.Name'),'demographics.Population', \
    JSON_EXTRACT(doc,'$.demographics.Population'),'geography.Continent', \
    JSON_EXTRACT(doc,'$.geography.Continent')
) AS doc 
FROM `world_x`.`CountryInfo` \
WHERE (
    ((JSON_EXTRACT(doc,'$.government.HeadOfState') = 'Elisabeth II') \
    AND (JSON_EXTRACT(doc,'$.geography.Continent') = 'Oceania')) \
    AND (JSON_EXTRACT(doc,'$.demographics.Population') > 150000)
    )

I am not sure which one I prefer. The SQL looks strange, with all those JSON functions, while the Javascript command seems more readable (I had never thought I would say what I have just said!)

Enough with reading data. I want to manipulate some. I'll start by creating a new collection.

 mysql-js> db.createCollection('somethingNew')
 <Collection:somethingNew>

And the general log shows what should not be a surprise, as we have seen a similar structure for CountryInfo:

CREATE TABLE `world_x`.`somethingNew` (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE
) CHARSET utf8mb4 ENGINE=InnoDB

Now, to the data manipulation:

mysql-js> mynew=db.getCollection('somethingNew')
<Collection:somethingNew>

The variable mynew can access the new collection. It's a shortcut to avoid db.collections.somethingNew

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> mynew.find()
Empty set (0.00 sec)

As expected, there is nothing inside the new collection. Now we enter a very minimal record.

mysql-js> mynew.add({Name:'Joe'})
Query OK, 1 item affected (0.01 sec)

mysql-js> mynew.find()
[
    {
        "Name": "Joe",
        "_id": "e09ef177c50fe6110100b8aeed734276"
    }
]
1 document in set (0.00 sec)

The collection contains more than what we have inserted. There is an apparently auto-generated _id field. Looking at the general log, we see that the data includes the new field.

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Joe\",\"_id\":\"e09ef177c50fe6110100b8aeed734276\"}')

As you can see, an _id field was added automatically. We could override that behavior by providing our own value:

mysql-js> mynew.add({_id: "a dummy string", Name:"Frank", country: "UK"})

The data inserted now includes the _id filed with our manual value. The general log says:

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Frank\",\"_id\":\"a dummy string\",\"country\":\"UK\"}')

The value of _id, however, must be unique, or the engine will generate an error:

mysql-js> mynew.add({_id: "a dummy string", Name:"Sam", country: "USA"})
MySQL Error (5116): Document contains a field value that is not unique but required to be

If all this gives you a sense of deja-vu, you're right. This feels and smells a lot like MongoDB, and I am sure it isn't a coincidence.

Synchronizing operations

As our last attempt for the day, we will see what happens when we manipulate data in SQL and then retrieve it in Javascript or Python.

We leave the JS console open, and we do something in SQL

mysql [localhost] {msandbox} (world_x) > drop table somethingNew;
Query OK, 0 rows affected (0.01 sec)

How does it look like on the other side?

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> db.getCollections()
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}

Oops! mysqlsh didn't get the memo! It still considers somethingNew to be available.

mysql-js> db.collections.somethingNew.find()
MySQL Error (1146): Table 'world_x.somethingNew' doesn't exist

We need to refresh the connection. Unlike the SQL client, you need to specify the connection parameters.

mysql-js> \connect msandbox:msandbox@localhost:33060/world_x
Closing old connection...
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

We can see the same happening when we create a new table in SQL. The session in mysqlsh keeps showing the cached contents, and we need to refresh the session to see the changes. Looking at the general log, there are no changes when we issue commands asking for metadata, such as db.collections or db.tables. Instead, when we refresh the session, we see this:

SELECT table_name, COUNT(table_name) c FROM information_schema.columns \
   WHERE ((column_name = 'doc' and data_type = 'json') 
   OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) \
   AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2
SHOW FULL TABLES FROM `world_x`

The first query lists all tables that contain a JSON document and a generated _id (these are the collections). The second one lists all tables. Then the shell removes from the table list all the ones that were in the collections list.

Given the way it is done, we can cheat the system easily by creating something that looks like a collection, but has extra fields:

CREATE TABLE strangedoc (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE,
    secret_stash varchar(200),
    more_secret_info mediumtext
) CHARSET utf8mb4 ENGINE=InnoDB;

mysql [localhost] {msandbox} (world_x) > insert into strangedoc (doc,secret_stash,more_secret_info) values \
('{"_id": "abc", "name": "Susan"}', \
'and now for something completely different', \
'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (world_x) > select * from strangedoc\G
*************************** 1. row ***************************
             doc: {"_id": "abc", "name": "Susan"}
             _id: abc
    secret_stash: and now for something completely different
more_secret_info: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
1 row in set (0.00 sec)

And the Javascript console will be unaware of the extra material:

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "strangedoc": <Collection:strangedoc>
}
mysql-js> db.strangedoc.find()
[
    {
        "_id": "abc",
        "name": "Susan"
    }
]
1 document in set (0.00 sec)

We can add contents to the collection in Javascript, and the database server won't protest (provided that the extra fields are nullable or have a default value). Is it a bug or a feature?

Parting thoughts

As I have said at the beginning, this is a very simple exploration. More work is required to test the full potential of the new model. My impressions are mildly positive. On one hand, it's an exciting environment, which promises to expand to better usefulness with more programming languages and possibly better coordination between shell and server software. On the other hand, there are many bugs, and the software is still very green. It will require more iterations from the community and the development team before it could be trusted with important data.

Vote on Planet MySQL