It may be surprising when a new InnoDB Cluster is set up, and despite not being in production yet and completely idle, it manifests a significant amount of writes visible in growing binary logs. This effect became much more spectacular after MySQL version 8.4. In this write-up, I will explain why it happens and how to address the problem.
MySQL InnoDB Cluster is a modern open source, high-availability solution for MySQL. It consists of three main components: MySQL Shell (cluster setup, backups, management), MySQL Server with Group Replication (SQL, data), and MySQL Router (redirects application requests).
But why would such a cluster experience writes even when not being accessed by an application?
Unlike the other popular proxy used with MySQL, ProxySQL, MySQL Router keeps the cluster status and configuration information in the very same cluster it’s configured for, in a dedicated metadata database. Such cluster metadata is refreshed automatically, by default, every second via the metadata cache TTL mechanism. Now, every refresh action, regardless of whether the cluster state has changed or not, updates the mysql_innodb_cluster_metadata.routers table. When no state changes are to be saved, the router does a simple last check-in timestamp update via ClusterMetadata::update_router_last_check_in
function:
1 2 3 4 |
sqlstring query = "UPDATE mysql_innodb_cluster_metadata.v2_routers set last_check_in = " "NOW() where router_id = ?"; |
Which we can see on the cluster primary node via general log entries:
1 2 3 4 |
2025-08-17T11:43:25.967668Z 10151 Query UPDATE mysql_innodb_cluster_metadata.v2_routers set last_check_in = NOW() where router_id = 55 2025-08-17T11:43:31.457123Z 10160 Query UPDATE mysql_innodb_cluster_metadata.v2_routers set last_check_in = NOW() where router_id = 36 2025-08-17T11:43:33.350665Z 10165 Query UPDATE mysql_innodb_cluster_metadata.v2_routers set last_check_in = NOW() where router_id = 28 |
The v2_routers is a view pointing to the routers table, which looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE TABLE `routers` ( `router_id` int unsigned NOT NULL AUTO_INCREMENT, `router_name` varchar(265) NOT NULL, `product_name` varchar(128) NOT NULL, `address` varchar(256) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL, `version` varchar(12) DEFAULT NULL, `last_check_in` timestamp NULL DEFAULT NULL, `attributes` json DEFAULT NULL, `cluster_id` char(36) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL, `options` json DEFAULT NULL, `clusterset_id` varchar(36) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL, PRIMARY KEY (`router_id`), UNIQUE KEY `address` (`address`,`router_name`) ) ENGINE=InnoDB AUTO_INCREMENT=60 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC |
You can say it is not a big deal, as it’s only one small table column update. Well, with the ROW binary format (the default and required for Group Replication), one column update is actually heavier in terms of binary log than you’d expect, as the whole row gets written twice. Therefore, an example decoded row-based replication (RBR) event will look like below, and this behavior is one of the known disadvantages:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
# at 253 (...) /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; (...) BEGIN /*!*/; # at 432 #250817 11:43:25 server id 21426332 end_log_pos 549 CRC32 0xb150c164 Table_map: `mysql_innodb_cluster_metadata`.`routers` mapped to number 149 # has_generated_invisible_primary_key=0 # at 549 #250817 11:43:25 server id 21426332 end_log_pos 1143 CRC32 0x59a06b21 Update_rows: table id 149 flags: STMT_END_F ### UPDATE `mysql_innodb_cluster_metadata`.`routers` ### WHERE ### @1=55 /* INT meta=0 nullable=0 is_null=0 */ ### @2='' /* VARSTRING(1060) meta=1060 nullable=0 is_null=0 */ ### @3='MySQL Router' /* VARSTRING(512) meta=512 nullable=0 is_null=0 */ ### @4='cluster1-router-54bbb45b6-mz2h5' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */ ### @5='8.0.42' /* VARSTRING(48) meta=48 nullable=1 is_null=0 */ ### @6=1755429805 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */ ### @7='{"ROEndpoint": "6447", "RWEndpoint": "6446", "ROXEndpoint": "6449", "RWXEndpoint": "6448", "MetadataUser": "mysql_router55_nfrctjh", "bootstrapTargetType": "cluster"}' /* JSON meta=4 nullable=1 is_null=0 */ ### @8='e39051c0-39f5-11f0-82f0-aa4308cfed78' /* STRING(36) meta=65060 nullable=1 is_null=0 */ ### @9=NULL /* JSON meta=4 nullable=1 is_null=1 */ ### @10=NULL /* VARSTRING(36) meta=36 nullable=1 is_null=1 */ ### SET ### @1=55 /* INT meta=0 nullable=0 is_null=0 */ ### @2='' /* VARSTRING(1060) meta=1060 nullable=0 is_null=0 */ ### @3='MySQL Router' /* VARSTRING(512) meta=512 nullable=0 is_null=0 */ ### @4='cluster1-router-54bbb45b6-mz2h5' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */ ### @5='8.0.42' /* VARSTRING(48) meta=48 nullable=1 is_null=0 */ ### @6=1755431005 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */ ### @7='{"ROEndpoint": "6447", "RWEndpoint": "6446", "ROXEndpoint": "6449", "RWXEndpoint": "6448", "MetadataUser": "mysql_router55_nfrctjh", "bootstrapTargetType": "cluster"}' /* JSON meta=4 nullable=1 is_null=0 */ ### @8='e39051c0-39f5-11f0-82f0-aa4308cfed78' /* STRING(36) meta=65060 nullable=1 is_null=0 */ ### @9=NULL /* JSON meta=4 nullable=1 is_null=1 */ ### @10=NULL /* VARSTRING(36) meta=36 nullable=1 is_null=1 */ # at 1143 #250817 11:43:25 server id 21426332 end_log_pos 1174 CRC32 0xf014adf8 Xid = 59549 COMMIT/*!*/; # at 1174 |
The overhead will still be fairly acceptable in most cases, though. It’s just about 900 bytes per transaction:
1 2 3 4 5 6 7 8 |
mysql > select format_bytes(1174-253); +------------------------+ | format_bytes(1174-253) | +------------------------+ | 921 bytes | +------------------------+ 1 row in set (0.01 sec) |
MySQL Router 8.4 metadata enhancements
Quoting the documentation:
As of MySQL Router 8.4, the full configuration of routers bootstrapped against an InnoDB Cluster is stored in the InnoDB Cluster Metadata Schema and can be read by the MySQL Shell operation, object.routerOptions, for Cluster, ClusterSet, and ReplicaSets.
The 8.4.0 Router’s release notes also mention this major change:
As of this release, MySQL Router exposes its configuration in the Cluster metadata for all routers bootstrapped against it. This information is stored as JSON in the Cluster metadata schema and can be accessed by the MySQL Shell operation, object.routerOptions for Cluster, ClusterSet, and ReplicaSets. (WL #15649)
This extended functionality brings the related binlog overhead to another level. The attributes column’s JSON document size changes as follows in example router 8.4 vs 8.0 entries:
1 2 3 4 5 6 7 8 9 |
mysql > select router_id,version,LENGTH(attributes) from mysql_innodb_cluster_metadata.routers where router_id in (55,28); +-----------+---------+--------------------+ | router_id | version | LENGTH(attributes) | +-----------+---------+--------------------+ | 28 | 8.4.5 | 7922 | | 55 | 8.0.42 | 166 | +-----------+---------+--------------------+ 2 rows in set (0.00 sec) |
And here is how a simple check-in update query results in the RBR event size in the new major version:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
# at 20390 #250817 12:03:33 server id 21426332 end_log_pos 20476 CRC32 0xbda37c6b GTID last_committed=3 sequence_number=6 rbr_only=yes original_committed_timestamp=1755432213596286 immediate_commit_timestamp=17554322135 98319 transaction_length=16453 /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; (...) BEGIN /*!*/; # at 20569 #250817 12:03:33 server id 21426332 end_log_pos 20686 CRC32 0xc2e31f2a Table_map: `mysql_innodb_cluster_metadata`.`routers` mapped to number 149 # has_generated_invisible_primary_key=0 # at 20686 #250817 12:03:33 server id 21426332 end_log_pos 36812 CRC32 0xb0d59679 Update_rows: table id 149 flags: STMT_END_F ### UPDATE `mysql_innodb_cluster_metadata`.`routers` ### WHERE ### @1=28 /* INT meta=0 nullable=0 is_null=0 */ ### @2='' /* VARSTRING(1060) meta=1060 nullable=0 is_null=0 */ ### @3='MySQL Router' /* VARSTRING(512) meta=512 nullable=0 is_null=0 */ ### @4='cluster1-router-8-4-1' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */ ### @5='8.4.5' /* VARSTRING(48) meta=48 nullable=1 is_null=0 */ ### @6=1755431013 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */ ### @7='(8 kB document here) ### @8='e39051c0-39f5-11f0-82f0-aa4308cfed78' /* STRING(36) meta=65060 nullable=1 is_null=0 */ ### @9=NULL /* JSON meta=4 nullable=1 is_null=1 */ ### @10=NULL /* VARSTRING(36) meta=36 nullable=1 is_null=1 */ ### SET ### @1=28 /* INT meta=0 nullable=0 is_null=0 */ ### @2='' /* VARSTRING(1060) meta=1060 nullable=0 is_null=0 */ ### @3='MySQL Router' /* VARSTRING(512) meta=512 nullable=0 is_null=0 */ ### @4='cluster1-router-8-4-1' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */ ### @5='8.4.5' /* VARSTRING(48) meta=48 nullable=1 is_null=0 */ ### @6=1755432213 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */ ### @7='(8 kB document here) ### @8='e39051c0-39f5-11f0-82f0-aa4308cfed78' /* STRING(36) meta=65060 nullable=1 is_null=0 */ ### @9=NULL /* JSON meta=4 nullable=1 is_null=1 */ ### @10=NULL /* VARSTRING(36) meta=36 nullable=1 is_null=1 */ # at 36812 #250817 12:03:33 server id 21426332 end_log_pos 36843 CRC32 0x8cfc80c2 Xid = 68929 COMMIT/*!*/; # at 36843 mysql> select format_bytes(36843-20390); +---------------------------+ | format_bytes(36843-20390) | +---------------------------+ | 16.07 KiB | +---------------------------+ 1 row in set (0.00 sec) |
To illustrate the impact better, let’s visualize (using Percona Monitoring and Management (PMM)) how the binlog writes volume increases as we add 8.4 router instances to the environment. In the graph below, there were two 8.0 routers initially, and then three more 8.4 router instances were added:
The impact may grow pretty dramatically in the case of the typical router deployment, where it’s located alongside the application on the same host, while the environment consists of dozens of application servers, like in the official recommendations:
https://dev.mysql.com/doc/mysql-router/8.4/en/mysql-router-general-using-deploying.html
It’s easy to imagine volumes of many megabytes of binlog writes per minute just related to the router’s metadata cache check-in updates!
This does not look reasonable, so let’s see how to avoid that unnecessary cost!
Binlog MINIMAL image
The easiest way to avoid storing the whole row image in the binlog RBR event is to switch from the default FULL setting to MINIMAL, via the binlog_row_image variable.
This way, only the changed column will be logged in the “after” row image, in addition to the primary key required in the “before” image! An example RBR event will now be like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
BEGIN /*!*/; # at 1182 #250817 23:48:59 server id 21426332 end_log_pos 1299 CRC32 0x221a49d0 Table_map: `mysql_innodb_cluster_metadata`.`routers` mapped to number 149 # has_generated_invisible_primary_key=0 # at 1299 #250817 23:48:59 server id 21426332 end_log_pos 1347 CRC32 0x4a91cd5f Update_rows: table id 149 flags: STMT_END_F ### UPDATE `mysql_innodb_cluster_metadata`.`routers` ### WHERE ### @1=36 /* INT meta=0 nullable=0 is_null=0 */ ### SET ### @6=1755474539 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */ # at 1347 #250817 23:48:59 server id 21426332 end_log_pos 1378 CRC32 0xbb4730d1 Xid = 178385 COMMIT/*!*/; |
The event is now remarkably smaller! And here is how the write volume can be optimized after implementation of the binlog_row_image change:
This solution may not be suitable for all, though. The image format change is global, while some applications that track binary logs may require full row images.
Group replication notifications
Fortunately, another option was implemented: the Group Replication cluster can notify the routers asynchronously and immediately about any changes! This is thanks to worklog WL#10719, which used the MySQL X protocol to implement MySQL to Router communications.
To benefit from this optimization, it is enough to enable the use_gr_notifications option in the MySQL Router configuration.
This dramatically changes the behavior. Regarding the router’s metadata cache refreshing, the TTL setting now only plays an additional safeguard role (i.e., if the X protocol connection is down), while the actual updates are triggered via the GR notification only. No more frequent check-in timestamp updates occur, hence the overhead is down to an absolute minimum.
The Router opens an X listener defined in the following code section:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
xcl::XError GRNotificationListener::Impl::enable_notices( xcl::XSession &session, const NodeId &node_id, const std::string &cluster_name) noexcept { log_info("Enabling GR notices for cluster '%s' changes on node %s:%u", cluster_name.c_str(), node_id.host.c_str(), node_id.port); xcl::XError err; xcl::Argument_value::Object arg_obj; xcl::Argument_value arg_value; arg_value = arg_obj; arg_obj["notice"] = xcl::Argument_value::Arguments{ xcl::Argument_value("group_replication/membership/quorum_loss", xcl::Argument_value::String_type::k_string), xcl::Argument_value("group_replication/membership/view", xcl::Argument_value::String_type::k_string), xcl::Argument_value("group_replication/status/role_change", xcl::Argument_value::String_type::k_string), xcl::Argument_value("group_replication/status/state_change", xcl::Argument_value::String_type::k_string)}; auto stmt_result = session.execute_stmt("mysqlx", "enable_notices", {xcl::Argument_value(arg_obj)}, &err); if (!err) { log_debug( "Enabled GR notices for cluster changes on connection to node %s:%d", node_id.host.c_str(), node_id.port); } else if (err.error() == ER_X_BAD_NOTICE) { |
With this functionality enabled, the router’s log will confirm this mode is being used by the presence of the X listener:
1 2 3 4 5 6 7 8 9 |
2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Connecting GR Notices listener on cluster1-mysql-1.cluster1-mysql.default:33060 2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Connected GR Notices listener on cluster1-mysql-1.cluster1-mysql.default:33060 2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Successfully set mysqlx_wait_timeout on connection to node cluster1-mysql-1.cluster1-mysql.default:33060 2025-08-18 15:02:59 metadata_cache INFO [7287446f4640] Enabling GR notices for cluster 'cluster1' changes on node cluster1-mysql-1.cluster1-mysql.default:33060 2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Enabled GR notices for cluster changes on connection to node cluster1-mysql-1.cluster1-mysql.default:33060 … 2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Enabled GR notices for cluster changes on connection to node cluster1-mysql-0.cluster1-mysql.default:33060 2025-08-18 15:02:59 metadata_cache DEBUG [7287446f4640] Finished refreshing the cluster metadata |
Then, when a real cluster event happens, for example, a GR node is stopped, the router immediately receives the notification about, i.e.:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
2025-08-18 15:13:21 metadata_cache DEBUG [7286fd7fa640] GR notification listen thread has read sth from cluster1-mysql-1.cluster1-mysql.default:33060 on fd=85 2025-08-18 15:13:21 metadata_cache DEBUG [7286fd7fa640] Got notification from the cluster. type=2; view_id=17554237821969759:8; Refreshing metadata. 2025-08-18 15:13:21 metadata_cache DEBUG [7286fd7fa640] Got message from cluster: 11 2025-08-18 15:13:21 metadata_cache DEBUG [7287446f4640] Started refreshing the cluster metadata 2025-08-18 15:13:21 metadata_cache DEBUG [7287446f4640] Updating metadata information for cluster 'e2bcda71-39f5-11f0-82f0-aa4308cfed78' … 2025-08-18 15:13:21 metadata_cache INFO [7287446f4640] Potential changes detected in cluster after metadata refresh (view_id=0) 2025-08-18 15:13:21 metadata_cache INFO [7287446f4640] Metadata for cluster 'cluster1' has 3 member(s), single-primary: 2025-08-18 15:13:21 metadata_cache INFO [7287446f4640] cluster1-mysql-0.cluster1-mysql.default:3306 / 33060 - mode=RW 2025-08-18 15:13:21 metadata_cache INFO [7287446f4640] cluster1-mysql-1.cluster1-mysql.default:3306 / 33060 - mode=RO 2025-08-18 15:13:21 metadata_cache INFO [7287446f4640] cluster1-mysql-2.cluster1-mysql.default:3306 / 33060 - mode=n/a 2025-08-18 15:13:21 metadata_cache DEBUG [7287446f4640] Finished refreshing the cluster metadata 2025-08-18 15:13:21 metadata_cache DEBUG [7287446f4640] Started refreshing the cluster metadata 2025-08-18 15:13:21 metadata_cache DEBUG [7287446f4640] Updating metadata information for cluster 'e2bcda71-39f5-11f0-82f0-aa4308cfed78' |
Below is an illustration of how enabling this feature changes the binlog write volume as compared to the default settings:
We can keep the binlog_row_image=full if needed, and there will be no costly binlog overhead from the fact of running many router instances, except when actually something happens in the cluster, like in the example below:
Summary
Running InnoDB Cluster with default MySQL Router settings, especially with versions 8.4 or higher, may bring unexpected write penalties. While increasing the metadata cache TTL value could be one option to lower the impact, it would delay routing changes, and there are much better options. In my opinion, enabling the use_gr_notifications
feature is the best solution to minimize the metadata cache handling overhead to the absolute minimum. For some reason, it is not enabled by default yet, even in the current innovation 9.4 version, but I could not find any drawbacks to using it. The only possible obstacle I see would be cases where the X protocol TCP port would not be allowed in certain network environments. Therefore, I suggested the default setting change: https://bugs.mysql.com/bug.php?id=118868.