Each day there is probably work done to improve performance of the InnoDB storage engine and remove bottlenecks and scalability issues. Hence there was another one I wanted to highlight:
Scalability issues due to tables without primary keys
This scalability issue is caused by the usage of tables without primary keys. This issue typically shows itself as contention on the InnoDB dict_sys mutex. Now the dict_sys mutex controls access to the data dictionary. This mutex is used at various places. I will only mention a few of them:
- During operations such as opening and closing table handles, or
- When accessing I_S tables, or
- During undo of a freshly inserted row, or
- During other data dictionary modification operations such as CREATE TABLE, or
- Within the “Persistent Stats” subsystem, among other things.
Of course this list is not exhaustive but should give you a good picture of how heavily it is used.
But the thing is when you are mainly debugging contention related to a data dictionary control structure, you start to look off at something that is directly related to data dictionary modifications. You look for execution of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, etc. But what if none of that is actually causing the contention on the dict_sys mutex? Are you aware when generating “row-id” values, for tables without explicit primary keys, or without non-nullable unique keys, dict_sys mutex is acquired. So INSERTs to tables with implicit primary keys is a InnoDB system-wide contention point.
Let’s also take a look at the relevant source code.
Firstly, below is the function that does the row-id allocation which is defined in the file storage/innobase/row/row0ins.cc
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
3060 /***********************************************************//** 3061 Allocates a row id for row and inits the node->index field. */ 3062 UNIV_INLINE 3063 void 3064 row_ins_alloc_row_id_step( 3065 /*======================*/ 3066 ins_node_t* node) /*!< in: row insert node */ 3067 { 3068 row_id_t row_id; 3069 3070 ut_ad(node->state == INS_NODE_ALLOC_ROW_ID); 3071 3072 if (dict_index_is_unique(dict_table_get_first_index(node->table))) { 3073 3074 /* No row id is stored if the clustered index is unique */ 3075 3076 return; 3077 } 3078 3079 /* Fill in row id value to row */ 3080 3081 row_id = dict_sys_get_new_row_id(); 3082 3083 dict_sys_write_row_id(node->row_id_buf, row_id); 3084 } |
Secondly, below is the function that actually generates the row-id which is defined in the file storage/innobase/include/dict0boot.ic
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
26 /**********************************************************************//** 27 Returns a new row id. 28 @return the new id */ 29 UNIV_INLINE 30 row_id_t 31 dict_sys_get_new_row_id(void) 32 /*=========================*/ 33 { 34 row_id_t id; 35 36 mutex_enter(&(dict_sys->mutex)); 37 38 id = dict_sys->row_id; 39 40 if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) { 41 42 dict_hdr_flush_row_id(); 43 } 44 45 dict_sys->row_id++; 46 47 mutex_exit(&(dict_sys->mutex)); 48 49 return(id); 50 } |
Finally, I would like to share results of a few benchmarks that I conducted in order to show you how this affects performance.
Benchmarking affects of non-presence of primary keys
First off all, let me share information about the host that was used in the benchmarks. I will also share the MySQL version and InnoDB configuration used.
Hardware
The host was a “hi1.4xlarge” Amazon EC2 instance. The instance comes with 16 vCPUs and 60.5GB of memory. The instance storage consists of 2×1024 SSD-backed storage volumes, and the instance is connected to a 10 Gigabit ethernet network. So the IO performance is very decent. I created a RAID 0 array from the 2 instance storage volumes and created XFS filesystem on the resultant software RAID 0 volume. This configuration would allows us to get the best possible IO performance out of the instance.
MySQL
The MySQL version used was 5.5.34 MySQL Community Server, and the InnoDB configuration looked as follows:
1 2 3 4 5 6 7 8 9 |
innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 512M innodb-flush-log-at-trx-commit = 2 innodb-file-per-table = 1 innodb-buffer-pool-size = 42G innodb-buffer-pool-instances = 8 innodb-io-capacity = 10000 innodb_adaptive_hash_index = 1 |
I conducted two different types of benchmarks, and both of them were done by using sysbench.
First one involved benchmarking the performance of single-row INSERTs for tables with and without explicit primary keys. That’s what I would be showing first.
Single-row INSERTs
The tables were generated as follows for the benchmark involving tables with primary keys:
1 |
sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare |
This resulted in the following table being created:
1 2 3 4 5 6 7 8 |
CREATE TABLE `sbtest1` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB |
While the tables without primary keys were generated as follows:
1 |
sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare |
This resulted in the tables being created with the following structure:
1 2 3 4 5 6 7 8 |
CREATE TABLE `sbtest1` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', KEY `xid` (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB |
The actual benchmark for the table with primary keys was run as follows: