MySQL has the information_schema.TABLES table, which contains information such as “data_length” or “avg_row_length.” Documentation on this table, however, is quite poor, making the assumption that those fields are self-explanatory – they are not when it comes to tables that employ compression. And this is where inconsistency is born. Let’s take a look at the same table containing some highly compressible data using different storage engines that support MySQL compression:
TokuDB:
mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: TokuDB
        VERSION: 10
     ROW_FORMAT: tokudb_zlib
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 10003
    DATA_LENGTH: 409722880
MAX_DATA_LENGTH: 9223372036854775807
   INDEX_LENGTH: 0
      DATA_FREE: 421888
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 07:59:05
    UPDATE_TIME: 2014-10-10 08:01:20
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.00 sec)
Archive:
mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: ARCHIVE
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 12
    DATA_LENGTH: 501651
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 0
 AUTO_INCREMENT: NULL
    CREATE_TIME: NULL
    UPDATE_TIME: 2014-10-10 08:08:24
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.01 sec)
InnoDB:
mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40660
 AVG_ROW_LENGTH: 4168
    DATA_LENGTH: 169480192
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 1572864
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 08:33:22
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS: row_format=COMPRESSED KEY_BLOCK_SIZE=4
  TABLE_COMMENT:
1 row in set (0.00 sec)
From this we can see that Archive and InnoDB show the compressed values for DATA_LENGTH and AVG_ROW_LENGTH, while TokuDB (as of TokuDB 7.5.1, shipped with Percona Server 5.6.21-69) shows the uncompressed ones.
The problem here is not only a lack of consistency but also that we often need to know BOTH numbers: the uncompressed value to understand how much data there really is in the table, and the compressed value to know how much space it takes on disk. Comparing the two can also help us understand the compression ratio (or, on the contrary, the expansion due to storage overhead and indexes).
Looking at the Information_Schema tables available, I can’t find any way to learn how much uncompressed data is stored in an InnoDB (or Archive) table. The simple trick I can use is running a query along these lines: SELECT SUM(LENGTH(col1)+LENGTH(col2)…) FROM T – this will slightly overestimate the length by converting numbers and dates to strings, but it is good enough for most purposes.
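For illustration, here is a sketch of that trick against the test.comp table used above (col1 and col2 are hypothetical column names – the table’s actual columns are not shown in this post), including a rough compression ratio computed against the compressed DATA_LENGTH that InnoDB reports:

-- Estimate the uncompressed data size (col1, col2 are placeholder column names)
SELECT SUM(LENGTH(col1) + LENGTH(col2)) AS uncompressed_bytes FROM test.comp;

-- Compare it with the compressed size InnoDB reports to approximate the ratio
SELECT (SELECT SUM(LENGTH(col1) + LENGTH(col2)) FROM test.comp) / DATA_LENGTH
       AS approx_compression_ratio
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'comp';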
TokuDB, though, while providing the uncompressed information in the Information_Schema TABLES table, allows you to get information about the real data storage on disk from its own information schema tables:
mysql> select * from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp' limit 10 \G
*************************** 1. row ***************************
        dictionary_name: ./test/comp-main
     internal_file_name: ./_test_sql_147e_292e_main_2c20c08_1_1b_B_0.tokudb
bt_num_blocks_allocated: 125
   bt_num_blocks_in_use: 125
      bt_size_allocated: 1880088
         bt_size_in_use: 1502232
           table_schema: test
             table_name: comp
  table_dictionary_name: main
*************************** 2. row ***************************
        dictionary_name: ./test/comp-status
     internal_file_name: ./_test_sql_147e_292e_status_2c20bdd_1_1b.tokudb
bt_num_blocks_allocated: 4
   bt_num_blocks_in_use: 4
      bt_size_allocated: 8704
         bt_size_in_use: 512
           table_schema: test
             table_name: comp
  table_dictionary_name: status
2 rows in set (0.01 sec)
This shows us that this table really consists of 2 files, each “bt_size_allocated” bytes in length. These numbers are close to what you will see on disk, but not exact: I see these files taking 1886208 and 16384 bytes, respectively. I wish there were an exact length available to query, so we would not have to wonder how large the difference is and whether it can get big enough in some cases to matter.
If you’re just looking for information about how much space has been allocated and how much is currently used for a given TokuDB table, you can use a query like this:
mysql> select sum(bt_size_allocated) total, sum(bt_size_in_use) used, sum(bt_size_allocated)-sum(bt_size_in_use) free from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp';
+---------+---------+--------+
| total   | used    | free   |
+---------+---------+--------+
| 1888792 | 1502744 | 386048 |
+---------+---------+--------+
1 row in set (0.01 sec)
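Building on that, here is a sketch (not from the original post) that joins this against information_schema.TABLES to estimate the compression ratio for every TokuDB table – approximate, since bt_size_in_use also covers the status dictionary and any index dictionaries:

SELECT t.TABLE_SCHEMA, t.TABLE_NAME,
       t.DATA_LENGTH AS uncompressed_bytes,        -- TokuDB reports uncompressed data here
       SUM(f.bt_size_in_use) AS compressed_bytes,  -- compressed size across all dictionaries
       t.DATA_LENGTH / SUM(f.bt_size_in_use) AS approx_compression_ratio
  FROM information_schema.TABLES t
  JOIN information_schema.TokuDB_fractal_tree_info f
    ON f.table_schema = t.TABLE_SCHEMA AND f.table_name = t.TABLE_NAME
 WHERE t.ENGINE = 'TokuDB'
 GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME;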
To sum it up – there is room for improvement in how MySQL reports compressed and uncompressed data length information, both in terms of consistency between storage engines and in terms of the information available. It is great to see that TokuDB found a way to report both compressed and uncompressed data size information, yet I would really love to also see the actual size on disk that a given table is taking. And it would be great if there were some consistent way to query it from inside MySQL, without having to go down to the file-system level and deal with the different ways different storage engines place data on the file system. This becomes especially important with the coming of full tablespace support in MySQL 5.7, which will make it harder to find all the files matching a given table on the filesystem.
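For InnoDB file-per-table tablespaces specifically, MySQL 5.6 does offer a partial workaround – a sketch only, and not the cross-engine solution argued for above – by mapping a table to the data file backing it through the InnoDB system tables:

-- Map an InnoDB table to its data file (file-per-table tablespaces only)
SELECT ts.NAME, df.PATH
  FROM information_schema.INNODB_SYS_TABLESPACES ts
  JOIN information_schema.INNODB_SYS_DATAFILES df ON df.SPACE = ts.SPACE
 WHERE ts.NAME = 'test/comp';

You still have to stat the resulting file at the file-system level to get its actual size, which is exactly the kind of engine-specific detour that should be unnecessary.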
It is also worth mentioning that the file system used for data may itself be using compression. At least it is possible with NTFS compression on Windows.
Peter,
Indeed. It might get very complicated with advanced filesystems, e.g. ZFS, due to compression and de-duplication, not to mention various SAN/NAS solutions. I think MySQL should at least provide us information about how much space the files take on the filesystem.
We reply to users frequently (last time this morning, actually) along those lines (a little simplified):
”
This query will tell the amount of storage used by MySQL:

SELECT SUM(DATA_LENGTH) + SUM(INDEX_LENGTH) FROM information_schema.TABLES;

However, a detail to note: the numbers returned by the above query are the *minimum disk storage requirements* for data and indexes. The actual amount of storage used may be higher – for two reasons:
1) the storage engine will reserve blocks/chunks of disk storage when a tablespace is filled up (chunk size depending on configuration settings).
2) when you DELETE data, the data files/tablespaces do not ‘shrink’. To return free disk space you will need to rebuild the data files, with an OPTIMIZE TABLE statement for instance.
”
.. Now, with compression in the storage engine and/or file system it gets even more muddy. ‘Empty space inside tablespaces’ means that the file system uses *more* than you’d expect. Compression, on the other hand, means *less*. And X + more – less = unknown, basically.
Using dice may be the best method currently. 🙂
Peter,
No, it does not. As I mentioned in the blog post, TokuDB will report the _raw_ data size, which takes much less space on disk. Not to mention you might be looking at a MEMORY table, which has no disk backing at all. That is basically the main problem.
Has anything changed recently? Do we still have the same problem? No way of knowing how big the table really is?