Finding out largest tables on MySQL Server

Finding the largest tables on a MySQL instance is a no-brainer in MySQL 5.0+ thanks to the Information Schema, but I still wanted to post the little query I use for the purpose so I can easily find it later. Plus, it is quite handy in the way it presents information:
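
For reference, here is the query itself (the same one quoted verbatim in the trackbacks below):

    SELECT concat(table_schema,'.',table_name),
           concat(round(table_rows/1000000,2),'M') rows,
           concat(round(data_length/(1024*1024*1024),2),'G') DATA,
           concat(round(index_length/(1024*1024*1024),2),'G') idx,
           concat(round((data_length+index_length)/(1024*1024*1024),2),'G') total_size,
           round(index_length/data_length,2) idxfrac
    FROM   information_schema.TABLES
    ORDER  BY data_length+index_length DESC
    LIMIT  10;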

I do some converting and rounding to show the number of rows in millions and the data and index sizes in GB, so I can save on counting zeros.
The last column shows how much space the index takes compared to the data. This is mainly for informational purposes, but for MyISAM it can also help you size your key buffer relative to the operating system cache.

I also use it to see which tables may be worth reviewing in terms of indexes. A large index size compared to the data size often indicates there are a lot of indexes (so it is quite possible there are duplicate, redundant, or simply unused indexes among them), or perhaps there is a long primary key on InnoDB tables. Of course, the tables could also be perfectly fine, but it is worth a look.

By changing the query a bit, with a different sort order or extra data such as average row length, you can learn quite a lot about your schema this way.
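
As an illustration (a sketch along the same lines, not from the original post), here is a variant that pulls in average row length and sorts by it, restricted to a single schema; 'your_db' is a placeholder name:

    SELECT concat(table_schema,'.',table_name) tbl,
           table_rows,
           avg_row_length,
           round((data_length+index_length)/(1024*1024),2) total_mb
    FROM   information_schema.TABLES
    WHERE  table_schema = 'your_db'  -- placeholder schema name
    ORDER  BY avg_row_length DESC
    LIMIT  10;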

It is also worth noting that queries on information_schema can be rather slow if you have a lot of large tables. On this instance it took 2.5 minutes to run for 450 tables.

UPDATE: To make things easier I've qualified the table name with INFORMATION_SCHEMA in the query, so it works whichever database you have active. It still does not work with MySQL before 5.0, of course 🙂

Comments (32)

  • Unomi

    I’m sorry, but where does ‘TABLES’ come from? Do I miss something here? Is it something I’m supposed to know, but is not mentioned in the article?

    I run this query against the ‘mysql’ database on 5.0.51, but mysql.TABLES cannot be found. Where should I look for the missing link?

    – Unomi –

    February 4, 2008 at 8:53 am
  • Matthew Montgomery

    Unomi,

    This is a query against information_schema.TABLES. The information_schema database was added at v5.0

    Matt,

    February 4, 2008 at 9:19 am
  • Frank Mash

    Unomi, you need to use information_schema:

    mysql> use information_schema;

    February 4, 2008 at 9:21 am
  • Jay Pipes

    Great stuff, Peter! I encourage you to add this to the Forge snippets repository! 🙂

    Cheers,

    Jay

    February 4, 2008 at 11:04 am
  • Gabriel Menini

    Same error here…

    February 4, 2008 at 12:19 pm
  • David Linsin

    Awesome Peter, thanks!

    February 4, 2008 at 10:44 pm
  • Gabriel Menini

    Frank, thanks for the tip 🙂 It’s working now.

    February 5, 2008 at 5:46 am
  • Unomi

    Thanks all for the tip. It’s new for me, but I wanted to have this query running… Everyday a new day to learn something!

    – Unomi –

    February 6, 2008 at 12:46 am
  • Radek

    Example works perfectly.

    Great website and great ideas, keep working dude.

    February 6, 2008 at 3:36 pm
  • gigiduru

    Peter,

    Why exactly is it taking 2 min 29.19 sec to extract that data?! I thought it’s readily available in the dictionary. And I don’t believe it’s taking a lot of processing power to transform, concatenate and order the results.
    Also, this is a thing MySQL AB should work on to fix it (among other several thousands of not-fixed-yet bugs).

    February 12, 2008 at 12:07 pm
  • peter

There is no readily available data dictionary in MySQL. The information schema actually has to scan through all databases, get stats from each of the tables, and put them into a temporary table to have them available to you. Sometimes it may restrict the scan, i.e. if you look for a given table, but in this case it has to scan through all tables.

    February 12, 2008 at 12:20 pm
  • Chris

    Thanks Peter, this was a great help.

    February 13, 2008 at 9:10 am
  • Finding out largest tables on MySQL Server « Purab’s Programmer Blog

    […] purpose so I can easily find it later, plus it is quite handy in a way it presents information: PLAIN TEXT […]

    February 15, 2008 at 12:23 am
  • The Linux Index » Stephan Hermann: How to determine the largest table in your MySQL database?

    […] "Finding out largest tables on MySQL server" […]

    February 28, 2008 at 11:34 am
  • Yafei Qin

I had the same question when it took more than half a minute to check all my databases, as gigiduru mentioned.
    Thanks Peter. 🙂

    btw, a tip:
    If you want to show tables only in a certain database, add a WHERE clause to the SQL:
    WHERE table_schema = 'db_name'

    March 4, 2008 at 7:35 pm
  • Researching your MySQL table sizes | MySQL Performance Blog

    […] posted a simple INFORMATION_SCHEMA query to find largest tables last month and it got a good response. […]

    March 17, 2008 at 7:28 pm
  • Zeck’s Blog » Archive du blog » MySQL, quelques requêtes pour l’administration…

    […] Finding out largest tables on MySQL Server […]

    August 8, 2008 at 3:05 am
  • vladislav

    Nice. Thanks for your time and effort. Open Source is cool.

    October 6, 2008 at 7:49 am
  • lvermilion

    Nice post!! The only issue I have is it is not the same result from “select count(*) from “. For that matter the show table status is not the same either. The true size of the table is increasing and I can not seem to find records getting deleted anywhere. Can you explain why this is?

    Here is an example.

    mysql> select count(*) from table;
    +----------+
    | count(*) |
    +----------+
    | 14828558 |
    +----------+
    1 row in set (39.94 sec)

    #########
    Now to make it interesting, I will do the show table status and your SQL statement and get different results from the count(*). (Note the "show table status" results match the output of your SQL statement, if I could execute them at identical seconds.)
    #########

    mysql> SELECT concat(table_schema,'.',table_name),concat(round(table_rows/1000000,2),'M') rows,concat(round(data_length/(1024*1024*1024),2),'G') DATA,concat(round(index_length/(1024*1024*1024),2),'G') idx,concat(round((data_length+index_length)/(1024*1024*1024),2),'G') total_size,round(index_length/data_length,2) idxfrac FROM information_schema.TABLES ORDER BY data_length+index_length DESC LIMIT 10;
    +-------------------------------------+--------+-------+-------+------------+---------+
    | concat(table_schema,'.',table_name) | rows   | DATA  | idx   | total_size | idxfrac |
    +-------------------------------------+--------+-------+-------+------------+---------+
    | database.table                      | 14.73M | 2.08G | 0.00G | 2.08G      | 0.00    |
    +-------------------------------------+--------+-------+-------+------------+---------+
    10 rows in set (0.15 sec)

    mysql> show table status like 'table' \G
    *************************** 1. row ***************************
    Name: table
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 14898977
    Avg_row_length: 150
    Data_length: 2237661184
    Max_data_length: 0
    Index_length: 0
    Data_free: 0
    Auto_increment: NULL
    Create_time: 2008-10-01 09:38:00
    Update_time: NULL
    Check_time: NULL
    Collation: latin1_swedish_ci
    Checksum: NULL
    Create_options:
    Comment: InnoDB free: 4021248 kB
    1 row in set (0.06 sec)

    ########
    I have run your query and the show table status query yet again and get new values that are not consistent with "select count(*) from ".
    ########

    +-------------------------------------+--------+-------+-------+------------+---------+
    | concat(table_schema,'.',table_name) | rows   | DATA  | idx   | total_size | idxfrac |
    +-------------------------------------+--------+-------+-------+------------+---------+
    | database.table                      | 15.00M | 2.08G | 0.00G | 2.08G      | 0.00    |
    +-------------------------------------+--------+-------+-------+------------+---------+

    mysql> show table status like 'table' \G
    *************************** 1. row ***************************
    Name: table
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 15120698
    Avg_row_length: 147
    Data_length: 2237661184
    Max_data_length: 0
    Index_length: 0
    Data_free: 0
    Auto_increment: NULL
    Create_time: 2008-10-01 09:38:00
    Update_time: NULL
    Check_time: NULL
    Collation: latin1_swedish_ci
    Checksum: NULL
    Create_options:
    Comment: InnoDB free: 4021248 kB
    1 row in set (0.02 sec)

    mysql> select count(*) from table;
    +----------+
    | count(*) |
    +----------+
    | 14828945 |
    +----------+
    1 row in set (44.33 sec)

    October 6, 2008 at 12:10 pm
  • lvermilion

    If I check mysql reference show table status seems to be very inaccurate if we take it for the number of rows, but the datalength seems to be accurate. Is your SQL statement keying off the same values that show table status is?

    Rows

    The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

    October 6, 2008 at 12:16 pm
  • Gheek.net » MySQL add an index to a table.

    […] Referenced from: mysqlperformanceblog.com Finding largest tables on MySQL instance is no brainier in MySQL 5.0+ thanks to Information Schema but I still wanted to post little query I use for the purpose so I can easily find it later, plus it is quite handy in a way it presents information: SQL:

    mysql> SELECT concat(table_schema,'.',table_name),concat(round(table_rows/1000000,2),'M') rows,concat(round(data_length/(1024*1024*1024),2),'G') DATA,concat(round(index_length/(1024*1024*1024),2),'G') idx,concat(round((data_length+index_length)/(1024*1024*1024),2),'G') total_size,round(index_length/data_length,2) idxfrac FROM information_schema.TABLES ORDER BY data_length+index_length DESC LIMIT 10;
    +-------------------------------------+--------+--------+--------+------------+---------+
    | concat(table_schema,'.',table_name) | rows   | DATA   | idx    | total_size | idxfrac |
    +-------------------------------------+--------+--------+--------+------------+---------+
    | art87.link_out87                    | 37.25M | 14.83G | 14.17G | 29.00G     | 0.96    |
    | art87.article87                     | 12.67M | 15.83G | 4.79G  | 20.62G     | 0.30    |
    | art116.article116                   | 10.49M | 12.52G | 3.65G  | 16.18G     | 0.29    |
    | art84.article84                     | 10.10M | 10.11G | 3.59G  | 13.70G     | 0.35    |
    | art104.link_out104                  | 23.66M | 6.63G  | 6.55G  | 13.18G     | 0.99    |
    | art118.article118                   | 7.06M  | 10.49G | 2.68G  | 13.17G     | 0.26    |
    | art106.article106                   | 9.86M  | 10.19G | 2.76G  | 12.95G     | 0.27    |
    | art85.article85                     | 6.20M  | 9.82G  | 2.51G  | 12.33G     | 0.26    |
    | art91.article91                     | 8.66M  | 9.17G  | 2.66G  | 11.83G     | 0.29    |
    | art94.article94                     | 5.21M  | 10.10G | 1.69G  | 11.79G     | 0.17    |
    +-------------------------------------+--------+--------+--------+------------+---------+
    10 rows IN SET (2 min 29.19 sec) […]

    October 14, 2008 at 11:30 am
  • Stefan

    Quick question.
    I have a large MyISAM table with 4 fields ( id = primary key, categID = int index, subcategID = int index, content = blob [ this field has a serialized array, avg 9KB ] ); this table has 5,000,000 records. The size of it is 9GB ( has serialized arrays ). A select like SELECT content FROM table WHERE categID = 'categID' AND subcategID = 'subcategID' takes ~ 4 - 5 seconds. Any idea to speed up this functionality?

    March 30, 2012 at 6:20 am
  • Jonathan Bayer

    Stefan,

    Why not make a combined key of categID, subcategID?
    CREATE INDEX index_2 on table(categID, subcategID)

    Right now, if I’m reading it properly, you reference the categID index, and then scan everything in that index. If you make a combined key, it can do both at the same time.

    April 19, 2012 at 12:18 pm
  • JDS

    How do the results of these queries compare to the size of the tables on disk? For example, using file_per_table, if the total_size column equals 1GB, how will this relate to the actual size of the table.ibd file on disk?

    I understand that the size of the table.ibd file will not shrink if data is deleted. However, what I’m wondering is if there is a large discrepancy in the sizes, is this an indicator that it may be a good opportunity to run OPTIMIZE and thus reclaim some hard disk space?

    Taking real-world data, I have for example a table that looks like this from the query in this article:

    +---------------------------------------+---------+--------+--------+------------+---------+
    | CONCAT(table_schema, '.', table_name) | rows    | DATA   | idx    | total_size | idxfrac |
    +---------------------------------------+---------+--------+--------+------------+---------+
    | databasename.tablename                | 147.23M | 24.47G | 25.38G | 49.85G     | 1.04    |

    Key point: total_size is 49.85GB

    On disk, that table’s .ibd file looks like this:

    # cd /var/lib/mysql/databasename
    # ls -lh tablename*
    -rw-rw—- 1 mysql mysql 8.7K 2010-08-05 16:31 tablename.frm
    -rw-rw—- 1 mysql mysql 58G 2012-04-24 13:41 tablename.ibd

    On disk vs the information_schema query, there is around 9GB discrepancy. Does this have any real world meaning?

    Thanks

    April 24, 2012 at 10:43 am
  • BH

    Thank you very much for this handy query! I adjusted it down from gigabytes to megabytes to suit my needs and it’s working great!

    June 4, 2012 at 12:37 pm
  • Simon

    Thanks, very helpful!

    June 7, 2012 at 7:02 am
  • kavitha

    MySQL server version 5 and above has all these possibilities to fetch data from information_schema, but the lower versions like 4.x.x don't have information_schema. Any idea how to get the memory of each schema and the largest tables under each schema??

    January 2, 2013 at 3:10 am
  • Pavel

    Thank you boy!

    March 10, 2014 at 8:57 am
  • Rahul Kadukar

    Hi,

    This does not give exact results if you are running InnoDB. Any suggestions?

    April 23, 2014 at 6:59 pm
  • Jason

    Awesome man I was looking for exactly this to find out which tables were being bad little piggies! Thanks!!!

    February 21, 2015 at 9:48 pm
  • Vedavrat

    Thank you!

    I rewrite your query.
    Such way

    SELECT CONCAT(table_schema, '.', table_name),
    CONCAT(ROUND(table_rows / 1, 0), ' r.') rows,
    CONCAT(ROUND(data_length / (1024*1024), 1), ' MB') data,
    CONCAT(ROUND(index_length / (1024*1024), 1), ' MB') idx,
    CONCAT(ROUND((data_length+index_length) / (1024*1024), 1), ' MB') total,
    ROUND(index_length / data_length, 2) idxfrac
    FROM information_schema.TABLES
    ORDER BY data_length + index_length ASC

    seem to me to be more useful.

    July 30, 2015 at 5:17 pm
  • Vikas Arya

    Hi,
    I am using MySQL server version 5.1 (free version). I have a table whose size is 3.93G and which contains '7240704' rows, and I want to insert 9000000 more rows into it, but at insertion time the server gives an error message: Error Code: 1114. The table 'form_data_archive' is full.
    What should I do? I am facing this problem on a production server, please help ASAP.

    October 13, 2015 at 2:48 am

Leave a Reply