A closer look at the MySQL ibdata1 disk space issue and big tablesFernando Laudares Camargos
A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.
There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.
One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size, for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.
Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed by a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference of a “growing ibdata1”.
A big table scenario
The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.
But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 – it is enabled in the later versions.
If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).
What to do then ?
We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs where pilling up, thus contributing to more disk space use. They had access to a network mounted partition in their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:
1) Add one or more disks to the system. Hopefully they had the datadir lying inside a LVM partition using the XFS filesystem so the perspective of simply increasing it’s size by adding more disk space was a good one to contemplate.
2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundreds gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.
3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver) and then do a mysqldump dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use it. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.
4) Convert the table to MyISAM. This could be seem as a polemic solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A giving application that works with InnoDB might simply not do with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.
5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will count with much better compression than InnoDB overall, and also better performance for write intensive workloads.
Of course, from all the outlined solutions the first 2 look to be by far the least invasive ones. The problem is that not always the server has free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case of making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t had.
What the customer ended doing was dumping the whole data and importing it into another server, compressing the table in the process. And then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.
The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space being used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer it will take for dump & restore and table conversions so if you’re running out of time it only make things more complicated.
This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.