Spreading .ibd files across multiple disks; the optimization that isn’t

December 25, 2010
Author
Morgan Tocker
Share this Post:

Inspired by Baron’s earlier post, here is one I hear quite frequently –

“If you enable innodb_file_per_table, each table is it’s own .ibd file.  You can then relocate the heavy hit tables to a different location and create symlinks to the original location.”

There are a few things wrong with this advice:

  1. InnoDB does not support these symlinks.  If you run an ALTER TABLE command, what you will find is that a new temporary table is created (in the original location!), the symlink is destroyed, and the temporary table is renamed.  Your “optimization” is lost.
  2. Striping (with RAID) is usually a far better optimization.  Striping a table across multiple disks effectively balances the  ‘heavy hit’ access across many more disks.  With 1 disk/table you are more likely to have the unbalance one disk overloaded, and many idle.
  3. You restrict your backup methods.  You can’t LVM snapshot across logical volumes.

Another common claim with this recommendation is that it allows you to quickly add space when running out.  LVM actually allows you to add physical volumes, and increase the size of logical volumes ;)  This is much easier to do than more one large table around.

0 0 votes
Article Rating
Subscribe
Notify of
guest

25 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Patrick Casey
Patrick Casey
15 years ago

Actually, I’d very much love for innodb to properly support this configuration.
My problem isn’t space, its “protecting” my transactional tables.

I’d like to put my transaction tables (and log files) on fast high end storage.
I’d like to put my (very large) log tables on slow SATA drives.
I’d like to guarantee that if somebody decides to table scan an 80G log file that it doesn’t impact the IO performance of my transactional tables.

A raid set isn’t going to meet criteria #3 since if I saturate the disks ripping in a log file, the won’t be IOPS available for other transactions.

Kristian Köhntopp
15 years ago

See also http://www.sun.com/blueprints/1000/layout.pdf (Sun Performance Blueprint, anno 2000). States the same, with a bit more background.

Andy
Andy
15 years ago

What about putting different tables into different databases?

In that case can I run a single MySQL instance to access the various databases, and put the different databases on different drives?

Patrick Casey
Patrick Casey
15 years ago

Perhaps I have an unusual workload, but I really do disagree with what seems to be the consensus here, to whit that that a well tuned wide RAID array is going to be the best way to deploy my IOPS to meet my business requirements.

Fundamentally I have two classes of work in the same database.

#1 I have transactional tables (and transactions against them) that have realtime requirements (ms matter)
#2 I have sporadic archival searches against totally different tables which can take as long as they take (second, minutes, hours really)

It is impossible at any practical price point for me to deploy a RAID array that gives me “enough” IOPS to satisfy group #1 if a couple of users happen to be running group #2 queries at the same time. Given the way INNODB and the underlying disk scheduler routines work, a sufficiently large surge in IOPS will always squeeze out other processes.

My “workaround” if you want to call it that is to throw enough memory at the system so that my transactional tables are A) memory mapped and B) stay there even when INNODB starts ripping group #2 queries into the scheduler.

This is really fairly old hat if you’ve been working with relational databases for a long time. For a “general case” workload with unpredictable characteristics, one big raid 10 array is the best “default” choice, but once you understand your IO loading, hand tuning your IO subsystem and putting the right tables/indexes on the right storage is a critical part in opimizing.

If you have thousands of instances of the same application running on several hundred servers, then you are not looking for a “general case” solution, but rather a very highly optimized solution to a specific problem.

Harrison
Harrison
15 years ago

At Facebook we’ve been working on adding symlink support for ibd files. I suspect it should be part of our next Facebook @ MySQL code push, but not 100% sure.

Our use case is for mixed SSD and regular disk setups where we want to put some tables on SSD and others on rotational disks. With table_statistics it is relatively easy to figure out what tables can be put where, so I think this would be more common if mainline MySQL supported table_statistics or something similar.

Patrick Casey
Patrick Casey
15 years ago

Morgan,

I agree that the symlink approach is sort of “hacky”, which is one reason we don’t use it.

What I would like to see is for the mainline database to support different data directories. Something like:

create DATA_DIRECTORY slow_directory /sata_array/mysql/data;
alter table SOME_TABLE change DATA_DIRECTORY slow_directory;
or
create table ANOTHER_TABLE DATA_DIRECTORY slow_directory;

Patrick Casey
Patrick Casey
15 years ago

Yeah, but for a variety of reasons (table locks, crash recovery, lack of transactional support just to name 3), myisam is an extreme non starter in my environment. We actually ran a mixed myisam/innodb environment in the (somewhat) distant past and switch to full innodb. We ran stats on monitoring outages vs data store and found that, for our workload at least, we have something like 3X the outages/time on myisam than on innodb so it was an easy decision to make.

Note that I’m not saying myisam is the wrong choice all of the time or will inevitably lead to more downtime in your environment, just saying that in mine we get statistically better stability off of innodb. Editorially I’d almost always look at innodb first on a new project and somebody would have to *prove* to me that myisam was the right choice, but that’s me talking off my gut, not what the data proves.

Anyway, what I want is that support in INNODB. Ideally I would prefer it to work either with file_per_table or w/o file_per_table (you could imagine two big innodb data files, one in fast_directory and one in slow_directory), but if I had to choose I’d choose file_per_table.

Justin Swanhart
15 years ago

As @Andy said, while it is unsafe to directly symlink an .ibd file, it is supported to use a symlink to an entire database directory. This can be combined with views to move physical access to other devices while logically allowing you to access the table in the other database.

For example:

–This database is located on RAID10 and that we already have tables in it,. Also assume that we just added SSD
–and only want to move certain tables there. Last, assume the application can’t be rewritten to access another database
–schema or database server.
create database the_db;

–This database is going to be created on SSD
create database fast_disk;
— Then, in the unix shell, move the directory to SSD and create a symlink to the directory
— cd /var/lib/mysql
— cp -p fast_disk /mnt/path/to/ssd
— rmdir fast_disk
— ln -s /mnt/path/to/sdd/fast_disk .
— Now the `fast_disk` database resides on SSD

All tables created in `fast_disk` will be created on the SSD. ALTER TABLE will work correctly, etc.

Now consider that there are two sets of tables in the_db, transactional tables and log tables, and the application thinks the log tables are in `the_db`. We know that they are actually in `fast_disk` which will cause a ‘table not found error’ in the application. What we need to do is find a way to make the table appear to be in `the_db` but actually access `fast_disk`, without using a .ibd symlink.

To trick the appication we can use a VIEW (using the MERGE algorithm) which simulates a database ‘SYNONYM’:

create table the_db.trx1(c1 int auto_increment, c2 int, c3 int, primary key(c1))engine=innodb;
create table the_db.trx2(c4 int auto_increment, c1 int, key(c1), c5 int) engine=innodb; — c1 is FK to t1

create table the_db.log(log_id int auto_increment primary key, log1 int, log2 char(10), log3 int) engine=innodb;

To move the_db.log to `fast_ssd`:

RENAME TABLE the_db.log to `fast_ssd`.log;
CREATE ALGORITHM=MERGE VIEW the_db.log as select * from `fast_ssd`.log;

The database will treat the the_db.log VIEW as a table with respect to SQL. The MERGE view algorithm will ensure that this doesn’t create performance problems. MySQL will simply access the base table (`fast_ssd`.log) instead of the view since there is no aggregation in the view.

Baron Schwartz
15 years ago

Patrick, I think we’re all lined up behind you. We want InnoDB to support these kinds of options. Someday I hope it will.

Justin Swanhart
15 years ago

Yes, I didn’t want to leave the impression that I think that DATA_DIRECTORY would be a bad idea for an InnoDB table – quite the opposite. I just wanted to let you know of a workaround that will allow normal DML on the table. ALTER won’t work, but at least it wont put a copy of the table on the incorrect device.

Steve Jackson
Steve Jackson
15 years ago
Yzmir Ramirez
Yzmir Ramirez
15 years ago

‘@Patrick Casey

You could have an environment with both InnoDB and MyISAM by having read-only slaves that are MyISAM versions of your InnoDB tables; this would mean you could use DATA DIRECTORY and INDEX DIRECTORY.

Céd
Céd
15 years ago

It seems like facebook patch can preserve symlinks : http://www.facebook.com/note.php?note_id=478421335932

Peter Zaitsev
Admin
15 years ago

Oh,

This is fun topic on so many angles. First not every RAID is a good as you would imagine and because of poor RAID (or its configuration) people are able to get a lot more by manually having many volumes and placing tables on them.

Also this is often used as premature optimization – doing a lot of labor intensive tossing data around when RAID would just work fine. Having said that there are reasons when you want to have different tables on different devices, and doing it per database is often better choice until Innodb supports data directory/index directory or preserves symlinks. Different kind of storage is the main reason – having some very large rarely accessed tables on SATA or having very hot tables on flash can be good

jackbillow
jackbillow
15 years ago

split .ibd file across mutiple disks will increase the cost of maintenance.

Patrick Casey
Patrick Casey
15 years ago

Optimizations almost always involve a tradeoff between simplicity and performance.

If configuration A is always easier to maintain and runs faster than configuration B, then everyone will run A (it’ll be the default).

The nature of hand optimizations is almost always that the “default” isn’t behaving appropriately in a particular use case so we’re will to implement something more complex/harder to maintain/etc in the name of performance, fault tolerance, or some other business critical metric.

The challenge in my experience is that performance tuning is sort of sexy, at least more so than normal DBA/Sysadmin duties. So there’s a lot of unnecessary performance tuning done out there by people because its “fun”, resulting in frail systems that don’t perform materially better than a default deployment but still manage to be excessively complex. The same thing, for what its worth, is true in software design where there’s an axiom that runs something like “premature optimization is the result of all evil”.

Anyway, my point I suppose is that there is a time and a place for hand optimization (and clearly I think I’m at that point in my largest deployment), but I also agree with the consensus here which seems to be “be very, very, sure you A) really need it and B) can maintain it before you start doing exotic performance tuning”.

Baron Schwartz
15 years ago

I 100% agree with this: “there’s a lot of unnecessary performance tuning done out there by people because its “fun”, resulting in frail systems”

sbester
13 years ago
purvvvv
purvvvv
10 years ago

Hello,

I have a InnoDB table and want to store data across mounted partition.
I have followed all the steps in following link:
http://dev.mysql.com/doc/refman/5.6/en/tablespace-placing.html

But it is not storing the data in /var/lib/mysql and also not giving warning and errors.

I have even posted a question on stack overflow for the same.
The link is given below.
http://stackoverflow.com/questions/34904801/changing-data-directory-over-mounted-partition-of-innodb-table

Please Help me solve the problem.

purvvvv
purvvvv
10 years ago

Hello,

I have InnoDB table and want to store the data accross mounted partition.
I have followed the steps in link below:
http://dev.mysql.com/doc/refman/5.6/en/tablespace-placing.html

but data is not storing across mount point.

I have posted a question on stack overflow,
http://stackoverflow.com/questions/34904801/changing-data-directory-over-mounted-partition-of-innodb-table

Please help me solve the problem.

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved