Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live. Register

Downloads

Blog

Working with many files and file system fragmentation

March 18, 2008

Author

Peter Zaitsev

Hardware and Storage

Share this Post:

Working on performance optimization project (not directly MySQL related) we did a test – creating 100 files writing 4K in the random file for some time and when checking the read speed on the files we end up with, compared to just writing the file sequentially and reading it back.

The performance difference was huge – we could read single file at 80MB/sec while fragmented files only deliver about 2MB/sec – this is a massive difference.

The test was done on EXT3 and it looks like it does not do very good job preventing file fragmentation for large amount of growing files.

It would be interesting to see how other filesystems deal with this problem, for example XFS with delayed allocation may be doing better job.

I also would like to repeat the test with MySQL MyISAM tables and see how bad the difference would be for MySQL but I would expect something along those lines.

Interesting enough it should not be that hard to fix this issue – one could optionally preallocate MyISAM tables in some chunks (say 1MB) so its gets less fragmentation. Though it would be interesting to benchmark how much such approach would generally help.

Until we have this feature – reduced fragmentation is one more benefit we get with batching. For example instead of inserting rows one by one in large number of tables once can be buffered in memory (application or MyISAM memory table) and flushed to the actual tables in bulks.

0 0 votes

Article Rating

10 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Artem Russakovskii

18 years ago

I’ve been working with xfs a lot on the storage servers and fragmentation has been a huge problem with it too, though xfs prides itself in not needing defragmentation (at least as much as some other filesystems). Until fragmentation was sorted out, the load on the box would spike to 100+ during an otherwise reasonable load. I’ve used xfs_fsr to defrag and ended up cronning it to run for 30 seconds every 15 minutes. I suspect the issue may be related to nfs3 writing to it in relatively small chunks (as it’s a stateless protocol) and not xfs itself.

As far as the benchmarks above, I’m curious to see a similar one with reiser, if you find time and ext4 when it comes out.

Bill

18 years ago

It would also be interesting to repeat the test with the files on those new solid state drives, to see how much of an improvement you could get.

Author

Peter Zaitsev

18 years ago

Oh yes. SSD reduce seek penalty dramatically so fragmentation should be much less issue for them.

This is multiple Terabyte data storage project so it is not for SSD yet.

brian

18 years ago

Are you sure the speed difference is caused by fragmentation of the filesystem on the disk, and not due to the single large file being in the buffer cache? It might be interesting to use the following to see what’s in the buffer cache:

http://net.doit.wisc.edu/~plonka/fincore/

http://insights.oetiker.ch/linux/fadvise.html

tgabi

18 years ago

I’ve noticed this long time ago. I’m fighting this using different measures: a. small tables are optimized periodically b.big tables that are affected by fragmentation can be arranged as 1 file per filesystem c. recently Mysql 5.1 table partitioning allowed to use SSD for most recent data (most used) and disk for historical data (less used). Index files are the most prone to fragmentation – since 1 record inserted/updated can trigger updates on several indices. No filesystem is immune to fragmentation unfortunately, the only way to reduce it is to have some table options that will pre-allocate space in large chunks (like 1GB data, 1 GB index).

Author

Peter Zaitsev

18 years ago

Thanks tgabi,

If you have updates you may be suffering from internal fragmentation in addition to file system fragmentation – such as rows may get more than one piece in MySQL sequential pages in index order happen to be in different locations etc.

Good you agree on pre allocation option though I think even much smaller values such as 1MB or 4MB would make things much better.

Author

Peter Zaitsev

18 years ago

Brian,

Thanks for heads up I’ve written another blog post on those tools.

I’m quite sure that was not caching because number at VMSTAT matches these quite closely.

Markus

18 years ago

Be aware if you use XFS/ReiserFS on PC-Hardware, you risk loss of data in case of power failure:

http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc

Daniel Schneller

18 years ago

Some time ago (2006) I noticed that even on a fully defragged NTFS drive newly created InnoDB data files got scattered around the whole partition.
http://jroller.com/dschneller/entry/strange_innodb_fragmentation

Author

Peter Zaitsev

18 years ago

Daniel,

I’m not quite sure from your post – did you create the tablespace out of many files and it was already badly fragmented or it was autoextend during some workload run ?

In any case we can learn filesystems may not be as optimal as we would like them to be 🙂

Resources

PostgreSQL

November 21, 2025

Anil Joshi

How to Deploy a Stand-By/Ad-Hoc Cluster Based on Percona Operator for PostgreSQL

PostgreSQL

October 31, 2025

Anil Joshi

How to Configure pgBackRest Backups and Restores in PostgreSQL (Local/k8s) Using a MinIO Object Store

MySQL

March 1, 2025

Rituja Borse

InnoDB Performance Optimization Basics

Far
Enough.

Said no pioneer ever.

Get Started

Open source database software from experts who stand with you in production. Forever free from lock-in and other corporate BS.

Connect

Privacy

Legal

Security Center

MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.

Working with many files and file system fragmentation

How to Deploy a Stand-By/Ad-Hoc Cluster Based on Percona Operator for PostgreSQL

How to Configure pgBackRest Backups and Restores in PostgreSQL (Local/k8s) Using a MinIO Object Store

InnoDB Performance Optimization Basics

Far Enough.

Far
Enough.