Recently I have had a chance to check out MySQL performance on the “Memory Appliance” by Violin Memory, which can be used as an extremely high-speed storage system.
I helped Violin Memory optimize MySQL for a customer workload and also had a chance to do some benchmarks on my own. A 2*Quad Core Xeon box running CentOS 5 was tested, using the ext2 filesystem and the SysBench tool.
Using 16K read sizes (matching the Innodb page size) I could get 1.2GB/sec (80K req/sec) for reads and about 500MB/sec for writes with 16 concurrent threads. Things scaled well, and with 256 threads I got even slightly better performance.
Interestingly enough, utilization in iostat never went over a few percent, and the load was mostly CPU bound.
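For reference, the raw IO numbers above can be reproduced with SysBench's fileio test. This is a sketch using SysBench 0.4-era flags; the total file size here is an assumption, not taken from the actual run:

```shell
# Prepare a set of test files (size is an assumption)
sysbench --test=fileio --file-total-size=64G prepare

# Random 16K reads from 16 threads -- block size matches the Innodb page size
sysbench --test=fileio --file-total-size=64G --file-test-mode=rndrd \
         --file-block-size=16384 --num-threads=16 \
         --max-time=60 --max-requests=0 run
```

Switching `--file-test-mode` to `rndwr` gives the corresponding random write numbers.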
Next I went on to test MySQL. My goal was to generate as much IO as possible, so I used SysBench to create a 300 million row table and ran primary key lookups with a uniform distribution, some 90% of which would need to do IO to get their data from disk.
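The setup looked roughly like this (SysBench 0.4 OLTP test in "simple" mode, which issues primary-key point selects; connection options are omitted, and the exact flags used in the run are my reconstruction):

```shell
# Create the 300M row test table (takes a while at this size)
sysbench --test=oltp --mysql-table-engine=innodb \
         --oltp-table-size=300000000 prepare

# Primary key lookups, uniform distribution, 8 threads
sysbench --test=oltp --oltp-test-mode=simple --oltp-read-only=on \
         --oltp-dist-type=uniform --oltp-table-size=300000000 \
         --num-threads=8 --max-time=60 --max-requests=0 run
```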
With Innodb results were quite poor – I got about 6,500 req/sec from one thread, which grew to 13,000 req/sec when 8 concurrent threads were used. Compare this to 20,000 queries/sec from a single thread if all data fits in the buffer pool, scaling to over 80,000 queries/sec with multiple threads.
So Innodb took a very high hit when supplied with a high-performance IO subsystem. Not only was performance poor with a single thread, it also did not scale well with increased concurrency, even though we had 8 cores at our disposal.
Unfortunately oprofile did not work on the box in question, so I could not investigate where exactly the CPU was being wasted. My first obvious guess was Innodb checksumming, and indeed disabling it (innodb_checksums=0) allowed me to get to 9,000 queries/sec from a single thread; however, it still peaked at 13,500 queries/sec with multiple threads, which corresponds to even worse scalability. Disabling the adaptive hash index took us further, to 14,500 req/sec, which was a bit surprising – we're speaking about a read-only workload, so one would not expect the adaptive hash index to actually hurt performance in this case.
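For reference, the two settings tried above can be put in my.cnf like this (note that innodb_adaptive_hash_index as a configurable variable only exists in newer 5.0/5.1 releases; on older servers the adaptive hash index cannot be disabled):

```ini
[mysqld]
# Skip page checksum verification on read -- for read-only benchmarking only
innodb_checksums = 0
# Disable the adaptive hash index
innodb_adaptive_hash_index = 0
```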
My guess is Innodb just was not designed and tested for such conditions – the normal case is to allocate cache memory to the buffer pool so IOs mostly come from the drives directly, which means hundreds or maybe thousands of IOs per second for the whole system, and that allows wasting some CPU cycles handling them without taking a large performance hit.
I really hope this will be one of the items, besides CPU scaling, which the Innodb team will put on their roadmap. With SSDs coming this will be important. And it is also very easy to repeat and test – just run Innodb from a RAM drive.
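Repeating the experiment without special hardware is indeed simple. A sketch, assuming a box with enough RAM and a throwaway test datadir (the paths and size here are made up for illustration):

```shell
# Create a RAM-backed filesystem and put a copy of the MySQL datadir on it
mount -t tmpfs -o size=16G tmpfs /mnt/ramdisk
cp -rp /var/lib/mysql /mnt/ramdisk/

# Start a test instance against the RAM-backed datadir
mysqld_safe --datadir=/mnt/ramdisk/mysql &
```

Everything on tmpfs is lost on reboot, of course, so this is strictly for benchmarking.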
Next I tested MyISAM, which is well known for its simplicity and thus its fast execution path for getting data from the drive and sending it back to the client.
I tested MyISAM with the myisam_use_mmap option, which did not seem to give any significant performance benefit, maybe even reducing performance a few percent. I also tried running with the key buffer on and off. Interestingly enough, disabling the key buffer actually worked better for this workload, avoiding key cache contention.
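The MyISAM variants tested correspond roughly to settings like these (a tiny key_buffer_size is my assumption for "disabling" the key cache, since it cannot be turned off entirely):

```ini
[mysqld]
# Use mmap() for reading MyISAM data files
myisam_use_mmap = 1
# Shrink the key buffer to almost nothing so index reads
# mostly bypass the shared key cache and its locking
key_buffer_size = 8K
```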
In the best conditions I got about 13.5K queries/sec from MyISAM (already double Innodb's), which scaled to 70,000 queries/sec, peaking at 16 threads.
However, even with MyISAM we became CPU bound before we could reach the system's capacity – these 70,000 queries/sec generated just over 50K IOs/sec, while capacity was over 100K IOs/sec (more than with Innodb, as these reads are smaller).
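To put those numbers in perspective, a quick back-of-the-envelope check (the 52K figure is my assumed reading of "just over 50K", and the capacity is rounded to 100K):

```python
qps = 70_000           # peak MyISAM queries/sec
ios = 52_000           # observed IOs/sec -- assumed exact value
io_capacity = 100_000  # small-block random read capacity of the appliance

ios_per_query = ios / qps           # not every lookup results in an IO
io_utilization = ios / io_capacity  # how busy the storage actually was

print(f"{ios_per_query:.2f} IOs per query, storage {io_utilization:.0%} utilized")
```

So the storage was only about half utilized when MySQL ran out of CPU.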
These results reiterate one important thing about Innodb – it just loves to have as much data in the buffer pool as possible for best performance.
Hopefully some time in the future I will get a chance to do more testing with this nice hardware, and maybe check out other storage engines and other workloads, or get some profiling results.