I promised to write short articles about all the storage engines whose sessions I attended at the MySQL Users Conference, but I have not gotten very far yet because I have been too busy. So today it is time for PBXT.
I was very interested in the PBXT session because this storage engine does not target the same general-purpose transactional storage engine market that a lot of other people are targeting. It also makes a number of unusual design decisions which set it even further apart.
For part of the talk Paul compared PBXT to MyISAM with multi-versioning and transactions, and this is a valid comparison, for better and for worse. In its current state (as of the conference) PBXT does not offer durable transactions, meaning you can lose committed transactions if the power goes down. Furthermore, in case of a crash the database may become corrupted, just as with MyISAM, and you might need to repair tables. This is of course scheduled to be fixed before the stable release, but the performance impact of the fix is as yet unknown: how a transactional system implements logging and dirty buffer flushing has a serious impact on performance.
The other gotcha you should be aware of at this time is “per-database transactions”: if you modify data in a number of databases, you may see partial changes. The same applies to multi-versioning: if you are using the consistent read isolation mode, these consistent reads are per database, not global. We have yet to see what will happen to this in the future. For transactions there is an obvious fix, which is to use two-phase commit for transactions spanning multiple databases. This might not even add much overhead, as XA is already used for synchronizing with the binary log.
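To make the two-phase commit fix more concrete, here is a minimal Python sketch, assuming each database has its own hypothetical transaction participant with `prepare`, `commit` and `rollback` methods. None of these names come from PBXT or from MySQL's XA interface; the point is only that a change spanning two databases becomes all-or-nothing once every participant must vote yes before any of them commits.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseParticipant:
    """Hypothetical per-database transaction manager (not PBXT's real API)."""
    name: str
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)
    fail_on_prepare: bool = False  # simulate a database that cannot prepare
    prepared: bool = False

    def write(self, key, value):
        # Changes are buffered until the global commit decision is made.
        self.pending[key] = value

    def prepare(self) -> bool:
        # Phase 1: get ready to commit and vote yes (True) or no (False).
        self.prepared = not self.fail_on_prepare
        return self.prepared

    def commit(self):
        # Phase 2: only reached after every participant voted yes.
        self.committed.update(self.pending)
        self.pending.clear()
        self.prepared = False

    def rollback(self):
        # Discard buffered changes; nothing becomes visible.
        self.pending.clear()
        self.prepared = False


def two_phase_commit(participants) -> bool:
    """Commit in all databases or in none of them."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


# Usage: one database refuses to prepare, so neither side is committed.
orders = DatabaseParticipant("orders")
billing = DatabaseParticipant("billing", fail_on_prepare=True)
orders.write("order:1", "paid")
billing.write("invoice:1", "issued")
print(two_phase_commit([orders, billing]))  # False
print(orders.committed, billing.committed)  # {} {}
```

Without the prepare phase, `orders` could commit before `billing` fails, which is exactly the partial change described above.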
The focus of PBXT, and one of its strengths, seems to be the handling of BLOBs, which are never fragmented and are handled efficiently. The PBXT team even leads the project to add scalable BLOB streaming to MySQL, which would be great for many applications if implemented well.
It is too bad the MySQL Users Conference only allowed 45 minutes for storage engine presentations. This is a complex topic, and newly presented storage engines especially deserved more than 45 minutes. I left the presentation with a lot of questions about details such as index structures, buffer management, locking implementation and so on.
As we have already seen in our benchmarks, PBXT both performs and scales well in many read workloads. We did not test writes because this is where a lot of changes are expected to happen anyway. We also only tested a CPU-bound workload; with disk IO the situation can be rather different.
Looking at the PBXT architecture, there is a small row pointer file which is expected to be accessed a lot, on every row read and row write. As I understand it, the file takes about 8 bytes per row, so it should be very small and well cached in the database cache. The write policy, however, becomes important at this stage: unless you can delay and group IOs together, you may end up with a lot of extra writes.
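Some rough arithmetic shows why the row pointer file should cache well and why the write policy matters. This Python sketch takes the approximate 8 bytes per row figure from above and assumes a hypothetical 16 KB IO block; it compares flushing every pointer update on its own with collecting dirty blocks and writing each distinct block once per batch. It is an illustration of the general idea, not of how PBXT actually flushes.

```python
import random

POINTER_BYTES = 8          # approximate per-row size quoted in the talk
BLOCK_BYTES = 16 * 1024    # assumed IO block size (hypothetical)


def pointer_file_size(rows: int) -> int:
    """Bytes needed for the row pointer file."""
    return rows * POINTER_BYTES


def block_of(row_id: int) -> int:
    """Which IO block holds the pointer for this row."""
    return row_id * POINTER_BYTES // BLOCK_BYTES


def writes_immediate(updated_rows) -> int:
    # Every pointer update is flushed on its own: one block write each.
    return len(updated_rows)


def writes_grouped(updated_rows, batch: int) -> int:
    # Dirty blocks are collected, and each distinct block is written
    # once per batch no matter how many of its pointers changed.
    total = 0
    for start in range(0, len(updated_rows), batch):
        chunk = updated_rows[start:start + batch]
        total += len({block_of(r) for r in chunk})
    return total


# Usage
print(pointer_file_size(100_000_000) / 10**6)  # 800.0 (MB)

rng = random.Random(42)
updates = [rng.randrange(1_000_000) for _ in range(10_000)]
print(writes_immediate(updates))                # 10000
print(writes_grouped(updates, batch=10_000))
```

With 8-byte pointers, 100 million rows need about 800 MB of pointer file. For random updates over a million rows, the pointers occupy only 489 blocks of 16 KB, so a batch of 10,000 grouped updates costs at most 489 block writes instead of 10,000.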
Besides the row pointer, each row has a fixed-length data part and a dynamic-length data part, stored in separate files. The dynamic-length part is stored in a file called the “Data Log File”, which is quite a confusing name to my taste. So altogether a row is stored in 3 pieces in different locations, which may require a lot of IO for large data sizes. But you should of course take into account that this is only the worst-case scenario: for many queries you will not need to touch the dynamic-length part, and the row pointer file should be cached in most cases.
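The read path for a row split into three pieces can be modelled as below. This is a toy Python model with hypothetical names (`ThreePartRowStore`, `RowLocation`), not PBXT's actual on-disk format; it simply counts how many separate file accesses a read needs, depending on whether the row pointer is cached and whether the query needs the dynamic-length part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowLocation:
    # Hypothetical layout: where the pieces of one row live.
    fixed_slot: int            # slot in the fixed-length record file
    log_slot: Optional[int]    # slot in the "Data Log File", None if no dynamic part


class ThreePartRowStore:
    """Toy model of a row split into pointer, fixed part and dynamic part."""

    def __init__(self, pointer_cached: bool = True):
        self.pointers = {}      # row pointer file: row_id -> RowLocation
        self.fixed_file = {}    # fixed-length part: slot -> bytes
        self.data_log = {}      # dynamic-length part: slot -> bytes
        self.pointer_cached = pointer_cached
        self.io_count = 0       # simulated file accesses that miss the cache

    def insert(self, row_id, fixed: bytes, dynamic: Optional[bytes]):
        fixed_slot = len(self.fixed_file)
        self.fixed_file[fixed_slot] = fixed
        log_slot = None
        if dynamic is not None:
            log_slot = len(self.data_log)
            self.data_log[log_slot] = dynamic
        self.pointers[row_id] = RowLocation(fixed_slot, log_slot)

    def read(self, row_id, need_dynamic: bool):
        if not self.pointer_cached:
            self.io_count += 1   # piece 1: row pointer file
        loc = self.pointers[row_id]
        self.io_count += 1       # piece 2: fixed-length part
        fixed = self.fixed_file[loc.fixed_slot]
        dynamic = None
        if need_dynamic and loc.log_slot is not None:
            self.io_count += 1   # piece 3: data log file
            dynamic = self.data_log[loc.log_slot]
        return fixed, dynamic


# Usage: worst case vs. common case.
cold = ThreePartRowStore(pointer_cached=False)
cold.insert(1, b"id=1", b"long text ...")
cold.read(1, need_dynamic=True)
print(cold.io_count)  # 3

warm = ThreePartRowStore(pointer_cached=True)
warm.insert(1, b"id=1", b"long text ...")
warm.read(1, need_dynamic=False)
print(warm.io_count)  # 1
```

In the worst case a read touches all three files; with the pointer cached and only fixed-length columns requested, it touches one.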
As PBXT does not use pages for data storage, which tend to lead to “holes”, it should have a rather compact footprint in the optimal case. On the other hand, because it leaves old versions in the data files, it needs special compaction operations to keep the data files compact and efficient. As someone joked at the conference, “Now, so many years after PostgreSQL, MySQL has finally got a storage engine which needs VACUUM.”
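The compaction problem can be illustrated with a toy append-only log in Python. The class and method names are hypothetical and this is not PBXT's compactor: every update appends a new version and moves the row's pointer, so the old version stays behind as garbage until a compaction pass copies only the live versions into a fresh log. A real multi-versioning engine must also keep any old version that a running transaction can still see, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy data log: updates append a new version, old versions become garbage."""
    records: list = field(default_factory=list)   # (row_id, value) in append order
    pointers: dict = field(default_factory=dict)  # row_id -> index of live version

    def put(self, row_id, value):
        # Never overwrite in place: append and repoint.
        self.records.append((row_id, value))
        self.pointers[row_id] = len(self.records) - 1

    def get(self, row_id):
        return self.records[self.pointers[row_id]][1]

    def garbage_ratio(self) -> float:
        # Fraction of stored records that are no longer the live version.
        if not self.records:
            return 0.0
        return 1 - len(self.pointers) / len(self.records)

    def compact(self):
        # Copy only live versions (in their original order) to a new log
        # and rewrite the pointers to their new positions.
        new_records, new_pointers = [], {}
        for row_id, idx in sorted(self.pointers.items(), key=lambda kv: kv[1]):
            new_pointers[row_id] = len(new_records)
            new_records.append(self.records[idx])
        self.records, self.pointers = new_records, new_pointers


# Usage: three updates to one row leave two dead versions behind.
log = AppendOnlyLog()
for i in range(3):
    log.put("a", i)
log.put("b", "x")
print(len(log.records), log.garbage_ratio())  # 4 0.5
log.compact()
print(len(log.records), log.get("a"))         # 2 2
```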
So overall PBXT is a very interesting project to watch and to try out (we are using PBXT with MySQL 5.1 in one of our projects), and if it is developed with the same pace and dedication it will become one of the major storage engines for MySQL. We surely should come back to it soon, run more benchmarks and try out the newest version.