MySQL Blob Compression performance benefitsPeter Zaitsev
When you’re storing text of significant size in the table it often makes sense to keep it compressed. Unfortunately MySQL does not provide compressed BLOB/TEXT columns (I would really love to have COMPRESSED attribute for the BLOB/TEXT columns which would make them transparently compressed) but you well can do it yourself by using COMPRESS/UNCOMPRESS functions or compressing/decompressing things on the client.
This choice of server side vs client size compression depends on the queries you have. If you just retrieve these BLOB/TEXT columns for say displaying it in application compressing on application is typically better idea because you will save CPU on database box and move it to the Web box which is typically easier to scale, plus you save on network traffic a bit, though this is not generally the problem.
Compression on the server gives you ability to do some server side filtering (which however would not be overly fast as each row will need to be decompressed) it also makes data easier to work with with console MySQL tools.
Some people get best of both worlds by taking COMPRESS/UNCOMPRESS functions from the MySQL code and having them work on the application side as well which gives you best of both approaches.
One thing you should aware with compression – result of COMPRESS function is always binary, which means even if you’re compressing data from TEXT column you should pick BLOB type for compressed data storage.
Before going ahead with compression I usually run some checks to see how much compression will benefit you.
SELECT AVG(LENGTH(body)) FROM data vs SELECT AVG(LENGTH(COMPRESS(body))) FROM data gives good data point. However it may not present you with true data size and performance gains, especially for Innodb tables.
With Innodb tables BLOB gets its own page (allocated outside of clustered index) if the whole row does not fit in the page (longer than about 8000 bytes in size). The whole page is allocated even if you’re just a bit short plus BLOB retrieval unless it is in cache will require an extra disk seek. Also remember each BLOB which does not fit in the main row page is given its own page or set of pages potentially requiring a lot of disk seeks for row retrieval and a lot of space wasted.
So compressing BLOB/TEXT you should also think about how many rows you will be able to shrink down so they fit back to the page or at least reduce number of BLOBs on the row which need to be stored externally.
As an example in one case I’ve seen the compression showing row sizes will reduce about 3 times after compression (some XML data) while in practice table size reduced over 5 times and performance gains were even better because much fewer rows required external storage now.
The BLOB/TEXT column compression is not for all workloads. When your database is small enough to mostly fit in the cache it can cause CPU waste if compressing on database size with almost no gains. Small BLOBs is another case when compression may not give any gains and sometimes even increase the data size rather than reducing it. However it works great for many standard Web application needs – storing blog posts, product descriptions, forum posts (if they are generally long enough) and other data sets.