October 23, 2014

Large result sets vs. compression protocol

The mysql_connect() function in PHP’s MySQL interface (which, for reference, maps to the mysql_real_connect() function in the MySQL C API) has had a $client_flags parameter since PHP 4.3.0. This parameter is barely known and almost always overlooked, but in some cases it can provide a nice boost to your application.
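For illustration, here is a minimal connection sketch; the host name and credentials are placeholders, and the fourth argument ($new_link) is left at its default so that the flags land in the fifth ($client_flags) parameter:

```php
<?php
// Hypothetical host and credentials -- substitute your own.
// The fifth argument is $client_flags; MYSQL_CLIENT_COMPRESS enables
// compression in the client/server network protocol.
$link = mysql_connect('db.example.com', 'user', 'secret',
                      false, MYSQL_CLIENT_COMPRESS);
if (!$link) {
    die('Connection failed: ' . mysql_error());
}
```

If you are on the newer mysqli extension instead, the equivalent is passing MYSQLI_CLIENT_COMPRESS as the flags argument of mysqli_real_connect().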

There are a number of different flags that can be used. We’re interested in a specific one, MYSQL_CLIENT_COMPRESS. This flag tells the client library to enable compression in the network protocol when talking to mysqld. It reduces network traffic, but at the cost of some CPU time: the server has to compress the data and the client has to decompress it. So there’s little sense in using it if your Web application is on the same host as the database.

When the database is on a dedicated server, compression essentially means trading CPU time (on both server and client) for network time. Obviously, if the network is fast enough, the savings in network time will not outweigh the loss in CPU time. The question is, where exactly does the break-even point lie?
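The trade-off can be sketched with back-of-envelope arithmetic. The numbers below are illustrative assumptions (result set size, compression ratio, and per-megabyte CPU cost), not measurements:

```php
<?php
// Illustrative assumptions: a 600 MB result set, a 3x compression
// ratio, and an assumed combined compress+decompress CPU cost per MB.
$size_mb      = 600;
$ratio        = 3.0;
$cpu_s_per_mb = 0.02;

// Seconds to move $mb megabytes over a link of $mbit_per_s megabits/sec.
function transfer_s($mb, $mbit_per_s) {
    return $mb * 8 / $mbit_per_s;
}

foreach (array(100, 1000) as $link) {
    $plain      = transfer_s($size_mb, $link);
    $compressed = transfer_s($size_mb / $ratio, $link)
                + $size_mb * $cpu_s_per_mb;
    printf("%4d Mbit: plain %.1f s, compressed %.1f s\n",
           $link, $plain, $compressed);
}
```

With these assumed numbers, compression wins on the 100 Mbit link (28 s vs. 48 s of transfer time) and loses on the 1 Gbit link (13.6 s vs. 4.8 s), which matches the direction of the benchmark below.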

It turns out that a 100 Mbit link (with 1.4 ms round-trip time) is not fast enough. Oleksandr Typlynski, one of the Sphinx users, conducted a benchmark, indexing 600 MB of data over a 100 Mbit link. The data was textual and compressed well, cutting traffic by more than 3x. With compression, total indexing time dropped from 127 to 87 seconds. That’s almost a 1.5x improvement in total run time, and the improvement in MySQL query time alone is even greater. On the other hand, a 1 Gbit link was fast enough: there, total run time was 1.2x worse with compression.

The bottom line: if you’re fetching big result sets, and the client and MySQL are on different boxes, and the connection is 100 Mbit, consider using compression. It’s a matter of adding one extra magic constant to your application, but the benefit can be pretty big.

About Andrew Aksyonoff

Andrew is the creator of the Sphinx full-text search engine, which powers Craigslist, Slashdot, Dealnews, and many other large-scale websites.

Comments

  1. So the right result is: if you are working with large data sets, use a 1 Gbit (or better) link, or you will lose a significant part (at least 20%) of the performance :) (I don’t understand why one would use just a 100 Mbit link between the database and the client.)

  2. sorry: “or you will lose”

  3. shodan says:

    Pavel,
    well, there are different cases. Smaller DCs and/or office LANs could still offer 100 Mbit links; or the link could be 1 Gbit but busy with other stuff. Sometimes compression could help; that’s the whole point.

  4. gigiduru says:

    That’s why stored procedures and cursors were invented. And I’m not talking about MySQL’s flat text stored procedures but about real compiled SP.

  5. Yes… of course if you’re doing this on 100 Mbit it will be faster.

    Do it on 1–10 GbE and it won’t be any faster.

    It’s a shame that it’s taking most shared hosting providers until 2007 to really start offering GbE

  6. pabloj says:

    That’s why I ranted about this missing feature in Firebird ;-)
