November 28, 2014

Notes from the Newb

Notes from the Newb.

I’m relatively new to MySQL having come from the world of embedded micro-databases, and though I’m pretty familiar with a number of database systems, I’ve discovered that I have a lot to learn about MySQL.

As a new member to the Percona team, I thought I’d have an ongoing blog theme titled “Notes from the Noob” and discuss common problems that our consultants have encountered when working with customers who are also new to MySQL. To which end, I’m going to make the assumptions that (1) you use InnoDB as your engine of choice (a database that ain’t ACID ain’t no database) and that (2) you are using the latest release, 5.6.

So last week I polled a number of our senior consultants and asked them, what are the most common mistakes made by our customers who are also new to MySQL? Overwhelming, I heard back that it was using the default settings and not configuring my.cnf to their specific workload. Having said that, one of the dangers of MySQL is that because there are literally hundreds of configurable parameters that can be set, one is inclined to start messing around and thus ending up with worse performance, rather than better.

The key to working with the MySQL configuration file is that a light touch is recommended, and generally only a handful of parameters need be re-defined.

So what are some of more important parameters one should consider changing from the default values?

First and foremost is the system variable innodb_buffer_pool_size. This defines the size of the memory pool used for caching InnoDB tables and indices and its default size is a piddling 128 MB. On a dedicated server, it’s generally a good idea to set this at 70-80% of available memory but remember that InnoDB will take up to an additional 10% for buffers and control structures. The idea is that the more memory allocated to the database, the less I/O you can expect when accessing the same data. Other issues you might want to consider when defining this value is the size of your dataset and whether you will have multiple instances of MySQL running. Be careful though if you plan to have your entire LAMP stack on a single machine, you might end up having competition for memory resources resulting in paging in the operating system thus negating the whole purpose of reducing IO.

Next up would be the system variable, innodb_log_buffer_size which can be important if you expect to be committing large transactions. The idea is that if you do have large transactions, setting this variable high enough will prevent the requirement of writing the log to disk before the transaction is committed. The default value is 8 MB, but if you expect to have larger transactions, you should definitely consider upping the value.

As expected, the innodb_log_file_size represents the size of the log files. The larger the value, the less checkpoint flush activity is needed in the buffer pool thus reducing disk IO. The downside however to a large value is that crash recovery can be slower. The default value is 48MB. As usual, you should do a reality check with regards to your actual workload. If your system is doing mostly reads, you may not need a large value, whereas if you’re storing blobs (which by the way, is generally not a good idea) you may want to have a larger value.

Another system variable to consider is the innodb_flush_log_at_trx_commit, which by default is set to 1, which means that the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. This is great if you can’t risk losing any data, however it can also come with a serious performance penalty. If you’re able to assume more risk and are able to lose up to a second of data, you might want to consider setting this to either 0 or 2. In the case of small ‘write’ transactions, this can provide significant performance improvements.

If you don’t have many tables in your database, you should be aware of the innodb_file_per_table, which as of 5.6, has a default value of ON. This results in MySQL assigning a separate .ibd file for each table in the database. There are a number of reasons why this can be beneficial, but the primary being that you are able to reclaim disk space. One of the quirks of InnoDB tablespace is that when a record is deleted, disk space is not reclaimed. So how are you able to get back that disk space you ask? Let’s say you have a table called biggiesmall and you have deleted 90% of the records. Simple execute the query, ALTER TABLE biggiesmall ENGINE=InnoDB. This will result in a temporary file being created with only the undeleted records. Once the process in completed, the original table is removed and the temporary table is renamed as the original table.

And finally, the best settings are based upon actual runtime statistics. Monitor and record what happens with your database and base your settings upon real numbers.

About Tim Sharp

Tim joined Percona in the summer of 2013 as a MySQL Technical Account Manager. His areas of knowledge include embedded database technologies, customer service and the traditional Linux/Apache/MySQL/PHP stack.

Comments

  1. nate says:

    The “optimize table” command is easier to remember(and less typing) than the alter table, and maps transparently to the alter table, and also supports optimizing MyISAM tables as well in case you have some of them.

    Also of course MySQL isn’t alone in that it doesn’t reclaim space automagically I recall Oracle is similar (though I think with Oracle you can run a non disruptive job that reclaims it whereas in MySQL the job is disruptive). Maybe with Oracle it is completely automatic now but pulling a quote from ask tom.

    “The HWM(High water mark) is relevant since Oracle will scan all blocks under the HWM, even when they contain no data, during a full scan. This will impact the performance of a full scan especially if most of the blocks under the HWM are empty. To see this, just create a table with 1,000,000 rows (or create any table with a large number of rows), and then execute a SELECT COUNT(*) from this table. Now, DELETE every row in it and you will find that the SELECT COUNT(*) takes just as long to count 0 rows as it did to count 1,000,000. This is because Oracle is busy reading all of the blocks below the HWM to see if they contain data.”

    I remember co-workers dealing with those issues quite a bit on Oracle about 10 years ago, looks like the latest version still has the concept so I imagine it’s by design.

    I’m sure other dbs have similar things I just don’t have personal experience with them

    On our systems we’ve had to set the innodb cache to 65% memory capacity, more than that and we’ve seen the systems swapping under decent load. Perhaps the SQL that runs on our systems is just really ugly and memory intensive.

Speak Your Mind

*