Modeling MySQL Capacity by Measuring Resource Consumptions

There are many angles you can look at the system to predict in performance, the model baron has published for example is good for measuring scalability of the system as concurrency growths. In many cases however we’re facing a need to answer a question how much load a given system can handle when load is low and we might not be able to perform reliable benchmark.

Before I get into further details I’d like to look at basics – what resources are really needed to provide resource for given query ? It surely needs CPU cycles, it may need disk IO. You may also need other resources such as network IO or memory to store temporary table, but let us ignore them for a moment. The amount of resources system has will place a limit on amount of queries system can ran, for example if we have query which requires 1 CPU second and 1 IO to execute and we have 16 core system with hard drive which can do 100 IOPS we will be consuming all CPU when we’re running 16 seconds per second.

Of course no system scale perfectly, so you would unlikely be able to get 16 queries per second on such system. There are internal scaling aspect of the system which include both latching as well as inevitable application specific scalability restrictions, such as row level locks. There is also a load aspect – “random arrivals” tell us the number of work system has to do will vary significantly all the time. Baron’s model deals with some of these pretty well and for sake of this discussion we will diminish it to workload and hardware specific constant. For example we can use factor of 0.7 and state we can
safely run 0.7*16 ~= 12 queries per second. For some ideas about what factor may make sense for your system check out Thinking Clearly About Performance by Cary Millsap.

So how we can use this data to estimate capacity of MySQL system ? We can look at CPU and IO consumption per Query and compare it to estimated (or benchmarked) system performance to provide our estimates.

If we’re running Innodb with MySQL we can use Innodb_data_reads , Innodb_data_writes, Innodb_os_log_fsyncs for your disk IO estimation. When you can divide it per number of “Questions” or “Com_select” to get amount of IO per query or per Select. It is good to check it over certain intervals – some workloads will have this as a very stable value for others it might go back and forth a lot.

How to get CPU consumption per query ? You can take a look at procfs for MySQL process:

The #14 and #15 here is kernel and user CPU usage of MySQL process in 1/100 of the second. (This is pretty idle test system). So 347 and 5303 correspond to 3.47 seconds of user time and 53.03 system time consumed by the process. Collecting these at regular intervals and correlating to number of queries running will give average CPU usage per query.

If you’re running Percona Server you can get the value from User Statistics

In this case I can see this user took 49 CPU seconds per 181243 select queries which is about 270us per select query. We can also get “BUSY TIME” here, subtracting CPU time from it we get “Wait Time”
which is in this case 775-49=726 seconds or 4005us per select. Wait time is often IO (which you can see separately through number of IOPS) but it also can be row level locks, etc. The ratio between Wait Time and CPU time is very helpful to see how “wait free your system is”. If it system is already have low wait ratio increasing amount of memory for example is unlikely to help.

One helpful way to use this information is to compare systems with different memory amount having same workload. You will often see increasing amount of memory not only helps you to reduce wait time and number of IOs per query as well as increase CPU time spent. This is because IO handling requires significant number of CPU.

With Percona Server, enabled full query logging and log_slow_verbosity=full you can also get great amount of related data from mk-query-digest report:

In this case I can see there is average time per query is 5ms; it is requiring in average 0.54 read operation per second which takes 4ms which adds up pretty well. We can also see what the query in average examines 20 rows, which means about 1 IO per 40 rows… which amounts to pretty IO bound load for me.

But average is only average. It is a lot more interesting to look at Per-query information from mk-query digest (I omit queries text for client privacy)

We can see this query takes 2ms to respond in average. Most of which is taken by IO and also what this query takes 0.28 IOs per query in average. It is also simple query which touches less than 1 row in average which makes it very IO bound.

So what If I am planning for load growth and need to have system handle another 1000 of such queries per second ? I will need to do another 280 reads per second which you can use to guess whenever current IO subsystem can handle it or whenever it needs an increase.

The query time distribution histogram is also very interesting here we can see this query which analyzes no more than 1 row may take up to 8 io requests (could happen due to looking to undo space etc) and can take between up to 10-100 ms. The queries which are in 100us range are ones where no IO needed to happen so such histogram also gives us a good clue how many queries needed no io, needed 1 IO which was not queued (less than 10 ms) or needed more than that.

Going from this we can also estimate the cost of such of such query. Lets assume it is restricted by IO performance (which it is in this disk) and having and the cost of system which can run 1000 IOPs
costs $500 to run per month (including leasing, power etc). Such system will be able to do 2592000000 IOPS per month and (using our 0.7 factor) we have 6480000000 queries such system can run comfortably in a month. This gives us 12960000 or about 13M queries per dollar.

As a summary it is often very helpful to take a close look at your workload and get an understanding how much your queries (at least most important ones) cost you in terms of CPU and IO. From this you can very easily understand what kind of hardware will take you to reach appropriate performance, what kind of hardware provides better balance of CPU vs IO utilization as well as as simple as how much does it cost to run a query. With Cloud Computing being hot a lot of Directors would like to know the costs in “utility” model and you do not have to be in the cloud to provide them with estimates.

Share this post

Comment (1)

  • shenglin Reply

    Good post! It’s really good model to convert IO/CPU cost to money. That’s the first step for capacity plan. Once we can evaluate the tps/qps with specified hardware and platform, you can provide the capacity plan per the application requirement

    March 10, 2011 at 1:27 am

Leave a Reply