October 24, 2014

How Percona does a MySQL Performance Audit

Our customers or prospective customers often ask us how we do a performance audit (it’s our most popular service). I thought I should write a blog post that will both answer their question, so I can just reply “read all about it at this URL” and share our methodology with readers a little bit. This fits well with our philosophy of openness. It also shocks people sometimes — “you’re giving away the golden goose!” Not really. What you hire us for is our experience, not a recipe that anyone can follow.

A full performance audit is far more extensive than this article can cover, and might wander into Apache, networking config, caching layers, etc. Wherever the problem is, we’ll track it down. I won’t talk about that. That’s not because I want to keep secrets from you. To the contrary, I’d love to share it all with you. But that’s a huge job; it will take many pages, and I’m not going to write that much.

The kickoff call

There’s actually a step before the performance audit begins. We call this a kickoff call. We get on the phone with the key technical staff on the client’s side and discuss the application in general. If possible, we take a brief look at the server beforehand, so we can ask more intelligent questions and skip obvious things. This call is sometimes up to an hour long, if we’re discussing a lot of things like how to build for massive scalability, how to do read-write splitting without breaking the user experience, how to take online non-blocking backups, or things like that.

But in general it can be a lot shorter than clients expect, because

  • time is money
  • your application is not as unique as you think, and we already know a lot about it by browsing your site or hearing you say “we are a SaaS hosted application for data mining stock market transactions,” or whatever the case may be
  • we’ve generally seen just about everything people throw at databases

Many people assume their application is difficult or somehow different, and that we need all kinds of schema diagrams and code listings, but the truth is people often have the same insights into particular problems, and therefore they try similar solutions. And besides, most applications have a lot of the same components. Tagging, friends, queues, click tracking, search, paginated displays — we’ve seen these and dozens of other common patterns done a hundred different ways.

The kickoff call has some other important goals too. We want to know what your concerns are at this stage. Is it the current performance, future performance, scalability, ability to recover from disasters? We also want to know what the operating parameters are. Can we suggest upgrades, rewriting queries, or whatnot? Sometimes there are rigid constraints on what types of solutions we can propose, and we need to know ahead of time so we don’t spend time on things you have no way to change. An example of this is a third-party product whose code you are unwilling or unable to change.

We also want to know the operating mode you expect us to work in. Maybe you want us to analyze and present recommendations for your consideration, and take a look-don’t-touch approach. Or, maybe you just want us to fix things and tell you later what we did. Either is a completely valid approach. You tell us what you want, and that’s what we’ll do (of course, we will also tell you what we think is best for you, which is our job).

After the kickoff call, we write some notes, and then get down to the audit itself. We usually have two people on the kickoff call so that we don’t have a single person knowing everything about you (only one is billable, though). The second person will be from the same team within Percona. However, the audit itself is generally a single person, with another person reading all the notes and generally keeping in touch with what the lead consultant does. We want two sets of eyes on things whenever possible. This also has value in case you decide to continue on to a longer relationship with Percona, which is reasonably common; in that case you will establish a solid relationship with the team, and group involvement early on is a good start to this.

What to look for during the audit

During the audit I look for outliers and anomalies in every aspect of the server’s configuration and performance and contents. I’m looking for characteristics that are much bigger or smaller than normal, or things I don’t usually see. Anything out of the ordinary or out of proportion. These things are often relatively easy to catch if you have enough experience. It’s a little hard to describe without making it sound like it’s some secret handshake, but it’s really not magical. You’re either familiar with what a server normally looks like, or you’re not. It’s just like a mechanic who listens to the car and says “I hear the valves knocking.”

In other cases there may be things that are NOT easy to catch, and may require a lot of experience and investigation. The work we do is often not simple at all. It’s impossible to predict. A customer may call and ask “how long to debug server lockups?” It could be ten minutes, or it could be a really major effort chasing down something like a bug in a hardware component, and you never know ahead of time.

In either case, the point to know here is that we can work with systems that are at any level of tuning, from completely untuned to a system that’s already had a lot of expert attention. I use “tuning” in a generic way here — we focus on far more than my.cnf files. In fact my.cnf files are usually one of the smallest levers we can pull for server optimization. Schema and query optimization can give much greater improvements, for example.

If you’re familiar with Malcolm Gladwell’s excellent book Blink, you know the term “thin-slicing.” That describes an experienced consultant’s approach to the problem very well.

The audit itself

OK, so here’s a very basic audit, whose scope I’ll keep within just a MySQL server. The first step is to log into the machine via SSH and open up a text editor (gedit, notepad, what have you.) I want to keep this open and paste everything I see into it. If there’s no record later that someone else can follow, I am not doing my job.

I would also note that the commands I’ll show here are the lowest common denominator. We often use tools to gather a lot of the data for us and speed the process, when possible, but the core commands I’ll show are what we can always fall back to if that’s not possible.

Gathering information about the server

I start off with these commands:

  • date
  • uname -a
  • cat /etc/*release

The first command is invaluable for later reference. The others will show exactly what kind of server I’m looking at, assuming I’m on a GNU/Linux box (I have to adapt to whatever environment I’m in — obviously a FreeBSD box is different). For example,

From this, I can see a couple of important things. One thing is that I’m looking at a 64-bit OS, which is important to know. I also know I’m on RHEL and I know the kernel — knowing this may tell us a lot of useful things about the system.

Next I continue looking around the server. Holding with the assumption of GNU/Linux, I keep working to get a picture of the OS and hardware.

What I see here is that this is an 8-core machine, two quad-core processors at 2GHz. I also know some things about the CPU, such as the model number and the cache size, etc. What else is important is the ‘lm’ flag, which tells me this is a 64-bit CPU. From this and what I saw earlier I can see whether someone installed a 32-bit OS on 64-bit hardware, which is important if you want to allocate a lot of memory to mysqld.

Next I look at a couple of all-in-one commands. For example, ‘top -n 1′ and ‘uptime’. That shows me what processes are running, but it also shows me load average, memory sizes, and a bunch of other stuff. In this case I’m on a machine with 32GB of memory and there is very little load. I see that mysqld is using 10g of memory with 17g virtual size, and there’s practically nothing else running (this is a dedicated master server). The machine has a lot of swap, but basically none is used. This isn’t all that relevant, as you’ll see later, but in some cases it might be.

I continue my investigation into the machine itself with these commands:

  • mount
  • df -h
  • dmesg
  • pvs; vgs; lvs

These commands show me what filesystems are mounted, how much space they have on them, and what kind of I/O system is under the box in general. dmesg is useful for several reasons. For one thing, it has a lot of information on the box’s hardware, including RAID controllers, installed memory, and so on in the boot output. In addition, at the tail it has information on core dumps and so on. It’s possible that it has filled up with stuff and the boot output has been lost, but in many cases there is a lot of information here that’s hard to get elsewhere.

If I see an LSI MegaRAID card, which is pretty common, I can check whether MegaCli is installed, and use it to get information about the physical drives installed; then I can find out exactly what they are, both in terms of size and spindle speed. I need to know how many physical spindles are under the box, and how fast they are. I can also get information about the all-important battery-backed write cache, how it’s configured and what its status is. I can see the drive status, too. Sometimes a failed drive is present but no one has noticed it yet! (By the way, we can also help you install and configure monitoring and graphing systems, if you don’t have any. These will help catch such problems.) If MegaCLI isn’t installed there may be other ways to get this information, too.

The last three commands are for giving me information about LVM devices, volume groups, and logical volumes. At this point, unless I’ve seen something that I want to investigate further, I have a pretty good idea what hardware and operating system I’m working with.

System performance

Next I investigate what the system is doing, in terms of performance. There are two key commands for this on most operating systems (there are others, such as mpstat or dstat, that I may also elect to use in some cases):

  • vmstat 5 5
  • iostat -dx 5 5

If you don’t know how to read these, there’s a full explanation in our book, High Performance MySQL 2nd Edition (it’s not in the first edition). From this I get an idea where any possible problems might be: is the disk saturated? Is the machine swapping actively? (This is more important than whether it’s using swap.) There are a lot of combinations of possible things you can see here, so I will not try to explain it all. Basically you need to know everything there is to know about these two commands and their output.

Based on what I’ve seen so far, I may look at other things, such as ifconfig (which shows good stuff like dropped packets).

Starting to look at MySQL

Assuming I don’t see anything noteworthy here, I’ll move on to the MySQL server. I mean, let’s just assume the physical hardware and the machine setup is fine, and the client has asked me to figure out why the website is slow; assume also I’ve checked out Apache and network config, etc, and I don’t see anything wrong there. So at this point, it looks like MySQL is probably a good place to look for performance problems. Everything I’ve done up till now is just due diligence; I’ve gotten my bearings on the server as a whole, and now I’m ready to see what’s up with the database.

The first step is to find out what MySQL instances are on the machine. It doesn’t do to just assume there’s a single instance in /var/lib/mysql and it’s reading from /etc/my.cnf. I’ve seen a lot of cases where there’s a /var/lib/mysql and an /etc/my.cnf, and the server is installed in /customsoftware/mysql with a different my.cnf file. The way to find this out is to look at ps:

In this case, yes it’s all default. It’s listening on the usual port, etc etc. Nothing unusual. Next I check what the MySQL version is, and I check (with the ‘file’ command) whether /usr/libexec/mysqld is a 64-bit build, if I’m on a 64-bit OS.

Examining MySQL configuration

Then I’ll look at my.cnf quickly for anything weird:

My goal here is to look at the my.cnf without seeing all the comments, which a) are often a small novel, b) often say what the sysadmin thinks s/he’s doing, which might not be what is really being done. I want to look at it without polluting my brain with that. I’ll just scan this for anything odd. In contrast to what you might think, spending a lot of time here isn’t usually a good idea, because what’s left out is often more important than what’s included. For example, the absence of skip-name-resolve is easy to miss, as is the absence of innodb_log_file_size.

I won’t give you a full rundown over every option in the file. But again, you learn quickly if something strange is here.

This whole time I’ve been copying the output of the commands and saving them in a text file. I’ll attach this text file to the issue in our CRM system so I (or someone else) can look at it later, see what I saw, and follow my line of reasoning about any suggestions I make.

The next thing to do is grab ‘mysqladmin variables’ and put it in the text file. I don’t spend a lot of time looking at this. For one thing, all the values are in big units, so it’s hard to read. If the InnoDB buffer pool is set to 22GB, it’s much easier to see 22GB than 23622320128, especially since that number is buried next to a lot of other long numbers. They make my eyes hurt, and there are better ways to do this (I will run mysqlreport to get friendly numbers I can read easily). But what I will do is scan the output for something strange that might be caused by a syntax error in the my.cnf file. For an example, take a look at this bug report (not a bug).

Looking at MySQL status

Now we’re starting to get into the really interesting bits. Alas, these are the bits where there is the most variability, so I’ll have to be even less detailed and a little vague. The next command is this:

I let that run for at least two iterations. The first iteration is the current values since the server was booted; the second and subsequent are incremental differences. I’ll usually capture two of these. It looks like this:

Then I format the second set of values up beside the first set for ease of reference (an easy task with an automated too, or a quick Vim command if I’m doing this the manual way.) The result looks like this:

This output is very useful when writing the report to the client. Now I go through it line by line looking for things that look wrong. When I find something, I compare it against the server’s variables or whatever other data I need, and write something instructional to the client about it.

Preparing to report findings

I should mention here a bit about how I write the report to the client. I copy and paste the interesting snippets from what I’ve been gathering, save it in a text file, and explain what I’m seeing to the client. Anything I see that looks wrong should be explained in detail, because my goal is to teach the client, not just to awe them with my knowledge and get “I could never understand, O Almighty Percona, here’s your money” in reply. If the client doesn’t understand what I’m doing, I’ve done something wrong. Clients need to understand so that they know they’re getting good service. A Percona consultant never says “just trust me” unless the client really insists “I don’t want to know.”

That’s also why I’ll comment on things that seem good. Clients need to know that I’m being thorough. Once upon a time, a Percona consultant was called back in to do some further work about 6 months after Peter helped a client. Now, if Peter worked with the client 6 months ago, you have big shoes to fill, and getting further gains will probably not be easy. So this consultant looked through things, spent about 2 hours, and came back to the client with “there’s not a lot of performance left to gain here. You can do X, Y and Z, and that’s about all I see. Beyond that, we need to talk about making your overall architecture more scalable.” And the client said “that’s all? Didn’t you do any work? Peter’s report 6 months ago was a small encyclopedia!” The mistake here was that the consultant didn’t show all the things he’d checked, which was extensive indeed. After a quick explanation the customer understood that the consultant had really done a lot of work, but it’s better to show the client the full extent of the investigations in the first place, and not just show “here’s the 3 bad things I found in 2 hours.” We also make sure that we refer back to the previous issue’s findings, so that we don’t duplicate efforts from the earlier optimization.

Back to the output of mysqladmin. When I send the report to the client, it typically looks something like this. First, I explain the meaning of the output, so it’s clear what I’m showing them. Then, I go through it line by line, like this:

And so on, and so on. I’ll often refer to other things; for example, when I’m examining the status values for Created_tmp_tables and Created_disk_tmp_tables, I’ll paste in

If these are not the same size, as often happens, I’ll explain that the minimum of the two is used to determine when an in-memory temporary table will be converted to an on-disk table, which is an expensive operation.

In this way, I work through the status and the variables, and explain to the client what I can deduce about the server from them.

I would point out that I don’t spend a lot of time on settings and status. It is not where the greatest gains are to be found. The real gold mine is yet to come. After you’ve done it a few times, you can go through the settings and status pretty quickly.

Analyzing MySQL’s workload and data

After I’m done with that, I’ll grab a few snapshots of SHOW FULL PROCESSLIST, and look for odd values in this. For example, if I see a lot of queries in odd statuses, like “statistics”, I know something weird is going on, and I look deeper. I also look at the queries themselves. If the client has long-running queries, it can be pretty easy to catch. Maybe they’re long-running because they’re in Locked status, which might be an indication that it’s time to convert MyISAM tables to InnoDB. (But then again, it might not be.) Or maybe the client is doing queries like “… WHERE client IN (SELECT id FROM clients)” which is a really bad query plan.

After this I’ll look at SHOW INNODB STATUS\G and see if there’s anything worth commenting on there. Depending on the workload, there might be a lot of substance here. There’s also a very thorough section on this in our book.

Next I look at the data in the server and see what I think about it. If the server is not heavily loaded, I may even do some INFORMATION_SCHEMA queries to help me find the biggest tables and so forth. If the server is heavily loaded or has a lot of data, touching the INFORMATION_SCHEMA can take a barely-running site and bring it down, so I do this with great care! If it’s not appropriate I may run ‘du’ on the data files, or simply skip this step. This information can help me determine what kinds of things I ought to look for later during query analysis, and also might show me things like huge tables that should be archived, tables that I should be aware of if I see them in queries, or what have you.

I will also look at the error log. The error log shows all sorts of interesting things. Basically, anything but silence in the error log is interesting and needs to be investigated. You might find that there are InnoDB tables whose .frm files exist but have been dropped from InnoDB, for example. (The INFORMATION_SCHEMA query will surely trigger errors about this.)

There are a bunch of other things I’ll look at, including checking for default users in the mysql.* tables, running mk-duplicate-key-checker to find redundant indexes, and so on. As with settings and status, these set the stage for bigger gains to come later, so I don’t spend all that much time here, although there is occasionally something really bad that can be fixed and bring big gains.

Schema, query, and index optimization

Analyzing and optimizing a server’s table and index structures, and the queries that run against them, is by far the most important thing to concentrate on. All the other work is a prerequisite to understanding the context within which these things operate. For example, if the workload is a star schema data warehouse, it is important to know a lot about the I/O subsystem. And by now I know that I should concentrate in certain areas — there may be hundreds of tables, but at this point I should have identified a handful of tables that really matter. That’s why I save this part of the analysis for last.

You cannot consider any one of these factors (schema, queries, indexing) in isolation, because they are tightly bound together, and tweaking one will often have effects on the others. Each decision is a consideration of the costs and benefits on the system as a whole.

The queries the system runs, and hence the slow query log, is one of the more fruitful ways to analyze the system. What I usually do is ask the client to enable it a day before the audit if possible, and set it to one second. That way there’s at least one “cycle” of queries in it. Then I run the following:

By default, this outputs the top 10 most expensive queries, in total execution time. By the way, the slow query logs in the stock MySQL server are extremely limited, and for serious analysis it’s basically mandatory to use a server that has the Percona patches for microsecond logging and additional information in the slow query log output. The stock MySQL server’s limitation of one-second granularity makes it hide problem queries that are faster than one second (which in a high-performance system is virtually every query.) If it’s not possible to use a patched binary, we can use MySQL Proxy, packet sniffing, or other techniques to get more information than the slow log gives us; but by far the richest and most efficient source of analysis information is our custom server, which we have designed exactly for the purpose of giving us as much insight as possible into the server’s true workload. At the moment there is no other technique that approaches the amount of information a Percona build can give you.

Having found the desired information about the queries by any means necessary, I find out which are the most likely to give the greatest gains. I use a dual approach for this: I look for both the queries which cause the greatest load on the server (in aggregate) and which cause the greatest latency for the application. Again, it’s a process of finding outliers by some criteria. Both types of queries matter. Some may be extremely fast, but run very often; others may be run seldom but take a long time to finish. And it’s important to focus on ones that can give the greatest improvement for the customer. I can often tell at a glance whether a query is going to be possible to improve a lot, and a simple mental calculation can then tell me how much total gain I can get from it. Queries are not just “bad” or “good” — it’s a question of where they fall on the scale.

Now I analyze these one at a time. For each query, I run EXPLAIN if possible (rewriting non-SELECT queries if needed, and using careful judgment for queries with subqueries in the FROM clause, which will actually execute the inner subquery!), and examine the query plan. This is where you need to really know how to write queries and how EXPLAIN works. At Percona, we have peer training constantly on our internal IRC channel and mailing list. We share all sorts of dirty tricks and neat ideas with each other. Nothing is off the table. Queries can be rewritten. Indexes and data types and table structures can be changed. Queries can be broken into pieces, combined, or even eliminated entirely (we may suggest caching, or tell the customer to evaluate Sphinx, or something like that.)

In the common case, though, a query simply needs a rewrite or a new index or something. In this case we show the query and the information mysqlsla outputs about it (execution times, etc), the EXPLAIN plan, and the desired modifications to the query or the table. We explain how to interpret what we’re showing, and why the proposed modifications are better. If the client approves it, we may make a copy of the table, make the proposed modifications to it, and show the difference afterwards. Or we might have a test server to run on. It varies widely; some clients have a test server, some don’t; some have a formal QA process; some don’t. It is very customer-specific, and we work with what we have.

Sending the report to the client

After the slow query log analysis is done, I format all my analysis for sending, add in things I may have noticed along the way (comments on backups, for example) read over it once again and make sure I didn’t miss anything important or write something in a confusing way, and then send it to the client. Then I usually call to discuss, or just confirm that it was received and wait for a reply. In most cases the client has a lot of work to do — sometimes weeks of application changes. Afterwards there’s usually more to be gained by doing another pass through.

We may also want to have another call with the customer and talk about changes that can or should be made at a much higher level. If we see that the application’s overall architecture needs to be changed, that’s something to discuss. Customers usually want us to validate their application’s overall scalability and whether it’ll grow to meet their demands for some period of time. We also frequently bring the discussion to topics such as monitoring, alerting, backups, caching, reverse proxies, and high availability solutions, all of which we can frequently help customers set up much more efficiently and with better results, due to our knowledge of what works in the real world and where there’s quicksand to be avoided.

The changes Percona’s performance audits typically suggest can be really significant, and can completely change the performance profile and workload of a server, so it’s often worth another iteration, and sometimes even more after that. A really thorough audit can take up to ten hours! The initial round usually takes less than two, however. The ten-hour cases are usually for really complex applications, or apps that have a lot of stored procedures, which are more difficult to analyze because you have to pull out the statements and analyze them individually. The goal is always to develop X-Ray Vision and see through the symptoms to the underlying causes in every aspect of application performance.

Customer interaction

In all cases, the customer’s guidance is our first rule. The customer determines the depth and complexity of the analysis we do. A lot of times, customers will tell us up front to limit the work to some number of hours, such as 3 hours. Knowing this ahead of time gives us a framework within which we have to deliver the maximum value to the customer. If we have less time, we fix the “sore thumbs” and make notes of places where more effort is likely to pay off, and mention those to the customer. Or we economize by giving the customer easy choices so they can decide which things to have us fix, which to fix themselves, and which to just ignore.

The above process may look like a static recipe that you can just mechanically execute, and in some cases that’s truly what we do. However, we don’t just go through the motions. The “recipe” above is only a framework within which we work. Knowing the rules of a performance audit, and why they are the way they are, means you can know when and how to break them for the client’s benefit. It’s like knowing openings in chess — the great chess players study openings exhaustively, but they don’t robotically play them in tournaments.

For example, although it may sound as though I go work for a while and then emerge with a report, it’s not that way a lot of times. Flexibility and exercising judgment about when it’s appropriate to interact with the customer are a hallmark of our approach. If we need to be interactive and keep in touch with the client early and often, we do that. A lot of times I’ll start out with a quick look and have a second brief call with the customer, or send a quick email and ask for feedback on the direction I’m going; or just get on an instant messaging chat and tell the customer “here’s the top thing you can start working on while I keep investigating.” We make our process as interactive and flexible as the situation calls for. Early, frequent feedback lets us make small adjustments to the direction we’re going, communicate to the client how much time we’re spending, and shortcut effort if it’s not useful, e.g. skipping analysis for a query the client’s developers are in the process of removing from the application completely.

That’s all! I hope this has been an informative and fun trip through the world of a typical Percona performance audit, although as you’ve seen there really isn’t any such thing. Post your questions in the comments, and I’ll try to answer them as time permits.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Baron,

    Thank you for this great post!
    If I may add a suggestion: I keep a sample my.cnf file on my website, which contains the parameters I would like to see in any installation, and commented parameters which I may wish to ignore, or those I may wish to play with. This way I rarely miss an missing parameter or a misconfigured one.
    Another review I do even if the customer does not bring this up is the backup policy the company has (if any) and see if the backups they do (if at all) match that policy. Quite a few times it turns out the company didn’t have the notion of a backup plan, or the difficulties in following it.

    Again, thanks.
    Shlomi

  2. Rob Wultsch says:

    Baron,
    Holy crap. Nice.

    – Rob

  3. Wouter says:

    Thanks for sharing!

  4. fenway says:

    A goldmine!

  5. Michael says:

    So….what do you do when this happens:

    You’ve completed your analysis, handed it to the client, and they were expecting that you’d provide a panacea at the server-level, solving all their “code problems”? We regularly have clients that feel that they shouldn’t need to change their application. What do you do at that point? Do you actually re-write the queries for them in your service? It was difficult to convince them to implement caching (which saved massive load on the database servers, even if some changes are still needed). Some areas need to be re-worked completely.

    Michael

  6. peter says:

    Michael,

    Sure we can re-write queries, re-structure the schema and all this kind of stuff. Our goal is to give the client a “menu” so they can pick what they want because business reasoning can be outside of our immediate reach. For example we may say something like we tuned your MySQL settings this should get your capacity up another 20%. You can double capacity by by adding the proper indexes and get another 50% by upgrading to faster system. But if you need more gains you need to restructuring and schema or architecture changes.

    It is more complicated than that really but it gives some good insight.

  7. Michael says:

    Peter:

    Our problem seems to be that we’ve done all the tuning (perhaps even provided a bandaid as such), and so no application changes are made, the only “breathing room” came from the tuning. Then, the wall of scability is hit again, loads go up, queries lag, and the question comes up “Why can’t you do what you did before?”. Obviously it’s because it’s been done before.

    When the client has the fastest (excepting ssd and dedicated appliances) hardware you can buy, an already tuned infrastructure (my.cnf, hardware, kernel patches, the works), and still doesn’t “want” to change the application or doesn’t believe it’s the problem, that’s where I start to pull out my hair.

    Michael

  8. Vadim says:

    Michael,

    There is example (quite simplified):

    if application is running SELECT COUNT(*) from InnoDB_table to scan 100.000.000 rows, there is no good way to optimize it if you do not want to change application.

  9. Gil says:

    Having just completed a Percona audit, I can definitely attest to the thoroughness and volume of documentation provided. It is similar to this except much more detailed and customized to the way your system works.

    Knowing how Percona identified the problems in my specific system was absolutely invaluable, and saved me more time than I can imagine. Thanks guys.

  10. Sheeri says:

    Baron,

    “The stock MySQL server’s limitation of one-second granularity makes it hide problem queries that are faster than one second (which in a high-performance system is virtually every query.)”

    Actually, it’s worse than that. The slow query log logs queries that take >long_query_time. Not >=long_query_time. So actually you don’t get any logging for queries that are faster than 2 seconds. (If you have data to counteract this, let me know, but I haven’t seen any…)

    Michael,

    The point is when *tuning*, you let the client know exactly what’s happening. If the client thinks you’re fixing something when what you’re doing is buying time, you need to explain to them that you’re buying time — not fixing a problem.

    It’s certainly a common problem, as often there are tons of change management procedures regarding the code (and tests and qa to pass) and not so many regarding changing a my.cnf parameter.

    For the toughest clients, use a test database, fill with sample data, make your changes (code and perhaps schema as well) and show them the difference.

    You can always make an analogy, as Peter did — though you can explain that if you have lots of memory, fast cpu and lots of fast disk, and have to process 500 records, it’s faster to only have to process 50 records than 500, no matter how good your stuff is.

  11. Michael,
    The situation you’ve described is so true. I think it is important to clear this up right from the start: “The analysis may find that changes should be made to the application. There’s only so much you can gain by tuning the server or upgrading the hardware.”
    I happened on such a case where I was hired to “help out the DBA fix performance issues”, when eventually I went up to the developers and have shown them how their queries were making for low performance (one of the things they did was, on an InnoDB table, never to update row data, but instead DELETE+INSERT. The performance boost once converted to UPDATE could be seen by the naked eye – I was lucky – and this allowed me to further pursue their application code)

  12. Someone asked why grep my.cnf instead of using my_print_defaults, which will do things like pull values from multiple files (if any) and eliminate duplicate entries. There’s a two part answer.

    1) we want to see mistakes in my.cnf, rather than have them hidden by my_print_defaults. A duplicate entry is pretty common and should be fixed.

    2) it’s true that there may be several .cnf files that mysqld reads from, so a single one isn’t the whole truth. But it’s also the case that variables can be set online, so the only source of truth is SHOW VARIABLES.

    The other thing I didn’t mention is that some of the tools we use will do .cnf file analysis to look for duplicate/conflicting values, and values that are different inside mysql than they are in the config files.

  13. sefa says:

    kımı sevıyorum

  14. hironics says:

    I think you have no way to get know all of the configuration variables by just looking into config file.

    You may probably look the whole config variables with ‘show global variables’.

  15. I would like to get the book for mysql performance, community version 5.x.

    Can anyone direct me to the point?

    I’m from Thailand.

  16. You can get it from http://www.highperfmysql.com/.

  17. Thanks Baron,

    I was wondering is this book specific for any version of mysql? or it is an overall performance tuning of mysql.

    I prefer to get the community version 5.x if you could get me the right one, pls direct me?

Speak Your Mind

*