How expensive is USER_STATISTICS?

One of our customers asked me whether it’s safe to enable the so-called USER_STATISTICS features of Percona Server in a heavy-use production server with many tens of thousands of tables.

If you’re not familiar with this feature, it creates some new INFORMATION_SCHEMA tables that add counters for activity on users, hosts, tables, indexes, and more. This is very helpful information. It lets you run queries to see which tables get the most reads and writes, for example. Another popular use for it is to find unused indexes.

I knew that some of our customers were using the feature enabled in production all the time, and I knew that Google and Facebook and others (the original developers) did also. But I didn’t know the real answer about how expensive it is.

There are two dimensions to the question. One is how much slower the server becomes, and the other is how much more memory it uses. The second question is important on servers with many tables. Actually, “many indexes” is an even better metric. If you have tens of thousands of tables and they have a handful of indexes each, you have many more indexes than tables.

Our performance engineering team ran a variety of benchmark scenarios with the feature disabled versus enabled. In brief, we started the server on a database with 25,000 sysbench tables, performed a warmup, ran a benchmark for 900 seconds, collected a snapshot of system state, enabled userstats, repeated the warmup and benchmark, and captured another snapshot. We ran this in a couple of configurations. This allowed us to see what the impact of the feature is versus the behavior of the server itself. We used tools such as the gperf heap profiler and /proc/pmap to help examine memory usage.

The results? After running the benchmark on 25,000 tables with indexes, the performance change was indistinguishable from measurement error (53 transactions per second). And the feature used less than 10MB of memory to store all the statistics.

It might be time for us to consider making this feature enabled by default instead of disabled by default. It seems that it’s been adequately tested and the benefit is much larger than the cost (possibly infinitely, if you consider that the cost is indistinguishable from zero).

Share this post

Comments (14)

  • Olivier Doucet

    We are using USER_STATISTICS on heavy loaded servers (4 CPUs, 128GB RAM) and did not see any performance decrease when we activated this feature.
    The only limitation I found so far is that some fields (like user_statistics.bytes_received or user_statistics.rows_fetched) are stored as signed integer, so the max value of 2147483647 is rapidly reached. Not a big deal, but this requires you to perform FLUSH USER_STATISTICS once in a while if you monitor these values.

    June 2, 2012 at 10:04 am
  • Justin Rovang

    What would it encompass to add a datetime of last seen for users?

    I find it’s a bear in some environments when performing cleanup / account auditing and nobody really knowing if the account is in use or not…

    The alternatives to collect such data is a lot of work for a small piece of insight.

    June 2, 2012 at 12:15 pm
  • Baron Schwartz

    That’s a good question. I don’t feel competent to answer it, because getting the current time sometimes is surprisingly expensive. Perhaps it can be done once per statement instead of once per row.

    June 2, 2012 at 12:43 pm
  • Justin Rovang

    Or even per connect/disconnect/timeout

    June 2, 2012 at 12:52 pm
  • Olivier Doucet

    That’s an excellent idea. Just recording it at each login might be enough for cleanup auditing.

    June 2, 2012 at 12:55 pm
  • Bartek Bednarowicz

    Excellent idea about building the stats by default – will save the hassle of parsing the logs. I assume there won’t be any gotchas of the “innodb_stats_on_metadata” fame since data always gets saved on event rather than on query, right?

    June 3, 2012 at 5:33 am
  • Steve Jackson

    Hi Baron.

    Are you guys aware that these stats tables use only 4 byte signed ints? I enabled stats yesterday when I saw this posting, and am already at max value on bytes_sent for a few users.

    Thought I would let you know



    June 4, 2012 at 7:45 am
  • Baron Schwartz

    I actually didn’t know that. That’s the best part of blogging — your readers teach you a lot 🙂 I will suggest that our dev team read these comments.

    June 4, 2012 at 8:01 am
  • Alexey Polyakov

    I’d check if it scales as well when you have lots of users and lots of databases, each getting little load (shared web hosting use-case, dedicated MySQL servers there often end up serving tens of thousands users/DBs). If it does, enabling it by default seems to be a safe thing to do.

    June 4, 2012 at 9:44 am
  • Olivier Doucet

    I Just discovered a bug with userstat that may prevent it to be activated by default (at least as long as this bug is not fixed) :

    June 4, 2012 at 10:08 am
  • Eric H

    It would be nice to see some cacti templates that graph userstat metrics.

    June 4, 2012 at 10:18 am
  • Baron Schwartz

    We did not test with lots of users. In general, the feature will scale well but MySQL’s user authentication itself will not scale well. I’ve seen this many times.

    June 4, 2012 at 11:37 am
  • Marcus Bointon

    It might not be the best time to suggest this when the current release does this:
    “set global userstat=’ON’;” is enough to cause an instant crash for me.

    June 18, 2012 at 1:58 am
  • Neha Jaltare

    I have a very unique problem with mysql query can u help me?

    July 10, 2012 at 10:23 pm

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.