Percona's Data performance and scalability blog

Is 80% of RAM how you should tune your innodb_buffer_pool_size?

June 2, 2015 - 3:00am

It seems these days if anyone knows anything about tuning InnoDB, it’s that you MUST tune your innodb_buffer_pool_size to 80% of your physical memory. This advice is so prolific that it seems ingrained in many a DBA’s mind.  The MySQL manual to this day refers to this rule, so who can blame the DBA?  The question is: does it make sense?

What uses the memory on your server?

Before we question such advice, let’s consider, in broad categories, what can take up RAM on a typical MySQL server.  This list isn’t necessarily complete, but I think it outlines the main areas where a MySQL server could consume memory.

  • OS Usage: Kernel, running processes, filesystem cache, etc.
  • MySQL fixed usage: query cache, InnoDB buffer pool size, mysqld rss, etc.
  • MySQL workload based usage: connections, per-query buffers (join buffer, sort buffer, etc.)
  • MySQL replication usage:  binary log cache, replication connections, Galera gcache and cert index, etc.
  • Any other services on the same server: Web server, caching server, cronjobs, etc.

There’s no question that for tuning InnoDB, the innodb_buffer_pool_size is the most important variable.  It’s expected to occupy most of the RAM on a dedicated MySQL/InnoDB server, but of course other local services may affect how it is tuned.  If it (and other memory consumption on the server) is too large, swapping can kick in and degrade your performance rapidly.

Further, the workload of the MySQL server itself may cause a lot of variation.  Does the server have a lot of open connections and active query workload consuming memory?  The memory consumption caused by this can be dramatically different server to server.

Finally, replication mechanisms like Galera have their own memory usage pattern and can require some adjustments to your buffer pool.

We can see clearly that the 80% rule isn’t as nuanced as reality.

A rule of thumb

However, for the sake of argument, let’s say the 80% rule is a starting point.  A rule of thumb to help us get a quick tuning number to get the server running.  Assuming we don’t know anything really about the workload on the system yet, but we know that the system is dedicated to InnoDB, how might our 80% rule play out?

Total Server RAM | Buffer pool with 80% rule | Remaining RAM
-----------------+---------------------------+--------------
1G               | 800MB                     | 200MB
16G              | 13G                       | 3G
32G              | 26G                       | 6G
64G              | 51G                       | 13G
128G             | 102G                      | 26G
256G             | 205G                      | 51G
512G             | 409G                      | 103G
1024G            | 819G                      | 205G

At lower numbers, our 80% rule looks pretty reasonable.  However, as we get into larger servers, it starts to seem less sane.  For the rule to hold true, workload memory consumption would have to increase in proportion to the needed size of the buffer pool, but that usually isn’t the case.  Our server that has 1TB of RAM likely doesn’t need 205G of it to handle things like connections and queries (likely MySQL couldn’t handle that many active connections and queries anyway).

So, if you really just spent all that money on a beefy server do you really want to pay a 20% tax on that resource because of this rule of thumb?

The origins of the rule

At one of my first MySQL conferences, probably around 2006-2007 when I worked at Yahoo, I attended an InnoDB tuning talk hosted by Heikki Tuuri (the original author of InnoDB) and Peter Zaitsev.  I distinctly remember asking about the 80% rule because at the time Yahoo had some beefy 64G servers and the rule wasn’t sitting right with me.

Heikki’s answer stuck with me.  He said something to the effect of (not a direct quote): “Well, the server I was testing on had 1GB of RAM and 80% seemed about right”.  He then, if memory serves, clarified it and said it would not apply similarly to larger servers.

How should you tune?

80% is maybe a great start and rule of thumb.  You do want to be sure the server has plenty of free RAM for the OS and the usually unknown workload.  However, as we can see above, the larger the server, the more likely the rule will wind up wasting RAM.   I think for most people it starts and ends at the rule of thumb, mostly because changing the InnoDB buffer pool requires a restart in current releases.

So what’s a better rule of thumb?  My rule is that you tune the innodb_buffer_pool_size as large as possible without using swap when the system is running the production workload.  This sounds good in principle, but again, it requires a bunch of restarts and may be easier said than done.

Fortunately MySQL 5.7 and its online buffer pool resize feature should make this an easier principle to follow.  Seeing lots of free RAM (and/or filesystem cache usage)?  Turn the buffer pool up dynamically.  Seeing some swap activity?  Just turn it down, no restart required.  In practice, I suspect there will be some performance-related hiccups when using this feature, but it is at least a big step in the right direction.
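For illustration, here is a minimal sketch of what that looks like on MySQL 5.7 (the 24G target is just an example value; the variable and status names are the standard ones):

mysql> SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;  -- grow (or shrink) online
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';           -- watch the resize progress

The resize happens in the background in chunks, so checking the resize status before judging the effect is worthwhile.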


Percona TokuMXse 3.0.3-1.0-rc6 is now available

June 1, 2015 - 10:54am

Percona is glad to announce the release of Percona TokuMXse 3.0.3-1.0-rc6 on June 1st 2015. TokuMXse is a TokuMX storage engine for MongoDB 3.0.3. Downloads are available from our download site here. Packages for this release candidate will be available in our Ubuntu Trusty testing and CentOS 7 testing repositories.

Based on MongoDB 3.0.3, including all the bug fixes in it, Percona TokuMXse 3.0.3-1.0 is the current release candidate.

This release contains minor changes to the Fractal Tree, including:

  • Improved tokuftdump information.
  • Fixed sporadic recovery issue due to rare race between transaction rollback and logging.
  • Report capped boolean for uncapped collections when using tokuft storage engine.
Getting started

After installation, you can start mongod with tokuft as the storage engine, with:

$ mongod --storageEngine tokuft

Note: Transparent huge pages must be turned off for the fractal tree engine to work properly. If you attempt to run mongod with that option enabled, an error informing you of this will be printed to the output of the mongod process and it will fail to start.

You can check if the Transparent huge pages are enabled with:

$ cat /sys/kernel/mm/transparent_hugepage/enabled

[always] madvise never

You can disable the transparent huge pages by running the following command as root (NOTE: Setting this will last only until the server is rebooted):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Percona TokuMXse currently supports various tuning parameters to pass to the mongod instance. The full list can be viewed by running mongod --help (this will print options for all available storage engines, including tokuft).


New PERFORMANCE_SCHEMA defaults in MySQL 5.7.7

June 1, 2015 - 7:27am

I thought it was worth taking a moment to revisit the new Performance Schema-related defaults that MySQL 5.7.7 brings to the table, for various reasons.

For one, most of you might have noticed that profiling was marked as deprecated in MySQL 5.6.7. So it is expected that you invest in learning more about Performance Schema (and Mark’s sys schema!).

Second, there are lots of virtual environments and appliances out there running Community Edition MySQL where Performance Schema can be a useful tool for analyzing performance. Thus, expect to see more articles about using PERFORMANCE_SCHEMA and SYS_SCHEMA from us!

Third, we have more and more junior readers who might benefit from light reads such as this.

The new defaults that I wanted to highlight are mentioned in the MySQL 5.7.7 release notes:
  • The MySQL sys schema is now installed by default during data directory installation.
  • The events_statements_history and events_transactions_history consumers now are enabled by default.

Note that if you are upgrading from an earlier version of MySQL to 5.7.7 to get these goodies you will need to run mysql_upgrade and restart the database for the above changes to take effect.

So what do these mean?

If you haven’t had a chance to dig into PERFORMANCE_SCHEMA, check out the quick start guide here. PERFORMANCE_SCHEMA is a nifty tool (implemented as a union of a storage engine and a schema in MySQL) to monitor MySQL server execution at a low level, with a focus on performance metrics. It monitors for events that have been “instrumented”, such as function calls, OS wait times, synchronization calls, etc. In performance nomenclature, “instruments” are essentially “probes”. The events that the instruments generate can be processed by consumers. Note that not all instruments or consumers are enabled by default.
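As a quick illustration, here is a minimal sketch of inspecting and toggling things at runtime (the setup tables are the standard PERFORMANCE_SCHEMA setup tables; the consumer name and LIKE pattern are just illustrative choices):

mysql> SELECT NAME, ENABLED FROM performance_schema.setup_consumers;
mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'events_statements_history_long';
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE 'statement/%';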

Some would say that the structure of PERFORMANCE_SCHEMA may be complex and may not be very DBA-friendly. This is what led to the birth of SYS_SCHEMA. For those who are not familiar with Mark Leith’s SYS_SCHEMA and prefer TL;DR – it provides human friendly views, functions and procedures that can help you analyze database usage using PERFORMANCE_SCHEMA. If you haven’t had a chance to check it out yet you might want to read Miguel’s article on using the sys schema or Alexander Rubin’s article about using it in multitenant environments and give it a spin!

I welcome the fact that the events_statements_history and events_transactions_history consumers are enabled by default in MySQL 5.7.7, as it means that we get some handy performance details available to us out of the box in vanilla MySQL. Note that these are per-thread tables and by default the history length (the number of entries retained; more on those variables here) is automatically sized, thus you may need to increase it.
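To check what the autosizing picked on your server, a quick sketch (these history-size variables are read-only at runtime, so raising them means setting them in my.cnf and restarting):

mysql> SHOW GLOBAL VARIABLES LIKE 'performance_schema_events%history%size';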

What details do you get off the bat with them?

Consider the following example:

mysql> select * from performance_schema.events_statements_history where event_id=353\G
*************************** 1. row ***************************
              THREAD_ID: 20
               EVENT_ID: 353
           END_EVENT_ID: 456
             EVENT_NAME: statement/sql/select
                 SOURCE: mysqld.cc:963
            TIMER_START: 1818042501405000
              TIMER_END: 1818043715449000
             TIMER_WAIT: 1214044000
              LOCK_TIME: 67000000
               SQL_TEXT: select * from imdb.title limit 100
                 DIGEST: ec93c38ab021107c2160259ddee31faa
            DIGEST_TEXT: SELECT * FROM `imdb` . `title` LIMIT ?
         CURRENT_SCHEMA: performance_schema
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 100
          ROWS_EXAMINED: 100
CREATED_TMP_DISK_TABLES: 0
     CREATED_TMP_TABLES: 0
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 0
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 1
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 1
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: NULL
     NESTING_EVENT_TYPE: NULL
1 row in set (0.00 sec)

As you can see from above, you get data similar to what you are used to seeing from EXPLAINs and slow query logs, such as query run time, locking time, rows sent/examined, etc. For instance, in the above output my query returned 100 rows (ROWS_SENT, ROWS_EXAMINED), avoided creating temp tables (CREATED_TMP_TABLES, CREATED_TMP_DISK_TABLES), didn’t have to sort (SORT_RANGE, SORT_ROWS, SORT_SCAN), used no index (NO_INDEX_USED) and ran for about 1.2 ms (TIMER_WAIT, which is reported in picoseconds). The list of details provided is not as abundant as it could be, but I imagine that with newer releases the list may grow.

If you want to read on and are curious about how to use Performance Schema for profiling check out Jervin’s great article here!


MongoDB Community Open House: June 1 in NYC

May 28, 2015 - 6:13am

If you can make it to Manhattan next Monday, please join me at the MongoDB Community Open House. The June 1 event is free and open to all. It runs from 3:30-6:30 p.m. just across the road from MongoDB World.

The MongoDB Community Open House, held at the New York Hilton Midtown, will feature technical presentations and sessions from key members of the MongoDB open source community. A reception will be held afterward featuring plenty of food, drink and fun. Everyone who attends will get a cool t-shirt, too. Space is limited, though, so I suggest registering now to reserve your spot.

The talks include:
  • “MATH is Hard: TTL Index Configuration and Considerations,” by Kim Wilkins of Rackspace
  • “Implementing MongoDB 3.0 Storage Engine,” with Facebook’s Igor Canadi and Christian Rober of Percona
  • “Is it Fast: Measuring MongoDB Performance,” by Tim Callaghan of Acme Benchmarking
  • “MongoDB for MySQL Users,” by Percona’s Alexander Rubin
  • “Rolling out RocksDB in Production,” by Charity Majors of Facebook
  • “Percona TokuMX and Percona TokuMXSE Performance Benefits,” by Percona’s Jon Tobin

In addition to the free t-shirts, food and drinks, we’ll also be raffling off some prizes, including a full-access pass to Percona Live Amsterdam, to be held this coming September 21-23. Our autumn conference, moved to a new month and venue by popular demand, will be bigger and better than ever with great speakers, tutorials and sessions around MySQL, NoSQL and data in the cloud.

The MongoDB Community Open House will be especially valuable for those unable to attend MongoDB World for budgetary reasons. Our intent is to make MongoDB World even better by adding more technical content. And if you are attending MongoDB World, I invite you to drop in even if only for a session or two.

I hope to see you Monday in NYC!


Percona XtraBackup 2.2.11 is now available

May 28, 2015 - 12:59am

Percona is glad to announce the release of Percona XtraBackup 2.2.11 on May 28, 2015. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

  • Percona XtraBackup has been rebased on MySQL 5.6.24.

Bugs Fixed:

  • Version check would crash innobackupex and abort the backup on CentOS 5. Bug fixed #1255451.
  • Percona XtraBackup could crash when preparing the backup taken on MySQL/Percona Server 5.5 if there were open temporary tables during the backup. Bug fixed #1399471 (Fungo Wang).
  • Percona XtraBackup would fail to prepare the backup if the xtrabackup_logfile was larger than 512GB. Bug fixed #1425269.
  • The fix for bug #1403237 was incomplete: due to a wrong offset being set, the last copied batch of log records was copied from the wrong location. Bug fixed #1448447.
  • Percona XtraBackup now executes an extra FLUSH TABLES before executing FLUSH TABLES WITH READ LOCK to potentially lower the impact from FLUSH TABLES WITH READ LOCK. Bug fixed #1277403.
  • Regression introduced by fixing #1436793 in Percona XtraBackup 2.2.10 caused an error when taking an incremental backup from MariaDB 10. Bug fixed #1444541.
  • Percona XtraBackup now prints and stores the file based binlog coordinates in xtrabackup_binlog_info even though GTID is enabled. Bug fixed #1449834.
  • Percona XtraBackup doesn’t print warnings anymore during the prepare phase about missing tables when a filtering option (--databases, --tables, etc.) is provided. Bug fixed #1454815 (Davi Arnaut).

Other bugs fixed: #1415191.

Release notes with all the bugfixes for Percona XtraBackup 2.2.11 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.


MySQL 5.7 key features

May 27, 2015 - 12:00am

The other day I was discussing new features of MySQL 5.7 with a Percona Support customer. After that conversation, I thought it would be a good idea to compile a list of important features of MySQL 5.7. The latest MySQL 5.7.6 release candidate (RC) is out and is packed with nice features. Here’s a list of some MySQL 5.7 key features.

Replication Enhancements:

  • One of the top features in MySQL 5.7 is multi-source replication. With multi-source replication you can point multiple masters to a single slave, so the limitation of a slave having only one master is lifted (a short sketch follows after this list). There is a nice blog post written by my colleague on multi-source replication that you will find useful.
  • SHOW SLAVE STATUS is non-blocking since MySQL 5.7. SHOW SLAVE STATUS returns immediately without waiting for STOP SLAVE to finish, which can be blocked by a long-running SQL query from the replication SQL_THREAD. As a side note, the LOCK FREE SHOW SLAVE STATUS feature was first implemented in Percona Server 5.5.
  • Now you can get all of the SHOW SLAVE STATUS information from performance schema tables. More details here from the manual.
  • With the new CHANGE REPLICATION FILTER command you can now modify replication filter rules without bouncing MySQL servers.
  • Since MySQL 5.7 you can perform CHANGE MASTER TO without first stopping the slave via the STOP SLAVE command. For further details check the manual.
  • There is now a different method for parallel replication. With the new implementation the slave can apply transactions in parallel even within a single database/schema. Check slave_parallel_type for details.
  • Global Transaction Identifiers (GTID) is a feature that automatically tracks the replication position in the replication stream, and since MySQL 5.7 gtid_mode is a dynamic variable, which means you can enable/disable GTID in a replication topology without synchronizing and restarting the entire set of MySQL servers. As a side note, an online GTID deployment feature was added in Percona Server 5.6. With this feature you can deploy GTID on existing replication setups without marking the master read_only and stopping all slaves in the replication chain. My colleague Stephane wrote a nice blog post on performing this online migration without master downtime.
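Here is a minimal sketch of two of these features in action, assuming made-up host, credential and channel names and an illustrative filter rule (multi-source replication also requires the slave info repositories to be stored in tables):

mysql> CHANGE MASTER TO MASTER_HOST='master1.example.com', MASTER_USER='repl', MASTER_PASSWORD='repl_password' FOR CHANNEL 'master-1';
mysql> START SLAVE FOR CHANNEL 'master-1';
mysql> STOP SLAVE SQL_THREAD;  -- changing filters requires the SQL thread to be stopped
mysql> CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB = (test);
mysql> START SLAVE SQL_THREAD;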

InnoDB Enhancements:

  • Now you can resize InnoDB buffer pool online. Since MySQL 5.7 innodb_buffer_pool_size is a dynamic variable which provides the ability to resize buffer pool without restarting MySQL server.
  • From MySQL 5.7, online ALTER TABLE also supports a RENAME INDEX clause to rename an index. This change takes place without a table copy operation.
  • InnoDB supports Transportable Tablespace feature for partitioned InnoDB tables. I wrote a blog post on Transportable Tablespace that you will find useful.
  • Innochecksum utility is enhanced with new options. I also wrote a recent blog post on this same topic.
  • As of MySQL 5.7, InnoDB supports “spatial indexes” and it also supports online DDL operation to add spatial indexes i.e. ALTER TABLE .. ALGORITHM=INPLACE.
  • Improved InnoDB buffer pool dump/reload operations. A new system variable, innodb_buffer_pool_dump_pct, allows you to specify the percentage of most recently used pages in each buffer pool to read out and dump (see the sketch after this list).
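As a quick illustration of the dynamic variable and the RENAME INDEX clause mentioned above, a minimal sketch (table and index names are made up; the statements are standard MySQL 5.7 syntax):

mysql> SET GLOBAL innodb_buffer_pool_dump_pct = 40;
mysql> ALTER TABLE orders RENAME INDEX idx_customer_old TO idx_customer, ALGORITHM=INPLACE, LOCK=NONE;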

Triggers:

  • As per the SQL standard, MySQL 5.7 now supports multiple triggers per table for the same trigger event (DML) and action time (BEFORE, AFTER), i.e. multiple triggers are now permitted for each event, e.g. multiple triggers on the INSERT action (see the sketch below).
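For example, a minimal sketch with made-up table and trigger names; the FOLLOWS clause, new in 5.7, controls the order in which multiple triggers for the same event fire:

mysql> CREATE TRIGGER orders_bi_audit BEFORE INSERT ON orders FOR EACH ROW SET NEW.created_at = NOW();
mysql> CREATE TRIGGER orders_bi_defaults BEFORE INSERT ON orders FOR EACH ROW FOLLOWS orders_bi_audit SET NEW.status = IFNULL(NEW.status, 'new');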

Performance Improvements:

  • Bulk data load is improved in InnoDB in MySQL 5.7. InnoDB performs a bulk load when creating or rebuilding indexes. This method, known as a sorted index build, enhances the create index operation, and it also applies to FULLTEXT indexes.
  • In MySQL 5.6 there is a single page cleaner thread responsible for flushing dirty pages from the buffer pool(s). In MySQL 5.7, parallel flushing was implemented to improve flushing, with a separate background thread per buffer pool instance handling the flush list and LRU list. It’s worth mentioning that two-threaded flushing was implemented in Percona Server 5.6.

Optimizer Improvements:

  • EXPLAIN FOR CONNECTION will let you run explain statements for already running queries. This may yield important information towards query optimization.
  • In MySQL 5.7 the optimizer avoids creating a temporary table for the result of UNION ALL queries; this helps reduce disk I/O and disk space when a UNION yields a large result set. I found Morgan Tocker’s post on this informative.
  • The JSON format for EXPLAIN, first introduced in MySQL 5.6, produces extended information. It is enhanced in version 5.7 by printing the total query cost, which makes it easier to see the difference between good and bad execution plans.
  • MySQL 5.7 now supports generated columns, also known as virtual columns, as a new feature (see the sketch after this list). My colleague Alexander explained this really well in this blogpost.
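Two minimal sketches of the above (the connection id 42, table and column names are made-up examples):

mysql> SHOW PROCESSLIST;
mysql> EXPLAIN FORMAT=JSON FOR CONNECTION 42;
mysql> ALTER TABLE customers ADD COLUMN full_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL;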

MySQL Test Suite Enhancements:

  • The MySQL test suite now uses InnoDB as its default storage engine. Along with that, many new tests were added and existing tests enhanced, including a test suite for replication with GTID.

Security Enhancements:

  • Since MySQL 5.7 there is a password expiration policy in place. Any user that connects to a MySQL server goes through a password expiration life cycle and must change the password. More from the manual here.
  • Database administrators can now lock/unlock user accounts. Check details here.
  • As of MySQL 5.7, installation creates only one ‘root@localhost’ user account with a random password and marks that password as expired. Installation no longer creates anonymous-user accounts, and along with that there is no test database. MySQL generates the root password during data directory initialization, marks it as expired and writes a message to stdout displaying the password (see the sketch after this list).
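As a minimal sketch of that first login on 5.7 (the password shown is just a placeholder for one of your own choosing; ALTER USER is the standard way to replace an expired password):

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'MyNewStr0ngPass!';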

Conclusion:
This is only a short list of new features in MySQL 5.7. Please feel free to add your favorite features in the comments section. Along with new features, there are quite a few deprecated/removed features in MySQL 5.7. You can get the full list from the manual.


Storing time-series data with MongoDB and TokuMX

May 26, 2015 - 7:12am

Storing time-series data is a frequent pattern for databases – be it for logs or for any kind of monitoring. Such data has the following properties: records are inserted but never updated, the insertion rate can be high and records are likely to expire after some time. MongoDB and TokuMX are both good fits because of their flexible schema feature. But how can we handle data expiration efficiently? Several options are available: capped collections, TTL collections and partitioning (TokuMX only), but they all have different features and performance profiles.

Summary
  • Capped collections: very good insert performance, but not eligible for sharding and hard to predict when documents will expire.
  • TTL collections: expiration date is easy to enforce and compatible with sharding, but the purge of records is inefficient.
  • Partitioning: very good performance for inserts and purge, but not compatible with sharding and only available with TokuMX.
Capped collections

To create an app_stats capped collection with a size of 1GB, use the following command:

> db.createCollection( "app_stats", { capped: true, size: 1024*1024*1024 } )

MongoDB will write to this collection in a circular fashion: once the allocated files are full, data at the beginning of the first file is overwritten. This is very good to make sure your collection will never exceed the size you set. However predicting the size you will need to store 6 months of data can be tricky.

Another benefit of capped collections is that they keep data in insertion order. So you don’t need to add an extra index to sort data by insertion date: this is a good point for good write performance.

Can you update records in a capped collection? Yes but only as long as they don’t increase the original size of the document, but it is recommended to be light on updates as you might experience strange errors with secondaries in some rare cases.

Another limitation is that sharding is not supported.

TTL collections

TTL collections take a different approach: they are normal collections, but they have an index with a special option. For instance:

> db.logs.createIndex({insertDate:1}, {expireAfterSeconds: 86400})

This index will make sure that records will be expired automatically after one day. Pretty neat!

And as a TTL collection is a regular collection, you can shard it if you need.

However the main limitation is the purge process: every minute, a background thread will look for documents that are expired and if it finds some, it will remove them. This is not really different from the application running a cron job every minute to remove old documents: it adds a constant purge workload that can be detrimental to insertion performance.

Note that TTL indexes do not work with capped collections.

Partitioning (TokuMX)

With TokuMX you can partition your data like you would with MySQL. For instance, if you want to create a collection partitioned on an insertDate field, you would use these commands:

> db.createCollection('stats_part',{primaryKey:{insertDate:1,_id:1}, partitioned: true})
> db.stats_part.addPartition({insertDate:ISODate("2015-05-31T23:59:59")})
> db.stats_part.addPartition({insertDate:ISODate("2015-06-30T23:59:59")})
[...]

The main benefit of partitioning is that expiring data is extremely simple and fast: remove the corresponding partition(s) with the dropPartition() function. Another nice property is that you can be very flexible regarding how large your partitions can grow. This could deserve a whole blog post, but let me give a quick example.

Suppose you want to expire data after 6 months. With regular MongoDB, indexes are using B-Trees so insertion is only fast if indexes fit in memory. So if you could create a partitioned collection, insertions would only be fast if the partition you are writing to (the last one) would fit in memory. Then you would probably have to create a partition for every day. With TokuMX, there is no such limitation and insertion performance will be as good if you have larger partitions like one partition per week or one partition per month.

Any drawback of partitioning? It involves some application overhead as you need to write the logic to periodically drop old partitions and create new ones. And sharding is not supported.

Conclusion

There is no one-size-fits-all solution when it comes to storing time-series data in MongoDB and TokuMX, but several options with their own benefits and trade-offs. Note that if your collection needs to be sharded, a TTL index is the only option. And if you are using TokuMX, partitioning is something to look at if you were planning a capped collection.


MongoDB / TokuMX plugin for LinkBench (Part 1)

May 22, 2015 - 11:31am

“There’s no benchmark for how life’s “supposed” to happen. There is no ideal world for you to wait around for. The world is always just what it is now, it’s up to you how you respond to it.”
― Isaac Marion, Warm Bodies

At one time or another, most of us have heard some version of this question: “Sure the system does fine in the benchmarks, but can it perform in production?”. Benchmark software developers have tried to address this issue in their systems. Features such as configurable workloads and scripting interfaces help to tailor benchmarks to various scenarios, but still require an expert to properly implement them.

A brief overview of LinkBench

The LinkBench benchmark was developed by Tim Armstrong with the guidance and help of a team from Facebook during his internship there. It takes a different approach to the challenge of simulating the real world. LinkBench is designed from the ground up as a replica of the data operations of Facebook’s social graph. By implementing an identical data model along with business methods and workloads directly proportional to those used in the production social graph, LinkBench can effectively duplicate the data load that will be seen in a production social networking application.

Anatomy of a Social Graph – The Data Model

With this deceptively simple schema, a very robust application can be built.

The nodetable defines an object or end-point within the social graph. Examples of nodes include users, posts, comments, replies, albums, photos, groups, etc… The node’s type attribute is the magic that determines what the node represents and the data attribute contains the object itself (up to 16MB).

The linktable is a generic association or bridge table allowing any two nodes to be associated in some way. The secret sauce, in this case, is the link_type attribute, which represents a specific relationship between any two nodes.  Examples of links include users being friends, a user liking a post, a user that is tagged in a photo and so on.

The third table, counttable is very important for performance and scalability in a social network. It maintains counts of a given link type for a node. Counts are transactionally updated whenever an operation that could potentially alter the count occurs. This small investment in the form of an additional write operation pays off by allowing for quick access to the number of likes, shares, posts, friends and other associations between nodes. Without the count table, the application would have to continuously query the database to retrieve up-to-date count information for various relationships creating a tremendous amount of system load.

The Social Graph Controller

As you can see, the model is very simple. The real magic in the social graph lies in the controller.  LinkBench simulates the controller<->model interface through its workload configuration. The included configuration is based on actual production measurements of data payload size and distribution, node and link ‘temperature’ (popularity) and logged operation mix over a period of days.

A Social Graph In Use

Implementation of MongoDB / TokuMX plugin for LinkBench


LinkBench is designed to be customizable and extensible in order to test new persistence technologies and architecture designs.  A new database plugin can be developed by extending the abstract class com.facebook.LinkBench.LinkStore and/or implementing the interface com.facebook.LinkBench.NodeStore.  There is also a combination com.facebook.LinkBench.GraphStore class that can be sub-classed for a combination of both LinkStore and NodeStore.  One disadvantage of the current implementation is that it is up to the plugin developer to follow all of the business requirements of the social graph in the plugin.  This requires careful auditing of each plugin to ensure that it has been implemented to specification.  To assure a 1-to-1 parity with the MySQL plugin, I used it as a base and converted the methods to MongoDB one at a time, carefully translating each operation.

Along the way, I’ve learned a lot about NoSQL and MongoDB in particular and dispelled a few myths that I had about NoSQL.  I will save that for another article.  Let me talk about a few design decisions I made while implementing the plugin.

  • Compatibility – In order to provide comparisons, the LinkBench plugin maintains compatibility with MongoDB 2.x, TokuMX 2.x, MongoDB 3.x and TokuMXse (RC)
  • Transactions – MVCC concurrency is used in the LinkBench MySQL plugin. In order to maintain this capability I implemented a new configuration option, transaction_support_level, which allows the benchmark to run with no transaction support, MVCC only if supported, or simulated transactions using the Two Phase Commit strategy documented on the MongoDB site.
  • Schema – The relationship between nodes and links does not facilitate the use of embedded documents.  It would be possible to embed count documents under node, however it probably isn’t worth the extra complexity and network traffic that would be generated.  I opted to leave the schema flat.

In Part 2 I will dive into the LinkBench Java code a bit to show the comparison between the MySQL plugin the MongoDB plugin.



Percona XtraBackup 2.3.1-beta1 is now available

May 20, 2015 - 11:10am

Percona is glad to announce the release of Percona XtraBackup 2.3.1-beta1 on May 20th 2015. Downloads are available from our download site here. This BETA release will be available in Debian testing and CentOS testing repositories.

This is a BETA quality release and it is not intended for production. If you want a high quality, Generally Available release, the current Stable version should be used (currently 2.2.10 in the 2.2 series at the time of writing).

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release contains all of the features and bug fixes in Percona XtraBackup 2.2.10, plus the following:

New Features:

  • The innobackupex script has been rewritten in C and is set as the symlink for xtrabackup. innobackupex still supports all features and syntax as the 2.2 version did, but it is now deprecated and will be removed in the next major release. Syntax for new features will not be added to innobackupex, only to xtrabackup. xtrabackup now also copies MyISAM tables and supports every feature of innobackupex. Syntax for features previously unique to innobackupex (option names and allowed values) remains the same for xtrabackup.
  • Percona XtraBackup can now read swift parameters from an [xbcloud] section of the .my.cnf file in the user’s home directory, or alternatively from the global configuration file /etc/my.cnf. This makes it more convenient to use and avoids passing sensitive data, such as --swift-key, on the command line.
  • Percona XtraBackup now supports different authentication options for Swift.
  • Percona XtraBackup now supports partial download of the cloud backup.
  • Options: --lock-wait-query-type, --lock-wait-threshold and --lock-wait-timeout have been renamed to --ftwrl-wait-query-type, --ftwrl-wait-threshold and --ftwrl-wait-timeout respectively.

Bugs Fixed:

  • innobackupex didn’t work correctly when credentials were specified in .mylogin.cnf. Bug fixed #1388122.
  • Options --decrypt and --decompress didn’t work with xtrabackup binary. Bug fixed #1452307.
  • Percona XtraBackup now executes an extra FLUSH TABLES before executing FLUSH TABLES WITH READ LOCK to potentially lower the impact from FLUSH TABLES WITH READ LOCK. Bug fixed #1277403.
  • innobackupex didn’t read user,password options from ~/.my.cnf file. Bug fixed #1092235.
  • innobackupex was always reporting the original version of the innobackup script from InnoDB Hot Backup. Bug fixed #1092380.

Release notes with all the bugfixes for Percona XtraBackup 2.3.1-beta1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.


Like my MySQL QA series? Here’s how to record your own!

May 19, 2015 - 6:55am

Professional Screen Recording / Screencast Template 1280×720

Whilst not directly related to MySQL, it may be helpful to post (and open source at the same time) the HD screen recording/screencast template I’ve been using to create the MySQL QA series.

If you’ve never recorded a screencast before, or if you’d like to improve your screen recordings, this short blog post will get you started/help you along the way!

 

Here’s what you need;
1. Software Budget: $0 (Really! All required software is free, provided you have Microsoft Windows to use it on)
2. A reasonable quality headset. I can recommend (and use) a Microsoft LifeChat LX-3000 or similar (~$40)
3. The attached template (right-click the image above and select ‘Save Link/Image As…’ or similar)

Here’s how to get started;
1. Install Microsoft Visual C++ 2010 Redistributable Package (x86) from here.
2. Install CamStudio 2.7 or higher. CamStudio can be downloaded from http://camstudio.org/
3. Install CamStudioCodec 1.5 or higher (a good quality codec, available from the same website as in step 2).
4. Use a 1600×900 or higher screen resolution to match the template size (or you can also edit the template).
5. Set the Professional Screen Recording / Screencast Template (right click & save) as your background.
6. Make the settings/follow the instructions as per the template (tip: read everything first before you start).
7. You can scale your to-be-recorded window to match the black frame (and one of the settings as shown in the template makes CamStudio match the recording size exactly to that frame). The black frame ensures that any slight pixel-mismatch still shows nicely in the resulting video.
8. You may like to install a screen marking tool like DemoHelper or ScreenMarker.

If you followed the steps above you will have already covered the blue (top right) “Installation/use” steps nr’s 1-5, and you can now continue with the red steps 1-8 (and read the additional blue tips 1-4).

There’s quite a bit to it, but if you get it setup right the quality will be great!

I hope to see more technical how-to videos on this and other blogs in the future!

Enjoy!


Percona security update: oCERT and SSL improvements

May 18, 2015 - 10:56am

We have recently become a member of oCERT to aid in allowing responsible disclosure for Percona products and services as can be seen on their members page.

We are presently working on the verbiage for the responsible disclosure program, and we are also investigating establishing a bug bounty program. In the meantime you can refer to our security contact page which will be updated as more information becomes available.

Secondly, as you have quite possibly noticed, www.percona.com now enforces SSL and requests are redirected to https://www.percona.com should an HTTP request be made.

This is but one small part of the continuing security initiative here at Percona and one I am happy to finally announce completion of as it had been on the “list” for some time.

The current SSL configuration follows best practices such as those laid out in the Mozilla Security Server Side TLS wiki entry, and as such gains an A+ rating from Qualys’ SSLLabs.com

There are of course still improvements to be made, and we are incrementally deploying those as they are completed and pass QA which sometimes leads to unavoidable delays. I would like to thank isvsecwatch for their report (which came in near the end of the overhaul process) and their patience in the extended time it took to get it into production.


MySQL QA Episode 2: Build a MySQL server – Git, Bazaar, Compiling & Build tools

May 15, 2015 - 12:00am

Welcome to MySQL QA Episode 2: Build a MySQL Server – Git, Bazaar (bzr), Compiling, and Build Tools

In this episode you’ll learn how to build Percona Server and/or MySQL Server for QA purposes & more in this short 25 minute tutorial.

In HD quality (set your player to 720p!)

To watch the other episodes in this series, see the MySQL QA & Bash Linux Training Series post. If you missed MySQL QA Episode 1, it was titled “Bash/GNU Tools & Linux Upskill & Scripting Fun.” You can watch it here.

If you have any questions or comments, please leave them below.


How Percona Support handles bugs

May 11, 2015 - 3:00am

One of the great values of a Percona Support contract is that we provide bug fixes for covered software, and not just support in terms of advice on how to use it. This is the skill which is most likely missing from in-house for most customers, as it requires a team with code knowledge to build and test infrastructure – something only a few companies can afford to invest in.

There is a lot of misunderstanding about bugs. What is a bug? What is a feature? What is a repeatable bug? How will Percona troubleshoot the bug? In this post I will answer some of the questions about this.

Bugs vs. Features ⎼ One thing a lot of people have a hard time understanding is the difference between a bug and a feature, or when software was designed to work a certain way which might be unwelcome. There is a gray line here, but you need to expect that some of the things you consider to be bugs will be seen as behavior-change features and will be considered as such.

Unfixable Bugs ⎼ There are some behaviors that any sane person would call a bug, but which arise from design limitations or oversight that are impossible to fix in the current GA version without introducing changes that would destabilize it. Such bugs will need to be fixed in the next major GA release or sometimes even further in the future. Some bugs are not bugs at all but rather design tradeoffs made. These can’t be “fixed” unless different design tradeoffs are chosen.

Workaround ⎼ There are going to be unexpected behaviors, unfixable bugs and bugs that take awhile to fix, so your first practical response to running into the bug is often finding a workaround which does not expose it. The Percona Support team will help find a workaround that causes minimal impact to your business, but be prepared: changes to the application, deployed version, schema or configuration will often be required.

Emergencies ⎼ When you have an emergency, our focus is to restore the system to working order. In a complex system a bug fix can often not be delivered in a short period of time, which typically means finding a workaround.

Bug Turnaround ⎼ It is not possible to guarantee the turnaround on a bug fix, as all bugs are different. Some bugs are rather trivial and we might be able to provide a hotfix 24 hours after we have a repeatable test case. In other cases the bug might be complicated and take weeks of engineering to fix or even might be impossible to fix in the current GA version.

Verified Bug Fixes ⎼ When you submit a bug we have to verify whether it is actually a bug. In many cases it might be intended behavior; in others, a user mistake. It is also possible that the behavior has happened once and can’t be repeated. Having a repeatable test case that reveals the bug is the best way to have a bug fixed quickly. You might be able to create a repeatable test case, or our support team might be able to help you create the test case.

Sporadic Bugs ⎼ These are very hard bug types that happen sporadically over a period of time. For example, you might have a system crash once every 3 months with no way to repeat it. The cause of such bugs can be very complicated; for example, a buffer overrun in one piece of code can cause corruption and crash in another place hours later. There are a number of diagnostic tools that exist for such bugs, but generally they take quite a while to resolve. In addition, without a repeatable test case, it is often impossible to verify that the proposed fix actually resolves the bug.

Environmental Bugs ⎼ Some bugs are caused by what can be called your environment. It could be some hardware bugs or incompatibilities, a build not quite compatible with your version of  operating system, operating system bugs, etc. In some cases we can very clearly point to the environment problems. In others we can suspect the environment is an issue and we may ask you to see if the bug also happens in another environment, such as different hardware or OS installation.

Hot Fixes ⎼ As our default policy we fix bugs in the next release of our software so it can go through the full QA cycle, be properly documented, etc. If you have implemented a workaround and you can wait until the next release, this is the best choice. If not, with the Percona Platinum Support contract, we can provide you with a hotfix that is a special build containing the version of the software you’re running, and with only the bug fix of interest applied. Hotfixes are especially helpful if you’re not looking to do a full software upgrade – requiring several revisions – but want to validate the fix with the minimum possible changes. Hotfixes might also be different from the final bug fix that goes into the GA release. With hotfixes, our goal is to provide a working solution for you faster. Afterward we may optimize or re-architect the code, come up with better option names, etc.

Bug Diagnostics ⎼ Depending on the nature of the bug there are multiple tools that our support team will use for diagnostics – finding a way to fix the bug. To set expectations correctly, it can be a very involved process, where you might need to provide a lot of information or try things on your system, such as:

  • Test case. If you have a test case that can be repeated by the Percona team to trigger the bug, the diagnostic problem is solved from the customer side. Internal debugging starts at this point. It might not be easy to get to that.
  • If we have a crash that we can’t repeat on our system we often will ask you to enable “core” file, or run the program under a debugger so we can get more information when the crash happens.
  • If the problem is related to performance, you should be ready to gather both MySQL information such as EXPLAIN, status counters, information from performance schema, etc., along with system level information such as pt-pmp output,  pt-stalk,  oprofile, perf, etc.
  • If the problem is a “deadlock,” we often need information from gdb about the full state of the system. Information from processlist, performance_schema, SHOW ENGINE INNODB STATUS can also be helpful.
  • It can be very helpful when you have a test system on which you can repeat the problem in your environment and where you can experiment without impacting production. It is not possible in all cases, but is very helpful.
  • Sometimes, for hard-to-repeat bugs, we will need to run a special diagnostics build that provides us with additional debug information. In others, we might need to run a debug build or do a run under valgrind or other software designed to catch bugs. It often has a large performance impact, so it is good to see how your workload can be scaled down in order for this to be feasible.
  • Depending on your environment we might need to login to troubleshoot your bug or might request that you upload the data needed to repeat the bug in our lab (assuming it is not too sensitive). In cases where direct login is not possible, we can help you to come to a repeatable test case via phone, chat, or email. Using screen sharing can also be very helpful.

Bugs and Non-Percona Software ⎼ Percona Support covers some software not produced by Percona. For open source software, if it is not exempt from bug fix support, we will provide the custom build with a bug fix as well as provide the suggested fix to the software maintainer for its possible inclusion in the next release. For example, if we find a bug in the MySQL Community Edition, we will pass our suggested fix to the MySQL Engineering team at Oracle. For other software that is not open source, such as Amazon RDS, we can help to facilitate creation and submission of a repeatable test case and workaround, but we can’t provide a fix as we do not have access to the source code.

In Conclusion ⎼ When I think about software bugs, I find some good parallels with human “bugs” (diseases).  Some issues are trivial to diagnose and the fix is obvious. Others might be very hard to diagnose; I guess many of us have been in a situation where you visit doctor after doctor and tell them your symptoms, and they run tests but still can’t figure out what’s wrong. Once a diagnosis is made, though, a “fix” is not always available or feasible, and while a complete solution is preferred, sometimes we have to settle for “managing” the disease, which is our parallel to implementing changes and settling for a workaround. So in the same way as human doctors, we can’t guarantee we will get to the root of every problem, or if we do, that we will be able to fix every one of them. However, as with having good doctors – having us on your team will maximize the chance of successful resolution.


Percona Server 5.6.24-72.2 is now available

May 8, 2015 - 1:55pm

Percona is glad to announce the release of Percona Server 5.6.24-72.2 on May 8, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.24, including all the bug fixes in it, Percona Server 5.6.24-72.2 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.24-72.2 milestone on Launchpad.

New Features:

  • TokuDB storage engine package has been updated to version 7.5.7.

Bugs Fixed:

  • A server binary as distributed in binary tarballs could fail to load on different systems due to an unsatisfied libssl.so.6 dynamic library dependency. This was fixed by replacing the single binary tarball with multiple tarballs depending on the OpenSSL library available in the distribution: 1) ssl100 – for all Debian/Ubuntu versions except Squeeze/Lucid (libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f2e389a5000)); 2) ssl098 – only for Debian Squeeze and Ubuntu Lucid (libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f9b30db6000)); 3) ssl101 – for CentOS 6 and CentOS 7 (libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007facbe8c4000)); 4) ssl098e – to be used only for CentOS 5 (libssl.so.6 => /lib64/libssl.so.6 (0x00002aed5b64d000)). Bug fixed #1172916.
  • Executing a stored procedure containing a subquery would leak memory. Bug fixed #1380985 (upstream #76349).
  • A slave server restart could cause a 1755 slave SQL thread error if multi-threaded slave was enabled. This is a regression introduced by fix for bug #1331586 in 5.6.21-70.0. Bug fixed #1380985.
  • A string literal containing an invalid UTF-8 sequence could be treated as falsely equal to a UTF-8 column value with no invalid sequences. This could cause invalid query results. Bug fixed #1247218 by a fix ported from MariaDB (MDEV-7649).
  • Percona Server .deb binaries were built without fast mutexes. Bug fixed #1433980.
  • Installing or uninstalling the Audit Log Plugin would crash the server if the audit_log_file variable was pointing to an inaccessible path. Bug fixed #1435606.
  • The audit_log_file variable would point to a random memory area if the Audit Log Plugin was not loaded into the server, and then installed with INSTALL PLUGIN, and my.cnf contained an audit_log_file setting. Bug fixed #1437505.
  • A specific trigger execution on the master server could cause a slave assertion error under row-based replication. The trigger would satisfy the following conditions: 1) it sets a savepoint; 2) it declares a condition handler which releases this savepoint; 3) the trigger execution passes through the condition handler. Bug fixed #1438990 (upstream #76727).
  • Percona Server client packages were built with EditLine instead of Readline. This caused the history file produced by the client to no longer be easy to read. Further, a client built with EditLine could display incorrectly on the PuTTY SSH client after its window was resized. Bugs fixed #1266386, #1296192 and #1332822 (upstream #63130, upstream #72108 and #69991).
  • Unlocking a table while holding the backup binlog lock would cause an implicit erroneous backup lock release, and a subsequent server crash or hang at the later explicit backup lock release request. Bug fixed #1371827.
  • Initializing slave threads or executing a CHANGE MASTER TO statement would crash a debug build if autocommit was disabled and at least one of the slave info tables was configured as a table. Bug fixed #1393682.

Other bugs fixed: #1372263 (upstream #72080), #1436138 (upstream #76505), #1182949 (upstream #69453), #1111203 (upstream #68291), and #1384566 (upstream #74615).

Release notes for Percona Server 5.6.24-72.2 are available in the online documentation. Please report any bugs on the launchpad bug tracker .


Percona Server 5.5.43-37.2 is now available

May 8, 2015 - 1:55pm


Percona is glad to announce the release of Percona Server 5.5.43-37.2 on May 8, 2015. Based on MySQL 5.5.43, including all the bug fixes in it, Percona Server 5.5.43-37.2 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.43-37.2 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

  • A server binary as distributed in binary tarballs could fail to load on different systems due to an unsatisfied libssl.so.6 dynamic library dependency. This was fixed by replacing the single binary tarball with multiple tarballs depending on the OpenSSL library available in the distribution: 1) ssl100 – for all Debian/Ubuntu versions except Squeeze/Lucid (libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f2e389a5000)); 2) ssl098 – only for Debian Squeeze and Ubuntu Lucid (libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f9b30db6000)); 3) ssl101 – for CentOS 6 and CentOS 7 (libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007facbe8c4000)); 4) ssl098e – to be used only for CentOS 5 (libssl.so.6 => /lib64/libssl.so.6 (0x00002aed5b64d000)). Bug fixed #1172916.
  • mysql_install_db would make the server produce an “Error in my_thread_global_end(): 1 threads didn't exit” error message. While this error does not prevent mysql_install_db from completing successfully, its presence might cause any mysql_install_db-calling script to return an error as well. This is a regression introduced by backporting fix for bug #1319904. Bug fixed #1402074.
  • A string literal containing an invalid UTF-8 sequence could be treated as falsely equal to a UTF-8 column value with no invalid sequences. This could cause invalid query results. Bug fixed #1247218 by a fix ported from MariaDB (MDEV-7649).
  • Percona Server .deb binaries were built without fast mutexes. Bug fixed #1433980.
  • Installing or uninstalling the Audit Log Plugin would crash the server if the audit_log_file variable was pointing to an inaccessible path. Bug fixed #1435606.
  • The audit_log_file variable would point to random memory area if the Audit Log Plugin was not loaded into server, and then installed with INSTALL PLUGIN, and my.cnf contained audit_log_file setting. Bug fixed #1437505.
  • Percona Server client .deb packages were built with EditLine instead of Readline. Further, a client built with EditLine could display incorrectly on the PuTTY SSH client after its window was resized. Bugs fixed #1266386 and #1332822 (upstream #63130 and #69991).

Other bugs fixed: #1436138 (upstream #76505).

(Please also note that the Percona Server 5.6 series is the latest General Availability series and the current GA release is 5.6.24-72.2.)

Release notes for Percona Server 5.5.43-37.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.


MongoDB with Percona TokuMXse – experimental build RC5 is available!

May 8, 2015 - 9:43am

While our engineering team is working on finalizing the TokuMXse storage engine, I want to provide an experimental build that you can try and test MongoDB 3.0 with our storage engine.

It is available here
percona.com/downloads/TESTING/Percona-TokuMXse-rc5/percona-tokumxse-3.0.3pre-rc5.tar.gz

To start MongoDB with TokuMXse storage engine use:

mongod --storageEngine=tokuft

I am looking for your feedback!


MySQL indexing 101: a challenging single-table query

May 8, 2015 - 12:00am

We discussed in an earlier post how to design indexes for many types of queries using a single table. Here is a real-world example of the challenges you will face when trying to optimize queries: two similar queries, but one is performing a full table scan while the other one is using the index we specially created for these queries. Bug or expected behavior? Read on!

Our two similar queries

# Q1
mysql> explain select col1, col2 from t where ts >= '2015-04-30 00:00:00';
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | t     | ALL  | ts            | NULL | NULL    | NULL | 4111896 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

# Q2
mysql> explain select count(*) from t where ts >= '2015-04-30 00:00:00';
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows    | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | t     | range | ts            | ts   | 5       | NULL | 1809458 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+

Q1 runs a full-table scan while Q2 is using the index on ts, which by the way is covering – See Using index in the Extra field. Why such different execution plans?

Let’s try to understand what happens with Q1.

This is a query with a single inequality on the ts field and we have an index on ts. The optimizer tries to see if this index is usable (possible_keys field), this is all very logical. Now if we look at the rows field for Q1 and Q2, we can see that the index would allow us to only read 45% of the records (1.8M out of 4.1M). Granted, this is not excellent but this should be much better than a full table scan anyway, right?

If you think so, read carefully what’s next. Because this assumption is simply not correct!

Estimating the cost of an execution plan (simplified)

First of all, the optimizer does not know if data or indexes are in memory or need to be read from disk, it will simply assume everything is on disk. What it does know however is that sequential reads are much faster than random reads.

So let’s execute Q1 with the index on ts. Step 1 is to perform a range scan on this index to identify the 1.8M records that match the condition: this is a sequential read, so this is quite fast. However now step 2 is to get the col1 and col2 fields for each record that match the condition. The index provides the primary key value for each matching record so we will have to run a primary key lookup for each matching record.

Here is the issue: 1.8M primary key lookups is equivalent to 1.8M random reads, therefore this will take a lot of time. Much more time than sequentially reading the full table (which means doing a full scan of the primary key because we are using InnoDB here).

Contrast that with how Q2 can be executed with the index on ts. Step 1 is the same: identify the 1.8M matching records. But the difference is: there’s no step 2! That’s why we call this index a ‘covering index': we don’t need to run point queries on the primary key to get extra fields. So this time, using the index on ts is much more efficient than reading the full table (which again would mean that we would do a full-table scan of the primary key).

Now there is one more thing to understand: a full-table scan is a sequential operation when you think about it from a logical point of view, however the InnoDB pages are certainly not stored sequentially on disk. So at the disk level, a full table is more like multiple random reads than a single large sequential read.

However it is still much faster than a very large number of point queries, and it’s easy to understand why: when you read a 16KB page for a full table scan, all records will be used. While when you read a 16KB page for a random read, you might only use a single record. So in the worst case, reading 1.8M records will require 1.8M random reads, while reading the full table with 4M records will only require around 100K random reads (a 16KB page holds dozens of records) – the full table scan is still an order of magnitude faster.

Optimizing our query

Now that we’ve understood why the optimizer chose a full table scan for Q1, is there a way to make it run faster by using an index? If we can create a covering index, we will no longer need the expensive primary key lookups. Then the optimizer is very likely to choose this index over a full table scan. Creating such a covering index is easy:

ALTER TABLE t ADD INDEX idx_ts_col1_col2 (ts, col1, col2);

Some of you may object that because we have an inequality on ts, the other columns cannot be used. This would be true if we had conditions on col1 or col2 in the WHERE clause, but that does not apply here since we’re only adding these extra columns to get a covering index.
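To double-check that the optimizer now prefers the covering index, a minimal sketch (same table and query as above); you should now see idx_ts_col1_col2 in the key column and 'Using index' in the Extra column:

mysql> EXPLAIN SELECT col1, col2 FROM t WHERE ts >= '2015-04-30 00:00:00';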

Conclusion

Understanding how indexes can be used to filter, sort or cover is paramount to being able to optimize queries, even simple ones. Understanding (even approximately) how a query is run according to a given execution plan is also very useful. Otherwise you will sometimes be puzzled by the decisions made by the optimizer.

Also note that beginning in MySQL 5.7, the cost model can be tuned. This can help the optimizer make better decisions: for instance random reads are far cheaper on fast storage than on regular disks.
