]]>
]]>

You are here

Latest MySQL Performance Blog posts

Subscribe to Latest MySQL Performance Blog posts feed
Percona's MySQL & InnoDB performance and scalability blog
Updated: 36 min 46 sec ago

Percona Toolkit for MySQL with MySQL-SSL Connections

October 16, 2014 - 6:06am

I recently had a client ask me how to use Percona Toolkit tools with an SSL connection to MySQL (MySQL-SSL). SSL connections aren’t widely used in MySQL due to most installations being within an internal network. Still, there are cases where you could be accessing MySQL over public internet or even over a public “private” network (ex: WAN between two colo datacenters). In order to keep packet sniffers at bay, the connection to MySQL should be encrypted.

If you are connecting to Amazon RDS from home or office (ie: not within the AWS network) you better be encrypted!

As there is already a MySQL Performance Blog post on how to setup MySQL SSL connections, we can skip that and dive right in.

As you probably know, the mysql client can read multiple configuration files; the primary one being /etc/my.cnf  You probably also know that the client reads a config file in your $HOME directory: .my.cnf (that’s dot-my-dot-cnf).  It is inside this file that we can set parameters for our shell-user account when connecting to MySQL hosts.

Percona Toolkit uses Perl’s DBI:mysql to make connections to MySQL hosts. This library is linked to the libmysqlclient C library which is responsible for reading and parsing the global config file as well as your $HOME config file. Let’s set some options here that are not directly available in the toolkit scripts. Using $MY_FAVORITE_EDITOR, edit your $HOME/.my.cnf as such:

[client] user = myuser password = foobar ssl-ca = /Users/drmac/ca-cert.pem

You must use the absolute path to the CA file. Relative paths won’t cut it:

ERROR 2026 (HY000): SSL connection error: SSL_CTX_set_default_verify_paths failed

Test your connection first using the mysql client:

asura:~ drmac$ mysql -h 74.13.19.17 -e "SHOW STATUS LIKE 'Ssl_cipher'" +---------------+--------------------+ | Variable_name | Value | +---------------+--------------------+ | Ssl_cipher | DHE-RSA-AES256-SHA | +---------------+--------------------+

Excellent! Now we can use any Percona Toolkit script and connect via SSL:

asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 10-13T14:10:02 0 0 45358 7 0 5.959 foo.myzipcodes

Sweet!

Unfortunately, Percona Toolkit scripts are hard-coded to read the [client] section of your .my.cnf. If you don’t want to overwrite any existing configuration that may be present, you can make a new configuration and specify that file to any toolkit script using -F. Again, relative paths won’t work here. Use the absolute path; even if you are in the same directory.

asura:~ drmac$ cp .my.cnf mytestconfig.cnf asura:~ drmac$ rm .my.cnf asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes -F /Users/drmac/mytestconfig.cnf

Now you can continue using our awesome tools in a secure manner.

Cheers!
-Matthew

The post Percona Toolkit for MySQL with MySQL-SSL Connections appeared first on MySQL Performance Blog.

How to close POODLE SSLv3 security flaw (CVE-2014-3566)

October 15, 2014 - 10:55am
Padding Oracle On Downgraded Legacy Encryption

First off, the naming “convention” as of late for security issues has been terrible. The newest vulnerability (CVE­-2014-3566) is nicknamed POODLE, which at least is an acronym and as per the header above has some meaning.

The summary of this issue is that it is much the same as the earlier B.E.A.S.T (Browser Exploit Against SSL TLS), however there’s no known mitigation method in this case – other than entirely disabling SSLv3 support, in short, an attacker has a vector by which they can retrieve the plaintext form your encrypted streams.

So let’s talk mitigation, the Mozilla Security Wiki Serverside TLS has for some time made strict recommendations of ciphers and protocols; and is certainly worth your attention.

Apache

Disable SSLv3 and SSLv3 in your ssh apache configuration by setting:
SSLProtocol all -SSLv2 -SSLv3

Nginx

Allow support only for TLS in Nginx with the following:
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

MySQL

It is worth noting that unless you deploy sha256_password plugin for MySQL 5.6; then the authentication handshake must be completed BEFORE SSL / TLS connection is negotiated; therefor this attack vector only becomes an issue for valid logins which have access to the data in the stream anyway. (where sha256_password is deployed there is the option to have authentication take place over SSL / TLS)

This is where things get far more interesting; unlike Apache and Nginx there’s no way to allow / disallow entire protocols of the SSL / TLS spec within mysql; there is however the ability to specify the cipher spec to be used in SSL communication.

As such to remove SSLv3 support from MySQL you need only ensure that none of the SSLv3 ciphers are in use wihtin your configuration.

As per information in this bug you can find a list of SSLv3 ciphers by simply
openssl ciphers -v 'DEFAULT' | awk '/SSLv3 Kx=(RSA|DH|DH(512))/ { print $1 }'
DHE-RSA-AES256-SHA
DHE-DSS-AES256-SHA
DHE-RSA-CAMELLIA256-SHA
DHE-DSS-CAMELLIA256-SHA
AES256-SHA
CAMELLIA256-SHA
EDH-RSA-DES-CBC3-SHA
EDH-DSS-DES-CBC3-SHA
DES-CBC3-SHA
DHE-RSA-AES128-SHA
DHE-DSS-AES128-SHA
DHE-RSA-SEED-SHA
DHE-DSS-SEED-SHA
DHE-RSA-CAMELLIA128-SHA
DHE-DSS-CAMELLIA128-SHA
AES128-SHA
SEED-SHA
CAMELLIA128-SHA
RC4-SHA
RC4-MD5
EDH-RSA-DES-CBC-SHA
EDH-DSS-DES-CBC-SHA
DES-CBC-SHA
EXP-EDH-RSA-DES-CBC-SHA
EXP-EDH-DSS-DES-CBC-SHA
EXP-DES-CBC-SHA
EXP-RC2-CBC-MD5
EXP-RC4-MD5

Removing the above form your ssl-cipher configuration should disable SSLv3 support; of course ensuring your MySQL service is NOT generally accessible is by far one of the most important steps you can take in securing your MySQL deployment against CVE-2014-3566.

You can read more about POODLE here.

Te following script will help to identify support for any none SSLv3 ciphers; unfortunately in my limited testing I have yet to have found a supported none SSLv3 cipher.

mysql -se “SHOW STATUS LIKE ‘Ssl_cipher_list'” | sed ‘s/:/n/g’ | sed ‘s/Ssl_cipher_listss//g’ |
while read sspec;
do SPEC=openssl ciphers -v “$sspec” 2>/dev/null | grep -v SSLv3 | awk ‘{print $1}';
[[ "$sspec" == "$SPEC" ]] && mysql –ssl-cipher=$sspec -e QUIT 2>/dev/null && echo “$sspec OK”;
done

The post How to close POODLE SSLv3 security flaw (CVE-2014-3566) appeared first on MySQL Performance Blog.

Rackspace doubling-down on OpenStack Trove and Percona Server

October 15, 2014 - 12:00am

Founded in 1998, Rackspace has evolved over the years to address the way customers are using data – and more specifically, databases. The San Antonio-based company is fueling the adoption of cloud computing among organizations large and small.

Today Rackspace is doubling down on open source database technologies. Why? Because that’s where the industry is heading, according to Sean Anderson, Manager of Data Services at Rackspace. The company, he said, created a separate business unit of 100+ employees focused solely on database workloads.

The key technologies under the hood include both relational databases (e.g., MySQL, Percona Server, and MariaDB) and NoSQL databases (e.g., MongoDB, Redis, and Apache Hadoop).

Last July Rackspace added support for Percona Server and MariaDB to their Cloud Databases DBaaS (Database-as-a-Service) product, primarily at the request of application developers who had been requesting more open source database support options.

Matt Griffin, Percona director of product management, and I recently sat down with Sean and his colleague Neha Verma, product manager of Cloud Databases. Our discussion focused on the shift to DBaaS as well as what to expect in the future from Rackspace in terms of Cloud Databases, OpenStack Trove and more.

* * *

Matt: Why did you expand the Cloud Databases product this past summer?
Sean:  We launched cloud databases about a year and a half ago. Since then we’ve rolled feature after feature (backups, monitoring, configuration management, etc…) focused on simplifying our customers life, this backed by Fanatical support has made the product easier to use and more production ready than ever. We understand that features aren’t enough so in addition to all the features we have also made significant improvements to the hardware and network infrastructure. All this means that we’ve been very busy not just expanding the offering but also making the offering simpler to use, more complete and more scalable.

Our vision is to offer a robust platform that with the most popular Big Data, SQL, and NoSQL databases on dedicated, bare metal, and public cloud infrastructure.

Matt: What type of customer is your Cloud Databases offering aimed at?
Sean: Currently we have a variety of customers running multiple Cloud Database instances ranging from customers running a two-month marketing campaign to customers running web applications, ecommerce applications with highly transactional database workloads. Our customers prefer the simplicity and reliability of the service which allows them to focus on their business and not worry about the heavy lifting associated with scaling and managing databases.

Matt: How is your Cloud Databases offering backed-up?
Neha: We use Percona XtraBackup  to perform a hot copy of all databases on a instance and then stream the backups to Cloud Files for storage. A customer can anytime restore the backup to a new instance. Percona XtraBackup is the only option we offer customers right now.

Tom: In terms of security, how do you address customer concerns? Are cloud-based open source databases more secure?
Sean: Data security concerns are at an all-time high and we have a number of up and coming features that continue to address those concerns.   Today we offer a number of unique features specifically Cloud Databases can only be accessed on the private network so the database can only be accessed by systems on your private network. Additionally, we support SSL for communication between user application and database instance so that any data transfer is encrypted in transit.  These features along with the built in user controls and authentication mechanisms help significantly address customers security concerns.  Ultimately Cloud-based open source databases or no more or less secure than any other database, security is about more than features it is about the process and people that build and manage your database and we have those more than covered.

Matt: Is this for production applications or pre-production?
Sean: It’s very much production capable. While there’s a perception that this type of offering would only fit for use cases around test or dev, the truth is we are running hundreds of very large, fully managed instances of MySQL on the cloud. We don’t make any delineation between production or pre-production. However, we’re definitely seeing more and more production workloads come onto the service as people are getting educated on the development work that we’ve done around adding these new features. Replication and monitoring are the two most popular right now.

Matt: How are people accessing and using it?
Sean: A majority of our users either access the database via the Control Panel, API or a command-line utility.

Matt: Since the launch, how has the reaction been?
Sean: The reaction from the press standpoint has been very positive. When we talk with industry analysts they see our commitment to open source and where we are going with this.

Tom: How committed is Rackspace to OpenStack?
Sean: We all live in OpenStack. We have tons of Rackers heading to the upcoming OpenStack Paris Summit in November. We’re looking forward to many years of contributing to the OpenStack community.

Tom: Last April, Rackspace hosted several sessions on OpenStack and Trove at the Percona Live MySQL Conference and Expo 2014 in Santa Clara, Calif. What are you looking forward to most at Percona Live 2015?
Sean: For us, Percona Live is about listening to the MySQL community. It’s our best opportunity each year to actually setup shop and get to learn what’s top of mind for them. We then can take that information and develop more towards that direction.

Tom: And as you know we’re also launching “OpenStack Live” to run parallel to the Percona Live MySQL conference. OpenStack Live 2015 runs April 13-14 and will emphasize the essential elements of making OpenStack work better with emphasis on the critical role of MySQL and the value of Trove. I look forward to hearing the latest news from Rackspace at both events.

Thanks Sean and Neha for speaking with us and I look forward to seeing you this coming April in Santa Clara at Percona Live and OpenStack Live!

On a related note, I’ll also be attending Percona Live London (Nov. 3-4) where we’ll have sessions on OpenStack Trove and everything MySQL. If you plan on attending, please join me at the 2014 MySQL Community Dinner (pay-your-own-way) on Nov. 3. (Register here to reserve your spot at the Community Dinner because space will be limited. You do not need to attend Percona Live London to join the dinner).

The post Rackspace doubling-down on OpenStack Trove and Percona Server appeared first on MySQL Performance Blog.

Recover orphaned InnoDB partition tablespaces in MySQL

October 14, 2014 - 8:25am

A few months back, Michael wrote about reconnecting orphaned *.ibd files using MySQL 5.6. I will show you the same procedure, this time for partitioned tables. An InnoDB partition is also a self-contained tablespace in itself so you can use the same method described in the previous post.

To begin with, I have an example table with a few orphaned partitions and we will reconnect each partition one by one to the original table.

mysql [localhost] {msandbox} (recovery) > SHOW CREATE TABLE t1 G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( [...] KEY `h_date` (`h_date`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(h_date)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (2010) ENGINE = InnoDB, PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) -rw-rw----. 1 revin revin 176M Oct 8 08:41 t1#P#p0.ibd -rw-rw----. 1 revin revin 612M Oct 8 08:41 t1#P#p1.ibd -rw-rw----. 1 revin revin 932M Oct 8 08:42 t1#P#px.ibd

The first step is to create a dummy table and remove partitioning so that we can reattach individual partitions to this table.

mysql [localhost] {msandbox} (recovery) > CREATE TABLE t1_t LIKE t1; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t REMOVE PARTITIONING; Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t DISCARD TABLESPACE; Query OK, 0 rows affected (0.05 sec)
Once the tablespace for our dummy table has been discarded, we copy  one of the partitions to take the place of the dummy table’s tablespace. For example, we copy t1#P#p0.ibd  as t1_t.ibd  into the MySQL data directory, of course taking into account the permissions afterward. The next step is to import the tablespace to the dummy table.

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1_t IMPORT TABLESPACE; Query OK, 0 rows affected, 1 warning (7.34 sec)
And for the secret sauce, we will exchange our dummy table recently imported tablespace to replace the target partition in our original table.

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t1 EXCHANGE PARTITION px WITH TABLE t1_t; Query OK, 0 rows affected (6.42 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 8523686 | +----------+ 1 row in set (2.50 sec)
You can do the same with subpartitions, too! Here’s my slightly different table with subpartitions where I reconnect one of the orphaned tablespacest2#P#px#SP#pxsp1.ibd .

mysql [localhost] {msandbox} (recovery) > SHOW CREATE TABLE t2 G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( [...] KEY `h_date` (`h_date`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(h_date)) SUBPARTITION BY HASH (u_id) SUBPARTITIONS 2 (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (2010) ENGINE = InnoDB, PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.94 sec) -rw-rw----. 1 revin revin 92M Oct 8 08:44 t2#P#p0#SP#p0sp0.ibd -rw-rw----. 1 revin revin 92M Oct 8 08:44 t2#P#p0#SP#p0sp1.ibd -rw-rw----. 1 revin revin 304M Oct 8 08:44 t2#P#p1#SP#p1sp0.ibd -rw-rw----. 1 revin revin 316M Oct 8 08:44 t2#P#p1#SP#p1sp1.ibd -rw-rw----. 1 revin revin 480M Oct 8 08:45 t2#P#px#SP#pxsp0.ibd -rw-rw----. 1 revin revin 460M Oct 8 08:45 t2#P#px#SP#pxsp1.ibd mysql [localhost] {msandbox} (recovery) > CREATE TABLE t2_t LIKE t2; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t REMOVE PARTITIONING; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t DISCARD TABLESPACE; Query OK, 0 rows affected (0.04 sec)
Once again, after copying t2#P#px#SP#pxsp1.ibd  to replace t2_t.ibd  in the MySQL data directory and setting the correct permissions, we can import it into our dummy table and exchange the same to our original table. In this case, on a table with subpartitions, a tablespace is a combined unit of partition and subpartition, hence in our partition name above P#px is our main partition and SP#pxsp1 the subpartition name. For our exchange command below, we will only need the later portion.

mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2_t IMPORT TABLESPACE; Query OK, 0 rows affected, 1 warning (2.49 sec) mysql [localhost] {msandbox} (recovery) > ALTER TABLE t2 EXCHANGE PARTITION pxsp1 WITH TABLE t2_t; Query OK, 0 rows affected (3.11 sec) mysql [localhost] {msandbox} (recovery) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 4546036 | +----------+ 1 row in set (0.94 sec)
But wait there’s more: Do you know that in MySQL 5.7, you can take the full shortcut? Directly import all partitions back to the original table, sweet! A quick example below on MySQL 5.7.5, I created the same t2 table above,DISCARD TABLESPACE , copy the partition tablespaces from my test 5.6 instance andIMPORT TABLESPACE  And done!
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > ALTER TABLE t2 DISCARD TABLESPACE; Query OK, 0 rows affected (0.02 sec) mysql [localhost] {msandbox} (test) > ! cp -v /sbx/msb/msb_5_6_210/data/test/t2#P#* /sbx/msb/msb_5_7_5/data/test/ `/sbx/msb/msb_5_6_210/data/test/t2#P#p0#SP#p0sp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p0#SP#p0sp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p0#SP#p0sp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p0#SP#p0sp1.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p1#SP#p1sp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p1#SP#p1sp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#p1#SP#p1sp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#p1#SP#p1sp1.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#px#SP#pxsp0.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#px#SP#pxsp0.ibd' `/sbx/msb/msb_5_6_210/data/test/t2#P#px#SP#pxsp1.ibd' -> `/sbx/msb/msb_5_7_5/data/test/t2#P#px#SP#pxsp1.ibd' mysql [localhost] {msandbox} (test) > ALTER TABLE t2 IMPORT TABLESPACE; Query OK, 0 rows affected, 6 warnings (11.36 sec) mysql [localhost] {msandbox} (test) > SHOW WARNINGS G *************************** 1. row *************************** Level: Warning Code: 1810 Message: InnoDB: IO Read error: (2, No such file or directory) Error opening './test/t2#P#p0#SP#p0sp0.cfg', will attempt to import without schema verification [...]
 

The post Recover orphaned InnoDB partition tablespaces in MySQL appeared first on MySQL Performance Blog.

How to avoid hash collisions when using MySQL’s CRC32 function

October 13, 2014 - 7:43am

Percona Toolkit’s  pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master – and the tool pt-table-sync synchronizes data efficiently between MySQL tables.

The tools by default use the CRC32. Other good choices include MD5 and SHA1. If you have installed the FNV_64 user-defined function, pt-table-sync will detect it and prefer to use it, because it is much faster than the built-ins. You can also use MURMUR_HASH if you’ve installed that user-defined function. Both of these are distributed with Maatkit. For details please see the tool’s documentation.

Below are test cases similar to what you might have encountered. By using the table checksum we can confirm that the two tables are identical and useful to verify a slave server is in sync with its master. The following test cases with pt-table-checksum and pt-table-sync will help you use the tools more accurately.

For example, in a master-slave setup we have a table with a primary key on column “a” and a unique key on column “b”. Here the master and slave tables are not in sync and the tables are having two identical values and two distinct values. The pt-table-checksum tool should be able to identify the difference between master and slave and the pt-table-sync in this case should sync the tables with two REPLACE queries.

+-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Case 1:  Non-cryptographic Hash function (CRC32) and the Hash collision.

The tables in the source and target have two different columns and in general way of thinking the tools should identify the difference. But the below scenarios explain how the tools can be wrongly used and how to avoid them – and make things more consistent and reliable when using the tools in your production.

The tools by default use the CRC32 checksums and it is prone to hash collisions. In the below case the non-cryptographic function (CRC32) is not able to identify the two distinct values as the function generates the same value even we are having the distinct values in the tables.

CREATE TABLE `t1` ( `a` int(11) NOT NULL, `b` int(11) NOT NULL, PRIMARY KEY (`a`), UNIQUE KEY `b` (`b`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Master: [root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-17T00:59:45 0 0 4 1 0 1.081 db1.t1 Slave: [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=*** --verbose --sync-to-master 192.**.**.** # Syncing via replication h=192.**.**.**,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE

Narrowed down to BIT_XOR:

Master: mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`; +------------------------------------------------------------+ | BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) | +------------------------------------------------------------+ | 6581445 | +------------------------------------------------------------+ 1 row in set (0.00 sec) Slave: mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`; +------------------------------------------------------------+ | BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) | +------------------------------------------------------------+ | 6581445 | +------------------------------------------------------------+ 1 row in set (0.16 sec)

Case 2: As the tools are not able to identify the difference, let us add a new row to the slave and check if the tools are able to identify the distinct values. So I am adding a new row (5,5) to the slave.

mysql> insert into db1.t1 values(5,5); Query OK, 1 row affected (0.05 sec) Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ | 5 | 5 | +-----+-----+

[root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-17T01:01:13 0 1 4 1 0 1.054 db1.t1 [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=*** --verbose --sync-to-master 192.**.**.** # Syncing via replication h=192.**.**.**,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE DELETE FROM `db1`.`t1` WHERE `a`='5' LIMIT 1 /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**. 10,p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**, p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**, p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/; # 1 2 0 0 Chunk 01:01:43 01:01:43 2 db1.t1

Well, apparently the tools are now able to identify the newly added row in the slave and the two other rows having the difference.

Case 3: Advantage of Cryptographic Hash functions (Ex: Secure MD5)

As such let us make the tables as in the case1 and ask the tools to use the cryptographic (secure MD5) hash functions instead the usual non-cryptographic function. The default CRC32 function provides no security due to their simple mathematical structure and too prone to hash collisions but the MD5 provides better level of integrity. So let us try with the –function=md5 and see the result.

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 3 | 3 | | 3 | 4 | | 4 | 4 | +-----+-----+ +-----+-----+

Narrowed down to BIT_XOR:

Master: mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING (@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`)) , 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`; +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | cnt | crc | +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | 4 | 000000000000000063f65b71e539df48 | +------+----+---+------+------+------+-----+----------------------------------+ 1 row in set (0.00 sec) Slave: mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING (@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`)) , 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`; +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | cnt | crc | +------+----+---+------+------+------+-----+----------------------------------+ | test | t2 | 1 | NULL | NULL | NULL | 4 | 0000000000000000df024e1a4a32c31f | +------+----+---+------+------+------+-----+----------------------------------+ 1 row in set (0.00 sec)

[root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --function=md5 --databases=db1 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-23T23:57:52 0 1 12 1 0 0.292 db1.t1 [root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=amma --verbose --function=md5 --sync-to-master 192.***.***.*** # Syncing via replication h=192.168.56.102,p=...,u=root # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=..., u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.***.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/; REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=..., u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.**.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/; # 0 2 0 0 Chunk 04:46:04 04:46:04 2 db1.t1

Master Slave +-----+-----+ +-----+-----+ | a | b | | a | b | +-----+-----+ +-----+-----+ | 2 | 1 | | 2 | 1 | | 1 | 2 | | 1 | 2 | | 4 | 3 | | 4 | 3 | | 3 | 4 | | 3 | 4 | +-----+-----+ +-----+-----+

The MD5 did the trick and solved the problem. See the BIT_XOR result for the MD5 given above and the function is able to identify the distinct values in the tables and resulted with the different crc values. The MD5 (Message-Digest algorithm 5) is a well-known cryptographic hash function with a 128-bit resulting hash value. MD5 is widely used in security-related applications, and is also frequently used to check the integrity but MD5() and SHA1() are very CPU-intensive with slower checksumming if chunk-time is included.

 

The post How to avoid hash collisions when using MySQL’s CRC32 function appeared first on MySQL Performance Blog.

Pages

]]>