
Feed aggregator

innobackupex cannot recover whole database

Latest Forum Posts - April 30, 2015 - 5:49am
Hi all,

Thank you all for supporting these great tools for hot backup of MySQL. I have run into a problem. I am using the innobackupex tool to hot backup a MySQL database; it works very well in the test environment, but it fails in the prod environment.

The story is: I stopped the MySQL master server to make some changes to the my.cnf file, but replication was doing something at the same time, so it gave me this warning:

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave. Statement: update xxxx(table name) ....(sql command)

I didn't worry about it because I could do the hot backup again afterwards. For a safe master restart, I ran 'flush tables with read lock', then stopped the server, changed the cnf file and started it again.

Then I did the hot backup using this command:

innobackupex --user=root --password=xxxx --defaults-file=/etc/mysql/my.cnf /home/user/xtratest --no-timestamp

It ran successfully. Then I restored it on another server to set up a MySQL slave, with these commands:

innobackupex --user=root --password=xxxx --defaults-file=/etc/mysql/my.cnf --apply-log /home/user/xtratest
innobackupex --user=root --password=xxxx --defaults-file=/etc/mysql/my.cnf --copy-back /home/user/xtratest
start server; change master; start slave;

All of them ran without any error messages. But when I checked the slave MySQL server, it was:
  • missing a database which has just one empty table
  • missing a database which has just some views
  • missing two users
  • missing 10 views
There are also two tables (master.A1 vs slave.A1, and master.B1 vs slave.B1) that differ by almost 10,000+ records, out of only about 100,000 records per table, and they cannot be synced by replication.

What happened? Please help me with this! Thanks a lot!
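For reference, the full backup/prepare/restore cycle described in the post above can be sketched as a small shell script. This is a dry-run sketch: it echoes the innobackupex commands instead of executing them, and the config path and backup directory are placeholders, not necessarily the poster's real values.

```shell
# Dry-run sketch of the innobackupex full backup -> prepare -> restore cycle.
# Each step echoes the command it would run; paths are placeholders.
full_backup_cycle() {
    local cnf=/etc/mysql/my.cnf      # placeholder config file
    local dir=/home/user/xtratest    # placeholder backup directory

    # 1. Take the full backup on the master
    echo innobackupex --defaults-file="$cnf" --no-timestamp "$dir"
    # 2. Prepare the backup on the target server (apply the InnoDB redo log)
    echo innobackupex --defaults-file="$cnf" --apply-log "$dir"
    # 3. Copy the prepared files back into the (empty) MySQL datadir
    echo innobackupex --defaults-file="$cnf" --copy-back "$dir"
}
full_backup_cycle
```

After the copy-back, the datadir ownership typically has to be fixed (chown -R mysql:mysql) before starting the server and configuring replication.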

The Cycle: Incremental Backup > Preparation > Restoration > Incremental Backup

Latest Forum Posts - April 30, 2015 - 2:28am
Hello,

I have been trying to test Percona innobackupex to achieve incremental backups.

Full backup and restore:
  1. Take full backup - let's call it FB1
  2. Prepare FB1
  3. Restore FB1 into the MySQL data directory.
That worked like a charm.

Then I tried the incremental backup - below are the steps:
  1. Take full backup - let's call it FB2
  2. Add a few records in tables and then take incremental backup - let's call it IB1
  3. Add a few more records, then take another incremental backup on top of IB1 - let's call this IB2
  4. Add last set of records, take another incremental backup IB3.
  5. Now prepare the backup - starting with FB2 (--apply-log --redo-only), then IB1 (--apply-log --redo-only), followed by IB2 (--apply-log --redo-only) and finally IB3 (--apply-log).
  6. The optional step of preparing the composite backup [let's call it CB1 (which is FB2+IB1+IB2+IB3)] using --apply-log failed - should this be of some concern?
  7. However, I went ahead and restored the composite backup, and it seems the restoration was successful as I can see all the records that I've created even though I dropped the corresponding table before doing the restoration.
  8. Now here comes the problem - when I try to add new records to the table after restoration, and then try to take an incremental backup on top of CB1, it throws an error. I can provide the content of that error if you need to have a look. But before we dig into that, I just wanted to know whether that is not allowed by design, or whether I am doing something wrong process-wise?
Could you kindly let me know your thoughts? Thanks very much in advance!

Nikhil
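The prepare chain in steps 5-6 above can be sketched as a dry-run shell script (it echoes the commands rather than running them; the /backups/FB2 and /backups/IB1-IB3 directory names are placeholders):

```shell
# Dry-run sketch of preparing a full backup plus a chain of incrementals.
# Every step except the very last must use --redo-only; directory names
# are placeholders for FB2 and IB1..IB3 from the post above.
prepare_chain() {
    local full=/backups/FB2
    set -- /backups/IB1 /backups/IB2 /backups/IB3

    # Base full backup: --redo-only keeps it open for further increments
    echo innobackupex --apply-log --redo-only "$full"
    # Every increment except the last also uses --redo-only
    while [ $# -gt 1 ]; do
        echo innobackupex --apply-log --redo-only "$full" --incremental-dir="$1"
        shift
    done
    # The final increment is applied WITHOUT --redo-only
    echo innobackupex --apply-log "$full" --incremental-dir="$1"
}
prepare_chain
```

Note that once the final --apply-log runs without --redo-only, the rollback phase has been performed and, per Percona's documentation, no further incrementals can be applied on top of that prepared backup; after a restore, a new full backup has to start the next incremental chain. That may explain the error seen in step 8.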

Optimizer hints in MySQL 5.7.7 – The missed manual

Latest MySQL Performance Blog posts - April 30, 2015 - 12:00am

In MySQL 5.7.7 Oracle introduced a promising new feature: optimizer hints. However, it did not publish any documentation for them. The only note I found about the hints in the user manual is:

  • It is now possible to provide hints to the optimizer by including /*+ ... */ comments following the SELECT, INSERT, REPLACE, UPDATE, or DELETE keyword of SQL statements. Such statements can also be used with EXPLAIN. Examples:
    SELECT /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ f1 FROM t3 WHERE f1 > 30 AND f1 < 33;
    SELECT /*+ BKA(t1, t2) */ * FROM t1 INNER JOIN t2 WHERE ...;
    SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;

There are also three worklogs: WL #3996, WL #8016 and WL #8017. But they describe the general concept and do not have much information about which optimizations can be used and how. More light is shed on this by slide 59 from Øystein Grøvlen’s session at Percona Live. But that’s all: no “official” full list of possible optimizations, no use cases… nothing.

I tried to sort it out myself.

My first finding is that slide #59 really lists six of the seven possible hints. Confirmation of this exists in one of the two new files under the sql directory of the MySQL source tree, created for this new feature.

$ cat sql/opt_hints.h
...
/**
  Hint types, MAX_HINT_ENUM should be always last.
  This enum should be synchronized with opt_hint_info
  array(see opt_hints.cc).
*/
enum opt_hints_enum
{
  BKA_HINT_ENUM= 0,
  BNL_HINT_ENUM,
  ICP_HINT_ENUM,
  MRR_HINT_ENUM,
  NO_RANGE_HINT_ENUM,
  MAX_EXEC_TIME_HINT_ENUM,
  QB_NAME_HINT_ENUM,
  MAX_HINT_ENUM
};

Looking into the file sql/opt_hints.cc we can find out that these hints give not much choice: either enable or disable.

$ cat sql/opt_hints.cc
...
struct st_opt_hint_info opt_hint_info[]=
{
  {"BKA", true, true},
  {"BNL", true, true},
  {"ICP", true, true},
  {"MRR", true, true},
  {"NO_RANGE_OPTIMIZATION", true, true},
  {"MAX_EXECUTION_TIME", false, false},
  {"QB_NAME", false, false},
  {0, 0, 0}
};

The chosen way to include hints in SQL statements (inside comments with a “+” sign: /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */) is compatible with the optimizer hint style that Oracle Database uses.

We actually had access to these optimizations before: at least BKA, BNL, ICP and MRR were accessible via the variable optimizer_switch. But with the new syntax we can not only toggle them globally or per session, but also turn a particular optimization on or off for a single table or index in the query. I can demonstrate it with this quite artificial, but always accessible, example:

mysql> use mysql
Database changed
mysql> explain select * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | range | PRIMARY       | PRIMARY | 180     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select /*+ NO_RANGE_OPTIMIZATION(user PRIMARY) */ * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |    5 |    40.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Here I used one more hint, which we could not turn on or off directly before: range optimization.

One more “intuitively” documented feature is the ability to turn a particular optimization off. This works for BKA, BNL, ICP and MRR: you can specify NO_BKA(table[[, table]…]), NO_BNL(table[[, table]…]), NO_ICP(table indexes[[, table indexes]…]) and NO_MRR(table indexes[[, table indexes]…]) to avoid using these algorithms for a particular table or index in the JOIN.

MAX_EXECUTION_TIME does not take a table or key name. Instead, you specify the maximum time in milliseconds that the query should be allowed to run:

mysql> select /*+ MAX_EXECUTION_TIME(1000) */ sleep(1) from user;
ERROR 3024 (HY000): Query execution was interrupted, max_statement_time exceeded

mysql> select /*+ MAX_EXECUTION_TIME(10000) */ sleep(1) from user;
+----------+
| sleep(1) |
+----------+
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
+----------+
5 rows in set (5.00 sec)

QB_NAME is more complicated. WL #8017 tells us this is a custom context. But what is that? The answer is in the MySQL test suite! Tests for optimizer hints live in the file t/opt_hints.test. The very first QB_NAME entry is this query:

EXPLAIN SELECT /*+ NO_ICP(t3@qb1 f3_idx) */ f2 FROM (SELECT /*+ QB_NAME(QB1) */ f2, f3, f1 FROM t3 WHERE f1 > 2 AND f3 = 'poiu') AS TD WHERE TD.f1 > 2 AND TD.f3 = 'poiu';

So we can give any subquery a custom QB_NAME and apply an optimizer hint only within that context.

To conclude this quick overview I want to show a practical example of when query hints are really needed. Last week I worked on an issue where a customer upgraded from MySQL 5.5 to 5.6 and found that some of their queries started working slower than before. I wrote an answer that may sound funny, but is still correct: “One of the reasons for such behavior is optimizer improvements. While they are all made for better performance, some queries, optimized for older versions, can start working slower than before.”

To demonstrate a public example of such a query I will use my favorite source of information: the MySQL Community Bugs Database. Searching for optimizer regression bugs that are still not fixed, we can find bug #68919, demonstrating a regression when the MRR algorithm is used for queries with LIMIT. If we run the queries shown in the bug report, we will see a huge difference:

mysql> SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (6.88 sec)

mysql> explain SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                            |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx           | idx  | 4       | NULL | 9999958 |    33.33 | Using index condition; Using MRR |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT /*+ NO_MRR(t1) */ * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (0.00 sec)

With MRR the query takes 6.88 seconds to execute, and 0.00 seconds if MRR is not used! But the bug report itself suggests using optimizer_switch="mrr=off" as a workaround. And this will work perfectly well if you are OK with running SET optimizer_switch="mrr=off"; every time you run a query that benefits from having it OFF. With optimizer hints you can have an algorithm ON for one table in the query and OFF for another. I again took a quite artificial example, but it demonstrates the method:

mysql> explain select /*+ MRR(dept_emp) */ * from dept_emp where to_date in (select /*+ NO_MRR(salaries)*/ to_date from salaries where salary >40000 and salary <45000) and emp_no >10100 and emp_no < 30200 and dept_no in ('d005', 'd006','d007');
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys          | key        | key_len | ref                        | rows    | filtered | Extra                                         |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
|  1 | SIMPLE       | dept_emp    | NULL       | range  | PRIMARY,emp_no,dept_no | dept_no    | 8       | NULL                       |   10578 |   100.00 | Using index condition; Using where; Using MRR |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_key>             | <auto_key> | 3       | employees.dept_emp.to_date |       1 |   100.00 | NULL                                          |
|  2 | MATERIALIZED | salaries    | NULL       | ALL    | salary                 | NULL       | NULL    | NULL                       | 2838533 |    17.88 | Using where                                   |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)


The post Optimizer hints in MySQL 5.7.7 – The missed manual appeared first on MySQL Performance Blog.

Generated (Virtual) Columns in MySQL 5.7 (labs)

Latest MySQL Performance Blog posts - April 29, 2015 - 3:00am

About two weeks ago Oracle published the MySQL 5.7.7-labs-json version, which includes a very interesting feature called “Generated columns” (also known as virtual or computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

in our table, we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach is to use the above date and time MySQL functions in the query; however, that will prevent MySQL from using an index (see below). Generated columns allow you to declare a “virtual”, non-stored column that is computed from an existing field; you can then add an index on that virtual column, so the query will use the index.

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort
Results: 32 rows in set (1 min 57.93 sec)

The problem here is that MySQL is not able to use an index when you apply a function that “extracts” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we will need to populate that column by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(FlightDate)”. After that we will also need to change the application to maintain the additional column, or use a trigger to update it. Here is the trigger example:

CREATE DEFINER = CURRENT_USER TRIGGER ontime_insert BEFORE INSERT ON ontime_sm_triggers FOR EACH ROW SET NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple test, “copying” the table with “insert into ontime_sm_copy select * from ontime_sm” took almost 2x longer when the trigger was on.

Generated columns in the MySQL 5.7.7-labs-json version (at the time of writing, the only version that supports them) solve this problem. Here is an example that demonstrates their use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`)
) ENGINE=InnoDB

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

To further increase the performance of this query we would want a combined index on (Flight_dayofweek, Carrier), so MySQL can avoid creating a temporary table. However, this is not currently supported:

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

We can, however, add an index on two generated columns, which is good. So the trick here is to create a “dummy” virtual column on “Carrier” and index the two virtual columns together:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.

Last but not least: loading data into the table with generated columns is significantly faster than loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

Now the big disappointment: all operations with generated columns are not online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE; ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED. mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE; ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope it will be fixed in the future releases.

Conclusion

Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).

More information:

The post Generated (Virtual) Columns in MySQL 5.7 (labs) appeared first on MySQL Performance Blog.

Generated (Virtual) Columns in MySQL 5.7 (labs)

Latest MySQL Performance Blog posts - April 29, 2015 - 3:00am

About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version, which includes a very interesting feature called “Generated Columns” (also known as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

in our table, we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach is to use the Date and Time MySQL functions directly in the query; however, this prevents MySQL from using an index (see below). Generated columns allow you to declare a “virtual”, non-stored column that is computed from an existing field; you can then add an index on that virtual column, so the query will use that index.
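As a rough cross-check of the idea, here is a small sketch using SQLite (via Python’s sqlite3 module) instead of the MySQL labs build: SQLite’s expression indexes give the same effect as an indexed virtual column. The table and index names are illustrative only, and note that strftime('%w') numbers Sunday as 0, unlike MySQL’s dayofweek(), which numbers it 1.

```python
import sqlite3

# Sketch of the same idea with a SQLite expression index (the analogue of an
# indexed virtual generated column). Names here are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ontime (id INTEGER PRIMARY KEY, FlightDate TEXT, Carrier TEXT)")
con.executemany("INSERT INTO ontime (FlightDate, Carrier) VALUES (?, ?)",
                [("2013-01-%02d" % d, "AA") for d in range(1, 32)])

# Index the computed value; strftime('%w') is 0..6 with Sunday = 0,
# while MySQL's dayofweek() is 1..7 with Sunday = 1.
con.execute("CREATE INDEX idx_dow ON ontime (strftime('%w', FlightDate))")

# The WHERE clause must repeat the exact indexed expression to use the index.
plan = "\n".join(row[-1] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT Carrier FROM ontime "
    "WHERE strftime('%w', FlightDate) = '0'"))
print(plan)  # expect a seek using idx_dow rather than a full scan
```

The same query without the expression index would have to scan the whole table, which is exactly the problem described below for MySQL.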

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm
       WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort
32 rows in set (1 min 57.93 sec)

The problem here is that MySQL is not able to use an index when you apply a function that “extracts” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we need to populate that column by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(FlightDate)”. After that we also need to change the application to maintain the additional column, or use a trigger to update it. Here is the trigger example:

CREATE DEFINER = CURRENT_USER TRIGGER ontime_insert
BEFORE INSERT ON ontime_sm_triggers
FOR EACH ROW
SET NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple test, copying the table with “insert into ontime_sm_copy select * from ontime_sm” was almost 2x slower with the trigger enabled.

Generated Columns in the MySQL 5.7.7-labs-json version (the only version that supports them at the time of writing) solve this problem. Here is an example that demonstrates their use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`)
) ENGINE=InnoDB

Here we add a Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual
       WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

To further increase the performance of this query, we want to add a combined index on (Flight_dayofweek, carrier) so MySQL can avoid creating a temporary table. However, this is not currently supported:

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns
to be indexed together' is not supported for generated columns.

We can, however, add an index on 2 generated columns, which is good. So the trick here is to create a “dummy” virtual column on “carrier” and index those 2 virtual columns together:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual
       WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.
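For comparison, here is a hedged sqlite3 sketch of the same combined “computed value + regular column” index (SQLite’s expression indexes can mix an expression and a plain column directly, so no dummy column is needed there; table and index names are illustrative):

```python
import sqlite3

# Illustrative analogue: an index on (computed-dayofweek, Carrier) that can
# satisfy a filter + GROUP BY in one pass. Not MySQL syntax or output.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ontime (FlightDate TEXT, Carrier TEXT)")
con.executemany("INSERT INTO ontime VALUES (?, ?)",
                [("2013-01-%02d" % d, c) for d in range(1, 29) for c in ("AA", "UA")])

# Expression first (for the equality filter), plain column second (for grouping).
con.execute("CREATE INDEX comb ON ontime (strftime('%w', FlightDate), Carrier)")

plan = "\n".join(r[-1] for r in con.execute(
    "EXPLAIN QUERY PLAN SELECT Carrier, count(*) FROM ontime "
    "WHERE strftime('%w', FlightDate) = '0' GROUP BY Carrier"))
print(plan)  # expect a seek using the comb index
```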

Last but not least: loading data into a table with generated columns is significantly faster than loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID,
       OriginCityName, OriginState, DestAirportID, DestCityName, DestState,
       DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted,
       CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance)
       select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID,
       OriginCityName, OriginState, DestAirportID, DestCityName, DestState,
       DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode, Diverted,
       CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance)
       select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

Now the big disappointment: operations on generated columns are not online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL,
       add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope this will be fixed in future releases.

Conclusion

The generated columns feature is very useful. Imagine the ability to add a column + index for any “logical” piece of data without actually duplicating the data. And it can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index could be done online (it cannot right now).


The post Generated (Virtual) Columns in MySQL 5.7 (labs) appeared first on MySQL Performance Blog.

tar: -: Cannot write: Broken pipe error

Lastest Forum Posts - April 28, 2015 - 1:17pm
My third galera mariadb node (out of 3) will not start and I get this:

150428 15:11:27 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'root' (using password: YES).
150428 15:11:27 innobackupex: Connected to MySQL server
150428 15:11:27 innobackupex: Executing a version check against the server...
150428 15:11:27 innobackupex: Done.
150428 15:11:27 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
At the end of a successful backup run innobackupex
prints "completed OK!".

innobackupex: Using server version 10.0.17-MariaDB-wsrep

innobackupex: Created backup directory /tmp
tar: -: Cannot write: Broken pipe
tar: Error is not recoverable: exiting now
innobackupex: 'tar chf -' returned with exit code 2.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 4894.
main::backup_file_via_stream('/tmp', 'backup-my.cnf') called at /usr//bin/innobackupex line 4943
main::backup_file('/tmp', 'backup-my.cnf', '/tmp/backup-my.cnf') called at /usr//bin/innobackupex line 4967
main::write_to_backup_file('/tmp/backup-my.cnf', '# This MySQL options file was generated by innobackupex.\x{a}\x{a}# T...') called at /usr//bin/innobackupex line 3774
main::write_backup_config_file('/tmp/backup-my.cnf') called at /usr//bin/innobackupex line 3701
main::init() called at /usr//bin/innobackupex line 1566
innobackupex: Error: Failed to stream '/tmp/backup-my.cnf': 2 at /usr//bin/innobackupex line 4894.


Any ideas? Thanks

Percona cluster - one server crashed - memory bug?

Lastest Forum Posts - April 28, 2015 - 3:49am
I have three servers in a cluster - running ubuntu 12.04, xeon systems with ECC ram and raid 5 arrays, 32GB ram each. Unlikely to be a hardware issue. This is what the error log said:

18:19:44 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Please help us make Percona XtraDB Cluster better by reporting any bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=37
max_threads=153
thread_count=19
connection_count=2
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69252 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f2d18000990 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...

stack_bottom = 7f30005e0a70 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8e811e]
/usr/sbin/mysqld(handle_fatal_signal+0x392)[0x65ffa2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f303d3e5cb0]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0xa0)[0x7f302225a0f0]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x158)[0x7f302225b8c8]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x85)[0x7f3022288215]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x405)[0x7f3022269d75]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5e)[0x7f302226a8ee]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x78)[0x7f302228f958]
/usr/lib/libgalera_smm.so(galera_recv+0x1e)[0x7f30222a4c8e]
/usr/sbin/mysqld[0x5a491c]
/usr/sbin/mysqld(start_wsrep_THD+0x287)[0x58d247]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f303d3dde9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f303c8f88bd]

Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 12
Status: NOT_KILLED

Test your knowledge: Percona XtraDB Cluster (PXC) quiz

Latest MySQL Performance Blog posts - April 28, 2015 - 3:00am

I often talk with people who are very interested in the features of Percona XtraDB Cluster (PXC) such as synchronous and parallel replication, multi-node writing and high availability. However some get confused when operating a real PXC cluster because they do not fully realize the implications of these features. So here is a fun way to test your PXC knowledge: try to solve these 12 questions related to PXC! (you will find the answers at the end of the post.)

Workload

1. With Galera 3.x, support for MyISAM is experimental. When can we expect to have full MyISAM support?
a. This will never happen as Galera is designed for transactional storage engines.
b. This is planned for Galera 4.0.

2. Why aren’t all workloads a good fit for PXC?
a. Execution plans can change compared to a regular MySQL server, so performance is sometimes not as good as with a regular MySQL server.
b. Large transactions and write hotspots can create performance issues with Galera.

3. For workloads with a write hot spot, writing on all nodes to distribute the load is a good way to solve the issue.
a. True
b. False

4. Optimistic locking is used in a PXC cluster. What does it mean?
a. When a transaction starts on a node, locks are only set on this node but never on the remote nodes.
b. When a transaction starts on a node, locks are only set on the remote nodes but never on the local node.
c. Write conflict detection is built-in, so there is no need to set locks at all.

Replication

5. Galera implements virtually synchronous replication. What does it mean?
a. A transaction is first committed locally, and then it is committed on all remote nodes at the same exact point in time.
b. Transactions are replicated synchronously, but they are applied asynchronously on remote nodes.
c. Replication is actually asynchronous, but as it is faster than MySQL replication, marketing decided to name it ‘virtually synchronous’.

6. When the receive queue of a node exceeds a threshold, the node sends flow control messages. What is the goal of these flow control messages?
a. They instruct the other nodes that they must pause processing writes for some time, to allow the slow node to catch up.
b. The other nodes trigger an election and if they have quorum they will evict the slow node.
c. The messages can be used by monitoring systems to detect a slow node, but they have no effect.

7. When you change the state of a node to Donor/Desynced, what happens?
a. The node stops receiving writes from the other nodes.
b. The node intentionally replicates writes at a slower pace, this is roughly equivalent to a delayed replica when using MySQL replication.
c. The node keeps working as usual, but it will not send flow control messages if its receive queue becomes large.

High Availability

8. You should always use an odd number of nodes, because with an even number (say 4 or 6), the failure of one node will create a split-brain situation.
a. True
b. False

9. With a 3-node cluster, what happens if you gracefully stop 2 nodes?
a. The remaining node can process queries normally.
b. The remaining node is up but it stops processing queries as it does not have quorum.

Operations

10. If a node has been stopped for less than 5 minutes, it will always perform an IST.
a. True: SST is only performed after a node crash, never after a regular shutdown.
b. False: it depends on the gcache size.

11. Even with datasets under 5GB, the preferred SST method is xtrabackup-v2 not mysqldump.
a. True
b. False

12. Migration from a master-slave setup to a PXC cluster always involves a downtime to dump and reload the database.
a. True, because MySQL replication and Galera replication are incompatible.
b. False, one node of the PXC cluster can be set up as an asynchronous replica of the old master.

Solutions

1. a      2. b      3. b
4. a      5. b      6. a
7. c      8. b      9. a
10. b    11. a    12. b

The post Test your knowledge: Percona XtraDB Cluster (PXC) quiz appeared first on MySQL Performance Blog.

Errors using mysql datadir snapshot from another node

Lastest Forum Posts - April 28, 2015 - 1:30am
I'm evaluating a Percona xtradb 5.6 cluster of 3 nodes in AWS environment. I'm using ec2-consistent-snapshot with --mysql option to issue FTWRL, and make a EBS snapshot of the data. However when a snapshot was made on node 1, and then node 2 is relaunched using that snapshot, the cluster would break.

Through trial-and-error I've found that this is caused by reusing auto.cnf and gvwstate.dat files in mysql datadir, which would contain ids of node 1, and the issues were caused by another node trying to join with ids of a node already in cluster. Removing the said files appears to have fixed the issue and now nodes go up and down as expected.

My question is: did I do the right thing? Do I need to remove auto.cnf and gvwstate.dat before using another server's datadir? Do I need to do anything else? What's the standard practice for this sort of thing?

comparison chart between percona xtradb cluster mariadb cluster

Lastest Forum Posts - April 28, 2015 - 12:31am
Hi,

Is there any comparison chart between percona xtradb cluster mariadb cluster that I could refer in terms of featurewise?

Incorrect definition of table mysql.proc

Lastest Forum Posts - April 27, 2015 - 7:12pm
Restoring all databases from MariaDB 10 to Percona Xtradb Cluster, I saw an error on mysql.log on one of db nodes as below

[ERROR] Incorrect definition of table mysql.proc: expected column 'definer' at position 11 to have type char(77), found type char(141).

percona-xtradb-cluster-galera-3.x
3.9.3494.trusty

Do I simply change char(77) to char(141) on one of the node or there's a few extra steps?

pt-duplicate-keys, simple doubt...

Lastest Forum Posts - April 27, 2015 - 12:44pm
Hey Guys,

A simple doubt, I've been working on a test environment to validate the usage of pt-duplicate-keys on certain tables I think have an ugly design and many duplicate/unnecessary indexes and column's repetitions. BTW, the information on the end of the script execution is very useful, but, I'm in doubt about the "Size Duplicate Indexes" - is this the sum(#) of duplicate indexes lines?

# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   3165489   <<<<<<---- HERE!!!
# Total Duplicate Indexes  9
# Total Indexes            22

Debian Jessie repository

Lastest Forum Posts - April 27, 2015 - 10:20am
Hello,

I know, Debian Jessie is not even 2 days old, but I still want to ask when we could expect a Debian Jessie repository.
Also I want to know what would be currently the best solution to put Percona Server on a Debian Jessie.
The Linux Generic package or build it directly from the git repository?

Thanks in advance!

Regards,
Christian

PAM installation with mysql using LDAP Authentication is not letting to login

Lastest Forum Posts - April 27, 2015 - 4:43am
I've setup PAM 5.6.23 from tar and configured to use LDAP authentication, in some server it is working fine while i've some other boxes having [ Red Hat Enterprise Linux Server release 6.2 (Santiago) & CentOS release 5.8 (Final) ] is not working properly.
I checked all the permissions and file configuration and everything seems fine.

#cat /etc/pam.d/mysqld
auth required pam_ldap.so
account required pam_ldap.so

Can you please help to figure out the issue ?
Please do let me know if any other info is required.

Indexing 101: Optimizing MySQL queries on a single table

Latest MySQL Performance Blog posts - April 27, 2015 - 3:00am

I have recently seen several cases when performance for MySQL queries on a single table was terrible. The reason was simple: the wrong indexes were added and so the execution plan was poor. Here are guidelines to help you optimize various kinds of single-table queries.

Disclaimer: I will be presenting general guidelines and I do not intend to cover all scenarios. I am pretty confident that you can find examples where what I am writing does not work, but I am also confident that it will help you most of the time. Also I will not discuss features you can find in MySQL 5.6+ like Index Condition Pushdown to keep things simple. Be aware that such features can actually make a significant difference in query response time (for good or for bad).

What an index can do for you

An index can perform up to 3 actions: filter, sort/group and cover. While the first 2 actions are self-explanatory, not everyone may know what a ‘covering index’ is. Actually that’s very easy. The general workflow for a basic query is:
1. Use an index to find matching records and get the pointers to data.
2. Use the pointers to fetch the corresponding data.
3. Return records

When a covering index can be used, the index already covers all fields requested in the query, so step #2 can be skipped and the workflow is now:
1. Use an index to find matching records
2. Return records

In many cases, indexes are small and can fit in memory while the data is large and does not fit in memory: by using a covering index, you can avoid lots of disk operations and performance can be orders of magnitude better.
Let’s now look at different common scenarios.

Single equality

This is the most basic scenario:

SELECT * FROM t WHERE c = 100

The idea is of course to add an index on (c). However, note that if the condition is not selective enough, the optimizer may choose to perform a full table scan, which will certainly be more efficient.
Also note that a frequent variation of this query is when you only select a small subset of fields instead of all fields:

SELECT c1, c2 FROM t WHERE c = 100

Here it could make sense to create an index on (c, c1, c2) because it will be a covering index. Do not create an index on (c1, c2, c)! It would still be covering, but it would not be usable for filtering (remember that only a left-most prefix of an index can be used for filtering).
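A quick sketch of this point, using SQLite’s EXPLAIN QUERY PLAN as a stand-in for MySQL’s EXPLAIN (column and index names are illustrative only): the (c, c1, c2) index both filters and covers, while (c1, c2, c) can at best be scanned.

```python
import sqlite3

# Demonstrates why column order matters for a covering index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c INTEGER, c1 TEXT, c2 TEXT, pad TEXT)")
con.executemany("INSERT INTO t (c, c1, c2, pad) VALUES (?, ?, ?, ?)",
                [(i % 100, "a", "b", "x" * 40) for i in range(1000)])

# Good: filter column first, then the selected columns -> seek + covering.
con.execute("CREATE INDEX idx_good ON t (c, c1, c2)")
plan_good = "\n".join(r[-1] for r in con.execute(
    "EXPLAIN QUERY PLAN SELECT c1, c2 FROM t WHERE c = 42"))
print(plan_good)  # a seek, e.g. SEARCH ... USING COVERING INDEX idx_good (c=?)

# Bad: c is not a left-most prefix, so the index cannot be used to seek on c.
con.execute("DROP INDEX idx_good")
con.execute("CREATE INDEX idx_bad ON t (c1, c2, c)")
plan_bad = "\n".join(r[-1] for r in con.execute(
    "EXPLAIN QUERY PLAN SELECT c1, c2 FROM t WHERE c = 42"))
print(plan_bad)   # still covering, but a full scan rather than a seek
```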

Multiple equalities

SELECT * FROM t WHERE c = 100 and d = 'xyz'

It is also very easy to optimize: just add an index on (c, d) or (d, c).

The main mistake here is to add 2 indexes: one on (c) and one on (d). Granted, MySQL is able to use both indexes with the index_merge algorithm, but it is almost always a very bad option.

Equality and inequality

SELECT * FROM t WHERE c > 100 and d = 'xyz'

Here we must be careful, because as soon as a column is used with an inequality, no further columns in the index can be used for filtering.

Therefore if we create an index on (d, c), we will be able to filter on both c and d, which is good.
But if we create an index on (c, d), we will only be filtering on c, which is less efficient.

So unlike the case where you only have equalities, the order of columns matters when inequalities are used.
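A sketch of the two candidate indexes (names are illustrative):

```sql
-- Good: the equality column d comes first, the range column c last.
CREATE INDEX idx_d_c ON t (d, c);   -- filters on d = 'xyz' AND c > 100

-- Less efficient: the range on c blocks the use of d for filtering.
-- CREATE INDEX idx_c_d ON t (c, d);
```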

Multiple inequalities

SELECT * FROM t WHERE c > 100 and b < 10 and d = 'xyz'

As we have 2 inequalities, we already know that we will not be able to filter on both conditions (*). So we have to make a decision: will we filter on (d, b) or on (d, c)?

It is not possible to tell which option is better without looking at the data: simply choose the column where the inequality is the most selective. The main point is that you must put the column(s) with an equality first.

(*) Actually there is a way to ‘filter’ on both inequalities: partition on b and add an index on (d, c), or partition on c and add an index on (d, b). The details are out of the scope of this post, but it may be an option in some situations.
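In other words, the choice is between one of these two indexes (names are illustrative; which one is right depends on the selectivity of each inequality in your data):

```sql
-- If b < 10 is the more selective inequality:
CREATE INDEX idx_d_b ON t (d, b);

-- If c > 100 is the more selective inequality:
CREATE INDEX idx_d_c ON t (d, c);
```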

Equalities and sort

SELECT * FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

As mentioned in the first paragraph, an index can filter and sort so this query is easy to optimize. However like for inequalities, we must carefully choose the order of the columns in the index: the rule is that we will filter first, and then sort.

With that in mind, it is easy to know that (c, d, b) or (d, c, b) will be good indexes while (b, c, d) or (b, d, c) are not as good (they will sort but not filter).

And if we have:

SELECT c1, c2 FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

We can create a super efficient index that will filter, sort and be covering: (c, d, b, c1, c2).
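As DDL (the index name is illustrative):

```sql
-- c and d filter, b sorts, c1 and c2 make the index covering.
CREATE INDEX idx_filter_sort_cover ON t (c, d, b, c1, c2);
```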

Inequality and sort

We have 2 main variations here. The first one is:

SELECT * FROM t WHERE c > 100 and d = 'xyz' ORDER BY b

Two options look reasonable in this case:
1. Filter on d and sort by b: an index on (d, b).
2. Filter on both d and c: an index on (d, c), sorting the result afterwards.

Which strategy is more efficient? It depends on your data, so you will have to experiment.
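The two candidates, as a sketch:

```sql
-- Option 1: filter on d, read rows already sorted by b (no filesort),
-- checking c > 100 row by row.
CREATE INDEX idx_d_b ON t (d, b);

-- Option 2: filter on both d and c, then sort the result (filesort).
CREATE INDEX idx_d_c ON t (d, c);
```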

The second variation is:

SELECT * FROM t WHERE c > 100 ORDER BY b

This time we have no equality, so we have to choose between filtering (indexing c) and sorting (indexing b). Most likely you will choose filtering.
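A sketch of the two alternatives:

```sql
-- Filtering (usually the better choice): range scan on c, then filesort on b.
CREATE INDEX idx_c ON t (c);

-- Sorting: rows come back in b order, but every row with c <= 100 is read
-- and discarded.
CREATE INDEX idx_b ON t (b);
```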

Conclusion

Not all cases have been covered in this post but you can already see that in some cases you will create poor MySQL indexes if you are not careful. In a future post, I will present a case that can look confusing at first sight but which is easy to understand if you already know everything mentioned here.

The post Indexing 101: Optimizing MySQL queries on a single table appeared first on MySQL Performance Blog.
