
Causes and Workarounds for Slave Performance Too Slow with Row-Based Events

Latest MySQL Performance Blog posts - May 3, 2018 - 12:09pm

Recently I worked on a customer issue that I would describe as “slave performance too slow”. During a quick analysis, I found that the replication slave’s SQL thread could not keep up while processing row-based events from the master’s binary log.

For example:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
...
          Master_Log_File: binlog.0000185
      Read_Master_Log_Pos: 86698585
...
    Relay_Master_Log_File: binlog.0000185
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
...
      Exec_Master_Log_Pos: 380
          Relay_Log_Space: 85699128
...
              Master_UUID: 98974e7f-2fbc-18e9-72cd-07003817585c
...
       Retrieved_Gtid_Set: 98974e7f-2fbc-18e9-72cd-07003817585c:1055-1057
        Executed_Gtid_Set: 7f42e2c5-3fbc-16e7-7fb8-05003715789a:1-2,
98974e7f-2fbc-18e9-72cd-07003817585c:1-1056
...

The processlist state for the SQL thread can be Reading event from the relay log, System lock, or potentially some other state. In my case:

mysql> SHOW PROCESSLIST;
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+
| Id | User            | Host            | db   | Command | Time | State                            | Info             |
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+
...
| 4  | system user     |                 | NULL | Connect | 268  | Reading event from the relay log | NULL             |
...
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+

What causes that?

Let’s take a look at what could potentially cause such behavior and what we need to pay attention to. When the SQL thread applies the change from a row-based event, it has to locate the exact row that was updated. With a primary key, this is trivial, as only one row can possibly have the same value for the primary key.

However, if there is no primary key on the table on the replication slave side, the SQL thread must search the entire table to locate the row to update or delete. It repeats the search for each updated row. This search is both very resource intensive (CPU usage can reach 100%) and slow, causing the slave to fall behind.

For InnoDB tables, the “hidden” key used for the clustered index for tables without a primary key cannot be used to avoid searching the entire table for the rows to update or delete. We need to keep in mind that the “hidden” key is unique only to each MySQL instance, so the replication master and replication slave generally don’t have the same values for the “hidden” key for the same row. What can we do to solve that?

The best solution is to ensure that all tables have a primary key. This not only ensures the SQL thread can easily locate rows to update or delete, but it is also considered a best practice since it ensures all rows are unique.

If there is no way to logically add a natural primary key for the table, a potential solution is to add an auto-increment unsigned integer column as the primary key.
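If you go down that route, the change itself is small. A minimal sketch (the table name t1 is hypothetical; always test such an ALTER on a non-production copy first):

mysql> ALTER TABLE t1 ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;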

The query below helps you to locate tables without a primary key:

SELECT tables.table_schema, tables.table_name, tables.table_rows
  FROM information_schema.tables
  LEFT JOIN (
       SELECT table_schema, table_name
         FROM information_schema.statistics
        GROUP BY table_schema, table_name, index_name
       HAVING SUM(
              CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
              ) = COUNT(*)
       ) puks
    ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
 WHERE puks.table_name IS NULL
   AND tables.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
   AND tables.table_type = 'BASE TABLE'
   AND engine = 'InnoDB';

Please note that for InnoDB, there must always be a unique NOT NULL key for all tables. It is required for the clustered index. So adding an explicit “dummy” column as suggested above will not add to the overall storage requirements as it will merely replace the hidden key.

It’s not always possible to add a primary key to the table immediately: for example, there may be many relations on the application side or in a legacy system, a lack of resources, or unknown application behavior after the change that would require testing.

In this case, a short-term solution is to change the search algorithm used by the replication slave to locate the rows changed by row-based events.

The search algorithm is set using the slave_rows_search_algorithms option which is available in MySQL 5.6 and later. The default value is to use an index scan if possible, otherwise a table scan.

https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#option_mysqld_slave-rows-search-algorithms

However, for tables without a primary key, a hash scan can be used instead. This causes the SQL thread to temporarily cache hashes, reducing the overhead of searching the whole table. The value of slave_rows_search_algorithms can be changed dynamically using:

mysql> SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';
Just to note, INDEX_SCAN,HASH_SCAN is the default value in MySQL 8.0.
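If the new value works for you, it can also be persisted across restarts by placing it in the MySQL configuration file on the replication slave (a sketch, assuming the usual [mysqld] section):

[mysqld]
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN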

One thing to be aware of when using hash scans is that the hashes are only reused within one row-based event. (Each row-based event may contain changes to several rows in the same table, originating from the same SQL statement.)

The binlog_row_event_max_size option on the replication master controls the maximum size of a row-based event. The default max event size is 8kB. This means that switching to hash scans only improves the performance of the SQL thread when:

  1. Several rows fit into one row-based event. It may help to increase the value of binlog_row_event_max_size on the replication master if you perform updates or deletes on large rows (e.g., with blob or text data). You can only set binlog_row_event_max_size in the MySQL configuration file, and changing this value requires a restart (see the configuration sketch after this list).
  2. One statement changes several rows.
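For illustration, raising the event size on the replication master would look roughly like this in the configuration file; the 32KB value here is only an assumption and should be tuned to your row sizes, and the change only takes effect after a restart:

[mysqld]
binlog_row_event_max_size = 32768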

https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#option_mysqld_binlog-row-event-max-size

Conclusion

Even if enabling hash scans improves the performance enough for the replication slave to keep up, the permanent solution is to add an explicit primary key to each table. This should be a general rule of thumb in schema design, in order to avoid and/or minimize many issues like the slave performance problem described in this post.

Next, I am going to investigate how we can find out the exact thread state using Performance Schema in order to make issue identification less of a guessing game.

The post Causes and Workarounds for Slave Performance Too Slow with Row-Based Events appeared first on Percona Database Performance Blog.

ProxySQL Query Rewrite Use Case

Latest MySQL Performance Blog posts - May 2, 2018 - 4:02pm

In this blog post, I’m going to revisit the ProxySQL Query Rewrite feature. You may have seen me talk about possible use case scenarios at the past few conferences, but the reason I’m starting with this is that query rewriting was the original intention behind building ProxySQL.

Why would you need to rewrite a query?

  • You’ve identified a query that’s causing a bottleneck or slowness
  • A special operation requires query routing
  • You cannot modify application code

So here we have a case of a bad query hitting the backend database. You, as a DBA, have identified the query as causing a severe slowdown, which could lead to a site-wide outage. The query needs to be optimized, and you have asked the developer to correct it. Their answer isn’t really what you expected. You can rewrite some queries to return the same result set while taking a different optimizer path. In cases where an application was written using an ORM (such as Hibernate or similar), it is not easy to quickly make a code change.

The query rewrite feature of ProxySQL makes this possible (until the application can be modified).

How do we rewrite a query? There are two ways to accomplish this with ProxySQL.

Query rewrite is just a match_pattern + replace_pattern activity, whereas match_digest is only used for matching a query, not rewriting it. Logically, match_digest serves the same purpose as username, schemaname, proxy_addr, etc.: it only matches the query.

These two different mechanisms offer ways to optimize the query-matching operation efficiently, depending on the query type (such as a DML operation versus a SELECT query). Please note that if your intention is to rewrite queries, the rule must match the original query using match_pattern. Query rules are processed in order of the rule_id field and are only applied if active = 1.

Here’s how we can demonstrate match_digest in our test lab:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 10;
+----+-----------+------------+-----------------------------------+
| hg | sum_time | count_star | digest_text |
+----+-----------+------------+-----------------------------------+
| 0 | 243549572 | 85710 | SELECT c FROM sbtest10 WHERE id=? |
| 0 | 146324255 | 42856 | COMMIT |
| 0 | 126643488 | 44310 | SELECT c FROM sbtest7 WHERE id=? |
| 0 | 126517140 | 42927 | BEGIN |
| 0 | 123797307 | 43820 | SELECT c FROM sbtest1 WHERE id=? |
| 0 | 123345775 | 43460 | SELECT c FROM sbtest6 WHERE id=? |
| 0 | 122121030 | 43010 | SELECT c FROM sbtest9 WHERE id=? |
| 0 | 121245265 | 42400 | SELECT c FROM sbtest8 WHERE id=? |
| 0 | 120554811 | 42520 | SELECT c FROM sbtest3 WHERE id=? |
| 0 | 119244143 | 42070 | SELECT c FROM sbtest5 WHERE id=? |
+----+-----------+------------+-----------------------------------+
10 rows in set (0.00 sec)

mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_digest, match_pattern,replace_pattern,apply) VALUES (10,1,'root','SELECT.*WHERE id=?','sbtest2','sbtest10',1);
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 593 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)

We can also monitor query rules activity live using the ProxyTop utility.

To reset ProxySQL’s statistics for query rules, use the following steps:

mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

Here’s a match_pattern example:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 5;
+----+----------+------------+----------------------------------+
| hg | sum_time | count_star | digest_text |
+----+----------+------------+----------------------------------+
| 0 | 98753983 | 16292 | BEGIN |
| 0 | 84613532 | 16232 | COMMIT |
| 1 | 49327292 | 16556 | SELECT c FROM sbtest3 WHERE id=? |
| 1 | 49027118 | 16706 | SELECT c FROM sbtest2 WHERE id=? |
| 1 | 48095847 | 16396 | SELECT c FROM sbtest4 WHERE id=? |
+----+----------+------------+----------------------------------+
5 rows in set (0.01 sec)

mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (20,1,'root','DISTINCT(.*)ORDER BY c','DISTINCT1',1);
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 |
| 0 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 1 |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.01 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 9994 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 |
| 6487 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 1 |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.00 sec)

mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

The key to query rule processing for a rewrite is the value of the apply field:

  • apply = 1 means don’t evaluate any other rules if there’s a match already.
  • apply = 0 means evaluate the next rules in the chain.

As we can see in the test below, all queries matching rule_id = 10 or rule_id = 20 get hits. In reality, all rules in runtime_mysql_query_rules are active. If we want to disable a rule that is in the mysql_query_rules table, we set active = 0:

mysql> update mysql_query_rules set apply = 1 where rule_id in (10);
Query OK, 1 row affected (0.00 sec)

mysql> update mysql_query_rules set apply = 0 where rule_id in (20);
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 |
| 0 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 0 |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | flagIN | apply |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| 10195 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | 0 | 1 |
| 6599 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | 0 | 0 |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
2 rows in set (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | flagIN | apply |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| 20217 | 5 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | 0 | 1 |
| 27020 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | 0 | 0 |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
2 rows in set (0.00 sec)

mysql> update mysql_query_rules set active = 0 where rule_id = 5;
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.02 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 0 |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)

mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 4224 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 0 |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.01 sec)

Additionally, ProxySQL can help to identify bad queries. Log in to the admin module and follow these steps:

Find the most time-consuming queries:

mysql> SELECT SUM(sum_time), SUM(count_star), digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time) DESC LIMIT 3\G
*************************** 1. row ***************************
  SUM(sum_time): 95053795
SUM(count_star): 13164
    digest_text: BEGIN
*************************** 2. row ***************************
  SUM(sum_time): 85094367
SUM(count_star): 13130
    digest_text: COMMIT
*************************** 3. row ***************************
  SUM(sum_time): 52110099
SUM(count_star): 13806
    digest_text: SELECT c FROM sbtest3 WHERE id=?
3 rows in set (0.00 sec)

Find highest average execution time:

mysql> SELECT SUM(sum_time), SUM(count_star), SUM(sum_time)/SUM(count_star) avg, digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time)/SUM(count_star) DESC limit 1;
+---------------+-----------------+--------+--------------------------------+
| SUM(sum_time) | SUM(count_star) | avg | digest_text |
+---------------+-----------------+--------+--------------------------------+
| 972162 | 1 | 972162 | CREATE INDEX k_5 ON sbtest5(k) |
+---------------+-----------------+--------+--------------------------------+
1 row in set (0.00 sec)

The above information can also be gathered from performance_schema.events_statements_summary_by_digest, but I prefer the ProxySQL admin interface. You can also analyze the slow query log with a detailed pt-query-digest run on your system to identify slow queries, or use PMM’s QAN.
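For reference, a roughly equivalent query against Performance Schema (MySQL 5.7 column names; SUM_TIMER_WAIT is reported in picoseconds) could look like this:

mysql> SELECT digest_text, count_star, sum_timer_wait/1000000000000 AS total_seconds FROM performance_schema.events_statements_summary_by_digest ORDER BY sum_timer_wait DESC LIMIT 3;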

Conclusion

I’ve found that the best documentation on query rewrite fundamentals is at IBM’s site, where they explain the techniques with examples. It’s worth a read. I’m not going to get into the details of these techniques here, but if you find more relevant resources, please post them in the comments section.

A few of the possible query optimization techniques:

  • Operation merging
  • Operation movement
  • Predicate translation

At the time of this blog post, ProxySQL has also announced a new fast schema routing algorithm to support thousands of shards.

There may be other cases where you want to divert traffic to another table. Think of a table hitting the maximum integer value, where you want to keep inserts going into a new table while you alter the old one to correct the issue. In the meantime, all selects can still point to the old table to continue operation.
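A hedged sketch of such a diversion rule (the table names orders and orders_new are hypothetical, and the match_pattern regex must be validated against the real statements your application sends):

mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (30,1,'root','INSERT INTO orders','INSERT INTO orders_new',1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;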

As of MySQL 5.7.6, Oracle also offers query rewrite as a plugin, and you can find the documentation here. The biggest disadvantage of using Oracle’s built-in solution is that the rewrite rule sits on the server it is implemented on. That’s where ProxySQL has a bigger advantage: it sits between the application and the database server, so the rule applies to the entire topology, not just to a single host.

As you can see, ProxySQL query rewrite is a great way to solve some real operational issues and make you a hero to the team and project. To become a rock star, you might want to consider Percona Training on ProxySQL. The training will provide the knowledge to set up a ProxySQL environment with best practices, understand when and how to change the configuration, and maintain it to help increase your uptime SLAs. Contact us for more details at info@percona.com.

References:

https://www.percona.com/blog/2017/04/10/proxysql-rules-do-i-have-too-many/

http://www.proxysql.com/blog/query-rewrite-with-proxysql-use-case-scenario

https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration#query-rewrite

https://dev.mysql.com/doc/refman/5.7/en/rewriter-query-rewrite-plugin.html

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005293.html

The post ProxySQL Query Rewrite Use Case appeared first on Percona Database Performance Blog.

Installed PMM-Server on a new server and I cannot select all hosts any longer

Lastest Forum Posts - May 2, 2018 - 8:19am
On an older machine, I had PMM running, and in the MySQL overview I was able to select “All” hosts, which displayed all of my hosts on a single graph. It was very nice for monitoring. Since I installed PMM Server on a new server, the first host is automatically selected in the hosts list and I can select others, but the “All” option is not there any longer.

high CPU usage on pmm server

Lastest Forum Posts - May 2, 2018 - 3:41am
I've installed PMM Server v1.10.0 via the Docker deployment method.
This is a dedicated server only running PMM Server; it has 16 cores / 48GB RAM / 1000GB SSD and monitors about 60 MySQL servers.
What puzzles me is the extremely high CPU usage, almost always 100%,
which causes bad response times.
I used -e METRICS_MEMORY=30720000 and -e METRICS_RETENTION=360h to limit it to 30GB RAM usage and 15 days retention.

What should I do to improve PMM response?
Thanks in advance.

added new user, but can not login

Lastest Forum Posts - May 2, 2018 - 1:16am
I've installed PMM v1.10.0 via Docker.
After installation, I added a new user via the web page,
but the new user could not log in.
I found the user was added to the SQLite user table of Grafana.
What did I miss in the configuration?

MongoDB Rollback in replicaset

Latest MySQL Performance Blog posts - May 1, 2018 - 3:30pm

In this blog post, we’ll look at how MongoDB rollback works during replicaset failovers.

In recent versions, MongoDB has provided lots of features related to replicaset and automatic failover. When it comes to failover, the next question that arises is “How does MongoDB ROLLBACK work during replicaset failover?”

If a PRIMARY member (say node A) steps down with some data writes that were executed but not yet replicated to the SECONDARY members, then a ROLLBACK occurs on the former PRIMARY A when it rejoins the replicaset. I’ll explain below how the ROLLBACK works!

ROLLBACK Scenario:

ROLLBACK is rare in a replicaset as MongoDB tries to avoid it by replicating the operations from PRIMARY to SECONDARY without delay, under normal conditions. Most of the time ROLLBACK occurs in the event of network partitioning, or if SECONDARY members can’t keep up with the throughput of operations on the former PRIMARY.

ROLLBACK Process:

We will see the process with a test. I have used Docker for this test with the MongoDB 3.2 Jessie version to set up a replicaset with members mongo1 – A, mongo2 – B, mongo3 – C, and set priority 10 on A. Now A is PRIMARY, as expected in the replicaset. We need to write some data into A and create a network partition scenario with B and C at the same time. For that, I inserted 25000 documents into A and took it off the network at the same time.

Terminal 1 (A’s mongo prompt):

my-mongo-set:PRIMARY> for (var i = 1; i <= 25000; i++) {
...    db.testData.insert( { x : i } )
... }
WriteResult({ "nInserted" : 1 })
my-mongo-set:PRIMARY> db.testD2018-03-30T17:34:51.455+0530 I NETWORK  [thread1] trying reconnect to 127.0.0.1:30001 (127.0.0.1) failed
2018-03-30T17:34:51.464+0530 I NETWORK  [thread1] reconnect 127.0.0.1:30001 (127.0.0.1) ok
db.testD
admin.testD
my-mongo-set:SECONDARY> rs.slaveOk()
my-mongo-set:SECONDARY> db.testData.count()
25000

Terminal 2:

Vinodhs-MBP:~ vinodhkrish$ docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED     STATUS      PORTS                      NAMES
b27d82ac2439        mongo:3.2.19-jessie   "docker-entrypoint.s…"   2 days ago  Up 1 days   0.0.0.0:30003->27017/tcp   mongo3
2b39f9e41973        mongo:3.2.19-jessie   "docker-entrypoint.s…"   2 days ago  Up 1 days   0.0.0.0:30002->27017/tcp   mongo2
105b6df757d7        mongo:3.2.19-jessie   "docker-entrypoint.s…"   2 days ago  Up 1 days   0.0.0.0:30001->27017/tcp   mongo1
Vinodhs-MBP:~ vinodhkrish$ docker network disconnect my-mongo-cluster mongo1

Member A has now become a SECONDARY, because it couldn’t reach the other members in the replicaset. On the other side, members B and C see that A is not reachable, and B is elected as PRIMARY. We can see that some inserts from the former PRIMARY A replicated to B before the network split happened.

(B node)

my-mongo-set:PRIMARY> db.testData.count()
15003

Now do some write operations on the current PRIMARY (B), and then let node A rejoin the network by joining the container back to the bridge network. You can observe below that node A’s member state changes in the mongo prompt. (I just connected to A and pressed the ENTER/RETURN key many times to see the member states; you can also see them in the log file.)

(A node)

Vinodhs-MacBook-Pro:mongodb-osx-x86_64-3.2.19 vinodhkrish$ ./bin/mongo 127.0.0.1:30001/admin
MongoDB shell version: 3.2.19
connecting to: 127.0.0.1:30001/admin
my-mongo-set:ROLLBACK>
my-mongo-set:RECOVERING>
my-mongo-set:SECONDARY>
my-mongo-set:SECONDARY>
my-mongo-set:PRIMARY>

ROLLBACK Internal

From MongoDB’s point of view, let’s look at the replicaset process to understand what happened above. Normally, a SECONDARY member syncs the oplog entries from its syncSource (the member from which the data is replicated) by using the OplogFetcher. The OplogFetcher first sends a find() command to the syncSource’s oplog, and then follows with a series of getMores on the cursor. When node A rejoins the replicaset, node A’s OplogFetcher first sends a find() command to syncSource node B with a greater-than-or-equal predicate on the timestamp of the last oplog entry it has fetched. Usually the find() command should return at least one document due to that predicate. If not, it means that the syncSource is behind, so node A will not replicate from it and will look for another syncSource.

In this case, A’s oplogFetcher sees that the first document returned from node B does not match the last entry in its oplog. That means node A’s oplog has diverged from node B’s and it should go into ROLLBACK.

Node A first finds the common point between its oplog and its syncSource B’s oplog. It then goes through all of the operations in its oplog back to the common point and figures out how to undo them. Here, 9997 inserts are missed from B and C nodes, and so these documents will be recovered from A’s oplog.

2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 4, timestamp: Mar 30 12:03:52:139). source's GTE: (term: 5, timestamp: Mar 30 12:05:37:1) hashes: (3789163619674410187/3226093795606474294)
2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] rollback 0
2018-03-30T12:08:37.160+0000 I REPL     [ReplicationExecutor] transition to ROLLBACK
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] beginning rollback
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] rollback 1
2018-03-30T12:08:37.164+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Mar 30 12:03:52:139
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback their last optime: Mar 30 12:08:17:1c5
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -265 seconds
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback common point is (term: 4, timestamp: Mar 30 12:03:46:d2)
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback 3 fixup
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] rollback 3.5
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] Setting minvalid to (term: 5, timestamp: Mar 30 12:08:17:1c5)
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4 n:1
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.6
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.7
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 5 d:9997 u:0
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 6
2018-03-30T12:08:38.394+0000 I REPL     [rsBackgroundSync] rollback done
2018-03-30T12:08:38.396+0000 I REPL     [rsBackgroundSync] rollback finished

ROLLBACK data

Where would these 9997 recovered documents go? MongoDB writes these ROLLBACK documents under the rollback directory in the dbpath. The rollback files are named with the namespace as the prefix and the date and time as the suffix. They are in BSON format, and we need to convert them into JSON to analyze them and plan the next course of action. In our case, the testData collection’s rollback data is as follows:

root@105b6df757d7:/# cd /data/db
root@105b6df757d7:/data/db# ls -l rollback/
total 324K
-rw-r--r-- 1 mongodb mongodb 323K Mar 30 12:08 admin.testData.2018-03-30T12-08-38.0.bson

root@105b6df757d7:/data/db/rollback# bsondump admin.testData.2018-03-30T12-08-38.0.bson > rollback.json
2018-03-30T12:13:00.033+0000 9997 objects found
root@105b6df757d7:/data/db/rollback# head rollback.json
{"_id":{"$oid":"5abe279f97044083811b5975"},"x":15004.0}
{"_id":{"$oid":"5abe279f97044083811b5976"},"x":15005.0}
{"_id":{"$oid":"5abe279f97044083811b5977"},"x":15006.0}
{"_id":{"$oid":"5abe279f97044083811b5978"},"x":15007.0}
{"_id":{"$oid":"5abe279f97044083811b5979"},"x":15008.0}
{"_id":{"$oid":"5abe279f97044083811b5980"},"x":15009.0}
{"_id":{"$oid":"5abe279f97044083811b5981"},"x":15010.0}

That’s it. Now check the count of the testData collection on node A:

my-mongo-set:PRIMARY> db.testData.count()
15003

So the 9997 records that were rolled back into the rollback directory were also dropped from the collection. This ensures data consistency throughout the replicaset.

How to avoid ROLLBACK – writeConcern

The default writeConcern in a replicaset is w:1, i.e., when a client writes into the replicaset, it receives an acknowledgment from the PRIMARY alone and does not wait for the SECONDARY members’ acknowledgment. If you want to avoid the ROLLBACK scenario in your environment, you have to use {w:"majority"} or {w:n}, where 1 < n <= (number of members in your replica set). This ensures that writes are propagated to that many members of the replica set before the acknowledgment is sent to the client, which solves the problem of ROLLBACK.

But please be careful not to set writeConcern too high, because it also affects write performance: the acknowledgment needs to be received from the number of members specified in the value. The value {w:"majority"} provides acknowledgment that write operations have propagated to the majority of voting nodes, including the primary, and is suitable for most environments.
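As a minimal illustration in the mongo shell (the document value and the wtimeout of 5000 ms are just assumptions), a write that waits for acknowledgment from a majority of voting members looks like this:

my-mongo-set:PRIMARY> db.testData.insertOne({ x : 25001 }, { writeConcern: { w: "majority", wtimeout: 5000 } })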

ROLLBACK – Limitation

The main thing to note here is that mongod will not roll back more than 300MB of data. In such cases, we need to manually check the instance to recover the data. You will see the message below in mongod.log in such cases:

[replica set sync] replSet syncThread: 13410 replSet too much data to roll back

Understanding this simple ROLLBACK background helps us to decide what needs to be done with the rolled-back data. It also helps us avoid such scenarios, because data is data and is very important!

The post MongoDB Rollback in replicaset appeared first on Percona Database Performance Blog.

Webinar Thursday May 3, 2018: Running MongoDB in Production (Part 3)

Latest MySQL Performance Blog posts - May 1, 2018 - 1:29pm

Please join Percona’s Senior Technical Operations Architect, Tim Vaillancourt as he presents Running MongoDB in Production (Part 3) on Thursday, May 3, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Are you a seasoned MySQL DBA that needs to add MongoDB to your skills? Are you used to managing a small environment that runs well, but want to know what you might not know yet?

MongoDB works well, but when it has issues the number one question is “where should I go to solve a problem?”

This webinar on running MongoDB covers:

  • Troubleshooting
    • Log File
    • Slow Query
    • Operations
  • Schema Design
    • Data Types
    • Indexes
    • Workflows
  • Data Integrity
    • Replica Sets
    • Write Concerns
    • Data Recovery
  • Scaling (Read/Writes)

Register for the webinar now.

Missed Part 1 and Part 2 of our Running MongoDB in Production series? You can watch and download the slides of Part 1 here and watch or download the slides of Part 2 here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with the goal of making MongoDB operations as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thursday May 3, 2018: Running MongoDB in Production (Part 3) appeared first on Percona Database Performance Blog.

ClickHouse Meetup in Salt Lake City

Latest MySQL Performance Blog posts - May 1, 2018 - 12:07pm

Join Percona CTO Vadim Tkachenko at the Cloud Native Utah meetup in Salt Lake City on Tuesday, May 8, 2018, for an Intro to ClickHouse.

Next week, I’ll be switching from MyRocks performance testing to presenting an introduction to ClickHouse at the Cloud Native Utah meetup.

Interestingly enough, even though it is totally different from OLTP engines, ClickHouse uses a MergeTree engine. MergeTree engines have a lot of similarities with Log Structured Merge Trees (which is what is used by MyRocks / RocksDB). This structure is optimized for huge datasets and low-memory scenarios.

PingCAP TiDB and CockroachDB – the new databases on the block – are using RocksDB as the main storage engine. So is Log Structured Merge Tree the future of databases?

We can talk about this and other questions next week in Salt Lake City. If you are in town please join us at the Cloud Native Utah meetup.

 

The post ClickHouse Meetup in Salt Lake City appeared first on Percona Database Performance Blog.

A Look at MyRocks Performance

Latest MySQL Performance Blog posts - April 30, 2018 - 4:35pm

In this blog post, I’ll look at MyRocks performance through some benchmark testing.

As the MyRocks storage engine (based on the RocksDB key-value store http://rocksdb.org ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage. I wanted to check how it performs for different amounts of available memory for the given database size. This is similar to the benchmark I published a while ago for InnoDB (https://www.percona.com/blog/2010/04/08/fast-ssd-or-more-memory/).

In this case, I plan to use a sysbench-tpcc benchmark (https://www.percona.com/blog/2018/03/05/tpcc-like-workload-sysbench-1-0/) and I will execute it for both MyRocks and InnoDB. We’ll use InnoDB as a baseline.

For the benchmark, I will use 100 TPC-C warehouses, with a set of 10 tables (to shift the bottleneck from row contention). This should give roughly 90GB of data size (when loaded into InnoDB) and is roughly equivalent to a 1000-warehouse data size.

To vary the memory size, I will change innodb_buffer_pool_size from 5GB to 100GB for InnoDB, and rocksdb_block_cache_size for MyRocks.
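For reference, both knobs are set in the configuration file; a sketch for one of the intermediate points (50GB) might look like this:

[mysqld]
# InnoDB runs
innodb_buffer_pool_size = 50G
# MyRocks runs
rocksdb_block_cache_size = 50G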

For MyRocks, we will use LZ4 as the default compression on disk. The data size in the MyRocks storage engine is 21GB. It is interesting to note that the uncompressed size in MyRocks is 70GB on storage.

For both engines, I did not use FOREIGN KEYS, as MyRocks does not support them at the moment.

MyRocks does not support SELECT .. FOR UPDATE statements in REPEATABLE-READ mode in the Percona Server for MySQL implementation. However, “SELECT .. FOR UPDATE” is used in this benchmark. So I had to use READ-COMMITTED mode, which is supported.

The most important setting I used was to enable binary logs, for the following reasons:

  1. Any serious production uses binary logs
  2. With binary logs disabled, MyRocks is affected by a suboptimal transaction coordinator

I used the following settings for binary logs:

  • binlog_format = ‘ROW’
  • binlog_row_image=minimal
  • sync_binlog=10000 (I am not using 0, as this causes serious stalls during binary log rotations, when the content of the binary log is flushed to storage all at once)

While I am not a full expert in MyRocks tuning yet, I used recommendations from this page: https://github.com/facebook/mysql-5.6/wiki/my.cnf-tuning. The Facebook-MyRocks engineering team also provided me input on the best settings for MyRocks.

Let’s review the results for different memory sizes.

This first chart shows throughput jitter. This helps to understand the distribution of throughput results. Throughput is measured every 1 second, and on the chart I show all measurements after 2000 seconds of a run (the total length of each run is 3600 seconds). So I show the last 1600 seconds of each run (to remove warm-up phases):

To better quantify results, let’s take a look at them on a boxplot. The quickest way to understand boxplots is to take a look at the middle line. It represents a median of measurements (see more at https://www.percona.com/blog/2012/02/23/some-fun-with-r-visualization/):

Before we jump to the summary of results, let’s take a look at a variation of the throughput for both InnoDB and MyRocks. We will zoom to a 1-second resolution chart for 100 GB of allocated memory:

We can see that there is a lot of variation with periodical 1-second performance drops with MyRocks. At this moment, I do not know what causes these drops.

So let’s take a look at the average throughput for each engine for different memory settings (the results are in tps, and more is better):

Memory, GB    InnoDB      MyRocks
5             849.0664    4205.714
10            1321.9      4298.217
20            1808.236    4333.424
30            2275.403    4394.413
40            2968.101    4459.578
50            3867.625    4503.215
60            4756.551    4571.163
70            5527.853    4576.867
80            5984.642    4616.538
90            5949.249    4620.87
100           5961.2      4599.143

This is where MyRocks behaves differently from InnoDB. InnoDB benefits greatly from additional memory, up to the size of the working dataset. After that, there is no reason to add more memory.

At the same time, interestingly MyRocks does not benefit much from additional memory.

Basically, MyRocks performs as expected for a write-optimized engine. You can refer to my article How Three Fundamental Data Structures Impact Storage and Retrieval for more details. 

In conclusion, InnoDB performs better (compared to itself) when the working dataset fits (or almost fits) into available memory, while MyRocks can operate (and outperform InnoDB) on small memory sizes.

IO and CPU usage

It is worth looking at resource utilization for each engine. I took vmstat measurements for each run so that we can analyze IO and CPU usage.

First, let’s review writes per second (in KB/sec). Please keep in mind that these writes include binary log writes too, not just writes from the storage engine.

Memory, GB    InnoDB      MyRocks
5             244754.4    87401.54
10            290602.5    89874.55
20            311726      93387.05
30            313851.7    93429.92
40            316890.6    94044.94
50            318404.5    96602.42
60            276341.5    94898.08
70            217726.9    97015.82
80            184805.3    96231.51
90            187185.1    96193.6
100           184867.5    97998.26

We can also calculate how many writes per transaction each storage engine performs:

This chart shows the essential difference between InnoDB and MyRocks. MyRocks, being a write-optimized engine, uses a constant amount of writes per transaction.

For InnoDB, the amount of writes greatly depends on the memory size. The less memory we have, the more writes it has to perform.
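As a rough worked example from the two tables above: at 5GB of memory, InnoDB writes about 244754 / 849 ≈ 288 KB per transaction, while MyRocks writes about 87402 / 4206 ≈ 21 KB per transaction.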

What about reads?

The following table shows reads in KB per second.

Memory, GB    InnoDB      MyRocks
5             218343.1    171957.77
10            171634.7    146229.82
20            148395.3    125007.81
30            146829.1    110106.87
40            144707      97887.6
50            132858.1    87035.38
60            98371.2     77562.45
70            42532.15    71830.09
80            3479.852    66702.02
90            3811.371    64240.41
100           1998.137    62894.54

We can translate this to the number of reads per transaction:

This shows MyRocks’ read-amplification. The allocation of more memory helps to decrease IO reads, but not as much as for InnoDB.

CPU usage

Let’s also review CPU usage for each storage engine. Let’s start with InnoDB:

The chart shows that for 5GB memory size, InnoDB spends most of its time in IO waits (green area), and the CPU usage (blue area) increases with more memory.

This is the same chart for MyRocks:

In tabular form:

Memory, GB    engine     us    sys    wa    id
5             InnoDB      8      2    57    33
5             MyRocks    56     11    18    15
10            InnoDB     12      3    57    28
10            MyRocks    57     11    18    13
20            InnoDB     16      4    55    25
20            MyRocks    58     11    19    11
30            InnoDB     20      5    50    25
30            MyRocks    59     11    19    10
40            InnoDB     26      7    44    24
40            MyRocks    60     11    20     9
50            InnoDB     35      8    38    19
50            MyRocks    60     11    21     7
60            InnoDB     43     10    36    10
60            MyRocks    61     11    22     6
70            InnoDB     51     12    34     4
70            MyRocks    61     11    23     5
80            InnoDB     55     12    31     1
80            MyRocks    61     11    23     5
90            InnoDB     55     12    32     1
90            MyRocks    61     11    23     4
100           InnoDB     55     12    32     1
100           MyRocks    61     11    24     4

We can see that MyRocks uses a lot of CPU (in us+sys state) no matter how much memory is allocated. This leads to the conclusion that MyRocks performance is limited more by CPU performance than by available memory.

MyRocks directory size

As MyRocks writes all changes and compacts SST files down the road, it would be interesting to see how the data directory size changes during the benchmark so we can estimate our storage needs. Here is a chart of the data directory size:

We can see that the data directory goes from 20GB at the start to 31GB during the benchmark. It is interesting to observe the data growing until compaction shrinks it.

Conclusion

In conclusion, I can say that MyRocks performance increases as the ratio of dataset size to memory increases, outperforming InnoDB by almost five times in the case of 5GB memory allocation. Throughput variation is something to be concerned about, but I hope this gets improved in the future.

MyRocks does not require a lot of memory and shows constant write IO, while using most of the CPU resources.

I think this potentially makes MyRocks a great choice for cloud database instances, where both memory and IO can cost a lot. MyRocks would be cheaper to deploy in the cloud.

I will follow up with further cloud-oriented benchmarks.

Extras: Raw results, scripts and config

My goal is to provide fully repeatable benchmarks. To this end, I’m sharing all the scripts and settings I used in the following GitHub repo:

https://github.com/Percona-Lab-results/201803-sysbench-tpcc-myrocks

MyRocks settings

rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_table_cache_numshardbits=6
# rate limiter
rocksdb_bytes_per_sync=16777216
rocksdb_wal_bytes_per_sync=4194304
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=32m;max_bytes_for_level_base=512m;max_write_buffer_number=4;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;max_write_buffer_number=4;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0"
rocksdb_max_subcompactions=4
rocksdb_compaction_readahead_size=16m
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON

InnoDB settings

# files
innodb_file_per_table
innodb_log_file_size=15G
innodb_log_files_in_group=2
innodb_open_files=4000
# buffers
innodb_buffer_pool_size=200G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M
# tune
innodb_doublewrite=1
innodb_support_xa=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=1
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=1024
innodb_page_cleaners=4
join_buffer_size=256K
sort_buffer_size=256K
innodb_use_native_aio=1
innodb_stats_persistent=1
#innodb_spin_wait_delay=96
# perf special
innodb_adaptive_flushing=1
innodb_flush_neighbors=0
innodb_read_io_threads=4
innodb_write_io_threads=2
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_purge_threads=4
innodb_adaptive_hash_index=1

Hardware spec

Supermicro server:

  • CPU:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
  • Filesystem: ext4
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

The post A Look at MyRocks Performance appeared first on Percona Database Performance Blog.

Keep Sensitive Data Secure in a Replication Setup

Latest MySQL Performance Blog posts - April 30, 2018 - 2:47pm

This blog post describes how to keep sensitive data secure on slave servers in a MySQL async replication setup.

Almost every web application has sensitive data: passwords, SSNs, credit cards, emails, etc. Splitting the database into secure and “public” parts allows us to restrict user and application access to the sensitive data.

Field encryption

This is based on MySQL encryption functions or on client-side encryption when the authorized user knows a secret, but encrypted data is distributed to all slaves.

  • If possible, use hashes with a big enough salt, and do not store real sensitive data in the database. A good example is passwords. An end-user sends the login and password, the application/SQL code calculates a hash with a salt value unique to each end-user, and compares the hash with the value stored in the database. Even if an attacker gets the hashes, it is still hard or even impossible to extract the real passwords for all users. Make sure that you are using a good random number generator for the salt, an application-side secret, and a good hash function (not MD5); see the sketch after this list.
  • Encryption is not suitable if you are going to provide public access to your database (via slave dumps in sql/csv/xml/json format).
  • Encryption is a complex topic. Check here for a good blog post explaining hashing usage, and try to find a security consultant if you are inventing some “new” method of storing and encrypting data.
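A deliberately simplified sketch of the hashing idea in SQL (the users table and column names are hypothetical, and a real application should compute the hash on the application side with a dedicated password-hashing function rather than plain SHA2 in SQL):

mysql> SET @salt = HEX(RANDOM_BYTES(16));
mysql> INSERT INTO users (login, salt, pwd_hash) VALUES ('alice', @salt, SHA2(CONCAT(@salt, 's3cret'), 512));
mysql> -- verification: recompute with the stored salt and compare
mysql> SELECT login FROM users WHERE login = 'alice' AND pwd_hash = SHA2(CONCAT(salt, 's3cret'), 512);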
Field encryption example

I’m using a single server setup, because the most important part of data separation should be done on the application side. The secure part of the application has a secret passphrase. For example, you can place the code working with authentication, full profile and payments on a separate server and use a dedicated MySQL account.

create database encrypted;
use encrypted;
create table t(c1 int, c2 varchar(255), rnd_pad varbinary(16), primary key(c1));
SET block_encryption_mode = 'aes-256-cbc';
SET @key_str = SHA2('My secret passphrase',512);
SET @init_vector = RANDOM_BYTES(16);
insert into t (c1,c2, rnd_pad) values (1, AES_ENCRYPT('Secret', @key_str, @init_vector), @init_vector);
-- decrypt data
select c1, AES_DECRYPT(c2,@key_str, rnd_pad) from t;

Summary
  • GOOD: Master and slave servers have exactly the same data and no problems with replication.
  • GOOD: Even if two different end-users have exactly the same password, the stored values are different due to random bytes in the init vector for AES encryption.
  • GOOD: Both the encryption and random number generation uses an external library (openssl).
  • CONF: It’s important to have binlog_format=ROW to avoid sending the secret to slave servers.
  • CONF: Do not allow end-users to change data without changing the init_vector, especially for small strings without random padding. Each update should cause init_vector re-generation.
  • BAD: Encrypted data is still sent to slave servers. If the encryption algorithm or protocol is broken, it is possible to get access to data from an insecure part of the application.
  • BAD: The described protocol still could be insecure.
Replication filters

There are two types of replication filters: master-side filters using the binlog-*-db options, and slave-side filters using the replicate-* options.

Both could cause replication breakage. Replication filters were created for STATEMENT-based replication and are problematic with a modern binlog_format=ROW + gtid_mode=ON setup. You can find several cases related to database-level slave-side filters in this blog post. If you still need slave-side filtering, use the per-table replicate-wild-*-table options.
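A hedged example of the slave-side per-table form in the configuration file (the secure% naming pattern is hypothetical; adjust it to your schema):

[mysqld]
replicate-wild-ignore-table = secure%.%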

Master-side

Even if binary logging is disabled for a specific database, a statement could still be stored in the binary log if it is a DDL statement, or if binlog_format is STATEMENT or MIXED and the statement does not use the default database. For details, see the reference manual for the binlog-do-db option. In order to avoid replication issues, you should use ROW-based replication and run SET SESSION sql_log_bin=0; before each DDL statement that is executed against the ignored database. It’s not a good idea to use binlog-do-db, because you lose control of what should be replicated.

Why is binary log filtering useful? Changing the sql_log_bin variable is prohibited inside transactions, which makes consistent updates of multiple entities inside the database problematic. The sql_log_bin variable is DANGEROUS; please do not use it instead of binlog-ignore-db in production on the application side. If you need it for database administration, make sure that you always type the “SESSION” keyword before setting sql_log_bin.
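For completeness, the master-side form discussed here would be configured roughly like this (the database name secure is hypothetical):

[mysqld]
binlog_format = ROW
binlog-ignore-db = secure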

We should still have the ability to hide just one column of a table. But if we are ignoring the whole database, we should provide a method of reading the non-secure data on slaves or by restricted MySQL accounts. This is possible with triggers and views:

create database test;
set session sql_log_bin=0;
create table test.t(c1 int, c2 int, primary key(c1));
alter table test.t add primary key(c1);
set session sql_log_bin=1;
create database test_insecure;
create table test_insecure.t(c1 int, c2 int default NULL, primary key(c1));
use test
delimiter //
create trigger t_aft_ins after insert on test.t
FOR EACH ROW
BEGIN
  INSERT test_insecure.t (c1) values (NEW.c1);
END //
create trigger t_aft_upd after update on test.t
FOR EACH ROW
BEGIN
  UPDATE test_insecure.t SET c1 = NEW.c1 WHERE c1 = OLD.c1;
END //
create trigger t_aft_del after delete on test.t
FOR EACH ROW
BEGIN
  DELETE FROM test_insecure.t WHERE c1 = OLD.c1;
END //
delimiter ;
-- just on slave:
create database test;
create view test.t as select * from test_insecure.t;
-- typical usage
INSERT INTO test.t values(1,1234);
SELECT * from test.t; -- works on both master and slave, c2 field will have NULL value on slave.

Summary
  • BAD: The data is not the same on the master and slaves. It potentially breaks replication. It’s not possible to use a slave’s backup to restore the master or promote the slave as a new master.
  • BAD: Triggers could reduce DML statement performance.
  • GOOD: The sensitive data is not sent to slaves at all (and not written to binary log).
  • GOOD: It works with GTID
  • GOOD: It requires no application changes (or almost no application changes).
  • GOOD: binlog-ignore-db allows us to not use the dangerous sql_log_bin variable after initial table creation.

The post Keep Sensitive Data Secure in a Replication Setup appeared first on Percona Database Performance Blog.

application access error

Lastest Forum Posts - April 30, 2018 - 8:11am
hello
I’m facing serious error while trying access my application through percona xdb cluster

Caused by: java.sql.SQLException: User does not have access to metadata required to determine stored procedure parameter types. If rights cannot be granted, configure connection with "noAccessToProcedureBodies=true" to have driver generate parameters that represent INOUT strings irregardless of actual parameter types.

For more info: I’m using the same database dump and the same application source on MySQL 5.5 without any kind of problem.
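A common workaround on MySQL/Percona Server 5.x, assuming rights can be granted, is to give the application account read access to the stored routine metadata that Connector/J needs (the account name below is hypothetical); otherwise, add noAccessToProcedureBodies=true to the JDBC connection properties as the error message suggests:

-- Run as a privileged user; replace 'appuser'@'%' with the real application account.
GRANT SELECT ON mysql.proc TO 'appuser'@'%';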

RDS performance_schema

Latest Forum Posts - April 29, 2018 - 7:54pm
The PMM QAN API generates very high-load queries against performance_schema.events_statements_history on all of our RDS instances.
As you know, the Aurora performance_schema variables (performance_schema_digests_size, performance_schema_events_stages_history_size) cannot be modified.
So how can I reduce the load?

thanks.



PMM server High Availability

Latest Forum Posts - April 28, 2018 - 9:28pm
Do we have any case study on how to achieve High Availability for our PMM server? If we lose the VM on which PMM runs, we would lose a lot of metrics data, and both the metrics and the alerting system would come to a halt. Are there any suggestions on how to achieve High Availability for the PMM server with a minimal window of lost metrics data?

This Week In Data with Colin Charles 37: Percona Live 2018 Wrap Up

Latest MySQL Performance Blog posts - April 27, 2018 - 3:17pm

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 is now over! All things considered, I think it went off quite well; if you have any comments/complaints/etc., please don’t hesitate to drop me a line. I believe a survey will be going out as to where you’d like to see the conference in 2019 – yes, it is no longer going to be at the Santa Clara Convention Centre.

I was pleasantly surprised that several people came up to me saying they read this column and enjoy it. Thank you!

The whole conference was abuzz with MySQL 8.0 GA chatter. Many seemed to enjoy the PostgreSQL focus too, now that Percona announced PostgreSQL support.

Congratulations as well to the MySQL Community Awards 2018 winners.

Releases · Link List · Upcoming appearances · Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week In Data with Colin Charles 37: Percona Live 2018 Wrap Up appeared first on Percona Database Performance Blog.

MySQL 8.0 GA: Quality or Not?

Latest MySQL Performance Blog posts - April 27, 2018 - 12:36pm

What does Anton Ego – a fictional restaurant critic from the Pixar movie Ratatouille – have to do with MySQL 8.0 GA?

When it comes to being a software critic, a lot.

In many ways, the work of a software critic is easy. We risk very little and thrive on negative criticism, which is fun to read and write.

But what about those who give their many hours of code development, and those who have tested such code before release? How about the many people behind the scenes who brought together packaging, documentation, multiple hours of design, marketing, online resources and more?

And all of that, I might add, is open source! Free for the world to take, copy, adapt and even incorporate in full or in part into their own open development.

It is in exactly that area that the team at MySQL shines once again – from their humble beginnings they have built up colossally powerful database software that handles much of the world’s data, fast.

Used in every area of life – aerospace, defense, education, finances, government, healthcare, pharma, manufacturing, media, retail, telecoms, hospitality, and finally the web – it truly is a community effort.

My little contribution to this effort is first and foremost to say: well done! Well done for such an all-in-all huge endeavor. When I tested MySQL 8.0, I experienced something new: an extraordinarily clean bug report screen when I unleashed our bug hunting rats, ahem, I mean tools. This was somewhat unexpected. Usually, new releases are a fun playground even for seasoned QA engineers who look for the latest toy to break.

I have a suspicion that the team at Oracle either uses newly-improved bug-finding tools or perhaps they included some of our methods and tools in their setup. In either case, it is, was and will be welcome.

When the unexpected occurs, a fight or flight syndrome happens. I tend to be a fighter, so I upped the battle and managed to find about 30 bugs, with 21 bugs logged already. Quite a few of them are Sig 11’s in release builds. Signal 11 exceptions are unexpected crashes, and release builds are the exact same build you would download at dev.mysql.com.

The debug build also had a number of issues, but less than expected, leading me to the conclusions drawn above. Since Oracle engineers marked many of the issues logged as security bugs, I didn’t list them here. I’ll give Oracle some time to fix them, but I might add them later.

In summary, my personal recommendation is this: unless you are a funky new web company thriving on the latest technology, give Oracle the opportunity to make a few small point bugfix releases before adopting MySQL 8.0 GA. After that, provided the upgrade prerequisites are met and your software application is compatible, go for it and upgrade.

In the meantime, this is a great time to start checking out the latest and greatest that MySQL 8.0 GA has to offer!

All in all, I like what I saw, and I expect MySQL 8.0 GA to have a bright future.

Signed, a seasoned software critic.

The post MySQL 8.0 GA: Quality or Not? appeared first on Percona Database Performance Blog.

The Evolution of the DBA in an “As-A-Service” World

Latest MySQL Performance Blog posts - April 26, 2018 - 12:15pm

The requirements for managing and running a database in a modern enterprise have evolved over the past ten years. Those in charge of running enterprise databases have seen their focus shift from ensuring access and availability, to architecture, design and scalability responsibilities. Web-first companies pioneered the change by charging site reliability engineers (SREs) or multi-faceted DBAs with the task of ensuring that the company’s main revenue engine not only stayed up, but could scale to wherever the business needed to go. This is a far cry from the classic enterprise DBA’s top responsibilities: keep it up, keep it backed up, and react to issues as they present themselves.

Today, enterprises look for new revenue models to keep up with a shifting technology paradigm driven by the cloud. The requirements and needs for managing their database environments are changing along with this shift. In the SaaS world, application outages mean lost revenue. Worse, it leads to customer churn and gives your competitors an opening. To keep revenue flowing, every part of a SaaS company’s critical infrastructure needs to be planned out: redundancy should be built-in, and a future-proof architecture should be built to accommodate scale.

The more issues you can design out before launch, the less chance of a catastrophic outage later on. This means as a SaaS provider you want your DBAs and database engineers architecting a database that avoids problems at scale, and you want them working with your developers to write better, more efficient database calls. The database infrastructure is designed and automated to work at scale, while taking into account efficient use of resources for meeting today’s requirements.

When companies move to the cloud, the cloud provider takes care of much of the operational automation and many of the mundane day-to-day tasks (for example, using database as a service (DBaaS) options such as Amazon RDS and Aurora). But this does not eliminate the need for database expertise: it moves the function closer to the design and development side of the application. Someone needs to not only design and tune the database to support the application, but also has to understand how to build the modular pieces available in the cloud into a cohesive scalable unit that meets the needs of the application and the company. This means there are much higher impacts and clearer ROIs realized from efficient database expertise.

 

Over the years at Percona, we have seen this shift as well. Currently, more than 50% of the support tickets our customers open are related to application design issues, query performance or database infrastructure design. This is a far cry from five years ago when these represented less than 20% of our overall caseload. This makes sense, however, when you think about the maturity of our database products and the technological advances that impact the database. A more stable MySQL and MongoDB, coupled with advances in either homegrown automation or cloud-based infrastructure, reduce the likelihood of common crashing bugs and “Core Database Software” related bugs. Instead, outages and issues are increasingly caused by design decisions, bad code or unplanned-for edge cases. In order to keep up, DBAs need to evolve to move upstream to have the greatest impact.

At Percona, we recognize the changing requirements of modern database deployments. In fact, we have been providing database expertise since the beginning of the SaaS and cloud era. We recognize the needs of clients that choose to run on a DBaaS platform are slightly different than those managing their own full-stack database deployments.

That’s why we created a brand new tier of support focused on DBaaS platforms. These services allow you to rely on your cloud provider for operational break-fix support, while augmenting that with proven world-class expertise focused on the design, development, and tuning of the database itself (which cloud providers typically don’t address).

We also launched a DBaaS-focused version of our Percona DBA service. The Percona DBA service focuses on designing, setting up, and proactively improving your DBaaS cloud environment to ensure you get the most out of your investment. 

Contact us for more details on our new support and managed service options that can help optimize your cloud database environments, and make them run as efficiently as possible with the applications that drive your business.

The post The Evolution of the DBA in an “As-A-Service” World appeared first on Percona Database Performance Blog.

innobackupex --copy-back failed

Latest Forum Posts - April 26, 2018 - 1:24am
Hi,

I got this error when I tried to run copy-back.
Can anyone help solve my issue?

[root@db1 2018-04-26_11-29-01]# innobackupex --copy-back /data/backups/2018-04-26_11-29-01/
180426 16:04:22 innobackupex: Starting the copy-back operation

IMPORTANT: Please check that the copy-back run completes successfully.
At the end of a successful copy-back run innobackupex
prints "completed OK!".

innobackupex version 2.4.10 based on MySQL server 5.7.19 Linux (x86_64) (revision id: 3198bce)
180426 16:04:22 [01] Compressing ibdata1 to /var/lib/mysql/ibdata1
180426 16:04:28 [01] ...done
xtrabackup: Starting 8 threads for parallel data files transfer
180426 16:04:28 [03] Compressing ./ib_buffer_pool to /var/lib/mysql/ib_buffer_pool
180426 16:04:28 [01] Compressing ./xtrabackup_info to /var/lib/mysql/xtrabackup_info
180426 16:04:28 [03] ...done
180426 16:04:28 [01] ...done
Error: copy-back thread 1 failed.
Error: copy-back thread 3 failed.
Error: copy-back thread 4 failed.
Error: copy-back thread 5 failed.
Error: copy-back thread 6 failed.
Error: copy-back thread 7 failed.
[root@db1 2018-04-26_11-29-01]#

Somebody please help.
Why did it fail?
I’m using percona-xtrabackup 2.4.10.

MySQL Community Awards Winners 2018

Latest MySQL Performance Blog posts - April 25, 2018 - 9:45pm

One of the highlights of Percona Live is that the winners of the annual MySQL Community Awards are announced. A 100% community effort, the awards were created to recognize community contribution. This year saw six very deserving winners in three categories:

MySQL Community Awards: Community Contributor of the year 2018

Two individuals received these awards:

  • Jean-François Gagné
    Jean-François was nominated for his many blog posts, bug reports, and experiment results that make MySQL much better. His blog: https://jfg-mysql.blogspot.com/
  • Sveta Smirnova
    Sveta spreads knowledge and good practice on all things MySQL as a frequent speaker and blogger. Her years of experience in testing, support, and consulting are shared in webinars, technical posts, conferences around the world and in her book “MySQL Troubleshooting”. While we’re proud to say that Sveta works for Percona, this reward is for her outstanding individual contribution irrespective of that. Kudos and respect, Sveta!
MySQL Community Awards: Application of the year 2018

Three applications were honoured:

  • MyRocks
    MyRocks is now in MariaDB, Percona Server and PolarDB (Alibaba). Intel, MariaDB and Rockset are optimizing it for cloud native storage.
  • ProxySQL
    ProxySQL solves serious, real-world problems experienced by DBAs in an elegant way.
  • Vitess
    Vitess is a database clustering system for horizontal scaling of MySQL. Originally developed at YouTube/Google and now under CNCF, Vitess is free and open source and aims to solve scaling problems for MySQL installations.
MySQL Community Awards: Corporate Contributor of the year 2018

The awards were presented by Agustín Gallego and Emily Slocombe.

In the spirit of open source, much of the content of this post has been sourced from the MySQL Community Awards website and the full information can be read there. Please do take the time to read the full details and you can also read about past winners and initiatives on that site.

Congratulations to all!

The post MySQL Community Awards Winners 2018 appeared first on Percona Database Performance Blog.

Unable to connect to repo.percona.com

Latest Forum Posts - April 25, 2018 - 9:19pm
Hello,

I've got 2 servers hosted at OVH in Canada, running Debian Jessie, that are no longer able to reach the repo. This just happened in the last 2-3 days or so, I believe. From these servers I am unable to connect to repo.percona.com (74.121.199.234) on port 80 or 443. Has Percona done anything to block access from that country?

From the US I can connect and download updates just fine. From Europe the same, no issues there.

Thanks

Percona Server for MySQL 5.7.21-21 Is Now Available

Latest Forum Posts - April 24, 2018 - 11:24pm
Percona announces the GA release of Percona Server for MySQL 5.7.21-21 on April 24, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

This version of Percona Server for MySQL 5.7.21 includes three new encryption features – Vault keyring plug-in, encryption for InnoDB general tablespaces, and encryption for binary log files.

These new capabilities, which allow companies to immediately increase security for their existing databases, are also part of a larger project to build complete, robust, enterprise-grade encryption capabilities into Percona Server for MySQL, allowing customers and the community to satisfy their most rigorous security compliance requirements. Percona also announced the release of a new version of Percona XtraBackup that supports backing up Percona Server for MySQL instances that have these encryption features enabled.

Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-21 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.
New Features:

  • A new variable innodb_temp_tablespace_encrypt is introduced to turn encryption of temporary tablespace and temporary InnoDB file-per-table tablespaces on/off. Bug fixed #3821.
  • A new variable innodb_encrypt_online_alter_logs simultaneously turns on encryption of files used by InnoDB for merge sort, online DDL logs, and temporary tables created by InnoDB for online DDL. Bug fixed #3819.
  • A new variable innodb_encrypt_tables can be set to ON, making InnoDB tables encrypted by default, to FORCE, disabling creation of unencrypted tables, or OFF, restoring the previous behavior (a usage sketch follows this list). Bug fixed #1525.
  • The query response time plugin can now be disabled at the session level with the new variable query_response_time_session_stats.
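A short sketch of the new encryption default, assuming a keyring plugin is already loaded and that innodb_encrypt_tables can be changed at runtime (if it cannot, set it in the configuration file instead); the table below is hypothetical:

-- ON encrypts new InnoDB tables by default, FORCE also refuses to create
-- unencrypted tables, and OFF restores the previous behavior.
SET GLOBAL innodb_encrypt_tables = ON;
CREATE TABLE app.orders (id INT PRIMARY KEY, created DATETIME); -- hypothetical table, assumes the app schema exists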
Bugs Fixed:

  • Attempting to use a partially-installed query response time plugin could have caused a server crash. Bug fixed #3959.
  • There was a server crash caused by a materialized temporary table from semi-join optimization with key length larger than 1000 bytes. Bug fixed #296.
  • A regression in the original 5.7 port was causing integer overflow with thread_pool_stall_limit variable values bigger than 2 seconds. Bug fixed #1095.
  • A memory leak took place in Percona Server when Performance Schema was used in conjunction with thread pooling. Bug fixed #1096.
  • A code clean-up was done to fix compilation with clang, both general warnings (bug fixed #3814, upstream #89646) and clang 6 specific warnings and errors (bug fixed #3893, upstream #90111).
  • A compilation warning was fixed for the -DWITH_QUERY_RESPONSE_TIME=ON CMake compilation option, which makes QRT be linked statically. Bug fixed #3841.
  • Percona Server returned an empty result for a SELECT query if the number of connections exceeded 65535. Bug fixed #314 (upstream #89313).
  • A clean-up in Percona Server binlog-related code was made to avoid uninitialized memory comparison. Bug fixed #3925 (upstream #90238).
  • mysqldump utility with --innodb-optimize-keys option was incorrectly working with foreign keys on the same table, producing invalid SQL statements. Bugs fixed #1125 and #3863.
  • A fix introduced in Percona Server 5.7.21-20 for the mysqld startup script (which failed to detect the jemalloc library location for preloading, and thus did not start on systemd-based machines) was improved to take a previously created configuration file into account. Bug fixed #3850.
  • The possibility of a truncated bitmap file name was fixed in InnoDB logging subsystem. Bug fixed #3926.
  • Temporary file I/O was not instrumented for Performance Schema. Bug fixed #3937 (upstream #90264).
  • A crash in the unsafe query warning checks with views took place for an UPDATE statement when statement-based binlogging format was used. Bug fixed #290.
MyRocks Changes:

  • A re-implemented variable rpl_skip_tx_api allows turning on simple RocksDB write-batch functionality, increasing replication performance by skipping the transaction API. Bug fixed MYR-47.
  • Decoding value-less padded varchar fields could under some circumstances cause assertion and/or data corruption. Bug fixed MYR-232.
TokuDB Changes:

  • Two new variables introduced for the TokuDB fast updates feature, tokudb_enable_fast_update and tokudb_enable_fast_upsert, should now be used instead of the NOAR keyword, which is now optional at compile time and off by default. Bugs fixed #63 and #148.
  • A set of compilation fixes was introduced to make TokuDB successfully build in MySQL / Percona Server 8.0. Bugs fixed #84, #85, #114, #115, #118, #128, #139, #141, and #172.
  • Conditional compilation code dependent on version ID in the TokuDB tree was separated and arranged to specific version branches. Bugs fixed #133, #134, #135, and #136.
  • An ALTER TABLE ... COMMENT = ... statement caused TokuDB to rebuild the whole table, which is not needed, as only the FRM metadata should be changed. Bugs fixed #130 and #137.
  • Data race on the cache table pair attributes was fixed.
Other bugs fixed: #3793, #3812, #3813, #3815, #3818, #3835, #3875 (upstream #89916), #3843 (upstream #89822), #3848, #3856, #3887, MYR-160, MYR-245, #109, #111, #180, #181, #182, and #188.

The release notes for Percona Server for MySQL 5.7.21-20 are available in the online documentation. Please report any bugs on the project bug tracking system.