Buy Percona ServicesBuy Now!

Open Source Databases on Big Machines: Disk Speed and innodb_io_capacity

Latest MySQL Performance Blog posts - March 1, 2017 - 3:00pm

In this blog post, I’ll look for the bottleneck that prevented the performance in my previous post from achieving better results.

The powerful machine I used in the tests in my previous post has a comparatively slow disk, and therefore I expected my tests would hit a point when I couldn’t increase performance further due to the disk speed.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

Versions tested and configuration: same as in the first post of this series (check the post for specifics).

Even though I expected my tests would stop increasing in performance due to the disk speed, I did not observe high IO rates in the iostat output. I already tested with a full data set that fits in memory. In this case, write performance only affected data flushes and log writes. But we should still see a visible decrease in speed. So I decided to try RW tests totally in memory. I created a ramdisk and put the MySQL datadir on it. Surprisingly, results on the SSD and ramdisk did not differ.

I asked my colleagues from “Postgres Professional” to test PostgreSQL with the ramdisk. They got similar results:

It’s interesting that the value of innodb_io_capacity does not have any effect on this situation. Data for the graph below was taken when I ran tests on ramdisk. I wanted to see if I could control the IO activity of a disk, which is extremely fast by default, using this variable.

This totally contradicts all my past experiences with smaller machines. Percona re-purposed the machine with a faster disk (which I used before, described in this post), so I used a similar one with slower disk speed.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 47.2G
Disk speed: about 3K IOPS
OS: Ubuntu 14.04.5 LTS (trusty)
File system: ext4

Again, in this case innodb_io_capacity benchmarks with a smaller number of CPU cores showed more predictable results.

Conclusion:

Both MySQL and PostgreSQL on a machine with a large number of CPU cores hit CPU resources limits before disk speed can start affecting performance. We only tested one scenario, however. With other scenarios, the results might be different.

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Save

Percona Monitoring and Management (PMM) Graphs Explained: MongoDB MMAPv1

Latest MySQL Performance Blog posts - February 28, 2017 - 3:28pm

This post is part of the series of Percona’s MongoDB 3.4 bundle release blogs. In this blog post, I hope to cover some areas to watch with Percona Monitoring and Management (PMM) when running MMAPv1. The graph examples from this article are from the MMAPv1 dashboard that will be released for the first time in PMM 1.1.2.

Since the very beginning of MongoDB, the MMAPv1 storage engine has existed. MongoDB 3.0 added a pluggable storage engine API. You could only use MMAPv1 with MongoDB before that. While MMAPv1 often offers good read performance, it has become famous for its poor write performance and fragmentation at scale. This means there are many areas to watch for regarding performance and monitoring.

Percona Monitoring and Management (PMM)

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB. It was developed by Percona on top of open-source technology. Behind the scenes, the graphing features this article covers use Prometheus (a popular time-series data store), Grafana (a popular visualization tool), mongodb_exporter (our MongoDB database metric exporter) plus other technologies to provide database and operating system metric graphs for your database instances.

(Beware of) MMAPv1

mmap() is a system-level call that causes the operating system kernel to map on-disk files to memory while it is being read and written by a program.

As mmap() is a core feature of the Unix/Linux operating system kernel (and not the MongoDB code base), I’ve always felt that calling MMAPv1 a “storage engine” is quite misleading, although it does allow for a simpler explanation. The distinction and drawbacks of the storage logic being in the operating system kernel vs. the actual database code (like most database storage engines) becomes very important when monitoring MMAPv1.

As Unix/Linux are general-purpose operating systems that can have many processes, users and uses cases, they offer limited OS-level metrics in terms of activity, latency and performance of mmap(). Those metrics are for the entire operating system, not just for the MongoDB processes.

mmap() uses memory from available OS-level buffers/caches for mapping the MMAPv1 data to RAM — memory that can be “stolen” away by any other operating system process that asks for it. As many deployments “micro-shard” MMAPv1 to reduce write locks, this statement can become exponentially more important. If 3 x MongoDB instances run on a single host, the kernel fights to cache and evict memory pages created by 3 x different instances with no priority or queuing, essentially at random, while creating contention. This causes inefficiencies and less-meaningful monitoring values.

When monitoring MMAPv1, you should consider MongoDB AND the operating system as one “component” more than most engines. Due to this, it is critical that a database host runs a single MongoDB instance with no other processes except database monitoring tools such as PMM’s client. This allows MongoDB to be the only user of the operating system filesystem cache that MMAPv1 relies on. This also makes OS-level memory metrics more accurate because MongoDB is the only user of memory. If you need to “micro-shard” instances, I recommend using containers (Docker or plain cgroups) or virtualization to separate your memory for each MongoDB instance, with just one MongoDB instance per container.

Locking

MMAPv1’s has locks for both reads and writes. In the early days the lock was global only. Locking became per-database in v2.2 and per-collection in v3.0.

Locking is the leading cause of the performance issues we see on MMAPv1 systems, particularly write locking. To measure how much locking an MMAPv1 instance is waiting on, first we look at the “MMAPv1 Lock Ratio”:

Another important metric to watch is “MongoDB Lock Wait Time”, breaking down a number of time operations spend waiting on locks:

Three factors in combination influence locking:

  1. Data hotspots — if every query hits the same collection or database, locking increases
  2. Query performance — a lock is held for the duration of an operation; if that operation is slow, lock time increases
  3. Volume of queries — self-explanatory

Page Faults

Page faults happen when MMAPv1 data is not available in the cache and needs to be fetched from disk. On systems with data that is smaller than memory page faults usually only occur on reboot, or if the file system cache is dumped. On systems where data exceeds memory, this happens more frequently — MongoDB is asked for data not in memory.

How often this happens depends on how your application accesses your data. If it accesses new or frequently-queried data, it is more likely to be in memory. If it accesses old or infrequent data, more page faults occur.

If page faults suddenly start occurring, check to see if your data set has grown beyond the size of memory. You may be able to reduce your data set by removing fragmentation (explained later).

Journaling

As MMAPv1 eventually flushes changes to disk in batches, journaling is essential for running MongoDB with any real data integrity guarantees. As well as being included in the lock statistic graphs mentioned above, there are some good metrics for journaling (which is a heavy consumer of disk writes).

Here we have “MMAPv1 Journal Write Activity”, showing the data rates of journaling (max 19MB/sec):

“MMAPv1 Journal Commit Activity” measures the commits to the journal ops/second:

A very useful metric for write query performance is “MMAPv1 Journaling Time” (there is another graph with 99th percentile times):

This is important to watch, as write operations need to wait for a journal commit. In the above example, “write_to_journal” and “write_to_data_files” are the main metrics I tend to look at. “write_to_journal” is the rate of changes being written to the journal, and “write_to_data_files” is the rate that changes are written to on-disk data.

If you see very high journal write times, you may need faster disks or in-sharding scenarios. Adding more shards spreads out the disk write load.

Background Flushing

“MMAPv1 Background Flushing Time” graphs the background operation that calls flushes to disk:

This process does not block the database, but does cause more disk activity.

Fragmentation

Due to the way MMAPv1 writes to disk, it creates a high rate of fragmentation (or holes) in its data files. Fragmentation slows down scan operations, wastes some filesystem cache memory and can use much more disk space than there is actual data. On many systems I’ve seen, the size of MMAPv1 data files on disk take over twice the true data size.

Currently, our Percona Monitoring and Management MMAPv1 support does not track this, but we plan to add it in the future.

To track it manually, look at the output of the “.stats()” command for a given collection (replace “sbtest1” with your collection name):

> 1 - ( db.sbtest1.stats().size / db.sbtest1.stats().storageSize ) 0.14085410557184752

Here we can see this collection is about 14% fragmented on disk. To fix fragmentation, the most common fix is dropping and recreating the collection using a backup. Many just remove a replication member, clear the data and let it do a new replication initial sync.

Operating System Memory

In PMM we have graphed the operating system cached memory as it acts as the primary cache for MMAPv1:

For the most part, “Cached” is the value showing the amount of data that is cached MMAPv1 data (assuming the host is only running MongoDB).

We also graph the dirty memory pages:

It is important that dirty pages do not exceed the hard dirty page limit (which causes pauses). It is also important that dirty pages don’t accumulate (which wastes cache memory). The “soft” dirty page limit is the limit that starts dirty page cleanup without pausing.

On this host, you could probably lower the soft limit to clean up memory faster, assuming the increase in disk activity is acceptable. This topic is covered in this post: https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/.

What’s Missing?

As mentioned earlier, fragmentation rates are missing for MMAPv1 (this would be a useful addition). Due to the limited nature of the metrics offered for MMAPv1, PMM probably won’t provide the same level of graphs for MMAPv1 compared to what we provide for WiredTiger or RocksDB. There will likely be fewer additions to the graphing capabilities going forward.

If you are using a highly concurrent system, we highly recommend you upgrade to WiredTiger or RocksDB (both also covered in this monitoring series). These engines provide several solutions to MMAPv1 headaches: document-level locking, built-in compression, checkpointing that cause near-zero fragmentation on disk and much-improved visibility for monitoring. We just released Percona Server for MongoDB 3.4, and it provides many exciting features (including these engines).

Look out for more monitoring posts from this series!

MySQL 5.7.17-11 and 5.7.16-10 crashing after upgrade from MySQL 5.6

Lastest Forum Posts - February 28, 2017 - 11:20am
We successfully upgraded Percona MySQL server version 5.6.29-76.2 to 5.7.16-10 (first) but it was crashing then we upgraded to 5.7.17-11. We see 5.7.17-11 is also crashing.
So far we are not able to figure out anything except that there is something to do with lob/text data.
Following is error and stack trace.

00:01:39 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=417
max_threads=1001
thread_count=408
connection_count=406
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1556244 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ef95c03c880
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ef8e6df7d40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xed056c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7a0631]
/lib64/libpthread.so.0[0x31d080f7e0]
/usr/sbin/mysqld(_ZN10Field_blob15copy_blob_valueEP11st_mem_ root+0x28)[0x7e3d08]
/usr/sbin/mysqld(_Z25mysql_prepare_blob_valuesP3THDR4ListI4I temEP11st_mem_root+0x2b8)[0xe1b468]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+ 0x87d)[0xe1c02d]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert12mysql_insertEP3THDP10T ABLE_LIST+0x82d)[0xe1ca9d]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert7executeEP3THD+0xc2)[0xe1d2d2]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x18d7)[0xca9e67]
/usr/sbin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0 x357)[0xcd9e57]
/usr/sbin/mysqld(_ZN18Prepared_statement12execute_loopEP6Str ingbPhS2_+0xca)[0xcda25a]
/usr/sbin/mysqld(_Z19mysqld_stmt_executeP3THDmmPhm+0x13b)[0xcda58b]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_ server_command+0x190f)[0xcb197f]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1b7)[0xcb24a7]
/usr/sbin/mysqld(handle_connection+0x2a0)[0xd76740]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xeed474]
/lib64/libpthread.so.0[0x31d0807aa1]
/lib64/libc.so.6(clone+0x6d)[0x31d00e8aad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ef95c12bec0): is an invalid pointer
Connection ID (thread ID): 4170
Status: NOT_KILLED


After converting it in readable format then we see this.

0xed056c my_print_stacktrace + 44
0x7a0631 handle_fatal_signal + 1121
0x31d080f7e0 _end + -829069216
0x7e3d08 Field_blob::copy_blob_value(st_mem_root*) + 40
0xe1b468 mysql_prepare_blob_values(THD*, List<Item>&, st_mem_root*) + 696
0xe1c02d write_record(THD*, TABLE*, COPY_INFO*, COPY_INFO*) + 2173
0xe1ca9d Sql_cmd_insert::mysql_insert(THD*, TABLE_LIST*) + 2093
0xe1d2d2 Sql_cmd_insert::execute(THD*) + 194
0xca9e67 mysql_execute_command(THD*, bool) + 6359
0xcd9e57 Prepared_statement::execute(String*, bool) + 855
0xcda25a Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*) + 202
0xcda58b mysqld_stmt_execute(THD*, unsigned long, unsigned long, unsigned char*, unsigned long) + 315
0xcb197f dispatch_command(THD*, COM_DATA const*, enum_server_command) + 6415
0xcb24a7 do_command(THD*) + 439
0xd76740 handle_connection + 672
0xeed474 pfs_spawn_thread + 436
0x31d0807aa1 _end + -829101279
0x31d00e8aad _end + -836568275

When we read stack trace from bottom to top then we can see that MySQL is working on some record which has something to do with blob data.

Has someone faced the same issue?

I have also filed a bug https://bugs.launchpad.net/percona-server/+bug/1667552 but so far no response.

Any pointer/clue highly appreciated.

Thanks

SST XtraDB Backup - Use a different tmp directory

Lastest Forum Posts - February 28, 2017 - 4:30am
Hi all,

Our current tmp directory is only 5GB and it's not able to perform SST backup as the tmp directory gets filled up really quickly.

Setting tmpdir under [mysqld] or [xtrabackup] doesn't work for me. I have also checked the permissions. It is still using /tmp for the SST.

We are using the following version:

Server version: 5.7.16-10-57-log Percona XtraDB Cluster (GPL), Release rel10, Revision bec0879, WSREP version 27.19, wsrep_27.19

Any ideas please?

Thanks!

pmm-server's requirements

Lastest Forum Posts - February 27, 2017 - 5:14pm
pmm-server(1.1.1) is running on host A and two pmm-clients(1.1.1) runs on B and C.
I run the container of pmm-server at -m 512M and for a short time [cgroup out of memory] error was occurred.
what is the requirements of ppm-server about memory,cpu ... etc?

Webinar Thursday March 2, 2017: MongoDB Query Patterns

Latest MySQL Performance Blog posts - February 27, 2017 - 5:06pm

Join Percona’s Senior Technical Services Engineer Adamo Tonete on Thursday, March 2, 2017, at 11:00 a.m. PST / 2:00 p.m. EST (UTC-8) as he reviews and discusses MongoDB® query patterns.

Register Now

MongoDB is a fast and simple-to-query schema-free database. It features a smart query optimizer that tries to use the easiest data retrieval method.

In this webinar, Adamo will discuss common query operators and how to use them effectively. The webinar will cover not only common query operations, but also the best practices for their usage.

Register for the webinar here.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL Database Administrator for three years. As the main database member of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24/7 environment. Before that, he worked as a Microsoft SQL Server DBA for a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three years, he has moved to NoSQL technologies without giving up relational databases.

MySQL Ransomware: Open Source Database Security Part 3

Latest MySQL Performance Blog posts - February 27, 2017 - 2:28pm

This blog post examines the recent MySQL® ransomware attacks, and what open source database security best practices could have prevented them.

Unless you’ve been living under a rock, you know that there has been an uptick in ransomware for MongoDB and Elasticsearch deployments. Recently, we’re seeing the same for MySQL.

Let’s look and see if this is MySQL’s fault.

Other Ransomware Targets

Let’s briefly touch on how Elasticsearch and MongoDB became easy targets…

Elasticsearch

Elasticsearch® does not implement any access control: neither authentication nor authorization. For this, you need to deploy the Elastic’s shield offering. As such, if you have an Elasticsearch deployment that is addressable from the Internet, you’re asking for trouble. We see many deployments have some authentication around their access, such as HTTP Basic Auth – though sadly, some don’t employ authentication or network isolation. We already wrote a blog about this here.

MongoDB

MongoDB (< 2.6.0) does allow for access control through account creation. It binds to 0.0.0.0 by default (allowing access from anywhere). This is now changed in /etc/mongod.conf in versions >= 2.6.0. Often administrators don’t realize or don’t know to look for this. (Using MongoDB? My colleague David Murphy wrote a post on this issue here).

We began to see incidents where both Elasticsearch and MongoDB had their datasets removed and replaced with a README/note instructing the user to pay a ransom of 0.2BTC (Bitcoin) to the specified wallet address (if they wanted their data back).

MySQL

So is this latest (and similar) attack on MySQL MySQL’s fault? We don’t think so. MySQL and Percona Server® for MySQL by default do not accept authentication from everywhere without a password for the root user.

Let’s go over the various security options MySQL has, and describe some other best practices in order to protect your environment.

Default bind_address=127.0.0.1 in Percona Server for MySQL

MySQL currently still binds to 0.0.0.0 (listen to all network interfaces) by default. However, Percona Server for MySQL and Percona XtraDB Cluster have different defaults, and only bind on 127.0.0.1:3306 in its default configuration (Github pull request).

Recall, if you will, CVE-2012-2122. This ALONE should be enough to ensure that you as the administrator use best practices, and ONLY allow access to the MySQL service from known good sources. Do not setup root level or equivalent access from any host (% indicates any host is allowed). Ideally, you should only allow root access from 127.0.0.1 – or if you must, from a subset of a secured network (e.g., 10.10.0.% would only allow access to 10.10.0.0/24).

Prevent Access

Also, does the MySQL database really need a publicly accessible IP address? If you do have a valid reason for this, then you should firewall port 3306 and whitelist access only from hosts that need to access the database directly. You can easily use iptables for this.

Default Users

MySQL DOES NOT by default create accounts that can be exploited for access. This comes later through an administrator’s lack of understanding, sadly. More often than not, the grant will look something like the following.

GRANT ALL PRIVILEGES TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;

You may scoff at the above (and rightly so). However, don’t discount this just yet: “123456” was the MOST USED password in 2016! So it’s reasonable to assume that somewhere out there this is a reality.

Max Connection Errors

You can deploy max_connection_errors with a suitably low value to help mitigate a direct attack. This will not prevent a distributed attack, where many thousands of hosts are used. Network isolation is the only way to ensure your mitigation against this attack vector.

MySQL 5.7 Improvements on Security Default Root Password

Since MySQL 5.7, a random password is generated for the only root user (root@localhost) when you install MySQL for the first time. That password is then written in the error log and has to be changed. Miguel Ángel blogged about this before.

Connection Control Plugin

MySQL 5.7.17 introduced a new open source plugin called Connection Control. When enabled, it delays the authentication of users that failed to login by default more than three times. This is also part as of Percona Server for MySQL 5.7.17.

Here’s an example where the 4th consecutive try caused a one-second delay (default settings were used):

$ time mysql -u bleh2 -pbleh ERROR 1045 (28000): Access denied for user 'bleh2'@'localhost' (using password: YES) real 0m0.009s $ time mysql -u bleh2 -pbleh ERROR 1045 (28000): Access denied for user 'bleh2'@'localhost' (using password: YES) real 0m0.008s $ time mysql -u bleh2 -pbleh ERROR 1045 (28000): Access denied for user 'bleh2'@'localhost' (using password: YES) real 0m0.008s $ time mysql -u bleh2 -pbleh ERROR 1045 (28000): Access denied for user 'bleh2'@'localhost' (using password: YES) real 0m1.008s mysql> SELECT * FROM INFORMATION_SCHEMA.CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS; +---------------------+-----------------+ | USERHOST | FAILED_ATTEMPTS | +---------------------+-----------------+ | 'bleh2'@'localhost' | 4 | +---------------------+-----------------+ 1 row in set (0.01 sec)

Password Validation Plugin

MySQL 5.6.6 and later versions also ship with a password validation plugin, which prevents creating users with unsafe passwords (such as 123456) by ensuring passwords meet certain criteria: https://dev.mysql.com/doc/refman/5.7/en/validate-password-plugin.html

Summary

In order to get stung, one must ignore the best practices mentioned above (which in today’s world, should take some effort). These best practices include:

  1. Don’t use a publicly accessible IP address with no firewall configured
  2. Don’t use a root@% account, or other equally privileged access account, with poor MySQL isolation
  3. Don’t configure those privileged users with a weak password, allowing for brute force attacks against the MySQL service

Hopefully, these are helpful security tips for MySQL users. Comment below!

add generated virtual column still copy data

Lastest Forum Posts - February 27, 2017 - 1:59pm

When run the below query it take long time
alter table record_log add column record_key varchar(255) AS (md5(concat(record_time,record_campId))) VIRTUAL ;
show processlist show it copy data to tmp table, suppose it will done in place with data copy

34 | root | localhost | record | Query | 26 | copy to tmp table | alter table record_log add column record_key varchar(32) AS (md5(concat(record_time,reco | 0 |

MySQL version is 5.7.13-6-log Percona Server (GPL), Release 6, Revision e3d58bb

Thanks

Percona Monitoring and Management (PMM) Graphs Explained: WiredTiger and Percona Memory Engine

Latest MySQL Performance Blog posts - February 27, 2017 - 9:34am

This post is part of the MongoDB 3.4 bundled release series of blog posts. In this blog, we’ll go over some useful metrics WiredTiger outputs and how we visualize them in Percona Monitoring and Management (PMM).

WiredTiger is the default storage engine for MongoDB since version 3.2. The addition of this full-featured, comprehensive storage engine offered a lot of new, useful metrics that were not available before in MMAPv1.

Percona Monitoring and Management (PMM)

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB, developed by Percona on top of open-source technology. Behind the scenes, the graphing features this article covers use Prometheus (a popular time-series data store), Grafana (a popular visualization tool), mongodb_exporter (our MongoDB database metric exporter) plus other technologies to provide database and operating system metric graphs for your database instances.

Please see a live demo of our PMM 1.1.1 release of the MongoDB WiredTiger graphs covered in this article: https://pmmdemo.percona.com/graph/dashboard/db/mongodb-wiredtiger.

You can see a sneak peak demo of our Percona Memory Engine graphs we’ll release in PMM 1.1.2 here: https://pmmdemo.percona.com/graph/dashboard/db/mongodb-inmemory.

WiredTiger and Percona Memory Engine

WiredTiger is a storage engine that was developed outside of MongoDB, and was acquired and integrated into MongoDB in version 3.0. WiredTiger offers document-level locking, inline compression and many other useful storage engine features. WiredTiger writes data to disk in “checkpoints” and internally uses Multi-Version Concurrency Control (MVCC) to create “transactions” or “snapshots” when accessing data in the engine. In WiredTiger’s metrics, you will see the term “transactions” used often. It is important to note, however, that MongoDB does not support transactions at this time (this only occurs within the storage engine).

WiredTiger has an in-heap cache for mostly uncompressed pages (50% RAM by default). Like many other engines, it relies on the performance of the Linux filesystem cache, which ends up caching hot, compressed WiredTiger disk blocks.

Besides supporting WiredTiger, Percona Server for MongoDB also ships with a free, open-source in-memory storage engine: Percona Memory Engine for MongoDB. Since we based the Memory Engine on WiredTiger, all graphs and troubleshooting techniques for in-memory are essentially the same (the database data is not stored on disk, of course).

Checkpointing Graphs

WiredTiger checkpoints data to disk every 60 seconds, or after writing 2GB of journaled data.

PMM graphs the current minimum and maximum checkpoint times for WiredTiger checkpoints in the “WiredTiger Checkpoint Time” graph:

Above I have selected “current,” and we can see we have an average of 176ms checkpoints and over a long period it remains flat, not worsening or “snowballing” each checkpoint (which may indicate a performance issue).

Checkpointing is important to watch because it requires WiredTiger to use system resources, and also can affect query performance in an possibly unexpected way — WiredTiger Cache dirty pages:

The WiredTiger Cache is an LRU cache of mostly uncompressed pages. Like most caches, it creates dirty pages that can take up useful memory until flushed. The WiredTiger Cache uses checkpointing as the point in which it clears dirty pages, making the relationship between dirty pages and checkpointing important to note. WiredTiger cleans dirty pages less often if checkpoint performance is slow. They then can slowly consume more and more of the available cache memory.

In the above graph, we can see on average about 8.8% of the cache is dirty pages with spikes up/down aligning with checkpointing. Systems with a very high rate of dirty pages benefit from more RAM to provide more room for “clean” pages. Another option could be improving storage performance, so checkpoints happen faster.

Concurrency Graph

Similar to InnoDB, WiredTiger uses a system of tickets to control concurrency. Where things differ from InnoDB is both “reads” and “writes” have their own ticket pools with their own maximum-ticket limits. The defaults of “128” tickets for both read and write concurrency is generally enough for even medium-high usage systems. Some systems are capable of more than the default concurrency limit, however (usually systems with very fast storage). Also, concurrency can sometimes reduce overhead on network-based storage.

If you notice higher ticket usage, it can sometimes be due to a lot of single-document locking in WiredTiger. This is something to check if you see high rates alongside storage performance and general query efficiency.

In Percona Monitoring and Management, we have the “WiredTiger Concurrent Transactions” graph to visualize the usage of the tickets. In most cases, tickets shouldn’t reach the limit and you shouldn’t need to tweak this tuneable. If you do require more concurrency, however, PMM’s graphing helps indicate when limits are being reached and whether a new limit will mitigate the problem.

Here we can see a max usage of 8/128 write tickets and 5/128 read tickets. This means this system isn’t having any concurrency issues.

Throughput Graphs

There are several WiredTiger graphs to explain the rate of data moving through the engine. As storage is a common bottleneck, I generally look at “WiredTiger Block Activity” first when investigating storage resource usage. This graph shows the total rates written and read to/from storage by WiredTiger (disk for WiredTiger, memory for in-memory).

For correlation, there are also rates for the amount of data written from and read into the WiredTiger cache, from disk. The “read” metric shows the rate of data added to the cache due to query patterns (e.g.: scanning), while the “written” metric shows the rate of data written out to storage from the WiredTiger cache.

Also there are rates to explain the IO caused by the WiredTiger Log. The metric “payload” is the essentially the write rate of raw BSON pages, and “written” is a combined total of log bytes written (including overhead, likely the frames around the payload, etc.). You should watch changes to the average rate of “read” carefully, as they may indicate changes in query patterns or efficiency.

Detailed Cache Graphs

In addition to the Dirty Pages in the cache graph, “WiredTiger Cache Capacity” graphs the size and usage of the WiredTiger cache:

The rate of cache eviction is graphed in “WiredTiger Cache Eviction,” with a break down of modified vs. unmodified pages:

Very large spikes in eviction can indicate collection scanning or generally poor performing queries. This pushes data out of caches. You should avoid high rates of cache evictions, as they can cause a high overhead to the overall engine.

When increasing the size of the WiredTiger cache it is useful to look at both of the above cache graphs. You should look for more “Used” memory in the “WiredTiger Cache Capacity” graph and less rate of eviction in the “WiredTiger Cache Eviction” graph. If you do not see changes to these metrics, you may see better performance leaving the cache size as-is.

Transactions and Document Operations

The “WiredTiger Transactions” graph shows the overall operations happening inside the engine. All transactions start with a “begin,” and operations that changed data end with a “commit.” Read-only operations show a “rollback” at the time they returned data:

This graph above correlates nicely with the “Mongod – Document Activity” graph, which shows the rate of operations from the MongoDB-layer perspective instead of the storage engine level:

Detailed Log Graphs

The graph “WiredTiger Log Operations” explains activity inside the WiredTiger Log system:

Also, the rate of log record compression is graphed as “WiredTiger Log Records.” WiredTiger only compresses log operations that are greater than 128 bytes, which explains why some log records are not compressed:

In some cases, changes in the ratio of compressed vs. uncompressed pages may help explain changes in CPU% used.

What’s Missing?

As you’ll see in my other blog post “Percona Monitoring and Management (PMM) Graphs Explained: MongoDB with RocksDB” from this series, RocksDB includes read latency metrics and a hit ratio for the RocksDB block cache. These are two things I would like to see added to WiredTiger’s metric output, and thus PMM. I would also like to improve the user-experience of this dashboard. Some areas use linear-scaled graphs when a logarithmic-scaled graph could provide more value. “WiredTiger Concurrent Transactions” is one example of this.

A known-mystery (so-to-speak) is why WiredTiger reports the cache “percentage overhead” always as 8% in “db.serverStatus().cache.” We added this metric to PMM as a graph named “WiredTiger Cache Overhead.” We assumed it provided a variable overhead metric. However, I’ve seen that it returns 8% regardless of usage: it is 8% on a busy system or even on an empty system with no data or traffic. We’re aware of this, and plan to investigate, as a hit ratio for the cache is a very valuable metric:

Also, if you’ve ever seen the full output of the WiredTiger status metrics (‘db.serverStatus().wiredTiger’ in Mongo shell), you’ll know that there are a LOT more WiredTiger metrics than are currently graphed in Percona Monitoring and Management. In our initial release, we’ve aimed to only include high-value graphs to simplify monitoring WiredTiger. A major barrier in our development of monitoring features for WiredTiger has been the little-to-no documentation on the meaning of many status metrics. I hope this improves with time. As we understand more correlations and useful metrics to determine the health of WiredTiger, we plan to integrate those into Percona Monitoring and Management in the future. As always, we appreciate your suggestions.

Lastly, look out for an upcoming blog post from this series regarding creating custom dashboards, graphs and raw data queries with Percona Monitoring and Management!

mysqldump not releasing memory after completion

Lastest Forum Posts - February 27, 2017 - 1:17am
Good day all

We are running Percona server 5.7 on a server with 128Gb of Memory and 32Gb of Swap Space.

We have had a lot of issues with the memory filling up and using up swap space on a daily basis requiring the database to be restarted daily in order to release the memory.

After adding some graphs, monitoring and physical testing, we found that the problem seems to be related to mysqldump.

The memory usage is fairly stable , increasing little bits until innodb_buffer_pool is full, however the moment we start a backup, the memory usage spikes and uses up all physical memory as well as Swap.

The symptoms we found was that the backup uses up all memory and starts to swap, however once the backup completes, the memory is not released at all and the database server eventually runs out of memory and kills mysql.


We have tried removing the compression of the backups as well using the --quick option but to no avail.

The size of the databases on the server totals about 350Gb in total, however we run backups on a per database setup and the biggest database being +-130Gb in size.


Is there anything else we can perhaps look at in order to try and troubleshoot why the mysqldump processes are not releasing the memory?

Any suggestions would be greatly appreciated.


Regards

coreos

Lastest Forum Posts - February 26, 2017 - 11:18pm
hello

is possible to migrate pmm-client , pmm-server container to Docker OS such as CoreOS / RancherOS ?




FEATURE REQUEST (Ability to delete/modify files created by using SELECT INTO OUTFILE)

Lastest Forum Posts - February 26, 2017 - 4:12am
At the moment, for security reasons in MySQL implemented the ability to create files only through SELECT INTO OUTFILE.
The ability to overwrite or delete files is missing. In the case of a large number of files you create, implementation of removal procedure possible only in non MySQL way.
It is proposed to implement the safe removal option for OUTFILEs. For example:
1. Allow only delete the files that exist in directories tmpdir, thus isolating the OS from the database.
2. Implement registration procedure for all files created using the OUTFILE, respectively delete and change only files passed the registration procedure.
3. Create a single global parameter specifies where OUTILEs will be created, and allows you to manipulate files only within this parameter.
4. Create a global parameter that will indicate the allowable operations for OUTFILEs (create, change, delete)
In MySQL really lacking quality file manipulation procedures.
Visit Percona Store


General Inquiries

For general inquiries, please send us your question and someone will contact you.