
Percona Toolkit 3.0.5 is Now Available

November 21, 2017 - 10:32am

Percona announces the release of Percona Toolkit 3.0.5 on November 21, 2017.

Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

You can download Percona Toolkit packages from the website or install them from the official repositories.

This release includes the following changes:

New Features:
  • PT-216: pt-mongodb-query-digest now supports MongoDB versions lower than 3.2; a case of incorrect output was fixed.
  • PT-182: The pt-summary, pt-mysql-summary and pt-mongodb-summary commands now provide output in JSON format.
  • PT-152: pt-mysql-summary shows the output of the SHOW SLAVE HOSTS command.
  • PT-139: pt-table-sync supports replication channels (requires MySQL version 5.7.6 or higher).
  • PMM-1590: The MongoDB Profiler for Percona Monitoring and Management and Percona Toolkit has been improved.
Bug fixes:
  • PT-211: pt-mext would fail if the Rsa_public_key variable was empty.
  • PT-212: pt-mongodb-query-digest --version produced incorrect values.
  • PT-202: pt-online-schema-change incorrectly processed virtual columns.
  • PT-200: pt-online-schema-change reported an error when the name of an index contained UNIQUE as the prefix or suffix.
  • PT-199: pt-table-checksum did not detect differences on a system with ROW-based replication active.
  • PT-196: pt-online-schema-change --max-load paused if a status variable was passed 0 as the value.
  • PT-193: pt-table-checksum reported a misleading error if a column comment contained an apostrophe. For more information, see #1708749.
  • PT-187: In some cases, pt-table-checksum did not report that the same table contained different values on the master and slave.
  • PT-186: pt-online-schema-change --alter could fail if field names contained upper case characters. For more information, see #1705998.
  • PT-183: In some cases pt-mongodb-query-digest could not connect to a database using authentication.
  • PT-167: In some cases, pt-kill could ignore the value of the --busy-time parameter. For more information, see #1016272.
  • PT-161: When run with --skip-check-slave-lag, pt-table-checksum could fail in some cases.

InnoDB Page Compression: the Good, the Bad and the Ugly

November 20, 2017 - 10:54am

In this blog post, we’ll look at some of the facets of InnoDB page compression.

Somebody recently asked me about the best way to handle JSON data compression in MySQL. I took a quick look at InnoDB page compression and wanted to share my findings.

There is also some great material on this topic that was prepared and presented by Yura Sorokin at Percona Live Europe 2017: https://www.percona.com/live/e17/sessions/percona-xtradb-compressed-columns-with-dictionaries-an-alternative-to-innodb-table-compression. Yura also implemented Compressed Columns in Percona Server.

First, the good part.

InnoDB page compression is actually really easy to use and provides a decent compression ratio. To use it, I just ran CREATE TABLE commententry (...) COMPRESSION="zlib"; – and that’s all. By the way, for my experiment I used the subset of Reddit comments stored in JSON (described here: Big Dataset: All Reddit Comments – Analyzing with ClickHouse).

This method got me a compressed table of 3.9GB. Compare this to 8.4GB for an uncompressed table and it’s about a 2.15x compression ratio.

Now, the bad part.

As InnoDB page compression uses “hole punching,” the standard Linux utils do not always properly support files created this way. In fact, to see the size “3.9GB” I had to use du --block-size=1 tablespace_name.ibd, as the standard ls -l tablespace_name.ibd shows the wrong (apparent) size of 8.4GB. There is a similar limitation on copying files: the standard cp old_file new_file may not always work, and to be sure I had to use cp --sparse=always old_file new_file.
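Putting those commands together, here is a minimal shell sketch (the tablespace path is an example):

  # apparent vs. real on-disk size of a hole-punched tablespace
  ls -l /var/lib/mysql/test/commententry.ibd               # apparent size (8.4GB)
  du --block-size=1 /var/lib/mysql/test/commententry.ibd   # allocated size (3.9GB)

  # copy the file without expanding the punched holes
  cp --sparse=always /var/lib/mysql/test/commententry.ibd /backup/commententry.ibd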

Speaking about copying, here’s the ugly part.

The actual time to copy the sparse file was really bad.

On a fairly fast device (a Samsung SM863), copying the sparse file mentioned above in its compressed size of 3.9GB took 52 minutes! That’s shocking, so let me repeat it again: 52 minutes to copy a 3.9GB file on an enterprise SATA SSD.

By comparison, copying the regular 8.4GB file takes 9 seconds! Compare 9 seconds to 52 minutes.

To be fair, the NVMe device (Intel® SSD DC D3600) handles sparse files much better. It took only 12 seconds to copy the same sparse file on this device.

Having considered all this, it is hard to recommend that you use InnoDB page compression for serious production. Well, unless you power your database servers with NVMe storage.

For JSON data, Compressed Columns in Percona Server for MySQL should work quite well, using a compression dictionary to store JSON keys – give it a try!

This Week in Data with Colin Charles 15: Percona Live 2018 Call for Papers and Best Practices for Observability

November 17, 2017 - 6:10am

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

So we have announced the call for presentations for Percona Live Santa Clara 2018. Please send your submissions in!

As you probably already know, we have been expanding the content to be more than just MySQL and MongoDB. It really does include more open source databases: the whole of 2016 had a “time series” theme to it, and we of course love to have more PostgreSQL content (there have been tracks dedicated to PostgreSQL for some time now). I found one comment from John Arundel interesting recently: “If you’re going to learn one database really well, make it Postgres.” I have been noticing newer developers jump on the PostgreSQL bandwagon. I presume much of this blog’s readership is still MySQL centric, but it will be interesting to see where this goes.

Charity Majors recently wrote Best Practices for Observability. In addition, her book alongside Laine Campbell is now available for purchase on Kindle: Database Reliability Engineering: Designing and Operating Resilient Database Systems. Highly recommended purchase. You can also get it on O’Reilly Safari (free month with those codes for Percona Live Europe Dublin attendees).

Are you using Google Cloud Spanner? It now has multi-region support, and has an updated SLA for 99.999% uptime. That’s basically no more than 5.25 minutes of downtime per year!

Releases
  • orchestrator 3.0.3 – auto-provisioning Raft nodes, native Consul support, SQLite or MySQL backed setups, web UI improvements and more. Solid release.
  • MongoDB 3.6 – you can download this soon.
  • MariaDB 10.1.29 – important changes to Mariabackup, InnoDB/XtraDB, and some security fixes
  • Apache Kylin 2.2 – OLAP for Hadoop, originally developed at eBay, has enhanced ACL support amongst other improvements.
  • Cassandra on Azure Cosmos DB
Upcoming Appearances
  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, and I can only imagine it’s going to be bigger and better for 2017, since it is now two days long!
Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Understanding how an IST donor is selected

November 15, 2017 - 8:11am

In a clustering environment, we often see a node that needs to be taken down for maintenance. For a node to rejoin, it should re-sync with the cluster state. In PXC (Percona XtraDB Cluster), there are two ways for the rejoining node to re-sync: State Snapshot Transfer (SST) and Incremental State Transfer (IST). SST involves a full data transfer (which could be time consuming). IST is an incremental data transfer whereby only the missing write-sets are donated by a DONOR to the rejoining node (aka the JOINER).

In this article I will try to show how a DONOR for the IST process is selected.

Selecting an IST DONOR

First, a word about gcache. Each node retains some write-sets in a cache known as gcache. Once this gcache is full, it is purged to make room for new write-sets. Based on the gcache configuration, each node may retain a different span of write-sets. The wider the span, the greater the probability of the node acting as a prospective DONOR. The lowest seqno in gcache can be queried with SHOW STATUS LIKE 'wsrep_local_cached_downto'.
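As a quick sketch (assuming the mysql client can reach the local node), you can compare the cached span against the cluster state on each node:

  # lowest write-set seqno still held in gcache on this node
  mysql -e "SHOW STATUS LIKE 'wsrep_local_cached_downto'"
  # current committed seqno, for comparison
  mysql -e "SHOW STATUS LIKE 'wsrep_last_committed'"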

Let’s understand the IST DONOR algorithm with a topology and working example:

  • Say we have 3 node cluster: N1, N2, N3.
  • To start with, all 3 nodes are in sync (wsrep_last_committed is the same for all 3 nodes, let’s say 100).
  • N3 is scheduled for maintenance and is taken down.
  • In the meantime, N1 and N2 process the workload, moving the cluster state from 100 -> 1100.
  • N1 and N2 also purge their gcache. Let’s say wsrep_local_cached_downto for N1 and N2 is 110 and 90 respectively.
  • Now N3 is restarted and discovers that the cluster has made progress from 100 -> 1100 and so it needs the write-sets from (101, 1100).
  • It starts looking for a prospective DONOR.
    • N1 can service data from (110, 1100), but the request is for (101, 1100), so N1 can’t act as DONOR.
    • N2 can service data from (90, 1100), and the request is for (101, 1100), so N2 can act as DONOR.
Safety gap and how it affects DONOR selection

So far so good. But can N2 reliably act as DONOR? While N3 is evaluating the prospective DONOR, what if N2 purges more data and now wsrep_local_cached_downto on N2 is 105? In order to accommodate this, the N3 algorithm adds a safety gap.

safety gap = (current state of cluster – lowest available seqno from any of the existing nodes of the cluster) * 0.008

So the N2 range is considered to be (90 + (1100 – 90) * 0.008, 1100) = (98, 1100).

Can N2 now act as DONOR? Yes: its effective range (98, 1100) still covers the requested range (101, 1100), since 98 < 101.

What if N2 had purged up to 95 and then N3 started looking for prospective DONOR?

In this case the N2 range would be (95 + (1100 – 95) * 0.008, 1100) = (103, 1100), ruling N2 out from the prospective DONOR list.
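The same arithmetic as a one-liner, using the numbers from this second case (awk is used here only because the factor is fractional):

  awk -v low=95 -v last=1100 'BEGIN { printf "effective lower bound: %d\n", low + (last - low) * 0.008 }'
  # -> effective lower bound: 103, so a request starting at seqno 101 cannot be served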

Twist at the end

Considering the latter case above (where N2 purged up to 95), N2 can’t act as the IST DONOR, and the only way for N3 to join is through SST.

What if I say that N3 still joins back using IST? CONFUSED?

Once N3 falls back from IST to SST, it will select an SST donor. This selection is done sequentially, and nominates N1 as the first choice. N1 doesn’t have the required write-sets, so SST is forced.

But what if I configure wsrep_sst_donor=N2 on N3? This causes N2 to be selected instead of N1. But wait: N2 doesn’t qualify either, as with the safety gap its range is (103, 1100).

That’s true. But the request sent to the donor is a combined IST + SST request, so even though N3 ruled out N2 as the IST DONOR, the request gives N2 one last try. If N2 can service it using IST, it is allowed to do so. Otherwise, it falls back to SST.

Interesting! This is a well thought out algorithm from Codership: I applaud them for this and the many other important control functions that go on backstage in a Galera cluster.

ZFS from a MySQL perspective

November 15, 2017 - 2:44am

Since the purpose of a database system is to store data, there is a close relationship with the filesystem. As MySQL consultants, we always look at filesystems for performance tuning opportunities. The most common choices are XFS and EXT4; on Linux, it is exceptional to encounter another filesystem. Both XFS and EXT4 have pros and cons, and their behaviors are well known; they perform well, but they are not without shortcomings.

Over the years, we have developed a bunch of tools and techniques to overcome these shortcomings. For example, since they don’t allow a consistent view of the filesystem, we wrote tools like XtraBackup to back up a live MySQL database. Another example is the InnoDB doublewrite buffer, which is required only because neither XFS nor EXT4 is transactional. There is one filesystem that offers nearly all the features we need: ZFS. ZFS is arguably the most advanced filesystem available on Linux. Maybe it is time to reconsider the use of ZFS with MySQL.

ZFS on Linux, or ZoL (from the OpenZFS project), has been around for quite a long time now. I first started using ZoL back in 2012, before it was GA (generally available), in order to solve a nearly impossible challenge: backing up a large database (~400 GB) with a mix of InnoDB and MyISAM tables. Yes, ZFS allows that very easily, in just a few seconds. As of 2017, ZoL has been GA for more than 3 years, and most of the issues that affected it in the early days have been fixed. ZFS is also GA in FreeBSD, illumos, OmniOS and many others.

This post will hopefully be the first of many posts, devoted to the use of ZFS with MySQL. The goal here is not to blindly push for ZFS but to see when ZFS can help solve real problems. We will first examine ZFS and try to draw parallels with the architecture of MySQL. This will help us to better understand how ZFS works and behaves. Future posts will be devoted to more specific topics like performance, PXC, backups, compression, database operations, bad and poor use cases and sample configurations.

Some context

ZFS is a filesystem that was developed by Sun Microsystems and first introduced with OpenSolaris in 2005. ZFS is unique in many ways; let’s first have a look at its code base using the sloccount tool, which provides an estimation of the development effort:

  • EXT4: 8.5 person-years
  • XFS: 17 person-years
  • ZFS: 77 person-years

In terms of code base complexity, ZFS approaches ten times the complexity of EXT4; the list above shows the scale. To put things in perspective, the sloccount development effort for Percona Server 5.7, which is based on MySQL Community 5.7, is estimated at 680 person-years. ZoL development is sponsored by the Lawrence Livermore National Laboratory, and the project is very active.

ZFS features

Why does ZFS need such a large code base? Well, in Linux, it functionally replaces MD (software RAID), LVM (volume manager) and the filesystem. ZFS is really a transactional database designed to support filesystem operations. Let’s review the main ZFS features.

128-bit filesystem

That’s huge! According to Jeff Bonwick (https://blogs.oracle.com/bonwick/128-bit-storage:-are-you-high), the energy required to fully populate such a storage device would be enough to boil the oceans. It seems inconceivable that we’d ever need a larger filesystem.

Copy-on-write (COW)

When ZFS needs to update a record, it does not overwrite it. Instead, it writes a new record, changes the pointers and then frees up the old one if it is no longer referenced. That design is at the core of ZFS. It allows for features like free snapshots and transactions.

Snapshot

ZFS supports snapshots, and because of its COW architecture, taking a snapshot is merely a matter of recording a transaction number and telling ZFS to protect the referenced records from its garbage collector. This is very similar to InnoDB MVCC: if a read view is kept open, InnoDB keeps a copy of each of the rows that changed in the undo log, and those rows are not purged until the read view closes.
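As a minimal sketch (pool and dataset names are examples), taking and listing a snapshot looks like this:

  zfs snapshot tank/mysql@before-upgrade
  zfs list -t snapshot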

Clone

A ZFS snapshot can be cloned and then written to. At that point, the clone is a writable fork of the original data. There is no equivalent feature in MySQL/InnoDB.
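A short sketch of cloning (again with example names); the clone is immediately writable, while unmodified records stay shared with the snapshot:

  zfs clone tank/mysql@before-upgrade tank/mysql-test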

Checksum

All ZFS records have a checksum, exactly like the page checksums of InnoDB. If a record is found to have an invalid checksum, it is automatically replaced by a copy, provided one is available. It is normal to define a ZFS production setup with more than one copy of the data set. With ZFS, we can safely disable InnoDB checksums.

Compression

ZFS records can be compressed transparently. The most common algorithms are gzip and lz4. The data is compressed per record and the recordsize is an adjustable property. The principle is similar to transparent InnoDB page compression but without the need for punching holes. In nearly all the ZFS setups I have worked with, enabling compression helped performance.
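A quick sketch of enabling compression and checking the achieved ratio (the dataset name is an example; only records written after the change are compressed):

  zfs set compression=lz4 tank/mysql
  zfs get compressratio tank/mysql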

Encryption

ZoL doesn’t support transparent encryption of the records yet, but the encryption code is currently under review. If all goes well, the encryption should be available in a matter of a few months. Once there, it will offer another option for encryption at rest with MySQL. That feature compares very well with InnoDB tablespace encryption.

Transactional

An fsync on ZFS is transactional. This comes mainly from the fact that ZFS uses COW. When a file is opened with O_SYNC or O_DSYNC, ZFS behaves like a database where the fsync calls represent commits. A direct benefit for MySQL is the possibility of disabling the InnoDB doublewrite buffer. The InnoDB doublewrite buffer is often a source of contention in a heavy write environment although the latest Percona Server releases have parallel doublewrite buffers that relieve most of the issue.

ZIL/SLOG

The transactional support in ZFS bears a huge price in terms of latency, since synchronous writes and fsyncs involve many random write IO operations. Since ZFS is transactional, it needs a transactional journal, the ZIL. ZIL stands for ‘ZFS Intent Log’. There is always a ZIL. The ZIL serves a purpose very similar to the InnoDB log files. The ZIL is written to sequentially, is fsynced often, and is read from only for recovery after a crash. The goal is to delay random write IO operations by writing pending changes sequentially to a device. By default, the ZIL delays the actual writes by only 5s (zfs_txg_timeout), but that’s still very significant. To help synchronous write performance, ZFS has the possibility of locating the ZIL on a Separate Intent Log (SLOG).

The SLOG device doesn’t need to be very large: a few GB is often enough, but it must be fast for sequential writes and fast for fsyncs. A fast flash device with good write endurance, or spinners behind a RAID controller with a protected write cache, are great SLOG devices. Normally, the SLOG is on a redundant device like a mirror, since losing the ZIL can be dramatic. With MySQL, the presence of a fast SLOG is extremely important for performance.
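As a sketch, attaching a mirrored SLOG to an existing pool looks like this (pool and device names are examples):

  zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1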

ARC/L2ARC

The ARC is the ZFS file cache. It is logically split into two parts: the ARC proper and the L2ARC. The ARC is the in-memory file cache, while the L2ARC is an optional on-disk cache that stores items evicted from the ARC. The L2ARC is especially interesting with MySQL because it allows the use of a small flash storage device as a cache for a large, slow storage device. Functionally, the ARC is like the InnoDB buffer pool, while the L2ARC is similar to tools like flashcache/bcache/dm-cache.

RAID

ZFS has its own way of dealing with disks. At the lowest level, ZFS can use the bare disks individually with no redundancy, a bit like JBOD devices used with LVM. Redundancy can be added with a mirror, which is essentially a software RAID-1 device. These mirrors can then be striped together to form the equivalent of a RAID-10 array. Going further, there are RAIDZ-1, RAIDZ-2 and RAIDZ-3, which are respectively the equivalent of RAID-5, RAID-6 and RAID… well, an array with three parities has no standard name yet. When you build a RAID array with Linux MD, you can hit the RAID-5 (and up) write hole issue if you do not have a write journal; the write journal option is available only in recent kernels and with the latest mdadm packages. ZFS is not affected by the RAID-5 write hole.

Self-healing

I already touched on this feature when I talked about checksums. If more than one copy of a record is available and one of the copies is found to be corrupted, ZFS returns only a valid copy and repairs the damaged record. You can trigger a full check with the scrub command.

ZVOL block devices

Not only can ZFS manage filesystems, it can also offer block devices. These block devices, called ZVOLs, can be snapshotted and cloned. That’s a very handy feature when I want to create a cluster of similar VMs: I create a base image, then snapshot it and create clones for all the VMs. The whole image is stored only once, and each clone contains only the records that have been modified since the original clone was created.
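A small sketch of that workflow (names and size are examples):

  zfs create -V 20G tank/vm-base
  # ... install the base image on /dev/zvol/tank/vm-base ...
  zfs snapshot tank/vm-base@gold
  zfs clone tank/vm-base@gold tank/vm-01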

Send/Receive

ZFS allows you to send and receive snapshots. This feature is very useful for moving data between servers. If there is already a copy of the data on the remote server, you can also send only the incremental changes.
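A sketch of both the full and the incremental forms (host and dataset names are examples):

  zfs send tank/mysql@snap1 | ssh backup-host zfs receive backup/mysql
  zfs send -i tank/mysql@snap1 tank/mysql@snap2 | ssh backup-host zfs receive backup/mysql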

Deduplication

ZFS can automatically hardlink together files (or records) that have identical content. Although interesting if you have a lot of redundant data, the dedup feature is very resource-intensive. I don’t see a practical use case of dedup for databases, except maybe for a backup server.

This concludes this first post about ZFS. Stay tuned for more.

Webinars on Wednesday November 15, 2017: Proxy Wars and Percona Software Update for Q4

November 14, 2017 - 10:00am

Do you need to get to grips with MySQL proxies? Or maybe you could do with discovering the latest developments and plans for Percona’s software?

Well, wait no more: on Wednesday November 15, 2017, we bring you a webinar double bill.

Join Percona’s Chief Evangelist, Colin Charles, as he presents “The Proxy Wars – MySQL Router, ProxySQL, MariaDB MaxScale” at 7:00 am PST / 10:00 am EST (UTC-8).

Reflecting on his past experience with MySQL proxies, Colin will provide a short review of three open source solutions. He’ll run through a comparison of MySQL Router, MariaDB MaxScale and ProxySQL and talk about the reasons for using the right tool for an application.

Register for Colin’s Webinar


Meanwhile, return a little later in the day at 10:00 am PST / 1:00 pm EST (UTC-8) to hear Percona CEO Peter Zaitsev discuss what’s new in Percona open source software. In “Percona Software News and Roadmap Update – Q4 2017”, Peter will talk about new features in Percona software, show some quick demos and share highlights from the Percona open source software roadmap. He will also talk about new developments in Percona commercial services and finish with a Q&A.

Join Peter’s Webinar


You are, of course, very welcome to register for either one or both webinars. Please register for your place soon!

Peter Zaitsev, Percona CEO and Co-Founder

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 30+ countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. Fortune and DZone have both tapped Peter as a contributor, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, worked on MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within many open source communities and has spoken on the conference circuit.


Common MongoDB Topologies

November 14, 2017 - 2:34am

In this blog post, we’ll look at some common MongoDB topologies used in database deployments.

The question of the best architecture for MongoDB will arise in conversations between your developers and architects. In this blog, we wanted to go over the main sharded and unsharded designs, with their pros and cons.

We will first look at “Replica Sets.” Replica sets are the most basic form of high availability (HA) in MongoDB, and the building blocks for sharding. From there, we will cover sharding approaches, and whether you need to go that route.

Replica Set

From the MongoDB manual:

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

Short of sharding, this is the ideal way to run MongoDB. Things like high availability, failover and recovery become automated with no action typically needed. If you expect large growth or more than 200G of data, you should consider using this plus sharding to reduce your mean time to recovery on a restore from backup.

Pros:

  • Elections happen automatically and go unnoticed by applications set up with retries
  • Rebuilding a new node, or adding an additional read-only node, is as easy as  “rs.add(‘hostname’)”
  • Can skip building indexes to improve write speed
  • Can have members:
    • hidden in another geographic location
    • with delayed replication
    • as analytics nodes via tagging

Cons:

  • Depending on the size of the oplog used, you can use 10-100+% more space to hold the change data for replication
  • You must scale up, not out, meaning more expensive hardware
  • Recovery using a sharded approach is faster than having it all on a single node (parallelism)
Flat Mongos (not load balanced)

This is one of MongoDB’s more suggested deployment designs. To understand why, we should talk about the driver, and the fact that it supports a comma-separated list of mongos hosts for fail-over.

You can’t distribute writes in a single replica set. Instead, they all need to go to the primary node. You can distribute reads to the secondaries using Read Preferences. The driver keeps track of what is a primary and what is a secondary and routes queries appropriately.
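As a sketch, the read preference is typically set through the standard connection string option (host names and the replica set name here are examples, assuming a driver or recent shell that accepts MongoDB URIs):

  mongo "mongodb://host1:27017,host2:27017,host3:27017/test?replicaSet=rs0&readPreference=secondaryPreferred"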

Conceptually, the driver should have connections bucketed by the mongos they go to. This allowed the 3.0+ drivers to be semi-stateless and ask any connection to a specific mongos to perform a getMore against that mongos. In theory, this allows slightly more concurrency. Realistically, you only use one mongos, since this is only a fail-over system.

Pros:

  • Mongos is on its own gear, so it will not run the application out of memory
  • If Mongos doesn’t respond, the driver “fails-over” to the next in the list
  • Can be put closer to the database or application depending on your network and sorting needs

Cons:

  • You can’t use the mongos in the list evenly, so it is only good for fail-over (not evenness) in most drivers. Please read your specific driver’s documentation for support, and test thoroughly.
Load Balanced (preferred if possible)

According to the Mongo docs:

You may also deploy a group of mongos instances and use a proxy/load balancer between the application and the mongos. In these deployments, you must configure the load balancer for client affinity so that every connection from a single client reaches the same mongos.

This is the model used by platforms such as ObjectRocket. In this pattern, you move the mongos nodes to their own tier, but then put them behind a load balancer. In this design, you can even out the use of mongos by using a least-connection system. The challenge, however, is that newer drivers have issues with getMores. By this we mean that a getMore selects a new random connection, and the load balancer can’t be sure which mongos should get it. Thus it has only a one-in-N (number of mongos) chance of selecting the right one; otherwise, it gets a “Cursor Not Found” error.

Pros:

  • Ability to have an even use of mongos
  • Mongos are separated from each other and the applications to prevent memory and CPU contention
  • You can easily remove or add mongos to help scale the layer without code changes
  • High availability at every level (multiple mongos, multiple configs, ReplSet for high availability and even multiple applications for app failures)

Cons:

  • If batching is used, unless you switch to an IP-pinning algorithm (which loses evenness), you can get “Cursor Not Found” errors due to the wrong mongos receiving the getMore and bulk connector connections
App-Centric Mongos

By and large, this is one of the most typical deployment designs for MongoDB sharding. In it, we have each application host talking to a mongos on the local network interface. This ensures there is very little latency to the application from the mongos.

Additionally, this means if a mongos fails, at most its own host is affected instead of the wider range of all application hosts.

Pros:

  • Local mongos on the loopback interface mean low to no latency
  • Limited scope of outage if this mongos fails
  • Can be geographically farther from the data storage in cases where you have a DR site

Cons:

  • Mongos is a memory hog; you could steal from your application memory to support running it here
    • Made worse with large batches, many connections, and sorting
  • Mongos is single-threaded and could become a bottleneck for your application
  • It is possible for a slow network to cause bad decision making, including duplicate databases on different shards. The functional result is data writing intermittently to two locations, and a DBA must remediate that at some point (think of the MMM VIP ping-pong issues)
  • All sorting and limits are applied on the application host. In cases where the sort uses an index this is OK, but if not indexed, the entire result set must be held in memory by mongos and sorted, and then the limited number of results is returned to the client. This is the typical cause of mongos OOM errors, due to the memory issues listed before.
Conclusion

The topologies above cover many of the deployment needs for MongoDB environments. I hope this helps; please leave any questions in the comments below.

Percona Live Open Source Database Conference 2018 Call for Papers Is Now Open!

November 13, 2017 - 10:09am

We are announcing the opening of the call for papers for the Percona Live Open Source Database Conference 2018 in Santa Clara, CA. It is open from now until December 22, 2017.

Our theme is “Championing Open Source Databases,” with topics of MySQL, MongoDB and other open source databases, including PostgreSQL, time series databases and RocksDB. Session tracks include Developers, Operations and Business/Case Studies.

We’re looking forward to your submissions! We want proposals that cover the many aspects and current trends of using open source databases, including design practices, application development, performance optimization, HA and clustering, cloud, containers and new technologies, as well as new and interesting ways to monitor and manage database environments.

Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Share your case studies, best practices and technical knowledge with an engaged audience of open source peers.

Possible topics include:

  • Application development. How are you building applications using open source databases to power the data layers? What languages, frameworks and data models help you to build applications that your customers love? Are you using MySQL, MongoDB, PostgreSQL, time series or other databases?  
  • Database performance. What database issues have you encountered while meeting new application and new workload demands? How did they affect the user experience? How did you address them? Are you using WiredTiger or a new storage engine like RocksDB? Have you moved to an in-memory engine? Let us know about the solutions you have found to make sure your applications can get data to users and customers.
  • DBaaS and PaaS. Are you using a Database as a Service (DBaaS) in the public cloud, or have you rolled out your own? Are you on AWS, Google Cloud, Microsoft Azure or RackSpace/ObjectRocket? Are you using a database in a Platform as a Service (PaaS) environment? Tell us how it’s going.
  • High availability. Are your applications a crucial part of your business model? Do they need to be available at all times, no matter what? What database challenges have you come across that impacted uptime, and how did you create a high availability environment to address them?
  • Scalability. Has scaling your business affected database performance, user experience or the bottom line? How are you addressing the database environment workload as your business scales? Let us know what technologies you used to solve issues.
  • Distributed databases. Are you moving toward a distributed model? Why? What is your plan for replication and sharding?
  • Observability and monitoring. How do we design open source database deployment with observability in mind? Are you using Elasticsearch or some other analysis tool? What tools are you using to monitor data? Grafana? Prometheus? Percona Monitoring and Management? How do you visualize application performance trends for maximum impact?
  • Container solutions. Do you use Docker, Kubernetes or other containers in your database environment? What are the best practices for using open source databases with containers and orchestration? Has it worked out for you? Did you run into challenges and how did you solve them?
  • Security. What security and compliance challenges are you facing and how are you solving them?
  • Migrating to open source databases. Did you recently migrate applications from proprietary to open source databases? How did it work out? What challenges did you face, and what obstacles did you overcome? What were the rewards?
  • What the future holds. What do you see as the “next big thing”? What new and exciting features just released? What’s in your next release? What new technologies will affect the database landscape? AI? Machine learning? Blockchain databases? Let us know what you see coming.

The Percona Live Open Source Database Conference 2018 Call for Papers is open until December 22, 2017. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Share your open source database experiences with peers and professionals in the open source community by presenting a:

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. Encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Speaking at Percona Live is a great way to build your personal and company brands. If selected, you will receive a complimentary full conference pass!

Submit your talks now.

Tips for Submitting to Percona Live

Include presentation details, but be concise. Clearly state:

  • Purpose of the talk (problem, solution, action format, etc.)
  • Covered technologies
  • Target audience
  • Audience takeaway

Keep proposals free of sales pitches. The Committee is looking for case studies and in-depth technical talks, not ones that sound like a commercial.

Be original! Make your presentation stand out by submitting a proposal that focuses on real-world scenarios, relevant examples, and knowledge transfer.

Submit your proposals as soon as you can – the call for papers is open until December 22, 2017.

This Week in Data with Colin Charles 14: A Meetup in Korea and The Magic Quadrant

November 10, 2017 - 3:03am

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

We’re close to opening up the call for papers for Percona Live Santa Clara 2018 and I expect this to happen next week. We also have a committee all lined up and ready to vote on submissions.

In other news, I’ve spent some time preparing for the Korean MySQL Power Group meetup to be held in Seoul this Saturday, 11 November 2017. This is a great opportunity for us to extend our reach in Asia. This meetup gathers together top DBAs from Internet companies that use MySQL and related technologies.

Gartner has released their Magic Quadrant for Operational Database Management Systems 2017. Reprint rights have been given to several vendors, e.g. EnterpriseDB and Microsoft; I’m sure you can find other links. The Magic Quadrant features far fewer database vendors now; many have been dropped. What’s your take on it?

Releases

This was a slow release week.

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

MySQL and Linux Context Switches

November 9, 2017 - 11:50am

In this blog post, I’ll look at MySQL and Linux context switches and what is the normal number per second for a database environment.

You might have heard many times about the importance of looking at the number of context switches as an indicator of whether MySQL is suffering from internal contention issues. I often get the question of what is a “normal” or “acceptable” number, and at what point you should worry about the number of context switches per second.

First, let’s talk about what context switches are in Linux. This StackOverflow thread provides a good discussion, with a lot of details, but basically it works like this:

The process (or thread, in MySQL’s case) is running its computations. Sooner or later, it has to do some blocking operation: disk IO, network IO, blocking on a mutex, or a yield. The execution then switches to another process; this is called a voluntary context switch. On the other hand, the process/thread may be preempted by the scheduler because it used up its allotted amount of CPU time (and other tasks need to run), or because a high-priority task needs to run. These are called involuntary context switches. When all the processes in the system are added together and totaled, we get the system-wide number of context switches reported (using, for example, vmstat):

root@nuc2:~# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
17  0      0 12935036 326152 2387388    0    0     0     5     0      1  9  0 91  0  0
20  0      0 12933936 326152 2387384    0    0     0     3 32228 124791 77 22  1  0  0
17  0      0 12933348 326152 2387364    0    0     0    11 33212 124575 78 22  1  0  0
16  0      0 12933380 326152 2387364    0    0     0    78 32470 126100 78 22  1  0  0

This is a global number. In many cases, however, it is better to look at it as context switches per CPU logical core. This is because cores execute tasks independently. As such, they have mostly independent causes for context switches. If you have a large number of cores, there can be quite a difference:

The number of context switches per second on this system looks high (at more than 1,000,000). Considering it has 56 logical cores, however, it is only about 30,000 per second per logical core (which is not too bad).

So how do we judge if the number of context switches is too high in your system? One answer is that it is too high if you’re wasting too much CPU on context switches. This brings up the question: how many context switches can the system handle if it is only doing context switches?

It is easy to find this out!  

Sysbench has a “threads” test designed specifically to measure this. For example:

sysbench --thread-locks=128 --time=7200 --threads=1024 threads run

Check the vmstat output or the Context Switches PMM graph:

We can see this system can handle up to 35 million context switches per second in total (or some 500K per logical CPU core on average).

I don’t recommend using more than 10% of CPU resources on context switching, so I would try to keep the number of the context switches at no more than 50K per logical CPU core.
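A quick, hedged sketch for measuring your own per-core rate (vmstat’s cs column is the 12th field; the second sample is used because the first reports averages since boot):

  cores=$(nproc)
  cs=$(vmstat 1 2 | tail -1 | awk '{print $12}')
  echo "$((cs / cores)) context switches/sec per logical core"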

Now let’s think about context switches from the other side: how many context switches do we expect to have, at the very minimum, for a given load? Even if all the stars align and your query to MySQL doesn’t need any disk IO or any waits on mutexes, you should expect at least two context switches: one to the thread that processes the query, and one for the query response sent back to the client.

Using this logic, if we have 100,000 queries/sec we should expect 200,000 context switches at the very minimum.

In the real world, though, I would not worry about contention being a big issue if you have less than ten context switches per query.

It is worth noting that in MySQL not every contention results in a context switch. InnoDB implements its own mutexes and RW-locks, which often try to “spin” to wait for a resource to become available. This wastes CPU time directly rather than doing a context switch.

Summary:

  • Look at the number of context switches per logical core rather than the total for easier-to-compare numbers
  • Find out how many context switches your system can handle per second, and don’t get too concerned if your context switches are no more than 10% of that number
  • Think about the number of context switches per query: the minimum possible is two, and values less than 10 make contention an unlikely issue
  • Not every MySQL contention results in a high number of context switches

Using Prometheus to Check for auto_increment Value Exhaustion

November 8, 2017 - 11:48am

In this blog post, we’ll look at how to use Prometheus to check for auto_increment value exhaustion.

One of the proactive tasks DBAs perform is checking whether a field defined as auto_increment is about to reach the maximum allowed value of its integer definition. For example, if a field is defined as smallint unsigned and your next auto_increment value is 65535 (the maximum possible value for smallint unsigned), you are in trouble. You should modify the column definition to at least mediumint unsigned (which has a maximum value of 16777215) before you silently start to lose data or become unable to insert more rows.
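When a column does need widening, the fix is a plain ALTER; a minimal sketch (the database, table and column names are examples, and on a large table you would want an online schema change tool such as pt-online-schema-change):

  mysql -e "ALTER TABLE mydb.mytable MODIFY id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT"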

Prometheus and the mysqld exporter

When you deploy the Percona Monitoring and Management (PMM) platform, you’re not only getting a state-of-the-art visualization tool, but you also get the Prometheus time-series database with it as part of the package. Prometheus connects to exporters running on a PMM Client and aggregates metrics collected by those exporters. For this case, the exporter that we care about is for MySQL (https://github.com/percona/mysqld_exporter), and specifically the metrics gathered by the collect.auto_increment.columns collector.

Tablestats

To know if your PMM installation is collecting this data, you should check with pmm-admin. If your output comes with tablestats=OFF, you are not collecting this metric:

[root@localhost]# pmm-admin list | grep tablestats
mysql:metrics   ip-10-0-0-207   42002   YES   percona:***@unix(/var/lib/mysql/mysql.sock)   tablestats=OFF

By default, the tablestats feature is enabled (unless your MySQL server has more than 1000 tables). To enable it beyond that limit, re-add the metrics using the --disable-tablestats-limit parameter with a larger value than the number of your tables (described in https://www.percona.com/doc/percona-monitoring-and-management/pmm-admin.html):

pmm-admin add mysql:metrics --disable-tablestats-limit 10000

Prometheus query

Prometheus not only stores the current value of the auto_increment column (mysql_info_schema_auto_increment_column), but also has the maximum allowed value per int type on it (mysql_info_schema_auto_increment_column_max). We can get the percentage of values used for the column with a single query, like:

mysql_info_schema_auto_increment_column*100/mysql_info_schema_auto_increment_column_max (available thanks to Max Bubenick)

Now, where do I execute the query? There are three options: the Prometheus GUI, the API, or the MySQL Table Statistics dashboard in PMM.

GUI

Every PMM installation allows you to access Prometheus by going to the URL http://<serverIP>/prometheus. Put the query in the text field, and hit Execute. The results look like this:

Prometheus GUI


You can see a row per existing table. In this case, we can see that the table “sbtest20” is using 95% of the available values. This makes it a candidate for a column definition modification. The remaining tables are using around 0.0002% of the values, so there’s plenty of room to grow.

API

This is probably the best approach for checking this regularly. You can use the REST API to query metrics (as explained in https://prometheus.io/docs/querying/api/), like this:

curl -g 'http://127.0.0.1:80/prometheus/api/v1/query?query=(mysql_info_schema_auto_increment_column{schema="percona",table="sbtest20"}*100/mysql_info_schema_auto_increment_column_max{schema="percona",table="sbtest20"})'

If you want to query the metrics for a specific table, as I do here, just add the schema and table name: {schema="percona",table="sbtest20"}:

[root@localhost ~]# curl -g 'http://127.0.0.1:80/prometheus/api/v1/query?query=(mysql_info_schema_auto_increment_column{schema="percona",table="sbtest20"}*100/mysql_info_schema_auto_increment_column_max{schema="percona",table="sbtest20"})' | python -m json.tool
{
    "data": {
        "result": [
            {
                "metric": {
                    "column": "id",
                    "instance": "localhost.localdomain",
                    "job": "mysql",
                    "schema": "percona",
                    "table": "sbtest20"
                },
                "value": [
                    1509395634.736,
                    "95.2755905511811"
                ]
            }
        ],
        "resultType": "vector"
    },
    "status": "success"
}

The part of the output that matters here is:

"value": [ 1509395634.736, "95.2755905511811" ]

As we saw in the GUI output, the table sbtest20 has already used 95% of the available values for its column’s int definition. In this case, the column is a signed tinyint and the current maximum value is 120 (out of 127):

mysql> show create table sbtest20\G
*************************** 1. row ***************************
       Table: sbtest20
Create Table: CREATE TABLE `sbtest20` (
  `id` tinyint(10) NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=121 DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

mysql> select max(id) from sbtest20;
+---------+
| max(id) |
+---------+
|     120 |
+---------+
1 row in set (0.09 sec)
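For unattended checks, the same API lends itself to a small script. Here is a hedged sketch (the endpoint and the 90% threshold are assumptions, not part of PMM) that returns only the tables above the threshold, by pushing the comparison into the PromQL expression itself:

  #!/bin/bash
  PMM_HOST=127.0.0.1
  QUERY='mysql_info_schema_auto_increment_column*100/mysql_info_schema_auto_increment_column_max > 90'
  curl -sG "http://${PMM_HOST}/prometheus/api/v1/query" --data-urlencode "query=${QUERY}" | python -m json.tool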

PMM’s MySQL Table Statistics dashboard

This information is also available on PMM, just go to the dashboard called MySQL Table Statistics and look for the section “Top Tables by Auto Increment Usage”:

The above graph is from Percona’s PMM demo setup, which is available publicly to everyone. You can access that particular chart by going to https://pmmdemo.percona.com/graph/dashboard/db/mysql-table-statistics?refresh=1m&panelId=53&fullscreen&orgId=1.

Conclusion

There are several ways to check for auto_increment capacity; this is just one of them. What really matters is how PMM collects a rich set of information that is available with just one API request. Note that each additional metric series collected by Prometheus adds work for the PMM Server, so watch that your server doesn’t enter what is known as Rushed Mode. This is a condition where the PMM Server is experiencing a high volume of writes and is struggling to keep up. You can identify this condition (since version 1.4) by examining the Prometheus dashboard, in particular the graph Prometheus Problems: check whether the metric series called “Has been in Rushed Mode” is greater than 0.

Enjoy PMM!

Upcoming MongoDB Security Webinar November 8, 2017: Enhanced Security Using LDAP Authentication

November 7, 2017 - 7:58am

Join Percona’s Senior Technical Services Engineer Adamo Tonete as he presents MongoDB Security Webinars: Enhanced Security Using LDAP Authentication on Wednesday, November 8, 2017 at 11:00 am PST / 2:00 pm EST (UTC-8).

  • Experience: Intermediate
  • Tags: SysAdmin, DBAs

In this webinar, we are going to demonstrate how to configure Percona Server for MongoDB in a secure replica set using LDAP authentication. LDAP authentication allows you to use a single password server to authenticate users. Upstream, LDAP authentication is available only in MongoDB Enterprise, but Percona offers this feature free of charge with Percona Server for MongoDB. Check out what other Enterprise features Percona Server for MongoDB provides for free.

Register Now

Register for the webinar.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL Database Administrator for three years. As the main DBA of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24×7 environment. Before that, he worked as a Microsoft SQL Server DBA at a large e-commerce company. Here he mainly focused on performance tuning and automation. Adamo has almost eight years of experience working as a DBA. In the past three years he has moved to NoSQL technologies without giving up relational databases. He likes to play video games and to study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.

MongoDB Security: Using LDAP Authentication

November 6, 2017 - 2:47pm

In this blog post, we’ll focus on MongoDB security and discuss how to configure Percona Server for MongoDB to authenticate a user with an LDAP server.

It is important to mention that Percona Server for MongoDB features LDAP authentication free of charge for everyone in all our versions. This feature is not available in the MongoDB Community versions. Make sure you’re using https://www.percona.com/software/mongo-database/percona-server-for-mongodb.

But what is LDAP, and how can it help with MongoDB security and provide a secure environment?

LDAP stands for Lightweight Directory Access Protocol. It is a protocol for talking to a domain controller to provide authentication and/or authorization for users in a centralized way: a user has a single password to sign in to multiple services, and user management is centralized and maintained by one team. It is really not as simple as that, but we’re going to consider only this functionality for this blog. For more information about LDAP, please see: https://www.openldap.org/doc/admin24/

Before running MongoDB, we need to configure a few packages. For the next steps, we are using a pre-existing domain called perconatest.com, running on the machine ldapserver.perconatest.com, with the USA organizational group and one user called support1.

The first package you need to install is sasl2-bin or cyrus-sasl (depending on your OS). This package talks to the LDAP server in order to validate whether a user’s password is valid, and Percona Server for MongoDB requires it.

Once we install (in this case) the cyrus package, we create/edit a few config files:

  1. Edit /etc/default/saslauthd. Enable auto-start and change the auth mechanism to LDAP:
    START=yes
    MECHANISMS="ldap"
    This change auto-starts the saslauthd process and configures its auth mechanism to be LDAP.
  2. If the installation hasn’t created a file at /etc/saslauthd.conf, please create one with the following values:

    ldap_servers: ldap://ldapserver.perconatest.com:389
    ldap_search_base: ou=database,dc=perconatest,dc=com
    ldap_filter: (uid=%u)

    This makes the saslauthd daemon connect to the specified domain and run a search for the user we’ve passed as an argument under the configured search base. It is possible to run an LDAP server with a secure connection, but that is out of the scope of this blog post.


  3. With that configuration done, the following test must return OK, confirming the cyrus package is correctly configured:

    $ sudo testsaslauthd -u dba -p 1234 -f /var/run/saslauthd/mux
    0: OK "Success."

If the steps above returned an “OK”, we are good to configure MongoDB to accept LDAP authentication.

It is important to mention that this integration only authenticates users. The authorization process is still on the database layer. This is why we need to create the users in the $external database in the following commands.

In order to setup MongoDB, please follow the next steps:

  1. Download the latest Percona Server for MongoDB version here:
    wget https://www.percona.com/downloads/percona-server-mongodb-LATEST/percona-server-mongodb-3.4.9-2.9/binary/tarball/percona-server-mongodb-3.4.9-2.9-xenial-x86_64.tar.gz
    --2017-11-05 19:34:19-- https://www.percona.com/downloads/percona-server-mongodb-LATEST/percona-server-mongodb-3.4.9-2.9/binary/tarball/percona-server-mongodb-3.4.9-2.9-xenial-x86_64.tar.gz
    Resolving www.percona.com (www.percona.com)... 74.121.199.234
    Connecting to www.percona.com (www.percona.com)|74.121.199.234|:443... connected.
  2. Create the mongodb.conf file in the /etc/sasl2 folder (as root):
    # mkdir -p /etc/sasl2
    # echo 'pwcheck_method: saslauthd
    saslauthd_path: /var/run/saslauthd/mux
    log_level: 5
    mech_list: plain' > /etc/sasl2/mongodb.conf
  3. Edit the mongodb config, or add startup parameters, in order to use the saslauthd library to validate users and passwords. If using a config file:
    setParameter:
      authenticationMechanisms: PLAIN,SCRAM-SHA-1
    If using startup parameters:
    --setParameter authenticationMechanisms=PLAIN,SCRAM-SHA-1
    Make sure authentication is configured to be on. We wrote a blog post about it a few months ago.


  4. Create the first user. With the process up and running, we need to create an administrator user. For this example, we will create a user called superAdmin with the "root" role (meaning this user can perform any operation in the database):

    mongo
    > use admin
    > db.createUser({user : 'superAdmin', pwd: '123', roles : ['root']})

  5. Create the standard user using LDAP authentication. No password is saved in the admin database when we perform the next operation. The following command creates a user based on LDAP, and the password verification is performed outside of the database: the cyrus library answers OK or NOK for the validation, while the authorization document (roles) is still managed by the database:

    > use admin
    > db.auth('superAdmin', '123')
    1
    > db.getSiblingDB("$external").createUser({ user : 'support1', roles: [ {role : "read", db: 'percona'} ] })
    Successfully added user: {
      "user" : "support1",
      "roles" : [
        {
          "role" : "read",
          "db" : "percona"
        }
      ]
    }

  6. Test the user. Use the following command to log in using an LDAP user. Notice that digestPassword is false, as the database calls another process to validate the user account:

    > db.getSiblingDB("$external").auth(
    ...   {
    ...     mechanism: "PLAIN",
    ...     user: 'support1',
    ...     pwd: '123',
    ...     digestPassword: false
    ...   }
    ... )
    1

Configuring MongoDB security (and specifically Percona Server for MongoDB) to use LDAP is very straightforward, and this setup can help you run a more secure environment, with passwords managed by a central domain.

I hope you find this article useful! Please feel free to contact me @AdamoTonete or @percona on Twitter anytime!

If you’ve never worked with these technologies before, don’t miss our webinar on Wednesday, November 8, 2017. We are going to configure an LDAP server and Percona Server for MongoDB to work together from scratch!

This Week in Data with Colin Charles 13: MariaDB, M18 and YugaByte

November 3, 2017 - 12:02pm

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This week we saw the closing of the CFPs for MariaDB’s user conference, M18. Peter Zaitsev and I submitted for Percona (don’t forget that there is also a developer’s conference tacked on to this event). We don’t have high expectations of getting a talk there, but hey – you never know! I submitted a talk on running MariaDB in the cloud (either hosted on Amazon RDS or Rackspace) or in your own compute instance. Another talk on capacity planning seemed to make sense as well.

An event Percona should be at is SCALE 16x, though I’ll admit I’m a bit surprised to see that there isn’t a dedicated MySQL track this year (the PostgreSQL track is still there).

In other news, we’ve seen a new database company gain funding: YugaByte. They received $8 million, and their leadership is largely ex-Facebook people who worked on Apache HBase.

MariaDB Corporation raised a total of $54m in 2017, with the latest $27m round led by Alibaba Group (completing the Series C funding round). The Register reports claims that MariaDB has access to more than 60 million devs (is this number even higher than the number of MySQL developers out there?). TechCrunch has its take as well. There hasn’t been any reporting of this in mainstream press like the Wall Street Journal or the Financial Times.

Releases

Link List

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Percona Monitoring and Management 1.4.1 Is Now Available

November 2, 2017 - 4:51pm

Percona announces the release of Percona Monitoring and Management 1.4.1 on Thursday, November 2nd, 2017. This release contains fixes to bugs found after Percona Monitoring and Management 1.4.0 was released. It also introduces two important improvements. We replaced the btrfs file system with XFS in AMI and OVF images, and the Prometheus dashboard has been enhanced to offer more information.

For install and upgrade instructions, see Deploying Percona Monitoring and Management.

Improvements
  • PMM-1567: The btrfs file system has been replaced with XFS in AMI and OVF images to meet the requirements of AWS Marketplace.

  • PMM-1594: In Metrics Monitor, the Prometheus dashboard has been updated to show more information about the running Prometheus jobs and help estimate their efficiency.

Bug fixes
  • PMM-1620: In some cases, PMM could not be upgraded to version 1.4.0 by using the Update button on the landing page.

  • PMM-1633: QAN would show the error List of Tables is Empty for instances that were upgraded from earlier releases of PMM, due to incorrect values being stored in the database. The fix identifies the incorrect values and replaces them with accurate schema and table information.

  • PMM-1634: The Advanced Data Exploration dashboard did not always display data points from external:metrics monitoring services, due to an overly restrictive Grafana template filter.

  • PMM-1636: Special characters prevented the removal of external:metrics services using the pmm-admin remove command.

MySQL vs. MariaDB: Reality Check

November 2, 2017 - 10:55am

In this blog, we’ll provide a comparison between MySQL vs. MariaDB (including Percona Server for MySQL).

Introduction

The goal of this blog post is to evaluate, at a higher level, MySQL, MariaDB and Percona Server for MySQL side-by-side to better inform the decision-making process. It is largely an unofficial response to published comments from the MariaDB Corporation.

It is worth noting that Percona Server for MySQL is a drop-in compatible branch of MySQL, where Percona contributes as much as possible upstream. MariaDB Server, on the other hand, is a fork of MySQL 5.5. They cherry-picked MySQL features, and don’t guarantee drop-in compatibility any longer.

| Feature | MySQL | Percona Server for MySQL* | MariaDB Server |
|---|---|---|---|
| Protocols | MySQL protocol over port 3306, X Protocol over port 33060 | MySQL protocol over port 3306, X Protocol over port 33060 | MySQL protocol, MariaDB Server extensions |
| Community – Source Code | Open Source | Open Source | Open Source |
| Community – Development | Open Source, contributions via signing the Oracle Contributor Agreement (OCA) | Open Source | Open Source, contributions via the new BSD license or signing the MariaDB Contributor Agreement (MCA) |
| Community – Collaboration | Mailing list, forums, bugs system | Mailing list, forums, bugs system (Jira, Launchpad) | Mailing list, bugs system (Jira), IRC channel |
| Core – Replication | MySQL replication with GTID | MySQL replication with GTID | MariaDB Server replication, with own GTID, compatible only if MariaDB Server is a slave to MySQL, not vice versa |
| Core – Routing | MySQL Router (GPLv2) | ProxySQL (GPLv3) | MariaDB MaxScale (Business Source License) |
| Core – Partitioning | Standard | Standard | Standard, with extra engines like SPIDER/CONNECT that offer varying levels of support |
| Tool – Editing | MySQL Workbench for Microsoft Windows, macOS, and Linux | MySQL Workbench for Microsoft Windows, macOS, and Linux | Webyog’s SQLYog for Microsoft Windows (MySQL Workbench notes an incompatible server) |
| Tool – Monitoring | MySQL Enterprise Monitor | Percona Monitoring & Management (PMM) (100% open source) | Webyog’s Monyog |
| Scalability – Client Connections | MySQL Enterprise Threadpool | Open Source Threadpool with support for priority tickets | Open Source Threadpool |
| Scalability – Clustering | MySQL Group Replication | MySQL Group Replication, Percona XtraDB Cluster (based on a further engineered Galera Cluster) | MariaDB Enterprise Cluster (based on Galera Cluster) |
| Security – Encryption | Tablespace data-at-rest encryption; Amazon KMS, Oracle Vault Enterprise Edition | Tablespace data-at-rest encryption with Keyring Vault plugin | Tablespace and table data-at-rest encryption; Amazon KMS, binlog/redo/tmp file with Aria tablespace encryption |
| Security – Data Masking | ProxySQL data masking | ProxySQL data masking | MariaDB MaxScale data masking |
| Security – Firewall | MySQL Enterprise Firewall | ProxySQL Firewall | MariaDB MaxScale Firewall |
| Security – Auditing | MySQL Enterprise Audit Plugin | Percona Audit Plugin (OSS) | MariaDB Audit Plugin (OSS) |
| Analytics | No | ClickHouse | MariaDB ColumnStore |
| SQL – Common Table Expressions | In development for MySQL 8.0 (now a release candidate) | In development for MySQL 8.0 (now a release candidate) | Present in MariaDB Server 10.2 |
| SQL – Window Functions | In development for MySQL 8.0 (now a release candidate) | In development for MySQL 8.0 (now a release candidate) | Present in MariaDB Server 10.2 |
| Temporal – Log-based rollback | No | No | In development for MariaDB Server 10.3 |
| Temporal – System versioned tables | No | No | In development for MariaDB Server 10.3 |
| JSON | JSON data type, 21 functions | JSON data type, 21 functions | No JSON data type, 26 functions |
| Official client connectors | C (libmysqlclient), Java, ODBC, .NET, Node.js, Python, C++, mysqlnd for PHP | C (libmysqlclient), Java, ODBC, .NET, Node.js, Python, C++, mysqlnd for PHP | C (libmariadbclient), Java, ODBC |
| Usability – CJK language support | Gb18030, ngram & MeCab for InnoDB full-text search | Gb18030, ngram & MeCab for InnoDB full-text search | No |
| Monitoring – PERFORMANCE_SCHEMA | Thorough instrumentation in 5.7, sys schema included | Thorough instrumentation in 5.7, sys schema included | Instrumentation from MySQL 5.6, sys schema not included |
| Security – Password authentication | sha256_password (with caching_sha2_password in 8.0) | sha256_password (with caching_sha2_password in 8.0) | ed25519 (incompatible with sha256_password) |
| Security – Secure out of the box | validate_password on by default, to choose a strong password at the start | validate_password on by default, to choose a strong password at the start | No |
| Usability – Syntax differences | EXPLAIN FOR CONNECTION <thread_id> | EXPLAIN FOR CONNECTION <thread_id> | SHOW EXPLAIN FOR <thread_id> |
| Optimiser – Optimiser Tracing | Yes | Yes | No |
| Optimiser – Optimiser Hints | Yes | Yes | No |
| DBA – Super readonly mode | Yes | Yes | No |
| Security – Password expiry | Yes | Yes | No |
| Security – Password last changed / password lifetime | Yes | Yes | No |
| Security – VALIDATE_PASSWORD_STRENGTH() | Yes | Yes | No |
| Security – ACCOUNT LOCK/UNLOCK | Yes | Yes | No |
| Usability – Query Rewriting | Yes | Yes | No |
| GIS – GeoJSON & GeoHash functionality | Yes | Yes | Incomplete |
| Security – mysql_ssl_rsa_setup | Yes | Yes | No (set up SSL connections manually) |
| MySQL Utilities | Yes | Yes | No |
| Backup locks | No (in development for 8.0) | Yes | No |
| Usability – InnoDB memcached interface | Yes | Yes | No |

*Note. Third-party software (such as ProxySQL and ClickHouse) used in conjunction with Percona Server for MySQL is not necessarily covered by Percona Support services.

To get a higher-level view of what Percona Server for MySQL offers compared to MySQL, please visit: Percona Server Feature Comparison. For a higher-level view of compatibility between MariaDB Server and MySQL, written by MariaDB Corporation, read this.
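To make the syntax difference from the table concrete, here is a small sketch using the command-line client (the connection/thread id 1234 is hypothetical; real ids can be found via SHOW PROCESSLIST):

    # MySQL and Percona Server for MySQL: inspect the plan of a statement running in another connection
    mysql -e "EXPLAIN FOR CONNECTION 1234;"
    # MariaDB Server equivalent
    mysql -e "SHOW EXPLAIN FOR 1234;"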

Open Community

MariaDB Server undoubtedly has an open community, with governance mixed between the MariaDB Foundation and MariaDB Corporation. There are open developer meetings on average about twice per year, two mailing lists (one for developers, one for users), an IRC channel and an open JIRA ticket system that logs bugs and feature requests.

Percona Server for MySQL also has an open community. Developer meetings are not open to general contributors, but there is a mailing list, an IRC channel and two systems – Launchpad and JIRA – for logging bugs and feature requests.

MySQL also has an open community where developer meetings are also not open to general contributors. There are many mailing lists, there are a few IRC channels and there is the MySQL bugs system. The worklogs are where the design for future releases happens, and these are opened up once their features are fully developed and the source code pushed.

From a source code standpoint, MySQL pushes to GitHub only when a release is made, whereas Percona Server for MySQL and MariaDB Server are developed in the open on GitHub.

Feature development on MySQL continues in leaps and bounds, and Oracle has been an excellent steward of MySQL. Please refer to The Complete List of Features in 5.7, as well as The Unofficial MySQL 8 Optimiser Guide.

Linux distributions chose MariaDB Server 5.5, and some chose MariaDB Server 10.0/10.1, back when there was more backward compatibility with MySQL 5.5/5.6. It is the “default” MySQL in many Linux distributions (such as Red Hat Enterprise Linux, SUSE and Debian). However, Ubuntu still believes that when you ask for MySQL you should get it (and that is what Ubuntu ships).

One of the main reasons Debian switched was the way Oracle publishes updates for security issues: they are released quarterly as a whole, as Critical Patch Updates, without much detail about individual fixes. This is a policy that is unlikely to change, but it has had no adverse effects on the distributions.

All projects actively embrace contributions from the open community. MariaDB Server does include contributions like the MyRocks engine developed at Facebook, but so does Percona Server for MySQL. Oracle accepts contributions from a long list of contributors, including Percona. Please see Licensing information for MySQL 5.7 as an example.

A Shared Core Engine

MariaDB Server has differed from MySQL since MySQL 5.5. This is one reason why you don’t get version numbers that follow the MySQL scheme. It is also worth noting that features are cherry-picked at merge time, because the source code has diverged so much since then.

As the table below shows, it took Percona Server for MySQL over four months to get a stable 5.5 release based on MySQL 5.5, while it took MariaDB Server one year and four months to get a stable 5.5 release based on MySQL 5.5. Percona Server for MySQL 5.6 and 5.7 are based on their respective MySQL versions.

| MySQL | Percona Server for MySQL | MariaDB Server |
|---|---|---|
| 5.5.8 GA (3 December 2010) | 5.5.11-20.2 GA (28 April 2011) | 5.5.23 GA (11 April 2012) |
| 5.6.10 GA (5 February 2013) | 5.6.13-61.0 GA (7 October 2013) | 10.0.10 GA (31 March 2014), 10.1.8 GA (17 October 2015) |
| 5.7.9 GA (21 October 2015) | 5.7.10-3 GA (23 February 2016) | 10.2.6 GA (23 May 2017) |


MySQL is currently at 8.0.3 Release Candidate, while MariaDB Server is at 10.3.2 Alpha as of this writing.

MariaDB Server is by no means a drop-in replacement for MySQL. The risk of moving to MariaDB Server if you aren’t using newer MySQL features may be minimal, but the risk of moving from MariaDB Server back to MySQL is considerable. Linux distributions like Debian already warn you of this.

The differences are beyond just default configuration options. Some features, like time-delayed replication (present in MySQL since 2013), only made an appearance in MariaDB Server in 2017! (Refer to the MariaDB Server 10.2 Overview for more.) However, it is also worth noting that some features, such as multi-source replication, appeared in MariaDB Server 10.0 first, and only then came to MySQL 5.7.

Extensibility

MySQL and MariaDB Server have a storage engine interface, and this is how you access all engines, including the favored InnoDB/Percona XtraDB. It is worth noting that Percona XtraDB was the default InnoDB replacement in MariaDB Server 5.1, 5.2, 5.3, 5.5, 10.0 and 10.1. But in MariaDB Server 10.2, the InnoDB of choice is upstream MySQL.

Stock MySQL has provided several storage engines beyond just InnoDB (the default) and MyISAM. You can find out more information about 5.7 Supported Engines.

Percona Server for MySQL includes a modified MEMORY storage engine, ships Percona XtraDB as the default InnoDB and also ships TokuDB and MyRocks (currently experimental). MyRocks is based on the RocksDB engine, and both are developed extensively at Facebook.

MariaDB Server includes many storage engines, beyond the default InnoDB. MyISAM is modified with segmented key caches, the default temporary table storage engine is Aria (which is a crash-safe MyISAM), the FederatedX engine is a modified FEDERATED engine, and there are more: CONNECT, Mroonga, OQGRAPH, Sequence, SphinxSE, SPIDER, TokuDB and of course MyRocks.

Storage engines have specific use cases, and have different levels of feature completeness. You should thoroughly evaluate a storage engine before choosing it. We believe that over 90% of installations are fine with just InnoDB or Percona XtraDB. Percona TokuDB is another engine that users who need compression could use. We naturally expect more usage in the MyRocks sphere going forward.

Analytics

MariaDB ColumnStore is the MariaDB solution to analytics and using a column-based store. It is a separate download and product, and not a traditional storage engine (yet). It is based on the now defunct InfiniDB product.

At Percona, we are quite excited by ClickHouse. We also have plenty of content around it. There is no MySQL story around this.

High Availability

High Availability is an exciting topic in the MySQL world, considering the server itself has been around for over 22 years. There are many solutions out there, and some have evolved considerably.

MySQL provides MySQL Cluster (NDBCLUSTER) (there is no equivalent in the MariaDB world). MySQL also provides group replication (similar to Galera Cluster). Combined with the proxy MySQL Router, and the mysqlsh for administration (part of the X Protocol/X Dev API), you can also get MySQL InnoDB Cluster.

We benefit from the above at Percona, but also put lots of engineering work into making Percona XtraDB Cluster.

MariaDB Server only provides Galera Cluster.

Security

We won’t dwell on the proprietary MySQL Enterprise Firewall; note, though, that MariaDB’s recommendation, MariaDB MaxScale, is also proprietary and non-open source (it uses a Business Source License). We highly recommend the open source alternative, ProxySQL.

When it comes to encryption, MariaDB Server implements Google patches to provide complete data at rest encryption. This supports InnoDB, XtraDB and Aria temporary tables. The log files can also be encrypted (not present in MySQL, which only allows tablespace encryption and not log file encryption).
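As a rough illustration on the MariaDB Server side, a minimal data-at-rest encryption setup with the file_key_management plugin could look like the following config excerpt. This is a sketch: the key file path is hypothetical, and a production deployment would normally use a proper key management plugin rather than a plain key file:

    [mysqld]
    # Load the simple file-based key management plugin (for testing/illustration only)
    plugin_load_add = file_key_management
    file_key_management_filename = /etc/mysql/keys.txt
    # Encrypt InnoDB tablespaces, the redo log and the binary log
    innodb_encrypt_tables = ON
    innodb_encrypt_log = ON
    encrypt_binlog = ON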

When it comes to attack prevention, ProxySQL should offer everything you need.

MySQL Enterprise provides auditing, while MariaDB Server provides an audit plugin as well as an extension to the audit interface for user filtering. Percona Server for MySQL has an audit plugin that sticks to the MySQL API, yet provides user filtering and controls the ability to audit (since auditing is expensive). Streaming to syslog is supported by the audit plugins from Percona and MariaDB.
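As an illustration, loading the Percona Server for MySQL audit plugin is a one-liner (a sketch; it assumes the plugin library ships with your build, as it does in recent Percona Server releases, and that you have the needed privileges):

    # Load the audit plugin and confirm it registered
    mysql -e "INSTALL PLUGIN audit_log SONAME 'audit_log.so';"
    mysql -e "SHOW PLUGINS;" | grep -i audit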

Supporting Ecosystem and Tools

Upgrading from MySQL to MariaDB Server should be a relatively simple process (as stated above). If you want to upgrade away from MariaDB Server to MySQL, you may face hassles. For tools, see the following table:

| Purpose | MySQL | Percona Server for MySQL | MariaDB Server |
|---|---|---|---|
| Monitoring | MySQL Enterprise Monitor | Percona Monitoring & Management (PMM) (100% open source) | Webyog Monyog |
| Backup | MySQL Enterprise Backup | Percona XtraBackup | MariaDB Backup (fork of Percona XtraBackup) |
| SQL Management | MySQL Workbench | MySQL Workbench | Webyog SQLyog |
| Load Balancing & Routing | MySQL Router | ProxySQL | MariaDB MaxScale |
| Database Firewall | MySQL Enterprise Firewall | ProxySQL | MariaDB MaxScale |


Enterprise Database Compatibility

MariaDB Server today has window functions and common table expressions (CTEs). These appeared in MariaDB Server 10.2. MySQL 8 is presently in release candidate status and also has similar functionality.

Looking ahead, MariaDB Server 10.3 also includes an Oracle SQL_MODE and a partial PL/SQL parser. This is to aid migration from Oracle to MariaDB Server.

MariaDB Server 10.2 also has “flashback”, developed at Alibaba, to help with log-based rollback using the binary log.

Conclusion

Percona sees healthy competition in the MySQL ecosystem. We support all databases in the ecosystem: MySQL, MariaDB Server and Percona Server for MySQL. Our focus is to provide alternatives to proprietary parts of open source software. Percona has a strong operations focus on compatibility, application scalability, high availability, security and observability. We also support many additional tools within the ecosystem, and love integrating and contributing to open source code.

For example, Percona Monitoring and Management (PMM) includes many open source tools like Prometheus, Consul, Grafana, Orchestrator and more. We have made the de facto open source hot backup solution for MySQL, MariaDB Server and Percona Server for MySQL (called Percona XtraBackup). We continue to maintain and extend useful tools for database engineers and administrators in Percona Toolkit. We make Percona XtraDB Cluster safe for deployment out of the box. We have invested in a write-optimized storage engine, TokuDB, and now continue to work with making MyRocks better.

We look forward to supporting your deployments of MySQL or MariaDB Server, whichever option is right for you! If you need assistance on migrations between servers, or further information, don’t hesitate to contact your friendly Percona sales associate.

Percona Server for MongoDB 3.2.17-3.8 Is Now Available

November 1, 2017 - 5:37am

Percona announces the release of Percona Server for MongoDB 3.2.17-3.8 on October 31, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database that supports the MongoDB v3.2 protocol and drivers. It extends MongoDB with the MongoRocks, Percona Memory Engine and PerconaFT storage engines, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting, and Hot Backup at no extra cost. The software requires no changes to MongoDB applications or code.

NOTE: The PerconaFT storage engine is deprecated as of 3.2. It is no longer supported and isn’t available in higher version releases.

This release is based on MongoDB 3.2.17 and does not include any additional changes.

The Percona Server for MongoDB 3.2.17-3.8 release notes are available in the official documentation.

MySQL Dashboard Improvements in Percona Monitoring and Management 1.4.0

October 31, 2017 - 1:08pm

In this blog post, I’ll walk through some of the improvements to the Percona Monitoring and Management (PMM) MySQL dashboard in release 1.4.0.

As the part of Percona Monitoring and Management development, we’re constantly looking for better ways to visualize information and help you to spot and resolve problems faster. We’ve made some updates to the MySQL dashboard in the 1.4.0 release. You can see those improvements in action in our Percona Monitoring and Management Demo Site: check out the MySQL Overview and MySQL InnoDB Metrics dashboards.

MySQL Client Thread Activity

One of the best ways to characterize a MySQL workload is to look at the number of MySQL server-client connections (Threads Connected). You should compare this number to how many of those threads are actually doing something on the server side (Threads Running), rather than just sitting idle waiting for a client to send the next request.
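Outside of PMM, you can sample the same counters with the command-line client (a sketch; host and credentials are assumed to be configured in your environment):

    # Current client threads: connected vs. actually running
    mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_connected', 'Threads_running');"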

MySQL can handle thousands of connected threads quite well. However, many threads (hundreds) running concurrently often increase query latency. Increased internal contention can make the situation much worse.

The problem with those metrics is that they are extremely volatile – one second you might have a lot of threads connected and running, and then none. This is especially true when stalls at the MySQL level (or higher) cause pile-ups.

To provide better insight, we now show Peak Threads Connected and Peak Threads Running to help easily spot such potential pile-ups, as well as Avg Threads Running. These stats allow you to look at a high number of threads connected and running to see if there are just minor spikes (which tend to happen in many systems on a regular basis), or something more prolonged that warrants deeper investigation.

To simplify it even further: Threads Running spiking for a few seconds is OK, but spikes persisting for 5-10 seconds or more are often signs of problems that are impacting users (or problems about to happen).

InnoDB Logging Performance

Since I wrote a blog post about Choosing MySQL InnoDB Log File Size, I thought it would be great to check out how long the log file space would last (instead of just looking at how much log space is written per hour). Knowing how long the innodb_log_buffer_size lasts is also helpful for tuning this variable, in general.

This graph shows you how much data is written to the InnoDB Log Files, which helps to understand your disk bandwidth consumption. It also tells you how long it will take to go through your combined Redo Log Space and InnoDB Log Buffer Size (at this rate).

As I wrote in the blog post, there are a lot of considerations for choosing the InnoDB log file size, but having enough log space to accommodate all the changes for an hour is a good rule of thumb. As we can see, this system is close to full at around 50 minutes.

When it comes to innodb_log_buffer_size, even if InnoDB is not configured to flush the log at every transaction commit, it is going to be flushed every second by default. This means 10-15 seconds is usually good enough to accommodate the spikes. This system has it set at about 40 seconds (which is more than enough).
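If you want to estimate the redo log write rate by hand rather than from the graph, a rough shell sketch like the following can help (host and credentials are assumed; Innodb_os_log_written is the standard status counter of bytes written to the redo log):

    # Sample the redo log write rate over 60 seconds, then extrapolate to one hour
    before=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'" | awk '{print $2}')
    sleep 60
    after=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'" | awk '{print $2}')
    echo "Approximate redo log writes per hour: $(( (after - before) * 60 / 1024 / 1024 )) MB"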

InnoDB Read-Ahead

This graph helps you understand how InnoDB Read-Ahead is working out, and is a pretty advanced graph.

In general, InnoDB Read-Ahead is not very well understood. I think in most cases it is hard to tell if it is helping or hurting the current workload in its current configuration.

The goal of Read-Ahead in any system (not just InnoDB) is to pre-fetch data before it is really needed (in order to reduce latency and improve performance). The risk, however, is pre-fetching data that isn’t needed. This is wasteful.

InnoDB has two Read-Ahead options: Linear Read-Ahead (designed to speed up workloads that have physically sequential data access) and Random Read-Ahead (designed to help workloads that tend to access the data in the same vicinity but not in a linear order).

Due to potential overhead, only Linear Read-Ahead is enabled by default. You need to enable Random Read-Ahead separately if you want to determine its impact on your workload.

Back to the graph in question: we show the number of pages pre-fetched by Linear and Random Read-Aheads to confirm if these are even in use with your workload. We show Number of Pages Fetched but Never Accessed (evicted without access) – shown as both the number of pages and as a percent of pages. If Fetched but Never Accessed is more than 30% or so, Read-Ahead might be producing more waste than help for your workload. It might need tuning.
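To inspect the underlying counters directly (a sketch; connection details are assumed, and innodb_random_read_ahead is the server variable that toggles Random Read-Ahead):

    # Enable Random Read-Ahead if you want to evaluate it (it is disabled by default)
    mysql -e "SET GLOBAL innodb_random_read_ahead = ON;"
    # Pages fetched by each Read-Ahead type, and pages evicted without ever being accessed
    mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
      ('Innodb_buffer_pool_read_ahead',
       'Innodb_buffer_pool_read_ahead_rnd',
       'Innodb_buffer_pool_read_ahead_evicted');"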

We also show the portion of IO requests that InnoDB Read-Ahead served, which can help you understand the portion of resources spent on InnoDB Read-Ahead.

Due to the timing of how InnoDB increments counters, the percentages of IO used for Read-Ahead and pages evicted without access show up better on larger-scale graphs.

Conclusion

I hope you find these graphs helpful. We’ll continue making Percona Monitoring and Management more helpful for troubleshooting database systems and getting better performance!

Percona XtraDB Cluster 5.7.19-29.22-3 is now available

October 30, 2017 - 5:29am

Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.19-29.22-3 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs
  • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication. This vulnerability is identified as CVE-2017-15365.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster 5.6.37-26.21-3 is Now Available

October 30, 2017 - 5:27am

Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.37-26.21-3 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs
  • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication. This vulnerability is identified as CVE-2017-15365.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
