Buy Percona SupportBuy Now
Subscribe to Latest MySQL Performance Blog posts feed
Updated: 1 hour 27 min ago

Testing Samsung storage in tpcc-mysql benchmark of Percona Server

3 hours 49 min ago

This blog post will detail the results of Samsung storage in tpcc-mysql benchmark using Percona Server.

I had an opportunity to test different Samsung storage devices under tpcc-mysql benchmark powered by Percona Server 5.7. You can find a summary with details here https://github.com/Percona-Lab-results/201607-tpcc-samsung-storage/blob/master/summary-tpcc-samsung.md

I have in my possession:

  • Samsung 850 Pro, 2TB: This is a SATA device and is positioned as consumer-oriented, something that you would use in a high-end user desktop. As of this post, I estimate the price of this device as around $430/TB.
  • Samsung SM863, 1.92TB: this device is also a SATA, and positioned for a server usage. The current price is about $600/TB. 
  • Samsung PM1725, 800GB: This is an NVMe device, in a 2.5″ form factor, but it requires a connection to a PCIe slot, which I had to allocate in my server. The device is high-end, oriented for server-side and demanding workloads. The current price is about $1300/TB.

I am going to use 1000 warehouses in the tpcc-mysql benchmarks, which corresponds roughly to a data size of 100GB.

This benchmark varies the innodb_buffer_pool_size from 5GB to 115GB. With 5GB buffer pool size only a very small portion of data fits into memory, which results in intensive foreground IO reads and intensive background IO writes. With 115GB almost all data fits into memory, which results in very small (or almost zero) IO reads and moderate background IO writes.

All buffer pool sizes in the middle of the interval correspond to resulting IO reads and writes. For example, we can see the read to write ratio on the chart below (received for the PM1725 device) with different buffer pool sizes:

We can see that for the 5GB buffer pool size we have 56000 read IOPs operations and 32000 write IOPs. For 115GB, the reads are minimal at about 300 IOPS and the background writes are at the 20000 IOPs level. Reads gradually decline with the increasing buffer pool size.

The charts are generated with the Percona Monitoring and Management tools.

Results

Let’s review the results. The first chart shows measurements taken every one second, allowing us to see the trends and stalls.

If we take averages, the results are:

In table form (the results are in new order transactions per minute (NOTPM)):

bp, GB pm1725 sam850 sam863 pm1725 / sam863 pm1725 / sam850 5 42427.57 1931.54 14709.69 2.88 21.97 15 78991.67 2750.85 31655.18 2.50 28.72 25 108077.56 5156.72 56777.82 1.90 20.96 35 122582.17 8986.15 93828.48 1.31 13.64 45 127828.82 12136.51 123979.99 1.03 10.53 55 130724.59 19547.81 127971.30 1.02 6.69 65 131901.38 27653.94 131020.07 1.01 4.77 75 133184.70 38210.94 131410.40 1.01 3.49 85 133058.50 39669.90 131657.16 1.01 3.35 95 133553.49 39519.18 132882.29 1.01 3.38 105 134021.26 39631.03 132126.29 1.01 3.38 115 134037.09 39469.34 132683.55 1.01 3.40 Conclusion

The Samsung 850 obviously can’t keep with the more advanced SM863 and PM1725. The PM1725 shows a greater benefit with smaller buffer pool sizes. In cases using large amounts of memory, there is practically no difference with SM863. The reason is that with big buffer pool sizes, MySQL does not push IO subsystem much to use all the PM1725 performance.

For the reference, my.cnf file is

[mysqld] datadir=/var/lib/mysql socket=/tmp/mysql.sock ssl=0 symbolic-links=0 sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES # general thread_cache_size=2000 table_open_cache = 200000 table_open_cache_instances=64 back_log=1500 query_cache_type=0 max_connections=4000 # files innodb_file_per_table innodb_log_file_size=15G innodb_log_files_in_group=2 innodb_open_files=4000 innodb_io_capacity=10000 loose-innodb_io_capacity_max=12000 innodb_lru_scan_depth=1024 innodb_page_cleaners=32 # buffers innodb_buffer_pool_size= 200G innodb_buffer_pool_instances=8 innodb_log_buffer_size=64M # tune innodb_doublewrite= 1 innodb_support_xa=0 innodb_thread_concurrency=0 innodb_flush_log_at_trx_commit= 1 innodb_flush_method=O_DIRECT_NO_FSYNC innodb_max_dirty_pages_pct=90 join_buffer_size=32K sort_buffer_size=32K innodb_use_native_aio=0 innodb_stats_persistent = 1 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 16 innodb_write_io_threads = 8 innodb_purge_threads=4 innodb_adaptive_hash_index=0 innodb_change_buffering=none loose-innodb-log_checksum-algorithm=crc32 loose-innodb-checksum-algorithm=strict_crc32 loose-innodb_sched_priority_cleaner=39 loose-metadata_locks_hash_instances=256

Percona XtraBackup 2.4.4 is now available

July 25, 2016 - 11:05am

Percona announces the GA release of Percona XtraBackup 2.4.4 on July 25th, 2016. You can download it from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

  • Percona XtraBackup has been rebased on MySQL 5.7.13.

Bugs Fixed:

  • Percona XtraBackup reported the difference in the actual size of the system tablespace and the size which was stored in the tablespace header. This check is now skipped for tablespaces with autoextend support. Bug fixed #1550322.
  • Because Percona Server 5.5 and MySQL 5.6 store the LSN offset for large log files at different places inside the redo log header, Percona XtraBackup was trying to guess which offset is better to use by trying to read from each one and compare the log block numbers and assert lsn_chosen == 1 when both LSNs looked correct, but they were different. Fixed by improving the server detection. Bug fixed #1568009.
  • Percona XtraBackup didn’t correctly detect when tables were both compressed and encrypted. Bug fixed #1582130.
  • Percona XtraBackup would crash if the keyring file was empty. Bug fixed #1590351.
  • Backup couldn’t be prepared when the size in cache didn’t match the physical size. Bug fixed #1604299.
  • Free Software Foundation address in copyright notices was outdated. Bug fixed #1222777.
  • Backup process would fail if the datadir specified on the command-line was not the same as one that is reported by the server. Percona XtraBackup now allows the datadir from my.cnf override the one from SHOW VARIABLES. xtrabackup prints a warning that they don’t match, but continues. Bug fixed #1526467.
  • With upstream change of maximum page size from 16K to 64K, the size of incremental buffer became 1G. Which increased the requirement to 1G of RAM in order to prepare the backup. While in fact there is no need to allocate such a large buffer for smaller pages. Bug fixed #1582456.
  • Backup process would fail on MariaDB Galera cluster operating in GTID mode if binary logs were in non-standard directory. Bug fixed #1517629.

Other bugs fixed: #1583717, #1583954, and #1599397.

Release notes with all the bugfixes for Percona XtraBackup 2.4.4 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

MongoDB Consistent Backups

July 25, 2016 - 10:35am

In this post, I’m going to discuss MongoDB consistent backups, and how to achieve them.

You might have read before that MongoDB backup is not consistent. But what if I told you there is a tool that could make them consistent. What if this tool also would make it cluster-wide consistent, automatically compress the backup, become the first step toward continually incremental recording, notify your monitoring system and upload the backup to cloud storage for you?

It’s all TRUE!

Recently Percona-Labs created a new repository aimed at exactly these issues. We hope it will eventually grow into something that becomes part of the officially supported tools (like Percona Toolkit and  Percona’s Xtrabackup utility). Before we get into how it works, let’s talk about why we need it and its key highlights. Then (for all the engineering types reading this) we can discuss what is does and why.

Why do we need a consistent backup tool?

The first thing to note is you absolutely can’t have a consistent backup on a working system unless your node is in a replicaset. (You could even have a single node replicaset for this to be accurate.) Why? Consistency requires an operations log to say what changes occurred from the first point in the backup to the last point. This lets us ensure we are consistent to the end timestamp of the backup. We are unable to verify consistency when the MongoDB backup started without the ability to take a “snapshot” of data and then save the data while other changes occur. MongoDB does not have ACID-like isolation in this way. However, it can be consistent to the backup endpoint by applying any deltas at the end of the backup restore process.

You might say, “but mongodump already provides --oplog for this feature.” You are right: it does, and it works great if you only have a single replicaset to backup. When we bring sharding into the mix, however, things get vastly more complicated. It ignores that flag and hits your primaries:

In the diagram above you can see the backup and oplog recording for the first shard ended long before the second shard. As such, the consistency point needed is nowhere close to being covered by the red line. Even if all your shards are the same size, there would be some level of variance due to network, disk, CPU and memory speeds. The new tool helps you here by keeping track of the dumps, but also by having a thread recording the oplog for all shards until the last shard finishes. This ensures that all shards can be synced to the point in time where the last shard finished. At that moment in time, we have a consistent backup across all the shards. As you can see below, the oplog finished watching both shards after the last shard finish. On recovery, they remain in sync.

You might ask, “well what about the meta-data stored in the config servers.” This is a great quest, as the behavior differs in our tool depending on if you’re using MongoDB 3.2’s new Config Servers as a replica set feature, or a legacy config server approach.

In the legacy mode, we fsyncAndLock the config servers just long enough to record a server config data dump. Then we stop the oplog tailer threads for all the shards. After that, and after the oplog tailers finish, we unlock the config server. This ensures we remove the race conditions that could occur if it took longer than expected to close an oplog cursor. However, if we run in 3.2 mode, the config servers act just like another shard. They get dumped at the same time, and the oplog just gets tailed until we complete the data shard dumps. The newest features available to MongoDB Community, MongoDB Enterprise, and Percona Server for MongoDB 3.2 make the process much simpler.

Key Takeaways from new tool

  1. Not yet an official Percona tool, but being used already by people as it’s just a wrapper to run multiple mongo dumps for you.
  2. If you execute the make setup, it outputs a single binary file that needs only python2.7 installed on your database system, even though under the hood it’s running many python modules in a virtualenv
  3. Dumps all shard in parallel and keeps tailing the oplog until all dumps are complete
  4. Handled backing up metadata for old and new config server topologies
  5. Can currently upload to S3, but more cloud storage is coming
  6. Backups compressed by default
  7. Uses the cluster_name,  time, and shard_name to make backup paths look like  /cluster1/<timestamp>/shard1.tgz, helping you keep things organized and letting you remove old backups by timestamp and cluster name.

Desired Roadmap

  • Mature into an officially support Percona product like  Xtrabackup
  • Fully Opensource and welcoming community improvements
  • Extending uploading to  CloudFiles by Rackspace, Azure ZRS, Google Cloud Storage and more
  • Complementary documentation on restores but can just natively use mongorestore tool also
  • Modular backup methods to extend to mongodump, LVM snapshots, ISCSI, EBS snapshots, MongoDB commands and more
  • Encryption before saving to disk
  • Partial backups and restores limit to specific databases and collections
  • Offline backup querying

Please be sure to check out the GitHub @mongodb_consistent_backup and log any issues or features requests.

Feel free to reach out to me on Twitter @dbmurphy_data or @percona with any questions or suggestions as well.

The Value of Database Support

July 20, 2016 - 2:31pm

In this post, I’ll discuss how database support is good for your enterprise.

Years ago when I worked for the MySQL Support organization at the original MySQL AB, we spoke about MySQL Support as insurance and focused on a value proposition similar to that of car insurance. You must purchase car insurance before the incident happens, or insurance won’t cover the damage. In fact, most places around the world require automobile insurance. Similarly, many organizations that leverage production-use technology have their own “insurance” by means of 24/7 support.

In my opinion, this is a very one-sided view that does not capture the full value (and ROI) that a database support contract with Percona provides. With a Percona support contract, you are assured that your database environment (virtual, physical, or in the cloud) is fully covered – whether it’s one server or many.

Increasingly – especially with the introduction of cloud-based database environments – database servers are being spun up and torn down on a day-to-day basis. However briefly these databases exist, they need support. One of the challenges modern businesses face is providing support for a changing database infrastructure, while still maintaining a viable cost structure.

Let’s look at the different dimensions of value offered by Percona Support based on the different support cases we have received throughout the years.

Reduce and Prevent Downtime

If your database goes down, the time to recover will be significantly shorter with a support agreement than without it. The cost of downtime varies widely between organizations. A recent IBM-sponsored Ponemon Institute research study found that the average data security breach could cost upwards of $7.01M (in the United States).

With our clients, we’ve found preventing even one significant downtime event a year justifies support costs. Even when the client’s in-house team is very experienced, our help is often invaluable as we are exposed to a great variety of incidents from hundreds of companies. It is much more likely we have encountered the same incident before and have a solution ready. Helping to recover from downtime quickly is a reactive part of support – you can realize even more value by proactively working with support to get advice on your HA options as well as ensure that you’re following the best database backup and security practices.

Better Security

Having a database support contract by itself is not enough to prevent all security incidents. Databases are only one of the component attack vectors, and it takes a lot of everyday work to stay secure. There is nothing that can guarantee complete security. Database support, however, can be a priceless resource for your security team. It can apply security and compliance practices to your database environment and demonstrate how to avoid typical mistakes.

The cost of data breaches can be phenomenal, as well as impact business reputations much more than downtime or performance issues. Depending on the company size and market, costs vary. Different studies estimate costs ranging in average from $1.6M to 7.01M in direct costs. Everyone agrees leaving rising security risks and costs unchecked is a recipe for disaster.

Fix Database Software Bugs

While you might have great DBAs on your team who are comfortable with best practices and downtime recovery, most likely you do not have a development team comfortable with fixing bugs in the database kernel or supporting tools. Getting up-to-date software fixes reduces downtime. It also limits other issues as well, such as ensuring efficient development and operation teams, avoiding using complex workarounds, etc.

Reduce Resources

We deal with a large number of performance-related questions. When we address such problems, we provide a better user experience, save costs, and minimize environmental impact by reducing resource use.

Savings vary depending on your application scale and how optimized the environment is already. In the best cases, our support team helped customers make applications more than 10x more efficient. In most cases, we can help make things at least 30% more efficient. If you’re spending $100K or more on your database environment, this benefit alone makes a support agreement well worth it.

Efficient Developers

You cannot minimize the importance of development efficiency. Too often customers don’t give their developers support access, even though they critically help realize application’s full value. Developers make database decisions about schema design all the time. These include query writing, stored procedures, triggers, sharding, document storage, or foreign keys. Without a database support contract, developers often have resort to “Google University” to find an answer – and often end up with inapplicable, outdated or simply wrong information. Combined with this, they often apply or resort to time-consuming trial and error.

With the help of a Percona Support team, developers can learn proven practices that apply to their specific situation. This saves a lot of time and gets better applications to market faster. Even with a single US-based developer intensively working within the database environment, a support agreement might justify the cost based on increased developer efficiency alone. Larger development teams simply cannot afford to not have support.

Efficient Operations

Your operations staff (DBAs, DevOps, Sysadmins) are in the same boat – if your database environment is significant, chances are you are always looking for ways to save time, make operations more efficient and reduce mistakes. Our support team can provide you with specific actionable advice for your challenges.

Chances are we have seen environments similar to yours and know which software, approaches and practices work well (and which do not). This knowledge helps prevent and reduce downtime. It also helps with team efficiency. Percona Support’s help allows you to handle operations with a smaller team, or address issues with a less experienced staff.

Better Applications

Percona Support access helps developers not only be more productive, but results in better application quality because application database interface design, schema, queries, etc. best practices are followed. The Percona team supports many applications, for many years. We often  think about problems before you might think about them, such as:

  • “How will this design play with replication or sharding?”
  • “Will it scale with large amounts of users or data?”
  • “How flexible is such a design when the  application will inevitably be evolving over years?”

While a better application is hard to quantify, it really is quite important.

Faster Time to Market

Yet another benefit that comes from developers having access to a database support team is faster time-to-market. For many agile applications, being able to launch new features faster is even more important than cost savings – this is how businesses succeed against the competition. At Percona, we love helping businesses succeed.

Conclusion

As you see, there are a lot of ways Percona Support can contribute to the success of your business. Support is much more than “insurance” that you should consider purchasing for compliance reasons. Percona Support provides a great return on investment. It allows you to minimize risks and costs while delivering the highest quality applications or services. Our flexible plans can cover your database environment, even if it is an ever-changing one, while still allowing you to plan your operations costs.

Upcoming Webinar Wednesday July 20, 11 am PDT: Practical MySQL Performance Optimization

July 19, 2016 - 3:51am

Are you looking to improve your MySQL performance? Application success is often limited by poor MySQL performance. Please join Percona CEO and Founder Peter Zaitsev for this exclusive webinar on Wednesday, July 20th, 2016 at 11:00 AM PDT (UTC – 7) as he presents “Practical MySQL Performance Optimization“.

Peter Zaitsev discusses how to get excellent MySQL performance while being practical. In other words, spending time on what gives you the best return. The webinar updates Peter’s ever-popular Practical MySQL Performance Optimization presentation. It covers the important points for improving MySQL performance. It also includes a discussion of the new tools and features in the latest MySQL 5.7 release, as well as their most important aspects – so you can employ them for maximum database performance.

Areas covered:

  • Hardware
  • MySQL Configuration
  • Schema and Queries
  • Application Architecture
  • MySQL 5.7 New Tools and Features

Peter will highlight practical approaches and techniques for optimizing your time. He will also focus on the queries that are most important for your application. At the end of this webinar, you will know how to optimize MySQL performance in the most practical way.

Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014 and 2015.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Percona Live Europe, Amsterdam 2016: Speaking Gets You the Whole Community Event!

July 14, 2016 - 1:34pm

Come speak at Percona Live Europe, and get access to the entire conference.

The Percona Live Open Source Database Performance Conference Europe 2016 is the premier event for the rich and diverse MySQL, MongoDB and ODBMS ecosystems in Europe. Attendees include DBAs, SysAdmins, developers, architects, CTOs, CEOs, and vendors from around the world. It’s a great place to meet and participate with the open source community.

Want to go, but having a hard time getting the budget approved? We have a solution: be a speaker and get a complimentary full pass!

Submit your speaking proposal for a Percona Live session and share your MySQL, MongoDB and ODBMS ideas, case studies, best practices, and technical knowledge in front of an intelligent, engaged audience open source technology users. If selected as a speaker by our Conference Committee, you will receive a complimentary full conference pass.

Speaking at Percona Live is a great way to further the goals of open source software, and give back to a community that is literally changing the world.

Below are examples of some of the outstanding speakers from this year’s Percona Live Conference in Santa Clara. Speakers are made up of CEOs, Directors, DBAs, and a celebrity or two:

Speaking at Percona Live puts you in some pretty great company, and pays for your pass! Submit your speaking proposal today! The submission deadline is Monday, July 18th.

See the interviews from some of our speakers from this year’s Percona Live Conference in Santa Clara below.

MongoDB Data Durability

July 14, 2016 - 10:49am

In this post, I want to talk about MongoDB data durability options across MongoDB versions.

I consider a write durable if, once confirmed by the server, it becomes permanent at the node or cluster level (ignoring catastrophic failures like all nodes on a cluster failing at the same time).

MongoDB lets you choose between different levels of data durability using Write Concern. Unlike server-side configured durability (as you get with Innodb using innodb_flush_log_at_trx_commit), the client specifies the Write Concern on each write operation.

As indicated in the linked manual page, the Write Concern specification can include a w and a j field (among other things).

The w field determines the number of nodes that must confirm a write before the client acknowledges it, with the following possible values:

  • 1: meaning the primary,
  • “majority”: meaning a majority of the nodes,
  • Any other integer value, meaning that many nodes.

The j field requests acknowledgement that for every node determined by the “w” value, writes are confirmed to the on-disk journal. Otherwise, the write is confirmed only in memory.

How the client specifies Write Concern depends on the programming language and driver used. Here is how it javascript does it, using the mongo command line client:

db.test.insert({_id: 1}, {writeConcern: {w:1, j:1}})

while to use the same write concern on C, with the mongo-c-driver, you must do this before the corresponding write operation:

mongoc_write_concern_t wc = mongoc_write_concern_new();
mongoc_write_concern_set_w(wc, 1);
mongoc_write_concern_set_journal(wc, 1);

To get a better understanding of what this means from a durability perspective I ran a few tests using the following environment:

  • A single client, using the mongo command line client, inserting an auto-incrementing integer as the single field (_id) of a collection.
  • Standalone mongod, and a replica set of 4 mongod instances, all on the same machine. You can repeat the tests using this script as a guide (the only requisite would be that mongod and mongo are on the shell’s path).
  • SIGKILL sent to the Primary node while the writes are happening.
  • Comparing the last value for _id reported by the client, with the maximum value available in the collection, on the new Primary node after the replica set reconfigures (or on the standalone mongod, after I manually restarted it).
  • MongoDB 3.0.4 and 3.2.7, using WiredTiger as the storage engine.

(I’ll discuss performance perspectives in a future post.)

In all cases, I indicate “missing docs” if the value reported by the client is higher than the value reported by db.collection.find().sort({_id:-1}).limit(1)

Here are the results for a standalone mongod:

Standalone w j Missing docs 1 1 No 1 0 Yes 0 0 Yes 0 1 No

 

The first three don’t hold surprises, but the last one does. The mongo-c-driver does not let you specify a write concern of {w:0, j:1}, and a cursory inspection of the MongoDB code makes me believe that “w:0” is interpreted as “w:1”. This would explain the result.

Here are the results for a four node replica set:

Replica Set w j Missing docs “majority” 1 No “majority” 0 No 0 1 Yes

 

Again, w:0, j:1 is transformed into w:1, j:1. How can no data get lost in a standalone mongod, but can get lost in a replica set? The answer is in the standalone case, after SIGKILL I restarted the same instance. In that case, WiredTiger performs crash recovery. Since we request acknowledgement for write confirmation to the on-disk journal, the last _id is recovered (if needed), and no docs go missing.

However, in my replica set tests, I did not restart the SIGKILLED instance. Instead, I let mongod do its thing and automatically reconfigure the set, promoting one of the Secondaries as a new Primary. In this context, having a write concern that only requests acknowledgements of writes on the master is a liability, and leads to lost data.

When specifying w:”majority”, it is important to note that the value j:0 gets replaced with j:1 since version 3.2. That explains the lack of lost documents. I also tested 3.0 and, in that case, docs went missing when using w:"majority", j:0. This probably explains the behavior changed in 3.2, and, depending on your use cases, might justify an upgrade if you’re on an older version.

In conclusion, MongoDB data durability options lets you satisfy different requirements on a per operation basis, with the client being responsible for using the desired setting. When using a Write Concern that does not guarantee full durability, a mongod crash is enough to cause the loss of unconfirmed documents. In this sense, the Write Concern values that include j:0  are analogous to running Innodb with innodb_flush_log_at_trx_commit set to 0.

The “majority” value for the w component is valid even in the standalone case (where it is treated as “1”), so I think {w:"majority", j:1} is a good value to use in the general case to guarantee data durability.

Using Ceph with MySQL

July 13, 2016 - 10:48am

Over the last year, the Ceph world drew me in. Partly because of my taste for distributed systems, but also because I think Ceph represents a great opportunity for MySQL specifically and databases in general. The shift from local storage to distributed storage is similar to the shift from bare disks host configuration to LVM-managed disks configuration.

Most of the work I’ve done with Ceph was in collaboration with folks from RedHat (mainly Brent Compton and Kyle Bader). This work resulted in a number of talks presented at the Percona Live conference in April and the RedHat Summit San Francisco at the end of June. I could write a lot about using Ceph with databases, and I hope this post is the first in a long series on Ceph. Before I starting with use cases, setup configurations and performance benchmarks, I think I should quickly review the architecture and principles behind Ceph.

Introduction to Ceph

Inktank created Ceph a few years ago as a spin-off of the hosting company DreamHost. RedHat acquired Inktank in 2014 and now offers it as a storage solution. OpenStack uses Ceph as its dominant storage backend. This blog, however, focuses on a more general review and isn’t restricted to a virtual environment.

A simplistic way of describing Ceph is to say it is an object store, just like S3 or Swift. This is a true statement but only up to a certain point.  There are minimally two types of nodes with Ceph, monitors and object storage daemons (OSDs). The monitor nodes are responsible for maintaining a map of the cluster or, if you prefer, the Ceph cluster metadata. Without access to the information provided by the monitor nodes, the cluster is useless. Redundancy and quorum at the monitor level are important.

Any non-trivial Ceph setup has at least three monitors. The monitors are fairly lightweight processes and can be co-hosted on OSD nodes (the other node type needed in a minimal setup). The OSD nodes store the data on disk, and a single physical server can host many OSD nodes – though it would make little sense for it to host more than one monitor node. The OSD nodes are listed in the cluster metadata (the “crushmap”) in a hierarchy that can span data centers, racks, servers, etc. It is also possible to organize the OSDs by disk types to store some objects on SSD disks and other objects on rotating disks.

With the information provided by the monitors’ crushmap, any client can access data based on a predetermined hash algorithm. There’s no need for a relaying proxy. This becomes a big scalability factor since these proxies can be performance bottlenecks. Architecture-wise, it is somewhat similar to the NDB API, where – given a cluster map provided by the NDB management node – clients can directly access the data on data nodes.

Ceph stores data in a logical container call a pool. With the pool definition comes a number of placement groups. The placement groups are shards of data across the pool. For example, on a four-node Ceph cluster, if a pool is defined with 256 placement groups (pg), then each OSD will have 64 pgs for that pool. You can view the pgs as a level of indirection to smooth out the data distribution across the nodes. At the pool level, you define the replication factor (“size” in Ceph terminology).

The recommended values are a replication factor of three for spinners and two for SSD/Flash. I often use a size of one for ephemeral test VM images. A replication factor greater than one associates each pg with one or more pgs on the other OSD nodes.  As the data is modified, it is replicated synchronously to the other associated pgs so that the data it contains is still available in case an OSD node crashes.

So far, I have just discussed the basics of an object store. But the ability to update objects atomically in place makes Ceph different and better (in my opinion) than other object stores. The underlying object access protocol, rados, updates an arbitrary number of bytes in an object at an arbitrary offset, exactly like if it is a regular file. That update capability allows for much fancier usage of the object store – for things like the support of block devices, rbd devices, and even a network file systems, cephfs.

When using MySQL on Ceph, the rbd disk block device feature is extremely interesting. A Ceph rbd disk is basically the concatenation of a series of objects (4MB objects by default) that are presented as a block device by the Linux kernel rbd module. Functionally it is pretty similar to an iSCSI device as it can be mounted on any host that has access to the storage network and it is dependent upon the performance of the network.

The benefits of using Ceph

Agility
In a world striving for virtualization and containers, Ceph gives easily moves database resources between hosts.

IO scalability
On a single host, you have access only to the IO capabilities of that host. With Ceph, you basically put in parallel all the IO capabilities of all the hosts. If each host can do 1000 iops, a four-node cluster could reach up to 4000 iops.

High availability
Ceph replicates data at the storage level, and provides resiliency to storage node crash.  A kind of DRBD on steroids…

Backups
Ceph rbd block devices support snapshots, which are quick to make and have no performance impacts. Snapshots are an ideal way of performing MySQL backups.

Thin provisioning
You can clone and mount Ceph snapshots as block devices. This is a useful feature to provision new database servers for replication, either with asynchronous replication or with Galera replication.

The caveats of using Ceph

Of course, nothing is free. Ceph use comes with some caveats.

Ceph reaction to a missing OSD
If an OSD goes down, the Ceph cluster starts copying data with fewer copies than specified. Although good for high availability, the copying process significantly impacts performance. This implies that you cannot run a Ceph with a nearly full storage, you must have enough disk space to handle the loss of one node.

The “no out” OSD attribute mitigates this, and prevents Ceph from reacting automatically to a failure (but you are then on your own). When using the “no out” attribute, you must monitor and detect that you are running in degraded mode and take action. This resembles a failed disk in a RAID set. You can choose this behavior as default with the mon_osd_auto_mark_auto_out_in setting.

Scrubbing
Every day and every week (deep), Ceph scrubs operations that, although they are throttled, can still impact performance. You can modify the interval and the hours that control the scrub action. Once per day and once per week are likely fine. But you need to set osd_scrub_begin_hour and osd_scrub_end_hour to restrict the scrubbing to off hours. Also, scrubbing throttles itself to not put too much load on the nodes. The osd_scrub_load_threshold variable sets the threshold.

Tuning
Ceph has many parameters so that tuning Ceph can be complex and confusing. Since distributed systems push hardware, properly tuning Ceph might require things like distributing interrupt load among cores and thread core pinning, handling of Numa zones – especially if you use high-speed NVMe devices.

Conclusion

Hopefully, this post provided a good introduction to Ceph. I’ve discussed the architecture, the benefits and the caveats of Ceph. In future posts, I’ll present use cases with MySQL. These cases include performing Percona XtraDB Cluster SST operations using Ceph snapshots, provisioning async slaves and building HA setups. I also hope to provide guidelines on how to build and configure an efficient Ceph cluster.

Finally, a note for the ones who think cost and complexity put building a Ceph cluster out of reach. The picture below shows my home cluster (which I use quite heavily). The cluster comprises four ARM-based nodes (Odroid-XU4), each with a two TB portable USB-3 hard disk, a 16 GB EMMC flash disk and a gigabit Ethernet port.

I won’t claim record breaking performance (although it’s decent), but cost-wise it is pretty hard to beat (at around $600)!

https://rh2016.smarteventscloud.com/connect/sessionDetail.ww?SESSION_ID=42190&tclass=popup

 

Call for Percona Live Europe MongoDB Speakers

July 12, 2016 - 9:23am

Want to become one of the Percona Live Europe MongoDB speakers? Read this blog for details.

The Percona Live Europe, Amsterdam call for papers is ending soon and we are looking for MongoDB speakers! This is a great way to build your personal and company brands. If you haven’t submitted a paper yet, here are a list of ideas we would love to see covered at this conference:

If you find any of these ideas interesting, simply let @Percona know and we can help get you listed as the speaker. If nothing on this list strikes your fancy or peaks your interest, please submit a similar talk of your own – we’d love to find out what you have to say!

Want to become one of the Percona Live Europe MongoDB speakers? Read this blog for details.

The Percona Live Europe, Amsterdam call for papers is ending soon, and we are looking for MongoDB speakers! This is a great way to build your personal and company brands. If you haven’t submitted a paper yet, here are a list of ideas we would love to see covered at this conference:

If you any these ideas interesting, simply let @Percona know and we can help get you listed as the speaker. If nothing on this list strikes your fancy or peaks your interest, please submit a similar talk of your own – we’d love to find out what you have to say!

Here are some other ideas that might get your thoughts bubbling:

  • Secret use of “hidden” and tagged ReplicaSets
  • To use a hashed shard key or not?
  • Understanding how a shard key is used in MongoDB
  • Using scatter-gathers to your benefit
  • WriteConcern and its use cases
  • How to quickly build a sharded environment for MongoDB in Docker
  • How to monitor and scale MongoDB in the cloud
  • MongoDB Virtualization: the good, the bad, and the ugly
  • MongoDB and VMware: a cautionary tale
  • Streaming MySQL bin logs to MongoDB and back again
  • How to ensure that other technologies can safely use the epilog for pipelining

The Percona team and conference commitee would love to see what other ideas the community has that we haven’t covered. Anything helps: using @Percona and mentioning topics you would like to see, to sharing topics on twitter you like, or even just sharing the link to the call for papers.

The call for papers closes next Monday (7/18), so let’s get some great things in this week and build a truly dynamic conference!

Percona Server for MongoDB 3.0.12-1.8 is now available

July 12, 2016 - 8:33am

Percona announces the release of Percona Server for MongoDB 3.0.12-1.8 on July 12, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.0.12-1.8 is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.12, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: The MongoRocks storage engine is still under development. There is currently no officially released version of MongoRocks recommended for production.

This release includes all changes from MongoDB 3.0.12, and the following known issue that will be fixed in a future release:

  • Fixed the software version incorrectly reported by the --version option.

You can find the release notes in the official documentation.

 

Webinar July 14, 10 am PDT: Introduction into storage engine troubleshooting

July 11, 2016 - 10:33am

Please join Sveta Smirnova for a webinar Thursday, July 14 at 10 am PDT (UTC-7) on an Introduction Into Storage Engine Troubleshooting.

The number of MySQL storage engines provide great flexibility for database users, administrators and developers. At the same time, engines add an extra level of complexity when it comes to troubleshooting issues. Before choosing the right troubleshooting tool, you need to answer the following questions (and often others):

  • What part of the server is responsible for my issue?
  • Was a lock set at the server or engine level?
  • Is a standard or engine-specific tool better?
  • Where are the engine-specific options?
  • How to know if an engine-specific command exists?

This webinar will discuss these questions and how to find the right answers across all storage engines in a general sense.

You will also learn:

  • How to troubleshoot issues caused by simple storage engines such as MyISAM or Memory
  • Why Federated is deprecated, and what issues affected that engine
  • How Blackhole can affect replication

. . . and more.

Register for the webinar here.

Note: We will hold a separate webinar specifically for InnoDB.

Sveta Smirnova, Principal Technical Services Engineer Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns which can solve typical issues quicker, teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group in MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and JSON UDF functions for MySQL.

Percona Server 5.5.50-38.0 is now available

July 8, 2016 - 9:19am


Percona announces the release of Percona Server 5.5.50-38.0 on July 8, 2016. Based on MySQL 5.5.50, including all the bug fixes in it, Percona Server 5.5.50-38.0 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.50-38.0 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features: Bugs Fixed:
  • Querying the GLOBAL_TEMPORARY_TABLES table would cause server crash if temporary table owning threads would execute new queries. Bug fixed #1581949.
  • The innodb_log_block_size feature attempted to diagnose the situation where the logs have been created with a log block value that differs from the current innodb_log_block_size setting. But this diagnostics came too late, and a misleading error No valid checkpoints found was produced first, aborting the startup. Bug fixed #1155156.
  • AddressSanitizer build with LeakSanitizer enabled was failing at gen_lex_hash invocation. Bug fixed #1580993 (upstream #80014).
  • ssl.cmake file was broken when custom OpenSSL build was used. Bug fixed #1582639 (upstream #61619).
  • mysqlbinlog did not free the existing connection before opening a new remote one. Bug fixed #1587840 (upstream #81675).
  • Fixed memory leaks in mysqltest. Bugs fixed #1582718 and #1588318.
  • Fixed memory leaks in mysqlcheck. Bug fixed #1582741.
  • Fixed memory leak in mysqlbinlog. Bug fixed #1582761 (upstream #78223).
  • Fixed memory leaks in mysqldump. Bug fixed #1587873 and #1588845 (upstream #81714).
  • Fixed memory leak in innochecksum. Bug fixed #1588331.
  • Fixed memory leak in non-existing defaults file handling. Bug fixed #1588344.
  • Fixed memory leak in mysqlslap. Bug fixed #1588361.

Other bugs fixed: #1588169, #1588386, #1529885, #1587757, #1587426 (upstream, #81657), #1587527, #1588650, and #1589819.

The release notes for Percona Server 5.5.50-38.0 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Percona XtraBackup 2.3.5 is now available

July 8, 2016 - 9:17am

Percona announces the release of Percona XtraBackup 2.3.5 on July 8, 2016. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the current GA (Generally Available) stable release in the 2.3 series.

Bugs fixed:
  • Backup process would fail if --throttle option was used. Bug fixed #1554235.
  • .ibd files for remote tablespaces were not copied back to the original location pointed by the .isl files. Bug fixed #1555423.
  • When called with insufficient parameters, like specifying the empty --defaults-file option, Percona XtraBackup could crash. Bug fixed #1566228.
  • Documentation states that the default value for --ftwrl-wait-query-type is all, however it was update. Changed the default value to reflect the documentation. Bug fixed #1566315.
  • Free Software Foundation address in copyright notices was outdated. Bug fixed #1222777.
  • Backup process would fail if the datadir specified on the command-line was not the same as one that is reported by the server. Percona XtraBackup now allows the datadir from my.cnf override the one from SHOW VARIABLES. xtrabackup will print a warning that they don’t match, but continue. Bug fixed #1526467.
  • Backup process would fail on MariaDB if binary logs were in non-standard directory. Bug fixed #1517629.
  • Output of --slave-info option was missing an apostrophe. Bug fixed #1573371.

Other bugs fixed: #1599397.

Release notes with all the bugfixes for Percona XtraBackup 2.3.5 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Percona Server 5.6.31-77.0 is now available

July 7, 2016 - 6:53am


Percona
 announces the release of Percona Server 5.6.31-77.0 on July 7th, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.31, including all the bug fixes in it, Percona Server 5.6.31-77.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.31-77.0 milestone on Launchpad.

New Features:
  • Percona Server implemented protocol support for TLS 1.1 and TLS 1.2. This implementation turns off TLS v1.0 support by default.
  • TokuDB MTR suite is now part of the default MTR suite in Percona Server 5.6.
Bugs Fixed:
  • Querying the GLOBAL_TEMPORARY_TABLES table caused server crash if temporary table owning threads would execute new queries. Bug fixed #1581949.
  • Audit Log Plugin would hang when trying to write log record of audit_log_buffer_size length. Bug fixed #1588439.
  • Audit log in ASYNC mode could skip log records that don’t fit into log buffer. Bug fixed #1588447.
  • The innodb_log_block_size feature attempted to diagnose the situation where the logs have been created with a log block value that differs from the current innodb_log_block_size setting. But this diagnostics came too late, and a misleading error No valid checkpoints found was produced first, aborting the startup. Bug fixed #1155156.
  • Some transaction deadlocks did not increase the INFORMATION_SCHEMA.INNODB_METRICS lock_deadlocks counter. Bug fixed #1466414 (upstream #77399).
  • InnoDB tablespace import failed when trying to import a table with different data directory. Bug fixed #1548597 (upstream #76142).
  • Audit Log Plugin truncated SQL queries to 512 bytes. Bug fixed #1557293.
  • Regular user extra port connection failed if max_connections plus one SUPER user were already connected on the main port, even if it connecting would not violate the extra_max_connections. Bug fixed #1583147.
  • The error log warning Too many connections was only printed for connection attempts when max_connections plus one SUPER have connected. If the extra SUPER is not connected, the warning was not printed for a non-SUPER connection attempt. Bug fixed #1583553.
  • mysqlbinlog did not free the existing connection before opening a new remote one. Bug fixed #1587840 (upstream #81675).
  • Fixed memory leaks in mysqltest. Bugs fixed #1582718 and #1588318.
  • Fixed memory leaks in mysqlcheck. Bug fixed #1582741.
  • Fixed memory leak in mysqlbinlog. Bug fixed #1582761 (upstream #78223).
  • Fixed memory leaks in mysqldump. Bug fixed #1587873 and #1588845 (upstream #81714).
  • Fixed memory leak in non-existing defaults file handling. Bug fixed #1588344.
  • Fixed memory leak in mysqlslap. Bug fixed #1588361.
  • Transparent Huge Pages check will now only happen if tokudb_check_jemalloc option is set. Bugs fixed #939 and #713.
  • Logging in ydb environment validation functions now prints more useful context. Bug fixed #722.

Other bugs fixed: #1588386, #1529885, #1541698 (upstream #80261), #1582681, #1583589, #1587426 (upstream, #81657), #1589431, #956, and #964.

Release notes for Percona Server 5.6.31-77.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Pipelining versus Parallel Query Execution with MySQL 5.7 X Plugin

July 6, 2016 - 12:14pm

In this blog post, we’ll look at pipelining versus parallel query execution when using X Plugin for MySQL 5.7.

In my previous blog post, I showed how to use X Plugin for MySQL 5.7 for parallel query execution. The tricks I used to make it work:

  • Partitioning by hash
  • Open N connections to MySQL, where N = number of CPU cores

I had to do it manually (as well as to sort the result at the end) as X Plugin only supports “pipelining” (which only saves the round trip time) and does not “multiplex” connections to MySQL (MySQL does not use multiple CPU cores for a single query).

TL:DR; version

In this (long) post I’m playing with MySQL 5.7 X Plugin / X Protocol and document store. Here is the summary:

  1. X Plugin does not “multiplex” connections/sessions to MySQL. Similar to the original protocol, one connection to X Plugin will result in one session open to MySQL
  2. An X Plugin query (if the library supports it) returns immediately and does not wait until the query is finished (async call). MySQL works like a queue.
  3. X Plugin does not have any additional server-level durability settings. Unless you check or wait for the acknowledgement (which is asynchronous) from the server, the data might or might not be written into MySQL (“fire and forget”).

At the same time, X Protocol can be helpful if:

  • We want to implement an asynchronous client (i.e., we do not want to block the network communication such as downloading or API calls) when the MySQL table is locked.
  • We want to use MySQL as a queue and save the round-trip time.
Benchmark results: “pipelining” versus “parallelizing” versus a single query

I’ve done a couple of tests comparing the results between “pipelining” versus “parallelizing” versus a single query. Here are the results:

      1. Parallel queries with NodeJS:
        $ time node async_wikistats.js ... All done! Total: 17753 ... real 0m30.668s user 0m0.256s sys 0m0.028s
      2. Pipeline with NojeJS:
        $ time node async_wikistats_pipeline.js ... All done! Total: 17753 ... real 5m39.666s user 0m0.212s sys 0m0.024s
        In the pipeline with NojeJS, I’m reusing the same connection (and do not open a new one for each thread).
      3. Direct query – partitioned table:
        mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark_part where url like ‘%postgresql%’; +-----------------+ | sum(tot_visits) | +-----------------+ | 17753 | +-----------------+ 1 row in set (5 min 31.44 sec)
      4. Direct query – non-partitioned table.
        mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark where url like ‘%postgresql%’; +-----------------+ | sum(tot_visits) | +-----------------+ | 17753 | +-----------------+ 1 row in set (4 min 38.16 sec)
Advantages of pipelines with X Plugin 

Although pipelining with X Plugin does not significantly increase query response time (it can reduce the total latency), it might be helpful in some cases. For example, let’s say we are downloading something from the Internet and need to save the progress of the download as well as the metadata for the document. In this example, I use youtube-dl to search and download the metadata about YouTube videos, then save the metadata JSON into MySQL 5.7 Document Store. Here is the code:

var mysqlx = require('mysqlx'); # This is the same as running $ youtube-dl -j -i ytsearch100:"mysql 5.7" const spawn = require('child_process').spawn; const yt = spawn('youtube-dl', ['-j', '-i', 'ytsearch100:"mysql 5.7"'], {maxBuffer: 1024 * 1024 * 128}); var mySession = mysqlx.getSession({ host: 'localhost', port: 33060, dbUser: 'root', dbPassword: '<your password>' }); yt.stdout.on('data', (data) => { try { dataObj = JSON.parse(data); console.log(dataObj.fulltitle); mySession.then(session => { session.getSchema("yt").getCollection("youtube").add( dataObj ) .execute(function (row) { }).catch(err => { console.log(err); }) .then( function (notices) { console.log("Wrote to MySQL: " + JSON.stringify(notices)) }); }).catch(function (err) { console.log(err); process.exit(); }); } catch (e) { console.log(" --- Can't parse json" + e ); } }); yt.stderr.on('data', (data) => { console.log("Error receiving data"); }); yt.on('close', (code) => { console.log(`child process exited with code ${code}`); mySession.then(session => {session.close() } ); });

In the above example, I execute the youtube-dl binary (you need to have it installed first) to search for “MySQL 5.7” videos. Instead of downloading the videos, I only grab the video’s metadata in JSON format  (“-j” flag). Because it is JSON, I can save it into MySQL document store. The table has the following structure:

CREATE TABLE `youtube` ( `doc` json DEFAULT NULL, `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL, UNIQUE KEY `_id` (`_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Here is the execution example:

$ node yt.js What's New in MySQL 5.7 Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["3f312c3b-b2f3-55e8-0ee9-b706eddf"]}} MySQL 5.7: MySQL JSON data type example Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["88223742-9875-59f1-f535-f1cfb936"]}} MySQL Performance Tuning: Part 1. Configuration (Covers MySQL 5.7) Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["c377e051-37e6-8a63-bec7-1b81c6d6"]}} Dave Stokes — MySQL 5.7 - New Features and Things That Will Break — php[world] 2014 Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["96ae0dd8-9f7d-c08a-bbef-1a256b11"]}} MySQL 5.7 & JSON: New Opportunities for Developers - Thomas Ulin - Forum PHP 2015 Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["ccb5c53e-561c-2ed5-6deb-1b325739"]}} Cara Instal MySQL 5.7.10 NoInstaller pada Windows Manual Part3 Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["95efbd79-8d79-e7b6-a535-271640c8"]}} MySQL 5.7 Install and Configuration on Ubuntu 14.04 Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["b8cfe132-aca4-1eba-c2ae-69e48db8"]}}

Now, here is what make this example interesting: as NodeJS + X Plugin = Asynchronous + Pipelining, the program execution will not stop if the table is locked. I’ve opened two sessions:

  • session 1: $ node yt.js > test_lock_table.log
  • session 2:
    mysql> lock table youtube read; select sleep(10); unlock tables; Query OK, 0 rows affected (0.00 sec) +-----------+ | sleep(10) | +-----------+ | 0 | +-----------+ 1 row in set (10.01 sec) Query OK, 0 rows affected (0.00 sec)

Results:

... Upgrade MySQL Server from 5.5 to 5.7 ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["d4d62a8a-fbfa-05ab-2110-2fd5cf6d"]}} OSC15 - Georgi Kodinov - Secure Deployment Changes Coming in MySQL 5.7 ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["8ac1cdb9-1499-544c-da2a-5db1ccf5"]}} MySQL 5.7: Create JSON string using mysql FreeBSD 10.3 - Instalación de MySQL 5.7 desde Código Fuente - Source Code Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 1 How to install MySQL Server on Mac OS X Yosemite - ltamTube Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 4 COMO INSTALAR MYSQL VERSION 5.7.13 MySQL and JSON MySQL 5.7: Merge JSON data using MySQL ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["a11ff369-6f23-11e9-187b-e3713e6e"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["06143a61-4add-79da-0e1d-c2b52cf6"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["1eb94ef4-db63-cb75-767e-e1555549"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e25f15b5-8c19-9531-ed69-7b46807a"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["02b5a4c9-6a21-f263-90d5-cd761906"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e0bef958-10af-b181-81cd-5debaaa0"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["f48fa635-fa63-7481-0668-addabbac"]}} ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["557fa5c5-3c8a-fe01-c17c-549c557e"]}} MySQL 5.7 Install and Configuration on Ubuntu 14.04 ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["456b11d8-ba03-0aec-8e06-9517c6e1"]}} MySQL WorkBench 6.3 installation on Ubuntu 14.04 ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["0b651987-9b23-b5e0-f8f7-49b8ba5c"]}} Going through era of IoT with MySQL 5.7 - FOSSASIA 2016 ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e133746c-836c-a7e0-3893-292a7429"]}} MySQL 5.7: MySQL JSON operator example ... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["4d13830d-7b30-5b31-d068-c7305e0a"]}}

As we can see, the first two writes were immediate. Then I’ve locked the table, and no MySQL queries went through. At the same time the download process (which is the slowest part here) proceeded and was not blocked (we can see the titles above, which are not followed by lines “… => wrote to MySQL:”). When the table was unlocked, a pile of waiting queries succeeded.

This can be very helpful when running a “download” process, and the network is a bottleneck. In a traditional synchronous query execution, when we lock a table the application gets blocked (including the network communication). With NodeJS and X Plugin, the download part will proceed with MySQL acting as a queue.

Pipeline Durability

How “durable” this pipeline, you might ask. In other words, what will happen if I will kill the connection? To test it out, I have (once again) locked the table (but now before starting the nodejs), killed the connection and finally unlocked the table. Here are the results:

Session 1: ---------- mysql> truncate table youtube_new; Query OK, 0 rows affected (0.25 sec) mysql> lock table youtube_new read; Query OK, 0 rows affected (0.00 sec) mysql> select count(*) from youtube_new; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) Session 2: ---------- (when table is locked) $ node yt1.js 11 03 MyISAM Switching to InnoDB from MyISAM tablas InnoDB a MyISAM MongoDB vs MyISAM (MariaDB/MySQL) MySQL Tutorial 35 - Foreign Key Constraints for the InnoDB Storage Engine phpmyadmin foreign keys myisam innodb Convert or change database manual from Myisam to Innodb ... >100 other results omited ... ^C Session 1: ---------- mysql> select count(*) from youtube_new; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) Id: 4916 User: root Host: localhost:33221 db: NULL Command: Query Time: 28 State: Waiting for table metadata lock Info: PLUGIN: INSERT INTO `iot`.`youtube_new` (doc) VALUES ('{"upload_date":"20140319","protocol":" mysql> unlock table; Query OK, 0 rows affected (0.00 sec) mysql> select count(*) from youtube_new; +----------+ | count(*) | +----------+ | 2 | +----------+ 1 row in set (0.00 sec) mysql> select json_unquote(doc->'$.title') from youtube_new; +---------------------------------+ | json_unquote(doc->'$.title') | +---------------------------------+ | 11 03 MyISAM | | Switching to InnoDB from MyISAM | +---------------------------------+ 2 rows in set (0.00 sec)

Please note: in the above, there isn’t a single acknowledgement from the MySQL server. When code receives a response from MySQL it prints “Wrote to MySQL: {“_state”:{“rows_affected”:1,”doc_ids”:[“…”]}}“. Also, note that when the connection was killed the MySQL process is still there, waiting on the table lock.

What is interesting here is is that only two rows have been inserted into the document store. Is there a “history length” here or some other buffer that we can increase? I’ve asked Jan Kneschke, one of the authors of the X Protocol, and the answers were:

  • Q: Is there any history length or any buffer and can we tune it?
    • A: There is no “history” or “buffer” at all, it is all at the connector level.
  • Q: Then why is 2 rows were finally inserted?
    • To answer this question I’ve collected tcpdump to port 33060 (X Protocol), see below

This is very important information! Keep in mind that the asynchronous pipeline has no durability settings: if the application fails and there are some pending writes, those writes can be lost (or could be written).

To fully understand how the protocol works, I’ve captured tcpdump (Jan Kneschke helped me to analyze it):

tcpdump -i lo -s0 -w tests/node-js-pipelining.pcap "tcp port 33060"

(see update below for the tcpdump visualization)

This is what is happening:

  • When I hit CTRL+C, nodejs closes the connection. As the table is still locked, MySQL can’t write to it and will not send the result of the insert back.
  • When the table is unlocked, it starts the first statement despite the fact that the connection has been closed. It then acknowledges the first insert and starts the second one.
  • However, at this point the script (client) has already closed the connection and the final packet (write done, here is the id) gets denied. The X Plugin then finds out that the client closed the connection and stops executing the pipeline.

Actually, this is very similar to how the original MySQL protocol worked. If we kill the script/application, it doesn’t automatically kill the MySQL connection (unless you hit CTRL+C in the MySQL client, sends the kill signal) and the connection waits for the table to get unlocked. When the table is unlocked, it inserts the first statement from a file.

Session 1 --------- mysql> select * from t_sql; Empty set (0.00 sec) mysql> lock table t_sql read; Query OK, 0 rows affected (0.00 sec) Session 2: ---------- $ mysql iot < t.sql $ kill -9 ... [3] Killed mysql iot < t.sql Session 1: ---------- mysql> show processlist; +------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+ | 4913 | root | localhost | iot | Query | 41 | Waiting for table metadata lock | insert into t_sql values('{"test_field":0}') | +------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+ 4 rows in set (0.00 sec) mysql> unlock tables; Query OK, 0 rows affected (0.00 sec) mysql> select * from t_sql; +-------------------+ | doc | +-------------------+ | {"test_field": 0} | +-------------------+ 1 row in set (0.00 sec)

Enforcing unique checks

If I restart my script, it finds the same videos again. We will probably need to enforce the consistency of our data. By default the plugin generates the unique key (_id) for the document, so it prevents inserting the duplicates.

Another way to enforce the unique checks is to create a unique key for youtube id. Here is the updated table structure:

CREATE TABLE `youtube` ( `doc` json DEFAULT NULL, `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL, UNIQUE KEY `youtube_id` (`youtube_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

I’ve changed the default “_id” column to the YouTube’s unique ID. Now when I restart the script it shows:

MySQL 5.7: Merge JSON data using MySQL { [Error: Document contains a field value that is not unique but required to be] info: { severity: 0, code: 5116, msg: 'Document contains a field value that is not unique but required to be', sql_state: 'HY000' } } ... => wrote to MySQL: undefined

…as this document has already been loaded.

Conclusion

Although X Plugin pipelining does not necessarily significantly increase query response (it might save the roundtrip time) it can be helpful for some applications.We might not want to block the network communication (i.e., downloading or API calls) when the MySQL table is locked, for example. At the same time, unless you check/wait for the acknowledgement from the server, the data might or might not be written into MySQL.

Bonus: data analysis

Now we can see what we have downloaded. There are a number of interesting fields in the result:

"is_live": null, "license": "Standard YouTube License", "duration": 2965, "end_time": null, "playlist": ""mysql 5.7"", "protocol": "https", "uploader": "YUI Library", "_filename": "Douglas Crockford - The JSON Saga--C-JoyNuQJs.mp4", "age_limit": 0, "alt_title": null, "extractor": "youtube", "format_id": "18", "fulltitle": "Douglas Crockford: The JSON Saga", "n_entries": 571, "subtitles": {}, "thumbnail": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg", "categories": ["Science & Technology"], "display_id": "-C-JoyNuQJs", "like_count": 251, "player_url": null, "resolution": "640x360", "start_time": null, "thumbnails": [{ "id": "0", "url": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg" }], "view_count": 36538, "annotations": null, "description": "Yahoo! JavaScript architect Douglas Crockford tells the story of how JSON was discovered and how it became a major standard for describing data.", "format_note": "medium", "playlist_id": ""mysql 5.7"", "upload_date": "20110828", "uploader_id": "yuilibrary", "webpage_url": "https://www.youtube.com/watch?v=-C-JoyNuQJs", "uploader_url": "http://www.youtube.com/user/yuilibrary", "dislike_count": 5, "extractor_key": "Youtube", "average_rating": 4.921875, "playlist_index": 223, "playlist_title": null, "automatic_captions": {}, "requested_subtitles": null, "webpage_url_basename": "-C-JoyNuQJs"

We can see the most popular videos. To do that I’ve added one more virtual field on view_count, and created an index on it:

CREATE TABLE `youtube` ( `doc` json DEFAULT NULL, `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL, `view_count` int(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.view_count'))) VIRTUAL, UNIQUE KEY `youtube_id` (`youtube_id`), KEY `view_count` (`view_count`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

We can run the queries like:

mysql> select json_unquote(doc->'$.title'), -> view_count, -> json_unquote(doc->'$.dislike_count') as dislikes -> from youtube -> order by view_count desc -> limit 10; +----------------------------------------------------------------------------------------------------+------------+----------+ | json_unquote(doc->'$.title') | view_count | dislikes | +----------------------------------------------------------------------------------------------------+------------+----------+ | Beginners MYSQL Database Tutorial 1 # Download , Install MYSQL and first SQL query | 664153 | 106 | | MySQL Tutorial | 533983 | 108 | | PHP and MYSQL - Connecting to a Database and Adding Data | 377006 | 50 | | PHP MySQL Tutorial | 197984 | 41 | | Installing MySQL (Windows 7) | 196712 | 28 | | Understanding PHP, MySQL, HTML and CSS and their Roles in Web Development - CodersCult Webinar 001 | 195464 | 24 | | jQuery Ajax Tutorial #1 - Using AJAX & API's (jQuery Tutorial #7) | 179198 | 25 | | How To Root Lenovo A6000 | 165221 | 40 | | MySQL Tutorial 1 - What is MySQL | 165042 | 45 | | How to Send Email in Blackboard Learn | 144948 | 28 | +----------------------------------------------------------------------------------------------------+------------+----------+ 10 rows in set (0.00 sec)

Or if we want to find out the most popular resolutions:

mysql> select count(*) as cnt, -> sum(view_count) as sum_views, -> json_unquote(doc->'$.resolution') as resolution -> from youtube -> group by resolution -> order by cnt desc, sum_views desc -> limit 10; +-----+-----------+------------+ | cnt | sum_views | resolution | +-----+-----------+------------+ | 273 | 3121447 | 1280x720 | | 80 | 1195865 | 640x360 | | 18 | 33958 | 1278x720 | | 15 | 18560 | 1152x720 | | 11 | 14800 | 960x720 | | 5 | 6725 | 1276x720 | | 4 | 18562 | 1280x682 | | 4 | 1581 | 1280x616 | | 4 | 348 | 1280x612 | | 3 | 2024 | 1200x720 | +-----+-----------+------------+ 10 rows in set (0.02 sec)

Special thanks to Jan Kneschke and Morgan Tocker from Oracle for helping with the X Protocol internals.

Update: Jan Kneschke also generated the visualization for the tcpdump I’ve collected (when connection was killed):

Percona Server 5.7.13-6 is now available

July 6, 2016 - 9:07am

Percona announces the GA release of Percona Server 5.7.13-6 on July 6, 2016. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.7.13, including all the bug fixes in it, Percona Server 5.7.13-6 is the current GA release in the Percona Server 5.7 series. Percona’s provides completely open-source and free software. All the details of the release can be found in the 5.7.13-6 milestone at Launchpad.

New Features:
  • TokuDB MTR suite is now part of the default MTR suite in Percona Server 5.7.
Bugs Fixed:
  • Querying the GLOBAL_TEMPORARY_TABLES table would cause server crash if temporary table owning threads would execute new queries. Bug fixed #1581949.
  • IMPORT TABLESPACE and undo tablespace truncate could get stuck indefinitely with a writing workload in parallel. Bug fixed #1585095.
  • Requesting to flush the whole of the buffer pool with doublewrite parallel buffer wasn’t working correctly. Bug fixed #1586265.
  • Audit Log Plugin would hang when trying to write log record of audit_log_buffer_size length. Bug fixed #1588439.
  • Audit log in ASYNC mode could skip log records which don’t fit into log buffer. Bug fixed #1588447.
  • In order to support innodb_flush_method being set to ALL_O_DIRECT, the log I/O buffers were aligned to innodb_log_write_ahead_size. That implementation missed the case that the variable is dynamic and could still lead to a server to crash. Bug fixed #1597143.
  • InnoDB tablespace import would fail when trying to import a table with different data directory. Bug fixed #1548597 (upstream #76142).
  • Audit Log Plugin was truncating SQL queries to 512 bytes. Bug fixed #1557293.
  • mysqlbinlog did not free the existing connection before opening a new remote one. Bug fixed #1587840 (upstream #81675).
  • Fixed a memory leak in mysqldump. Bug fixed #1588845 (upstream #81714).
  • Transparent Huge Pages check will now only happen if tokudb_check_jemalloc option is set. Bugs fixed #939 and #713.
  • Logging in ydb environment validation functions now prints more useful context. Bug fixed #722.

Other bugs fixed: #1541698 (upstream #80261), #1587426 (upstream, #81657), #1589431, #956, and #964.

The release notes for Percona Server 5.7.13-6 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

MySQL 8.0

July 5, 2016 - 3:18pm

If you haven’t heard the news yet, MySQL 8.0 is apparently the next release of the world-famous database server.

Obviously abandoning plans to name the next release 5.8, Percona Server’s upstream provider relabelled all 5.8-related bugs to 8.0 as follows:

Reported version value updated to reflect release name change from 5.8 to 8.0

What will MySQL 8.0 bring to the world?

While lossless RBR has been suggested by Simon Mudd (for example), the actual feature list (except a Boost 1.60.0 upgrade!) remains a secret.

As far as bug and feature requests go, a smart google query revealed which bugs are likely to be fixed in (or are feature requests for) MySQL 8.0.

Here is the full list:

  • MySQL Bug #79380: Upgrade to Boost 1.60.0
  • MySQL Bug #79037: get rid of dynamic_array in st_mysql_options
  • MySQL Bug #80793: EXTEND EXPLAIN to cover ALTER TABLE
  • MySQL Bug #79812: JSON_ARRAY and JSON_OBJECT return …
  • MySQL Bug #79666: fix errors reported by ubsan
  • MySQL Bug #79463: Improve P_S configuration behaviour
  • MySQL Bug #79939: default_password_lifetime &gt; 0 should print …
  • MySQL Bug #79330: DROP TABLESPACE fails for missing general …
  • MySQL Bug #80772: Excessive memory used in memory/innodb …
  • MySQL Bug #80481: Accesses to new data-dictionary add confusing …
  • MySQL Bug #77712: mysql_real_query does not report an error for …
  • MySQL Bug #79813: Boolean values are returned inconsistently with …
  • MySQL Bug #79073: Optimizer hint to disallow full scan
  • MySQL Bug #77732: REGRESSION: replication fails for insufficient …
  • MySQL Bug #79076: make hostname a dynamic variable
  • MySQL Bug #78978: Add microseconds support to UNIX_TIMESTAMP
  • MySQL Bug #77600: Bump major version of libmysqlclient in 8.0
  • MySQL Bug #79182: main.help_verbose failing on freebsd
  • MySQL Bug #80627: incorrect function referenced in spatial error …
  • MySQL Bug #80372: Built-in mysql functions are case sensitive …
  • MySQL Bug #79150: InnoDB: Remove runtime checks for 32-bit file …
  • MySQL Bug #76918: Unhelpful error for mysql_ssl_rsa_setup when …
  • MySQL Bug #80523: current_memory in sys.session can go negative!
  • MySQL Bug #78210: SHUTDOWN command should have an option …
  • MySQL Bug #80823: sys should have a mdl session oriented view
  • MySQL Bug #78374: “CREATE USER IF NOT EXISTS” reports an error
  • MySQL Bug #79522: can mysqldump print the fully qualified table …
  • MySQL Bug #78457: Use gettext and .po(t) files for translations
  • MySQL Bug #78593: mysqlpump creates incorrect ALTER TABLE …
  • MySQL Bug #78041: GROUP_CONCAT() truncation should be an …
  • MySQL Bug #76927: Duplicate UK values in READ-COMMITTED …
  • MySQL Bug #77997: Automatic mysql_upgrade
  • MySQL Bug #78495: Table mysql.gtid_executed cannot be opened.
  • MySQL Bug #78698: Simple delete query causes InnoDB: Failing …
  • MySQL Bug #76392: Assume that index_id is unique within a …
  • MySQL Bug #76671: InnoDB: Assertion failure in thread 19 in file …
  • MySQL Bug #76803: InnoDB: Unlock row could not find a 2 mode …
  • MySQL Bug #78527: incomplete support and/or documentation of …
  • MySQL Bug #78732: InnoDB: Failing assertion: *mbmaxlen &lt; 5 in file …
  • MySQL Bug #76356: Reduce header file dependencies for …
  • MySQL Bug #77056: There is no clear error message if …
  • MySQL Bug #76329: COLLATE option not accepted in generated …
  • MySQL Bug #79500: InnoDB: Assertion failure in thread …
  • MySQL Bug #72284: please use better options to …
  • MySQL Bug #78397: Subquery Materialization on DELETE WHERE …
  • MySQL Bug #76552: Cannot shutdown MySQL using JDBC driver
  • MySQL Bug #76532: MySQL calls exit(MYSQLD_ABORT_EXIT …
  • MySQL Bug #76432: handle_fatal_signal (sig=11) in …
  • MySQL Bug #41925: Warning 1366 Incorrect string value: … for …
  • MySQL Bug #78452: Alter table add virtual index hits assert in …
  • MySQL Bug #77097: InnoDB Online DDL should support change …
  • MySQL Bug #77149: sys should possibly offer user threads …

Amazon RDS and pt-online-schema-change

July 1, 2016 - 10:30am

In this blog post, I discuss some of the insights needed when using Amazon RDS and pt-online-schema-change together.

The pt-online-schema-change tool runs DDL queries (ALTER) online so that the table is not locked for reads and writes. It is a commonly used tool by community users and customers. Using it on Amazon RDS requires knowing about some specific details. First, a high-level explanation of how the tool works.

This is an example from the documentation:

pt-online-schema-change --alter "ADD COLUMN c1 INT" D=sakila,t=actor

The tool runs an ALTER on the table “actor” from the database “sakila.” The alter adds a column named “c1” of type “integer.” In the background, the tool creates a new empty table similar to “actor” but with the new column already added. It then creates triggers on the original table to update the corresponding rows in the new table. After, it starts copying rows to the new table (this is the phase that takes the longest amount of time). When the copy is done, the tables are swapped, triggers removed and the old table dropped.

As we can see, it is a tool that uses the basic features of MySQL. You can run it on MySQL, Percona Server, MariaDB, Amazon RDS and so on. But when using Amazon, there is a hidden issue: you don’t have SUPER privileges. This means that if you try to run the tool on an RDS with binary logs enabled, you could get the following error:

DBD::mysql::db do failed: You do not have the SUPER privilege and binary logging is enabled (you might want to use the less safe log_bin_trust_function_creators variable) [for Statement "CREATE TRIGGER `pt_osc_db_table_del` AFTER DELETE ON `db`.`table` FOR EACH ROW DELETE IGNORE FROM `db`.`_table_new` WHERE `db`.`_table_new`.`table_id` <=> OLD.`table_id` AND `db`.`_table_new`.`account_id` <=> OLD.`account_id`"] at /usr/bin/pt-online-schema-change line 10583.

The following documentation page explains the reason for this message:

http://dev.mysql.com/doc/refman/5.7/en/stored-programs-logging.html

The bottom line is creating triggers on a server with binary logs enabled requires a user with SUPER privileges (which is impossible in Amazon RDS). The error message specifies the workaround. We need to enable the variable log_bin_trust_function_creators. Enabling it is like saying to the server:

“I trust regular users’ triggers and functions, and that they won’t cause problems, so allow my users to create them.”

Since the database functionality won’t change, it becomes a matter of trusting your users. log_bin_trust_function_creators is a global variable that can be changed dynamically:

mysql> SET GLOBAL log_bin_trust_function_creators = 1;

Run the tool again. This time, it will work. After the ALTER is done, you can change the variable to 0 again.

After you’re done with the ALTER process, you can change the variable to “0” again.

Rescuing a crashed pt-online-schema-change with pt-archiver

June 30, 2016 - 2:20pm

This article discusses how to salvage a crashed pt-online-schema-change by leveraging pt-archiver and executing queries to ensure that the data gets accurately migrated. I will show you how to continue the data copy process, and how to safely close out the pt-online-schema-change via manual operations such as RENAME TABLE and DROP TRIGGER commands. The normal process to recover from a crashed pt-online-schema-change is to drop the triggers on your original table and drop the new table created by the script. Then you would restart pt-online-schema-change. In this case, this wasn’t possible.

A customer recently needed to add a primary key column to a very busy table (with around 200 million rows). The table only had a unique key on one column (called our_id below). The customer had concerns about slave lag, and wanted to ensure there was little or no lag. This, as well as the fact that you can’t add a primary key as an online DDL in MySQL and Percona Server 5.6, meant the obvious answer was using pt-online-schema-change.

Due to the sensitivity of their environment, they could only afford one short window for the initial metadata locks, and needed to manually do the drop swap that pt-online-schema-change normally does automatically. This is where no-drop-triggers and no-swap-tables come in. The triggers will theoretically run indefinitely to keep the new and old tables in sync once pt-online-schema-change is complete. We crafted the following command:

pt-online-schema-change --execute --alter-foreign-keys-method=auto --max-load Threads-running=30 --critical-load Threads_running=55 --check-slave-lag mysql-slave1,mysql-slave2,mysql-slave3 --max−lag=10 --chunk-time=0.5 --set-vars=lock_timeout=1 --tries="create_triggers:10:2,drop_triggers:10:2" --no-drop-new-table --no-drop-triggers --no-swap-tables --chunk-index "our_id" --alter "ADD newcol BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST" D=website,t=largetable --nocheck-plan

You can see some of the specifics of other flags and why we used them in the Percona Toolkit Manual.

Once we ran the command the customer got concerned, as their monitoring tools weren’t showing any work done (which is by design, pt-online-schema-change doesn’t want to hurt your running environment). The customer ran strace -p to verify it was working. This wasn’t a great choice as it crashed pt-online-schema-change.

At this point, we knew that the application (and management) would not allow us to take new metadata locks to create triggers on the table, as we had passed our metadata lock window.

So how do we recover?

First, let’s start with a clean slate. We issued the following commands to create a new table, where __largetable_new is the table created by pt-online-schema-change:

CREATE TABLE mynewlargetable LIKE __largetable_new; RENAME TABLE __largetable_new TO __largetable_old, mynewlargetable TO __largetable_new; DROP TABLE __largetable_old;

Now the triggers on the original table, largetable are updating the new empty table that has our new schema.

Now let’s address the issue of actually moving the data that’s already in largetable to __largetable_new. This is where pt-archiver comes in. We crafted the following command:

pt-archiver --execute --max-lag=10 --source D=website,t=largetable,i=our_id --dest D=website,t=__largetable_new --where "1=1" --no-check-charset --no-delete --no-check-columns --txn-size=500 --limit=500 --ignore --statistics

We use pt-archiver to slowly copy records non-destructively to the new table based on our_id and WHERE 1=1 (all records). At this point, we periodically checked the MySQL data directory over the course of a day with ls -l to compare table sizes.

Once the table files were close to the same size, we ran counts on the tables. We noticed something interesting: the new table had thousands more records than the original table.

This concerned us. We wondered if our “hack” was a mistake. At this point we ran some verification queries:

select min(our_id) from __largetable_new; select max(our_id) from __largetable_new; select min(our_id) from largetable; select max(our_id) from largetable;

We learned that there were older records that didn’t exist in the live table. This means that pt-archiver and the DELETE trigger may have missed each other (i.e., pt-archiver was already in a transaction but hadn’t written records to the new table until after the DELETE trigger already fired).

We verified with more queries:

SELECT COUNT(*) FROM largetable l WHERE NOT EXISTS (SELECT our_id FROM __largetable_new n WHERE n.our_id=l.our_id);

They returned nothing.

SELECT COUNT(*) FROM __largetable_new n WHERE NOT EXISTS (SELECT our_id FROM largetable l WHERE n.our_id=l.our_id);

Our result showed 4000 extra records in the new table. This shows that we ended up with extra records that were deleted from the original table. We ran other queries based on their data to verify as well.

This wasn’t a huge issue for our application, and it could have been easily dealt with using a simple DELETE query based on the unique index (i.e., if it doesn’t exist in the original table, delete it from the new one).

Now to complete the pt-online-schema-change actions. All we need to do is the atomic rename or drop swap. This should be done as soon as possible to avoid running in a degraded state, where all writes to the old table are duplicated on the new one.

RENAME TABLE largetable TO __largetable_old , __largetable_new TO largetable;

Then drop the triggers for safety:

DROP TRIGGER pt_osc_website_largetable_ins; DROP TRIGGER pt_osc_website_largetable_upd; DROP TRIGGER pt_osc_website_largetable_del;

At this point it is safer to wait for the old table to clear out of the buffer pool before dropping it, just to ensure there is no impact on the server (maybe a week to be safe). You can check information_schema for a more accurate reading on this:

SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE WHERE TABLE_NAME = '`website`.`__largetable_old`'; +----------+ | count(*) | +----------+ | 279175 | +----------+ 1 row in set (8.94 sec)

Once this goes to 0 you can issue:

DROP TABLE __largetable_old;

2016 MySQL User Group Leaders Summit

June 29, 2016 - 10:20am

In this post, I’ll share my experience attending the annual MySQL User Group Leaders Summit in Bucharest, Romania.

The MySQL User Group Leaders Summit gathers together as many of the global MySQL user group leaders as possible. At the summit, we discuss further actions on how we can better act for their local communities. This year, it focused primarily on cloud technologies.

As the Azerbaijan MySQL User Group leader, I felt a keen responsibility to go. I wanted to represent our group and learn as much as possible to take back to with me. Mingling and having conversations with other group leaders helps give me more ideas about how to spread the MySQL word!

The Conference

I attended three MySQL presentations:

  • Guided tour on the MySQL source code. In this session, we reviewed the layout of the MySQL code base, roughly following the query execution path. We also covered how to extend MySQL with both built-in and pluggable add-ons.
  • How profiling SQL works in MySQL. This session gave an overview of the performance monitoring tools in MySQL: performance counters, performance schema and SYS schema. It also covered some of the details in analyzing MySQL performance with performance_schema.
  • What’s New in MySQL 5.7 Security. This session presented an overview of the new MySQL Server security-related features, as well as the MySQL 5.6 Enterprise edition tools. This session detailed the shifting big picture of secure deployments, along with all of the security-related MySQL changes.

I thought that the conference was very well organized, with uniformly great discussions. We also participated in some city activities and personal interactions. I even got to see Le Fred!

I learned a lot from the informative sessions I attended. The MySQL source code overview showed me the general paths of MySQL source code, including the most important directories, the most important functions and classes. The session about MySQL profiling instrumentation sessions informed us of the great MySQL profiling improvements. It reviewed some useful tools and metrics that you can use to get info from the server. The last session about MySQL security covered improved defaults, tablespace encryption and authentication plugins.

In conclusion, my time was well spent. Meeting and communicating with other MySQL user group leaders gives me insight into the MySQL community. Consequently, I highly recommend everyone gets involved in your local user groups and attend get-togethers like the MySQL User Group Leaders Summit when you can find the time.

Below you can see some of the pics from the trip. Enjoy!

 

 

 

 



General Inquiries

For general inquiries, please send us your question and someone will contact you.