
EXPLAIN FORMAT=JSON wrap-up

April 6, 2016 - 7:23am

This blog is an EXPLAIN FORMAT=JSON wrap-up for the series of posts I’ve done in the last few months.

In this series, we've discussed everything unique to EXPLAIN FORMAT=JSON. I intentionally skipped members such as table_name, access_type or select_id, which are not unique to the JSON format.

I also only mentioned in passing members that replace information from the Extra column in the regular EXPLAIN output, such as using_join_buffer, partitions, using_temporary_table or simply message. You can see these in queries like the following:

mysql> explain format=json select rand() from dual\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "No tables used"
  }
}
1 row in set, 1 warning (0.00 sec)

Or

mysql> explain format=json select emp_no from titles where 'Senior Engineer' = 'Senior Cat'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "Impossible WHERE"
  }
}
1 row in set, 1 warning (0.01 sec)

Their use is fairly intuitive and similar to regular EXPLAIN, so I don't think a separate blog post on each of them would add much.

The only thing left to list is a Table of Contents for the series:

attached_condition: How EXPLAIN FORMAT=JSON can spell-check your queries

rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”

used_columns: EXPLAIN FORMAT=JSON tells when you should use covered indexes

used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used

EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Thanks for following the series!

Webinar April 7, 10am PDT – Introduction to Troubleshooting Performance: What Affects Query Execution?

April 5, 2016 - 5:49pm

Join us for our latest webinar on Thursday, April 7, at 10 am PDT (UTC-7) on Introduction to Troubleshooting Performance: What Affects Query Execution?

MySQL installations experience a multitude of issues: server hangs, wrong data stored in the database, slow-running queries, stalled replication, user connection problems and many others. It's often difficult not only to troubleshoot these issues, but even to know which tools to use.

Slow-running queries, threads piling up during peak times, application performance suddenly lagging: these are just some of the items on a long list of possible database performance issues. How can you figure out why your MySQL installation isn't running as fast as you'd like?

In this introductory webinar, we will concentrate on the three main reasons for performance slowdown:

  • Poorly optimized queries
  • Concurrency issues
  • Effects of hardware and other system factors

This webinar will teach you how to identify and fix these issues. Register now.

If you can’t attend this webinar live, register anyway and we’ll send you a link to the recording.

Sveta Smirnova, Principal Technical Services Engineer.

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can solve typical issues quicker, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB (later Sun, then Oracle). She is the author of the book "MySQL Troubleshooting" and of the JSON UDF functions for MySQL.

Percona Live featured talk with Anastasia Ailamaki — RAW: Fast queries on JIT databases

April 5, 2016 - 9:07am

Welcome to the next Percona Live featured talk with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this Percona Live featured talk, we’ll meet Anastasia Ailamaki, Professor and CEO, EPFL and RAW Labs. Her talk will be RAW: Fast queries on JIT databases. RAW is a query engine that reads data in its raw format and processes queries using adaptive, just-in-time operators. The key insight is its use of virtualization and dynamic generation of operators. I had a chance to speak with Anastasia and learn a bit more about RAW and JIT databases:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anastasia: I am a computer engineer and initially trained on networks. I came across databases in the midst of the object-oriented hype, and was totally smitten by both the power of data models and the wealth of problems one had to solve to create a functioning and performant database system. In the following years, I built several systems as a student and (later) as a coder. At some point, however, I needed to learn more about the machine. I decided to do a Masters in computer architecture, which led to a Ph.D. in databases and microarchitecture. I became a professor at CMU, where for eight years I guided students as they built their ideas into real systems that assessed those ideas' potential and value. During my sabbatical at EPFL, I was fascinated by the talent and opportunities in Switzerland, so I decided to stay and, seven years later, co-founded RAW Labs.

Percona: Your talk is going to be on "RAW: Fast queries on JIT databases." Would you say you're an advocate of abandoning (or at least not relying on) the traditional "big structured database accessed by queries" model that has existed for most of computing history? Why?

Anastasia: The classical usage paradigm for databases has been "create a database, then ask queries." Traditionally, "creating a database" means creating a structured copy of the entire dataset. This is now passé for the simple reason that data is growing too fast, and loading overhead grows with data size. What's more, we typically use only a small fraction of the data available, so investing in the mass of owned data is a waste of resources: people have to wait too long from the time they receive a dataset until they can ask a query. And it doesn't stop there: users are asked to pick a database engine based on the format and intended use of the data. We associate row stores with transactions, NoSQL with JSON, and column stores with analytics, but true insight comes from combining all of the data semantically as opposed to structurally. With each engine optimizing for specific kinds of queries and data formats, analysts subconsciously factor in limitations when piecing together their infrastructure. We only know the best way to structure data when we see the queries, so loading data and developing query processing operators before knowing the queries is premature.

Percona: What are the conditions that make JIT databases in general (and RAW specifically) the optimum solution?

Anastasia: JIT databases push functionality to the last minute, and execute it right when it's actually needed. Several systems perform JIT compilation of queries, which offers great performance benefits (an example is Hyper, a system recently acquired by Tableau). RAW is JIT on steroids: it leaves data at its source and only reads it, or asks for any system resources, when they're actually required. You may have 10,000 files, and a file will only be read when you ask a query that needs the data in it. When the user asks a query, RAW code-generates the raw source data adaptors and the entire query engine needed to run that query. It stores all useful information about the accessed data, as well as popular operators generated in the past, and uses them to accelerate future queries. It adapts to system resources on the fly and only asks for them when needed. RAW also serves as an interface to raw data and operational databases alike. In addition, the RAW query language is incredibly rich; it is a superset of SQL that allows navigation over hierarchical data and tables at the same time, with support for variable assignments, regular expressions, and more for log processing, while staying in declarative land. Therefore, analysts only need to describe the desired result in SQL, without thinking about the data format.

Percona: What would you say in the next step for JIT and RAW? What keeps you up at night concerning the future of this approach?

Anastasia: The next step for RAW is to reach out to as many people as possible — especially users with complex operational data pipelines — and reduce cost and eliminate pipeline stages, unneeded data copies, and extensive scripting. RAW is a new approach that can work with existing infrastructures in a non-intrusive way. We are well on our way with several proof-of-concept projects that create verticals for RAW, and demonstrate its usefulness for different applications.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anastasia: I am looking forward to meeting as many users and developers as possible, hearing their feedback on RAW and our ideas, and learning from their experiences.

You can read more about RAW and JIT databases at Anastasia’s academic group’s website: dias.epfl.ch.

Want to find out more about Anastasia and RAW? Register for Percona Live Data Performance Conference 2016, and see her talk RAW: Fast queries on JIT databases. Use the code “FeaturedTalk” and receive $100 off the current registration price!

Percona Live Data Performance Conference 2016 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Data in the Cloud track at Percona Live with Brent Compton and Ross Turk: The Data Performance Cloud

April 4, 2016 - 5:03pm

In this blog, we’ll discuss the Data in the Cloud track at Percona Live with Red Hat’s Brent Compton and Ross Turk.

Welcome to another interview with the Percona Live Data Performance Conference speakers and presenters. This series of blog posts highlights some of the talks and presentations available at the Percona Live Data Performance Conference, April 18-21 in Santa Clara. Read through to the end for a discount on Percona Live registration.

(A webinar sneak preview of their “MySQL on Ceph” cloud storage talk is happening on Wednesday, April 6th at 2 pm EDT. You can register for it here – all attendees will receive a special $200 discount code for Percona Live registration after the webinar! See the end of this blog post for more details!)

First, we need to establish some context. For most of its existence, data storage has followed a consistent model: stable and fairly static big-box devices purpose-built to house data. Needing more storage space meant obtaining more (or bigger) boxes: classic scale-up storage. Need more? Go to the data storage vendor and order a bigger box.

The problem is that data is exploding, and has been growing exponentially for the last decade. Some estimates put the amount of data being generated worldwide as increasing at a rate of 40%-60% per year. That kind of increase, at that speed, doesn't leave a lot of ramp-up time for long-term, big-box hardware investments. Things are changing too fast.

The immediate trend, evident in the declining revenues of classic storage boxes, is placing data in a cloud of scale-out storage. What is the cloud? Since that question has whole books devoted to it, let's try to simplify it a bit.

Cloud computing benefits include scalability, instantaneous configuration, virtualized consumables and the ability to quickly expand base specifications. Moving workloads to the cloud brings with it numerous business benefits, including agility, focus and cost:

  • Agility. The cloud enables businesses to react to changing needs. As the workload grows or spikes, just add compute cycles, storage, and bandwidth with the click of a mouse.
  • Focus. Deploying workloads to the cloud enables companies to focus more resources on business-critical activities, rather than system administration.
  • Cost. Businesses can pay as they go for the services level they need. Planning and sinking money into long-term plans that may or may not pan out is not as big a problem.

When it comes to moving workloads into the cloud, the low-throughput applications were the obvious first choice: email, non-critical business functions, team collaboration assistance. These are generally neither mission critical nor subject to high security requirements. As application-driven services became more and more prevalent (think Netflix, Facebook, Instagram), more throughput-intensive services were moved to the cloud, mainly for flexibility during service spikes and to accommodate increased users. But tried-and-true high-performance workloads like databases, and other corporate kingdoms with perceived higher security requirements, have traditionally remained stuck in the old infrastructures that have served them well, until now.

So what is this all leading to? Well, according to Brent and Ross, ALL data will eventually be going to the cloud, and the old models of storage infrastructure are falling by the wayside. Between the lack of elasticity and scalability of purpose-built hardware, and the oncoming storage crisis, database storage is headed for cloud services solutions.

I had some time to talk with Brent and Ross about data in the cloud, and what we can expect regarding a new data performance cloud model.

Percona: There is always a lot of talk about public versus private paradigms when it comes to cloud discussions. To you, this is fairly inconsequential. How do you see "the cloud?" How would you define it in terms of infrastructure for workloads?

RHT: Red Hat has long provided software for hybrid clouds, with the understanding that most companies will use a mix of public cloud and private cloud infrastructure for their workloads. This means that Red Hat software is supported both on popular public cloud platforms (such as AWS, Azure, and GCE) as well as on-premise platforms (such as OpenStack private clouds). Our work with Percona in providing a reference architecture for MySQL running on Ceph is all about giving app developers a comparable, deterministic experience when running their MySQL-based apps on a Ceph private storage cloud v. running them in the public cloud.

Percona: So, your contention is that ALL data is headed to the cloud. What are the factors that are going to ramp up this trend? What level of information storage will cement this as inevitable?

RHT: We'd probably restate this as "most data is headed to A cloud." There are two distinctions being made in this statement. The first is "most" versus "all" data. For years to come, there will be late adopters with on-premise data NOT being served through a private cloud infrastructure. The second distinction is "a" cloud versus "the" cloud. "A" cloud means either a public cloud or a private cloud (or some hybrid of the two). Private clouds are being constructed by the world's most advanced companies within their own data centers to provide a similar type of elastic infrastructure with dynamic provisioning and lower CAPEX/OPEX costs (as is found in public clouds).

Percona: What are the concerns you see with moving all workloads to the cloud, and how would you address those concerns?

RHT:  The distinctions laid out in the previous answer address this. For myriad reasons, some data and workloads will reside on-premise within private clouds for a very long time. In fact, as the technology matures for building private clouds (as we’re seeing with OpenStack and Ceph), and can offer many of the same benefits as public clouds, we see the market reaching an equilibrium of sorts. In this equilibrium many of the agility, flexibility, and cost benefits once available only through public cloud services will be matched by private cloud installations. This will re-base the public versus private cloud discussion to fewer, simpler trade-offs – such as which data must reside on-premises to meet an enterprise’s data governance and control requirements.

Percona: You mentioned the "Data Performance Cloud." How would you describe what that is, and how it affects enterprises?

RHT: For many enterprises, data performance workloads have been the last category of workloads to move to a cloud, whether public or private. Public cloud services, such as the AWS Relational Database Service with Provisioned-IOPS storage, have illustrated improved data performance for many workloads once relegated to the cloud sidelines. Now, with the guidelines in the reference architecture being produced by Percona and the Red Hat Ceph team, customers can achieve data performance on their private Ceph storage clouds comparable to what they get from high-performance public cloud services.

Percona: What can people expect to get out of the Data in the Cloud track at Percona Live this year?

RHT: Architecture guidelines for building and optimizing MySQL databases on a Ceph private storage cloud. These architectures will include public cloud benefits along with private cloud control and governance.

Want to find out more about MySQL, Ceph, and Data in the Cloud? Register for Percona Live Data Performance Conference 2016, and see Red Hat’s sponsored Data in the Cloud Keynote Panel: Cloudy with a chance of running out of disk space? Or Sunny times ahead? Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

MySQL and Ceph: Database-as-a-Service sneak preview

Businesses are familiar with running a Database-as-a-Service (DBaaS) in the public cloud. They enjoy the benefits of on-demand infrastructure for spinning-up lots of MySQL instances with predictable performance, without the headaches of managing them on specific, bare-metal highly available clusters.

This webinar lays the foundation for building a DBaaS on your own private cloud, enabled by Red Hat® Ceph Storage. Join senior architects from Red Hat and Percona for reference architecture tips and head-to-head performance results of MySQL on Ceph versus MySQL on AWS.

This is a sneak preview of the labs and talks to be given in April 2016 at the Percona Live Data Performance Conference. Attendees will receive a discount code for $200 off Percona Live registration!

Speakers:

  • Brent Compton, director, Storage Solution Architectures, Red Hat
  • Kyle Bader, senior solutions architect, Red Hat
  • Yves Trudeau, principal consultant, Percona

Join the live event:

Wednesday, April 6, 2016 | 2 p.m. ET | 11 a.m. PT


See Bill Nye the Science Guy at Percona Live and help change the world: a special offer!

April 4, 2016 - 9:47am

See Bill Nye the Science Guy at Percona Live Data Performance Conference, and help an excellent cause!

The best science is built on solid data. As a world-renowned icon in tech and geek circles everywhere, Bill Nye fights to raise awareness of the value of science, critical thinking, and reason. He hopes that the data he brings will help inspire people everywhere to change the world. And since the open source community is full of science-minded individuals, he is excited to speak to everyone at Percona Live!

Since his early days as a comedian, to creating his well-known Science Guy character, to the present day, Bill Nye has always brought the impressive and illuminating power of science to people.

Bill Nye’s keynote speech at Percona Live is “Bill Nye’s Objective – Change the World!” Through his talks, books, and day job as the CEO of The Planetary Society (the world’s largest non-governmental space interest organization), Bill wants to get people involved in the power of science. Science can teach people about the world, and how they can influence and change it. Science helps us to understand what Bill likes to call “our place in space.”

And now you can help change the world, just by attending The Percona Live Data Performance Conference! For a limited time, if you buy a Keynote or Expo pass to Percona Live using the promo code “NYE” you will get the pass for just $10, AND all the money from these registrations will be donated to The Planetary Society. The Planetary Society sponsors projects that will seed innovative space technologies, nurtures creative young minds, and is a vital advocate for our future in space. Their mission is to empower the world’s citizens to advance space science and exploration.

A great deal, and a great cause! This offer is limited to the first 250 registrations, so hurry up and help change the world!

Percona XtraBackup 2.4.2 is now available

April 1, 2016 - 7:30am

Percona is glad to announce the GA release of Percona XtraBackup 2.4.2 on April 1st, 2016. Downloads are available from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

Bugs Fixed:

  • When backup was taken on MariaDB 10 with GTID enabled, Percona XtraBackup didn’t store gtid_slave_pos in xtrabackup_slave_info but logged it only to STDERR. Bug fixed #1404484.
  • Backup process would fail if --throttle option was used. Bug fixed #1554235.

Release notes with all the bugfixes for Percona XtraBackup 2.4.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Fixing MySQL Bug#2: now MySQL makes toast!

April 1, 2016 - 7:15am

Historical MySQL Bug#2, opened 12 Sep 2002, states that MySQL Connector/J doesn't make toast. It hasn't been fixed for more than 13 years. I've finally created a patch for it.

First of all: why only fix this for MySQL Connector/J? We should make sure the server can do this for any implementation! With this fix, now MySQL server (starting with version 5.1) can make toast.

There are a few dependencies, though (see the assembled setup picture):

  1. Raspberry Pi + PiFace shield
  2. Power switch relay (I’ve used an IoT Power Relay)
  3. Toaster oven (any cheap mechanical model will work)

Patch:

  1. Make_toast binary, which is run on the Raspberry Pi and PiFace interface (you’ll need to install the PiFace library):
    #!/usr/bin/python
    import sys
    from time import sleep

    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        toast_time = sys.argv[1]
    else:
        toast_time = 10

    print "Toasting for " + str(toast_time) + " seconds..."

    import pifacedigitalio as p
    try:
        p.init()
        p.digital_write(7,1)
        sleep(float(toast_time))
        p.digital_write(7,0)
    except (KeyboardInterrupt, SystemExit):
        print "Exiting and turning off heat..."
        p.digital_write(7,0)
        sys.exit(1)
    print "Your toast is ready! Enjoy!"
  2. MySQL UDF, based on lib_mysqludf_sys, which calls the make_toast binary:
    char* make_toast( UDF_INIT *initid, UDF_ARGS *args, char* result,
                      unsigned long* length, char *is_null, char *error )
    {
        FILE *pipe;
        char line[1024];
        unsigned long outlen, linelen;
        char buf[40];

        result = malloc(1);
        outlen = 0;
        sprintf(buf, "make_toast %s", args->args[0]);
        pipe = popen(buf, "r");
        while (fgets(line, sizeof(line), pipe) != NULL) {
            linelen = strlen(line);
            result = realloc(result, outlen + linelen);
            strncpy(result + outlen, line, linelen);
            outlen = outlen + linelen;
        }
        pclose(pipe);
        if (!(*result) || result == NULL) {
            *is_null = 1;
        } else {
            result[outlen] = 0x00;
            *length = strlen(result);
        }
        return result;
    }

Usage:

mysql> call make_toast(300)

Demo picture (thanks to my colleague Fernando Laudares Camargos), actual video will follow:

Implementation details:

Hardware/wiring

The relay switch powers on the toaster oven, and no modifications are needed to the oven itself. Make sure the oven's timer is set to 30 minutes initially; the Raspberry Pi/MySQL UDF will now control how long the bread toasts.

The setup wiring is super easy (but may be counterintuitive if you are used to working with Arduino): use the output pins (image) and connect 5v on the PiFace to the “+” sign on the relay switch, and one of the pins to the “-” sign on the relay switch.

Software install

  1. Install PiFace software and Python bindings
  2. Test the make_toast python script
  3. Add user “mysql” to the spi and gpio groups so it can manipulate pins:
    # gpasswd -a mysql gpio
    # gpasswd -a mysql spi
  4. Download the make toast UDF code and run install.sh.

mysql> call make_toast(300);

Enjoy your toast when it is hot!

MongoDB at Percona Live: A Special Open Source Community Discount

March 31, 2016 - 12:32pm

We want MongoDB at Percona Live!

One of the main goals of the Percona Live Data Performance Conference 2016 is celebrating and embracing the open source community. The community’s spirit of innovation, expertise and competition has produced incredible software, hardware, processes and products.

The open source community is a diverse and powerful collection of companies, organizations and individuals that have helped to literally change the world. Percona is proud to call itself a member of the open source community, and we strongly feel that upholding the principles of the community is a key to our success. These principles include an open dialog, an open mind, and a zeal for cooperative interaction. Together, we can create amazing things.

That’s why we were surprised when MongoDB declined to have us sponsor or speak at MongoDB World 2016, and even more taken aback when we were told our engineers are not welcome to attend the show. We make a special point of inviting competitors to participate in, speak at, and sponsor Percona Live – MongoDB included. We welcome our competitors to speak, sponsor and attend Percona Live because it is in the greater interest of the community at large to include all voices.

With that in mind, we’d like to extend a special offer to any MongoDB employees: sign up for the Percona Live Data Performance Conference 2016 using your company email, and receive a special VIP 25% discount off the registration price (use promo code “mongodb”).

In addition:

  • We invite all MongoDB attendees to a VIP cocktail reception with the Percona Team Tuesday, April 19th from 5-6pm
  • Percona is pleased to host all MongoDB attendees as special guests at the Tuesday, April 19th, Community Dinner Event at Pedro’s

It’s our way of showing our solidarity with the open source community, and expressing our belief that we work best when we work together.

See you all at Percona Live! Register here!

Docker MySQL Replication 101

March 30, 2016 - 1:21pm

In this blog post, we’ll discuss some of the basics regarding Docker MySQL replication.

Docker has gained widespread popularity in recent years as a lightweight alternative to virtualization. It is ideal for building virtual development and testing environments. The solution is flexible and seamlessly integrates with popular CI tools. This post walks through the setup of MySQL replication with Docker using Percona Server 5.6 images.

To keep things simple, we'll configure a pair of instances and override only the most important variables for replication. You can add whatever other variables you want to override in the configuration files for each instance.

Note: the configuration described here is suitable for development or testing. We've also used the operating system repository packages; for the latest version, use the official Docker images. The steps described can be used to set up more slaves if required, as long as each slave has a different server-id.

First, install Docker and pull the Percona images (this will take some time and is only executed once):

# Docker install for Debian / Ubuntu
apt-get install docker.io

# Docker install for Red Hat / CentOS (requires EPEL repo)
yum install epel-release   # If not installed already
yum install docker-io

# Pull docker repos
docker pull percona

Now create locally persisted directories for the:

  1. Instance configuration
  2. Data files

# Create local data directories
mkdir -p /opt/Docker/masterdb/data /opt/Docker/slavedb/data

# Create local my.cnf directories
mkdir -p /opt/Docker/masterdb/cnf /opt/Docker/slavedb/cnf

### Create configuration files for master and slave
vi /opt/Docker/masterdb/cnf/config-file.cnf
# Config Settings:
[mysqld]
server-id=1
binlog_format=ROW
log-bin

vi /opt/Docker/slavedb/cnf/config-file.cnf
# Config Settings:
[mysqld]
server-id=2

Great, now we're ready to start our instances and configure replication. Launch the master node, create the replication user, and get the initial replication coordinates:

# Launch master instance
docker run --name masterdb -v /opt/Docker/masterdb/cnf:/etc/mysql/conf.d -v /opt/Docker/masterdb/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mysecretpass -d percona:5.6
00a0231fb689d27afad2753e4350192bebc19ab4ff733c07da9c20ca4169759e

# Create replication user
docker exec -ti masterdb 'mysql' -uroot -pmysecretpass -vvv -e"GRANT REPLICATION SLAVE ON *.* TO repl@'%' IDENTIFIED BY 'slavepass'\G"
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
GRANT REPLICATION SLAVE ON *.* TO repl@"%"
--------------

Query OK, 0 rows affected (0.02 sec)

Bye

### Get master status
docker exec -ti masterdb 'mysql' -uroot -pmysecretpass -e"SHOW MASTER STATUS\G"
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
             File: mysqld-bin.000004
         Position: 310
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:

If you look carefully at the “docker run” command for masterdb, you’ll notice we’ve defined two paths to share from local storage:

/opt/Docker/masterdb/data:/var/lib/mysql:

  • This maps the local "/opt/Docker/masterdb/data" directory to the masterdb container's "/var/lib/mysql" path
  • All files within the datadir “/var/lib/mysql” persist locally on the host running docker rather than in the container

/opt/Docker/masterdb/cnf:/etc/mysql/conf.d:

  • This maps the local “/opt/Docker/masterdb/cnf” directory to the container’s “/etc/mysql/conf.d” path
  • The configuration files for the masterdb instance persist locally as well
  • Remember these files augment or override the file in “/etc/mysql/my.cnf” within the container (i.e., defaults will be used for all other variables)

We're done setting up the master, so let's continue with the slave instance. For this instance, the "docker run" command also includes the "--link masterdb:mysql" option, which links the slave instance to the master instance for replication.

After starting the instance, set the replication co-ordinates captured in the previous step:

docker run --name slavedb -d -v /opt/Docker/slavedb/cnf:/etc/mysql/conf.d -v /opt/Docker/slavedb/data:/var/lib/mysql --link masterdb:mysql -e MYSQL_ROOT_PASSWORD=mysecretpass -d percona:5.6
eb7141121300c104ccee0b2df018e33d4f7f10bf5d98445ed4a54e1316f41891

docker exec -ti slavedb 'mysql' -uroot -pmysecretpass -e'change master to master_host="mysql",master_user="repl",master_password="slavepass",master_log_file="mysqld-bin.000004",master_log_pos=310;' -vvv
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
change master to master_host="mysql",master_user="repl",master_password="slavepass",master_log_file="mysqld-bin.000004",master_log_pos=310
--------------

Query OK, 0 rows affected, 2 warnings (0.23 sec)

Bye

Almost ready to go! The last step is to start replication and verify that it is running:

# Start replication
docker exec -ti slavedb 'mysql' -uroot -pmysecretpass -e"START SLAVE;" -vvv
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
START SLAVE
--------------

Query OK, 0 rows affected, 1 warning (0.00 sec)

Bye

# Verify replication is running OK
docker exec -ti slavedb 'mysql' -uroot -pmysecretpass -e"SHOW SLAVE STATUS\G" -vvv
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
SHOW SLAVE STATUS
--------------

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000004
          Read_Master_Log_Pos: 310
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 284
        Relay_Master_Log_File: mysqld-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 310
              Relay_Log_Space: 458
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 230d005a-f1a6-11e5-b546-0242ac110004
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0.00 sec)

Bye

Finally, we have a pair of dockerized Percona Server 5.6 master-slave servers replicating!

As mentioned before, this is suitable for a development or testing environment. Before going into production with this configuration, think carefully about the tuning of the “my.cnf” variables and the choice of disks used for the data/binlog directories. It is important to remember that newer versions of Docker recommend using “networks” rather than “linking” for communication between containers.

Percona Live featured talk with Avi Kivity: Scylla, a Cassandra-compatible NoSQL database at two million requests per second

March 30, 2016 - 10:14am

Welcome to the next Percona Live featured talk with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this Percona Live featured talk, we’ll meet Avi Kivity, CTO of ScyllaDB. His talk will be Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second. Scylla is a new NoSQL database that applies systems programming techniques to a horizontally scalable NoSQL design to achieve extreme performance improvements. I had a chance to speak with Avi and learn a bit more about Scylla and its strengths:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Avi: Unlike perhaps many database developers, I approached databases up from the kernel and the filesystem layers. As the maintainer of the Linux Kernel-based Virtual Machine (KVM) project, I had extensive experience in kernel programming, especially scaling loads to many cores. Before that, at Exanet, I worked on a distributed filesystem (now Dell’s FluidFS), where I gained storage and distributed systems experience. Applying this low-level experience to a high-level application like a database has been very rewarding for me.

I work as ScyllaDB’s CTO in our Herzliyya, Israel headquarters, but our development team is scattered around twelve countries! Since ScyllaDB is a remote-work champion, I have the pleasure of working with the very best developers on the planet.

Percona: Your talk is going to be “Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second.” What is it about Scylla that makes it an obvious choice for an adopter? Is there a specific workload or scenario that it handles well?

Avi: As Scylla is a drop-in replacement for Cassandra, existing Cassandra users are our obvious target. Cassandra compatibility means that the Cassandra file formats, drivers, query language, management tools, and even configuration files are all understood by Scylla. Your existing applications, data and Cassandra skills transfer with very little effort. However, on average you gain up to 10 times the throughput, with a sizable reduction in latency; at the higher percentiles, you gain even more! The throughput improvement can be translated to smaller clusters, higher application throughput, a bigger load safety margin, or a combination of all of these.

As a very high-throughput database, Scylla is a good fit for the Internet of Things (IoT) and web-scale data stores. Its low latency (no Garbage Collection pauses!) make it a good fit for ad-tech applications. Even non-Cassandra users with high-throughput or strict low-latency requirements should take a good look at Scylla.

Percona: Where are you in the great NoSQL vs. MySQL debate? Why would somebody choose NoSQL (and specifically, Scylla) over MySQL?

Avi: Both SQL and NoSQL have their places. SQL offers great flexibility in your query choices, and excellent ACIDity. NoSQL trades off some of that flexibility and transactional behavior, but in return it gives you incredible scalability, geographical distribution and availability – and with Scylla, amazing throughput.

A great advantage of the Scylla architecture (which, to be fair, we inherited from Cassandra) is its symmetric structure. All nodes have the same role: there are no masters and slaves, metadata nodes or management nodes. A symmetric architecture means linear scaling as you add nodes, without a specific node becoming a bottleneck. This is pretty hard to achieve in a MySQL deployment.

Percona: What do you see as an issue that we the open source database community needs to be on top of concerning NoSQL, Cassandra, or Scylla? What keeps you up at night?

Avi: The NoSQL movement placed great emphasis on scale-out, almost completely ignoring scale-up. Why bother with per-node performance if you can simply add more nodes? Operational costs and complexity, that’s why! With Scylla, we’re trying to bring the same kind of attention to per-node performance that traditional SQL databases have while still providing the NoSQL goodness.

When we investigated the performance bottlenecks in Cassandra, we saw that while non-blocking message-passing was used between nodes (as it should be), blocking locks were used for inter-core communications, and blocking I/O APIs were used for storage access. To fix this problem, we wrote Seastar (http://seastar-project.org), a server application framework that uses non-blocking message-passing for inter-core communications and storage access. Scylla builds on Seastar and uses it to achieve its performance goals.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Avi: This is my first Percona Live conference, so I’m excited!  I’m looking forward to engaging with Percona Live attendees, seeing how Scylla can help them and understanding which features we need to prioritize on the Scylla roadmap.

You can read more of Avi’s thoughts on NoSQL, SQL and Scylla at the Scylla blog, or follow him on Twitter.

Want to find out more about Avi and Scylla? Register for Percona Live Data Performance Conference 2016, and see his talk Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second. Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Read-write split routing performance in MaxScale

March 29, 2016 - 5:42pm

In this blog post, we’ll discuss read-write split routing performance in MaxScale.

The two previous posts showed how to set up high availability (HA) with MaxScale using asynchronous replication, and how to monitor that replication.

Now let’s focus on the routing module performing read-write splits.

This is our current configuration:

[Splitter Service]
type=service
router=readwritesplit
servers=percona1, percona2
max_slave_replication_lag=30
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4

This router module is designed to spread the read queries across multiple servers (slaves by default), and send the write queries to a single server: the master.

This module is intended to work with Asynchronous Master-Slave replication but also with Galera replication if you plan to write to a single node.

So what is routed to the Master?

  • Write statements
  • All statements within an open transaction, even if this transaction is read only
  • Stored procedure and user-defined function calls
  • DDL statements
  • Execution of prepared statements (EXECUTE)
  • All statements using temporary tables

Example:

    • percona1: master
    • percona2 and percona3: slaves

Let’s connect to MaxScale with the MySQL’s interactive client:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| percona2   |
+------------+

mysql> start transaction;

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| percona1   |
+------------+

mysql> rollback;

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| percona2   |
+------------+

Now let’s try with a READ ONLY transaction:

mysql> start transaction read only;

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| percona1   |
+------------+

As we can see, MaxScale doesn't treat READ ONLY transactions specially; it considers them the same as any other transaction. This means they are routed to the master as if they were writes.

We’ve already seen the max_slave_replication_lag optional parameter, but there are some others:

      • max_slave_connections: defines the maximum number of slaves a router session uses; the default is to use all of the available slaves
      • use_sql_variables_in: defines where queries that read session variables should be routed. Valid values are master and all (the latter being the default)
      • weightby: defines the name of the server parameter used to calculate the routing weights of the servers

Now let's play with weightby. In this configuration, we will target 10% of the reads at percona2 and 90% at percona3:

[Splitter Service]
type=service
router=readwritesplit
servers=percona1, percona2, percona3
weightby=myweight
...

[percona2]
type=server
address=192.168.90.3
port=3306
protocol=MySQLBackend
myweight=1

[percona3]
type=server
address=192.168.90.4
port=3306
protocol=MySQLBackend
myweight=9

We restart MaxScale, and verify the settings of the service:

# maxadmin -pmariadb show service "Splitter Service"
Service 0x363b460
    Service:                            Splitter Service
    Router:                             readwritesplit (0x7fe7f1e88540)
    State:                              Started
    Number of router sessions:          0
    Current no. of router sessions:     0
    Number of queries forwarded:        0
    Number of queries forwarded to master: 0
    Number of queries forwarded to slave:  0
    Number of queries forwarded to all:    0
    Master/Slave percentage:            0.00%
    Connection distribution based on myweight server parameter.
        Server          Target %    Connections    Operations
                                    Global  Router
        percona3        90.0%       0       0       0
        percona2        10.0%       0       0       0
        percona1        100.0%      0       0       0
    Started:                            Wed Feb 24 22:39:27 2016
    Root user access:                   Disabled
    Backend databases
        192.168.90.4:3306  Protocol: MySQLBackend
        192.168.90.3:3306  Protocol: MySQLBackend
        192.168.90.2:3306  Protocol: MySQLBackend
    Routing weight parameter:           myweight
    Users data:                         0x36397c0
    Total connections:                  2
    Currently connected:                2
    SSL:                                Disabled

The target % seems correct, let’s test it!

for i in `seq 1 10`; do
    mysql -h 192.168.90.5 -BN -umanager -ppercona -e "select @@hostname; select sleep(10)" 2>/dev/null &
done
percona2
percona2
percona2
percona2
percona2
percona2
percona3
percona3
percona3
percona3

That doesn’t seem good! Let’s check the service again:

Service 0x363b460
    Service:                            Splitter Service
    Router:                             readwritesplit (0x7fe7f1e88540)
    State:                              Started
    Number of router sessions:          10
    Current no. of router sessions:     10
    Number of queries forwarded:        30
    Number of queries forwarded to master: 0
    Number of queries forwarded to slave:  30
    Number of queries forwarded to all:    0
    Master/Slave percentage:            0.00%
    Connection distribution based on myweight server parameter.
        Server          Target %    Connections    Operations
                                    Global  Router
        percona3        90.0%       10      10      5
        percona2        10.0%       10      10      5
        percona1        100.0%      10      10      0
    Started:                            Wed Feb 24 22:39:27 2016
    Root user access:                   Disabled
    Backend databases
        192.168.90.4:3306  Protocol: MySQLBackend
        192.168.90.3:3306  Protocol: MySQLBackend
        192.168.90.2:3306  Protocol: MySQLBackend
    Routing weight parameter:           myweight
    Users data:                         0x36397c0
    Total connections:                  12
    Currently connected:                12
    SSL:                                Disabled

Five operations for each slave... this looks like a normal 50%-50% load balancer.

So that doesn’t work as we expected. Let’s have a look at other router options:

      • slave_selection_criteria. Controls how the router chooses the slaves and how it load balances the sessions. There are some parameter options:
        • LEAST_GLOBAL_CONNECTIONS. Slave with least connections from MaxScale, not on the server itself
        • LEAST_ROUTER_CONNECTIONS. Slave with least connections from this service
        • LEAST_BEHIND_MASTER. Slave with smallest replication lag
        • LEAST_CURRENT_OPERATIONS. Slave with the least active operations (this is the default)
      • master_accept_reads. Uses the master for reads

There are some others; please check the online manual for:

      • max_sescmd_history
      • disable_sescmd_history

That explains the behavior we just observed. But what if we want to use the weight setting, and not spread the reads evenly across the slaves?

I found the answer on IRC, in the #maxscale channel on freenode. Markus Makela (markusjm) explained to me that the default configuration in 1.3.0 is to use all the slaves and load balance the actual statements. So to achieve what we want, we need to use these options in the service section:

router_options=slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
max_slave_connections=1

Let’s test it:

for i in `seq 1 10`; do
    mysql -h 192.168.90.5 -BN -umanager -ppercona -e "select @@hostname; select sleep(10)" 2>/dev/null &
done
percona3
percona3
percona3
percona3
percona3
percona2
percona3
percona3
percona3
percona3

Service 0x1d88560
    Service:                            Splitter Service
    Router:                             readwritesplit (0x7f9c018c3540)
    State:                              Started
    Number of router sessions:          10
    Current no. of router sessions:     10
    Number of queries forwarded:        30
    Number of queries forwarded to master: 0
    Number of queries forwarded to slave:  30
    Number of queries forwarded to all:    0
    Master/Slave percentage:            0.00%
    Connection distribution based on myweight server parameter.
        Server          Target %    Connections    Operations
                                    Global  Router
        percona3        90.0%       9       9       9
        percona2        10.0%       1       1       1
        percona1        100.0%      10      10      0
    Started:                            Wed Feb 24 22:58:21 2016
    Root user access:                   Disabled
    Backend databases
        192.168.90.4:3306  Protocol: MySQLBackend
        192.168.90.3:3306  Protocol: MySQLBackend
        192.168.90.2:3306  Protocol: MySQLBackend
    Routing weight parameter:           myweight
    Users data:                         0x1d8a480
    Total connections:                  12
    Currently connected:                12
    SSL:                                Disabled

Yes! It worked as expected!

max_slave_connections sets the maximum number of slaves a router session uses at any moment. The default is to use all available slaves. When we set it to 1, we get one master and one slave connection per client, and the connections are balanced according to the server weights. The new mechanism uses statements instead of connections for load balancing (see MXS-588).

Finally, this routing module also supports routing hints. I'll cover them in my next MaxScale post.

More information: https://github.com/mariadb-corporation/MaxScale/blob/develop/Documentation/Routers/ReadWriteSplit.md

MySQL 5.7 primary key lookup results: is it really faster?

March 28, 2016 - 8:51am

This blog examines MySQL 5.7's primary key lookup results, and determines whether MySQL 5.7 is really faster than its earlier versions.

MySQL 5.7 was released some time ago, and now that the dust has settled it’s a good time to review its performance improvements.

I'm not doing this just to satisfy my own curiosity! Many customers still running MySQL 5.6 (or even MySQL 5.5) often ask, "How much performance gain can we expect by switching to 5.7? Or will it actually be a performance hit, especially after Peter's report here: https://www.percona.com/blog/2013/02/18/is-mysql-5-6-slower-than-mysql-5-5/?"

To determine the answer, we'll look at some statistics. There are a variety of workloads to consider, and we will start with the simplest one: MySQL primary key lookups on data that fits into memory. This workload does not involve transactions and is fully CPU-bound.

The full results, scripts and configurations can be found on our GitHub page.
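To make the workload shape concrete, here is a minimal Python sketch of the same idea. The real tests used sysbench (the scripts are in the GitHub repository above); this is only an illustration, and it assumes a sysbench-style sbtest1 table plus the mysql-connector-python driver. Because of Python's GIL it will not drive a server anywhere near the throughput discussed below; it just shows what "primary key lookups on in-memory data" means in practice:

    # Minimal primary key lookup loop (illustration only, not the benchmark harness used here).
    # Assumes a sysbench-style table: sbtest.sbtest1(id INT PRIMARY KEY, k INT, c CHAR(120), pad CHAR(60))
    import random
    import threading
    import time
    import mysql.connector  # pip install mysql-connector-python

    TABLE_ROWS = 1000000  # data set small enough to fit in the InnoDB buffer pool
    THREADS = 32
    DURATION = 60         # seconds

    def worker(results):
        conn = mysql.connector.connect(host="127.0.0.1", user="sbtest",
                                       password="sbtest", database="sbtest")
        cur = conn.cursor()
        deadline = time.time() + DURATION
        queries = 0
        while time.time() < deadline:
            pk = random.randint(1, TABLE_ROWS)
            cur.execute("SELECT c FROM sbtest1 WHERE id = %s", (pk,))
            cur.fetchall()
            queries += 1
        results.append(queries)
        conn.close()

    results = []
    threads = [threading.Thread(target=worker, args=(results,)) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("QPS: %d" % (sum(results) / DURATION))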

For this test, my server is a 56-logical-thread system (2 sockets / 14 cores each / 2 hyper-threads each) powered by “Intel(R) Xeon(R) E5-2683 v3 @ 2.00GHz” CPUs.

These are the primary results:

Up to 20 threads, MySQL 5.5 clearly outperforms MySQL 5.7. After 20 threads, however, it hits scalability issues and starts struggling with throughput. MySQL 5.6 is a different story: it outperforms 5.7 up to 120 threads. After 120 threads, MySQL 5.7 again scales much better, and it can maintain throughput all the way to 1000 threads.

The above results are from a setup where the client and server run on the same machine. To verify them, I also ran the test on a configuration where the client and server are located on different servers, connected via a 10Gb network.

Here are the results from that setup:

In this case, we pushed more load on the server (since the client does not share resources with MySQL), and we can see that MySQL 5.7 outperformed MySQL 5.6 after 68 threads (with MySQL 5.6 showing scalability problems even sooner).

There is another way to improve MySQL 5.6 results on large numbers of threads: good old innodb-thread-concurrency. Let’s see the MySQL 5.6 results after setting innodb-thread-concurrency=64:

We can see that using innodb-thread-concurrency improves MySQL 5.6 results when getting into hundreds of threads.

While investigating ways to improve overall throughput, I found that disabling PERFORMANCE_SCHEMA at MySQL startup is a good option. The numbers got better after doing so. Below are the numbers for 5.6 and 5.7 with PERFORMANCE_SCHEMA disabled.

For MySQL 5.6:

For MySQL 5.7:

For MySQL 5.7, PERFORMANCE_SCHEMA’s overhead is quite visible.
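For reference, this is what disabling it at startup looks like: performance_schema is a read-only startup option, so it has to be set in the configuration file (or on the command line) and requires a server restart. A minimal my.cnf fragment would be:

    [mysqld]
    # PERFORMANCE_SCHEMA cannot be changed at runtime; restart mysqld after editing this
    performance_schema = OFF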

Conclusions

I can say that Oracle clearly did a good job with MySQL 5.7, but they focused on primary key lookups. They wanted to report 1.6M QPS.

I was not able to get to 1.6M; the best I could achieve was 470K QPS (with PERFORMANCE_SCHEMA disabled). Full disclosure: I used sysbench 0.5 with LUA scripts and no prepared statements during this test. Oracle used the older sysbench 0.4 (with prepared statements), and their system had 144 logical threads.

MySQL 5.7, however, continues the tradition of slower performance at low thread counts: MySQL 5.6 was slower than MySQL 5.5, and MySQL 5.7 is slower than MySQL 5.6.

PRIMARY KEY lookups aren’t the only workload type – there are many cases, some much more interesting! I will show the performance metrics for other workloads in upcoming posts.

Percona How To: Field Names and Document Size in MongoDB

March 24, 2016 - 2:28pm

In this blog post, we’ll discuss how shorter field names impact performance and document size in MongoDB.

The MongoDB Manual Developer Notes state:

Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents and where document overhead is not of significant concern. Shorter field names do not lessen the size of indexes, because indexes have a predefined structure. In general, it is not necessary to use short field names.

This is a pretty one-sided statement, and we should be careful not to fall into this trap. At first glance, you might think “Oh that makes sense due to compression!” However, compression is only one part of the story. When we consider the size of a single document, we need to consider several things:

  • Size of the data in the application memory
  • Size over the network
  • Size in the replication log
  • Size in memory in the cache
  • Amount of data being sent to the compressor
  • Size on disk*
  • Size in the journal files*

As you can see, this is a pretty expansive list, and this is just for consideration on field naming – we haven’t even gotten to using the right data types for the value yet.

Further, only the last two items in the list (“*” starred) represent any part of the system that has compression (to date). Put another way, the conversation about compression only covers about 25% of the discussion about field names. MongoDB Inc’s comment is trying to sidestep nearly 75% of the rest of the conversation.

To ensure an even debate, I want to break size down into two major areas: Field Optimization and Value Optimization. They both touch on all of the areas listed above except for sorting, which is only about value optimization.

Field Optimization

When we talk about field optimization, it is purely considering using smaller field names. This might seem obvious, but when your database field names become object properties in your application code, the developers want these to be expressive (i.e., longer and space-intensive).

Consider the following:

locations=[];
for (i=1;i<=1000;i++){
   locations.push({ longitude : 28.2211, latitude : 128.2828 })
}
devices=[];
for (i=1;i<=10;i++){
   devices.push( {
       name: "iphone6",
       last_ping: ISODate(),
       version: 8.1,
       security_pass: true,
       last_10_locations: locations.slice(10,20)
   })
}
x={
   _id : ObjectId(),
   first_name: "David",
   last_name: "Murphy",
   birthdate: "Aug 16 2080",
   address : "123 nowhere drive Nonya, TX, USA , 78701",
   phone_number1: "512-555-5555",
   phone_number2: "512-555-5556",
   known_locations: locations,
   last_checkin : ISODate(),
   devices : devices
}
> Object.bsonsize(x)
54879

Seems pretty standard, but wow! That’s 54.8k per document! Now let’s consider another format:

locations2 = [];
for (i = 1; i <= 1000; i++) {
    locations2.push({ lon: 28.2211, lat: 128.2828 })
}
devices2 = [];
for (i = 1; i <= 10; i++) {
    devices2.push({
        n: "iphone6",
        lp: ISODate(),
        v: 8.1,
        sp: true,
        l10: locations.slice(10, 20)
    })
}
y = {
    _id: ObjectId(),
    fn: "David",
    ln: "Murphy",
    bd: "Aug 16 2080",
    a: "123 nowhere drive Nonya, TX, USA , 78701",
    pn1: "512-555-5555",
    pn2: "512-555-5556",
    kl: locations2,
    lc: ISODate(),
    d: devices2
}
> Object.bsonsize(y)
41392
> Object.bsonsize(y) / Object.bsonsize(x)
0.754241148708978

This minor change saves about 25% of the space, without changing any actual data. I know you can already see things like kl or l10 and are wondering, “What the heck is that?!” This is where some clever tricks in the application code come in.

You can keep a mapping collection in MongoDB, or keep the mapping in your application code – so that in the code self.l10 is exposed as self.last_10_locations. Some people go so far as to use constants – for example, “self.LAST_10_LOCATIONS” backed by “self.l10 = self.get_value(LAST_10_LOCATIONS)” – to keep the stored field names short while staying expressive in the code.

Value Optimization

Using the same example, let’s now assume we want to improve the values rather than the field names. We know we will always pull a user by their _id, or look up the most recent people to check in. To help optimize this further, let us assume “x” is still our main document:

locations = [];
for (i = 1; i <= 1000; i++) {
    locations.push({ longitude: 28.2211, latitude: 128.2828 })
}
devices = [];
for (i = 1; i <= 10; i++) {
    devices.push({
        name: "iphone6",
        last_ping: ISODate(),
        version: 8.1,
        security_pass: true,
        last_10_locations: locations.slice(10, 20)
    })
}
x = {
    _id: ObjectId(),
    first_name: "David",
    last_name: "Murphy",
    birthdate: "Aug 16 2080",
    address: "123 nowhere drive Nonya, TX, USA , 78701",
    phone_number1: "512-555-5555",
    phone_number2: "512-555-5556",
    known_locations: locations,
    last_checkin: ISODate(),
    devices: devices
}
> Object.bsonsize(x)
54879

But now, instead of optimizing field names, we want to optimize the values:

locations = [];
for (i = 1; i <= 1000; i++) {
    locations.push({ longitude: 28.2211, latitude: 128.2828 })
}
devices = [];
for (i = 1; i <= 10; i++) {
    devices.push({
        name: "iphone6",
        last_ping: ISODate(),
        version: 8.1,
        security_pass: true,
        last_10_locations: locations.slice(10, 20)
    })
}
z = {
    _id: ObjectId(),
    first_name: "David",
    last_name: "Murphy",
    birthdate: ISODate("2080-08-16T00:00:00Z"),
    address: "123 nowhere drive Nonya, TX, USA , 78701",
    phone_number1: 5125555555,
    phone_number2: 5125555556,
    known_locations: locations,
    last_checkin: ISODate(),
    devices: devices
}
> Object.bsonsize(z)
54853

In this example, we changed the phone numbers to integers and used the Date type for the dates (as was already done in the devices documents). The savings were much smaller than before, coming in at only 26 bytes, but this can have a significant impact when multiplied out over many fields and documents. If we had started this example with the floats quoted as strings, as many people do, we would see more of a difference. Always watch out for numbers and dates stored as strings: they almost always waste space.

When you combine both sets of savings you get:

(54879 - 41392) + (54879 - 54853) = 13487 + 26 = 13513 bytes saved

That’s right: a combined document is roughly 24.6% smaller in memory, on the network, and for the application to parse with its CPU! Easy wins to reduce your resource needs, and to make the COO happier.

Want to be a superhero? Join the Database Performance Team!

March 24, 2016 - 7:39am

Admit it, you’ve always wanted to fight danger, win battles and save the day! Who doesn’t? Do you want to be a superhero? Percona can show you how!

We don’t have any radioactive spiders or billionaire gadgets, but we do have our own team of superheroes dedicated to protecting your database performance: The Database Performance Team!

The Database Performance Team is comprised of our services experts, who work tirelessly every day to guarantee the performance of your database. Percona’s database services are some of our most valuable customer resources – besides the software itself. Whether it’s support, consulting, technical account managers, or remote DBAs, our support team is made up of superheroes that make sure your database is running at peak performance.

We want you to join us in the fight against poor performance. Join our Database Performance Team crew as part of the Database Street Team!

We’ll be introducing the members of our super group in the coming weeks. As we introduce the Database Performance Team (the “characters” below), we want you! We’ll be offering “missions” for you to complete: challenges, puzzles, or actions that get you prizes for success!

Your first mission: guess the identities of our secret team before we reveal them!

Mystery Character 1 – Hint: Funny, friendly, quick-witted, supporting, fast and courteous – but still able to get the job done with amazing competence. Who am I?

Mystery Character 2 – Hint: Computer-like smarts, instant recall, a counselor, able to understand a problem and the solution quickly. Who am I?

Mystery Character 3 – Hint: Technical, with clairvoyant foresight, with the knowledge and statistics to account for all issues, manages problems before they happen. Who am I?

Mystery Character 4 – Hint: Remotely all-seeing, a director, good at multi-tasking, adapts-on-the-fly, cool in a crisis. Who am I?

Mystery Character 5 – Hint: Insanely strong, can’t be stopped, hard to knock down, the product of rigorous testing, unlimited endurance. Who am I?

Follow @Percona on Twitter and use the hashtag #DatabasePerformanceTeam to cast your guess on who any mystery character is. Correctly guess any of their names or roles, and the lucky winner gets their choice of our mystery T-shirt in either men’s or women’s style.

Stay tuned, as we reveal the identities of the Database Performance Team over the coming weeks!

Join the ranks of the Database Street Team! Fun games, cool prizes – more info is coming soon!

Some facts:*

Gartner has estimated the average cost of downtime at $5,000 per minute!

Join The Database Performance Team today!

 

 

*Source: http://data-informed.com/key-challenges-facing-the-modern-database-administrator/

Percona Live featured talk with Stewart Smith: Why Would I Run MySQL or MariaDB on POWER Anyway?

March 23, 2016 - 12:07pm

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus! This blog is Percona Live featured talk with Stewart Smith.

In this installment, we’ll meet Stewart Smith, OPAL Architect at IBM. His talk is Why Would I Run MySQL or MariaDB on POWER Anyway? and will present the technical reasons why you might want to consider the POWER architecture for MySQL/MariaDB production workloads.

I got a chance to discuss Stewart’s talk with him:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Stewart: While at the AUUG (Australian Unix Users Group) Conference in Melbourne way back in 2004, I mentioned to Arjen Lentz (MySQL employee #25) that I was starting to look for my next adventure. Shortly after, at lunch time, instead of going to eat lunch, Arjen handed me his cell phone; on the other end of the line was Brian Aker, then Director of Architecture at MySQL AB. That was my first interview.

So in 2004, I started working on MySQL Cluster (NDB). Four years later, having encountered pretty much all of the warts in the MySQL Server while working on MySQL Cluster, I became the second person to commit code to Drizzle – a fork of MySQL where we tried to rethink everything and reshape the code base to be a) modern, efficient and scalable on modern multi-core machines, and b) designed for large scale web applications. Luckily – instead of firing us all for forking the product in secret – Sun moved us to the CTO group and we got to work on Drizzle full time.

From 2011 to 2014 I was Director of Server Development at Percona where I continued my focus on iterative change and automated QA.

All of these were amazing learning experiences and I’m really proud of what we achieved during these times, the impact of which is still being felt in the database world.

Ultimately, though, it was time for a change – and in my role at IBM as OPAL Architect, I work on the OpenPower Abstraction Layer (OPAL), the completely Open Source firmware for OpenPOWER systems. Of course, at some point, somebody discovered I knew something about MySQL, and I managed to (as a side project) port MySQL to POWER and get a world record of 1 million queries per second. This led to MariaDB being officially supported on POWER8 processors and IBM investing more time in the reliability and performance of MySQL on POWER.

So while my day job is no longer database internals, I stick my head in occasionally as there’s still some part of me that enjoys it.

Percona: Your talk is going to be on “Why would I run MySQL or MariaDB on POWER anyway?” So does that mean you’re firmly on the side of increasing HW power before optimization to achieve performance? If so, why? And for what workload types?

Stewart: We’ve always been scaling both up and out with MySQL. It used to be that scaling up was going from a single processor with a single core to two processors. Now, a cell phone with fewer than four cores is low end.

Of course, POWER CPUs have more benefits than just more cores and threads. There’s a long history of building POWER CPUs to be reliable, with an ethos of *never* acting on bad data, so there’s lots of error checking throughout the CPU itself in addition to ECC memory.

So while POWER can bring raw computing performance to the table, it can also bring reliability, and the density we can achieve with virtual machines can be really interesting.

Percona: Virtualization, SDN, cloud deployments – all of these use distributed resources. How does this affect the use of MySQL/MariaDB on POWER? And how can these types of setups affect application performance – positively and negatively?

Stewart: We’re lucky on POWER in that our hardware has been designed to last a great many years with the idea of partitioning it out to multiple operating systems.

The big advantage for cloud deployments is isolation between tenants, and if we can do this with minimal or zero performance impact, that’s a win for everyone. A challenge to cloud environments is always IO. Databases love lots of low latency IOPs and too often, adding virtualization adds latency, reducing density (tenants per physical machine).

Percona: What do you see as an issue that we, the open source database community, need to be on top of with regard to white box development? What keeps you up at night with regard to the future of white box deployments?

Stewart: I think there’s a few big challenges to address for today’s hardware, the main one being scaling to the number of CPU cores/threads we have now as well as to the number of IOPs we have now. These are, however, not new problems – they’re ones that MySQL and InnoDB have been struggling with for over a decade.

Other open source databases (e.g., MongoDB) have re-learned the lesson the hard way: big global locks don’t scale. With storage backends coming to Mongo such as TokuDB and WiredTiger, it has the opportunity to become an interesting player.

Non-volatile memory has the opportunity to change things more than cheap, high-performance SSDs have. When the unit of persistence is a byte, and not a 512/4096-byte block or multi-kilobyte/megabyte erase block, things get *different*. Engines such as MyRocks may fare a lot better than more traditional engines like InnoDB – but more than likely there are new designs yet to exist.

I think the biggest challenge is going to be creating a vibrant and innovative development community around an SQL front-end – an open source project (rather than an open source product) where new ideas and experimentation can flourish.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Stewart: I’m really looking forward to some of the internals talks. I’m a deeply technical person and I want the deep dive into the interesting details of how things work. Of special interest are the MyRocks and TokuDB sessions, as well as how InnoDB is evolving.

I’ll likely poke my head into several sessions around managing MySQL deployments in order to keep up to date on how people are deploying and using relational databases today.

These days, I’m the old gray-beard of the MySQL world inside IBM, where I try and keep to mostly being advisory while focusing on my primary role which is open source firmware for POWER.

Also, over the years working on databases, I’ve made a great many friends who I don’t get to see often enough. I’m really looking forward to catching up with friends and former colleagues – and this is one of the things I treasure about this community.

You can read more about POWER and Stewart’s thoughts at his personal blog.

To see Stewart’s talk, register for Percona Live Data Performance Conference 2016. Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

TokuDB impacts InnoDB Performance?

March 22, 2016 - 4:10pm

This blog discusses how TokuDB impacts InnoDB performance when the two run in the same environment.

You would think MySQL storage engines are fairly independent of each other, even in the same environment. Enabling one, or changing its configuration, logically should have no impact on the performance of other engines (such as InnoDB) when they are accessing tables. The reality, however, is more complicated than that!    

Now that we’ve shipped TokuDB, we’ve been getting feedback from our community and customers that enabling TokuDB might negatively affect performance – even for queries that don’t touch TokuDB tables (and in some cases, even when TokuDB is kept completely idle).

After investigating these reports, we found that the following issues could be contributing to this performance loss:

  • Memory allocation. By default, TokuDB allocates 50% of the system memory to its cache. If you already allocate a significant amount of memory to the InnoDB buffer pool (innodb_buffer_pool_size), you’re likely overcommitting memory. Check tokudb_cache_size and set it to an amount that is appropriate for your system.
  • Caching. Many users run InnoDB with innodb_flush_method=O_DIRECT. This ensures that the OS file cache is not polluted with heavy database IO, and remains available for binary log files, MyISAM temporary tables and other OS needs. TokuDB uses buffered IO by default, which can pollute your OS cache under heavy use. You can change this behavior by enabling tokudb_directio. Note, though, that TokuDB performs better without tokudb_directio for some workloads.
  • Two-phase transaction commit. This is a tricky one. To coordinate between multiple transactionally capable storage engines and the binary log, MySQL uses a two-phase transaction commit and a transaction coordinator. If you only run the InnoDB storage engine and have the binary log disabled (typical in a test environment), the transaction coordinator isn’t used. If, however, you enable both InnoDB and TokuDB – with the binary log disabled – the TC_MMAP transaction coordinator gets used. The TC_MMAP transaction coordinator has very poor performance and scalability. Since MySQL can’t predict which storage engines are going to be part of a transaction, this overhead applies to all transactions – even those that only touch InnoDB tables. To “fix” this problem, enable the binary log: it then acts as the transaction coordinator, and it performs and scales much better. This related bug describes another effect of the transaction coordinator issue. (A quick checklist of the relevant variables follows this list.)
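If you want to see how a server is currently configured before changing anything, a quick check along these lines can help (a sketch only – the right values depend on your workload and how you split memory between the engines):

-- How much memory does each engine think it owns?
SHOW GLOBAL VARIABLES LIKE 'tokudb_cache_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- How is IO caching handled?
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';
SHOW GLOBAL VARIABLES LIKE 'tokudb_directio';
-- Is the binary log (the better-performing transaction coordinator) enabled?
SHOW GLOBAL VARIABLES LIKE 'log_bin';
-- tokudb_cache_size, innodb_flush_method and log_bin are not dynamic:
-- change them in my.cnf and restart mysqld.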

Storage engine configuration causes most of the performance issues related to running TokuDB and InnoDB together, and the proper MySQL configuration settings resolve them.

Webinar Thursday March 24: What MongoDB Storage Engine is Right for You?

March 22, 2016 - 7:50am

Join Jon Tobin, Percona’s Director of Solutions Engineering, as he presents “MongoDB Revs You Up: What Storage Engine is Right for You?” As you may or may not know, there is more than one MongoDB storage engine to choose from, now that MongoDB supports multiple storage engines – a feature the community has been requesting for years! Now that it’s here, you might be asking yourself, “Which engine should I use?”

As an extension of the recent “MongoDB Revs You Up” blog series, this webinar will help you answer that question. Jon will review the storage engines that have recently become available in the MongoDB ecosystem and how they can accelerate your workload.

This webinar will cover:

  • What differentiates the storage engines
  • How you can use them
  • The effects the different storage engines have on various workloads
  • How you can evaluate the engines for yourself

Jon Tobin is Percona’s Director of Solutions Engineering. Jon has over 15 years of experience in the IT and business sectors. His passion is helping his customers leverage technology to build the business of the future. Most recently, Jon was the Lead Sales Engineer at Tokutek, the company bringing speed and efficiency to “Big Data” (Tokutek was acquired by Percona in 2015). Before that, he was a core sales engineer for EMC Corporation, where for four years he sold enterprise SAN technologies in the NY/NJ and New England areas.

Percona Live 2016 Community Game Night

March 21, 2016 - 2:02pm

Hello, all! Just a quick blog to remind you all about the Percona Live 2016 Community Game Night!

This awesome event is taking place on Wednesday, April 20 from 7:00 pm to 10:00 pm, after the Community Networking Reception. It’s modeled after last year’s extremely popular game night, featuring many of the same games back by popular demand:

  • The Meltdown
  • Segway Races
  • Pac-Man Battle Royale!

This year we plan on tons more games compared to last year, with fun additions like a bowling alley and Oculus Rift experiences. We’ll have drinks, food and most importantly: some crazy good fun.

“The Meltdown!”

Community Game Night was one of the most talked about events of last year’s conference, and this year it is going to be even bigger and better. Make sure you don’t miss it!

Percona Live’s Game Night is our way of thanking the community for everything you do. Everyone and anyone who has at least a keynote and exhibitor pass is welcome to attend.

We’re looking forward to seeing you there, so mark your calendar and join us!

If you haven’t registered for Percona Live, you’re in danger of missing out! Register now!

Moving with SQL Server to Linux? Move from SQL Server to MySQL as well!

March 21, 2016 - 1:01pm

In this blog, we’ll discuss Microsoft’s recent announcement that SQL Server will be available on Linux. If you’re going to Linux, why not move from SQL Server to MySQL?

The open source operating system Linux powers nearly one-quarter of all the servers running in Microsoft’s Azure cloud service. Until just last week, this meant nearly one-quarter of Azure cloud servers couldn’t actually run much Microsoft software.

So Microsoft is porting SQL Server to Linux. Currently, there is a private preview available, with a planned full release sometime in mid-2017. Microsoft clarified it would offer at least SQL Server’s core capabilities, and will include other components after it receives customer requests and feedback. Initially, Microsoft’s Linux version will support relational databases, leaving the business intelligence side of the software suite for a later date.

Microsoft isn’t open sourcing SQL Server’s code, but making it Linux-compatible is a big change for the company. This is quite a move for Microsoft, signaling some acknowledgment that open source platforms have the potential to drive revenue. Which is funny, seeing as they once compared Linux to a cancer.

The move also demonstrates the legitimacy of open source development: if there are enough businesses out there willing to use Linux rather than Windows as a platform, obviously any debate over stability, functionality or security is over. But over the years, open source has won over not just hackers and the computer-savvy, but corporations and governments as well.

And why would so many companies be on Linux? I’m going to guess it’s to reduce licensing fees for platform software (such as Windows) and avoid being driven by a single vendor. This makes sense for today’s business world: open source isn’t the wild west anymore, and is a viable alternative to proprietary licensed software. Windows has lost dominance due to missteps in the mobile market and an industry-wide move to web and cloud computing.

So now that you’re saving costs on your platform, and you’ve moved to Linux, why wouldn’t you go one step further and save money on your DBMS licensing by moving from SQL Server to MySQL? Why only go halfway – especially since the full version’s release date is unknown?

Windows users should be plenty familiar with MySQL: it is used often enough by people already trying to cut down on licensing costs, and it is increasing in popularity relative to SQL Server – as evidenced by this graph:

(Graph taken from http://db-engines.com/. You can see an interactive version here.)

And the benefits of migrating are pretty clear:

  • Eliminate the license and support fees for MS SQL Server (or other SQL software). Even the various MySQL enterprise versions are less expensive (and frankly, there are several completely open source versions available at NO cost).
  • MySQL already supports a wide range of operating systems (OS), including several Linux distros – so, better late than never, Microsoft, but still . . .
  • MySQL isn’t tied to a single vendor’s software, so it’s easier to scale or change your environment to adapt to different workloads.
  • Almost every aspect of MySQL, including advanced features, has been tested intensively by a massive open source community over the years – there isn’t an issue or configuration that hasn’t been thoroughly dealt with.
  • Several pluggable storage engines are available, as opposed to one for SQL Server.
  • MySQL’s feature set is equal to, if not better than, SQL Server’s for today’s workloads. You can see a fairly good, if not totally accurate, comparison here. With the recent release of MySQL 5.7, even more performance and monitoring features are available, including multi-source replication (see the sketch after this list), InnoDB enhancements, security updates, and optimizer improvements.
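As one concrete example of the 5.7 features mentioned above, multi-source replication lets a single replica pull from several masters over named channels. A minimal sketch – host names, credentials and channel names are placeholders, and it assumes GTID-based replication with table-based replication repositories:

-- Run on the replica; requires master_info_repository=TABLE and relay_log_info_repository=TABLE
CHANGE MASTER TO
  MASTER_HOST = 'master1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'master-1';

CHANGE MASTER TO
  MASTER_HOST = 'master2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'master-2';

START SLAVE FOR CHANNEL 'master-1';
START SLAVE FOR CHANNEL 'master-2';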

What about MySQL’s performance and scalability compared to SQL Server? Seeing as numerous sites use MySQL for large scale applications and huge workloads (Google, Facebook, Yahoo!, etc.), it shouldn’t be much of a question anymore. And it’s not just a web-centric platform, as evidenced by numerous enterprise and government agency adoptions in many industries: F5, Telenor, NASA, Scholastic, etc. MySQL’s performance and scalability are clearly in line with SQL Server’s.

There are multiple online tools to help you migrate from SQL Server to MySQL, any number of which are excellent: MySQL Workbench and AWS Database Migration tool are two examples with a good deal of online documentation and support.

Introduction to MySQL troubleshooting basic techniques webinar: Q & A

March 18, 2016 - 5:43pm

In this blog, I will answer questions posed in my recent introduction to MySQL troubleshooting basic techniques webinar.

Thank you for attending my March 10 MySQL troubleshooting webinar “Introduction to MySQL troubleshooting: basic techniques,” which is the second in the “MySQL Troubleshooting Webinar” series. The recording and slides for the webinar are available here. Here is the list of your questions that I wasn’t able to answer during the webinar, with responses.

Q: Any known issues with auditing?  We see a real slowdown with insert/update performance.  Best way to track it down?

A: Sorry, I did not completely understand your question. Do you see a real slowdown when turning auditing on? In this case: which type of auditing? Percona Audit plugin or something else? Or do you simply have slow inserts and updates and want to track them down?

Regarding the effect of auditing: nothing comes without a price. Any auditing means you will, at the very least, store queries and write them to disk. This takes CPU and disk resources. Therefore, on some very busy servers, keeping the general query log or audit log always ON isn’t an option. But you can turn them ON for limited periods of time when an issue occurs. If you create custom auditing, you can set up better filtering rules and log only those queries you’re interested in. In any case, if the binary log is ON, all inserts and updates will be there. As I mentioned during the webinar, it can miss some information about the environment in which an insert or update was executed, but it will still give you an idea of what happened. Then, if needed, you can turn ON more verbose auditing for a short period.
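If you only need a short window of full query visibility, one approach (a sketch – remember that the general query log is expensive on a busy server) is to turn the general query log on just while the problem is reproducing:

-- Send the general log to a table so it is easy to query, then enable it briefly
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- ... reproduce the slowdown, then switch it back off ...
SET GLOBAL general_log = 'OFF';
-- Inspect what ran during the window
SELECT event_time, user_host, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 20;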

If you’re asking whether it is possible to track down a reason for insert and update performance degrading: yes, it is possible. I will discuss this topic in the next webinar: Introduction to Troubleshooting Performance: What Affects Query Execution? I also plan more detailed webinars about optimizing slow queries, locking and storage engine issues – both of which should be helpful in this case as well. (The dates for these webinars are still to be decided.) In general, you first need to check whether you can repeat the slowdown in a single-thread environment. If yes, you need to tune the inserts and updates. When you do so, note that UPDATE statements are optimized in a similar way to SELECT statements. This means you can run EXPLAIN on them and, probably, add indexes so that the WHERE condition can be optimized more effectively. For single-thread INSERT slowdowns, you need to work with table and storage engine options, such as removing unused indexes or making IO- and CPU-related options optimal. If, instead, the slowdown is repeatable only in a multiple-connection environment, you have hit a locking issue. You need to study how table, metadata and storage engine locks work, find where they are set in your application, and implement ways to work around the issue (in the simplest case, ensure that no two parallel transactions update the same row at the same time in the case of InnoDB).
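For the single-thread case described above, the first step is usually to run EXPLAIN on the slow statement itself; since MySQL 5.6, EXPLAIN works for UPDATE as well as SELECT. A minimal sketch, using a hypothetical orders table and columns:

-- Hypothetical table and columns, for illustration only
EXPLAIN UPDATE orders
SET status = 'shipped'
WHERE customer_id = 42 AND created_at < '2016-01-01';

-- If the plan shows a full table scan, an index on the WHERE columns may help:
ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at);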

Q: Will we see relative performance differences on 5.6 vs. 5.7 queries?

A: Yes, this is possible. When Oracle announced GA for 5.7, they claimed that some queries are three times faster than in 5.6 (proof). You can also find benchmarks that prove performance in 5.7 is much better than in 5.6 at Dimitri Kravtchuk’s blog: 1, 2, 3, 4, 5.

At the same time, I have already worked on a few support tickets where the user claimed that some of their queries work slower in 5.7 than in 5.6. There are also public bug reports about a similar issue: 1, 2. This is normal during any upgrade. The reason for these slowdowns is that while the MySQL Optimizer developers are consistently working on making it better, they also change the heuristics that choose an optimal plan. Sometimes queries optimized for earlier versions need to be rewritten after an upgrade – especially ones that use index hints.

I’ve seen a few issues (between 5.5->5.6 and 5.1->5.5 upgrades) where customers claimed an overall application performance slowdown when keeping index hints that were created to improve performance in previous versions. After the index hints were removed, the application started working much faster than it did with the earlier MySQL version.
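If you carry index hints across an upgrade, it is worth re-testing each hinted query with and without the hint. A sketch, again with hypothetical table and index names:

-- Query hinted to work around an old optimizer decision
EXPLAIN SELECT order_id, total
FROM orders FORCE INDEX (idx_customer_created)
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- After the upgrade, compare the plan and timing against the unhinted version
EXPLAIN SELECT order_id, total
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;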

Q: The system variables are “policies” controlling InnoDB optimization that change by version.

A: I did not understand this question, sorry. Did you just want to say that “the system variables are ‘policies’ controlling InnoDB optimization”? Well, it isn’t just InnoDB-related variables that can affect the performance and behavior of the server. Regarding version upgrades, I would say this isn’t true. From a support point of view, the MySQL Server team does not change the defaults for system variables often, as these defaults are relied on across the market. The changes that version 5.7 introduced to the defaults are just “written in stone” minimalistic recommendations that have been tested by MySQL Support engineers and customers for years.

Server development makes these changes very conservatively because the MySQL and Percona user bases are very large: from laptop users (including students), database application developers, support engineers and consultants (who travel a lot), to big shops that can afford the most powerful hardware on the market. Of course, the settings will be different for each of these groups. The defaults are simply appropriate for a modern, but not the most powerful, laptop with 4-8GB of RAM and a spinning disk.


