
Percona Server for MySQL 5.7.17-11 is now available

Latest MySQL Performance Blog posts - February 3, 2017 - 12:32pm

Percona announces the GA release of Percona Server for MySQL 5.7.17-11 on February 3, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.17, including all the bug fixes in it, Percona Server for MySQL 5.7.17-11 is the current GA release in the Percona Server for MySQL 5.7 series. All of Percona's software is completely open source and free. Find release details in the 5.7.17-11 milestone at Launchpad.

New Features:
  • Percona Server for MySQL has implemented support for per-column VARCHAR/BLOB compression for the XtraDB storage engine. This also features compression dictionary support, which improves the compression ratio for relatively short individual rows, such as JSON data. (A usage sketch follows this list.)
  • The Kill Idle Transactions feature has been re-implemented by setting a connection socket read timeout value instead of periodically scanning the internal InnoDB transaction list. This makes the feature applicable to any transactional storage engine, such as TokuDB and, in the future, MyRocks. The re-implementation also addresses several existing bugs, including server crashes: #1166744, #1179136, #907719, and #1369373.
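
As a rough illustration of these two features, here is a minimal sketch based on the release notes (the table, column and dictionary names are hypothetical; check the Percona Server documentation for the exact syntax):

CREATE COMPRESSION_DICTIONARY names ('james john jennifer');
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY,
  -- per-column compression without a dictionary
  payload BLOB COLUMN_FORMAT COMPRESSED,
  -- dictionary-assisted compression for short rows, e.g. JSON documents
  doc TEXT COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY names
) ENGINE=InnoDB;

-- Kill Idle Transactions: kill transactions idle longer than 300 seconds
SET GLOBAL kill_idle_transaction = 300;
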
Bugs Fixed:
  • Logical row counts for TokuDB tables could get inaccurate over time. Bug fixed #1651844 (#732).
  • Repeated execution of SET STATEMENT ... FOR SELECT FROM view could lead to a server crash. Bug fixed #1392375.
  • CREATE TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1539504 (upstream #83003).
  • Using the per-query variable statement SET STATEMENT ... FOR (sketched after this list) with subquery temporary tables could cause a memory leak. Bug fixed #1635927.
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • A server could crash if a bitmap write I/O error happened in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS was executing concurrently. Bug fixed #1651656.
  • TokuDB was using the wrong function to calculate free space in data files. Bug fixed #1656022 (#1033).
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • InnoDB index dives did not detect some of the concurrent tree changes, which could return bogus estimates. Bug fixed #1625151 (upstream #84366).
  • INFORMATION_SCHEMA.INNODB_CHANGED_PAGES queries would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • Percona Server cmake compiler would always attempt to build RocksDB even if -DWITHOUT_ROCKSDB=1 argument was specified. Bug fixed #1638455.
  • A lack of free pages in the buffer pool was not diagnosed with innodb_empty_free_list_algorithm set to backoff (which is the default). Bug fixed #1657026.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for the /var/log directory. Bug fixed #1660265. Thanks to Dawid Golunski (https://legalhackers.com).
  • Renaming a TokuDB table to a non-existent database with tokudb_dir_per_db enabled would lead to a server crash. Bug fixed #1030.
  • Read Free Replication optimization could not be used for TokuDB partition tables. Bug fixed #1012.
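
For reference, the per-query variable feature mentioned above sets a variable for the duration of a single statement only. A minimal sketch (the variable value and view name are hypothetical):

SET STATEMENT max_heap_table_size = 1000000 FOR
  SELECT COUNT(*) FROM some_view;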

Other bugs fixed: #1486747, #1617715, #1633988, #1638198 (upstream #82823), #1642230, #1646384, #1640810, #1647530, #1651121, #1658843, #1156772, #1644583, #1648389, #1648737, #1650256, and #1647723.

The release notes for Percona Server for MySQL 5.7.17-11 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Percona Live Featured Tutorial with Derek Downey, David Turner and René Cannaò — ProxySQL Tutorial

Latest MySQL Performance Blog posts - February 3, 2017 - 9:22am

Welcome to another post in our series of Percona Live featured tutorial speaker blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Derek Downey (OSDB Practice Advocate, Pythian), David Turner (Storage SRE, Uber) and René Cannaò (MySQL SRE, Dropbox / ProxySQL). Their session is ProxySQL Tutorial. There is a stigma attached to database proxies when it comes to MySQL. This tutorial hopes to blow away that stigma by showing you what can be done with a proxy designed from the ground up to perform. I had a chance to speak with Derek, David and René and learn a bit more about ProxySQL:

Percona: How did you get into database technology? What do you love about it?

Derek Downey

Derek: I took a relational database course in college based on Oracle. Set theory and the relational model made a lot of sense to me. After a few years as a web developer at a small company, I transitioned to a hybrid SysAdmin/DBA role and got my first taste of the potential of “the cloud” (and some of the drawbacks).

I really came to understand that data is the lifeblood of any organization, and making sure it is always available through any disaster – from human error to hurricanes – is a unique and exciting challenge.

You should never notice the DBA if they’re doing their job right. There’s not much praise for a DBA on a job well done. But it’s a vital position to keep a company running. And that suits me just fine.

David: I started working for the Advanced Projects Group at the University of Missouri, now known as MOREnet. They were largely responsible for connecting all of the libraries and schools in the state to the Internet. I was initially helping them with their Internet presence as a webmaster. Later they needed help with their databases. I got very excited about working with Oracle at the time, and decided to join that team.

My relationship with MySQL started primarily because the cost of sharding Oracle was so high. Additionally, MySQL’s replication allowed us to use slaves. Oracle’s Dataguard/standby options wouldn’t allow reads from the slaves at that time. Lastly, MySQL was sort of “wild west” fun, since it lacked so many of the features that Oracle had had for a long time. You had to get creative. It has been humbling to see how much innovation has come from the community and how far MySQL has come. And this is only the beginning!

René Cannaò

René: My career followed the classic path of a system administrator that ends up becoming a DBA. I used to work for a few companies as a webmaster, and finally as a SysAdmin for a web hosting company. I always saw a similar pattern: “the bottleneck is in the database.” Nobody ever knew why the database was the bottleneck. I volunteered to improve the performance of this “unknown system.” Learning was a fun experience, and the result was extremely rewarding. I love understanding how databases operate and their internals. This is the only way to be able to get the maximum performance: “scientia potentia est”!

Percona: Your tutorial is called “ProxySQL Tutorial.” What exactly is ProxySQL, and what does it do?

Derek: I’ll leave it to René, the creator of ProxySQL, to give more detail on exactly what it is. But for a DBA trying to ensure their data is always available, it is a new and exciting tool in our toolbox.

René: ProxySQL is the MySQL data gateway. It’s the Stargate that can inspect, control, transform, manage, and route all traffic between clients and database servers. It builds reliable and fault-tolerant networks. It is a software bridge that empowers MySQL DBAs, built by DBAs for DBAs, allowing them to control all MySQL traffic where previously such traffic could not be controlled either on the client side (normally the developers’ realm) or server side (where there are not enough tools).

David Turner

David: Architecturally, ProxySQL is a separate process between the client and the database. Because traffic passes through it, ProxySQL can become many things (three of which got my attention). It can be a multiplexer, a filter, and a replicator.

Multiplexers reduce many signals down to a few. Web servers often open many static connections to MySQL. Since MySQL can only support a limited number of connections before performance suffers, ProxySQL’s ability to transparently manage tens of thousands of connections while only opening a few to the database is a great feature.

Administrators can update ProxySQL to filter and even rewrite queries based on patterns they decide on. As someone who has worked in operations and seen how long it can take to disable misbehaving applications, this is a very compelling feature. With ProxySQL in place, I can completely block a query from the database in no time.
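
As a sketch of what blocking and rewriting look like in ProxySQL's SQL-based admin interface (the rule IDs and query patterns here are hypothetical):

-- Block a misbehaving query pattern outright
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (10, 1, '^SELECT \* FROM big_table', 'Query blocked by the DBA team', 1);
-- Rewrite an overly expensive LIMIT on the fly
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (11, 1, 'LIMIT 10000$', 'LIMIT 100', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;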

ProxySQL’s replication or mirroring capability means that all of the queries sent to one database can now be sent to N databases. As someone who has to roll out new versions of MySQL, test index changes, and benchmark hardware, this is also a compelling feature.
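
Mirroring is driven by the same rules table. A minimal sketch, assuming hostgroup 2 holds the test or benchmark servers (an assumption for illustration):

-- Mirror all SELECT traffic to hostgroup 2 in addition to its normal destination
INSERT INTO mysql_query_rules (rule_id, active, match_digest, mirror_hostgroup, apply)
VALUES (20, 1, '^SELECT', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;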

Percona: What are the advantages of using ProxySQL in a database environment?

René: ProxySQL is the bridge between the clients and the servers. It creates two layers, and controls all the communication between the two. Sitting in the middle, ProxySQL provides a lot of advantages normally impossible to achieve in a standard database environment, such as throttling or blocking queries, rewriting queries, implementing sharding, read/write splitting, caching, duplicating traffic, handling backend failures, failovers, integration with HA solutions, generating real-time statistics, etc. All this, without any application change, and completely transparent to the application.

Derek: For me, ProxySQL decouples the application from the data layer. This provides more control over the backend database environment to the DBA in regards to queries, maintenance and failovers, without impact to the application.

David: In addition to the roles noted above, ProxySQL can be part of a failover solution, routing queries to a new master when the current master fails. Other advantages include splitting queries over multiple databases to distribute the load, providing additional metrics, etc.
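
Read/write splitting, one of the advantages mentioned above, is commonly expressed as a pair of routing rules. A hedged sketch, assuming hostgroup 0 is the writer and hostgroup 1 holds the readers (both assumptions):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (30, 1, '^SELECT.*FOR UPDATE$', 0, 1),
       (31, 1, '^SELECT', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;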

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Derek: This tutorial highlights what ProxySQL is trying to achieve, and discusses how to add ProxySQL to common architectures. Attendees will get hands-on experience with the technology, and learn how to install and configure ProxySQL to achieve query rewriting, seamless failover, and query mirroring.

David: Not only hands-on experience with ProxySQL, but an understanding of how much they can leverage with it. The more I use ProxySQL, the more advantages I see to it. For example, I did not realize that by clustering ProxySQL processes I can distribute the query matching and rewrites over many hosts, as well as use them as a caching layer.

René: ProxySQL is built upon very innovative technologies. Certain architectural concepts, like hostgroups, chaining of query rules, granular routing and sharding, query retries and the very powerful Admin interface, are not always intuitive for DBAs with experience using other proxies. This tutorial helps attendees understand these concepts, and gives them hands-on experience configuring ProxySQL in various scenarios.

Percona: What are you most looking forward to at Percona Live?

David: First, the people. Next, everything everyone is working on. We’re really lucky to work in such an innovative and collaborative industry. As databases evolve, we are on the ground floor of their evolution. What an exciting place to be.

Derek: I am mostly looking forward to reconnecting with my peers in the MySQL community: both those I’ve worked with or met previously at Percona Live, and the new open source database professionals I’ll get to meet, hearing how they are providing solutions for their companies.

René: I am looking forward to attending sessions regarding new features in MySQL 8 and other new technologies. But moreover, I am excited to interact with MySQL users and get more input on how to improve ProxySQL so that it can become an indispensable tool in any MySQL environment.

Register for Percona Live Data Performance Conference 2017, and see Derek, David and René present their ProxySQL Tutorial. Use the code FeaturedTutorial and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Recommended Query/Load Analysis Stack

Latest Forum Posts - February 3, 2017 - 5:59am
Hi All,

I'm looking for recommendations on what everyone is using for query/load analysis on MySQL (or Percona Server, of course).

We are already aware of (and use) pt-query-digest, and we also use NewRelic. Both of these tools have quirks, but the main problem that we're having is that currently, separating the signal from the noise is getting to be very challenging, particularly under high load. We've fixed the worst outliers that both of these products show to us, so what remains are a large segment of nearly-the-same lesser offenders (in terms of slow queries). This can make finding the root cause of any ongoing problems quite difficult, even when using pt-query-digest over a very restricted time period (e.g., the last fifteen minutes). Generally speaking, as everything slows down pt-query-digest reports more and more noise, submerging the "signal" even more.

I'm wondering if there are more advanced toolsets out there which are better at root-cause diagnostics and analysis for high query load situations, and if there are, what they are.

What I'd be looking for is an up-to-the-minute display of database performance, and most especially query flow and load metrics that make it easier to figure out which particular query is the root cause of ongoing database performance issues.

Alternatively, if there are more advanced techniques that you are aware of that can be used to more easily separate the signal from the noise in pt-query-digest during high-load situations, I'd love to hear them.

Thanks for any suggestions or recommendations that you can make!

Remi

PMM Alerting with Grafana: Working with Templated Dashboards

Latest MySQL Performance Blog posts - February 2, 2017 - 3:54pm

In this blog post, we will look into more intricate details of PMM alerting. More specifically, we’ll look at how to set up alerting based on templated dashboards.

Percona Monitoring and Management (PMM) 1.0.7 includes Grafana 4.0, which comes with the Alerting feature. Barrett Chambers shared how to enable alerting in general. This blog post looks at the specifics of setting up alerting based on templated dashboards, which Grafana 4.0 does not support out of the box.

This means if I try to set up an alert on the number of MySQL threads running, I get the error “Template variables are not supported in alert queries.”

What is the solution?

Until Grafana provides a better option, you need to do alerting based on graphs (which don’t use templating). This is how to do it.

Click on “Create New” in the Dashboards list to create a basic dashboard for your alerts:

Click on “Add Panel” and select “Graph”:

Click on the menu icon on the panel title of the related panel, and then click on "Panel JSON".

This shows you the JSON of the panel, which will look something like this:

Now you need to go back to the other browser window, to the dashboard with the graph you want to alert on, and show the panel JSON for it. In our case, we go to "MySQL Overview" and show the JSON for the "MySQL Active Threads" panel.

Copy the JSON from the “MySQL Active Threads” panel and paste it into the new panel in the dashboard created for alerting.

Once we have done the copy/paste, click on the green Update button, and we’ll see the broken panel:

It’s broken because we’re using templating variables in the dashboard expressions, and none of them are set up in this dashboard, so the expressions won’t work. We must replace the template variables in the formulas with the actual hosts, instances, mount points, etc., that we want to alert on:

We need to change $host to the name of the host we want to alert on, and the $interval should align with the data capture interval (here we’ll set it to 5 seconds):

If correctly set up, you should see the graph showing the data.

Finally, we can edit the graph. Click on the "Alert" tab and then "Create Alert".

To create an alert, specify "Evaluate every" and the alert conditions. "Evaluate every" sets up the evaluation interval for the alert rule. Obviously, the more often the alert evaluates the condition, the more quickly you get alerted if something goes wrong.

In our case, we want to get an alert if the number of running threads is sustained at a high level. To do this, we check whether the minimum number of threads for the last minute is above 30:

Note that our query has two parameters: “A” is the number of threads connected, and “B” is the number of threads running. We’re choosing to Alert on “B”. 

The beautiful thing Grafana does is show the alert threshold clearly on the graph, and allows you to edit the alert just by moving this alert line with a mouse:

You may want to click on the floppy disk icon at the top to save the dashboard (giving it whatever identifying name you want).

At this point, you should see the alert working. A little heart sign appears by the graph title, colored green (indicating it is not active) or red (indicating it is active). Additionally, you will see the red and green vertical lines in the alert history. These show when this alert gets triggered and when the system went back to normal.

You probably want to set up notifications as well as see alerts on the graphs. 

To set up notifications, go to the Grafana Configuration menu and configure Alerting. Grafana supports Email, Slack, PagerDuty and general webhook notification options (with more on the way, I’m sure).

The same way you added the “Graph” panel to set up an alert, you can add the “Alert List” panel to see all the alerts you have set up (and their status):

Summary

As you can see, it is possible to set up alerts in PMM using the new Grafana 4.0 alerting feature, though it is not yet very convenient or easy to do. This is the first release of alerting support for Grafana and PMM. As such, I’m sure it will become much easier and more convenient over time.

slow response from PMM grafana

Latest Forum Posts - February 2, 2017 - 1:48pm
Hi,
There are 15 RDS Aurora clusters to monitor. I'm using an EC2 instance with the t2.large instance class. I'm still seeing very slow response from the Grafana window, and sometimes QAN is completely unresponsive. I see some metrics after restarting the admin client. Please let me know if there is something I can do to get better response.

Vote Percona in LinuxQuestions.org Members Choice Awards 2016

Latest MySQL Performance Blog posts - February 2, 2017 - 12:40pm

Percona is calling on you! Vote for Percona for Database of the Year in LinuxQuestions.org Members Choice Awards 2016. Help Percona get recognized as one of the best database options for data performance. Percona provides free, fully compatible, enhanced, open source drop-in replacement database software with superior performance, scalability and instrumentation.

LinuxQuestions.org, or LQ for short, is a community-driven, self-help website for Linux users. Each year, LinuxQuestions.org holds a competition to recognize the year’s best-in-breed technologies. The online Linux community determines the winners of each category!

You can vote now for your favorite database of 2016 (Percona, of course!). This is your chance to be heard!

Voting ends on February 7, 2017. You must be a registered member of LinuxQuestions.org with at least one post on their forums to vote.

DML command rejected when PXC mode set to Enforcing

Latest Forum Posts - February 2, 2017 - 9:31am
I'm working on a 3 node Percona XtraDB POC. My test app is a commercial product called Artifactory. The app successfully starts, but when testing some functions I'm running into errors when pxc mode is set to enforcing. Below appears to be the SQL in the logs being run and the error being returned. Are there any real issues with setting percona in permissive mode all the time when an app tries to use functions like this?

Code:
2017-02-02 09:25:37,451 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.u.JdbcHelper:255) - Executing SQL: 'SELECT repo_key FROM permission_target_repos WHERE perm_target_id = 436406'.
2017-02-02 09:25:37,500 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.u.JdbcHelper:113) - Query returned in 48.42 millis : 'SELECT repo_key FROM permission_target_repos WHERE perm_target_id = 436406'
2017-02-02 09:25:37,502 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.u.JdbcHelper:255) - Executing SQL: 'UPDATE permission_targets SET perm_target_name = 'Anonymous', includes = '**', excludes = NULL WHERE perm_target_id = 436406'.
2017-02-02 09:25:37,551 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.u.JdbcHelper:166) - Query returned with 1 results in 48.25 millis : 'UPDATE permission_targets SET perm_target_name = 'Anonymous', includes = '**', excludes = NULL WHERE perm_target_id = 436406'
2017-02-02 09:25:37,552 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.u.JdbcHelper:255) - Executing SQL: 'DELETE FROM permission_target_repos WHERE perm_target_id = 436406'.
2017-02-02 09:25:37,601 [http-nio-8081-exec-9] [TRACE] (o.a.s.d.s.DbTransactionManager:939) - Triggering beforeCompletion synchronization
2017-02-02 09:25:37,602 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.s.DbTransactionManager:847) - Initiating transaction rollback
2017-02-02 09:25:37,602 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.s.DbTransactionManager:284) - Rolling back JDBC transaction on Connection [ProxyConnection[PooledConnection[com.mysql.jdbc.JDBC4Connection@46bb9a77]]]
2017-02-02 09:25:37,627 [http-nio-8081-exec-9] [TRACE] (o.a.s.d.s.DbTransactionManager:968) - Triggering afterCompletion synchronization
2017-02-02 09:25:37,627 [http-nio-8081-exec-9] [TRACE] (o.a.s.d.f.s.SqlStorageSession:81) - Release all locks of lm=851775068
2017-02-02 09:25:37,628 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.s.SessionSynchronization:95) - Session completed: 'org.artifactory.storage.db.security.service.AclServiceImpl.updateAcl' in 228.28 millis
2017-02-02 09:25:37,677 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.s.DbTransactionManager:66) - Releasing JDBC Connection [ProxyConnection[PooledConnection[com.mysql.jdbc.JDBC4Connection@46bb9a77]]] after transaction
2017-02-02 09:25:37,677 [http-nio-8081-exec-9] [DEBUG] (o.a.s.d.s.DbTransactionManager:67) - Connection was successfully released
2017-02-02 09:25:37,691 [http-nio-8081-exec-9] [DEBUG] (o.a.s.f.l.a.LockingAdvice:81) - Received exception on method invocation: ReflectiveMethodInvocation: public abstract void org.artifactory.storage.security.service.AclStoreService.updateAcl(org.artifactory.security.MutableAclInfo); target is of class [org.artifactory.storage.db.security.service.AclServiceImpl] org.artifactory.storage.StorageException: Could not update ACL org.artifactory.model.xstream.security.AclImpl@fa86944d
Caused by: java.sql.SQLException: Percona-XtraDB-Cluster prohibits use of DML command on a table (artifactory.permission_target_repos) without an explicit primary key with pxc_strict_mode = ENFORCING or MASTER
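
For context, pxc_strict_mode = ENFORCING rejects DML against tables that lack an explicit primary key, which is what the last line above reports. Two possible directions, sketched here using names taken from the log (whether (perm_target_id, repo_key) is actually unique in Artifactory's schema is an assumption that would need verifying, and a vendor schema should not be changed without their guidance):

-- Option 1: give the offending table an explicit primary key
ALTER TABLE artifactory.permission_target_repos
  ADD PRIMARY KEY (perm_target_id, repo_key);

-- Option 2: relax the check, giving up the safety guarantees cluster-wide
SET GLOBAL pxc_strict_mode = 'PERMISSIVE';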

InnoDB Issue. Percona Cluster

Latest Forum Posts - February 2, 2017 - 8:24am
For the past 2 months my cluster went down 2 times, and it looks like this error is causing the problem. Any ideas?

2017-01-24 10:52:47 7f50044bd7e0 InnoDB: Error: table `tmp`.`#sql3843_343b7a_a` does not exist in the InnoDB internal
InnoDB: data dictionary though MySQL is trying to drop it.
InnoDB: Have you copied the .frm file of the table to the
InnoDB: MySQL database directory from another database?
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.6/...eshooting.html
2017-01-24 10:52:47 7f50044bd7e0 InnoDB: Error: table `tmp`.`#sql3843_343b7f_a` does not exist in the InnoDB internal
InnoDB: data dictionary though MySQL is trying to drop it.
InnoDB: Have you copied the .frm file of the table to the
InnoDB: MySQL database directory from another database?
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.6/...eshooting.html

Percona Server for MySQL 5.5.54-38.6 is now available

Latest Forum Posts - February 1, 2017 - 12:55pm
Percona announces the release of Percona Server for MySQL 5.5.54-38.6 on February 1, 2017. Based on MySQL 5.5.54, including all the bug fixes in it, Percona Server for MySQL 5.5.54-38.6 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.54-38.6 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Percona Server for MySQL 5.5.54-38.6 is now available

Latest MySQL Performance Blog posts - February 1, 2017 - 12:46pm

Percona announces the release of Percona Server for MySQL 5.5.54-38.6 on February 1, 2017. Based on MySQL 5.5.54, including all the bug fixes in it, Percona Server for MySQL 5.5.54-38.6 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.54-38.6 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for the /var/log directory. Bug fixed #1660265.

Other bugs fixed: #1638897, #1644174, #1644547, and #1644558.

Find the release notes for Percona Server for MySQL 5.5.54-38.6 in our online documentation. Report bugs on the launchpad bug tracker.

Display hostname in alert emails

Latest Forum Posts - February 1, 2017 - 8:42am
Has anyone got email alerts to display the actual hostname (the name of the host/node that triggered the alert)?

So far I have tried labels and altering the alert name/description, but am still not able to see the hostname within an alert email.

Third node does not start MySQL

Latest Forum Posts - February 1, 2017 - 2:35am

Hello,

Three-node cluster (Percona XtraDB Cluster (GPL), Release rel37.0, Revision 853, WSREP version 25.11, wsrep_25.11.r4027), running HPELinux (Debian). Two nodes run OK, but the third is unable to start the database.

This happened after the file system filled up, apparently without any modification to the previously running configuration (the previous system manager resigned and I'm trying to understand the issues).

Here below is the error log.

Any hints on how to restart the third node successfully?

Many thanks in advance for any help!

Max

-
State from failing machine:

root@helion-cp1-c1-m1-mgmt:/etc/mysql# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:

- Logfile:



170201 08:09:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170201 08:23:36 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170201 08:23:36 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.Qbt51O' --pid-file='/var/lib/mysql/helion-cp1-c1-m1-mgmt-recover.pid'
170201 8:23:36 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
170201 8:23:36 [Warning] The syntax '--log-slow-queries' is deprecated and will be removed in a future release. Please use '--slow-query-log'/'--slow-query-log-file' instead.
170201 08:23:40 mysqld_safe WSREP: Recovered position 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:62955021
170201 8:23:40 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
170201 8:23:40 [Warning] The syntax '--log-slow-queries' is deprecated and will be removed in a future release. Please use '--slow-query-log'/'--slow-query-log-file' instead.
170201 8:23:40 [Note] WSREP: wsrep_start_position var submitted: '9c3eced4-70eb-11e6-9bc5-6fd14ff56422:62955021'
170201 8:23:40 [Note] WSREP: Read nil XID from storage engines, skipping position init
170201 8:23:40 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
170201 8:23:40 [Note] WSREP: wsrep_load(): Galera 2.12(r318911d) by Codership Oy <info@codership.com> loaded successfully.
170201 8:23:40 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
170201 8:23:40 [Note] WSREP: Passing config to GCS: base_host = 10.3.1.174; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.3.1.174:4567; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S;
170201 8:23:40 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
170201 8:23:40 [Note] WSREP: wsrep_sst_grab()
170201 8:23:40 [Note] WSREP: Start replication
170201 8:23:40 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: protonet asio version 0
170201 8:23:40 [Note] WSREP: backend: asio
170201 8:23:40 [Note] WSREP: GMCast version 0
170201 8:23:40 [Note] WSREP: (bc5caf9a, 'tcp://10.3.1.174:4567') listening at tcp://10.3.1.174:4567
170201 8:23:40 [Note] WSREP: (bc5caf9a, 'tcp://10.3.1.174:4567') multicast: , ttl: 1
170201 8:23:40 [Note] WSREP: EVS version 0
170201 8:23:40 [Note] WSREP: PC version 0
170201 8:23:40 [Note] WSREP: gcomm: connecting to group 'Vs01zdJ71', peer 'helion-cp1-c1-m1-mgmt:,helion-cp1-c1-m2-mgmt:,helion-cp1-c1-m3-mgmt:'
170201 8:23:40 [Note] WSREP: (bc5caf9a, 'tcp://10.3.1.174:4567') turning message relay requesting on, nonlive peers:
170201 8:23:40 [Note] WSREP: declaring ad7803c2 at tcp://10.3.1.178:4567 stable
170201 8:23:40 [Note] WSREP: declaring e2ca1084 at tcp://10.3.1.177:4567 stable
170201 8:23:40 [Note] WSREP: Node ad7803c2 state prim
170201 8:23:40 [Note] WSREP: gcomm: connected
170201 8:23:40 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170201 8:23:40 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170201 8:23:40 [Note] WSREP: Opened channel 'Vs01zdJ71'
170201 8:23:40 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
170201 8:23:40 [Note] WSREP: Waiting for SST to complete.
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: sent state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 0 (helion-cp1-c1-m3-mgmt)
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 2 (helion-cp1-c1-m2-mgmt)
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 1 (helion-cp1-c1-m1-mgmt)
170201 8:23:40 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 16,
members = 2/3 (joined/total),
act_id = 66631754,
last_appl. = -1,
protocols = 0/4/3 (gcs/repl/appl),
group UUID = 9c3eced4-70eb-11e6-9bc5-6fd14ff56422
170201 8:23:40 [Note] WSREP: Flow-control interval: [28, 28]
170201 8:23:40 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 66631754)
170201 8:23:40 [Note] WSREP: State transfer required:
Group state: 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:66631754
Local state: 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: New cluster view: global state: 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:66631754, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3
170201 8:23:40 [Warning] WSREP: Gap in state sequence. Need state transfer.
170201 8:23:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.3.1.174' --auth 'sst_user:JgEfDXKG1F' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '22943''
WSREP_SST: [INFO] Streaming with tar (20170201 08:23:40.717)
WSREP_SST: [INFO] Using socat as streamer (20170201 08:23:40.719)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | tar xfi - --recursive-unlink -h; RC=( ${PIPESTATUS[@]} ) (20170201 08:23:40.730)
2017/02/01 08:23:40 socat[23111] E bind(12, {AF=2 0.0.0.0:4444}, 16): Address already in use
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
170201 8:23:40 [Note] WSREP: Prepared SST request: xtrabackup|10.3.1.174:4444/xtrabackup_sst
170201 8:23:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170201 8:23:40 [Note] WSREP: Assign initial position for certification: 66631754, protocol version: 2
170201 8:23:40 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9c3eced4-70eb-11e6-9bc5-6fd14ff56422): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
170201 8:23:40 [Note] WSREP: Node 1 (helion-cp1-c1-m1-mgmt) requested state transfer from '*any*'. Selected 0 (helion-cp1-c1-m3-mgmt)(SYNCED) as donor.
170201 8:23:40 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 66631759)
170201 8:23:40 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 1 2 (20170201 08:23:40.757)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170201 08:23:40.759)
WSREP_SST: [INFO] Removing the sst_in_progress file (20170201 08:23:40.761)
170201 8:23:40 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.3.1.174' --auth 'sst_user:JgEfDXKG1F' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '22943': 32 (Broken pipe)
170201 8:23:40 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
170201 8:23:40 [ERROR] WSREP: SST failed: 32 (Broken pipe)
170201 8:23:40 [ERROR] Aborting

170201 8:23:41 [Warning] WSREP: 0 (helion-cp1-c1-m3-mgmt): State transfer to 1 (helion-cp1-c1-m1-mgmt) failed: -1 (Operation not permitted)
170201 8:23:41 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
170201 8:23:41 [Note] WSREP: gcomm: terminating thread
170201 8:23:41 [Note] WSREP: gcomm: joining thread
170201 8:23:41 [Note] WSREP: gcomm: closing backend
170201 8:23:41 [Note] WSREP: gcomm: closed
170201 8:23:41 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
170201 08:23:41 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
root@helion-cp1-c1-m1-mgmt:/etc/mysql#

Docker Security Vulnerability CVE-2016-9962

Latest MySQL Performance Blog posts - January 31, 2017 - 9:39am

Docker 1.12.6 was released to address CVE-2016-9962. CVE-2016-9962 is a serious vulnerability with RunC.

Quoting the coreos page (linked above):

“RunC allowed additional container processes via runc exec to be ptraced by the pid 1 of the container. This allows the main processes of the container, if running as root, to gain access to file-descriptors of these new processes during the initialization and can lead to container escapes or modification of runC state before the process is fully placed inside the container.”

In short, IF processes run as root inside a container, they could potentially break out of the container and gain access to the host.

My recommendation at this time is to apply the same basic security tenets for containers as you would (I hope) for VM and bare-metal installs. In other words, ensure you are adhering to a path of least privilege as a best practice, and not running as root for convenience’s sake.

We made changes to PMM prior to version 1.0.4 to reduce the number of processes within the container that run as root. As such, only the processes that require it run as root. All other processes run as a lower-privilege user.

Check here for documentation on PMM, and use the JIRA project to raise bugs (JIRA requires registration).

To comment on running a database within Docker, I’ve reviewed the following images:

  • percona-server image: I have verified it does not run as root, and runs as the mysql user (for 5.7.16 at least)
  • percona-server-mongodb: I have worked with our teams internally and can confirm that the latest image no longer runs as root (you will need to pull the latest image, however, to see this change via docker pull)

Please comment below with any questions.

Query Analytic timezone changing

Latest Forum Posts - January 31, 2017 - 1:44am
Hi!
Currently the Query Analytics timezone is set to UTC, in spite of the fact that I use TZ="Europe/Kiev" (UTC+2) at "docker run" time. Metrics Monitor uses the correct time zone (UTC+2), but Query Analytics does not.
Is there some way to set the Query Analytics timezone to UTC+2?

MySQL Sharding Models for SaaS Applications

Latest MySQL Performance Blog posts - January 30, 2017 - 3:16pm

In this blog post, I’ll discuss MySQL sharding models, and how they apply to SaaS application environments.

MySQL is one of the most popular database technologies used to build many modern SaaS applications, ranging from simple productivity tools to business-critical applications for the financial and healthcare industries.

Pretty much any large scale SaaS application powered by MySQL uses sharding to scale. In this blog post, we will discuss sharding choices as they apply to these kinds of applications.

In MySQL, unlike in some more modern technologies such as MongoDB, there is no standard sharding implementation that the vast majority of applications use. In fact, if anything “no standard” is the standard. The common practice is to roll your own sharding framework, as famous MySQL deployments such as Facebook and Twitter have done. MySQL Cluster – the MySQL software that has built-in Automatic Sharding functionality – is rarely deployed (for a variety of reasons). MySQL Fabric, which has been the official sharding framework, has no traction either.

When sharding today, you have a choice of rolling your own system from scratch, using a comprehensive sharding platform such as Vitess, or using a proxy solution to assist you with sharding. For proxy solutions, MySQL Router is the official solution. But in reality, third-party solutions such as the open source ProxySQL, the commercial ScaleArc and the semi-commercial (BSL) MariaDB MaxScale are widely used. Keep in mind, however, that traffic routing is only one of the problems that exist in large scale sharding implementations.

Beneath all these “front end” choices for sharding on the application database connection framework or database proxy, there are some lower level decisions that you’ve got to make. Namely, around how your data is going to be laid out and organized on the MySQL nodes.

When it comes to SaaS applications, at least one answer is simple. It typically makes sense to shard your data by “customer” or “organization” using some sort of mapping tables. In the vast majority of cases, a single node (or replicated cluster) should be powerful enough to handle all the data and load coming from each customer.
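
A minimal sketch of such a mapping table and lookup (all names are hypothetical):

CREATE TABLE customer_shard_map (
  customer_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  shard_id    INT UNSIGNED NOT NULL,
  schema_name VARCHAR(64) NOT NULL
) ENGINE=InnoDB;

-- The application resolves the shard before routing each request
SELECT shard_id, schema_name FROM customer_shard_map WHERE customer_id = 1234;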

What Should I Ask Myself Now?

The next set of questions you should ask yourself is about your SaaS applications:

  • How much revenue per customer are you generating?
  • Do your customers (or regulations) require data segregation?
  • Are all the customers about the same, or are there outliers?
  • Are all your customers running the same database schema?

I address the answers in the sections below.

How Much Revenue?

How much revenue per customer you’re generating is an important number. It defines how much infrastructure cost per customer you can afford. In the case of “freemium” models, with customers generating less than $1 a month on average, you might need to ensure low overhead per customer (even if you have to compromise on customer isolation).

Typically with low revenue customers, you have to co-locate the data inside the same MySQL instance (potentially even same tables). In the case of high revenue customers, isolation in separate MySQL instances (or even containers or virtualized OS instances) might be possible.

Data Segregation?

Isolation is another important area of consideration. Some enterprise customers might require that their data is physically separate from others. There could also be government regulations in play that require customer data to be stored in a specific physical location. If this is the case, you’re looking at completely dedicated customer environments. Or at the very least, separate database instances (which come with additional costs).

Customer Types?

Customer size and requirements are also important. A system designed to handle all customers of approximately the same scale (for example, personal accounting) is going to be different than if you are in the business of blog hosting. Some blogs might be 10,000 times more popular than the average.

Same Database Schema?

Finally, there is the big question of whether all your customers are running the same database schema and the same software version. If you want to support different software versions (if your customers require a negotiated maintenance window for software upgrades, for example) or different database schemas (if the schema is dependent on the custom functionality and modules customers might use, for example), keeping such customers in different MySQL schemas makes sense.

Sharding Models

This gets us to the following sharding isolation models, ranging from lowest to highest:

  • Customers Share Schemas. This is the best choice when you have very large numbers of low-revenue customers. In this case, you would map multiple customers to the same set of tables, and include something like a customer_id field in them to filter customer data. This approach minimizes customer overhead but reduces customer isolation. It’s harder to backup/restore data for individual customers, and it is easier to introduce coding mistakes that can access other customers’ data. This method does not mean there is only one schema, but that there is a one-to-many relationship between schemas and customers. For example, you might have 100 schemas per MySQL instance, each handling 1000 to 10000 customers (depending on the application). Note that with a well-designed sharding implementation, you should be able to map customers individually to schemas. This allows you to have key customer data stored in dedicated schemas, or even on dedicated nodes.
  • Schema per Customer. This is probably the most common sharding approach in MySQL powered SaaS applications, especially ones that generate substantial revenue ($10+ per month per customer). In this model, each customer’s data is stored in its own schema (database). This makes it very easy to backup/restore individual customers. It allows customers to have different schemas (i.e., add custom tables). It also allows them to run different versions of the application if desired. This approach allows the application server to use different MySQL users connecting on behalf of different customers, which adds an extra level of protection from accidental (or intentional) access of data that belongs to different customers. The schema per customer approach also makes it easier to move the shards around, and limits maintenance impact (a provisioning sketch follows this list). The downside of this approach is higher overhead. It also results in a large number of tables per instance, and potentially larger numbers of files (which can be hard to manage).
  • Database Instance per Customer. You achieve even better isolation by having a MySQL instance per customer. This approach, however, increases overhead even further. The recent rise of light virtualization technologies and containers has reduced its usage.
  • OS Instance/Container per Customer. This approach allows you to improve isolation even further. It can be used for any customer, but can also be applied to selected customers in a model that uses Schema per Customer for the majority of them. Dedicated OS instances, with improved isolation and better performance SLAs, might be a feature of some premium customer tiers. This method not only allows better isolation, but it also lets you handle outliers better. You might choose to run a majority of your customers on the hardware (or cloud instance) that has the best price/performance numbers, and place some of the larger customers on the highest performance nodes.
  • Environment per Customer. Finally, if you take this all the way, you can build completely separate environments for customers. This includes databases, application servers and other required components. This is especially useful if you need to deploy the application close to the customer – which includes the appliance model, or deployment in the customer’s data center or cloud provider. This also allows you to accommodate customers whose data must be stored in a specific location, which is often due to government regulations. It is worth noting that many SaaS applications, even if they do not quite have one environment per customer, have multiple independent environments. These are often hosted in different locations or availability zones. Such setups allow you to reduce the impact of large-scale failures to only a portion of your customers, avoiding overloading your customer service group and allowing the operational organization to focus on repairing smaller environments.
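
As referenced in the Schema per Customer item above, provisioning a new customer might look like the following sketch (the database, user and network names are hypothetical, and a real system would automate this):

CREATE DATABASE customer_1234;
CREATE USER 'cust_1234'@'10.0.0.%' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON customer_1234.* TO 'cust_1234'@'10.0.0.%';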

The farther you go down this route – from the shared schema to an environment per customer – the more important it is to have a high level of automation. With a shared schema, you can often get by with little automation (and some environments manually set up) and all the schemas pre-created. If customer sign-up requires setting up a dedicated database instance or a whole environment, manual implementation doesn’t scale. For this type of setup, you need state-of-the-art automation and orchestration.

Conclusion

I hope this helps you to understand your options for MySQL sharding models. Each of the different sharding models for SaaS applications powered by MySQL has benefits and drawbacks. As you can see, many of these approaches require you to work with a large number of tables in MySQL – this will be the topic of one of my next posts!

ERROR qan-analyzer driver: bad connection

Latest Forum Posts - January 30, 2017 - 10:40am
Hi!
After starting "pmm-admin start --all" everything works fine, for a few time. After 2-3 hours approximately, Query Analytics stop to show data. We have 2 servers for now, and they both have the same problem.
In /var/log/pmm-mysql-queries-0.log we can see:

[mysql] 2017/01/30 20:29:09 packets.go:59: unexpected EOF
[mysql] 2017/01/30 20:29:09 packets.go:386: busy buffer
2017/01/30 20:29:09.538585 ERROR qan-analyzer-1342087d driver: bad connection
2017/01/30 20:30:00.001315 WARNING qan-analyzer-1342087d-worker Interval out of sequence: got 186, expected 182
2017/01/30 20:30:08.733680 ERROR qan-analyzer-1342087d-worker Got class twice: registry_dev2 e1fecd37897b7375d76acae411ed0f5b
2017/01/30 20:31:00.011321 WARNING qan-analyzer-1342087d Skipping interval '187 2017-01-30 18:30:00 UTC to 2017-01-30 18:31:00 UTC (0-0)' because interval '186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)' is still being parsed
2017/01/30 20:32:00.005720 WARNING qan-analyzer-1342087d Skipping interval '188 2017-01-30 18:31:00 UTC to 2017-01-30 18:32:00 UTC (0-0)' because interval '186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)' is still being parsed
2017/01/30 20:33:00.001729 WARNING qan-analyzer-1342087d Skipping interval '189 2017-01-30 18:32:00 UTC to 2017-01-30 18:33:00 UTC (0-0)' because interval '186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)' is still being parsed
[mysql] 2017/01/30 20:33:28 packets.go:59: unexpected EOF
[mysql] 2017/01/30 20:33:28 packets.go:386: busy buffer
[mysql] 2017/01/30 20:33:28 connection.go:307: invalid connection
2017/01/30 20:33:28.778027 ERROR qan-analyzer-1342087d driver: bad connection
2017/01/30 20:34:00.003532 WARNING qan-analyzer-1342087d-worker Interval out of sequence: got 190, expected 187
2017/01/30 20:34:04.971836 ERROR qan-analyzer-1342087d-worker Got class twice: registry_dev2 e1fecd37897b7375d76acae411ed0f5b

After "pmm-admin restart --all" there is no errors for some time, Query Analytics works fine. But not for long. Same errors.

The server and clients were reinstalled per the documentation 3 times. No effect.
How can I fix this problem?

MariaDB ColumnStore

Latest MySQL Performance Blog posts - January 30, 2017 - 9:29am

Last month, MariaDB officially released MariaDB ColumnStore, their column store engine for MySQL. This post discusses what it is (and isn’t), why it matters and how you can approach a test of it.

What is ColumnStore?

ColumnStore is a storage engine that turns traditional MySQL storage concepts on their head. Instead of storing the data by row, a column store stores the data by column (obviously). This provides advantages for certain types of data, and certain types of queries run against that data. See my previous post for more details on column-based storage systems.

ColumnStore is a fork of InfiniDB and carries forward many of the concepts behind that product. InfiniDB ceased operations in 2014. With the front end managed through MariaDB, you get access to all of the expected security and audit options of MariaDB. MariaDB designed ColumnStore as a massively parallel database, working best in an environment with multiple servers. This is somewhat different than a traditional row store database.

ColumnStore stores columnar data in a concept called an “extent.” An extent contains a range of values for a single column. Each extent contains no more than 8 million values. It stores additional values in a new extent. The extents for a single column get distributed across the database nodes, known as “Performance Modules” in ColumnStore. It stores each unique extent on more than one node, thus providing data redundancy and removing the need for replication. If a node is down, and it contains an extent needed for a query, that same extent is found on another node in the environment. This data redundancy also provides a high availability environment.

The query engine determines which extents process query requests. Since the data in an extent is often preordered (time series data, for example), many queries can ignore individual extents since they cannot contain any data needed for the query. If we are only looking for data from February 2017, for example, extents containing data outside of that range get ignored. However, if a query requires data from many or all extents on a single column, the query takes much longer to complete.

Unlike some traditional column store vendors, that take an all or nothing approach to storage, MariaDB decided to go with a mixed concept. In a MariaDB MySQL database, you can mix traditional InnoDB storage with the new ColumnStore storage, just like you used to mix InnoDB and MyISAM. This presents some nice options, not the least of which is that it provides a way to “dip your toe” into the world of column stores. On the other hand, it could lead to people using the ColumnStore engine in non-optimal ways. Also, the differences in what is considered optimal architecture between these two storage options make it hard to see how this plays out in the real world.
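
A hedged sketch of what that mixing might look like (the table definitions are hypothetical; note that ColumnStore tables do not support indexes, so no primary key is declared on the analytical table):

-- Hot transactional data stays in InnoDB
CREATE TABLE orders_current (
  order_id    BIGINT NOT NULL PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  order_date  DATE NOT NULL,
  total       DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

-- Analytical history is stored by column
CREATE TABLE orders_history (
  order_id    BIGINT,
  customer_id BIGINT,
  order_date  DATE,
  total       DECIMAL(10,2)
) ENGINE=ColumnStore;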

Data Definition

As discussed in the earlier post, column storage works great for specific types of data and queries. It is important that your data definitions are as tight as possible, and that your queries are appropriate for column-based data.

Many people set their field definition as VARCHAR(256) when setting up a new database. They might not know what type of data gets stored in the new field. This broad definition allows you to store whatever you throw at the database. The negative effect for row store is that it can cause over-allocation of storage – but it only has a minimal effect on queries.

In a column store, the field definition can drive decisions about the compression methods for storing the data, along with sorting implications. Columnar data can use storage more efficiently than a row store, since the data for a single column is well-defined. This leads to selecting the best compression algorithm for the data. If that data is poorly defined, the selected compression algorithm might not be the best for the data.

Sorting is also a problem in a column store when the data types are not well-defined. We’ve all seen integer or date data that is sorted alphabetically. While it can be annoying, we can still adjust to that sorting method to find what we need. Since a column store is often used to perform analytical queries over a range of data, this poorly-sorted data can present a bigger problem. If you specify a column to be VARCHAR and only include date information, that data is sorted alphabetically. The same column defined as DATE causes the data to be sorted by date. This chart shows the difference (date format is mm/dd/yy):

Alphabetic Sort    Date Sort
01/01/17           01/01/17
01/11/17           01/02/17
01/02/17           01/04/17
01/21/17           01/11/17
01/4/17            01/21/17
11/01/17           02/01/17
11/02/17           11/01/17
02/01/17           11/02/17

Imagine running a query over a range of dates (requesting all activity in the months of January and February 2017, for example). In the alphabetic sort, this requires working through the whole file, since the data for November shows up between the data for January and February. In the date sort, the query only reads until the end of February. We know there can be no more matching data after that. The alphabetic sort leads to more I/O, more query time and less happiness on the part of the user.
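
To make that concrete, a range query like the sketch below (reusing the hypothetical orders_history table from earlier) can skip every extent whose stored min/max values fall outside the requested range, but only if order_date is declared as DATE rather than VARCHAR:

SELECT customer_id, SUM(total)
FROM orders_history
WHERE order_date BETWEEN '2017-01-01' AND '2017-02-28'
GROUP BY customer_id;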

Why Should You Care About ColumnStore?

The first reason is that it allows you to try out column storage without doing a massive shift in technology and with minimal effort. By setting up some tables in a MariaDB database to use the ColumnStore engine, you can benefit from the storage efficiencies and faster query capabilities, provided that the data you’ve selected for this purpose is sound. This means that the data definitions should be tight (always a good plan anyway), and the queries should be more analytical than transactional. For a purely transactional workflow, a row store is the logical choice. For a purely analytical workflow, a column store is the logical choice. ColumnStore allows you to easily mix the two storage options so that you can have the best match possible. It is still important to know what type of workflow you’re dealing with, and match the storage engine to that need.

Another solid reason is that it is a great fit if you are already doing analysis over massive amounts of data. Any column store shines when asked to look at relatively few columns of data (ideally the column or two that are being aggregated and other columns to locate and group the data). If you are already running these types of queries in MySQL, ColumnStore would likely be a good fit.

But There Be Dragons!

As with any new technology, ColumnStore might not be a good fit for everyone. Given that you can mix and match your storage engines, with ColumnStore for some tables and InnoDB for others, it can be tempting to just go ahead with a ColumnStore test doing things the same way you always did in the past. While this still yields results, those results might not be a true test of the technology. It’s like trying to drive your minivan the same way you used to drive your sports car. “Hey, my Alfa Romeo never flipped over taking these turns at high speed!”

To effectively use ColumnStore, it’s important to match it to a proper analytical workload. This means that you will likely do more bulk loading into these tables, since there is additional overhead in writing the data out into the column files. The overall workflow should be more read-intensive. The queries should only look for data from a small set of fields, but can span massive amounts of data within a single column. In my earlier post, there’s also a discussion about normalization of data and how denormalizing data is more common in columnar databases.

You should address these issues in your testing for a valid conclusion.

The minimum specifications for ColumnStore also point to a need for a more advanced infrastructure than is often seen for transactional data. This is to support batch loading, read intensive workloads and possibly different ETL processes for each type of data. In fact, MariaDB states in the installation documentation for ColumnStore that it must be completed as a new installation of MariaDB. You must remove any existing installations of MariaDB or MySQL before installing the ColumnStore-enabled RPM on a system.

Is It Right for Me?

ColumnStore might fit well into your organization. But like haggis, it’s not for everyone. If you need analytical queries, it is a great option. If your workload is more read-intensive, it could still work for you. As we move to a more Internet of Things (IoT) world, we’re likely to see a need for more of this type of query work. In order to accurately present each user with the best possible Internet experience, we might want to analyze their activities over spans of time and come up with the best match for future needs.

Seriously consider if making the move to ColumnStore is right for you. It is newer software (version 1.0.6, the first GA version, was released on December 14, 2016, and 1.0.7 was released on January 23, 2017), so it might go through changes as it matures. Though a new product, it is based on InfiniDB concepts (which are somewhat dated). MariaDB has lots of plans for additional integrations and support for ancillary products that are absent in the current release.

MariaDB took a huge step forward with ColumnStore. But do yourself a favor and consider whether it is right for you before testing it out. Also, make sure that you are not trying to force your current workflow into the column store box. Kids know that we cannot put a square peg in a round hole, but we adults often try to do just that. Finding the right peg saves you lots of time, hassle and annoyance.

Warnings in Error log on a Slave/Master node

Latest Forum Posts - January 30, 2017 - 5:56am
Hello- We have a 3-node Percona XtraDB (5.6.34) cluster set up, with a slave server hanging off of one of the nodes. We also have this slave server (also running XtraDB 5.6.34) set up as a master as we use it to replicate to an off-site DR Percona server. We noticed that the below warning started showing up in the log on this slave server last week. The warning shows up numerous times every second. I have not been able to find much online in regards to this warning. I see the warning mentions WSREP, which has to do with Galera, but this slave is not part of the cluster. I have verified that the replication is running fine on both this slave, as well as the DR slave, as I see current data on both:

=> Skipping replication
2017-01-30 08:47:05 14855 [Warning] WSREP: SQL statement was ineffective, THD: 2, buf: 276
schema: (null)
QUERY: (null)
=> Skipping replication
2017-01-30 08:47:05 14855 [Warning] WSREP: SQL statement was ineffective, THD: 2, buf: 329
schema: (null)
QUERY: (null)
=> Skipping replication
2017-01-30 08:48:08 14855 [Warning] WSREP: SQL statement was ineffective, THD: 2, buf: 407
schema: (null)
QUERY: (null)

If anyone has specific info on what this warning means and how to fix it, I would appreciate the feedback- thanks!

Ron