
Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance

2 hours 30 min ago

Please join Percona’s MySQL Database Administrator, Brad Mickel as he presents How to Analyze and Tune MySQL Queries for Better Performance on Thursday, June 21st, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

 

Query performance is essential in making any application successful. In order to finely tune your queries you first need to understand how MySQL executes them, and what tools are available to help identify problems.

In this session you will learn:

  1. The common tools for researching problem queries
  2. What an Index is, and why you should use one
  3. Index limitations
  4. When to rewrite the query instead of just adding a new index
Register Now

 

Brad Mickel MySQL DBA

Bradley began working with MySQL in 2013 as part of his duties in healthcare billing. After three years in healthcare billing, he joined Percona through its bootcamp process. Since completing the bootcamp, he has served as a remote database administrator on the Atlas team for Percona Managed Services.

The post Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance appeared first on Percona Database Performance Blog.

Chunk Change: InnoDB Buffer Pool Resizing

June 19, 2018 - 11:02am

Since MySQL 5.7.5, we have been able to resize the InnoDB Buffer Pool dynamically. This new feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic and, if it is incorrectly configured, it could lead to undesired situations.

Let’s first see how innodb_buffer_pool_size, innodb_buffer_pool_instances and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances and each instance is divided into chunks. There is some information that we need to take into account: the number of instances can go from 1 to 64 and the total number of chunks should not exceed 1000.

So, for a server with 3GB of RAM and a buffer pool of 2GB with 8 instances and chunks at the default value (128MB), we are going to get 2 chunks per instance.

This means that there will be 16 chunks.
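As a quick sanity check (my addition, not part of the original post, using the standard InnoDB system variables), the chunk math can be confirmed directly from the MySQL client:

-- With a 2GB buffer pool, 8 instances and the default 128MB chunk size
-- this returns 2 chunks per instance and 16 chunks in total.
SELECT @@innodb_buffer_pool_size DIV (@@innodb_buffer_pool_instances * @@innodb_buffer_pool_chunk_size) AS chunks_per_instance,
       @@innodb_buffer_pool_size DIV @@innodb_buffer_pool_chunk_size AS total_chunks;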

I’m not going to explain the benefits of having multiple instances; here I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand
Reducing the buffer pool

Let’s start reducing the buffer pool:

| innodb_buffer_pool_size       | 2147483648 |
| innodb_buffer_pool_instances  | 8          |
| innodb_buffer_pool_chunk_size | 134217728  |

mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool will not change and a warning will be shown:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
+---------+------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized but the requested innodb_buffer_pool_size is considered to be incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------------------+
| Level   | Code | Message                                                         |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
+---------+------+-----------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

And the final size is 2GB. Yes! You intended to set the value to 1.5GB and you "succeeded" in setting it to 2GB. Even if you set it just 1 byte higher, to 1073741825, you will end up with a buffer pool of 2GB.

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show global variables like 'innodb_buffer_pool_%size';
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose that one day you decide to tune some variables on your server and, since you have free memory, to increase the buffer pool. In this example, we are going to use a server with innodb_buffer_pool_instances = 16 and a 2GB buffer pool, which will be increased to 2.5GB.

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then, after a restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size';
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after we have set innodb_buffer_pool_size in the config file to 2.5GB, the database gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn’t tell us is the number of chunks, which would be useful to understand why there is such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size will require a restart, and the buffer pool size will be taken into consideration as an upper limit when setting the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size';
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

However, we need to understand how this really works. To derive innodb_buffer_pool_chunk_size, the server makes this calculation: innodb_buffer_pool_size / innodb_buffer_pool_instances, with the result rounded down to a multiple of 1MB.

In our example, the calculation is 2147483648 / 32 = 67108864 and, since 67108864 % 1048576 = 0, no rounding is needed. The number of chunks will be one chunk per instance.

When does it consider that it needs to use more chunks per instance? When the difference between the innodb_buffer_pool_size configured in the file and the size covered by the rounded-down number of chunks (instances × chunks per instance × chunk size) is greater than or equal to 1MB.

That is why, for instance, if you try to set the innodb_buffer_pool_size equal to 1GB + 1MB – 1B you will get 1GB of buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632

2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set the innodb_buffer_pool_size equal to 1GB + 1MB you will get 2GB of buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632

2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because it considers that two chunks will fit. We can say that this is how the InnoDB Buffer pool size is calculated:

determine_best_chunk_size {
  if innodb_buffer_pool_size / innodb_buffer_pool_instances < innodb_buffer_pool_chunk_size
  then
    innodb_buffer_pool_chunk_size = roundDownMB(innodb_buffer_pool_size / innodb_buffer_pool_instances)
  fi
}

determine_amount_of_chunks {
  innodb_buffer_amount_chunks_per_instance = roundDown(innodb_buffer_pool_size / innodb_buffer_pool_instances / innodb_buffer_pool_chunk_size)
  if innodb_buffer_pool_size - innodb_buffer_amount_chunks_per_instance * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size >= 1024*1024
  then
    innodb_buffer_amount_chunks_per_instance++
  fi
}

determine_best_chunk_size
determine_amount_of_chunks
innodb_buffer_pool_size = innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size * innodb_buffer_amount_chunks_per_instance
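As an illustrative check of this arithmetic (my addition, using round numbers rather than the exact chunk value shown above), the two configuration examples can be reproduced with plain SQL:

-- 2.5GB requested, 16 instances, 128MB chunks: the 0.5GB remainder (>= 1MB)
-- adds a second chunk per instance, giving 16 * 2 * 128MB = 4GB.
SELECT FLOOR(2684354560 / 16 / 134217728) AS chunks_rounded_down,
       (2684354560 - 16 * 1 * 134217728) >= 1048576 AS needs_extra_chunk,
       16 * 2 * 134217728 AS resulting_buffer_pool_size;

-- 1GB + 1MB requested, 16 instances, 64MB chunks: the 1MB remainder again
-- rounds the pool up to 16 * 2 * 64MB = 2GB, as the error log above shows.
SELECT FLOOR(1074790400 / 16 / 67108864) AS chunks_rounded_down,
       (1074790400 - 16 * 1 * 67108864) >= 1048576 AS needs_extra_chunk,
       16 * 2 * 67108864 AS resulting_buffer_pool_size;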

What is the best setting?

In order to analyze the best setting, you will need to know that there is an upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represents 1.61% of the total, which means that we can increase or decrease the complete buffer pool size in multiples of this percentage.

From a management point of view, I think that you might want to consider at least a range of 2% to 5% to increase or decrease the buffer. I performed some tests to see the impact of having small chunks and I found no issues but this is something that needs to be thoroughly tested.

The post Chunk Change: InnoDB Buffer Pool Resizing appeared first on Percona Database Performance Blog.

Webinar Weds 20/6: Percona XtraDB Cluster 5.7 Tutorial Part 2

June 19, 2018 - 5:18am
Including setting up Percona XtraDB Cluster with ProxySQL and PMM

Please join Percona’s Architect, Tibi Köröcz as he presents Percona XtraDB Cluster 5.7 Tutorial Part 2 on Wednesday, June 20th, 2018, at 7:00 am PDT (UTC-7) / 10:00 am EDT (UTC-4).

Register Now

 

Never used Percona XtraDB Cluster before? This is the webinar for you! In this 45-minute webinar, we will introduce you to a fully functional Percona XtraDB Cluster.

This webinar will show you how to install Percona XtraDB Cluster with ProxySQL, and monitor it with Percona Monitoring and Management (PMM).

We will also cover topics like bootstrap, IST, SST, certification, common-failure situations and online schema changes.

After this webinar, you will have enough knowledge to set up a working Percona XtraDB Cluster with ProxySQL, in order to meet your high availability requirements.

You can see part one of this series here: Percona XtraDB Cluster 5.7 Tutorial Part 1

Register Now!

Tibor Köröcz Architect

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying out and working with the latest technologies and applications that can help with, or work together with, MySQL. In his spare time he likes to spend time with his friends, travel around the world and play ultimate frisbee.

 

The post Webinar Weds 20/6: Percona XtraDB Cluster 5.7 Tutorial Part 2 appeared first on Percona Database Performance Blog.

Percona Database Performance Blog Commenting Issues

June 18, 2018 - 7:28pm

We are experiencing an intermittent commenting problem on the Percona Database Performance Blog.

At Percona, part of our purpose is to engage with the open source database community. A big part of this engagement is the Percona Database Performance Blog, and the participation of the community in reading and commenting. We appreciate your interest and contributions.

Currently, we are experiencing an intermittent problem with comments on the blog. Some users are unable to post comments and may receive an error message when they try.

We are working on correcting the issue, and apologize if it affects you. If you have a comment that you want to make on a specific blog and this issue affects you, you can also leave comments on Twitter, Facebook or LinkedIn – all blog posts are socialized on those platforms.

Thanks for your patience.

The post Percona Database Performance Blog Commenting Issues appeared first on Percona Database Performance Blog.

Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s)

June 18, 2018 - 4:02am


Please join Percona’s CEO, Peter Zaitsev as he presents MySQL: Scaling and High Availability – Production Experience Over the Last Decade(s) on Tuesday, June 19th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

Register Now

 

Percona is known as the MySQL performance experts. With over 4,000 customers, we’ve studied, mastered and executed many different ways of scaling applications. Percona can help ensure your application is highly available. Come learn from our playbook, and leave this talk knowing your MySQL database will run faster and more optimized than before.

Register Now

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s) appeared first on Percona Database Performance Blog.

This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert

June 15, 2018 - 9:18am

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Much of last week, there was a lot of talk around this article: New research shows 75% of ‘open’ Redis servers infected. It turns out it helps to always read beyond the headlines, because they tend to be more sensationalist than you would expect. I highly recommend reading Clarifications on the Incapsula Redis security report from the author of Redis, because in this case the problem goes beyond the headline: the content is also suspect. Antirez had to write this to help the press (we totally need to help keep reportage accurate).

Not to depart from the Redis world just yet: Antirez also collaborated with the Apple Information Security Team with regard to the Redis Lua subsystem. The details are pretty interesting, as documented in Redis Lua scripting: several security vulnerabilities fixed; you’ll note that the Alibaba team also found some other issues. Antirez also ensured that the Redis cloud providers (notably: Redis Labs, Amazon, Alibaba, Microsoft, Google, Heroku, Open Redis and Redis Green) were notified first (in the comments, compose.io was noted as missing, but it has now been added to the list). I do not know if Linux distributions were also informed, but they will probably be rolling out updates soon.

In the “be careful where you get your software” department: some criminals have figured out they could host crypto-currency mining software that you would get pre-installed if you used their Docker containers. They have apparently made over $90,000. It is good to note that the backdoored images downloaded 5 million times have finally been removed from Docker Hub. However, they were up on Docker Hub for ten months and managed to get over 5 million downloads across 17 images. Know what images you are pulling. Maybe this is one more reason for software providers to run their own registries?

James Turnbull is out with a new book: Monitoring with Prometheus. It just got released, I’ve grabbed it, but a review will come shortly. He’s managed all this while pulling off what seems to be yet another great O’Reilly Velocity San Jose Conference.

Releases

A quiet week on this front.

Link List
  • INPLACE upgrade from MySQL 5.7 to MySQL 8.0
  • PostgreSQL relevant: What’s is the difference between streaming replication vs hot standby vs warm standby ?
  • A new paper on Amazon Aurora is out: Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes. It was presented at SIGMOD 2018, and an abstract: “One of the more novel differences between Aurora and other relational databases is how it pushes redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. Doing so reduces networking traffic, avoids checkpoints and crash recovery, enables failovers to replicas without loss of data, and enables fault-tolerant storage that heals without database involvement. Traditional implementations that leverage distributed storage would use distributed consensus algorithms for commits, reads, replication, and membership changes and amplify cost of underlying storage.” Aurora, as you know, avoids distributed consensus under most circumstances. Short 8-page read.
  • Dormando is blogging again, and this was of particular interest — Caching beyond RAM: the case for NVMe. This is done in the context of memcached, which I am certain many use.
  • It is particularly heartening to note that not only does MongoDB use Linkbench for some of their performance testing, they’re also contributing to making it better via a pull request.
Industry Updates

Trying something new here… To cover fundraising, and people on the move in the database industry.

  • Kenny Gorman — who has been on the program committee for several Percona Live conferences, and has spoken at the event multiple times — is the founder and CEO of Eventador, a stream-processing-as-a-service company built on Apache Kafka and Apache Flink, which has just raised $3.8 million in funding to fuel its growth. Naturally, they are also spending some of this on hiring. Here is the full press release.
  • Jimmy Guerrero (formerly of MySQL and InfluxDB) is now VP Marketing & Community at YugaByte DB. YugaByte was covered in column 13 as having raised $8 million in November 2017.
Upcoming appearances
  • DataOps Barcelona – Barcelona, Spain – June 21-22, 2018 – code dataopsbcn50 gets you a discount
  • OSCON – Portland, Oregon, USA – July 16-19, 2018
  • Percona webinar on MariaDB Server 10.3 – June 26, 2018
Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert appeared first on Percona Database Performance Blog.

Tuning PostgreSQL for sysbench-tpcc

June 15, 2018 - 5:30am

Percona has a long tradition of performance investigation and benchmarking. Peter Zaitsev, CEO, and Vadim Tkachenko, CTO, led their crew into a series of experiments with MySQL in this space. The discussion that always follows on the results achieved is well known and praised even by the PostgreSQL community. So when Avi joined the team and had settled in at Percona just long enough to get acquainted with my colleagues, sure enough one of the first questions they asked him was: “Did you know sysbench-tpcc also works with PostgreSQL now?!”

sysbench

sysbench is “a scriptable multi-threaded benchmark tool based on LuaJIT (…) most frequently used for database benchmarks”, created and maintained by Alexey Kopytov. It’s been around for a long time now and has been a main reference for MySQL benchmarking since its inception. It is one of the favorites of Netflix’s Brendan Gregg, as we now know. You may remember Sveta Smirnova and Alexander Korotkov’s report on their experiments in Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads here. In fact, that post may serve as a nice prelude to the tests we want to show you today. It provides a good starting point as a MySQL vs PostgreSQL performance comparison.

The idea behind Sveta and Alexander’s experiments was “to provide an honest comparison for the two popular RDBMSs”, MySQL and PostgreSQL, using “the same tool, under the same challenging workloads and using the same configuration parameters (where possible)”. Since neither pgbench nor sysbench would work effectively with both MySQL and PostgreSQL for both writes and reads, they attempted to port pgbench’s workload as a sysbench benchmark.

sysbench-tpcc

More recently, Vadim came up with an implementation of the famous TPC-C workload benchmark for sysbench, sysbench-tpcc. He has since published a series of tests using Percona Server and MySQL, and worked to make it compatible with PostgreSQL too. For real now, hence the request that awaited us.

Our goal this time was less ambitious than Sveta and Alexander’s. We wanted to show you how we setup PostgreSQL to perform optimally for sysbench-tpcc, highlighting the settings we tuned the most to accomplish this. We ran our tests on the same box used by Vadim in his recent experiments with Percona Server for MySQL and MySQL.

A valid benchmark – benchmark rules

Before we present our results, we should note that there are several ways to speed up database performance. You may, for example, disable full_page_writes, which would make the server crash unsafe, or use a minimalistic wal_level mode, which would block replication capability. These would speed things up, but at the expense of reliability, making the server inappropriate for production usage.

For our benchmarks, we made sure we had all the necessary parameters in place to satisfy the following:

  1. ACID Compliance
  2. Point-in-time-recovery
  3. WALs usable by Replica/Slave for Replication
  4. Crash Recovery
  5. Frequent Checkpointing to reduce time for Crash Recovery
  6. Autovacuum

When we initially prepared sysbench-tpcc with PostgreSQL 10.3 the database size was 118 GB. By the time we completed the test, i.e. after 36000 seconds, the DB size had grown to 335 GB. We have a total of “only” 256 GB of memory available in this server; however, based on observations from pg_stat_database, pg_statio_user_tables and pg_statio_user_indexes, 99.7% of the blocks were always in memory:

postgres=# select ((blks_hit)*100.00)/(blks_hit+blks_read) AS "perc_mem_hit" from pg_stat_database where datname like 'sbtest';
   perc_mem_hit
---------------------
 99.7267224322546
(1 row)

Hence, we consider it to be an in-memory workload with the whole active data set in RAM. In this post we explain how we tuned our PostgreSQL Instance for an in-memory workload, as was the case here.

Preparing the database before running sysbench

In order to run sysbench-tpcc, we must first prepare the database and load some data. In our case, as mentioned above, this initial step resulted in a 118 GB database:

postgres=# select datname, pg_size_pretty(pg_database_size(datname)) as "DB_Size" from pg_stat_database where datname = 'sbtest';
 datname | DB_Size
---------+---------
 sbtest  | 118 GB
(1 row)

This may change depending on the arguments used. Here is the actual command we used to prepare the PostgreSQL Database for sysbench-tpcc:

$ ./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=120 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --db-driver=pgsql prepare

While we were loading the data, we wanted to see if we could speed up the process. Here are the customized PostgreSQL settings we used, some of them directly targeted at accelerating the data load:

shared_buffers = 192GB
maintenance_work_mem = '20GB'
wal_level = 'minimal'
autovacuum = 'OFF'
wal_compression = 'ON'
max_wal_size = '20GB'
checkpoint_timeout = '1h'
checkpoint_completion_target = '0.9'
random_page_cost = 1
max_wal_senders = 0
full_page_writes = ON
synchronous_commit = ON

We’ll discuss most of these parameters in the sections that follow, but we would like to highlight two of them here. We increased maintenance_work_mem to speed up index creation and max_wal_size to delay checkpointing further, but not too much — this is a write-intensive phase after all. Using these parameters it took us 33 minutes to complete the prepare stage, compared with 55 minutes when using the default parameters.

If you are not concerned about crash recovery or ACID, you could turn off full_page_writes, fsync and synchronous_commit. That would speed up the data load much more.

Running a manual VACUUM ANALYZE after sysbench-tpcc’s initial prepare stage

Once we had prepared the database, as it is a newly created DB Instance, we ran a manual VACUUM ANALYZE on the database (in parallel jobs) using the command below. We employed all the 56 vCPUs available in the server since there was nothing else running in the machine:

$ /usr/lib/postgresql/10/bin/vacuumdb -j 56 -d sbtest -z

Having run a vacuum for the entire database, we restarted PostgreSQL and cleared the OS cache before executing the benchmark in “run” mode. We repeated this process after each round.
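The exact commands for this step are not shown in the post; on Linux it typically looks something like the sketch below (the service manager and service name are assumptions; adjust for your distribution):

# stop PostgreSQL, flush dirty pages, drop the OS page cache, then start again
$ sudo systemctl stop postgresql
$ sync
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ sudo systemctl start postgresql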

First attempt with sysbench-tpcc

When we ran sysbench-tpcc for the first time, we observed a resulting TPS of 1978.48 for PostgreSQL with the server not properly tuned, running with default settings. We used the following command to run sysbench-tpcc for PostgreSQL for 10 hours (or 36000 seconds) for all rounds:

./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=36000 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --pgsql-password=oracle --db-driver=pgsql run

PostgreSQL performance tuning of parameters for sysbench-tpcc (crash safe)

After getting an initial idea of how PostgreSQL performed with the default settings and the actual demands of the sysbench-tpcc workload, we began making progressive adjustments in the settings, observing how they impacted the server’s performance. After several rounds we came up with the following list of parameters (all of these satisfy ACID properties):

shared_buffers = '192GB'
work_mem = '4MB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
wal_level = 'replica'
max_wal_senders = '3'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '1h'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
bgwriter_lru_maxpages = '1000'
bgwriter_lru_multiplier = '10.0'
logging_collector = 'ON'
wal_compression = 'ON'
log_checkpoints = 'ON'
archive_mode = 'ON'
full_page_writes = 'ON'
fsync = 'ON'

Let’s discuss our reasoning behind the tuning of the most important settings:

shared_buffers

Defines the amount of memory PostgreSQL uses for shared memory buffers. It’s arguably its most important setting, often compared (for better or worse) to MySQL’s innodb_buffer_pool_size. The biggest difference, if we dare to compare shared_buffers to the Buffer Pool, is that InnoDB bypasses the OS cache to directly access (read and write) data in the underlying storage subsystem, whereas PostgreSQL does not.

Does this mean PostgreSQL does “double caching” by first loading data from disk into the OS cache to then make a copy of these pages into the shared_buffers area? Yes.

Does this “double caching” make PostgreSQL inferior to InnoDB and MySQL in terms of memory management? No. We’ll discuss why that is the case in a follow-up blog post. For now it suffices to say that the actual performance depends on the workload (mix of reads and writes), the size of the “hot data” (the portion of the dataset that is most accessed and modified) and how often checkpointing takes place.

How we chose the setting for shared_buffers to optimize PostgreSQL performance

Due to these factors, the documented suggested formula of setting shared_buffers to 25% of RAM or the magic number of “8GB” is hardly ideal. What seems to be good reasoning, though, is this:

  • If you can fit the whole of your “hot data” in memory, then dedicating most of your memory to shared_buffers pays off nicely, making PostgreSQL behave as close to an in-memory database as possible.
  • If the size of your “hot data” surpasses the amount of memory you have available in the server, then you’re probably better off working with a much smaller shared_buffers area and relying more on the OS cache.

For this benchmark, considering the options we used, we found that dedicating 75% of all the available memory to shared_buffers is ideal. It is enough to fit the entire “hot data” and still leave sufficient memory for the OS to operate, handle connections and everything else.

work_mem

This setting defines the amount of memory that can be used by each query (not session) for internal sort operations (such as ORDER BY and DISTINCT), and hash tables (such as when doing hash-based aggregation). Beyond this, PostgreSQL moves the data into temporary disk files. The challenge is usually finding a good balance here. We want to avoid the use of temporary disk files, which slow down query completion and in turn may cause contention. But we don’t want to over-commit memory, which could even lead to OOM; working with high values for work_mem may be destructive when it is not really needed.

We analyzed the workload produced by sysbench-tpcc and found, with some surprise, that work_mem doesn’t play a role here, considering the queries that were executed. So we kept the default value of 4MB. Please note that this is seldom the case in production workloads, so it is important to always keep an eye on that parameter.
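One way to double-check a finding like this (my addition, using a standard catalog view rather than anything from the original post) is to see whether queries are spilling sorts or hashes to temporary files on disk:

-- Temporary files written by queries in the benchmark database; a steadily
-- growing count would suggest work_mem is too small for the workload.
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = 'sbtest';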

random_page_cost

This setting stipulates the cost that a non-sequentially-fetched disk page would have, and directly affects the query planner’s decisions. Going with a conservative value is particularly important when using high latency storage, such as spinning disks. This wasn’t our case, hence we could afford to equalize random_page_cost to seq_page_cost. So, we set this parameter to 1 as well, down from the default value of 4.

wal_level, max_wal_senders and archive_mode

To set up streaming replication wal_level needs to be set to at least “replica” and archive_mode must be enabled. This means the amount of WAL data produced increases significantly compared to when using default settings for these parameters, which in turn impacts IO. However, we considered these with a production environment in mind.

wal_compression

For this workload, we observed a total of 3359 GB of WAL produced with wal_compression disabled and 1962 GB with it enabled. We enabled wal_compression to reduce IO — the amount (and, most importantly, the rate) of WAL files being written to disk — at the expense of some additional CPU cycles. This proved to be very effective in our case as we had a surplus of CPU available.
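As a side note (my addition), wal_compression does not require a restart; assuming superuser access, it can be enabled on a running instance:

-- Enable WAL compression cluster-wide and reload the configuration.
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();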

checkpoint_timeout, checkpoint_completion_target and max_wal_size

We set checkpoint_timeout to 1 hour and checkpoint_completion_target to 0.9. This means a checkpoint is forced every hour, and PostgreSQL has 90% of the time before the next checkpoint to spread out the writes. However, a checkpoint is also forced when max_wal_size worth of WAL has been generated. With these parameters, for the sysbench-tpcc workload, we saw 3 to 4 checkpoints every hour, mostly because of the amount of WAL being generated.

In production environments we would always recommend you perform a manual CHECKPOINT before shutting down PostgreSQL in order to allow for a faster restart (recovery) time. In this context, issuing a manual CHECKPOINT took us between 1 and 2 minutes, after which we were able to restart PostgreSQL in just about 4 seconds. Please note that in our testing environment, taking time to restart PostgreSQL was not a concern, so working with this checkpoint rate benefited us. However, if you cannot afford a couple of minutes for crash recovery it is always suggested to force checkpointing to take place more often, even at the cost of some degraded performance.
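For reference, here is one way (not shown in the original post) to confirm how many checkpoints were triggered by the timeout versus by max_wal_size:

-- Scheduled (timed) vs. requested (WAL-size triggered) checkpoints since the
-- statistics were last reset.
SELECT checkpoints_timed, checkpoints_req, stats_reset
FROM pg_stat_bgwriter;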

full_page_writes, fsync and synchronous_commit

We set all of these parameters to ON to satisfy ACID properties.

autovacuum

We enabled autovacuum and other vacuum settings to ensure vacuum is performed in the background. We will discuss the importance of keeping autovacuum enabled in a production environment, as well as the danger of doing otherwise, in a separate post.

Amount of WAL (transaction logs) generated after 10 hours of sysbench-tpcc

Before we start to discuss the numbers, it is important to highlight that we enabled wal_compression before starting sysbench. As we mentioned above, the amount of WAL generated with wal_compression set to OFF (3359 GB) was considerably larger than the amount generated with compression enabled (1962 GB). We observed that enabling wal_compression resulted in an increase in TPS of 21%. No wonder: the production of WAL has an important impact on IO, so much so that it is very common to find PostgreSQL servers with dedicated storage for WALs only. Thus, it is important to highlight the fact that wal_compression may benefit write-intensive workloads by sparing IO at the expense of additional CPU usage.

To find out the total amount of WAL generated after 10 hours, we took note of the WAL offset from before we started the test and after the test completed:

WAL Offset before starting the sysbench-tpcc ⇒ 2C/860000D0
WAL Offset after 10 hours of sysbench-tpcc   ⇒ 217/14A49C50

and subtracted one from the other using pg_wal_lsn_diff, as follows:

postgres=# SELECT pg_size_pretty(pg_wal_lsn_diff('217/14A49C50','2C/860000D0'));
 pg_size_pretty
----------------
 1962 GB
(1 row)

1962 GB of WAL is a fairly big amount of transaction logs to produce over 10 hours, considering we had enabled wal_compression.
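For completeness, the WAL offsets used above can be captured with a standard function (my addition, assuming PostgreSQL 10 naming):

-- Record the current WAL write position before and after the benchmark run.
SELECT pg_current_wal_lsn();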

We contemplated making use of a separate disk to store WALs to find out by how much more a dedicated storage for transaction logs would benefit overall performance. However, we wanted to keep using the same hardware Vadim had used for his previous tests, so decided against this.

Crash unsafe parameters

Setting full_page_writes, fsync and synchronous_commit to OFF may speed up performance, but it is always crash unsafe unless we have adequate backups in place to cover for such situations. For example, if you are using a COW filesystem with journaling, you may be fine with full_page_writes set to OFF. This may not be true 100% of the time though.

However, we still want to share the results with the crash unsafe parameters mentioned in the paragraph above as a reference.

Results after 10 Hours of sysbench-tpcc for PostgreSQL with default, crash safe and crash unsafe parameters

Here are the final numbers we obtained after running sysbench-tpcc for 10 hours considering each of the scenarios above:

Parameters           | TPS
---------------------+---------
Default / Untuned    | 1978.48
Tuned (crash safe)   | 5736.66
Tuned (crash unsafe) | 7881.72

Did we expect to get these numbers? Yes and no.

Certainly we expected a properly tuned server to outperform one running with default settings considerably, but we can’t say we expected it to be almost three times better (2.899x). With PostgreSQL making use of the OS cache, it is not always the case that tuning shared_buffers in particular will make such a dramatic difference. By comparison, tuning MySQL’s InnoDB Buffer Pool almost always makes a difference. For PostgreSQL, high performance depends on the workload. In this case, for sysbench-tpcc benchmarks, tuning shared_buffers definitely makes a difference.

On the other hand, the additional speedup we experienced when using crash unsafe settings (roughly 4x the untuned baseline) was not much of a surprise.

[Chart: an alternative view of the PostgreSQL performance tuning benchmark results]

What did you think about this experiment? Please let us know in the comments section below and let’s get the conversation going.

Hardware spec
  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 10.3
  • sysbench-tpcc: https://github.com/Percona-Lab/sysbench-tpcc

The post Tuning PostgreSQL for sysbench-tpcc appeared first on Percona Database Performance Blog.

Percona Monitoring and Management: Look After Your pmm-data Container

June 14, 2018 - 5:58am

If you have already deployed PMM Server using Docker, you might be aware that we begin by creating a special container for persistent PMM data. In this post, I aim to explain the importance of the pmm-data container when you deploy PMM Server with Docker. By the end of this post, you will have a fair idea of why this Docker container is needed.

Percona Monitoring and Management (PMM) is a free and open-source solution for database troubleshooting and performance optimization that you can run in your own environment. It provides time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

What is the purpose of pmm-data?

Well, as simple as its name suggests, when PMM Server runs via Docker its data is stored in the pmm-data container. It’s a dedicated data-only container which you create with bind mounts using -v, i.e. data volumes, for holding persistent PMM data. We use pmm-data to compartmentalize the persistent data so you can more easily back up and move data consistently across instances or containers. It acts as a single access point from which other running containers (in this case pmm-server) can access data volumes.

The pmm-data container does not run, but data from the container is used by pmm-server to build graphs. PMM Server is the core of PMM: it aggregates collected data and presents it in the form of tables, dashboards, and graphs in a web interface.

Why do we use docker create?

The docker create command instructs the Docker daemon to create a writable container layer over the Docker image. When you execute docker create using the steps shown, it will create a Docker container named pmm-data and initialize data volumes (e.g. /opt/prometheus/data) using the -v flag in conjunction with the create command.

The -v option is used multiple times in current versions of PMM to mount multiple data volumes. This allows you to create the data volume container and then use its volumes from another container, i.e. pmm-server. We do not want to run the pmm-data container, only to create it. NB: the number of data volumes bind mounted may change between versions of PMM.

$ docker create \
  -v /opt/prometheus/data \
  -v /opt/consul-data \
  -v /var/lib/mysql \
  -v /var/lib/grafana \
  --name pmm-data \
  percona/pmm-server:latest /bin/true

Make sure that the data volumes you initialize with the -v option match those given in the example. PMM Server expects you to have bind mounted those directories exactly as demonstrated in the deployment steps. For using different mount points for a PMM deployment, please refer to this blog post. Data volumes are very useful: once designated and created, you can share them and include them as part of other containers. If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory. Data in the pmm-data volumes is actually hosted on the host’s filesystem.
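As a quick check (my addition, not from the original post), you can ask Docker where those volumes actually live on the host:

$ docker inspect --format '{{ json .Mounts }}' pmm-data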

Why does pmm-data not run?

As we used docker create and not docker run for pmm-data, this container does not run. It simply exists to make sure you retain all PMM data when you upgrade to a newer PMM Server image. The data volumes bind mounted on the pmm-data container are shared with the running pmm-server container because the --volumes-from option is used when launching pmm-server. Here we persist data using Docker without binding it to the pmm-server container, by storing the files on the host machine. As long as pmm-data exists, the data exists.
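For context, launching PMM Server against that data container broadly follows the pattern below (a sketch based on the standard PMM Docker deployment steps; the published port and image tag may differ in your environment):

$ docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  percona/pmm-server:latest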

You can stop, destroy, or replace a container. When a non-running container is using a volume, the volume is still available to Docker and is not removed automatically. You can easily replace the running pmm-server container with a newer version without any impact or loss of data. That is why, given the need to store persistent data, we do it in a data volume. In our case, the pmm-data container does not write to those volumes itself, as that could cause corruption.

Why can’t I remove the pmm-data container? What happens if I delete it?

Removing pmm-data container results in the loss of collected metrics data.

If you remove containers that mount volumes, including the initial pmm-server container, or any subsequently mounted containers such as pmm-server-2, you do not delete the volumes. This allows you to upgrade — or effectively migrate — data volumes between containers. Your data container might be based on an old version of the container image, with known security problems. It is not a big problem, since it doesn’t actually run anything, but it doesn’t feel right.

As noted earlier, pmm-data stores metrics data according to the retention settings. You should not remove or recreate the pmm-data container unless you need to wipe out all PMM data and start again. To delete the volume from disk, you must explicitly call docker rm -v against the container with a reference to the volume.

Some do’s and don’ts
  • Allocate enough disk space on the host for pmm-data to retain data.
    By default, Prometheus stores time-series data for 30 days, and QAN stores query data for 8 days.
  • Manage data retention appropriately according to the disk space you have available.
    You can take a backup of pmm-data by extracting the data from the container, to avoid data loss in any situation, using the steps mentioned here.

In case of any issues with metrics, here’s a good blog post regarding troubleshooting.

The post Percona Monitoring and Management: Look After Your pmm-data Container appeared first on Percona Database Performance Blog.

What is the Top Cause of Application Downtime Today?

June 14, 2018 - 2:43am

I frequently talk to our customer base about what keeps them up at night. While there is a large variance of answers, they tend to fall into one of two categories. The first is the conditioned fear of some monster lurking behind the scenes that could pounce at any time. The second, of course, is the actual monster of downtime on a critical system. Ask most tech folks and they will tell you outages seem to only happen late at night or early in the morning. And that they do keep them up.

Entire companies and product lines have been built around providing those in the IT world with some ability to sleep at night. Modern enterprises have spent millions to mitigate the risk and prevent their businesses from having a really bad day because of an outage. Cloud providers are attuned to the downtime dilemma and spend lots of time, money, and effort to build in redundancy and make “High Availability” (HA) as easy as possible. The frequency of “hardware” or server issues continues to dwindle.

Where does the downtime issue start?

In my discussions, most companies I have talked to say their number one cause of outages and customer interruptions is ultimately related to the deployment of new or upgraded code. Often I hear that the operations team has little or no involvement with an application until it’s put into production. It is a bit ironic that this is also the area where companies tend to drastically under-invest. They opt instead to invest in ways to “Scale Out or Up”. Or perhaps in how to survive asteroids hitting two out of three of their data centers.

Failing over broken or slow code from one server to another does not fix it. Adding more servers to distribute the load can mitigate a problem, but can also escalate the cost dramatically. In most cases, the solutions they apply don’t address the primary cause of the problems.

While there are some fantastic tools out there that can help with getting better visibility into code level issues — such as New Relic, AppDynamics and others — the real problem is that these often end up being used to diagnose issues after they have appeared in production. Most companies carry out some amount of testing before releasing code, but typically it is a fraction of what they should be doing. Working for a company that specializes in open source databases, we get a lot of calls on issues that have prevented companies’ end users from using critical applications. Many of these problems are fixable before they cost a loss of revenue and reputation.

I think it’s time technology companies start to rethink our QA, Testing, and Pre-Deployment requirements. How much time, effort, and money can we save if we catch these “monsters” before they make it into production?

Not to mention how much better our operations team will sleep . . .

The post What is the Top Cause of Application Downtime Today? appeared first on Percona Database Performance Blog.

Zone Based Sharding in MongoDB

June 13, 2018 - 6:58am

In this blog post, we will discuss how to use zone-based sharding to deploy a sharded MongoDB cluster in a customized manner, so that queries and data are redirected per geographical grouping. This feature of MongoDB is part of its Data Center Awareness, which allows queries to be routed to particular MongoDB deployments based on the physical locations or configurations of the mongod instances.

Before moving on, let’s have an overview of this feature. You might already have some questions about zone-based sharding. Was it recently introduced? If zone-based sharding is something we should use, then what about tag-aware sharding?

MongoDB has supported tag-aware sharding since its early versions. This means tagging a range of shard key values, associating that range with a shard, and redirecting operations to that specific tagged shard. Since version 3.4, tag-aware sharding has been referred to as zones. So the only change is the name, and this is the reason the sh.addShardTag(shard, tag) method is still used.

How it works
  1. With the help of a shard key, MongoDB allows you to create zones of sharded data – also known as shard zones.
  2. Each zone can be associated with one or more shards.
  3. Similarly, a shard can associate with any number of non-conflicting zones.
  4. MongoDB migrates chunks to the zone range in the selected shards.
  5. MongoDB routes reads and writes to the particular zone range that resides on particular shards.
Useful for what kind of deployments/applications?
  1. In cases where data needs to be routed to a particular shard due to some hardware configuration restrictions.
  2. Zones can be useful if there is the need to isolate specific data to a particular shard. For example, in the case of GDPR compliance that requires businesses to protect data and privacy for an individual within the EU.
  3. If an application is being used geographically and you want a query to route to the nearest shards for both reads and writes.
Let’s consider a Scenario

Consider the scenario of a school where some students are experts in Biology, but most students are experts in Maths. So we have more data for the Maths students compared to the Biology students. In this example, the deployment requires that the Maths students’ data is routed to the shard with the better configuration for handling a large amount of data. Both reads and writes will be served by specific shards. All the Biology students will be served by another shard. To implement this, we will add tags to deploy the zones to the shards.

For this scenario we have an environment with:

DB: “school”

Collection: “students”

Fields: “sId”, “subject”, “marks” and so on..

Indexed Fields: “subject” and “sId”

We enable sharding:

sh.enableSharding("school")

And create a shard key on “subject” and “sId”:

sh.shardCollection("school.students", {subject: 1, sId: 1});

We have two shards in our test environment

shards:

{ "_id" : "shard0000", "host" : "127.0.0.1:27001", "state" : 1 } { "_id" : "shard0001", "host" : "127.0.0.1:27002", "state" : 1 }

Zone Deployment

1) Disable balancer

To prevent migration of the chunks across the cluster, disable the balancer for the “students” collection:

mongos> sh.disableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Before proceeding further, make sure the balancer is not running. It is not a mandatory step, but it is always good practice to make sure no migration of chunks takes place while configuring zones.

mongos> sh.isBalancerRunning()
false

2) Add shard to the zone

A zone can be associated with a particular shard in the form of a tag, using sh.addShardTag(), so a tag will be added to each shard. Here we are considering two zones, so the tags “MATHS” and “BIOLOGY” need to be added.

mongos> sh.addShardTag( "shard0000" , "MATHS");
{ "ok" : 1 }
mongos> sh.addShardTag( "shard0001" , "BIOLOGY");
{ "ok" : 1 }

We can see zones are assigned in the form of tags as required against each shard.

mongos> sh.status()
shards:
  { "_id" : "shard0000", "host" : "127.0.0.1:27001", "state" : 1, "tags" : [ "MATHS" ] }
  { "_id" : "shard0001", "host" : "127.0.0.1:27002", "state" : 1, "tags" : [ "BIOLOGY" ] }

3) Define ranges for each zone

Each zone covers one or more ranges of shard key values. Note: each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.

mongos> sh.addTagRange( "school.students", { "subject" : "maths", "sId" : MinKey}, { "subject" : "maths", "sId" : MaxKey}, "MATHS" )
{ "ok" : 1 }
mongos> sh.addTagRange( "school.students", { "subject" : "biology", "sId" : MinKey}, { "subject" : "biology", "sId" : MaxKey}, "BIOLOGY" )
{ "ok" : 1 }

4) Enable balancer

Now enable the balancer so the chunks will migrate across the shards as per the requirement and all the read and write queries will be routed to the particular shards.

mongos> sh.enableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
mongos> sh.isBalancerRunning()
true

Let’s check how documents get routed as per the tags:

We have inserted 6 documents: 4 documents with “subject”:”maths” and 2 documents with “subject”:”biology”.

mongos> db.students.find({"subject":"maths"}).count()
4
mongos> db.students.find({"subject":"biology"}).count()
2

Checking the shard distribution for the students collection:

mongos> db.students.getShardDistribution()

Shard shard0000 at 127.0.0.1:27003
 data : 236B docs : 4 chunks : 4
 estimated data per chunk : 59B
 estimated docs per chunk : 1

Shard shard0001 at 127.0.0.1:27004
 data : 122B docs : 2 chunks : 1
 estimated data per chunk : 122B
 estimated docs per chunk : 2

So in this test case, all the queries for the students collection have been routed as per the tags used, with four documents inserted into shard0000 and two documents inserted into shard0001.

Any queries related to MATHS will route to shard0000 and queries related to BIOLOGY will route to shard0001, hence the load will be distributed as per the configuration of the shard, keeping the database performance optimized.

Sharding MongoDB using zones is a great feature provided by MongoDB. With the help of zones, data can be isolated to specific shards. And if we have any kind of hardware or configuration restrictions on certain shards, zones offer a possible solution for routing operations accordingly.

The post Zone Based Sharding in MongoDB appeared first on Percona Database Performance Blog.

Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide

June 13, 2018 - 1:04am

Please join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt as he presents MongoDB Backup and Recovery Field Guide on Thursday, June 14, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.

Register for the webinar

Tim Vaillancourt Senior Technical Operations Architect

With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona, Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (the “Battlefield” series, the “Mirror’s Edge” series, “Star Wars: Battlefront”) launch and operate smoothly, while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS.

Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide appeared first on Percona Database Performance Blog.

Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases

June 12, 2018 - 4:48am

Please join Percona’s CEO, Peter Zaitsev as he presents Performance Analysis and Troubleshooting Methodologies for Databases on Wednesday, June 13th, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Register Now

 

Have you heard about the USE Method (Utilization – Saturation – Errors), RED (Rate – Errors – Duration), or the Golden Signals (Latency – Traffic – Errors – Saturation)?

In this presentation, we will talk briefly about these different-but-similar “focuses”. We’ll discuss how we can apply them to data infrastructure performance analysis, troubleshooting, and monitoring.

We will use MySQL as an example, but most of this talk applies to other database technologies too.

Register for the webinar

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases appeared first on Percona Database Performance Blog.

ProxySQL Experimental Feature: Native ProxySQL Clustering

June 11, 2018 - 5:18am

ProxySQL 1.4.2 introduced native clustering, allowing several ProxySQL instances to communicate with and share configuration updates with each other. In this blog post, I’ll review this new feature and how we can start working with 3 nodes.

Before I continue, let’s review two common ways of installing ProxySQL.

ProxySQL as a centralized server

This is the most common installation, where ProxySQL is between application servers and the database. It is simple, but without any high availability. If ProxySQL goes down you lose all connectivity to the database.

ProxySQL on app instances

Another common setup is to install ProxySQL onto each application server. This is good because the loss of one ProxySQL/App server will not bring down the entire application.

For more information about these installation options, please see Where Do I Put ProxySQL?

Sometimes our application and databases grow fast. Maybe you need to add a load balancer, for example, and at that moment you start thinking … “What could I do to configure and maintain all these ProxySQL nodes without mistakes?”

To do that, there are many tools like Ansible, Puppet, and Chef, but you will need to write, create, and maintain scripts to do those tasks. This is really difficult for one person to administer.

Now, there is a native solution, built into ProxySQL, to create and administer a cluster in an easy way.

At the moment this feature is EXPERIMENTAL and subject to change. Think very carefully before installing it in production; in fact, I strongly recommend you wait. However, if you would like to start testing this feature, you need to install ProxySQL 1.4.2 or later.

This clustering feature is really useful if you have installed one ProxySQL per application instance, because all the changes in one of the ProxySQL nodes will be propagated to all the other ProxySQL nodes. You can also configure a “master-slave” style setup with ProxySQL clustering.
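As a rough illustration of that “master-slave” idea: each node only pulls configuration from the peers listed in its own proxysql_servers table (covered below), so you can make the topology asymmetric. A minimal sketch, assuming the default local admin credentials admin:admin on port 6032 and the primary IP used later in this post:

# On each "follower" node only: list just the "master" ProxySQL as a peer
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
INSERT INTO proxysql_servers (hostname, port, weight, comment) VALUES ('10.138.180.183', 6032, 100, 'PRIMARY');
LOAD PROXYSQL SERVERS TO RUNTIME;
SAVE PROXYSQL SERVERS TO DISK;"
# Leave proxysql_servers empty on the "master" itself, so it never pulls
# configuration back from the followers -- changes then flow in one direction only.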

There are only 4 tables where you can make changes and propagate the configuration:

  • mysql_query_rules
  • mysql_servers
  • mysql_users
  • proxysql_servers
How does it work?

It’s easy. When you make a change like INSERT/DELETE/UPDATE on any of these tables and then run LOAD … TO RUNTIME, ProxySQL creates a new checksum of the table’s data and increments the version number in the table runtime_checksums_values. Below we can see an example.

admin ((none))>SELECT name, version, FROM_UNIXTIME(epoch), checksum FROM runtime_checksums_values ORDER BY name;
+-------------------+---------+----------------------+--------------------+
| name              | version | FROM_UNIXTIME(epoch) | checksum           |
+-------------------+---------+----------------------+--------------------+
| admin_variables   | 0       | 1970-01-01 00:00:00  |                    |
| mysql_query_rules | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
| mysql_servers     | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
| mysql_users       | 4       | 2018-04-26 18:36:12  | 0x2F35CAB62143AE41 |
| mysql_variables   | 0       | 1970-01-01 00:00:00  |                    |
| proxysql_servers  | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
+-------------------+---------+----------------------+--------------------+

Internally, all nodes monitor and communicate with all the other ProxySQL nodes. When a node detects a change in both the checksum and the version of a table on a peer, it fetches a copy of the modified table, applies the same changes locally, loads the new config to RUNTIME to make it visible to connected applications, and automatically saves it to DISK for persistence.

The following setup creates a “synchronous cluster” so any changes to these 4 tables on any ProxySQL server will be replicated to all other ProxySQL nodes. Be careful!

How can I start testing this new feature?

1) To start we need at least 2 nodes. Download and install ProxySQL 1.4.2 or later and start from a clean configuration.

2) On all nodes, we need to update the following global variables. These changes will set the username and password used by each node’s internal communication to cluster1/clusterpass. These must be the same on all nodes in this cluster.

update global_variables set variable_value='admin:admin;cluster1:clusterpass' where variable_name='admin-admin_credentials';
update global_variables set variable_value='cluster1' where variable_name='admin-cluster_username';
update global_variables set variable_value='clusterpass' where variable_name='admin-cluster_password';
update global_variables set variable_value=200 where variable_name='admin-cluster_check_interval_ms';
update global_variables set variable_value=100 where variable_name='admin-cluster_check_status_frequency';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_query_rules_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_servers_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_users_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_proxysql_servers_save_to_disk';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_query_rules_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_servers_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_users_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_proxysql_servers_diffs_before_sync';
load admin variables to RUNTIME;
save admin variables to disk;
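Before adding the peers, it can be worth double-checking that these settings actually reached the runtime configuration on every node. A minimal sketch, assuming the default admin:admin credentials on the local admin port 6032:

mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
SELECT variable_name, variable_value
FROM runtime_global_variables
WHERE variable_name LIKE 'admin-cluster%' OR variable_name = 'admin-admin_credentials';"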

3) Add all IPs from the other ProxySQL nodes into each other node:

INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.180.183',6032,100,'PRIMARY');
INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.244.108',6032,99,'SECONDARY');
INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.244.244',6032,98,'SECONDARY');
LOAD PROXYSQL SERVERS TO RUNTIME;
SAVE PROXYSQL SERVERS TO DISK;

At this moment, we have all nodes synced.
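Besides reading the log, you can confirm the sync state from any node by querying the cluster statistics tables in the admin interface. A small sketch, again assuming the default local admin credentials:

mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) AS epoch, checksum
FROM stats_proxysql_servers_checksums
ORDER BY hostname, name;"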

In the next example from the log file, we can see when node1 detected node2.

[root@proxysql1 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 11:19:51 [INFO] Cluster: Fetching ProxySQL Servers from peer 10.138.244.108:6032 started
2018-05-10 11:19:51 [INFO] Cluster: Fetching ProxySQL Servers from peer 10.138.244.108:6032 completed
2018-05-10 11:19:51 [INFO] Cluster: Loading to runtime ProxySQL Servers from peer 10.138.244.108:6032
2018-05-10 11:19:51 [INFO] Destroyed Cluster Node Entry for host 10.138.148.242:6032
2018-05-10 11:19:51 [INFO] Cluster: Saving to disk ProxySQL Servers from peer 10.138.244.108:6032
2018-05-10 11:19:52 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.138.180.183:6032, version 6, epoch 1525951191, checksum 0x3D819A34C06EF4EA . Not syncing yet ...
2018-05-10 11:19:52 [INFO] Cluster: checksum for proxysql_servers from peer 10.138.180.183:6032 matches with local checksum 0x3D819A34C06EF4EA , we won't sync.
2018-05-10 11:19:52 [INFO] Cluster: closing thread for peer 10.138.148.242:6032
2018-05-10 11:19:52 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.138.244.244:6032, version 4, epoch 1525951163, checksum 0x3D819A34C06EF4EA . Not syncing yet ...
2018-05-10 11:19:52 [INFO] Cluster: checksum for proxysql_servers from peer 10.138.244.244:6032 matches with local checksum 0x3D819A34C06EF4EA , we won't sync
...

Another example is adding users to the mysql_users table. Remember, these users enable MySQL connections between the application (frontend) and MySQL (backend).

We will add a new username and password on any server; in my test I’ll use node2:

admin proxysql2 ((none))>INSERT INTO mysql_users(username,password) VALUES ('user1','crazyPassword');
Query OK, 1 row affected (0.00 sec)
admin proxysql2 ((none))>LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

In the log file from node3, we can see the update immediately:

[root@proxysql3 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.244.108:6032, version 2, epoch 1525951873, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873, diff_check 3. Own version: 1, epoch: 1525950968. Proceeding with remote sync
2018-05-10 11:30:57 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873, diff_check 4. Own version: 1, epoch: 1525950968. Proceeding with remote sync
2018-05-10 11:30:57 [INFO] Cluster: detected peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873
2018-05-10 11:30:57 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 started
2018-05-10 11:30:57 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 completed
2018-05-10 11:30:57 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.244.108:6032
2018-05-10 11:30:57 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.244.108:6032
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.244.244:6032, version 2, epoch 1525951857, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: checksum for mysql_users from peer 10.138.244.244:6032 matches with local checksum 0x2AF43564C9985EC7 , we won't sync.
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.180.183:6032, version 2, epoch 1525951886, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: checksum for mysql_users from peer 10.138.180.183:6032 matches with local checksum 0x2AF43564C9985EC7 , we won't sync.
...

What happens if some node is down?

In this example, we will find out what happens if one node is down, has a network glitch, or suffers some other issue. I’ll stop ProxySQL on node3:

[root@proxysql3 ~]# service proxysql stop
Shutting down ProxySQL: DONE!

On ProxySQL node1, we can check that node3 is unreachable:

[root@proxysql1 ~]# tailf /var/lib/proxysql/proxysql.log
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)

And another check can be run in any ProxySQL node like node2, for example:

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='proxysql_servers' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:01:59 |
| 10.138.244.108 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:38 | 2018-05-10 12:01:59 |
| 10.138.244.244 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 11:56:59 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

In the previous result, we can see that node3 (10.138.244.244) is not being updated; its updated_at column should show a later datetime. This means that node3 is not running (it is down, or there is a network glitch).

At this point, any change to any of the tables, mysql_query_rules, mysql_servers, mysql_users, proxysql_servers, will be replicated between nodes 1 & 2.

In this next example, while node3 is offline, we will add another user to mysql_users table.

admin proxysql2 ((none))>INSERT INTO mysql_users(username,password) VALUES ('user2','passwordCrazy');
Query OK, 1 row affected (0.00 sec)
admin proxysql2 ((none))>LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

That change was propagated to node1:

[root@proxysql1 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 12:12:36 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343, diff_check 4. Own version: 2, epoch: 1525951886. Proceeding with remote sync
2018-05-10 12:12:36 [INFO] Cluster: detected peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343
2018-05-10 12:12:36 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 started
2018-05-10 12:12:36 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 completed
2018-05-10 12:12:36 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.244.108:6032
2018-05-10 12:12:36 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.244.108:6032
...

About 25 minutes later, we can see that node3 is still out of sync.

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
| 10.138.244.108 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:38 | 2018-05-10 12:21:35 |
| 10.138.244.244 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

Let’s start node3 and check whether the sync works. node3 should connect to the other nodes and fetch the latest changes.

[root@proxysql3 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 12:30:02 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343, diff_check 3. Own version: 1, epoch: 1525955402. Proceeding with remote sync
2018-05-10 12:30:02 [INFO] Cluster: detected a peer 10.138.180.183:6032 with mysql_users version 3, epoch 1525954356, diff_check 3. Own version: 1, epoch: 1525955402. Proceeding with remote sync
...
2018-05-10 12:30:03 [INFO] Cluster: detected peer 10.138.180.183:6032 with mysql_users version 3, epoch 1525954356
2018-05-10 12:30:03 [INFO] Cluster: Fetching MySQL Users from peer 10.138.180.183:6032 started
2018-05-10 12:30:03 [INFO] Cluster: Fetching MySQL Users from peer 10.138.180.183:6032 completed
2018-05-10 12:30:03 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.180.183:6032
2018-05-10 12:30:03 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.180.183:6032

Looking at the status from the checksum table, we can see node3 is now up to date.

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3928F574AFFF4C65 | 2018-05-10 12:12:24 | 2018-05-10 12:31:58 |
| 10.138.244.108 | 0x3928F574AFFF4C65 | 2018-05-10 12:12:23 | 2018-05-10 12:31:58 |
| 10.138.244.244 | 0x3928F574AFFF4C65 | 2018-05-10 12:30:19 | 2018-05-10 12:31:58 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

Now we have 3 ProxySQL nodes up to date. This example didn’t add any MySQL servers, hostgroups, etc., because the procedure is the same. The post is intended as an introduction to this new feature and to how you can create and test a ProxySQL cluster.

Just remember that this is still an experimental feature and is subject to change with newer versions of ProxySQL.

Summary

This feature is really useful if you have more than one ProxySQL instance running for the same application. It makes the cluster easy for a single person to configure and maintain, and it is simple to create and attach new nodes.

Hope you find this post helpful!

References

http://www.proxysql.com/blog/proxysql-cluster
http://www.proxysql.com/blog/proxysql-cluster-part2
http://www.proxysql.com/blog/proxysql-cluster-part3-mysql-servers
https://github.com/sysown/proxysql/wiki/ProxySQL-Cluster

The post ProxySQL Experimental Feature: Native ProxySQL Clustering appeared first on Percona Database Performance Blog.

This Week in Data with Colin Charles 40: a Peek at Blockchain, Lots of MariaDB News, then Back on the Road

June 1, 2018 - 9:45am

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Shortly after the last dispatch, I jetted off for a spot of vacation (which really meant I was checking out the hype behind Blockchain with a database developer lens at the Blockchain Week NYC), and then some customer visits in Seoul, which explains the short hiatus. Here’s to making this more regular as the summer approaches.

I am about to embark on a fairly long trip, covering a few upcoming appearances: Lisbon for the Percona Engineering meeting, SouthEastLinuxFest in Charlotte, the Open Source Data Centre Conference in Berlin and then the DataOps Barcelona event. I have some discount codes: 50% discount for OSDC with the code OSDC_FOR_FRIENDS, and 50% discount for DataOps Barcelona with the code dataopsbcn50. Expect this column to reflect my travels over the next few weeks.

There has been a lot of news on the MariaDB front: MariaDB 10.3.7 went stable/GA! You might have noticed more fanfare around the release name MariaDB TX 3.0, but the reality is you can still get this download from your usual MariaDB Foundation site. It is worth noting that the MariaDB Foundation 2017 financials have also been released. Some may have noticed a couple months back there was a press release titled Report “State of the Open-Source DBMS Market, 2018” by Gartner Includes Pricing Comparison With MariaDB. This led to a Gartner report on the State of the Open-Source DBMS Market, 2018; although the report has since been pulled. Hopefully we see it surface again.

In the meantime, please do try out MariaDB 10.3.7 and it would be great to hear feedback. I also have an upcoming Percona webinar on MariaDB Server 10.3 on June 26 2018 — when the sign up link appears, I will be sure to include it here.

Well written, and something worth discussing: Should Red Hat Buy or Build a Database?. The Twitter discussion is also worth looking at.

Releases

Link List

Upcoming appearances

Feedback

I look forward to receiving feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 40: a Peek at Blockchain, Lots of MariaDB News, then Back on the Road appeared first on Percona Database Performance Blog.

Percona Server for MySQL 5.7.22-22 Is Now Available

May 31, 2018 - 11:06am

Percona announces the GA release of Percona Server for MySQL 5.7.22-22 on on May 31, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.22, including all the bug fixes in it, Percona Server for MySQL 5.7.22-22 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
  • A new --encrypt-tmp-files option turns on encryption for the temporary files which Percona Server may create on disk for filesort, binary log transactional caches and Group Replication caches.
Bugs Fixed:
  • Executing the SHOW GLOBAL STATUS expression could cause “data drift” on global status variables in case of a query rollback: the variable, being by its nature a counter and allowing only an increase, could return to its previous value. Bug fixed #3951 (upstream #90351).
  • NUMA support was improved in Percona Server by reverting the upstream implementation back to the original one, because the upstream variant was less effective in memory allocation. Now the innodb_numa_interleave variable not only enables the NUMA interleave memory policy for the InnoDB buffer pool allocation, but also forces NUMA interleaved allocation at buffer pool initialization time. Bug fixed #3967.
  • audit_log_include_accounts variable did not take effect if placed in my.cnf configuration file, while still working as intended if set dynamically. Bug fixed #3867.
  • A key_block_size value was set automatically by the Improved MEMORY Storage Engine, which resulted in warnings when changing the engine type to InnoDB, and constantly growing key_block_size during alter operations. Bugs fixed #3936, #3940, and #3943.
  • Fixes were introduced to remove GCC 8 compilation warnings for the Percona Server build. Bug fixed #3950.
  • An InnoDB Memcached Plugin code clean-up was backported from MySQL 8.0. Bug fixed  #4506.
  • Percona Server could not be built with the -DWITH_LZ4=system option on Ubuntu 14.04 (Trusty) because the available LZ4 packages were too old. Bug fixed #3842.
  • A regression introduced during the TokuDB code clean-up in 5.7.21-21 caused an assertion in cases when the FT layer returns an error during an alter table operation. Bug fixed #4294.
MyRocks Changes and fixes:
  • UPDATE statements returned incorrect results because a full table scan was not performed on tables with a unique secondary index. Bug fixed #4495 (upstream facebook/mysql-5.6#830).
Other Bugs Fixed:
  • #4451 “Implement better compression algo testing”
  • #4469 “variable use out of scope bug in get_last_key test detected by ASAN in clang 6”
  • #4470 “the cachetable-simple-pin-nonblocking-cheap test occasionally fails due to a locking conflict with the cachetable evictor”
  • #4488 “-Werror is always disabled for innodb_memcached”
  • #1114 “Assertion `inited == INDEX’ failed”
  • #1130 “RBR Replication with concurrent XA in READ-COMMITTED takes supremum pseudo-records and breaks replication”

Find the release notes for Percona Server for MySQL 5.7.22-22 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.7.22-22 Is Now Available appeared first on Percona Database Performance Blog.

Don’t Drown in your Data Lake

May 31, 2018 - 10:37am

A data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”1. Many companies find value in using a data lake but aren’t clear that they need to properly plan for it and maintain it in order to prevent issues.

The idea of a data lake rose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating and managing your data lake. The cloud is often viewed as an ideal place for your data lake since it is inherently elastic and can expand to meet the needs of your data.

Data Lake or Data Swamp?

One of the key components of a functioning data lake is the continuing inflow and egress of data. Some data must be kept indefinitely but some can be archived or deleted after a defined period of time. Failure to remove stale data can result in a data swamp, where the out of date data is taking up valuable and costly space and may be causing queries to take longer to complete. This is one of the first issues that companies encounter in maintaining their data lake. Often, people view the data lake as a “final resting place” for data, but it really should be used for data that is accessed often, or at least occasionally.

A natural spring-fed lake can turn into a swamp due to a variety of factors. If fresh water is not allowed to flow into the lake, this can cause stagnation, meaning that plants and animals that previously were not able to be supported by the lake take hold. Similarly, if water cannot exit the lake at some point, the borders will be breached, and the surrounding land will be inundated. Both of these conditions can cause a once pristine lake to turn into a fetid and undesirable swamp. If data is no longer being added to your data lake, the results will become dated and eventually unreliable. Also, if data is always being added to the lake but is not accessed on a regular basis, this can lead to unrestricted growth of your data lake, with no real plan for how the data will be used. This can become an expensive “cold storage” facility that is likely more expensive than archived storage.

If bad or undesirable items, like old cars or garbage, are thrown into a lake, this can damage the ecosystem, causing unwanted reactions. In a data lake, this is akin to simply throwing data into the data lake with no real rules or rationale. While the data is saved, it may not be useful and can cause negative consequences across the whole environment since it is consuming space and may slow response times. Even though a basic concept of a data lake is that the data does not need to conform to a predefined structure, like you would see with a relational database, it is important that some rules and guidelines exist regarding the type and quality of data that is included in the lake. In the absence of some guidelines, it becomes difficult to access the relevant data for your needs. Proper definition and tagging of content help to ensure that the correct data is accessible and available when needed.

Unrestricted Growth Consequences

Many people have a junk drawer somewhere in their house; a drawer that is filled with old receipts, used tickets, theater programs, and the like. Some of this may be stored for sentimental reasons, but a lot of it is put into this drawer since it was a convenient dropping place for things. Similarly, if we look to the data lake as the “junk drawer” for our company, it is guaranteed to be bigger and more expensive than it truly needs to be.

It is important that the data that is stored in your data lake has a current or expected purpose. While you may not have a current use for some data, it can be helpful to keep it around in case a need arises. An example of this is in the area of machine learning. Providing more ancillary data enables better decisions since it provides a deeper view into the decision process. Therefore, maintaining some data that may not have a specific and current need can be helpful. However, there are cases where maintaining a huge volume of data can be counterproductive. Consider temperature information delivered from a switch. If the temperature reaches a specific threshold, the switch should be shut down. Reporting on the temperature in an immediate and timely manner is important to make an informed decision, but stable temperature data from days, weeks, or months ago could be summarized and stored in a more efficient manner. The granular details can then be purged from the lake.

So, where is the balance? If you keep all the data, it can make your data lake unwieldy and costly. If you only keep data that has a specific current purpose, you may be impairing your future plans. Obviously, the key is to monitor your access and use of the data frequently, and purge or archive some of the data that is not being regularly used.

Uncontrolled Access Concerns

Since much of the data in your data lake is company confidential, it is imperative that access to that data be controlled. The fact that the data in the lake is stored in its raw format means that it is more difficult to control access. The structures of a relational database provide some of the basis for access control, allowing us to limit who has access to specific queries, tables, fields, schemas, databases, and other objects. In the absence of these structures, controlling access requires more finesse. Determining who has access to what parts of the data in the lake must be handled, as well as isolating the data within your own network environment. Many of these restrictions may already be in place in your current environment, but they should be reviewed before being relied on fully, since the data lake may store information that was previously unavailable to some users. Access should be regularly reviewed to identify potential rogue activities. Encryption options also exist to further secure the data from unwanted access, and file system security can be used to limit access. All of these components must be considered, implemented, and reviewed to ensure that the data is secure.

User Considerations

In a relational database, the data structure inherently determines some of the consistencies and format of the data. This enables users to easily query the data and be assured that they are returning valid results. The lack of such structures in the data lake means that users must be more highly skilled at data manipulation. Having users with less skill accessing the data is possible, but it may not provide the best results. A data scientist is better positioned to access and query the complete data set. Obviously, users with a higher skill set are rare and cost more to hire, but the return may be worth it in the long run.

So What Do I Do Now?

This is an area where there are no hard and fast rules. Each company must develop and implement processes and procedures that make sense for their individual needs. Only with a plan for monitoring inputs, outputs, access patterns, and the like are you able to make a solid determination for your company’s needs. Percona can help to determine a plan for reporting usage, assess security settings, and more. As you are using the data in your data lake, we can also provide guidance regarding tools used to access the data.

1 Wikipedia, May 22, 2018

The post Don’t Drown in your Data Lake appeared first on Percona Database Performance Blog.

MongoDB: deploy a replica set with transport encryption (part 3/3)

May 31, 2018 - 7:59am

In this third and final post of the series, we look at how to configure transport encryption on a deployed MongoDB replica set. Security vulnerabilities can arise when internal personnel have legitimate access to the private network, but should not have access to the data. Encrypting intra-node traffic ensures that no one can “sniff” sensitive data on the network.

In part 1 we described MongoDB replica sets and how they work.
In part 2 we provided a step-by-step guide to deploy a simple 3-node replica set, including information on replica set configuration.

Enable Role-Based Access Control

In order for encryption to be used in our replica set, we first need to activate Role-Based Access Control (RBAC). By default, a MongoDB installation permits anyone to connect and see the data, as in the sample deployment we created in part 2. Having RBAC enabled is mandatory for encryption.

RBAC governs access to a MongoDB system. Users are created and assigned privileges to access specific resources, such as databases and collections. Likewise, for carrying out administrative tasks, users need to be created with specific grants. Once activated, every user must authenticate themselves in order to access MongoDB.

Prior to activating RBAC, let’s create an administrative user. We’ll connect to the PRIMARY member and do the following:

rs-test:PRIMARY> use admin
switched to db admin
rs-test:PRIMARY> db.createUser({user: 'admin', pwd: 'secret', roles:['root']})
Successfully added user: { "user" : "admin", "roles" : [ "root" ] }

Let’s activate the RBAC in the configuration file /etc/mongod.conf on each node

security:
  authorization: enabled

and restart the daemon

sudo service mongod restart

Now to connect to MongoDB we issue the following command:

mongo -u admin -p secret --authenticationDatabase "admin"

Certificates

MongoDB supports X.509 certificate authentication for use with a secure TLS/SSL connection. The members can use X.509 certificates to verify their membership of the replica set.

In order to use encryption, we need to create certificates on all the nodes and have a certification authority (CA) that signs them. Since having a certification authority can be quite costly, we decide to use self-signed certificates. For our purposes, this solution ensures encryption and has no cost. Using a public CA is not necessary inside a private infrastructure.

To proceed with certificate generation we need to have openssl installed on our system and certificates need to satisfy these requirements:

  • any certificate needs to be signed by the same CA
  • the common name (CN) required during the certificate creation must correspond to the hostname of the host
  • any other field requested in the certificate creation should be a non-empty value and, hopefully, should reflect our organization details
  • it is also very important that all the fields, except the CN, should match those from the certificates for the other cluster members

The following guide describes all the steps to configure internal X.509 certificate-based encryption.

1 – Connect to one of the hosts and generate a new private key using openssl

openssl genrsa -out mongoCA.key -aes256 8192

We have created a new 8192-bit private key and saved it in the file mongoCA.key.
Remember to enter a strong passphrase when requested.

2 – Sign a new CA certificate

Now we are going to create our “fake” local certification authority that we’ll use later to sign each node certificate.

During certificate creation, some fields must be filled out. We could choose these values arbitrarily, but they should correspond to our organization’s details.

root@psmdb1:~# openssl req -x509 -new -extensions v3_ca -key mongoCA.key -days 365 -out mongoCA.crt
Enter pass phrase for mongoCA.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:San Francisco
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company Ltd
Organizational Unit Name (eg, section) []:DBA
Common Name (e.g. server FQDN or YOUR name) []:psmdb
Email Address []:corrado@mycompany.com

3 – Issue self-signed certificates for all the nodes

For each node, we need to generate a certificate request and sign it using the CA certificate we created in the previous step.

Remember: fill out all the requested fields the same way for each host, but use a different common name (CN) for each one, matching that host’s hostname.

For the first node issue the following commands.

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb1.key -out psmdb1.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb1.csr -out psmdb1.crt
cat psmdb1.key psmdb1.crt > psmdb1.pem

for the second node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb2.key -out psmdb2.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb2.csr -out psmdb2.crt
cat psmdb2.key psmdb2.crt > psmdb2.pem

and for the third node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb3.key -out psmdb3.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb3.csr -out psmdb3.crt
cat psmdb3.key psmdb3.crt > psmdb3.pem
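Before distributing the files, it may be worth verifying that each certificate is really signed by our CA and carries the expected common name. These are standard openssl checks, shown here for the first node; the commented lines only indicate the kind of output to expect:

openssl verify -CAfile mongoCA.crt psmdb1.crt
# psmdb1.crt: OK
openssl x509 -in psmdb1.crt -noout -subject
# the CN in the printed subject must match the node's hostname (psmdb1)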

4 – Place the files

We could execute all of the commands in the previous step on the same host, but now we need to copy the generated files to the proper nodes:

  • Copy the CA certificate file mongoCA.crt to each node
  • Copy each self-signed certificate <hostname>.pem to the relevant member
  • Create on each member a directory that only the MongoDB user can read, and copy both files there

sudo mkdir -p /etc/mongodb/ssl
sudo chmod 700 /etc/mongodb/ssl
sudo chown -R mongod:mongod /etc/mongodb
sudo cp psmdb1.pem /etc/mongodb/ssl
sudo cp mongoCA.crt /etc/mongodb/ssl

Do the same on each host.

5 – Configure mongod

Finally, we need to tell mongod about the certificates in order to enable encryption.

Change the configuration file /etc/mongod.conf on each host adding the following rows:

net:
  port: 27017
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/mongodb/ssl/psmdb1.pem
    CAFile: /etc/mongodb/ssl/mongoCA.crt
    clusterFile: /etc/mongodb/ssl/psmdb1.pem
security:
  authorization: enabled
  clusterAuthMode: x509

Restart the daemon

sudo service mongod restart

Make sure to use the proper file names on each host (psmdb2.pem on the psmdb2 host, and so on).

Now, as long as we have made no mistakes, we have a properly configured replica set that is using encrypted connections.

Issue the following command to connect to node psmdb1:

mongo admin --ssl --sslCAFile /etc/mongodb/ssl/mongoCA.crt --sslPEMKeyFile /etc/mongodb/ssl/psmdb1.pem -u admin -p secret --host psmdb1
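As a final sanity check, you may also want to confirm that unencrypted connections are now refused. With the ssl mode set to requireSSL, a plain connection attempt such as the one below is expected to fail during the handshake:

# expected to fail: the server now only accepts TLS connections
mongo admin -u admin -p secret --authenticationDatabase "admin" --host psmdb1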

Access the first two articles in this series
  • Part 1: Introduces basic replica set concepts, how a replica set works, and what its main features are
  • Part 2: Provides a step-by-step guide to configure a three-node replica set

The post MongoDB: deploy a replica set with transport encryption (part 3/3) appeared first on Percona Database Performance Blog.

Percona Server for MySQL 5.6.40-84.0 Is Now Available

May 30, 2018 - 11:55am

Percona announces the release of Percona Server for MySQL 5.6.40-84.0 on May 30, 2018 (downloads are available here and from the Percona Software Repositories). Based on MySQL 5.6.40, including all the bug fixes in it, Percona Server for MySQL 5.6.40-84.0 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

New Features
  • A new string variable version_suffix allows you to change the suffix of the Percona Server version string returned by the read-only version variable. This lets you append a custom suffix to the server version number to reflect build or configuration specifics. Also, version_comment (whose default value is taken from the CMake COMPILATION_COMMENT option) was converted from a global read-only to a global read-write variable and is therefore now customizable.
  • The query response time plugin can now be disabled at the session level with the new variable query_response_time_session_stats.
Bugs Fixed
  • A compilation warning was fixed for the -DWITH_QUERY_RESPONSE_TIME=ON CMake compilation option, which links QRT statically. Bug fixed #3841.
  • A code clean-up was done to fix clang 6 specific compilation warnings and errors (bug fixed #3893, upstream #90111).
  • Using -DWITHOUT_<PLUGIN>=ON CMake variable to exclude a plugin from the build didn’t work for some plugins, including a number of storage engines. Bug fixed #3901.
  • A clean-up in Percona Server binlog-related code was made to avoid uninitialized memory comparison. Bug fixed #3925 (upstream #90238).
  • Temporary file I/O was not instrumented for Performance Schema. Bug fixed  #3937  (upstream  #90264).
  • A key_block_size value was set automatically by the Improved MEMORY Storage Engine, which resulted in warnings when changing the engine type to InnoDB, and constantly growing key_block_size during alter operations. Bugs fixed #3936, #3940, and #3943.
  • The Percona Server Debian packages description included a reference to the /etc/mysql/my.cnf file, which is not actually present in these packages. Bug fixed #2046.
  • Fixes were introduced to remove GCC 8 compilation warnings for the Percona Server build, retaining compatibility with old compiler versions, including GCC 4.4. Bugs fixed #3950 and #4471.
  • A typo in the plugin.cmake file prevented plugins from being compiled statically into the server. Bug fixed #3871 (upstream #89766).
  • The -DWITH_NUMA=ON build option was silently ignored by CMake when the NUMA development package was not installed, instead of exiting with an error. Bug fixed #4487.
  • The variables innodb_buffer_pool_populate and numa_interleave, mapped to the upstream innodb_numa_interleave variable in 5.6.27-75.0, were reverted to their original implementation because the upstream variant was less effective in memory allocation. Now the buffer pool is allocated with MAP_POPULATE, forcing NUMA interleaved allocation at buffer pool initialization time. Bug fixed #3967.
  • audit_log_include_accounts variable did not take effect if placed in my.cnf configuration file, while still working as intended if set dynamically. Bug fixed #3867.
  • Synchronization between the innodb_kill_idle_transaction and kill_idle_transaction system variables was broken because of a regression in Percona Server 5.6.40-83.2. Bug fixed #3955.
  • Executing the SHOW GLOBAL STATUS expression could cause “data drift” on global status variables in case of a query rollback: the variable, being by its nature a counter and allowing only an increase, could return to its previous value. Bug fixed #3951 (upstream #90351).
  • ALTER TABLE … COMMENT = … statement caused TokuDB to rebuild the whole table, which is not needed, as only FRM metadata should be changed. The fix was provided as a contribution by Fungo Wang. Bugs fixed #4280 and #4292.
  • A number of Percona Server 8.0 TokuDB fixes have been backported to Percona Server 5.6 in preparation for using MySQL 8.0. Bugs fixed #4379, #4380, #4387, #4378, #4383, #4384, #4386, #4382, #4391, #4390, #4392, and #4381.
TokuDB Changes and Fixes
  • Two new variables, tokudb_enable_fast_update and tokudb_enable_fast_upsert, were introduced to facilitate the TokuDB fast updates feature, which involves query optimization to avoid random reads during their execution. Bug fixed #4365.
  • A data race was fixed in minicron utility of the PerconaFT, as a contribution by Rik Prohaska. Bug fixed #4281.
  • Row count and cardinality could decrease to zero after a long-running REPLACE load, resulting in full table scans for any action.
Other Bugs Fixed
  • #3818 “Orphaned file mysql-test/suite/innodb/r/percona_innodb_kill_idle_trx.result”
  • #3926 “Potentially truncated bitmap file name in log_online_open_bitmap_file_read_only() (storage/innobase/log/log0online.cc)”
  • #2204 “Test main.audit_log_default_db is unstable”
  • #3767 “Fix compilation warnings/errors with clang”
  • #3773 “Incorrect key file for table frequently for tokudb”
  • #3794 “MTR test main.percona_show_temp_tables_stress does not wait for events to start”
  • #3798 “MTR test innodb.percona_extended_innodb_status fails if InnoDB status contains unquoted special characters”
  • #3887 “TokuDB does not compile with -DWITH_PERFSCHEMA_STORAGE_ENGINE=OFF”
  • #4388 “5.7 code still has TOKU_INCLUDE_OPTION_STRUCTS which is a MariaDB specific construct”
  • #4265 “TDB-114 (Change use of MySQL HASH to unordered_map) introduces memory leak”
  • #4277 “memory leaks in TDB-2 and TDB-89 tests”
  • #4276 “Data race on cache table attributes detected by the thread sanitizer”
  • #4451 “Implement better compression algo testing”
  • #4469 “variable use out of scope bug in get_last_key test detected by ASAN in clang 6”
  • #4470 “the cachetable-simple-pin-nonblocking-cheap test occasionally fails due to a locking conflict with the cachetable evictor”
  • #1131 “User_var_log_event::User_var_log_event(const char*, uint, const Format_description_log_event*): Assertion `(bytes_read == (data_written – ((old_pre_checksum_fd || (description_event->checksum_alg == BINLOG_CHECKSUM_ALG_OFF)) ? 0 : 4))) || ((“.

Find the release notes for Percona Server for MySQL 5.6.40-84.0 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.6.40-84.0 Is Now Available appeared first on Percona Database Performance Blog.

MySQL Test Framework for Percona XtraDB Cluster

May 30, 2018 - 3:43am

At my latest webinar “MySQL Test Framework (MTR) for Troubleshooting”, I received an interesting question about MTR test cases for Percona XtraDB Cluster (PXC). Particularly about testing SST and IST.

This post is intended to answer this question. It assumes you are familiar with MTR and can write tests for MySQL servers. If you are not, please watch the webinar recording first.

You can find example tests in any PXC tarball package. They are located in directories mysql-test/suite/galera , mysql-test/suite/galera_3nodes  and mysql-test/suite/wsrep , though that last directory only contains a configuration file.

If you simply try to run the tests in the galera suite you will find that they are all disabled, because the environment variable WSREP_PROVIDER was not set:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ ./mtr --suite=galera Logging: ./mtr --suite=galera MySQL Version 5.7.19 Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one... - using tmpdir: '/tmp/xYgQqOa5b7' Checking supported features... - SSL connections supported - binaries built with wsrep patch Using suites: galera Collecting tests... Checking leftover processes... - found old pid 30624 in 'mysqld.3.pid', killing it... process did not exist! Removing old var directory... Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'... Installing system database... Using parallel: 1 ============================================================================== TEST RESULT TIME (ms) or COMMENT -------------------------------------------------------------------------- worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 galera.GAL-419 [ skipped ] Test needs 'big-test' option ... galera.galera_binlog_checksum [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? galera.galera_binlog_event_max_size_min [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? galera.galera_flush_gtid [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? galera.galera_gtid [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? galera.lp1435482 [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? ^Cmysql-test-run: *** ERROR: Got ^C signal

In order to run these tests you need to set this variable first.

I use the quite outdated 5.7.19 PXC package (the version does not matter for the purpose of this post) and run tests as:

WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera

After the variable WSREP_PROVIDER  is set, mtr  can successfully run:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test

WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera Logging: ./mtr --suite=galera MySQL Version 5.7.19 Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one... - using tmpdir: '/tmp/I6HfuqkwR1' Checking supported features... - SSL connections supported - binaries built with wsrep patch Using suites: galera Collecting tests... Checking leftover processes... - found old pid 14271 in 'mysqld.1.pid', killing it... process did not exist! - found old pid 14273 in 'mysqld.2.pid', killing it... process did not exist! Removing old var directory... Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'... Installing system database... Using parallel: 1 ============================================================================== TEST RESULT TIME (ms) or COMMENT -------------------------------------------------------------------------- worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 galera.GAL-419 [ skipped ] Test needs 'big-test' option ... worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds... galera.galera_binlog_checksum [ pass ] 2787 worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds... galera.galera_binlog_event_max_size_min [ pass ] 2200 ...

Now we are ready to write our first PXC test. The easiest way to get started is to open any existing test and check how it is written. Then modify it so that it replays our own scenario.

Since the question was about testing IST  and SST, I will use the test galera_ist_progress  as an example. First let’s check that it runs successfully and that it does not have any requirements that could prevent it from running inside regular production binaries:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera galera_ist_progress Logging: ./mtr --suite=galera galera_ist_progress MySQL Version 5.7.19 Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one... - using tmpdir: '/tmp/EodvOyCJwo' Checking supported features... - SSL connections supported - binaries built with wsrep patch Collecting tests... Checking leftover processes... Removing old var directory... Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'... Installing system database... Using parallel: 1 ============================================================================== TEST RESULT TIME (ms) or COMMENT -------------------------------------------------------------------------- worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds... galera.galera_ist_progress [ pass ] 17970 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 17.970 of 218 seconds executing testcases Completed: All 1 tests were successful.

Everything is fine. Now let’s look into the test itself.

First, this test has its own configuration file. Let’s check what’s in there:

$ cat suite/galera/t/galera_ist_progress.cnf
!include ../galera_2nodes.cnf

[mysqld.1]
wsrep_provider_options='base_port=@mysqld.1.#galera_port;pc.ignore_sb=true'

galera_2nodes.cnf  is one of the standard configuration files in galera suite. If we look into it we may notice that wsrep_provider_options  is defined and overriding this option is not required for all tests.

We’ll continue our review. The test script includes the galera_cluster.inc  file:

--source include/galera_cluster.inc

This file is located outside of galera suite and contains 2 lines:

--let $galera_cluster_size = 2
--source include/galera_init.inc

galera_init.inc , in its turn, creates as many nodes as defined by the galera_cluster_size  variable and additionally creates a default connection for each of them.

Now let’s step out from galera_ist_progress  and check if this knowledge is enough to create our first PXC test.

I created a simple test based on a two node setup which checks a few status and system variables, creates a table, inserts data into it, and ensures that content is accessible on both nodes:

$ cat ~/src/tests/t/pxc.test
--source include/galera_cluster.inc

--connection node_1
--echo We are on node 1
select @@hostname, @@port;
show status like 'wsrep_cluster_size';
show status like 'wsrep_cluster_status';
show status like 'wsrep_connected';
create table t1(id int not null auto_increment primary key, f1 int) engine=innodb;
insert into t1(f1) values(1),(2),(3);
select * from t1;

--connection node_2
--echo We are on node 2
select @@hostname, @@port;
show status like 'wsrep_cluster_size';
show status like 'wsrep_cluster_status';
show status like 'wsrep_connected';
select * from t1;
insert into t1(f1) values(1),(2),(3);
select * from t1;

--connection node_1
--echo We are on node 1
select * from t1;
drop table t1;

However, if I run this test in the main suite, it will fail:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ export WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ do_test.sh -s ~/mysql_packages -b Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100 Logging: ./mysql-test-run.pl --record --force pxc MySQL Version 5.7.19 Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one... - using tmpdir: '/tmp/uUmBztSWUA' Checking supported features... - SSL connections supported - binaries built with wsrep patch Collecting tests... Checking leftover processes... Removing old var directory... Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'... Installing system database... Using parallel: 1 ============================================================================== TEST RESULT TIME (ms) or COMMENT -------------------------------------------------------------------------- worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 main.pxc [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER? -------------------------------------------------------------------------- The servers were restarted 0 times Spent 0.000 of 108 seconds executing testcases Completed: All 0 tests were successful. 1 tests were skipped, 1 by the test itself. =====Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100===== =====pxc===== sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ echo $WSREP_PROVIDER /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so

The reason for this failure is that the galera suite ships default option files that set the necessary variables. Let’s leave those option files aside for now and simply run our test in the galera suite:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ do_test.sh -s ~/mysql_packages -b Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100 -t galera
Logging: ./mysql-test-run.pl --record --force --suite=galera pxc
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
 - using tmpdir: '/tmp/ytqEjnfM7i'
Checking supported features...
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1

==============================================================================
                  TEST RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds...
galera.pxc                               [ pass ]   2420
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 2.420 of 208 seconds executing testcases

Completed: All 1 tests were successful.

pxc.result

=====Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100=====
=====pxc=====
We are on node 1
select @@hostname, @@port;
@@hostname	@@port
Thinkie	13000
show status like 'wsrep_cluster_size';
Variable_name	Value
wsrep_cluster_size	2
show status like 'wsrep_cluster_status';
Variable_name	Value
wsrep_cluster_status	Primary
show status like 'wsrep_connected';
Variable_name	Value
wsrep_connected	ON
create table t1(id int not null auto_increment primary key, f1 int) engine=innodb;
insert into t1(f1) values(1),(2),(3);
select * from t1;
id	f1
2	1
4	2
6	3
We are on node 2
select @@hostname, @@port;
@@hostname	@@port
Thinkie	13004
show status like 'wsrep_cluster_size';
Variable_name	Value
wsrep_cluster_size	2
show status like 'wsrep_cluster_status';
Variable_name	Value
wsrep_cluster_status	Primary
show status like 'wsrep_connected';
Variable_name	Value
wsrep_connected	ON
select * from t1;
id	f1
2	1
4	2
6	3
insert into t1(f1) values(1),(2),(3);
select * from t1;
id	f1
2	1
4	2
6	3
7	1
9	2
11	3
We are on node 1
select * from t1;
id	f1
2	1
4	2
6	3
7	1
9	2
11	3
drop table t1;

You will see that the test reports that the two nodes run on different ports:

We are on node 1
select @@hostname, @@port;
@@hostname	@@port
Thinkie	13000
...
We are on node 2
select @@hostname, @@port;
@@hostname	@@port
Thinkie	13004

… and that PXC started:

show status like 'wsrep_cluster_size';
Variable_name	Value
wsrep_cluster_size	2
show status like 'wsrep_cluster_status';
Variable_name	Value
wsrep_cluster_status	Primary
show status like 'wsrep_connected';
Variable_name	Value
wsrep_connected	ON

We can also clearly see that each node sees the changes made to our test table by the other node. (The gaps in the id column are expected: with wsrep_auto_increment_control enabled, Galera adjusts auto_increment_increment to the cluster size so that the nodes never generate conflicting ids.)
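A side note on running the test without the do_test.sh wrapper: the log above shows the underlying mysql-test-run.pl command, and since the first run used --record to produce pxc.result, later runs can drop that option to compare the output against the recorded result. A sketch, assuming the same installation paths as above:

cd ~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test
export WSREP_PROVIDER=$PWD/../lib/libgalera_smm.so          # same library path as above
./mysql-test-run.pl --record --force --suite=galera pxc      # first run: records pxc.result
./mysql-test-run.pl --suite=galera pxc                       # later runs: compare against it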

Now let’s get back to the IST test defined in galera_ist_progress.test.

In order to test IST, the script first isolates node 2 from the rest of the cluster:

# Isolate node #2
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';

Then it connects to node 1 and waits until wsrep_cluster_size  becomes 1:

--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc

Then it temporarily switches wsrep_on to OFF on node 2, waits for the node to report a non-Primary status, and switches it back ON:

--connection node_2
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'non-Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;

Now node 2 is completely isolated, and node 1 can be updated, so that we can test IST when we bring node 2 back online.

--connection node_1
CREATE TABLE t1 (f1 INTEGER) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
INSERT INTO t1 VALUES (3);
INSERT INTO t1 VALUES (4);
INSERT INTO t1 VALUES (5);
INSERT INTO t1 VALUES (6);
INSERT INTO t1 VALUES (7);
INSERT INTO t1 VALUES (8);
INSERT INTO t1 VALUES (9);
INSERT INTO t1 VALUES (10);

After the updates are done, node 2 is brought back online:

--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 0';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
--let $wait_condition = SELECT VARIABLE_VALUE = 'Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc

Once node 2 is back online, the test checks IST progress by grepping the error log file of node 2, where messages about IST progress are printed:

#
# Grep for expected IST output in joiner log
#
--connection node_1
--let $assert_count = 1
--let $assert_file = $MYSQLTEST_VARDIR/log/mysqld.2.err
--let $assert_only_after = Need state transfer
--let $assert_text = Receiving IST: 11 writesets, seqnos
--let $assert_select = Receiving IST: 11 writesets, seqnos
--source include/assert_grep.inc
--let $assert_text = Receiving IST... 0.0% ( 0/11 events) complete
--let $assert_select = Receiving IST... 0.0% ( 0/11 events) complete
--source include/assert_grep.inc
--let $assert_text = Receiving IST...100.0% (11/11 events) complete
--let $assert_select = Receiving IST...100.0% (11/11 events) complete
--source include/assert_grep.inc

Here is the error log snippet from node 2 when it re-joined the cluster and initiated the state transfer.

2018-05-25T17:00:46.908569Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 13)
2018-05-25T17:00:46.908637Z 2 [Note] WSREP: State transfer required: Group state: f364a69b-603c-11e8-a632-ce5a4a7d5964:13 Local state: f364a69b-603c-11e8-a632-ce5a4a7d5964:2
2018-05-25T17:00:46.908673Z 2 [Note] WSREP: New cluster view: global state: f364a69b-603c-11e8-a632-ce5a4a7d5964:13, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 3
2018-05-25T17:00:46.908694Z 2 [Note] WSREP: Setting wsrep_ready to true
2018-05-25T17:00:46.908717Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-05-25T17:00:46.908737Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-05-25T17:00:46.908757Z 2 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2018-05-25T17:00:46.908777Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-05-25T17:00:46.908799Z 2 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-05-25T17:00:46.908831Z 2 [Note] WSREP: Assign initial position for certification: 13, protocol version: 3
2018-05-25T17:00:46.908886Z 0 [Note] WSREP: Service thread queue flushed.
2018-05-25T17:00:46.908934Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2018-05-25T17:00:46.909062Z 2 [Note] WSREP: IST receiver addr using tcp://127.0.0.1:13006
2018-05-25T17:00:46.909232Z 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://127.0.0.1:13006
2018-05-25T17:00:46.909267Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2018-05-25T17:00:46.909489Z 0 [Note] WSREP: Member 1.0 (Thinkie) requested state transfer from '*any*'. Selected 0.0 (Thinkie)(SYNCED) as donor.
2018-05-25T17:00:46.909513Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 13)
2018-05-25T17:00:46.909557Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2018-05-25T17:00:46.909602Z 2 [Note] WSREP: GCache history reset: f364a69b-603c-11e8-a632-ce5a4a7d5964:2 -> f364a69b-603c-11e8-a632-ce5a4a7d5964:13
2018-05-25T17:00:46.910221Z 0 [Note] WSREP: 0.0 (Thinkie): State transfer to 1.0 (Thinkie) complete.
2018-05-25T17:00:46.910422Z 0 [Note] WSREP: Member 0.0 (Thinkie) synced with group.
2018-05-25T17:00:47.006802Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
2018-05-25T17:00:47.106423Z 2 [Note] WSREP: Receiving IST: 11 writesets, seqnos 2-13
2018-05-25T17:00:47.106764Z 0 [Note] WSREP: Receiving IST... 0.0% ( 0/11 events) complete.
2018-05-25T17:00:47.109740Z 0 [Note] WSREP: Receiving IST...100.0% (11/11 events) complete.
2018-05-25T17:00:47.110029Z 2 [Note] WSREP: IST received: f364a69b-603c-11e8-a632-ce5a4a7d5964:13
2018-05-25T17:00:47.110433Z 0 [Note] WSREP: 1.0 (Thinkie): State transfer from 0.0 (Thinkie) complete.
2018-05-25T17:00:47.110480Z 0 [Note] WSREP: SST leaving flow control
2018-05-25T17:00:47.110509Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 13)
2018-05-25T17:00:47.110778Z 0 [Note] WSREP: Member 1.0 (Thinkie) synced with group.
2018-05-25T17:00:47.110830Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 13)
2018-05-25T17:00:47.110890Z 2 [Note] WSREP: Synchronized with group, ready for connections
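Grepping the joiner’s error log is not the only way to verify the transfer. A hedged alternative sketch, reusing include/wait_condition.inc and the ten-row t1 table created above, simply waits until the rows written during the isolation show up on node 2. Note that, unlike the grep, this only confirms the data arrived; it does not prove that IST rather than SST was used.

--connection node_2
--let $wait_condition = SELECT COUNT(*) = 10 FROM t1
--source include/wait_condition.inc
SELECT COUNT(*) FROM t1;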

If you want to write your own tests for IST and SST operations, you can use the existing test cases as a baseline. You are not required to use grep, and you can explore your own scenarios. The important parts of the code are:

  • The variable WSREP_PROVIDER must be set before the test run
  • The test should be either in the galera suite or, if you choose to use your own suite, you must copy the definitions from the galera suite default configuration file
  • The test should include the file include/galera_cluster.inc
  • To isolate a node from the cluster, run the following code:

# Isolate node #2
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'non-Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;

Replace the node numbers if needed.

To bring the node back into the cluster, run the following code:

# Restore node #2, IST is performed
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 0';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
--let $wait_condition = SELECT VARIABLE_VALUE = 'Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc

Depending on the size of the updates and of the gcache, you can test either IST or SST in this way.
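One knob worth mentioning here is the provider’s gcache.size setting: if the donor’s gcache is too small to hold the writesets the joiner missed, the gap cannot be served by IST and a full SST happens instead. Below is a hedged sketch of how you might steer the outcome in a test’s option file; the values are arbitrary examples, and remember that galera_2nodes.cnf already sets wsrep_provider_options, so in a real test you would extend that value rather than replace it.

# in the [mysqld] section of the test's option file (sketch; sizes are arbitrary examples)
wsrep_provider_options='gcache.size=128M'   # roomy gcache: the gap can be served by IST
# wsrep_provider_options='gcache.size=1M'   # tiny gcache: the same gap forces SST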

The post MySQL Test Framework for Percona XtraDB Cluster appeared first on Percona Database Performance Blog.

Deploying PMM on DigitalOcean

May 29, 2018 - 5:22am

It’s very easy to install Percona Monitoring and Management (PMM) on DigitalOcean. If you’ve never used DigitalOcean before, you will find that it is user-friendly and not very expensive. For $5/month you can easily host your PMM on it, letting you monitor your simple infrastructure or try out PMM before implementing it to monitor your production environments.

Let’s prepare the DigitalOcean instance

Log in to DigitalOcean (DO) control panel and click “Create Droplet.”

Thanks to DO, you can skip the boring OS setup and save time by combining the Docker “One-click app” in DO with the PMM Docker image.

Note: After clicking on “Docker…”, choose an instance size that accommodates your budget – PMM can run on as little as the 1GB 1vCPU instance!


Next step – select a nearby region

Since the next Percona Live Europe 2018 will be in Frankfurt (https://www.percona.com/blog/2018/04/05/percona-live-europe-2018-save-the-date/), for me the location choice is obvious.

The final step in this section is ‘Set Hostname’

I recommend you add ‘pmm-server-‘ at the beginning so that you can easily find it in your control panel. The name in my case is ‘pmm-server-docker-s-1vcpu-1gb-fra1-01’ and I’ll use it later in this tutorial.
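If you prefer the command line, roughly the same Droplet can be created with doctl. This is a hedged sketch: the size and region slugs come from the hostname above, while the image slug and SSH key ID are placeholders you need to look up yourself (for example with doctl compute image list --public and doctl compute ssh-key list).

doctl compute droplet create pmm-server-docker-s-1vcpu-1gb-fra1-01 \
  --region fra1 \
  --size s-1vcpu-1gb \
  --image <docker-one-click-image-slug> \
  --ssh-keys <your-ssh-key-id>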

Click “Create” and wait a while. You can follow the process on the dashboard:

When the Droplet is created, you’ll get an email with your login details.

The next step is to set up PMM on the Droplet.

SSH to the server, change the password, and let’s prepare to install the PMM server.

==================
random@random-vb:~$ ssh root@X.X.X.X
...
"ufw" has been enabled. All ports except 22 (SSH), 80 (http) and 443 (https) have been blocked by default.
...
Changing password for root.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
root@pmm-server-docker-s-1vcpu-1gb-fra1-01:~#
====================

Note the output for the first login. You are getting Ubuntu 16.04 with pre-installed Docker.
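You can quickly confirm that with standard commands:

lsb_release -a     # should report Ubuntu 16.04
docker --version   # Docker is already installed on this image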

The instructions for installing PMM are very simple. You can read them at https://www.percona.com/doc/percona-monitoring-and-management/deploy/server/docker.html

1) Pull the latest version from Docker Hub:

docker pull percona/pmm-server:latest

Wait for some time (this depends on your internet connection)

2) Create a container for persistent PMM data

docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:latest /bin/true

3) Create and launch PMM Server in one command

docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:latest

Just to confirm that your containers are available, go ahead and run “docker ps.” You’ll see something like this:

root@pmm-server-docker-s-1vcpu-1gb-fra1-01:~# docker ps
CONTAINER ID        IMAGE                       COMMAND                CREATED             STATUS              PORTS                         NAMES
5513858041f7        percona/pmm-server:latest   "/opt/entrypoint.sh"   2 minutes ago       Up 2 minutes        0.0.0.0:80->80/tcp, 443/tcp   pmm-server
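You can also make sure the web interface answers before leaving the terminal. These are standard Docker and curl commands, nothing PMM-specific:

docker logs --tail 20 pmm-server        # recent log lines from the container
curl -sI http://localhost | head -n 1   # expect an HTTP status line from the PMM web UI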

That’s all! Congratulations! Your PMM server is running.

If you open the IP of your server in the browser, you’ll see something like this:

There you can see that PMM has already started monitoring itself.

Now you need to install PMM client on your database server and configure it, instructions for this are at https://www.percona.com/doc/percona-monitoring-and-management/deploy/client/index.html
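On the database server itself, the client setup boils down to a few commands. This is a sketch based on the PMM 1.x client documented at the link above; replace X.X.X.X with your Droplet’s IP and use your own MySQL credentials:

pmm-admin config --server X.X.X.X                      # point the client at the PMM server
pmm-admin add mysql --user pmm --password <password>   # add MySQL metrics and query analytics
pmm-admin list                                         # verify what is being monitored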

Please note: if you also use DO for the database server and connect to it by its external IP, you’ll probably run into a firewall problem. In this case, you need to open ports using the “ufw” tool (see the welcome message from DigitalOcean). For testing purposes, you can use:

ufw allow 42000:42999/tcp

To open only the pmm-client related ports, follow https://www.percona.com/doc/percona-monitoring-and-management/glossary.terminology.html#term-ports. To run ufw, you need to use the terminal, and you can find more information about ufw at https://www.digitalocean.com/community/tutorials/ufw-essentials-common-firewall-rules-and-commands. Once you have opened the ports, PMM should work correctly for this setup.
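A slightly tighter variant for the same test setup is to open the exporter range only to your PMM server’s IP. This is standard ufw syntax; run it on the database server and substitute the PMM server’s address for X.X.X.X:

ufw allow from X.X.X.X to any port 42000:42999 proto tcp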

Final recommendation: depending on your load, you may need to monitor your System Overview dashboard, which you’ll find at http://X.X.X.X/graph/somesymbols/system-overview

If you are out of space, upgrade your DO Droplet.
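Before resizing, it is worth checking what is actually being consumed. These are standard Linux and Docker commands:

df -h /                               # disk usage on the Droplet
free -m                               # memory usage
docker stats --no-stream pmm-server   # CPU and memory used by the PMM container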

The post Deploying PMM on DigitalOcean appeared first on Percona Database Performance Blog.
