Laurie Coffin welcomes everyone to the Percona Live Europe Open Source Database Conference.
Join Geir, Development Director for MySQL, to learn about the current state of MySQL development. Learn juicy tidbits of what to expect in MySQL 8.0, beyond what you see in the current Developer Milestone Releases!
You may know Continuent Tungsten for our highly advanced MySQL replication tool or for our state-of-the-art MySQL clustering solution, Tungsten Clustering. Our solutions are used by leading SaaS vendors, e-commerce, financial services and telco customers. But there are more, many more, Tungsten deployments out there. Tungsten Replicator is also an Oracle replication solution, the "Oracle GoldenGate without the price tag". Tungsten Replicator can also be used for real-time data loading into analytics, from MySQL and Oracle into Cassandra, Elasticsearch, Kafka, Redshift and Vertica. And there could be more... How about Tungsten Backup? Using the power of the Tungsten Transaction History Log (THL), we may create the ultimate continuous backup solution with flexible point-in-time recovery. Would you be interested, especially if it were free? What about the ultimate proxy, a stand-alone Tungsten Connector? To support our clustering solution, Continuent has developed one of the most advanced proxies available. Could it be time to unleash our Connector for public use?
A Q&A with the Authors of the newly released O'Reilly title: Database Reliability Engineering. Join Laine and Charity as they discuss their new book, "Database Reliability Engineering", which focuses on designing and operating resilient database systems and uses open-source engines such as MySQL, PostgreSQL, MongoDB, and Cassandra as examples throughout.
Pepper.com is purposely different from other platforms that list daily deals. Around the clock, the community seeks and finds the best offers in fashion, electronics, traveling and much more. With 500 million page views, more than 25 million users and over 70,000 user-submitted deals per month across communities in the United States, Europe and Asia, Pepper has quickly become the largest community deal platform worldwide. The minute-by-minute back and forth with customers and the application – tracking new postings, rankings, messages, etc. – means database responsiveness and uptime are crucial to maintaining an excellent user experience. Pavel will describe how Pepper optimizes their database performance to make sure their web applications remain responsive and meet users' expectations.
When Intel launched the Xeon Scalable Processors in July 2017, the database benchmark used was HammerDB. HammerDB is an open source graphical benchmarking tool that enables comparisons between both open source and commercial databases on multiple platforms for OLTP and query-based workloads.
This presentation takes a real-world example of comparing MariaDB with a commercial database on Linux on Intel to show how to understand the benchmarks used and how to tune, configure and present findings on both performance and cost in a clear and concise way to evaluate the move to an open source database platform.
Based on these findings, this session will share key learnings on currently optimal platforms and storage technologies for databases, as well as Intel's focus on applying technologies such as FPGAs, SSDs and non-volatile memory to open source database acceleration.
Insights and previews will also be given into ongoing HammerDB development.
The new MySQL InnoDB Cluster is an out-of-the-box high availability (HA) solution for MySQL 5.7 and later. It combines the MySQL Server, the MySQL Router and the MySQL Shell for an easy-to-use, integrated solution. Setting up and managing the cluster with the new MySQL Shell puts HA into the hands of everybody, making it a core part of every MySQL installation. This session starts with a FAQ covering myths and reality about MySQL InnoDB Cluster, then gives a high-level overview of the MySQL InnoDB Cluster feature set, shows how to use it, and explains why there is no excuse to treat HA as an afterthought anymore.
We will also cover the concept of MySQL Group Replication and explain the best practices.
The session ends with an overview of the latest development.
Vitess (vitess.io) is a NewSQL system built as a sharding middleware for MySQL. It is especially suited for cloud deployment, and is beginning to gain traction in the community.
This session will focus on Vitess's new query handling features, its pluggable sharding scheme, and how the two work together seamlessly. There will be a demo at the end.
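As a rough illustration of how a hash-based sharding scheme routes rows, here is a minimal Python sketch. All names and the hash function are hypothetical stand-ins: Vitess's actual hash vindex uses a different algorithm. The idea is that a sharding key is hashed to a keyspace id, and the keyspace id's range determines the shard.

```python
import hashlib

def keyspace_id(user_id: int) -> int:
    """Hash the sharding key into a 64-bit keyspace id
    (illustrative stand-in; the real hash vindex differs)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(kid: int, num_shards: int = 4) -> int:
    """Route a keyspace id to one of num_shards equal key ranges."""
    return kid * num_shards >> 64

# The same key always routes to the same shard.
row_shard = shard_for(keyspace_id(12345))
print(row_shard)
```

In Vitess the mapping from column value to keyspace id is pluggable, which is what makes the sharding scheme customizable while query routing stays uniform.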
At the Wikimedia Foundation (host of Wikipedia and many other open collaborative projects) we work on a limited budget, donated by our many generous donors. Like many other companies that are not Facebook- or Google-sized, we have to do more with less, both in terms of budget and our small number of ops engineers, in order to serve over 400,000 requests per second and 1.2 billion monthly users. We made several mistakes (and had a few successes) along the road regarding architecture and hardware decisions, especially for the distributed database components, storage model, hardware chosen, server size, technology adoption, etc. Now we want to share those with you.
As service providers, one of our responsibilities is helping clients understand what causes contributed to a production downtime incident, and how to prevent them (as much as possible) from happening again. We do this with incident reports, and one common recommendation we make is to have a historical monitoring system in place. All our clients have point-in-time monitoring solutions that can alert them when a system is down or behaving in unacceptable ways. But historical monitoring is still not common, and we believe a lot of companies can benefit from deploying one.
In most cases, we have recommended Percona Monitoring and Management (PMM) as a good, open source solution for this problem. In this session, we will talk about the reasons why we recommend PMM as a way to prevent incidents, and also to investigate their possible causes when one has happened.
At LifeStreet we needed to scale our real-time ad analytics platform to multiple petabytes. We evaluated and used a number of open source and commercial solutions, but they were either not efficient enough or too expensive. When Yandex released ClickHouse as open source, we quickly realized its potential and started our implementation project. It was a long road, but it finally worked out great.
In this presentation I will talk about our experiences from an application developer's viewpoint: what worked well and not so well, what challenges we had to overcome, as well as the best practices for building a large-scale platform based on ClickHouse.
Percona Monitoring and Management (PMM) is a platform using Prometheus, Grafana and other tools. This sounds great at the high level, but how do you use it in the real world? What benefit is it to me?
This talk covers these questions and more. We will look at how to use PMM for MongoDB through four examples of different problems, and how you might go about tracing what happened and understanding whether it is a normal flow.
PostgreSQL has had JSON support since 9.2 in 2012, and the binary JSONB type since 9.4, but is it fast enough to beat MongoDB? In this talk, we compare the performance of using schemaless documents in both PostgreSQL and MongoDB for high performance workloads.
Today, the world of IOT is still in a primordial stage: several vendors offer "platforms" hoping to cover all the aspects of IOT projects in an attempt to simplify the flow and management of IOT data.
FogLAMP is the effort of organizations active in IOT, IIOT (Industrial Internet of Things) and Fog Computing to provide a fully open source stack operating from the Edge and integrated with other Cloud and Enterprise solutions. FogLAMP's main objective is to simplify the management of IOT data, whether this data is simply stored and forwarded, or consumed and analyzed at the Edge.
We will talk about what FogLAMP is and how it works. We will explore the pluggable architecture and how the modularity of the product allows developers and architects to build IOT projects. We will also see FogLAMP in action in a demo, with sensor data collected from the Edge, analyzed locally and pushed to the OSIsoft PI System, which is used to collect, analyze and visualize time-series data.
In any busy operations environment, there are countless tasks to perform - some monthly or weekly, some daily or more frequently, and some on an ad-hoc basis. And automation is key to performing fast, efficient and consistently repeatable software deployments and recovery.
There are many generic tools available, both commercial and open source, to aid with the automation of operational tasks. Some of these tools are even deployed in the database world. However, a small number of specialist, domain-specific automation tools are also available, and we are going to compare two of these products: MongoDB's own Ops Manager, and ClusterControl from Severalnines.
We will cover:
* Installation and maintenance
* Complexity of architecture
* Options for redundancy
* Comparative functionality
* Monitoring, Dashboard, Alerting
* Backing up and restoring
* Automated deployment of advanced configurations
* Upgrading existing deployments
Participants should take away a clear understanding of the differences between these tools, and how they help automate and manage MongoDB operations.
Anyone looking for a high availability master-slave management solution for MySQL may come across ProxySQL and Orchestrator. This combination of products solves many problems, but still requires some manual labour when the configuration changes, when there is a network split, and in other scenarios. In this talk I will discuss the standard architecture, the solutions it provides and what it's missing. I will then share an automation solution developed at Wix that solves those problems, using Consul to tie everything together.
We've migrated our platform (Kinja) from a datacenter-based approach to AWS, including migration of standalone MySQL hosts to RDS/Aurora.
I'd like to talk about our findings and the problems we hit during this transition, giving you hands-on experience of how you should change your thinking when you decide to move to a managed database service, because it is quite different from what you are used to.
I'd like to show you our best practices, some characteristics of Aurora, and a few of the utilities we had to create to make daily operations possible.
The MySQL marketplace has quite a few High-Availability (HA) solutions such as Continuent Tungsten, Galera/XtraDB Cluster, various MySQL script/patch solutions, and RDS with multi-zone HA. Providing seamless, automatic failover with zero down-time maintenance in one data center can be challenging, but extending that same functionality across multiple sites in different continents truly makes that a difficult goal to achieve. Continuent has perfected various cluster solutions over the years and we have many customers running them in production, with more currently converting from Galera or RDS to our solutions.
ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries.
Acting as a data bridge between the MySQL protocol and the ClickHouse protocol, ProxySQL now enables MySQL clients to execute queries in ClickHouse.
In this session we will show how to configure ClickHouse as a backend for ProxySQL, and how a MySQL client (for example in PHP) will be able to execute data reports in ClickHouse.
Instead of using ETL Tools, which consume tons of memory on their own system, you will learn how to do ETL jobs directly in and with a database: PostgreSQL.
PostgreSQL Management of External Data (SQL/MED) is also known as Foreign Data Wrappers (FDW). With FDWs, there is almost no limit to the external data you can use directly inside a PostgreSQL database.
This talk will show you how to use them with examples accessing several data sources.
When storing time-series data, many developers start with some well-trusted system like Postgres, but as their data hits a certain scale, they give up its query power and ecosystem by migrating to a NoSQL or other "modern" time-series architecture.
In this talk, I describe why this trade-off is unnecessary, and how we've built TimescaleDB, an efficient, scalable time-series database engineered up from Postgres. The nature of time-series workloads (appending data about recent events) presents different demands than transactional (OLTP) workloads. We've architected our time-series database to take advantage of and embrace these differences.
TimescaleDB improves insert rates by 15X over Postgres, even on a single node. By right-sizing chunks, it avoids the "performance cliff" Postgres experiences once reaching table sizes of 50+ million rows, while offering compelling complex query performance improvements. TimescaleDB is packaged as a Postgres extension, released under the Apache 2.0 license.
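To make the chunking idea concrete, here is a tiny Python sketch of routing time-series inserts into fixed-interval chunks so that no single chunk grows past a right-sized bound. This is purely illustrative, not TimescaleDB's implementation; the interval and names are made up.

```python
from collections import defaultdict

CHUNK_INTERVAL = 86_400  # one chunk per day of data (seconds)

chunks = defaultdict(list)  # chunk start time -> rows in that chunk

def insert(ts: int, value: float):
    """Route a row to the chunk covering its timestamp, the way a
    hypertable routes inserts to right-sized chunks."""
    chunk_start = ts - ts % CHUNK_INTERVAL
    chunks[chunk_start].append((ts, value))

for t in (0, 100, 86_400, 90_000, 200_000):
    insert(t, 1.0)

print(sorted(chunks))  # three day-sized chunks: [0, 86400, 172800]
```

Because each chunk stays small, indexes on recent data fit in memory, which is the intuition behind avoiding the large-table performance cliff.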
MariaDB has had its first 10.3 alpha release. This talk will go into what new features the MariaDB team has planned for this release.
Notable features include the "AS OF" syntax, as well as a subset of the PL/SQL syntax that Oracle supports.
In this session we'll explore the different aspects of Consul and understand how it can be leveraged to provide opinionated Service Discovery and a distributed K/V store.
We will cover:
- Consul architecture & Service Discovery concepts
- Installation & configuration
- Service Registration
- Service Discovery using the Consul API & DNS
- K/V Store & ACL
- Using Consul with Prometheus
- Monitoring Consul
After this session you'll have everything you need to get started with Consul, and a good understanding of how it can be leveraged in the context of software such as MySQL, ProxySQL, and Prometheus.
MySQL security is an interesting subject. In this session we'll show how to use Vault (https://www.vaultproject.io) to create a secrets store we can authenticate against using GitHub. On the other side, we'll demonstrate how we can use Vault to dynamically grant and revoke rights on a running MySQL server. The end result means we no longer need permanent MySQL users: instead, we first authenticate against Vault and then ask Vault for temporary access credentials for the MySQL instance of choice.
gh-ost is a tool by GitHub that changes the paradigm of MySQL online schema changes, designed to overcome today's limitations and difficulties in online migrations. gh-ost is:
- Triggerless: no triggers are used
- Pausable: can suspend master writes altogether
- Lightweight: makes a low impact on the master database
- Controllable: one can interact with an executing gh-ost process, get info and reconfigure parameters
- Testable: gh-ost allows for testable, safe, non obtrusive migrations in production
- Designed to allow for multiple concurrent migrations
In this session we will:
- Introduce gh-ost, explain the reasoning for developing a new tool
- Describe the underlying logic
- Compare with existing online schema change tools
- Show off extra perks that make gh-ost operations so friendly
- Discuss the roadmap and present some surprising implications
gh-ost is open sourced under the MIT license.
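As a conceptual sketch of the triggerless approach, the following illustrative Python models the two interleaved streams such a migration must reconcile: a chunked copy of existing rows into a ghost table, plus replay of ongoing writes captured from a change stream (gh-ost reads these from the binary log; here a plain list stands in, and all names are hypothetical).

```python
# Minimal sketch of a triggerless online migration: copy existing rows
# in small chunks while replaying ongoing writes from a change stream.

original = {i: f"row-{i}" for i in range(10)}   # pk -> row
ghost = {}                                       # shadow copy being built
binlog = []                                      # captured ongoing writes

def apply_binlog():
    """Replay captured writes onto the ghost table."""
    while binlog:
        pk, row = binlog.pop(0)
        if row is None:
            ghost.pop(pk, None)   # delete
        else:
            ghost[pk] = row       # insert/update

# Chunked row copy, interleaved with binlog replay.
pks = sorted(original)
for start in range(0, len(pks), 4):        # chunks of 4 rows
    for pk in pks[start:start + 4]:
        ghost[pk] = original[pk]
    apply_binlog()

# A write that arrives after its row was copied is still applied
# before cut-over, so the ghost table converges on the original.
binlog.append((3, "row-3-updated"))
apply_binlog()

print(ghost[3])
```

The real tool must also handle ordering between copy and replay for the same row, throttling, and an atomic cut-over, which is where most of the engineering lives.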
Planning to offer MySQL, but hoping to offer high availability or horizontal scaling capacity? Galera seems like the perfect fit! And it can be, so long as you and your developers are aware of several important hazards. Galera's documentation hints at these, but understanding the implications of their warnings can be tricky. This presentation will take you through the street signs you may encounter on the road to Galera, and how to interpret them quickly. This guide will help you choose the best path for you and your users.
The open source Platform as a Service project, Cloud Foundry, gives developers self-service access to relational databases. The open source project, cf-mysql, allows CF Operators reliable, automated access to slices of a MySQL Galera cluster. Over the years, we've experienced many different deployments under many different conditions. We'll share what we've learned, what worked, and what we'd do differently based on our use cases.
I will walk through Yandex's development of ClickHouse, and how its iterative approach to organizing data storage has resulted in a powerful and extremely fast open source system.
This presentation will highlight the new support for SQL window functions in MySQL 8.0: showing how they can be used to simplify and speed up analytical queries. The talk will first give a gentle introduction to basic concepts like partitions, frames and peers, explaining the differences between physical and logical frames. Then we move on to the two kinds of window functions and show their usage: SQL aggregates like COUNT and SUM used as window functions, as well as the dedicated window functions like ROW_NUMBER, NTILE, LEAD and more.
We will also cover some implementation aspects, particularly as they pertain to performance.
MySQL offers all the standard non-aggregate window functions, as well as most of the existing MySQL aggregate functions used as window functions. Window functions have the potential to greatly speed up many kinds of queries, and should be in the repertoire of all SQL developers.
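To make the semantics concrete, here is a small Python sketch (an illustration of the result, not of how the server executes it) computing ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) together with SUM(salary) OVER (PARTITION BY dept):

```python
from itertools import groupby

# Rows: (department, salary).
rows = [("eng", 90), ("eng", 120), ("sales", 80), ("sales", 100)]

result = []
# Partition by department; order each partition by salary descending.
for dept, group in groupby(sorted(rows, key=lambda r: (r[0], -r[1])),
                           key=lambda r: r[0]):
    part = list(group)
    total = sum(s for _, s in part)          # aggregate over the partition
    for n, (_, salary) in enumerate(part, 1):  # row number within partition
        result.append((dept, salary, n, total))

print(result)
```

Note that adding an ORDER BY to the SUM window would change its default frame to a running total over the peers seen so far, one of the physical-vs-logical frame distinctions the talk covers.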
This talk is intended to give a basic overview of the encryption requirements of several current compliance standards (EU GDPR, PCI DSS, HIPAA/HITRUST, and SOC II TSP) and how the "at rest" encryption component can be met in a technology-agnostic way.
Have you wanted to deploy some cool new database to production, and simply can't make use of transparent data encryption? Come to this talk to find out how to use LUKS/dm-crypt to perform at-rest encryption, and where this fits into your overall compliance stance.
Yes, you read that right. Microsoft loves MySQL and PostgreSQL! Azure Database for MySQL and PostgreSQL are Microsoft's first foray into OSS databases in Azure as fully-managed PaaS offerings. Come and learn about the platform architecture that powers Azure Database for MySQL and PostgreSQL in Azure and where Microsoft is headed next in this space!
As support engineers, every day we get dozens of pt-stalk captures from our customers containing samples of iostat, vmstat, top, ps, SHOW ENGINE INNODB STATUS, SHOW PROCESSLIST and a multitude of other diagnostic outputs.
These are the tools of the trade for performance analysis and troubleshooting, and we must learn to digest these outputs in an effective and systematic way to provide high-quality service to a large volume of customers. That is the knowledge we want to share in this presentation.
We will learn to set up pt-stalk, capture data, write plugins to trigger collection and capture custom data, look at our systematic approach, and learn what data to read first and how to unwind the tangled threads of pt-stalk.
By the end of this presentation you will have expert knowledge on how to capture diagnostic metrics at the right time and will have a generic approach to digest the captured data, allowing you to work on a large number of problems common to MySQL setups.
At a blistering pace and for a variety of reasons, companies are migrating their on-premise database infrastructures to cloud-based solutions: to save costs on hardware, tame the impact of disaster recovery, or even to improve security. Zalando is not an exception, and more than two years ago we migrated our first production services to AWS.
In addition to the fully managed database services like RDS and Aurora, Amazon offers a wide spectrum of EC2 instances with different performance and prices. Without a lot of experience in running cloud databases it's not easy to make the right choice, and as a result you will either get poor database performance or overpay for over-provisioned resources.
In this talk I will explain why we decided to run most of our databases on EC2 Instances instead of RDS, how we chose EC2 Instance types and EBS Volume sizes, which AWS CloudWatch metrics MUST be monitored (and why), what problems we hit and how to avoid them.
The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that helps us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
ClickHouse is an open source DBMS for high-performance analytics, originally developed at Yandex for the needs of Yandex.Metrica web analytics system. It is capable of storing petabytes of data and processing billions of rows per second per server, all while ingesting new data in real-time.
I will talk about architectural decisions we made with ClickHouse, their consequences from the point of view of an application developer and how to determine if ClickHouse is a good fit for your use case.
I will cover the following topics:
* Overview of storage engine and query execution engine.
* Data distribution and distributed query processing.
* Replication and where it sits on the consistency-availability spectrum.
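The benefit of a column-oriented layout for analytics can be sketched in a few lines of Python (illustrative data and names): an aggregate over one column only touches that column's contiguous array, not every row.

```python
# Row store vs column store: an analytical query such as
# SELECT sum(bytes) FROM hits touches one column, so a columnar
# layout reads a small fraction of the data a row store would.
rows = [(i, f"user-{i}", i * 10) for i in range(1000)]  # (id, user, bytes)

# Column-oriented layout: one contiguous array per column.
columns = {
    "id": [r[0] for r in rows],
    "user": [r[1] for r in rows],
    "bytes": [r[2] for r in rows],
}

total = sum(columns["bytes"])   # scans only the 'bytes' column
print(total)
```

Contiguous single-type columns also compress far better and vectorize well, which is part of why a system like ClickHouse can process billions of rows per second per server.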
This talk is intended to give an overview of the feature differences between Percona Server for MongoDB and MongoDB Community Edition. We'll also dive into how some of these features are useful to a database developer or DBA in troubleshooting issues, increasing query performance, improving reliability, and improving security.
Everyone already knows the Jsonb data type: one of PostgreSQL's most attractive features, which allows efficient work with semi-structured data without sacrificing strong consistency and the ability to use all the power of proven relational technology. But what exactly is inside Jsonb? Are there any caveats, and how can you accidentally bring down performance?
We will discuss all these questions together with advantages and disadvantages of using Jsonb in different situations in comparison with other solutions and existing standards. I'll show some important best practices about how to write compact queries to work with Jsonb, and avoid common mistakes/performance problems.
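As an aid to intuition, here is a pure-Python stand-in (an approximation, not PostgreSQL's exact semantics) for the jsonb containment operator doc @> query, which underlies many of the compact Jsonb queries discussed in the talk:

```python
def contains(doc, query) -> bool:
    """Approximate Python stand-in for PostgreSQL's jsonb containment
    (doc @> query): every key/element of query must appear,
    recursively, somewhere in doc."""
    if isinstance(query, dict):
        return (isinstance(doc, dict) and
                all(k in doc and contains(doc[k], v)
                    for k, v in query.items()))
    if isinstance(query, list):
        return (isinstance(doc, list) and
                all(any(contains(d, q) for d in doc) for q in query))
    return doc == query   # scalars must match exactly

doc = {"tags": ["pg", "jsonb"], "meta": {"views": 10}}
print(contains(doc, {"tags": ["jsonb"]}))     # True
print(contains(doc, {"meta": {"views": 1}}))  # False
```

In PostgreSQL, containment queries of this shape can be served by a GIN index on the Jsonb column, which is a common best practice for avoiding full-table scans.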
Database management systems (DBMSs) are the most important component of any data-intensive application. They can handle large amounts of data and complex workloads. But they're difficult to manage because they have hundreds of configuration "knobs" that control factors such as the amount of memory to use for caches and how often to write data to storage. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many.
In this talk, I will present OtterTune, a new tool that can automatically find good settings for a DBMS's configuration knobs. OtterTune differs from other DBMS configuration tools because it leverages knowledge gained from tuning previous DBMS deployments to tune new ones. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.
This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.
Percona XtraDB Cluster is a very robust, high-performing and widely used solution for high availability needs. But it can be very challenging to deploy the cluster over a geographically dispersed area.
This presentation will briefly discuss the right approach to successfully deploying Percona XtraDB Cluster when you need to cover multiple geographical sites, near and far.
- What is Percona XtraDB Cluster and what happens in a set of nodes during commit
- Clarify what geo-dispersed means
- What to keep in mind
- How to correctly measure metrics
- Use sync the right way (sync/async)
- Use tools like replication_manager
"It's just a log, right?" How hard can it be, how can you possibly mess this up?
Wrong. Logs can impact your reliability, performance and quality of sleep in a million ways small and large. In this session we'll cover some of the lessons every engineer should know (and often learns the hard way), such as why good logging solutions are so expensive, why treating your logs as strings can be costly and dangerous, how logs can impact code efficiency and add/fix/change race conditions in your code. And what's the difference between a log line and an event, anyway?
We'll talk about how to craft a good, helpful log line or event and how to spot a bad one. We'll also talk about trends in debugging for complex systems, like the drive for structured logs/events and what comes next.
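One way to see the difference between a log line and an event: an event is a single wide, structured record per unit of work, with typed fields that stay queryable downstream, rather than many free-form strings that must be parsed later. A minimal Python sketch (all field names are made up):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def log_event(**fields) -> str:
    """Emit one wide, structured event per unit of work, keeping
    fields typed and queryable instead of baked into a string."""
    line = json.dumps(fields, sort_keys=True)
    logger.info(line)
    return line

line = log_event(event="query_finished", table="users",
                 duration_ms=42, rows=118, cache_hit=False)
print(line)
```

Because fields like duration_ms remain numbers rather than substrings, downstream tooling can aggregate and filter them without brittle regex parsing.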
This presentation shares how we scale MySQL databases in AWS. At Grab, we are using AWS RDS for most of our core services. In this talk, we will go through the different phases of our database infrastructure, the challenges and issues we met at each phase, as well as the corresponding modifications and optimisations we made to achieve better database performance to support our various services.
Producing backups can be a complex task. Full backups, incremental backups, streaming backups, restoring and encrypting backups: there are many factors to take into consideration.
This talk covers the best practices for backups using Percona XtraBackup.
We all know those conference talks that bleat on about doing the right thing at the right time. This talk aims to reveal some of the anti-best practices to illustrate how some installations of MySQL are doomed from day 0. Infrastructure choice, queries and everything in between deserve special attention if you really want to fail fast.
This talk will cover:
- Picking the worst hardware you can for your mission critical database
- Schema over-engineering
- Query disasters
- Split brain scenarios we all want to see
- Replication disaster zone
- Highly available fails
- Accumulate these tips for instant dismissal
Starting with MySQL 5.7 a new Document Store feature has been introduced that makes working with JSON documents an integral part of the MySQL experience. The new X DevAPI gives MySQL users the best of both worlds - SQL and NoSQL - and allows an entirely new category of use cases for managing data. It is constantly evolving based on community feedback and can be run on top of the brand new MySQL InnoDB Cluster feature. This session gives an overview of the Document Store possibilities; we will migrate data from MongoDB to MySQL and finally play with the data using NoSQL and SQL.
Icinga is a popular open source successor of Nagios that checks hosts and services, and notifies you of their statuses. But covering availability is not enough for comprehensive database monitoring. On top of that, you need metrics for performance and growth to deal with your scaling needs. Conditional behaviours and configuration in Icinga are not just intuitive to add, but also intelligently adaptive at runtime. This makes it easy to deal with a bunch of different database flavours at once.
Whether we are talking about MySQL and its variants, PostgreSQL, MongoDB, or any other open source database, Icinga can give you all the information you need. The talk will give you a detailed introduction to Icinga's abilities and show practical guidelines for successful database monitoring in a live demo.
Unlike MySQL (and the MySQL manual), MongoDB has hidden variables that no one really documents. Yet they control everything from how data sets are returned, to connection pool sizes and timeouts, to whether index intersections should be allowed.
In this talk, we will show you examples of these knobs, how to find them on GitHub, what they are for and when to use them.
Beyond that, we will also talk about some little-known WiredTiger and MongoRocks internal engine settings that are useful when your database grows and needs a bit of tweaking to save the day.
MySQL replication has been changing considerably over the years to meet the demands of highly volatile and very dynamic deployments of MySQL technology. The fact is that the MySQL that powered the LAMP stack is now powering the infrastructure of many of the popular cloud solutions out there. A common factor across them is replication, which has often been used for read scale-out. In fact, it has also been deployed as the foundation for provisioning high availability and addressing disaster recovery across wide-area networks. Elasticity and redundancy are thus key properties in old and new deployments alike.
The new replication features in MySQL 8 continue to match the requirements set in order to address web scale scenarios. All of this without compromising on features that are also important when deploying MySQL on premise or local clouds.
Come and learn about the new replication features in MySQL 8 that you can use to grow your business.
sysbench is a benchmark tool that is quite ubiquitous in the MySQL community. It is used by both beginners and huge corporations alike as a quick way to evaluate general system performance, a universal measuring tool to compare configuration or code changes, server releases or server flavors, or as a part of QA process. This session will present new features provided by recent releases and explain how they can be used to create complex benchmark scenarios and collect performance metrics with a simple Lua API.
We will also run a live demo of some of the new sysbench features.
This talk is an unbiased look at understanding the high-level uses and differences between open source databases. We'll see what relational, document, key-value and columnar databases are meant for, and when you should avoid them.
Percona Monitoring and Management (PMM) is a free and open-source solution for managing and monitoring MySQL and MongoDB performance. It provides accurate per-second analysis for MySQL and MongoDB servers, which allows you to tune the database as efficiently as possible.
This session will also be a review of internal PMM architecture, an overview of all components, and the communications between them.
Cloudflare operates multiple DNS services that handle over 100 billion queries per day for over 6 million internet properties. We collect and aggregate logs for these queries for customer analytics, DDoS attack analysis and ad-hoc debugging. Due to the scale at which we operate, we've had to be creative in our implementation. In this talk, I'll go into more detail on the architecture we use for log ingestion and insertion into a ClickHouse cluster, as well as how we aggregate the data over time for longevity. I'll also touch on the tools we use downstream of ClickHouse to visualize and analyze the data ad-hoc.