Laurie Coffin welcomes everyone to Percona Live Open Source Database Conference 2018
How companies build applications and deploy databases has changed drastically over the last five years. Enterprises are moving applications and workloads to the cloud to gain flexibility, match resource consumption to actual needs, and reduce hardware and software expenses. This panel will discuss the rapid changes occurring with databases deployed in the cloud and what that means for the future of databases, their management and monitoring, and the role of the DBA and developer.
It's obvious that macro trends such as cloud computing, microservices, containerization, and serverless applications are fundamentally changing how we architect, build, deploy, and operate modern applications. We've already seen how these changes have affected our data platforms dramatically over the past few years. Where is this going? Are we about to see the total obsolescence of the basic administration things we do today, like backups and upgrades? What about schema design, query optimization, and indexing? Will there even BE a database as we know it in ten years? And what role will open source and free software play? Bring your bitpens and write Baron's predictions into the blockchain, because one thing's sure: he's going to say a lot of things that will be proven wrong.
Upwork is the largest freelancing website for connecting clients and freelancers. Learn what MongoDB is used for at Upwork, how they chose the database, and how Percona helps make them successful.
This talk will go through the set of optimizations available to MariaDB's query optimizer in 10.3.
We will also compare MariaDB's optimizer and special querying capabilities with those of other MySQL branches as well as other databases, providing a broad overview of each solution's strengths and weaknesses for both regular OLTP and analytical queries.
postgres_dba (https://github.com/NikolayS/postgres_dba) is a brand new open source DBA toolset that any application developer can use to find database issues and possible solutions much faster.
In this talk, we'll discuss why it is important to have proper tools to analyze database health, and why a DBA's work very often looks like black magic.
Taking PostgreSQL as an example (including its managed cloud versions like Heroku Postgres or AWS RDS Postgres), we'll discuss what should be done to make DBA's tasks of maintaining and optimizing databases more efficient and automated.
We'll cover these topics:
- controlling sizes of tables and indexes;
- controlling bloat level and autovacuum;
- index set optimization;
- major disadvantages of current approaches to grouping the slowest queries;
- using machine learning to find, analyze, and optimize the slowest queries iteratively.
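As a rough illustration of the kind of check a toolset like postgres_dba automates (the real tool queries catalogs such as pg_stat_user_tables and uses more precise estimates), here is a hedged sketch of a dead-tuple ratio and an autovacuum-style trigger rule; the threshold and scale factor below mirror PostgreSQL's documented autovacuum defaults:

```python
# Hypothetical sketch: estimating bloat pressure from live vs. dead tuples.
# A real tool reads these counts from pg_stat_user_tables.

def dead_tuple_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Fraction of dead tuples; a rough proxy for bloat pressure."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0

def needs_vacuum(n_live_tup, n_dead_tup, threshold=50, scale_factor=0.2):
    """Mirror autovacuum's rule: dead tuples > threshold + scale * live."""
    return n_dead_tup > threshold + scale_factor * n_live_tup

# Example: a table with 10,000 live and 3,000 dead tuples.
assert round(dead_tuple_ratio(10_000, 3_000), 3) == 0.231
assert needs_vacuum(10_000, 3_000)       # 3000 > 50 + 2000
assert not needs_vacuum(10_000, 1_000)   # 1000 < 2050
```

The point is not the arithmetic but that such checks can run continuously and alert before bloat becomes a production problem.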
MySQL is the backbone of Slack's data storage infrastructure, handling billions of queries per day across thousands of sharded database hosts. We are in the midst of migrating this system to use Vitess' flexible sharding and topology management instead of simple application-based shard routing and manual administration. This effort aims to provide an architecture that scales to meet the growing demands of our largest customers and features, while under pressure to maintain a stable and performant service.
This talk will present the core motivations behind our decision, why Vitess won out as the best option, and how we laid the groundwork for the migration within our development teams. We will then present some challenges and surprises (both good and bad) found during our transition and our contributions to the Vitess project that mitigated them. Finally, we will discuss the future plans for our migration and suggest improvements to the Vitess ecosystem to aid other adoption efforts.
In this day and age, maintaining privacy throughout our electronic communications is absolutely necessary. Creating user accounts and not exposing your MongoDB environment to the wider internet are basic concepts that have been missed in the past. Once that has been addressed, individuals and organizations interested in becoming PCI compliant must turn to securing their data through encryption.
With MongoDB, we have two options for encryption: encryption at rest (only available as an enterprise feature with MongoDB) and transport encryption. In this session, we will review:
- Why encryption is important
- What are the prerequisites to set up encryption
- Step by step for encryption at rest and in transit
- Encrypting data with volume encryption in the cloud
- Percona for MongoDB encryption features
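As a hedged sketch of the step-by-step items above, a minimal mongod.conf fragment combining encryption at rest with TLS for transport (file paths are placeholders; encryption at rest requires Percona Server for MongoDB or MongoDB Enterprise, and both also support external key management instead of a local keyfile):

```yaml
# Sketch only: adjust paths, and protect the keyfile (chmod 600).
security:
  enableEncryption: true
  encryptionCipherMode: AES256-CBC
  encryptionKeyFile: /etc/mongodb/encryption-keyfile
net:
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/mongodb/server.pem
    CAFile: /etc/mongodb/ca.pem
```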
The presentation will discuss some of the best practices in determining whether to put your MySQL instances on Amazon RDS or Amazon Aurora, or to keep them on-premises. The session will go into the details of the pros and cons of each platform, such as performance, versioning, limitations, and more. After this session, you will be equipped to choose the platform that best fits your workload.
PostgreSQL version 10 has added logical replication, and now the field of replication options in PostgreSQL has gotten wide: Streaming replication, warm standby, logical replication?
We'll discuss what the options are, their limitations and pitfalls, and what the best use-case for each one is. We'll show what it takes to set each one up, monitor it, and get it working again on failures. We'll cover:
* The history of replication in PostgreSQL.
* WAL shipping
* Streaming replication
* Trigger-based replication
* Logical decoding
* And some exotic animals
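To give a flavor of the setup work involved, here is a hedged sketch of the minimal configuration for streaming replication in the PostgreSQL 10 era (hostnames, credentials, and values are placeholders):

```ini
# Primary (postgresql.conf):
wal_level = replica          # set to 'logical' to also allow logical replication
max_wal_senders = 5
wal_keep_segments = 64       # retain WAL for lagging standbys

# Standby (recovery.conf):
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator password=secret'
```

Logical replication, by contrast, is configured in SQL with CREATE PUBLICATION on the source and CREATE SUBSCRIPTION on the target.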
Apache Mesos and DC/OS are powerful tools to manage, deploy, and maintain services. But, rolling your own stateful application on top of DC/OS requires a deep understanding of Apache Mesos primitives and DC/OS components. Enter the DC/OS SDK.
The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that helps us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, and how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
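The backup-validation loop above can be sketched in a few lines. This is a toy model, not GitHub's tooling: a "restore" here is just a file copy, where a real system would restore the backup onto a spare server and compare table checksums:

```python
# Toy sketch of backup validation: back up, restore, compare checksums.
import hashlib
import os
import shutil
import tempfile

def checksum(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def backup(src: str, dest_dir: str) -> str:
    dest = os.path.join(dest_dir, "backup.dat")
    shutil.copyfile(src, dest)
    return dest

def validate_backup(src: str) -> bool:
    """A backup is only as good as its last successful restore."""
    with tempfile.TemporaryDirectory() as tmp:
        restored = backup(src, tmp)   # "restore" is a copy in this toy model
        return checksum(src) == checksum(restored)

# Usage: any file can stand in for a backup source here.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"table data")
    src_path = f.name
assert validate_backup(src_path)
os.unlink(src_path)
```

The production version adds scheduling, metrics on restore duration, and alerts when a restore fails or lags.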
At Yelp we have a constantly growing polyglot data tier consisting of datastores such as Cassandra, Elasticsearch, MySQL and Zookeeper. These distributed datastores often ask to be treated like pets but can only be reared like cattle given the scale of our systems. Requiring engineers to pamper them individually is neither feasible nor scalable. We need cluster automation which is powerful, resilient and reliable, and more importantly safe. This is where Taskerman steps in.
Taskerman is a distributed cluster task manager, wearing many hats to keep our clusters highly available, consistent, secure and in an optimal condition. Reusability has also been our focus, hence Taskerman has been built on top of AWS and existing open source infrastructures like Yelp PaaSTA, Zookeeper and Sensu.
This talk covers the genesis of Taskerman inside Yelp, its architecture and evolution. Much like the infrastructure it stands on top of, we also hope to open-source Taskerman in the future.
MariaDB 10.3 is rapidly approaching GA status. This talk will go through all new features coming in MariaDB 10.3. Highlights are:
* Oracle Compatibility Layer
* System Versioned Tables
* Custom Aggregate Functions
Since the beginning, Facebook has used a conventional username/password to secure access to production MySQL instances. Over the last few years we've been working on moving to x509 TLS client certificate authenticated connections. Given the many types of languages and systems at Facebook that use MySQL in some way - this required a massive amount of changes for a lot of teams.
This talk is part technical overview of how our new solution works and part hard-learned tricks for getting an entire company to change their underlying mysql client libraries.
Starting with MySQL 5.7 a new Document Store feature has been introduced that makes working with JSON documents an integral part of the MySQL experience. The new X DevAPI gives MySQL users the best of both worlds - SQL and NoSQL - and allows an entirely new category of use cases for managing data. It is constantly evolving based on the community feedback and can be run on top of the brand new MySQL InnoDB Cluster feature. This session gives a broad, high level introduction as to what the Document Store is about, its client components, the latest developments, what you can do with it, why you'd want to and how. MySQL 8.0 as Document Store will change the way people use MySQL.
Slack is a messaging platform for teams that brings all communication together, creating a single unified archive accessible through powerful search.
MySQL is the primary storage for all our customer data, and we currently execute billions of transactions per hour. As more users join the service, and Slack becomes a more critical part of their workflow, the system becomes more complicated and difficult to manage. What started out as a simple MySQL database was only the starting point of a long journey redesigning our entire database infrastructure.
This talk will analyze how our operations team took Vitess, a bleeding-edge, poorly documented piece of open source software developed by Google, then hardened, tested, and shaped it for our infrastructure to host all our mission-critical data. The presentation will walk through the technical challenges we faced in successfully deploying this project (AWS instance upgrade i2 -> i3, storage SSD -> NVMe, kernel 3.13 -> 4.4, MySQL 5.6 -> 5.7, replication async -> semi-sync, etc.), the key decisions we took, what went well, what didn't, and the course corrections we made along the way.
Attendees can expect to hear details about how we took some whiteboard conversations and turned them into battle-tested, production-caliber systems.
A database trigger is a stored procedure that is executed when specific actions occur within a database. Triggers fit perfectly on a relational schema (foreign keys) and are implemented as built-in functionality in popular relational databases like MySQL.
MongoDB does not have any support for triggers, mainly due to the lack of support for foreign keys. Even if it is usually considered an antipattern, there are use cases in MongoDB that benefit from a partially relational schema. The lack of triggers is an obstacle for a partially relational schema, but there are workarounds for simulating trigger behavior.
This presentation will guide you through different ways to implement trigger-like behavior in MongoDB. We will cover three approaches: change streams, tailable cursors, and hooks. We will demonstrate code examples for each approach and explain the pros and cons of each implementation.
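As a hedged illustration of the "hooks" workaround, here is a toy application-level trigger dispatcher. The class and event names are hypothetical, not a MongoDB or driver API; the idea is that write paths fire registered callbacks, trigger-style:

```python
# Toy sketch of application-level hooks that simulate database triggers.
from collections import defaultdict

class HookedCollection:
    def __init__(self):
        self.docs = []
        self.hooks = defaultdict(list)   # event name -> list of callbacks

    def on(self, event, callback):
        self.hooks[event].append(callback)

    def insert(self, doc):
        for cb in self.hooks["before_insert"]:
            cb(doc)
        self.docs.append(doc)
        for cb in self.hooks["after_insert"]:
            cb(doc)

# Example: keep an audit trail in sync, the way a trigger would.
audit = []
users = HookedCollection()
users.on("after_insert", lambda d: audit.append(("insert", d["name"])))
users.insert({"name": "ada"})
assert audit == [("insert", "ada")]
```

The catch, which the talk's pros-and-cons discussion matters for: hooks only fire for writes that go through your application code, unlike real server-side triggers.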
In this session, we will dive deep into the unique features and changes that make up Aurora PostgreSQL,
including the architectural differences that contribute to improved scalability, availability, and durability. Some of the items we will cover are the elimination of checkpointing, the removal of the log buffer, and the use of a 4/6 quorum to improve durability and availability while reducing jitter.
Other areas we will cover are improvements in vacuum and the shared buffer cache, as well as some of our new features like Fast Clones and Performance Insights.
To finish off the session we will walk through the techniques available to migrate to Aurora PostgreSQL.
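The 4/6 quorum mentioned above can be sketched in a few lines. Aurora stores six copies of each data segment across three Availability Zones; a write needs four acknowledgments and a read needs three, so the two quorums always intersect:

```python
# Sketch of Aurora-style quorum arithmetic (6 copies, write 4, read 3).
WRITE_QUORUM, READ_QUORUM, COPIES = 4, 3, 6

def write_ok(acks: int) -> bool:
    return acks >= WRITE_QUORUM

def read_ok(reachable: int) -> bool:
    return reachable >= READ_QUORUM

# Losing an entire AZ (2 copies) plus one more node leaves 3 copies:
assert not write_ok(COPIES - 3)   # writes are blocked with only 3 copies
assert read_ok(COPIES - 3)        # but reads (and repair) can proceed
assert WRITE_QUORUM + READ_QUORUM > COPIES  # the quorums always overlap
```

This is why the design tolerates an AZ failure plus one additional fault for reads while still guaranteeing that any read quorum sees the latest committed write.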
This session is a review of the various ways in which PostgreSQL allows you to distribute your data across multiple nodes: remote data access, replication, sharding, distributed query and multi-master.
Planning to run MySQL, but want HA or horizontal scaling? Galera seems like the perfect fit! It can be, so long as your developers are aware of several important hazards. Galera's documentation hints at these, but understanding their implications can be tricky.
I'll present a series of demos that reveal the street signs you'll encounter on the road to Galera. This guide will help you choose the best path for your users. I'll expand on what I've presented before to provide better Galera background, plus a new demo of how multiple readers can read different data for the same queries over extended periods of time.
Cloud Foundry, an OSS PaaS project, gives developers self-service access to DBs. CF-MySQL provides CF Operators a reliable, automated, Galera cluster. We'll share what we've learned: what worked, and what we'd do differently next time.
Time-series data is now everywhere and increasingly used to power core applications. It also creates a number of technical challenges: ingesting high volumes of data; asking complex queries for recent and historical time intervals; performing time-centric analysis and data management. And this data doesn't exist in isolation: entries are often joined against other relational data to answer key business questions.
In this talk, I offer an overview of how we engineered TimescaleDB, a new open-source database designed for time-series workloads and built as an extension to PostgreSQL, in order to simplify time-series application development. Unlike most time-series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries. This lets developers avoid today's polyglot architectures and their corresponding operational and application complexity.
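The core time-partitioning idea can be sketched simply: rows are routed into fixed-width time "chunks" so recent data stays in small, fast structures. The chunk width and routing below are illustrative, not TimescaleDB internals:

```python
# Simplified sketch of time partitioning into per-interval chunks.
CHUNK_SECONDS = 86_400  # one chunk per day, for illustration

def chunk_for(ts: int) -> int:
    """Map a UNIX timestamp to the start time of its chunk."""
    return ts - ts % CHUNK_SECONDS

def insert(chunks: dict, ts: int, value) -> None:
    chunks.setdefault(chunk_for(ts), []).append((ts, value))

chunks = {}
insert(chunks, 1_522_540_800, "a")   # 2018-04-01 00:00:00 UTC
insert(chunks, 1_522_540_860, "b")   # same day -> same chunk
insert(chunks, 1_522_627_200, "c")   # next day -> new chunk
assert len(chunks) == 2
assert len(chunks[1_522_540_800]) == 2
```

Queries over a time range then touch only the chunks that overlap the range, which is what keeps both ingest and recent-data queries fast.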
Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications need to handle heavy traffic loads while remaining responsive and stable so that you can deliver an excellent user experience. Further, DBAs are also expected to find cost-efficient means of solving these issues. In this presentation, we will discuss how you can optimize and troubleshoot MySQL performance and demonstrate how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.
MariaDB has made it easy to switch from MySQL to MariaDB by aiming to be a drop-in replacement. MySQL doesn't make switching back nearly as easy, however. This talk will walk you through the basics of moving from MariaDB to MySQL and back, the best practices, and the problems you will encounter along the way.
The EU's General Data Protection Regulation (GDPR) goes into effect on 25 May 2018. Your company's lawyers and compliance staff are (hopefully) well-versed on the subject, but what does GDPR mean for the DBA?
We recently finished migrating from InnoDB to MyRocks in our user database (UDB) at Facebook. We have been running MyRocks in production for a while and we have learned several lessons. In this talk, I will share several interesting lessons learned from production deployment and operations, and will introduce future MyRocks development roadmaps.
Vitess is now used in production at multiple companies. This has led to many inquiries about Observability. Vitess shines in this area by providing query logs, transaction logs, information URLs, and status variables that can feed into a monitoring system like Prometheus.
This session will cover these features, along with a demonstration on how they can be used to troubleshoot production issues.
We all use and love relational databases... albeit the latter might not always be true, since we sometimes try to use them for purposes for which they are not a good fit.
And it is precisely these special uses cases that have given birth to dozens of other databases that are built upon paradigms that don't follow the relational model.
In this talk, we'll review the goals, pros and cons and good and bad use cases of these alternative paradigms by looking at some modern open source implementations that are all the rage nowadays.
By the end of this talk, the audience will have learned the basics of three database paradigms (document, key-value and columnar store) and will know when it's appropriate to opt for one of these or when to favor relational databases and avoid falling into buzzword temptations.
In this session, we will deep dive into the exciting features of Amazon RDS for PostgreSQL, including new PostgreSQL releases, new extensions, and larger instances. We will show benchmarks of new RDS instance types and their value proposition, and look at how high availability and read scaling work on RDS PostgreSQL. We will also explore lessons we have learned managing a large fleet of PostgreSQL instances, including important tunables and possible gotchas around pg_upgrade.
POLARDB provides read scale-out on a shared-everything architecture. It features 100% backward compatibility with MySQL 5.6 and the ability to expand the capacity of a single database to over 100TB. Users can expand the computing engine and storage capability in just a matter of seconds! POLARDB offers a 6x performance improvement over MySQL 5.6 and a significant drop in costs compared to other commercial databases.
POLARDB leverages InnoDB's redo logs for physical replication. InnoDB stores physical page level operations in redo logs for crash recovery. POLARDB extends this functionality to deploy multiple read replicas for read load sharing.
In this talk we'll take a deep dive into InnoDB internals and explain the changes we made to the core InnoDB code. We'll touch upon design issues around logging, crash recovery, buffer pool management, MVCC, DDL synchronization etc.
This talk will be mostly about the core internals of InnoDB. Some basic knowledge of internals like redo logs, undo logs, read view (transaction isolation), purge and buffer pool management will be very helpful.
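A toy model may help frame the physical-replication idea before the deep dive: the primary appends page-level redo records to a log, and replicas replay that log against their local pages to reach the same physical state. This is an illustration of the concept, not InnoDB's actual record format:

```python
# Toy model of physical (redo-log) replication with page-level records.
def apply(pages: dict, log: list) -> None:
    """Replay redo records (page_id, offset, new_bytes) in order."""
    for page_id, offset, data in log:
        page = pages.setdefault(page_id, bytearray(16))  # 16-byte toy pages
        page[offset:offset + len(data)] = data

primary, replica = {}, {}
redo_log = [(1, 0, b"abcd"), (1, 4, b"efgh"), (2, 0, b"zz")]
apply(primary, redo_log)
apply(replica, redo_log)           # ship the log, not the pages
assert primary == replica
assert bytes(primary[1][:8]) == b"abcdefgh"
```

Because the replicas apply the same physical changes in the same order, they converge byte-for-byte without executing SQL, which is what makes this form of replication cheap for read replicas.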
Kubernetes is the most popular container orchestrator and is enabling enterprises to rapidly containerize their application stacks. Kubernetes' adoption still faces many challenges, particularly when it comes to stateful applications.
The engineers at Kasten have open sourced Kanister to allow ops teams to incorporate their existing tools into Kubernetes. Kanister is a framework for domain experts to write blueprints specifying how to perform data management in Kubernetes. Each blueprint is specific to a data service, like MySQL and can be modified to integrate with your infrastructure. The talk will conclude with demos of backup and restore of MySQL and MongoDB using example blueprints included with Kanister.
This talk will be targeted towards anyone interested in running stateful applications in Kubernetes. The audience will learn why the current primitives exposed by Kubernetes aren't sufficient for data operations and how Kanister fills in the gaps.
Like many modern DBAs, we lean more towards development in our daily activities. We write more software and do fewer routine tasks.
Along the way to fully automating our development workflow, we faced many problems. The session covers our solutions to them:
* Git flow adaptation for highly restrictive compliance requirements;
* Unit testing. What to mock and how;
* Surviving dependencies hell;
* Packaging Python code.
In this talk we will review the new functionality released by Amazon Web Services, that allows us to import data from our non-RDS MySQL instances, to RDS instances (MySQL or Aurora). We'll see what works, what doesn't, and how to do it.
With MySQL 8, security models have changed (and they have been getting better since 5.6 & 5.7). This means there is divergence from MariaDB Server 10.0 and greater (being a fork). The bonus is that Percona Server for MySQL is quite close to MySQL (being a branch), but there are also security enhancements that one could benefit from. Come learn about them in this quick overview.
Some topics covered, but not limited to:
- Using TLS/SSL for connections
- Using TLS/SSL with MySQL replication
- Using external authentication plugins (LDAP, PAM, Kerberos)
- Encrypting your data at rest
- Monitoring your database with the audit plugins
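To make the TLS items above concrete, a hedged my.cnf sketch for enforcing encrypted connections on MySQL 5.7+ (file paths are placeholders):

```ini
# Sketch only: generate and deploy your own CA and certificates.
[mysqld]
ssl-ca   = /etc/mysql/certs/ca.pem
ssl-cert = /etc/mysql/certs/server-cert.pem
ssl-key  = /etc/mysql/certs/server-key.pem
require_secure_transport = ON   # reject non-TLS connections (5.7+)

[client]
ssl-mode = VERIFY_IDENTITY      # verify the server cert and hostname
```

Replication channels take the equivalent options on CHANGE MASTER TO (MASTER_SSL, MASTER_SSL_CA, and friends).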
With the recent explosion of cryptocurrencies and the rapid rise of associated blockchain technologies, some seem to assume that blockchain will replace many other types of databases. Many even believe that blockchains are a database. We won't debate that in this session. However, we will discuss what blockchain is, why this technology is taking off, its basic architecture and functionality and how it really works. We'll also cover smart contracts a bit before pointing out where gaps still exist.
It is in those gap areas that NoSQL databases such as MongoDB and Elasticsearch still have a great seat at the table and have plenty to offer in this growing ecosystem.
At the end of 2016, Oracle released a new plugin called MySQL Group Replication, a replication method that aims to provide better high availability and built-in failover with consistency guarantees.
I evaluated the initial GA versions back in early 2017. I presented my initial findings with several best practices and concerns with the current implementation which made me state that Group Replication was not quite ready yet.
(Un)lucky as I was, a large part of the attendees were Oracle developers, and in the months after this, many of these bugs and missing features were addressed in MySQL 8.0 as well as backported to MySQL 5.7. (Thank you!)
This is a follow-up to my previous analysis, in which I will look into the changes since then, re-evaluate the readiness of Group Replication for production usage, and provide my insights and opinion on the state of GR.
POLARDB is Alibaba Cloud's new generation cloud-native database. POLARDB for MyRocks is a product in the POLARDB series that is based on MyRocks, running on shared storage and using RocksDB logs for replication.
We solved many problems for deploying MyRocks on shared storage, such as:
1. RocksDB log replication
2. Converting system tables engine to RocksDB
3. Privileges cache replication
4. DDL replication
5. MVCC in Replica
6. The new RocksDB log format and how to recycle logs
Do you have a 24x7 system and can't afford any downtime? In this talk, we are going to discuss the best methods to upgrade MongoDB versions, as well as how to change storage engines without downtime.
We will discuss the best architecture to do so, and the steps you need to walk through to perform these operations.
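The rolling-upgrade idea behind zero-downtime version changes can be sketched as follows. This is a simplified illustration of the general pattern (upgrade secondaries first, then step down and upgrade the primary), not a transcript of the talk's procedure:

```python
# Sketch of a rolling replica-set upgrade: secondaries first, primary last.
def rolling_upgrade(members: list, new_version: str) -> list:
    """members: list of dicts like {'name': ..., 'role': ..., 'version': ...}"""
    history = []
    for m in members:
        if m["role"] == "secondary":
            m["version"] = new_version          # restart on the new binary
            history.append(m["name"])
    primary = next(m for m in members if m["role"] == "primary")
    # step down: an already-upgraded secondary is elected primary first
    new_primary = next(m for m in members if m["role"] == "secondary")
    primary["role"], new_primary["role"] = "secondary", "primary"
    primary["version"] = new_version
    history.append(primary["name"])
    return history

rs = [{"name": "a", "role": "primary", "version": "3.4"},
      {"name": "b", "role": "secondary", "version": "3.4"},
      {"name": "c", "role": "secondary", "version": "3.4"}]
assert rolling_upgrade(rs, "3.6") == ["b", "c", "a"]
assert all(m["version"] == "3.6" for m in rs)
assert sum(m["role"] == "primary" for m in rs) == 1
```

Because a primary (and a majority) is available at every step, clients see a brief election rather than downtime.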
How could Amazon Migration Service work in your environment to migrate away from that proprietary colossus?
How does it work?
Why would you use this tool, and why would you avoid it?
What components does it have?
How does it perform?
These are all questions that will be answered during this talk. My goal is to provide you with an overview of its functionality and share my findings with you.
At Datadog we handle trillions of points of data per day from the thousands of customers that rely on us to monitor their applications and infrastructure. In this session, I'll share how we've scaled PostgreSQL to not only handle the deluge of data, but how we've made our PostgreSQL systems more resilient.
I'll also discuss which metrics to watch and how troubleshooting based on those metrics will help you solve problems more quickly. In this session, we will look at a framework for your metrics and how to use it to find solutions to the issues that come up.
We will cover the three types of monitoring data: what to collect, what should trigger an alert (avoiding an alert storm and pager fatigue), and how to follow the resources to find the root causes of problems.
The focus of this session is not tool-specific, so attendees will leave with strategies and frameworks they can implement in their environments today, regardless of the platforms and tools they use.
These days, most database administration tasks are repetitive. The challenge is to avoid repeated work by automating and scripting those tasks for efficient time management. At Groupon we use the open source databases MySQL and PostgreSQL, and we manage operations using Ansible playbooks.
This presentation covers how Ansible playbooks help avoid manual work.
- What is Ansible?
- How Ansible is used to improve efficiency and reduce manual work
- Provisioning MySQL & Postgres databases
- Cloning slaves from the master DB
- Destroying databases
- Whitelisting app servers against databases
- Checking config/parameter consistency across the board
- Backing up the database
- Restore validation
- Database failover
- Restoring a DB using ZFS snapshots
- Installing & enabling monitoring config
- A bunch of useful Ansible commands
- Q & A
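In the spirit of the tasks above, a hypothetical playbook sketch for the provisioning item (host group, package, and template names are examples, not Groupon's actual playbooks):

```yaml
# Sketch: install MySQL and drop in a managed config on a host group.
- hosts: db_servers
  become: true
  tasks:
    - name: Install MySQL server
      package:
        name: mysql-server
        state: present
    - name: Deploy my.cnf from template
      template:
        src: templates/my.cnf.j2
        dest: /etc/my.cnf
      notify: restart mysql
  handlers:
    - name: restart mysql
      service:
        name: mysqld
        state: restarted
```

The same structure (tasks plus handlers, driven by inventory groups) extends naturally to the cloning, backup, and failover items in the list.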
As a distributed key-value storage engine, TiKV supports strong data consistency, automatic horizontal scalability, and ACID transactions. Many users now use TiKV directly in production as a replacement for other key-value stores; some have even scaled TiKV to 100+ nodes.
In this talk, I will explain how we make this possible. The details include, but are not limited to:
1. Why did we choose RocksDB as the backend storage engine? How to optimize it?
2. How to use the Raft consensus algorithm to support data consistency and horizontal scalability?
3. How to support distributed transactions?
4. How to use Prometheus to monitor the systems and troubleshoot?
5. How to test TiKV to verify its correctness and guarantee its stability?
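The Raft consistency point above rests on one rule, sketched here: an entry is committed once it is replicated on a majority of its replica group. Note that a 100+ node TiKV cluster is many small Raft groups (one per data Region, typically 3 replicas), so the majority is per group, not cluster-wide:

```python
# Minimal sketch of the Raft commit rule: majority replication.
def committed(acks: int, replicas: int) -> bool:
    return acks >= replicas // 2 + 1

# A 5-replica Raft group tolerates 2 failures:
assert committed(3, 5)
assert not committed(2, 5)
# A typical 3-replica TiKV Region needs 2 acks:
assert committed(2, 3)
assert not committed(1, 3)
```

Any two majorities intersect, so a new leader always holds every committed entry, which is what gives TiKV its strong consistency under node failures.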
Database backup and validation has been a trending topic in recent years. In order to survive all possible accidents, Facebook designed and implemented a MySQL backup and validation system for large-scale deployment. In this talk, I'll share the implementation details of this system and the story of its evolution.