This talk will go through the set of optimizations available to MariaDB's query optimizer in 10.3.
We will also compare MariaDB's optimizer and special querying capabilities with those of other MySQL branches, as well as other databases, to provide a broad overview of each solution's strengths and weaknesses for both regular OLTP and analytical queries.
The function of security has always been a significant part of the database engineer's job. The security of an organization's most critical asset is paramount. In the siloed, historical world of the DBA, the admin would focus on database security controls only. As the stewards of the organization's data, however, the database reliability engineer must take a more holistic approach to the job. A methodology and strategy for mitigation that is holistic, and that can be integrated into the entire engineering culture, is needed to ensure effective data security at scale.
In this talk, we establish a process for instilling repeatable, scalable data security through education and collaboration, self-service libraries and patterns, continuous integration and testing, and monitoring and metrics. After this, we discuss potential vulnerabilities and exploits, methods of encryption at rest and in flight, and the various compliance standards we must take into consideration.
MySQL is the backbone of Slack's data storage infrastructure, handling billions of queries per day across thousands of sharded database hosts. We are in the midst of migrating this system to use Vitess' flexible sharding and topology management instead of simple application-based shard routing and manual administration. This effort aims to provide an architecture that scales to meet the growing demands of our largest customers and features, while under pressure to maintain a stable and performant service.
This talk will present the core motivations behind our decision, why Vitess won out as the best option, and how we laid the groundwork for the migration within our development teams. We will then present some challenges and surprises (both good and bad) found during our transition and our contributions to the Vitess project that mitigated them. Finally, we will discuss the future plans for our migration and suggest improvements to the Vitess ecosystem to aid other adoption efforts.
In this day and age, maintaining privacy throughout our electronic communications is absolutely necessary. Creating user accounts and not exposing your MongoDB environment to the wider internet are basic concepts that have been missed in the past. Once that has been addressed, individuals and organizations interested in becoming PCI compliant must turn to securing their data through encryption.
With MongoDB, we have two options for encryption: encryption at rest (only available as an enterprise feature in MongoDB) and transport encryption. In this session, we will review:
- Why encryption is important
- What are the prerequisites to set up encryption
- Step by step for encryption at rest and in transit
- Encrypting data with volume encryption in the cloud
- Percona Server for MongoDB encryption features
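One concrete prerequisite for encryption at rest is the key material itself: mongod expects a base64-encoded 32-byte key in a file with restrictive permissions. A minimal sketch of generating one (file name and config paths are illustrative):

```python
import base64
import os
import secrets

# Generate a base64-encoded 32-byte key, the format mongod expects for
# its local keyfile-based encryption at rest (keyfile name is illustrative).
key = base64.b64encode(secrets.token_bytes(32))

with open("mongodb-keyfile", "wb") as f:
    f.write(key)
os.chmod("mongodb-keyfile", 0o600)  # mongod refuses world-readable keyfiles

# The corresponding mongod.conf fragment would look roughly like:
#   security:
#     enableEncryption: true
#     encryptionKeyFile: /path/to/mongodb-keyfile
```

In production you would normally prefer an external key server (KMIP) over a local keyfile; the local file is mainly useful for evaluation.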
The presentation will discuss some of the best practices for determining whether to put your MySQL instances on Amazon RDS or Amazon Aurora, or to leave them on-premises. The session will go into the details of the pros and cons of each platform, such as performance, versioning, limitations and more. After this session, you will be equipped to make the right decision for your environment.
PostgreSQL version 10 has added logical replication, and now the field of replication options in PostgreSQL has gotten wide: Streaming replication, warm standby, logical replication?
We'll discuss what the options are, their limitations and pitfalls, and what the best use-case for each one is. We'll show what it takes to set each one up, monitor it, and get it working again on failures. We'll cover:
* The history of replication in PostgreSQL.
* WAL shipping
* Streaming replication
* Trigger-based replication
* Logical decoding
* And some exotic animals
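To give a flavor of how trigger-based replication differs from WAL shipping, here is a toy model in plain Python (not PostgreSQL code): a trigger-like hook captures each row change into a queue, which a replica later replays, much as tools like Slony do with real triggers and a changelog table.

```python
# Toy model of trigger-based replication: every write to the primary
# also appends a change record to a queue, which the replica replays.

primary = {}
replica = {}
change_queue = []

def insert(key, row):
    primary[key] = row
    change_queue.append(("INSERT", key, row))  # the "trigger" firing

def delete(key):
    del primary[key]
    change_queue.append(("DELETE", key, None))

def replay(queue, target):
    """Apply captured changes to a replica, then drain the queue."""
    for op, key, row in queue:
        if op == "INSERT":
            target[key] = row
        elif op == "DELETE":
            target.pop(key, None)
    queue.clear()

insert(1, {"name": "alice"})
insert(2, {"name": "bob"})
delete(2)
replay(change_queue, replica)  # replica now matches the primary
```

The asynchronous queue is also where the pitfalls live: anything not touched by a trigger (DDL, sequences) is invisible to this style of replication.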
Apache Mesos and DC/OS are powerful tools to manage, deploy, and maintain services. But, rolling your own stateful application on top of DC/OS requires a deep understanding of Apache Mesos primitives and DC/OS components. Enter the DC/OS SDK.
The database team at GitHub is tasked with keeping the data available and with maintaining its integrity. Our infrastructure automates away much of our operation, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that helps us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation. What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we setup a failover scenario, what defines a successful failover, how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
At Yelp we have a constantly growing polyglot data tier consisting of datastores such as Cassandra, Elasticsearch, MySQL and Zookeeper. These distributed datastores often ask to be treated like pets but can only be reared like cattle given the scale of our systems. Requiring engineers to pamper them individually is neither feasible nor scalable. We need cluster automation which is powerful, resilient and reliable, and more importantly safe. This is where Taskerman steps in.
Taskerman is a distributed cluster task manager, wearing many hats to keep our clusters highly available, consistent, secure and in an optimal condition. Reusability has also been our focus, hence Taskerman has been built on top of AWS and existing open source infrastructures like Yelp PaaSTA, Zookeeper and Sensu.
This talk covers the genesis of Taskerman inside Yelp, its architecture and evolution. Much like the infrastructure it stands on top of, we also hope to open-source Taskerman in the future.
MariaDB 10.3 is rapidly approaching GA status. This talk will go through all new features coming in MariaDB 10.3. Highlights are:
* Oracle Compatibility Layer
* System Versioned Tables
* Custom Aggregate Functions
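To sketch what system-versioned tables give you: every update closes out the old row version with a validity interval, so the table can be queried as of a past point in time. A toy model of those semantics in plain Python (the real feature is SQL, e.g. `SELECT ... FOR SYSTEM_TIME AS OF`):

```python
import itertools

# Toy model of a system-versioned table: rows carry [row_start, row_end)
# validity intervals, and "AS OF t" returns the rows valid at time t.

clock = itertools.count(1)
INFINITY = float("inf")
table = []  # row dicts, each with row_start / row_end

def insert(data):
    table.append(dict(data, row_start=next(clock), row_end=INFINITY))

def update(key, data):
    now = next(clock)
    for row in table:
        if row["id"] == key and row["row_end"] == INFINITY:
            row["row_end"] = now  # close out the current version
            table.append(dict(row, **data, row_start=now, row_end=INFINITY))
            break

def as_of(t):
    """Rows visible at logical time t -- old versions are never lost."""
    return [r for r in table if r["row_start"] <= t < r["row_end"]]

insert({"id": 1, "price": 100})   # happens at t=1
update(1, {"price": 150})         # happens at t=2
```

Here `as_of(1)` still sees the original price while `as_of(2)` sees the new one, which is exactly the audit/history use case the feature targets.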
Since the beginning, Facebook has used a conventional username/password to secure access to production MySQL instances. Over the last few years we've been working on moving to x509 TLS client certificate authenticated connections. Given the many types of languages and systems at Facebook that use MySQL in some way - this required a massive amount of changes for a lot of teams.
This talk is part technical overview of how our new solution works and part hard-learned tricks for getting an entire company to change its underlying MySQL client libraries.
Slack is the messaging platform for teams that brings all communication together, creating a single unified archive accessible through powerful search. MySQL is the primary storage for all our customer data and we currently execute billions of transactions per hour. As more users join the service and Slack becomes a more critical part of their workflow, the system becomes more complicated and difficult to manage.
This talk analyzes how our Operations team chose Vitess, a bleeding-edge but poorly documented open source project developed by Google, and hardened, tested and shaped it for our infrastructure to host all our mission-critical data.
This presentation goes through the technical challenges that we faced to successfully deploy this project (AWS instance upgrade i2 -> i3, storage SSD -> NVMe, kernel 3.13 -> 4.4, MySQL 5.6 -> 5.7, replication type async -> semi-sync, etc.), the key decisions that we made, what went well, what didn't, and the course corrections that we made along the way.
A database trigger is a stored procedure that is executed when specific actions occur within a database. Triggers fit perfectly in a relational schema (foreign keys) and are implemented as built-in functionality in popular relational databases like MySQL.
MongoDB does not have any support for triggers, mainly due to its lack of support for foreign keys. Even if it is usually considered an antipattern, there are use cases in MongoDB that benefit from a partially-relational schema. The lack of triggers is an obstacle for a partially-relational schema, but there are workarounds for simulating trigger behavior.
This presentation will guide you through different ways to implement triggers in MongoDB. We will cover change streams, tailable cursors, and hooks. We will demonstrate coding examples for each topic and explain the pros and cons of each implementation.
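As a taste of the hook approach: a minimal sketch in plain Python (class and event names are illustrative, not a MongoDB driver API) of registering a callback that fires after each insert, which is how an application-level hook simulates a trigger.

```python
# Application-level "trigger": callbacks registered per operation fire
# right after the write, simulating what a database trigger would do.

class HookedCollection:
    def __init__(self):
        self.docs = []
        self.hooks = {"insert": [], "delete": []}

    def on(self, event, callback):
        self.hooks[event].append(callback)

    def insert_one(self, doc):
        self.docs.append(doc)
        for cb in self.hooks["insert"]:
            cb(doc)  # the simulated after-insert trigger

audit_log = []
orders = HookedCollection()
orders.on("insert", lambda doc: audit_log.append(("inserted", doc["_id"])))

orders.insert_one({"_id": 1, "total": 9.99})
orders.insert_one({"_id": 2, "total": 5.00})
```

The obvious con, which the talk weighs against change streams and tailable cursors, is that hooks only fire for writes that go through this code path; writes from other clients bypass them entirely.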
In this session, we will dive deep into the unique features and changes that make up Aurora PostgreSQL -
including understanding the architectural differences that contribute to improved scalability, availability and durability. Some of the items that we will cover are the elimination of checkpointing, removal of the log buffer and the use of a 4/6 quorum to improve durability and availability while reducing jitter.
Other areas we will cover are improvements in vacuum and the shared buffer cache, as well as some of our new features like Fast Clones and Performance Insights.
To finish off the session we will walk through the techniques available to migrate to Aurora PostgreSQL.
Starting with MySQL 5.7 a new Document Store feature has been introduced that makes working with JSON documents an integral part of the MySQL experience. The new X DevAPI gives MySQL users the best of both worlds - SQL and NoSQL - and allows an entirely new category of use cases for managing data. It is constantly evolving based on the community feedback and can be run on top of the brand new MySQL InnoDB Cluster feature. This session gives a broad, high level introduction as to what the Document Store is about, its client components, the latest developments, what you can do with it, why you'd want to and how. MySQL 8.0 as Document Store will change the way people use MySQL.
This session is a review of the various ways in which PostgreSQL allows you to distribute your data across multiple nodes: remote data access, replication, sharding, distributed query and multi-master.
Planning to run MySQL, but want HA or horizontal scaling? Galera seems like the perfect fit! It can be, so long as your developers are aware of several important hazards. Galera's documentation hints at these, but understanding their implications can be tricky.
I'll give a series of demos showing you the street signs you'll encounter on the road to Galera. This guide will help you choose the best path for your users. I'll expand on what I've presented before to provide better Galera background, plus a new demo of how multiple readers can read different data for the same queries over extended periods of time.
Cloud Foundry, an OSS PaaS project, gives developers self-service access to DBs. CF-MySQL provides CF Operators a reliable, automated, Galera cluster. We'll share what we've learned: what worked, and what we'd do differently next time.
Time-series data is now everywhere and increasingly used to power core applications. It also creates a number of technical challenges: ingesting high volumes of data; asking complex queries over recent and historical time intervals; and performing time-centric analysis and data management. And this data doesn't exist in isolation: entries are often joined against other relational data to answer key business questions.
In this talk, I offer an overview of how we engineered TimescaleDB, a new open-source database designed for time-series workloads and built as an extension to PostgreSQL, in order to simplify time-series application development. Unlike most time-series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries. This enables developers to avoid today's polyglot architectures and their corresponding operational and application complexity.
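A core time-series operation behind such workloads is bucketed aggregation, which TimescaleDB exposes in SQL (its `time_bucket` function); the idea itself is simple, as this plain-Python sketch shows:

```python
from collections import defaultdict

# Bucketed aggregation: group measurements into fixed-width time buckets
# and aggregate each bucket -- the idea behind SQL along the lines of
#   SELECT time_bucket('5 minutes', ts), avg(value) ... GROUP BY 1;

def time_bucket(width_s, readings):
    """readings: iterable of (unix_ts, value); returns {bucket_start: avg}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % width_s].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

readings = [(0, 10.0), (120, 20.0), (300, 30.0), (310, 50.0)]
print(time_bucket(300, readings))  # -> {0: 15.0, 300: 40.0}
```

What the database adds over this sketch is doing the same thing efficiently over billions of rows, by partitioning data into time-based chunks under the hood.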
Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications need to be able to handle heavy traffic loads while remaining responsive and stable, so that you can deliver an excellent user experience. Further, DBAs are also expected to find cost-efficient means of solving these issues.

In this presentation, we will discuss how you can optimize and troubleshoot MySQL performance, and demonstrate how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.
MariaDB has made it easy to switch from MySQL to MariaDB by aiming to be a drop-in replacement. MySQL doesn't make switching back nearly as easy, however. This talk will walk you through the basics of moving from MariaDB to MySQL and back, the best practices, and the problems you will encounter along the way.
The EU's General Data Protection Regulation (GDPR) goes into effect on 25 May 2018. Your company's lawyers and compliance staff are (hopefully) well-versed on the subject, but what does GDPR mean for the DBA?
Vitess is now used in production at multiple companies. This has led to many inquiries about Observability. Vitess shines in this area by providing query logs, transaction logs, information URLs, and status variables that can feed into a monitoring system like Prometheus.
This session will cover these features, along with a demonstration on how they can be used to troubleshoot production issues.
We all use and love relational databases... albeit the latter might not always be true, since sometimes we try to use them for purposes for which they are not a good fit.
And it is precisely these special uses cases that have given birth to dozens of other databases that are built upon paradigms that don't follow the relational model.
In this talk, we'll review the goals, pros and cons and good and bad use cases of these alternative paradigms by looking at some modern open source implementations that are all the rage nowadays.
By the end of this talk, the audience will have learned the basics of three database paradigms (document, key-value and columnar store) and will know when it's appropriate to opt for one of these or when to favor relational databases and avoid falling into buzzword temptations.
In this session, we will deep dive into the exciting features of Amazon RDS for PostgreSQL, including new PostgreSQL releases, new extensions and larger instances. We will show benchmarks of new RDS instance types and their value proposition, and look at how high availability and read scaling work on RDS PostgreSQL. Finally, we will explore lessons we have learned managing a large fleet of PostgreSQL instances, including important tunables and possible gotchas around pg_upgrade.
We recently finished migrating from InnoDB to MyRocks in our user database (UDB) at Facebook. We have been running MyRocks in production for a while and we have learned several lessons. In this talk, I will share several interesting lessons learned from production deployment and operations, and will introduce future MyRocks development roadmaps.
Kubernetes is the most popular container orchestrator and is enabling enterprises to rapidly containerize their application stacks. Kubernetes' adoption still faces many challenges, particularly when it comes to stateful applications.
The engineers at Kasten have open sourced Kanister to allow ops teams to incorporate their existing tools into Kubernetes. Kanister is a framework for domain experts to write blueprints specifying how to perform data management in Kubernetes. Each blueprint is specific to a data service, like MySQL and can be modified to integrate with your infrastructure. The talk will conclude with demos of backup and restore of MySQL and MongoDB using example blueprints included with Kanister.
This talk will be targeted towards anyone interested in running stateful applications in Kubernetes. The audience will learn why the current primitives exposed by Kubernetes aren't sufficient for data operations and how Kanister fills in the gaps.
Tungsten Replicator is a very powerful tool that allows replication between one-to-many or many-to-one style topologies. The replication source can either be MySQL (all versions) or Oracle (from 9i to 12c). The target for the replication can be Cassandra, Elasticsearch, Hadoop, Kafka, MongoDB, MySQL, Oracle, Redshift or Vertica. You can even have a topology that applies to a mixture of these, for example extract from Oracle and apply simultaneously both into Kafka and Hadoop.
This heterogeneous replication model provides a very powerful solution to many businesses. In addition to that, the Tungsten Replicator has many built-in filters allowing you the flexibility of eliminating rows, columns, tables or even whole databases from the source and you can also modify datatypes on the fly.
In this session, we will look at how data can be effectively replicated into Kafka and Elasticsearch, and how that information can be used as it arrives.
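To illustrate the filtering idea in miniature (this is an illustrative sketch, not the Tungsten filter API): a filter chain takes a stream of change events and can drop whole events or rewrite their rows before they reach the target.

```python
# Sketch of a replication filter chain: each filter takes a change event
# (schema, table, row dict) and returns a modified event, or None to drop it.

def drop_table(schema, table):
    def f(event):
        return None if (event["schema"], event["table"]) == (schema, table) else event
    return f

def drop_column(column):
    def f(event):
        return dict(event, row={k: v for k, v in event["row"].items() if k != column})
    return f

def apply_filters(events, filters):
    out = []
    for event in events:
        for f in filters:
            event = f(event)
            if event is None:
                break  # a filter dropped this event
        if event is not None:
            out.append(event)
    return out

events = [
    {"schema": "shop", "table": "users", "row": {"id": 1, "ssn": "x", "name": "a"}},
    {"schema": "shop", "table": "debug_log", "row": {"id": 9}},
]
filtered = apply_filters(events, [drop_table("shop", "debug_log"), drop_column("ssn")])
```

Chaining small, composable filters like this is what makes it practical to strip sensitive columns or noisy tables out of a heterogeneous replication stream.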
In this talk we will review the new functionality released by Amazon Web Services, that allows us to import data from our non-RDS MySQL instances, to RDS instances (MySQL or Aurora). We'll see what works, what doesn't, and how to do it.
With MySQL 8, security models have changed (and they have been getting better since 5.6 and 5.7). This also means divergence from MariaDB Server 10.0 and greater (which is a fork). The bonus is that Percona Server for MySQL stays quite close to MySQL (being a branch), but it also has security enhancements of its own that one could benefit from. Come learn about them in this quick overview.
Some topics covered, but not limited to:
- Using TLS/SSL for connections
- Using TLS/SSL with MySQL replication
- Using external authentication plugins (LDAP, PAM, Kerberos)
- Encrypting your data at rest
- Monitoring your database with the audit plugins
With the recent explosion of cryptocurrencies and the rapid rise of associated blockchain technologies, some seem to assume that blockchain will replace many other types of databases. Many even believe that blockchains are a database. We won't debate that in this session. However, we will discuss what blockchain is, why this technology is taking off, its basic architecture and functionality and how it really works. We'll also cover smart contracts a bit before pointing out where gaps still exist.
It is in those gap areas that NoSQL databases such as MongoDB and Elasticsearch still have a great seat at the table and have plenty to offer in this growing ecosystem.
POLARDB is Alibaba Cloud's new-generation cloud-native database. POLARDB for MyRocks is the member of the POLARDB product series that is based on MyRocks; it runs on shared storage and uses RocksDB logs for replication.
We solved many problems for deploying MyRocks on shared storage, such as:
1. RocksDB log replication
2. Converting system tables engine to RocksDB
3. Privileges cache replication
4. DDL replication
5. MVCC in Replica
6. The new RocksDB log format and how to recycle logs
Do you have a 24x7 system and can't afford any downtime? In this talk, we are going to discuss the best methods to upgrade MongoDB versions, as well as how to change storage engines without downtime.
We will discuss the best architecture to do so, and the steps you need to walk through to perform these operations.
How could Amazon Migration Service work in your environment to migrate away from that proprietary colossus?
How does it work?
Why would you use this tool, and why might you avoid it?
What components does it have?
How does it perform?
These are all questions that will be answered during this talk. My goal is to provide you with an overview of its functionality and to share my findings.
At the end of 2016, Oracle released a new Plugin called MySQL Group Replication, which is a new MySQL replication method that aims to provide better High Availability, and built-in failover with consistency guarantees.
I evaluated the initial GA versions back in early 2017. I presented my initial findings with several best practices and concerns with the current implementation which made me state that Group Replication was not quite ready yet.
(Un)luckily for me, a large part of the attendees were Oracle developers, and in the months that followed, many of these bugs and missing features were implemented in MySQL 8.0 as well as backported to MySQL 5.7. (Thank you!)
This is a follow-up to that analysis, in which I will look into the changes since then, re-evaluate the readiness of Group Replication for production usage, and give my insights and opinion on the state of GR.
At Datadog we handle trillions of points of data per day from the thousands of customers that rely on us to monitor their applications and infrastructure. In this session, I'll share how we've scaled PostgreSQL to not only handle the deluge of data, but how we've made our PostgreSQL systems more resilient.
I'll also discuss which metrics to watch and how troubleshooting based on those metrics will help you solve problems more quickly. In this session, we will look at a framework for your metrics and how to use it to find solutions to the issues that come up.
We will cover the three types of monitoring data: what to collect, what should trigger an alert (avoiding an alert storm and pager fatigue), and how to follow the resources to find the root causes of problems.
The focus of this session is not tool-specific, so attendees will leave with strategies and frameworks they can implement in their environments today, regardless of the platforms and tools they use.
MySQL is the world's most popular open source database and Kubernetes, an open source container orchestration system, is the fastest growing open source project and most vibrant community.
There have been some misconceptions about running stateful applications, databases in particular, on Kubernetes: it is best known for scaling applications that are more ephemeral in nature, and there is a concern that Kubernetes would add more complexity to already complex applications like databases.
With newer Kubernetes features such as StatefulSets and operators, running databases on Kubernetes is easier than ever and allows a whole new way of thinking about database deployments.
This talk will cover various Kubernetes features and two important projects that showcase these features and exemplify how Kubernetes is an ideal platform for MySQL: Vitess, YouTube's massively sharded back-end database, and the new MySQL Operator by Oracle.
As a distributed key-value storage engine, TiKV supports strong data consistency, automatic horizontal scalability, and ACID transactions. Many users now run TiKV directly in production as a replacement for other key-value stores; some have even scaled TiKV to more than 100 nodes.
In this talk, I will explain how we made this possible. The details include, but are not limited to:
1. Why did we choose RocksDB as the backend storage engine? How to optimize it?
2. How to use the Raft consensus algorithm to support data consistency and horizontal scalability?
3. How to support distributed transactions?
4. How to use Prometheus to monitor the systems and troubleshoot?
5. How to test TiKV to verify its correctness and guarantee its stability?
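One building block behind points 2 and 3 is MVCC layered over a key-value store: each write lives under a (key, commit timestamp) pair, and a snapshot read at some timestamp sees the newest version at or below it. A simplified plain-Python sketch of that idea (not TiKV's actual Percolator-based implementation):

```python
# Simplified MVCC over a key-value store: versions live under
# (user_key, commit_ts), and a snapshot read at read_ts returns the
# newest version with commit_ts <= read_ts. A scheme like this, encoded
# into RocksDB keys, is what transactional reads build on.

store = {}  # (key, commit_ts) -> value

def write(key, commit_ts, value):
    store[(key, commit_ts)] = value

def snapshot_read(key, read_ts):
    versions = [ts for (k, ts) in store if k == key and ts <= read_ts]
    return store[(key, max(versions))] if versions else None

write("balance", 5, 100)
write("balance", 9, 70)

print(snapshot_read("balance", 6))   # sees the version committed at ts=5
print(snapshot_read("balance", 12))  # sees the version committed at ts=9
```

Because old versions are retained, readers never block writers: a transaction that started at ts=6 keeps seeing a consistent snapshot even after the ts=9 commit lands.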
Database backup and validation has been a trending topic in recent years. To survive all manner of accidents, Facebook designed and implemented a MySQL backup and validation system for large-scale deployment. In this talk, I'll share the implementation details of this system and the story of its evolution.