Percona Live 2017 Open Source Database Conference

April 24 - 27, 2017

Santa Clara, California

Slides from Percona Live

In this talk you'll learn how we store and analyze time series data efficiently at VividCortex, using MySQL and Redis as a storage engine. VividCortex's time series workload presents interesting and unusual challenges that most conventional time series databases don't handle well, at a speed and volume that is also unusual. Building on MySQL and Redis enabled us to do this with low costs on relatively inexpensive EC2...
PostgreSQL and ZFS were made for each other. This talk dives downstack into the internals of how PostgreSQL consumes disk resources and the tricks that are available if you run PostgreSQL on ZFS (ZFS on Linux, ZFS on FreeBSD, or ZFS on Illumos). Topics covered will include: *) Performance and sizing considerations *) Workload estimation heuristics *) Standard administrative practices that leverage ZFS *) Recovery using ZFS...
In the previous year, many RocksDB features were added. In this talk we explain the six most important ones: bulk loading, persistent cache, Lua compaction filter, blob storage, range delete and direct I/O. We also introduce features we expect to add in 2017.
Smyte is building a fraud and spam detection platform that analyzes all of the traffic running through busy consumer websites and mobile apps. In this talk I'm going to describe how we built our own distributed database, SmyteDB, by integrating Kafka with RocksDB. In our design, Kafka enables us to support database replication and linearization without reinventing distributed primitives. Meanwhile, RocksDB's unique data model...
Traditionally, machines were statically partitioned across the different services at Uber. In an effort to increase machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation will describe the initial experience building and operating a framework for running Cassandra on top of Mesos across multiple datacenters at Uber. This...
MySQL is by far the most common choice among Facebook engineering teams when they are looking for a persistent data store for their product or application. Not all of this data goes into the "Facebook Graph" as not everything developed inside Facebook applies to a user or something they are sharing. This creates a lot more unique use cases of MySQL inside Facebook than one team can operationally optimize for. Today several...
Netflix created and open sourced the Dynomite project to provide reusable distributed database infrastructure that turns single-server data stores into scalable, distributed databases. Dynomite supports pluggable protocols and pluggable storage engines, which allows us to add sharding and replication to a variety of non-distributed data stores. The entire database infrastructure can be reused across a variety of workloads from in-memory to on-...
Supporting both high availability and strong consistency is a great challenge for MySQL clusters. Built on Paxos and with minimal modification to MySQL, PhxSQL provides full functionality completely compatible with MySQL, ZooKeeper-level availability and consistency, and semi-sync-like performance. PhxSQL has been deployed in the WeChat backend, supporting hundreds of millions of daily users for mission-critical tasks, and has proven its reliability...
Flynn is an open source platform as a service that automatically deploys and configures highly available database clusters with safe, automatic failover. This session will cover how Flynn automates the provisioning of MongoDB clusters, ensures that the replication topology is safe and available during failures, and provides useful administration features like backups.
Prometheus is an open-source monitoring and alerting system that has quickly gained popularity over the last two years (including for sophisticated monitoring of MySQL database servers). One of the components of Prometheus is a time-series database (TSDB) embedded into the monitoring server. The TSDB uses a highly domain-specific query language called PromQL. The decision to not use a SQL-like query language was driven by the specific...
SQLite is the most widely used and deployed database engine in the world with many billions of active installations. But SQLite is not a competitor to MySQL, PostgreSQL, SQL Server, or Oracle. SQLite solves a different problem and is complementary to those other technologies. MySQL and its competitors are designed to run in the datacenter at the center of the network, whereas SQLite is designed to run on devices on the edge of the...
TiDB is a NewSQL database that is compatible with MySQL. It's inspired by Google Spanner and Google F1. In this talk, I will address the following topics: 1. The scalability and performance of the latest TiDB. 2. How we made TiDB a hybrid database. 3. How we are making it 10x to 100x faster than MySQL for some complex queries. 4. How users have adopted TiDB to replace proxy-based MySQL solutions, and how they put...
Flashback:
- Makes use of the binary log to roll back an instance, database or table to a previous snapshot.
- Is available as a first release in MariaDB 10.2.4/RDS MySQL 5.6.
- Is implemented at the server level, so it supports all storage engines.
- Makes use of full-image-format binary logs.
- Is currently a mysqlbinlog feature (the --flashback option).
The talk will discuss how Flashback is currently implemented, what it...
Public clouds like AWS & Azure have become very popular platforms over the past few years. These public clouds provide a plethora of infrastructure features to help make your life easier; we will dig into the features/assets that you should be actively leveraging. On the flip side, there are also a number of potential pitfalls that you need to be aware of and work around. In this talk we will discuss the common architecture...
Data integrity is a core functional requirement driven by your business's needs. Over the last decade, we’ve seen an explosion of distributed datastores (aka polyglot storage), including datastores managed for us as a service. Between distributed, bleeding-edge and abstraction, we find ourselves needing to build more comprehensive solutions than ever to ensure that unacceptable data loss does not occur. We can’t anticipate every...
No database is an island and as NoSQL and BigData provide additional challenges and opportunities, you need a way for information and data to be replicated between your existing MySQL, MariaDB or Oracle installation out to other databases. Whether you are doing analytics, distributing data through Kafka to other systems, or merely sharing data with your NoSQL DB such as MongoDB, Couchbase or Cassandra, you need an environment to handle...
Over the last year and a half we built an open source storage engine from scratch specifically for time series data. In this presentation I take a deep dive into the storage engine inside InfluxDB. More than just a single storage engine, InfluxDB is two engines in one: the first for time series data and the second, an index for metadata. I'll delve into the optimizations for achieving high write throughput, compression and fast reads...
RocksDB is used extensively by applications in the cloud. The stock RocksDB library does not provide durability of data in the case of machine failures. This means that applications typically have to implement their own mechanisms for replicating data. On the other hand, the AWS cloud environment provides services that allow elegant durability and replication of data. This talk describes how RocksDB-cloud can leverage these cloud-...
Data inconsistency is the worst problem that can happen in a synchronous database cluster. After data inconsistency has sneaked into the cluster, it will sooner or later surface during replication processing, simply because replication events cannot be applied on some cluster node due to missing rows, excess rows or wrong values in the row columns. This usually stops the replication process altogether, and at least the...
This talk will cover the special case of time series data and the evolution of various schemas from RRD files to RDBMS schemas to NoSQL stores. Particularly we'll focus on why, as the amount of time series data grows and slicing the data by various dimensions becomes important, many users eschewed RDBMS for NoSQL or custom data layers. We'll look at: * RRDTool * RDBMS * Single table RDBMS * Single table RDBMS with...