In the last 20 years, researchers and vendors have built advisory tools to assist DBAs in tuning and physical design. Most of this previous work is incomplete, however, because it still requires humans to make the final decisions about any database changes, and because it is a reactive measure that fixes problems only after they occur. What is needed for a "self-driving" DBMS are components designed for autonomous operation. This will enable new optimizations that are not possible today, because the complexity of managing these systems has surpassed the abilities of humans.
In this talk, I present the core principles of an autonomous DBMS based on reinforcement learning. These principles are necessary to support ample data collection, fast state changes, and accurate reward observations. I will discuss techniques for building a new autonomous DBMS and for retrofitting an existing one. Our work is based on our experiences at CMU developing an automatic tuning service (OtterTune) and a self-driving DBMS (Peloton).
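To make the reinforcement-learning framing concrete, here is a minimal multi-armed-bandit sketch in Python of the tune-observe-update loop the abstract alludes to. The knob, action space, and reward function are hypothetical stand-ins for illustration, not OtterTune's or Peloton's actual algorithms.

```python
import random

# Hypothetical action space: candidate buffer-pool sizes (GB) the agent may try.
ACTIONS = [4, 8, 16, 32]
q_values = {a: 0.0 for a in ACTIONS}  # running estimate of each action's reward
counts = {a: 0 for a in ACTIONS}

def observe_reward(action):
    # Stand-in for applying the knob and measuring throughput on a real DBMS;
    # here we pretend 16 GB is optimal and add measurement noise.
    return -abs(action - 16) + random.gauss(0, 1)

for step in range(1000):
    if random.random() < 0.1:                       # explore: ample data collection
        action = random.choice(ACTIONS)
    else:                                           # exploit the best estimate so far
        action = max(q_values, key=q_values.get)    # fast state change: apply the knob
    reward = observe_reward(action)                 # accurate reward observation
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]
```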
Braze is a lifecycle engagement platform used by consumer brands to deliver great customer experiences to over 1 billion monthly active users. In this talk, co-founder and CTO Jon Hyman will walk through several production use cases in which Braze buffers data in Redis for efficient real-time processing.
Braze processes more than a third of a trillion pieces of data each month when generating time series analytics for its customers. Jon will describe how each of these events gets buffered into Redis hashes, and some into Redis sets, before ultimately being flushed to Braze's analytics database hundreds of thousands of times per minute. This talk will also discuss how Redis sets are the cornerstone of Canvas, Braze's user journey orchestration product used by brands such as OKCupid, Postmates, and Microsoft.
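As a rough illustration of this buffering pattern (the key scheme and flush logic here are hypothetical sketches, not Braze's actual implementation), counters can be accumulated in a Redis hash with HINCRBY and periodically claimed for flushing, e.g. with redis-py:

```python
import time
import redis  # redis-py client, assumed available

r = redis.Redis()

def record_event(app_id, event):
    """Buffer one analytics event into a per-minute Redis hash."""
    bucket = time.strftime("%Y-%m-%dT%H:%M")
    r.hincrby(f"analytics:{app_id}:{bucket}", event, 1)

def flush_bucket(app_id, bucket):
    """Atomically claim a bucket, then hand its counters to the analytics DB."""
    live_key = f"analytics:{app_id}:{bucket}"
    claim_key = live_key + ":flushing"
    try:
        r.rename(live_key, claim_key)  # atomic claim; new events start a fresh hash
    except redis.ResponseError:
        return {}                      # nothing buffered for this bucket
    counters = r.hgetall(claim_key)
    # ... write `counters` to the analytics database here ...
    r.delete(claim_key)
    return counters
```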
Lastly, Jon will cover how Braze has written its own application-level sharding for Redis in order to support the millions of operations per second that its daily volume requires.
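Application-level sharding along these lines typically hashes each key onto a ring of Redis instances. The sketch below is a minimal consistent-hashing router under assumed names; Braze's actual routing logic is not described in the abstract.

```python
import bisect
import hashlib
import redis

class ShardedRedis:
    """Route each key to one of several Redis instances via consistent hashing."""

    def __init__(self, nodes, vnodes=64):
        self._clients = {node: redis.Redis(host=node[0], port=node[1]) for node in nodes}
        self._ring = sorted(
            (self._hash(f"{host}:{port}#{v}"), (host, port))
            for host, port in nodes
            for v in range(vnodes)        # virtual nodes smooth the distribution
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def client_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key),)) % len(self._ring)
        return self._clients[self._ring[idx][1]]

# Usage: all operations for a given user land on the same shard.
# cluster = ShardedRedis([("redis-1", 6379), ("redis-2", 6379)])
# cluster.client_for("user:42").hincrby("analytics:user:42", "session_start", 1)
```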
Continuent Tungsten Replicator enables you to move data effectively, in real time, from your database source into various targets. You can move data from your MySQL and Oracle transactional stores into your data warehouse, or into HPE Vertica or Hadoop. Furthermore, this can be performed from multiple sources into a single target, modifying and augmenting the data so that the information can be analyzed both in total and per individual source and host. This lets you analyze your data more effectively without losing information, and because the data can be filtered, you can be confident that the information can be analyzed without releasing personal data. In this session, we'll examine the replication process and how the data can be concentrated and combined without losing the source identity information.
Azure provides fully managed, enterprise-ready community MySQL and PostgreSQL services built for developers and DevOps engineers. These services use the community database technologies you love and let you focus on your apps instead of on the burden of management and administration. In this session, we will walk you through service capabilities such as built-in high availability, security, and elastic scaling of performance that allow you to optimize your time and save costs. We will demonstrate how the service integrates with the broader Azure platform, enabling you to deliver innovative new experiences to your users. The talk will cover best practices and real customer examples to demonstrate the benefits and show how easily you can migrate your databases to the managed service.
The earliest relational databases were monolithic on-premises systems that were powerful and full-featured. Fast forward to the Internet and NoSQL: BigTable, DynamoDB, and Cassandra. These distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon, or Facebook, they too "required" horizontal scalability.
But in a very real way, NoSQL and even NewSQL have forgotten single-node performance, which matters wherever scaling out isn't an option. And single-node performance is important because it allows you to do more with much less. With a smaller footprint and a simpler stack, overhead decreases and your application can still scale.
In this talk, we describe TimescaleDB's methods for single-node performance. The nature of time-series workloads, and the way data is partitioned, allows users to elastically scale up even on a single machine, which provides operational ease and architectural simplicity, especially in cloud environments.
Keeping data safe is the top responsibility of anyone running a database. Learn how the Google Cloud SQL team protects against data loss. Cloud SQL is Google's fully managed database service that makes it easy to set up and maintain MySQL databases in the cloud. In this session, we'll dive into Cloud SQL's storage architecture to learn how we check data down to the disk level. We will also discuss MySQL checksums and the infrastructure Cloud SQL uses to verify that checksums for data files are accurate without affecting the performance of the database.
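As a simplified illustration of block-level verification (the block size and the choice of CRC32 are assumptions for this sketch, not Cloud SQL's actual scheme), per-block checksums let you localize corruption rather than merely detect it:

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical verification granularity

def block_checksums(path):
    """Compute a CRC32 for each fixed-size block of a data file."""
    checksums = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            checksums.append(zlib.crc32(chunk))
    return checksums

def corrupted_blocks(path, expected):
    """Return the indexes of blocks whose current checksum no longer matches."""
    return [i for i, (now, then) in enumerate(zip(block_checksums(path), expected))
            if now != then]
```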
ClickHouse is an open source analytical DBMS. It is capable of storing petabytes of data and processing billions of rows per second per server, all while ingesting new data in real time.
I will talk about ClickHouse's internal design and the unique implementation details that allow it to achieve maximum query-processing performance and data storage efficiency.
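One of those design choices is columnar storage. The toy Python comparison below is not ClickHouse code (ClickHouse adds compression, vectorized execution, and much more), but it hints at why scanning one contiguous column beats reading whole rows:

```python
import time

N = 1_000_000
# Row layout: each record carries every field, as a row store would return it.
rows = [{"user_id": i, "duration": i % 100, "url": "/"} for i in range(N)]
# Column layout: the queried column lives in one contiguous array.
duration_column = [i % 100 for i in range(N)]

t0 = time.perf_counter()
row_total = sum(row["duration"] for row in rows)  # touches every row object
t1 = time.perf_counter()
col_total = sum(duration_column)                  # touches only the needed column
t2 = time.perf_counter()

assert row_total == col_total
print(f"row layout: {t1 - t0:.3f}s  column layout: {t2 - t1:.3f}s")
```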
Accelerating MySQL with Just-In-Time (JIT) compilation is emerging as a quick and easy way to achieve greater efficiencies with MySQL. In this talk, I'll go over the benefits and caveats of using Dynimizer, a binary-to-binary JIT compiler, with MySQL workloads. I'll discuss how to identify situations where JIT compilation can help, how to get set up and running, and go over benchmark results along with other performance metrics. We'll also peek under the hood and take a look at what's happening at a lower level.
ClickHouse is a very fast, feature-rich open source analytics DBMS that operates at multi-petabyte scale. It has gained a lot of attention over the last year, thanks to excellent benchmark results, conference talks, and the first successful projects.
After the initial wave of early adopters, a second wave is coming: many companies have started to consider ClickHouse as their analytics backend. In this talk I'll review the state of ClickHouse adoption worldwide, share insights about the business problems ClickHouse helps solve efficiently, highlight possible implementation challenges, and discuss best practices.
This year the Cassandra team at Instagram has been working on a very interesting project to make Apache Cassandra's storage engine pluggable, and has implemented a new RocksDB-based storage engine for Cassandra. The new storage engine can improve the performance of Apache Cassandra significantly, making Cassandra 3-4 times faster in general, and even 100 times faster in some use cases.
In this talk, we will describe the motivation, the different approaches we considered, the high-level design of the solution we chose, and performance metrics from benchmark and production environments.
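To give a flavor of the approach, the sketch below maps a wide row onto RocksDB's sorted key space using the python-rocksdb bindings. This is a sketch under stated assumptions: the real engine lives inside Cassandra's JVM process, and this key encoding is invented for illustration.

```python
import rocksdb  # python-rocksdb bindings, assumed installed

def encode_key(partition_key, clustering_key, column):
    # Flatten (partition, clustering, column) into one ordered byte key so a
    # partition's cells are adjacent and readable with a single range scan.
    return b"\x00".join(part.encode() for part in (partition_key, clustering_key, column))

db = rocksdb.DB("cassandra-rows.db", rocksdb.Options(create_if_missing=True))
db.put(encode_key("user:42", "2018-04-23", "status"), b"active")
db.put(encode_key("user:42", "2018-04-24", "status"), b"idle")

# Reading a whole partition becomes a prefix scan over the sorted keys.
it = db.iteritems()
it.seek(b"user:42")
for key, value in it:
    if not key.startswith(b"user:42"):
        break
    print(key, value)
```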
The database team at GitHub is tasked with keeping the data available and maintaining its integrity. Our infrastructure automates away much of our operations, but automation requires trust, and trust is gained by testing. This session highlights three examples of infrastructure testing automation that help us sleep better at night:
- Backups: scheduling backups; making backup data accessible to our engineers; auto-restores and backup validation (see the sketch after this list). What metrics and alerts we have in place.
- Failovers: how we continuously test our failover mechanism, orchestrator. How we set up a failover scenario, what defines a successful failover, and how we automate away the cleanup. What we do in production.
- Schema migrations: how we ensure that gh-ost, our schema migration tool, which keeps rewriting our (and your!) data, does the right thing. How we test new branches in production without putting production data at risk.
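As a taste of the backups portion, a continuous auto-restore job can be sketched like this. Every helper below is a hypothetical stand-in; the abstract does not name GitHub's actual tooling or commands.

```python
import time

# Hypothetical stand-ins for real tooling; GitHub's actual commands differ.
def latest_backup_path():
    return "/backups/mysql/latest.tar.gz"   # placeholder path

def restore_into_scratch_server(path):
    # ... restore `path` into a throwaway MySQL instance, fail loudly if broken ...
    return "scratch-mysql-1"

def run_sanity_queries(server):
    # ... row counts, table checksums, replication position checks ...
    return True

def emit_metric(name, value):
    print(name, value)                      # feed your metrics/alerting system here

def nightly_auto_restore():
    """Prove backups actually restore, not merely that backup jobs ran."""
    started = time.time()
    server = restore_into_scratch_server(latest_backup_path())
    ok = run_sanity_queries(server)
    emit_metric("backup.restore_seconds", time.time() - started)
    emit_metric("backup.validation_ok", 1 if ok else 0)  # alert whenever this is 0
```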
Time-series data is now everywhere and increasingly used to power core applications. It also creates a number of technical challenges: ingesting high volumes of data; asking complex queries over recent and historical time intervals; and performing time-centric analysis and data management. And this data doesn't exist in isolation: entries are often joined against other relational data to answer key business questions.
In this talk, I offer an overview of TimescaleDB, a new open-source database designed for time-series workloads and engineered as a plugin to PostgreSQL in order to simplify time-series application development. Unlike most time-series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries. This enables developers to avoid today's polyglot architectures and their corresponding operational and application complexity.
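Because TimescaleDB presents itself as regular PostgreSQL, getting started looks like ordinary SQL. The sketch below (connection string and table names are assumptions) calls create_hypertable, the extension's documented entry point, via psycopg2:

```python
import psycopg2  # assumes a PostgreSQL server with the timescaledb extension

conn = psycopg2.connect("dbname=metrics")  # connection details are an assumption
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb")
cur.execute("""
    CREATE TABLE conditions (
        time        TIMESTAMPTZ NOT NULL,
        device_id   TEXT,
        temperature DOUBLE PRECISION
    )
""")
# Turn the plain table into a hypertable: TimescaleDB now partitions it by time
# behind the scenes, while inserts and queries stay ordinary SQL.
cur.execute("SELECT create_hypertable('conditions', 'time')")
cur.execute("INSERT INTO conditions VALUES (now(), %s, %s)", ("device-1", 21.5))
```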
Tungsten Replicator is a very powerful tool that supports replication in one-to-many and many-to-one topologies. The replication source can be either MySQL (all versions) or Oracle (from 9i to 12c). The target can be Cassandra, Elasticsearch, Hadoop, Kafka, MongoDB, MySQL, Oracle, Redshift, or Vertica. You can even have a topology that applies to a mixture of these, for example extracting from Oracle and applying simultaneously into both Kafka and Hadoop.
This heterogeneous replication model provides a very powerful solution for many businesses. In addition, the Tungsten Replicator has many built-in filters, giving you the flexibility to eliminate rows, columns, tables, or even whole databases from the source, and to modify datatypes on the fly.
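To illustrate what such filtering does conceptually (a Python sketch only; Tungsten's real filters are configured inside the replicator, and all names below are hypothetical):

```python
# Conceptual illustration of row/column filtering; not Tungsten's actual API.
DROPPED_TABLES = {"internal_audit"}
DROPPED_COLUMNS = {"customers": {"ssn", "email"}}   # strip personal data in flight

def filter_row_event(table, row):
    """Return the transformed row, or None to eliminate it from the stream."""
    if table in DROPPED_TABLES:
        return None
    hidden = DROPPED_COLUMNS.get(table, set())
    return {column: value for column, value in row.items() if column not in hidden}

# Usage: a replicated customers row loses its personal columns.
print(filter_row_event("customers", {"id": 7, "ssn": "000-00-0000", "city": "Berlin"}))
# -> {'id': 7, 'city': 'Berlin'}
```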
In this session, we will look at how data can be effectively replicated into Kafka and Elasticsearch, and how that information can be put to use as it arrives.
As a distributed key-value storage engine, TiKV supports strong data consistency, automatic horizontal scalability, and ACID transactions. Many users now run TiKV directly in production as a replacement for other key-value stores; some have even scaled TiKV to 100+ nodes.
In this talk, I will explain how we made this possible. The details include, but are not limited to:
1. Why did we choose RocksDB as the backend storage engine, and how do we optimize it?
2. How do we use the Raft consensus algorithm to provide data consistency and horizontal scalability?
3. How do we support distributed transactions? (A simplified sketch follows this list.)
4. How do we use Prometheus to monitor the system and troubleshoot problems?
5. How do we test TiKV to verify its correctness and guarantee its stability?
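For item 3, TiKV follows Google's Percolator model of two-phase commit. The toy sketch below shows the prewrite/commit shape of that protocol using in-memory dicts; TiKV's real implementation layers this over RocksDB column families replicated by Raft, and all names here are simplifications.

```python
# Toy Percolator-style two-phase commit; not TiKV's production code.
data = {}    # key -> list of (commit_ts, value) versions
locks = {}   # key -> (primary_key, start_ts, pending_value)

def prewrite(writes, primary, start_ts):
    """Phase 1: lock every key, failing on conflicts with newer writes."""
    for key in writes:
        if key in locks:
            return False                               # locked by another txn
        if any(ts >= start_ts for ts, _ in data.get(key, [])):
            return False                               # write-write conflict
    for key, value in writes.items():
        locks[key] = (primary, start_ts, value)
    return True

def commit(writes, start_ts, commit_ts):
    """Phase 2: committing the primary makes the txn visible, then secondaries."""
    for key in writes:
        _, _, value = locks.pop(key)
        data.setdefault(key, []).append((commit_ts, value))

# Usage: an atomic write across two keys either fully commits or not at all.
if prewrite({"a": b"1", "b": b"2"}, primary="a", start_ts=10):
    commit({"a": b"1", "b": b"2"}, start_ts=10, commit_ts=11)
```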