MongoDB clusters excel at scaling out to support heavy web traffic. In this session, I will cover the following topics:
- Typical MongoDB cluster topologies that support large traffic
- Best practices for managing MongoDB clusters, such as adding and removing shards and adding and removing indexes
- Methods for finding bottlenecks and optimizing clusters
Open source relational databases like MySQL and PostgreSQL power some of the world's largest websites, including Yelp. They can be used out of the box with few adjustments, and rarely require a dedicated Database Administrator (DBA) right away. This means that System Administrators, Site Reliability Engineers, or Developers are usually the first to respond to some of the more interesting issues that can arise as you scale your databases.
In this talk, I'll assume that you already have a database up and running and will first go over a broad set of basics to introduce you to MySQL Database Administration. Next, I will cover the InnoDB storage engine, high performance and availability, monitoring and database defense. Finally, I'll cover the wide array of online resources, books, open source toolkits and scripts from MySQL, Percona and the Open Source community that will make the job easier.
While not hands-on, I'll be encouraging questions and this is expected to be a very interactive tutorial!
Offering MySQL, PostgreSQL and MariaDB database services in the cloud is different from doing so on-premises. Latency, connection redirection, and optimal performance configuration are just a few of the challenges. In this session, Jun Su will walk you through Microsoft's journey to offer these popular OSS RDBMSs in Microsoft Azure and explain how they are implemented as a true DBaaS. Learn about Microsoft's Azure Database Services platform architecture, and how these services are built to scale.
The function of security has always been a significant part of the database engineer's job. The security of an organization's most critical asset is paramount. In the siloed, historical world of the DBA, the admin would focus on database security controls only. As the stewards of the organization's data, however, the database reliability engineer must take a more holistic approach to the job. A methodology and strategy for mitigation that is holistic, and that can be integrated into the entire engineering culture, is needed to ensure effective data security at scale.
In this talk, we establish a process for instilling repeatable, scalable data security through education and collaboration, self-service libraries and patterns, continuous integration and testing, and monitoring and metrics. After this, we discuss potential vulnerabilities and exploits, methods of encryption at rest and in flight, and the various compliance standards we must take into consideration.
Existing tools like mysqldump and replication cannot migrate data between GTID-enabled MySQL and non-GTID-enabled MySQL -- a common configuration across multiple cloud providers that cannot be changed. These tools are also cumbersome to operate and error-prone, thus requiring a DBA’s attention for each data migration. We introduced a tool that allows for easy migration of data between MySQL databases with constant downtime on the order of seconds.
Inspired by gh-ost, our tool is named Ghostferry and allows application developers at Shopify to migrate data without assistance from DBAs. It has been used to rebalance sharded data across databases. We plan to open source Ghostferry at the conference so that anyone can migrate their own data with minimal hassle and downtime. Since Ghostferry is written as a library, you can use it to build specialized data movers that move arbitrary subsets of data from one database to another.
While most applications use the basic security features, there is often a lack of understanding about how best to manage them, especially as significant security features arrive with every major version of Postgres. The advanced features, sadly, often go unnoticed and unused. This talk will cover the various features that Postgres provides for data security, from the very basic to the most advanced:
- Postgres HBA and types of authentications
- Permissions and ACL in Postgres
- Row-level security
- Event triggers
- PCI security implementation techniques
- Filesystem permission options
- Data encryption management in Postgres
- Table level auditing and storage efficiency
- Monitoring for SQL injections
- Other PostgreSQL security features
- Tips for security enhancement for Postgres as a Service users (RDS, GCE, Azure Postgres)
- Upcoming security features in Postgres 11
- Features that Postgres currently lacks
While GitHub isn't the biggest database around in terms of the amount of data we hold in MySQL, it is among the top 50 busiest sites on the internet. Facing an immediate need to distribute load, we came up with creative ways to move a significant amount of traffic off of our main MySQL cluster, with no user impact. Moving five of our hottest tables required collaboration between engineers, DBAs and SREs. This talk will describe when and how to do it, and prove it to be an efficient database scalability solution.
Moving tables required changes to our database infrastructure as well as our application. I'll explain the impetus for this work and why we did it. We'll walk through the application-level changes that allowed us to change connections while still serving data. Then, I'll discuss the ways we moved tables to different clusters, using MySQL replication, or in some cases, temporary sharding and copying billions of rows. Finally, I'll outline the orchestration of the actual cutovers.
The JSON data type and functions that support it comprise one of the most interesting features introduced in MySQL 5.7 for application developers. But no feature is a "Golden Hammer." We need to apply a little expertise to get the best result, and avoid misusing it. I’ll show practical examples that work well with JSON, and other scenarios where conventional columns perform better.
Questions addressed in this presentation:
- How much space does JSON data use, compared to conventional data?
- What is the performance of querying JSON vs. conventional data?
- How do I create indexes for JSON data?
- What kind of data is best to store in JSON?
- How do I get the best of both worlds?
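The space question above can be illustrated with a rough sketch. It is not a measurement of MySQL itself: it compares a compact JSON text encoding against a fixed-width binary packing of the same values, assuming a hypothetical row with `id`, `price` and `qty` fields. MySQL stores JSON in its own binary format and conventional columns in its own row format, so the absolute numbers will differ; the point is only that a self-describing document repeats its key names in every row, while a fixed schema does not.

```python
import json
import struct

# Hypothetical row; the field names and values are illustrative only.
row = {"id": 12345, "price": 19.99, "qty": 3}


def json_size(record: dict) -> int:
    """Size in bytes of the record as compact JSON text (keys included)."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


def packed_size(record: dict) -> int:
    """Size in bytes of the same values packed as int32, float64, int32.

    The '<' prefix disables alignment padding, so this is 4 + 8 + 4 bytes.
    """
    return len(struct.pack("<idi", record["id"], record["price"], record["qty"]))


if __name__ == "__main__":
    print("JSON text bytes:  ", json_size(row))
    print("Fixed-width bytes:", packed_size(row))
```

In this sketch the key names account for most of the difference, which is one reason attributes that appear in every row tend to fit conventional columns better than a JSON document.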
We recently finished migrating from InnoDB to MyRocks in our user database (UDB) at Facebook. We have been running MyRocks in production for a while and have learned several lessons. In this talk, I will share the most interesting lessons from production deployment and operations, and outline the roadmap for future MyRocks development.
The full title of this presentation should be: "Save some bandwidth by not transmitting the full resultset metadata over the wire when you don't need it." Indeed, one of the latest features in the MySQL protocol allows you to save some network bandwidth by not sending the metadata with the resultsets for which you already know the metadata.
Join this talk to learn how to turn this on, and how much data it saves per query.
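As a back-of-the-envelope sketch of what is at stake, the snippet below estimates how many bytes the per-column metadata occupies in a resultset. It assumes the classic protocol's column definition layout (six length-encoded strings followed by a fixed 12-byte block, each column in its own packet with a 4-byte header) and names shorter than 251 bytes. The schema, table and column names are hypothetical, and the figures from the talk itself may differ.

```python
PACKET_HEADER = 4      # 3-byte payload length + 1-byte sequence id
FIXED_FIELDS = 1 + 12  # length marker, then charset(2), column length(4),
                       # type(1), flags(2), decimals(1), filler(2)


def lenenc_str_size(value: str) -> int:
    """Bytes used by a length-encoded string with a 1-byte length prefix."""
    data = value.encode("utf-8")
    if len(data) >= 251:
        raise ValueError("this sketch only handles strings under 251 bytes")
    return 1 + len(data)


def column_definition_size(schema: str, table: str, org_table: str,
                           name: str, org_name: str) -> int:
    """Estimated wire size of one column definition packet."""
    strings = ["def", schema, table, org_table, name, org_name]  # catalog is "def"
    return PACKET_HEADER + sum(lenenc_str_size(s) for s in strings) + FIXED_FIELDS


def metadata_bytes(columns: list) -> int:
    """Total estimated metadata bytes for a resultset's columns."""
    return sum(column_definition_size(*col) for col in columns)


if __name__ == "__main__":
    # Hypothetical two-column resultset: SELECT id, total FROM shop.orders
    cols = [
        ("shop", "orders", "orders", "id", "id"),
        ("shop", "orders", "orders", "total", "total"),
    ]
    print("Estimated metadata bytes per query:", metadata_bytes(cols))
```

For queries that return a single small row, an overhead of this size can rival the row data itself, which is why skipping it is attractive when the client already knows the column layout.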