Running Cassandra on Apache Mesos across multiple datacenters at Uber
Traditionally, machines were statically partitioned across the different services at Uber. To increase machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation describes the initial experience of building and operating a framework for running Cassandra on top of Mesos across multiple datacenters at Uber. The framework automates several Cassandra operations, such as node repairs, the addition of new nodes, and backup/restore. It improves efficiency by co-locating CPU-intensive services as well as multiple Cassandra nodes on the same Mesos agent. It handles failure and restart of Mesos agents by using persistent volumes and dynamic reservations. The talk includes statistics about the number of Cassandra clusters in production; the time taken to start a new cluster, add a new node, and detect a node failure; and the observed Cassandra query throughput and latency.
Senior Software Engineer, Uber
Karthik Gandhi works as a Senior Software Engineer in the Compute & Storage team at Uber. His current work includes developing the open source Mesos framework for managing Cassandra clusters and helping application teams migrate to Cassandra. Previously, he worked on Autopilot, an infrastructure management system developed by Microsoft to manage Bing and Azure infrastructure. Karthik is looking forward to talking about the challenges involved in deploying and managing a stateful service such as MySQL or Cassandra on Mesos.