At Yelp we have a constantly growing polyglot data tier consisting of datastores such as Cassandra, Elasticsearch, MySQL and Zookeeper. These distributed datastores often ask to be treated like pets, but given the scale of our systems they can only be reared like cattle. Requiring engineers to pamper them individually is neither feasible nor scalable. We need cluster automation that is powerful, resilient and reliable, and, more importantly, safe. This is where Taskerman steps in.
Taskerman is a distributed cluster task manager that wears many hats to keep our clusters highly available, consistent, secure and in optimal condition. Reusability has also been a focus, so Taskerman is built on top of AWS and existing open-source infrastructure such as Yelp's PaaSTA, Zookeeper and Sensu.
This talk covers the genesis of Taskerman at Yelp, its architecture, and its evolution. We also hope to open-source Taskerman in the future, much like the infrastructure it is built on.
Raghavendra Prabhu works as a Software Engineer in the Distributed Systems team at Yelp's London office. His work revolves around distributed datastores such as Cassandra, Elasticsearch and Zookeeper, their interactions, and their automation. Before joining Yelp, he was the Product Lead of the Galera-based Percona XtraDB Cluster (PXC) at Percona. He started his career at Yahoo as a Systems Engineer, working primarily on Yahoo's database stacks. Raghavendra's main interests include databases, virtualization and containers, distributed systems, and operating systems. In his spare time, he likes to read books and technical literature, listen to music, hack on FOSS, and go hiking in nature reserves. He has previously spoken at conferences such as Percona Live, FOSDEM, LinuxConfAu (LCA), Fossetcon, RICON, Highload++ and SCALE. Slides from these talks are available at http://www.slideshare.net/slidunder