Traditionally, machines were statically partitioned across the different services at Uber. In an effort to increase the machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation will describe the initial experience building and operating a framework for running Cassandra on top of Mesos running across multiple datacenters at Uber. This framework automates several Cassandra operations such as node repairs, addition of new nodes and backup/restore. It improves efficiency by co-locating CPU-intensive services as well as multiple Cassandra nodes on the same Mesos agent. It handles failure and restart of Mesos agents by using persistent volumes and dynamic reservations. This talk includes statistics about the number of Cassandra clusters in production, time taken to start a new cluster, add a new node, detect a node failure; and the observed Cassandra query throughput and latency.
About the Speaker
Abhishek Verma Software Engineer, Uber
Dr. Abhishek Verma is currently working on running Cassandra on top of Mesos at Uber. Prior to this, he worked on BorgMaster at Google and was the first author of the Borg paper published in Eurosys 2015. He received an MS in 2010 and a PhD in 2012 in Computer Science from the University of Illinois at Urbana-Champaign, during which he authored more than 20 publications in conferences, journals and books and presented tens of talks.