HUG August 2010: Mesos
by Hadoop User Group on Aug 29, 2010
- 2,916 views
Mesos: A Flexible Cluster Resource Manager...
Mesos: A Flexible Cluster Resource Manager
Mesos is a platform for sharing clusters between multiple parallel applications, such as Hadoop, MPI, and web services. It allows organizations to consolidate workloads from multiple applications onto a single cluster, which increases utilization and simplifies management, while ensuring that each application gets suitable resources to run on (e.g. that Hadoop gets data locality). Hadoop users can leverage Mesos to run multiple isolated instances of Hadoop (e.g. production and experimental instances) on the same cluster, to run multiple versions of Hadoop concurrently (e.g. to test patches), and to run MPI and other applications alongside Hadoop. In addition, due to its simple design, Mesos is highly scalable (to tens of thousands of nodes) and fault tolerant (using ZooKeeper). In this talk, we'll provide an introduction to Mesos for Hadoop users. We have also been working to test Mesos at Twitter and Facebook, and we'll present experience and results from those deployments.
Spark: A Framework for Iterative and Interactive Cluster Computing
Although the MapReduce programming model has been highly successful, it is not suitable for all applications. We present Spark, a framework optimized for one such type of applications - iterative jobs where a dataset is shared across multiple parallel operations, as is common in machine learning algorithms. Spark provides a functional programming model similar to MapReduce, but also lets users cache data between iterations, leading to up to 10x better performance than Hadoop. Spark also makes programming jobs easy by integrating into the Scala programming language. Finally, the ability of Spark to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big datasets. We have modified the Scala interpreter to make it possible to use Spark interactively, providing a much more responsive experience than Hive and Pig. Spark is built on top of Mesos and can therefore run alongside Hadoop.
Project URL: http://www.cs.berkeley.edu/~matei/spark/
- Total Views
- Views on SlideShare
- Embed Views