HUG August 2010: Mesos

Mesos: A Flexible Cluster Resource Manager

Mesos is a platform for sharing clusters between multiple parallel applications, such as Hadoop, MPI, and web services. It allows organizations to consolidate workloads from multiple applications onto a single cluster, which increases utilization and simplifies management, while ensuring that each application gets suitable resources to run on (e.g. that Hadoop gets data locality). Hadoop users can leverage Mesos to run multiple isolated instances of Hadoop (e.g. production and experimental instances) on the same cluster, to run multiple versions of Hadoop concurrently (e.g. to test patches), and to run MPI and other applications alongside Hadoop. In addition, due to its simple design, Mesos is highly scalable (to tens of thousands of nodes) and fault tolerant (using ZooKeeper). In this talk, we'll provide an introduction to Mesos for Hadoop users. We have also been working to test Mesos at Twitter and Facebook, and we'll present experience and results from those deployments.
Spark: A Framework for Iterative and Interactive Cluster Computing

Although the MapReduce programming model has been highly successful, it is not suitable for all applications. We present Spark, a framework optimized for one such class of application: iterative jobs in which a dataset is shared across multiple parallel operations, as is common in machine learning algorithms. Spark provides a functional programming model similar to MapReduce, but also lets users cache data between iterations, leading to up to 10x better performance than Hadoop. Spark also makes jobs easy to program by integrating into the Scala programming language. Finally, the ability of Spark to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big datasets. We have modified the Scala interpreter to make it possible to use Spark interactively, providing a much more responsive experience than Hive and Pig. Spark is built on top of Mesos and can therefore run alongside Hadoop.
Project URL: http://www.cs.berkeley.edu/~matei/spark/

Speaker notes
  • Mention it’s a research project but has recently gone open source
  • Mention Hama, Pregel, Dremel, etc.; also mention Spark
  • Mention how we want to enable NEW frameworks too!
  • Note here that even users of Hadoop want Mesos for the isolation it provides
  • Explain that Hadoop port uses pluggable scheduler, hence few changes to Hadoop and no changes to job clients
  • Point out that Scala is a modern PL, etc. Mention DryadLINQ (but we go beyond it with RDDs). Point out that interactive use and iterative use go hand in hand because both require small tasks and dataset reuse
  • Note that dataset is reused on each gradient computation
  • This is for a 29 GB dataset on 20 EC2 m1.xlarge machines (4 cores each)
  • This slide has to say we’re committed to making Nexus “real”, and that it will be open sourced

Transcript

  • 1. Mesos
    A Resource Management Platform for Hadoop and Big Data Clusters
    Benjamin Hindman, Andy Konwinski, Matei Zaharia,
    Ali Ghodsi, Anthony D. Joseph, Randy Katz,
    Scott Shenker, Ion Stoica
    UC Berkeley
  • 2. Challenges in Hadoop Cluster Management
    Isolation (both fault and resource)
    E.g. if co-locating production and experimental jobs
    Testing and deploying upgrades
    JobTracker scalability (in larger clusters)
    Running non-Hadoop jobs
    E.g. MPI, streaming MapReduce, etc
  • 3. Mesos
    Mesos is a cluster resource manager over which multiple instances of Hadoop and other distributed applications can run
    [Diagram: Hadoop (production), Hadoop (ad-hoc), and MPI frameworks running on Mesos, which manages the cluster's nodes]
  • 4. What’s Different About It?
    Other resource managers exist today, including
    Hadoop on Demand
    Batch schedulers (e.g. Torque)
    VM schedulers (e.g. Eucalyptus)
    However, these have two problems with Hadoop-like workloads:
    Data locality compromised due to static partitioning of nodes
    Utilization hurt because jobs hold nodes for their full duration
    Mesos addresses these through two features:
    Fine-grained sharing at the level of tasks
    Two-level scheduling model where jobs control placement (see the sketch after this slide)
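    Below is a minimal sketch of what a framework scheduler could look like under this offer-based, two-level model. The Offer and TaskSpec types, their fields, and the resourceOffers callback are illustrative assumptions made for this writeup, not the actual Mesos API.

    // Illustrative only: hypothetical types standing in for Mesos's real interfaces.
    case class Offer(slaveId: String, cpus: Int, memMb: Int, localBlocks: Set[String])
    case class TaskSpec(slaveId: String, cpus: Int, memMb: Int, inputBlock: String)

    trait FrameworkScheduler {
      // Level 1 (Mesos) offers available resources; level 2 (the framework)
      // decides which offers to accept and where to place its tasks.
      def resourceOffers(offers: Seq[Offer]): Seq[TaskSpec]
    }

    class DataLocalScheduler(pending: scala.collection.mutable.Queue[String])
        extends FrameworkScheduler {
      def resourceOffers(offers: Seq[Offer]): Seq[TaskSpec] =
        offers.flatMap { offer =>
          // Accept an offer only if that slave already holds a block we still need,
          // preserving data locality without Mesos knowing anything about HDFS.
          pending.find(offer.localBlocks.contains).map { block =>
            pending.dequeueFirst(_ == block)
            TaskSpec(offer.slaveId, cpus = 1, memMb = 1024, inputBlock = block)
          }
        }
    }

    The point of the sketch is that placement and locality decisions stay entirely inside the framework; Mesos only tracks free resources and makes offers, which is what keeps its core small.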
  • 5. Fine-Grained Sharing
    [Diagram: under coarse-grained sharing (HOD, etc.), each node is statically assigned to a single Hadoop instance (Hadoop 1, 2, or 3); under fine-grained sharing (Mesos), tasks from Hadoop 1, 2, and 3 are interleaved across all nodes]
  • 6. Mesos Architecture
    [Diagram: Hadoop jobs and an MPI job are submitted to their framework schedulers (Hadoop 0.20 scheduler, Hadoop 0.19 scheduler, MPI scheduler); the schedulers register with the Mesos master, which allocates resources on Mesos slaves; each slave runs framework executors (Hadoop 19, Hadoop 20, MPI) that execute the frameworks' tasks]
  • 7. Design Goals
    Scalability (to 10,000’s of nodes)
    Robustness (even to master failure)
    Flexibility (support wide variety of frameworks)
    Resulting design: simple, minimal core that pushes resource selection logic to frameworks
  • 8. Other Features
    Master fault tolerance using ZooKeeper
    Resource isolation using Linux Containers
    Isolate CPU and memory between tasks on each node
    In newest kernels, can isolate network & disk IO too
    Web UI for viewing cluster state
    Deploy scripts for private clusters and EC2
  • 9. Mesos Status
    Prototype in 10,000 lines of C++
    Ported frameworks:
    Hadoop (0.20.2), MPI (MPICH2), Torque
    New frameworks:
    Spark, Scala framework for iterative & interactive jobs
    Test deployments at Twitter and Facebook
  • 10. Results
    Dynamic Resource Sharing
    Data Locality
    Scalability
  • 11. Mesos Demo
  • 12. Spark
    A framework for iterative and interactive cluster computing
    Matei Zaharia, Mosharaf Chowdhury,
    Michael Franklin, Scott Shenker, Ion Stoica
  • 13. Spark Goals
    Support iterative applications
    Common in machine learning but problematic for MapReduce, Dryad, etc.
    Retain MapReduce’s fault tolerance & scalability
    Experiment with programmability
    Integrate into Scala programming language
    Support interactive use from Scala interpreter
  • 14. Key Abstraction
    Resilient Distributed Datasets (RDDs)
    Collections of elements distributed across cluster that can persist across parallel operations
    Can be stored in memory, on disk, etc
    Can be transformed with map, filter, etc
    Automatically rebuilt on failure
  • 15. Example: Log Mining
    Load error messages from a log into memory, then interactively search for various patterns
    val lines = spark.textFile("hdfs://...")           // base RDD
    val errors = lines.filter(_.startsWith("ERROR"))   // transformed RDD
    val messages = errors.map(_.split(' ')(2))
    val cachedMsgs = messages.cache()                   // cached RDD
    cachedMsgs.filter(_.contains("foo")).count          // parallel operation
    cachedMsgs.filter(_.contains("bar")).count
    . . .
    [Diagram: the driver ships tasks to workers; each worker reads its block of the file (Block 1/2/3), keeps the filtered messages in memory (Cache 1/2/3), and returns results to the driver]
  • 16. Example: Logistic Regression
    Goal: find best line separating two sets of points
    [Diagram: points from two classes scattered in the plane; a random initial line is iteratively adjusted until it converges to the target line separating the two sets]
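    In symbols, this is the update that the code on the next slide computes: for labeled points $(x, y)$ with $y \in \{-1, +1\}$, each iteration replaces $w$ by
    $$ w \leftarrow w - \sum_{(x,\,y)} \left( \frac{1}{1 + e^{-y\,(w \cdot x)}} - 1 \right) y\, x $$
    The map on the next slide evaluates one term of the sum per point, and the reduce adds the terms up.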
  • 17. Logistic Regression Code
    val data = spark.textFile(...).map(readPoint).cache()
    var w = Vector.random(D)
    for (i <- 1 to ITERATIONS) {
      val gradient = data.map(p => {
        val scale = (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y
        scale * p.x
      }).reduce(_ + _)
      w -= gradient
    }
    println("Final w: " + w)
  • 18. Logistic Regression Performance
    Hadoop: 127 s / iteration
    Spark: first iteration 174 s, further iterations 6 s
  • 19. Spark Demo
  • 20. Conclusion
    Mesos provides a stable platform to multiplex resources among diverse cluster applications
    Spark is a new cluster programming framework for iterative & interactive jobs enabled by Mesos
    Both are open-source (but in very early alpha!)
    http://github.com/mesos
    http://mesos.berkeley.edu