Getting Started Running Apache Spark on Apache Mesos
 

O'Reilly Media webcast 2014-01-24 http://www.oreillynet.com/pub/e/2986

  • Getting Started Running Apache Spark on Apache Mesos, 2014-01-24
 Paco Nathan
 liber118.com/pxn
 @pacoid
  • Spark on Mesos, 2014-01-24 • what is Apache Mesos? • launch a Mesos cluster in the cloud • configure and run Spark on Mesos • run jobs in Spark • further resources…
  • Datacenter Computing
 Google has been doing datacenter computing for years to address the complexities of large-scale data workflows:
   - leveraging the modern kernel: isolation in lieu of VMs
   - “most (>80%) jobs are batch jobs, but the majority of resources (55–80%) are allocated to service jobs”
   - mixed workloads, multi-tenancy
   - relatively high utilization rates
   - JVM? not so much…
   - among the top 10 Linux kernel OSS contributors: cgroups
 take-aways:
 scheduling batch is not so difficult;
 scheduling services is hard+expensive
  • Google describes the business case…
 Taming Latency Variability, Jeff Dean
 plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
  • “Return of the Borg”
 Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon, Cade Metz
 wired.com/wiredenterprise/2013/03/googleborg-twitter-mesos
 The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Luiz André Barroso, Urs Hölzle
 research.google.com/pubs/pub35290.html
 2011 GAFS Omega, John Wilkes, et al.
 youtu.be/0ZFMlO98Jkc
  • Google describes the technology…
 Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
 eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
  • Mesos – open source datacenter computing
 a common substrate for cluster computing: heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources
 mesos.apache.org
   - top-level Apache project
   - scalability to 10,000s of nodes
   - obviates the need for virtual machines
   - isolation (pluggable) for CPU, RAM, I/O, FS, etc.
   - fault-tolerant leader election based on ZooKeeper
   - APIs in C++, Java, Python, Go
   - web UI for inspecting cluster state
   - available for Linux, OpenSolaris, Mac OS X
  • Mesos – architecture [diagram: layers labeled Workloads (services, batch), Apps, Frameworks, Kernel, DFS and Cluster; examples shown include Scalding, MPI, Impala, Hadoop, Shark, Spark, MySQL, Kafka, JBoss, Django, Chronos, Storm, Rails and Marathon; the kernel manages distributed resources: CPU, RAM, I/O, FS, rack locality, etc.]
  • Mesos – architecture
   - apps: HA services, web apps, batch jobs, scripts, etc.
   - frameworks: Spark, Storm, MPI, Jenkins, etc.
   - task schedulers: Chronos, etc.
   - meta-frameworks: Aurora, Marathon
   - APIs: C++, JVM, Py, Go
   - Mesos, distrib kernel
   - HDFS, distrib file system
   - Linux: libcgroup, libprocess, libev, etc.
  • Mesos – dynamics [diagram: scheduled apps run via Chronos (distrib cron), HA services via Marathon (distrib init.d), and distrib frameworks directly on Mesos (distrib kernel)]
  • Mesos – dynamics [diagram: a distributed framework's Scheduler receives resource offers from the Mesos masters (the distributed kernel), which track the available resources on the Mesos slaves; the framework's Executors run on those slaves]
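The resource-offer model on this slide is two-level scheduling: the master decides which resources to offer to a framework, and the framework's scheduler decides which of its tasks to launch on them. As a minimal sketch, the toy Scala model below shows only the framework side, using a greedy first-fit policy. The types `Offer`, `Task`, `Launch` and the object `ResourceOfferSketch` are hypothetical and are not the Mesos API.

```scala
import scala.collection.mutable

// Toy model of Mesos two-level scheduling; these types are hypothetical
// and are NOT the Mesos scheduler API.
final case class Offer(slave: String, cpus: Double, memMb: Int)
final case class Task(name: String, cpus: Double, memMb: Int)
final case class Launch(task: Task, slave: String)

object ResourceOfferSketch {
  /** Framework-side decision: place each pending task on the first offered
    * slave with enough free CPU and memory (first fit). Returns the launches
    * and the tasks that did not fit; leftover offered resources would be
    * declined and handed back to the master for other frameworks. */
  def schedule(offers: Seq[Offer], pending: Seq[Task]): (List[Launch], List[Task]) = {
    // Free resources per offered slave, in offer order, updated as tasks are placed.
    val free = mutable.LinkedHashMap.empty[String, (Double, Int)]
    offers.foreach(o => free(o.slave) = (o.cpus, o.memMb))

    val launched = mutable.ListBuffer.empty[Launch]
    val unplaced = mutable.ListBuffer.empty[Task]
    for (task <- pending) {
      free.find { case (_, (cpus, mem)) => cpus >= task.cpus && mem >= task.memMb } match {
        case Some((slave, (cpus, mem))) =>
          // Consume part of this slave's offer and record the launch.
          free(slave) = (cpus - task.cpus, mem - task.memMb)
          launched += Launch(task, slave)
        case None =>
          // No offer is large enough; wait for future offers.
          unplaced += task
      }
    }
    (launched.toList, unplaced.toList)
  }
}
```

In a real framework this decision would happen inside the scheduler's `resourceOffers` callback, replying to the master by launching tasks or declining offers; first fit is just one simple policy a framework might choose.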
  • Production Deployments (public)
  • Case Study: Twitter (bare metal / on premise)
 “Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It's one of the primary keys to our data center efficiency.” Chris Fry, SVP Engineering
 blog.twitter.com/2013/mesos-graduates-from-apache-incubation
 wired.com/gadgetlab/2013/11/qa-with-chris-fry/
   - key services run in production: analytics, typeahead, ads
   - Twitter engineers rely on Mesos to build all new services
   - instead of thinking about static machines, engineers think about resources like CPU, memory and disk
   - allows services to scale and leverage a shared pool of servers across datacenters efficiently
   - reduces the time between prototyping and launching
  • http://elastic.mesosphere.io
 launch a Mesos cluster in the Amazon AWS cloud in three simple steps, given:
   - AWS credentials
   - SSH public key
   - email address
  • http://mesosphere.io/learn/run-spark-on-mesos/ configure and run Spark on a Mesos 
 cluster on AWS, in a seven-step tutorial…
  • step 1: ssh to master
  • ssh -l ubuntu <master>
  • step 2: install git, jdk-7
  • sudo aptitude -y install git
 sudo aptitude -y install openjdk-7-jdk
  • step 3: download spark
  • wget http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
 tar xzf spark-0.8.0-incubating-bin-cdh4.tgz
 cd spark-0.8.0-incubating-bin-cdh4/
  • step 4: sbt clean assembly
  • SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.4.0 sbt/sbt clean assembly
  • step 5: make distro, cp to HDFS
  • ./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.4.0
 mv dist spark-0.8.0-2.0.0-mr1-cdh4.4.0
 tar czf spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz spark-0.8.0-2.0.0-mr1-cdh4.4.0

 hadoop fs -mkdir /tmp
 hadoop fs -put spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz /tmp
  • step 6: config env
  • cd conf/
 cp spark-env.sh.template spark-env.sh
 vim spark-env.sh

 export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
 export SPARK_EXECUTOR_URI=hdfs://<nn>/tmp/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz
 export MASTER=zk://<master>:2181/mesos

 cat spark-env.sh
 cd ..

 ./spark-shell
  • et voilà!
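Two of the values in spark-env.sh are URLs that must line up with the earlier steps: SPARK_EXECUTOR_URI points at the tarball uploaded to HDFS in step 5, from which each slave fetches Spark, and MASTER is a ZooKeeper URL (port 2181, path /mesos) through which Spark finds the current Mesos master. The Scala sketch below is a hypothetical helper, not part of Spark; the object name `SparkEnvSketch` and any hostnames passed to it are placeholders for illustration.

```scala
import java.net.URI

// Hypothetical helper (not part of Spark): renders the three exports from
// step 6 for concrete hostnames, and splits a zk:// MASTER URL into parts.
object SparkEnvSketch {
  // Name of the distribution tarball built and uploaded in step 5.
  val distName = "spark-0.8.0-2.0.0-mr1-cdh4.4.0"

  def exports(namenode: String, zkHost: String): Seq[String] = Seq(
    // Native Mesos library used by Spark's Mesos bindings.
    "export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so",
    // Executors on each slave download Spark from this HDFS location.
    s"export SPARK_EXECUTOR_URI=hdfs://$namenode/tmp/$distName.tgz",
    // ZooKeeper host, ZooKeeper's default client port 2181, and the /mesos path.
    s"export MASTER=zk://$zkHost:2181/mesos"
  )

  /** Splits a zk:// master URL into (host, port, path). */
  def parseMaster(master: String): (String, Int, String) = {
    val uri = new URI(master)
    require(uri.getScheme == "zk", s"expected a zk:// URL, got: $master")
    (uri.getHost, uri.getPort, uri.getPath)
  }
}
```

For example, with the made-up hostname master.example.com, `SparkEnvSketch.parseMaster("zk://master.example.com:2181/mesos")` returns `("master.example.com", 2181, "/mesos")`.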
  • http://spark.incubator.apache.org/examples.html run an example job in Spark, 
 to filter an RDD of integers, in two steps at the REPL…
  • step 1: create an RDD
  • val data = 1 to 10000
 val distData = sc.parallelize(data)
  • step 2: run the filter
  • distData.filter(_ < 10).collect()
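As a sanity check on what the REPL job computes, here is a minimal Spark-free sketch in plain Scala: the same `_ < 10` predicate applied to a local Range instead of an RDD. The object name `FilterSketch` is hypothetical. On the cluster, `collect()` returns the matching elements of the RDD to the driver as an array, so both versions should yield the integers 1 through 9.

```scala
// Spark-free sketch: the same predicate as the REPL example, applied to a
// local Scala Range rather than a distributed RDD.
object FilterSketch {
  val data: Range = 1 to 10000

  // Corresponds to distData.filter(_ < 10).collect(), which brings the
  // matching elements back to the driver as an Array[Int].
  val local: Array[Int] = data.filter(_ < 10).toArray
}
```

The difference on Mesos is only where the work happens: `sc.parallelize` splits the range into partitions, executors on the slaves apply the filter, and `collect()` gathers the small result set.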
  • Join us!
 O’Reilly Strata, Santa Clara, Feb 11-13
 strataconf.com/strata2014
   - Mesos tutorial, Tue 2/11 1:30pm
   - BOF lunch, Wed 2/12 12:10pm
   - Mesos session, Thu 2/13 2:20pm
   - office hours, Thu 2/13 3:15pm
  • More insights…
 monthly newsletter for events, conf summaries, workshops, etc.: liber118.com/pxn/
 collected Mesos notes: goo.gl/jPtTP