SlideShare a Scribd company logo
Datacenter Computing 

with Apache Mesos	



BigData DC Meetup @AddThis

2014-04-15	



Paco Nathan 

http://liber118.com/pxn/

@pacoid
meetup.com/bigdatadc/events/172610652/
A Big Idea
!
Have you heard about 

“data democratization” ? ? ?	

	

	

 making data available

	

	

 	

 throughout more of the organization
!
Have you heard about 

“data democratization” ? ? ?	

	

	

 making data available

	

	

 	

 throughout more of the organization	

!
Then how would you handle 

“cluster democratization” ? ? ?	

	

	

 making data+resources available

	

	

 	

 throughout	

more of the organization
!
Have you heard about 

“data democratization” ? ? ?	

	

	

 making data available

	

	

 	

 throughout more of the organization	

!
Then how would you handle 

“cluster democratization” ? ? ?	

	

	

 making data+resources available

	

	

 	

 throughout	

more of the organization
In other words, 

how to remove silos…
Lessons

from Google
Datacenter Computing	

Google has been doing datacenter computing for years, 

to address the complexities of large-scale data workflows:	

• leveraging the modern kernel: isolation in lieu of VMs	

• “most (>80%) jobs are batch jobs, but the majority 

of resources (55–80%) are allocated to service jobs”	

• mixed workloads, multi-tenancy	

• relatively high utilization rates	

• JVM? not so much…	

• reality: scheduling batch is simple; 

scheduling services is hard/expensive
The Modern Kernel: Top Linux Contributors…	

arstechnica.com/information-technology/2013/09/...
“Return of the Borg”	

Return of the Borg: HowTwitter Rebuilt Google’s SecretWeapon

Cade Metz

wired.com/wiredenterprise/2013/03/google-
borg-twitter-mesos	

!
The Datacenter as a Computer: An Introduction 

to the Design ofWarehouse-Scale Machines	

Luiz André Barroso, Urs Hölzle	

research.google.com/pubs/pub35290.html	

!
!
2011 GAFS Omega

John Wilkes, et al.

youtu.be/0ZFMlO98Jkc
Google describes the technology…	

Omega: flexible, scalable schedulers for large compute clusters	

Malte Schwarzkopf,Andy Konwinski, Michael Abd-El-Malek, John Wilkes	

eurosys2013.tudos.org/wp-content/uploads/2013/paper/
Schwarzkopf.pdf
Google describes the business case…	

Taming LatencyVariability

Jeff Dean

plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
Commercial OS Cluster Schedulers	

!
• IBM Platform Symphony

• Microsoft Autopilot	

!


Arguably, some grid controllers 

are quite notable in-category:	

• Univa Grid Engine (formerly SGE)

• Condor	

• etc.
Emerging

at Berkeley
Beyond Hadoop	

Hadoop – an open source solution for fault-tolerant
parallel processing of batch jobs at scale, based on
commodity hardware… however, other priorities have
emerged for the analytics lifecycle:	

• apps require integration beyond Hadoop	

• multiple topologies, mixed workloads, multi-tenancy	

• significant disruptions in h/w cost/performance
curves	

• higher utilization	

• lower latency	

• highly-available, long running services	

• more than “Just JVM” – e.g., Python growth
Just No Getting Around It	

“There's Just No Getting Around It:You're Building a Distributed System”

Mark Cavage

ACM Queue (2013-05-03)

queue.acm.org/detail.cfm?id=2482856	

key takeaways on architecture:	

• decompose the business application into discrete services on the
boundaries of fault domains, scaling, and data workload	

• make as many things as possible stateless	

• when dealing with state, deeply understand CAP, latency, throughput,
and durability requirements
“Without practical experience working on successful—and failed—systems, most engineers
take a "hopefully it works" approach and attempt to string together off-the-shelf software,
whether open source or commercial, and often are unsuccessful at building a resilient,
performant system. In reality, building a distributed system requires a methodical approach
to requirements along the boundaries of failure domains, latency, throughput, durability,
consistency, and desired SLAs for the business application at all aspects of the application.”
Mesos – open source datacenter computing	

a common substrate for cluster computing	

mesos.apache.org	

heterogenous assets in your datacenter or cloud 

made available as a homogenous set of resources	

• top-level Apache project	

• scalability to 10,000s of nodes	

• obviates the need for virtual machines	

• isolation (pluggable) for CPU, RAM, I/O, FS, etc.	

• fault-tolerant leader election based on Zookeeper	

• APIs in C++, Java/Scala, Python, Go, Erlang, Haskell	

• web UI for inspecting cluster state	

• available for Linux, OpenSolaris, Mac OSX
What are the costs of Virtualization?
benchmark	

type
OpenVZ	

improvement
mixed workloads 210%-300%
LAMP (related) 38%-200%
I/O throughput 200%-500%
response time order magnitude
more pronounced 

at higher loads
What are the costs of Single Tenancy?
0%
25%
50%
75%
100%
RAILS CPU
LOAD
MEMCACHED
CPU LOAD
0%
25%
50%
75%
100%
HADOOP CPU
LOAD
0%
25%
50%
75%
100%
t t
0%
25%
50%
75%
100%
Rails
Memcached
Hadoop
COMBINED CPU LOAD (RAILS,
MEMCACHED, HADOOP)
Arguments for Datacenter Computing	

rather than running several specialized clusters, each 

at relatively low utilization rates, instead run many 

mixed workloads 	

obvious benefits are realized in terms of:	

• scalability, elasticity, fault tolerance, performance, utilization	

• reduced equipment capex, Ops overhead, etc.	

• reduced licensing, eliminating need forVMs or potential 

vendor lock-in	

subtle benefits – arguably, more important for Enterprise IT:	

• reduced time for engineers to ramp up new services at scale	

• reduced latency between batch and services, enabling new 

high ROI use cases	

• enables Dev/Test apps to run safely on a Production cluster
Analogies and
Architecture
Prior Practice: Dedicated Servers	

• low utilization rates	

• longer time to ramp up new services
DATACENTER
Prior Practice: Virtualization	

DATACENTER PROVISIONED VMS
• even more machines to manage	

• substantial performance decrease 

due to virtualization	

• VM licensing costs
Prior Practice: Static Partitioning
STATIC PARTITIONING
• even more machines to manage	

• substantial performance decrease 

due to virtualization	

• VM licensing costs	

• failures make static partitioning 

more complex to manage
DATACENTER
MESOS
Mesos: One Large Pool of Resources	

“We wanted people to be able to program 

for the datacenter just like they program 

for their laptop."	

!
Ben Hindman
DATACENTER
Frameworks Integrated with Mesos	

Continuous Integration:

Jenkins, GitLab	

Big Data:

Hadoop, Spark, Storm,
Kafka
Python workloads:

DPark, Exelixi
Meta-Frameworks / HA Services:

Aurora, Marathon
Orchestration:

Singularity

Distributed Cron:

Chronos	

Data Storage:

ElasticSearch, Cassandra,

Hypertable
Containers:

Docker, Deimos
Parallel Processing:

Chapel, MPI
!
Fault-tolerant distributed systems…	

…written in 100-300 lines of 

C++, Java/Scala, Python, Go, etc.	

…building blocks, if you will	

!
Q: required lines of network code?	

A: probably none
Kernel
Apps
servicesbatch
Frameworks
Python
JVM
C
++
Workloads
distributed file system
Chronos
DFS
distributed resources: CPU, RAM, I/O, FS, rack locality, etc. Cluster
Storm
Kafka JBoss Django RailsSharkImpalaScalding
Marathon
SparkHadoopMPI
MySQL
Mesos – architecture
Mesos – architecture	

HDFS, distrib file system
Mesos, distrib kernel
meta-frameworks: Aurora, Marathon
frameworks: Spark, Storm,
MPI, Jenkins, etc.
task schedulers: Chronos, etc.
APIs: C++, JVM, Py, Go
apps: HA services, web apps, batch
jobs, scripts, etc.
Linux: libcgroup, libprocess, libev, etc.
Mesos – dynamics	

Mesos
distrib kernel
Marathon
distrib init.d
Chronos
distrib cron
distrib
frameworks
HA
services
scheduled
apps
Mesos – dynamics	

resource
offers
distributed
framework
Scheduler Executor Executor Executor
Mesos
slave
Mesos
slave
Mesos
slave
distributed
kernel
available resources
Mesos
slave
Mesos
slave
Mesos
slave
Mesos
masterMesos
master
Example: Resource Offer in a Two-Level Scheduler
mesos.apache.org/documentation/latest/mesos-architecture/
M
Master
Docker
Registry
index.docker.io
Local
Docker
Registry
( optional )
M
M
S
S
S
S
S
S
marathon
docker
docker
docker
Mesos
master servers
Mesos
slave servers
Marathon can launch and monitor
service containers from one or
more Docker registries, using
the Docker executor for Mesos
S
S
S S
S
S
…
…
…
……
…
…
Example: Docker on Mesos
mesosphere.io/2013/09/26/docker-on-mesos/
Mesos Master Server
init
|
+ mesos-master
|
+ marathon
|
Mesos Slave Server
init
|
+ docker
| |
| + lxc
| |
| + (user task, under container init system)
| |
|
+ mesos-slave
| |
| + /var/lib/mesos/executors/docker
| | |
| | + docker run …
| | |
The executor, monitored by the
Mesos slave, delegates to the
local Docker daemon for image
discovery and management. The
executor communicates with
Marathon via the Mesos master
and ensures that Docker enforces
the specified resource limitations.
Example: Docker on Mesos
mesosphere.io/2013/09/26/docker-on-mesos/
Mesos Master Server
init
|
+ mesos-master
|
+ marathon
|
Mesos Slave Server
init
|
+ docker
| |
| + lxc
| |
| + (user task, under container init system)
| |
|
+ mesos-slave
| |
| + /var/lib/mesos/executors/docker
| | |
| | + docker run …
| | |
Docker
Registry
When a user requests
a container…
Mesos, LXC, and
Docker are tied
together for launch
2
1
3
4
5
6
7
8
Example: Docker on Mesos
mesosphere.io/2013/09/26/docker-on-mesos/
Looking
Ahead…
Quasar+Mesos @ Stanford, Twitter, etc.…	

Quasar: Resource-Efficient and QoS-Aware Cluster Management

Christina Delimitrou, Christos Kozyrakis

stanford.edu/~cdel/2014.asplos.quasar.pdf
Quasar+Mesos @ Stanford, Twitter, etc.…	

Improving Resource Efficiency with Apache Mesos

Christina Delimitrou

youtu.be/YpmElyi94AA
Quasar+Mesos @ Stanford, Twitter, etc.…	

Consider that for datacenter computing at scale, surge in workloads
implies:	

• large cap-ex investment, long lead-time to build	

• utilities cannot supply the power requirements	

Even for large players that achieve 2x beyond typical industry DC
util rates, those factors become show-stoppers. Even so, high rates
of over-provisioning are typical, so there’s much room to improve.	

Experiences with Quasar+Mesos showed:	

• 88% apps get >95% performance	

• ~10% overprovisioning instead of 500%	

• up to 70% cluster util at steady state	

• 23% shorter scenario completion
What’s New in Mesos 0.18 ?	

mesos.apache.org/blog/mesos-0-18-0-released/	

• Containerizer API	

• Isolator API	

• Cgroup layout change	

!
!
Containerization and Docker:
“The first change is in terminology.The Mesos Slave now
uses a Containerizer to provide an environment for each
executor and its tasks to run in. Containerization includes
resource isolation but is a more general concept that can
encompass such things as packaging.”
Because…

Use Cases
Production Deployments (public)
Built-in /

bare metal
Hypervisors
Solaris Zones
Linux CGroups
Opposite Ends of the Spectrum, One Common Substrate
Opposite Ends of the Spectrum, One Common Substrate	

Request /

Response
Batch
Case Study: Twitter (bare metal / on premise)	

“Mesos is the cornerstone of our elastic compute infrastructure – 

it’s how we build all our new services and is critical forTwitter’s

continued success at scale. It's one of the primary keys to our

data center efficiency."	

Chris Fry, SVP Engineering	

blog.twitter.com/2013/mesos-graduates-from-apache-incubation	

wired.com/gadgetlab/2013/11/qa-with-chris-fry/	

!
• key services run in production: analytics, typeahead, ads	

• Twitter engineers rely on Mesos to build all new services	

• instead of thinking about static machines, engineers think 

about resources like CPU, memory and disk	

• allows services to scale and leverage a shared pool of 

servers across datacenters efficiently	

• reduces the time between prototyping and launching
Case Study: Airbnb (fungible cloud infrastructure)	

“We think we might be pushing data science in the field of travel 

more so than anyone has ever done before… a smaller number 

of engineers can have higher impact through automation on 

Mesos."	

Mike Curtis,VP Engineering

gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data...	

• improves resource management and efficiency	

• helps advance engineering strategy of building small teams 

that can move fast	

• key to letting engineers make the most of AWS-based 

infrastructure beyond just Hadoop	

• allowed company to migrate off Elastic MapReduce	

• enables use of Hadoop along with Chronos, Spark, Storm, etc.
Case Study: eBay (continuous integration)	

eBay PaaS Team

ebaytechblog.com/2014/04/04/delivering-ebays-ci-
solution-with-apache-mesos-part-i/	

• cluster management (PaaS core framework
services) for CI 	

• integration of: OpenStack, Jenkins, Zookeeper,
Mesos, Marathon,Ansible
In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master
instance.This Jenkins instance runs within a dedicatedVM, and over time the
result has beenVM sprawl and poor resource utilization.We started looking at
solutions to maximize our resource utilization and reduce theVM footprint while
still preserving the individual CI instance model.After much deliberation, we
chose Apache Mesos for a POC.This post shares the journey of how we
approached this challenge and accomplished our goal.
Case Study: HubSpot (cluster management)	

Tom Petr

youtu.be/ROn14csiikw	

• 500 deployable objects; 100 deploys/day to
production; 90 engineers; 3 devops on Mesos
cluster	

• “Our QA cluster is now a fixed $10K/month —
that used to fluctuate”
Resources	

Apache Mesos Project

mesos.apache.org	

Twitter

@ApacheMesos	

Mesosphere

mesosphere.io	

Tutorials

mesosphere.io/learn	

Documentation

mesos.apache.org/documentation	

2011 USENIX Research Paper

usenix.org/legacy/event/nsdi11/tech/full_papers/
Hindman_new.pdf	

Collected Notes/Archives

goo.gl/jPtTP
DIY
!
!
http://elastic.mesosphere.io
!
http://mesosphere.io/learn	

!
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Master 2
NN
ZK
Master 1
NN
ZK
Master 3
NN
ZK
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Worker
DN
Elastic Mesos
ありがとう

ございました
GlueCon

Broomfield, May 21

gluecon.com/2014/	

DockerCon

SF, Jun 9

dockercon.com	

Spark Summit

SF, Jun 30

spark-summit.org/2014	

OSCON

PDX, Jul 20

oscon.com/oscon2014/	

Strata NYC + Hadoop World

NYC, Oct 15

strataconf.com/stratany2014
Calendar:
New Book:	

“Just Enough Math”, 

with Allen Day @MapR Asia	

!
advanced math for business people,

to leverage open source for Big Data	

galleys: July 2014 @ OSCON

oscon.com/oscon2014/public/
schedule/detail/34873
Enterprise DataWorkflows with Cascading	

O’Reilly, 2013	

shop.oreilly.com/product/0636920028536.do	

!
monthly newsletter for updates, 

events, conference summaries, etc.:	

liber118.com/pxn/

More Related Content

What's hot

Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark Summit
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Spark Summit
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesPhil Peace
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudAccubits Technologies
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules RestructuredDoiT International
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthdaveconnors
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudTime to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudAmazon Web Services
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Spark Summit
 
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopDataWorks Summit
 

What's hot (20)

Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
 
Hadoop on-mesos
Hadoop on-mesosHadoop on-mesos
Hadoop on-mesos
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per month
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudTime to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
 
File Context
File ContextFile Context
File Context
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
 
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 

Similar to Datacenter Computing with Apache Mesos - BigData DC

Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupDatacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupPaco Nathan
 
Above the cloud joarder kamal
Above the cloud   joarder kamalAbove the cloud   joarder kamal
Above the cloud joarder kamalJoarder Kamal
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackWes McKinney
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and IntegrationAlex Baretto
 
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...Ludovic Piot
 
Integration in the age of DevOps
Integration in the age of DevOpsIntegration in the age of DevOps
Integration in the age of DevOpsAlbert Wong
 
Succeding with the Apache SOA stack
Succeding with the Apache SOA stackSucceding with the Apache SOA stack
Succeding with the Apache SOA stackJohan Edstrom
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OW2
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware
 
Enterprise-Ready Private and Hybrid Cloud Computing Today
Enterprise-Ready Private and Hybrid Cloud Computing TodayEnterprise-Ready Private and Hybrid Cloud Computing Today
Enterprise-Ready Private and Hybrid Cloud Computing TodayRightScale
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformMarc Dutoo
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
How leading financial services organisations are winning with tech
How leading financial services organisations are winning with techHow leading financial services organisations are winning with tech
How leading financial services organisations are winning with techMongoDB
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosPaco Nathan
 
Tackling complexity in giant systems: approaches from several cloud providers
Tackling complexity in giant systems: approaches from several cloud providersTackling complexity in giant systems: approaches from several cloud providers
Tackling complexity in giant systems: approaches from several cloud providersPatrick Chanezon
 
Modern Web Development (2018)
Modern Web Development (2018)Modern Web Development (2018)
Modern Web Development (2018)Randy Connolly
 
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...The Linux Foundation
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 

Similar to Datacenter Computing with Apache Mesos - BigData DC (20)

Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupDatacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
 
Above the cloud joarder kamal
Above the cloud   joarder kamalAbove the cloud   joarder kamal
Above the cloud joarder kamal
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
 
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
 
Integration in the age of DevOps
Integration in the age of DevOpsIntegration in the age of DevOps
Integration in the age of DevOps
 
Succeding with the Apache SOA stack
Succeding with the Apache SOA stackSucceding with the Apache SOA stack
Succeding with the Apache SOA stack
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
 
Enterprise-Ready Private and Hybrid Cloud Computing Today
Enterprise-Ready Private and Hybrid Cloud Computing TodayEnterprise-Ready Private and Hybrid Cloud Computing Today
Enterprise-Ready Private and Hybrid Cloud Computing Today
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
How leading financial services organisations are winning with tech
How leading financial services organisations are winning with techHow leading financial services organisations are winning with tech
How leading financial services organisations are winning with tech
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
 
Tackling complexity in giant systems: approaches from several cloud providers
Tackling complexity in giant systems: approaches from several cloud providersTackling complexity in giant systems: approaches from several cloud providers
Tackling complexity in giant systems: approaches from several cloud providers
 
Modern Web Development (2018)
Modern Web Development (2018)Modern Web Development (2018)
Modern Web Development (2018)
 
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...
LF Collab Summit 2015: ARM Servers for the Next Generation Date Center and Cl...
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
The Enterprise Cloud
The Enterprise CloudThe Enterprise Cloud
The Enterprise Cloud
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 

More from Paco Nathan

Human in the loop: a design pattern for managing teams working with ML
Human in the loop: a design pattern for managing  teams working with MLHuman in the loop: a design pattern for managing  teams working with ML
Human in the loop: a design pattern for managing teams working with MLPaco Nathan
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLPaco Nathan
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLPaco Nathan
 
Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIPaco Nathan
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryPaco Nathan
 
Computable Content
Computable ContentComputable Content
Computable ContentPaco Nathan
 
Computable Content: Lessons Learned
Computable Content: Lessons LearnedComputable Content: Lessons Learned
Computable Content: Lessons LearnedPaco Nathan
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsPaco Nathan
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?Paco Nathan
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusPaco Nathan
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learningPaco Nathan
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
A New Year in Data Science: ML Unpaused
A New Year in Data Science: ML UnpausedA New Year in Data Science: ML Unpaused
A New Year in Data Science: ML UnpausedPaco Nathan
 

More from Paco Nathan (20)

Human in the loop: a design pattern for managing teams working with ML
Human in the loop: a design pattern for managing  teams working with MLHuman in the loop: a design pattern for managing  teams working with ML
Human in the loop: a design pattern for managing teams working with ML
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage ML
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage ML
 
Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AI
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industry
 
Computable Content
Computable ContentComputable Content
Computable Content
 
Computable Content: Lessons Learned
Computable Content: Lessons LearnedComputable Content: Lessons Learned
Computable Content: Lessons Learned
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and Erasmus
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learning
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
A New Year in Data Science: ML Unpaused
A New Year in Data Science: ML UnpausedA New Year in Data Science: ML Unpaused
A New Year in Data Science: ML Unpaused
 

Recently uploaded

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Boni Yeamin
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Intelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfIntelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfAnthony Lucente
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 

Recently uploaded (20)

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Intelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfIntelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdf
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 

Datacenter Computing with Apache Mesos - BigData DC

  • 1. Datacenter Computing 
 with Apache Mesos 
 BigData DC Meetup @AddThis
 2014-04-15 
 Paco Nathan 
 http://liber118.com/pxn/
 @pacoid meetup.com/bigdatadc/events/172610652/
  • 3. ! Have you heard about 
 “data democratization” ? ? ? making data available
 throughout more of the organization
  • 4. ! Have you heard about 
 “data democratization” ? ? ? making data available
 throughout more of the organization ! Then how would you handle 
 “cluster democratization” ? ? ? making data+resources available
 throughout more of the organization
  • 5. ! Have you heard about 
 “data democratization” ? ? ? making data available
 throughout more of the organization ! Then how would you handle 
 “cluster democratization” ? ? ? making data+resources available
 throughout more of the organization In other words, 
 how to remove silos…
  • 7. Datacenter Computing Google has been doing datacenter computing for years, 
 to address the complexities of large-scale data workflows: • leveraging the modern kernel: isolation in lieu of VMs • “most (>80%) jobs are batch jobs, but the majority 
 of resources (55–80%) are allocated to service jobs” • mixed workloads, multi-tenancy • relatively high utilization rates • JVM? not so much… • reality: scheduling batch is simple; 
 scheduling services is hard/expensive
  • 8. The Modern Kernel: Top Linux Contributors… arstechnica.com/information-technology/2013/09/...
  • 9. “Return of the Borg” Return of the Borg: HowTwitter Rebuilt Google’s SecretWeapon
 Cade Metz
 wired.com/wiredenterprise/2013/03/google- borg-twitter-mesos ! The Datacenter as a Computer: An Introduction 
 to the Design ofWarehouse-Scale Machines Luiz André Barroso, Urs Hölzle research.google.com/pubs/pub35290.html ! ! 2011 GAFS Omega
 John Wilkes, et al.
 youtu.be/0ZFMlO98Jkc
  • 10. Google describes the technology… Omega: flexible, scalable schedulers for large compute clusters Malte Schwarzkopf,Andy Konwinski, Michael Abd-El-Malek, John Wilkes eurosys2013.tudos.org/wp-content/uploads/2013/paper/ Schwarzkopf.pdf
  • 11. Google describes the business case… Taming LatencyVariability
 Jeff Dean
 plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
  • 12. Commercial OS Cluster Schedulers ! • IBM Platform Symphony
 • Microsoft Autopilot ! 
 Arguably, some grid controllers 
 are quite notable in-category: • Univa Grid Engine (formerly SGE)
 • Condor • etc.
  • 14. Beyond Hadoop Hadoop – an open source solution for fault-tolerant parallel processing of batch jobs at scale, based on commodity hardware… however, other priorities have emerged for the analytics lifecycle: • apps require integration beyond Hadoop • multiple topologies, mixed workloads, multi-tenancy • significant disruptions in h/w cost/performance curves • higher utilization • lower latency • highly-available, long running services • more than “Just JVM” – e.g., Python growth
  • 15. Just No Getting Around It “There's Just No Getting Around It:You're Building a Distributed System”
 Mark Cavage
 ACM Queue (2013-05-03)
 queue.acm.org/detail.cfm?id=2482856 key takeaways on architecture: • decompose the business application into discrete services on the boundaries of fault domains, scaling, and data workload • make as many things as possible stateless • when dealing with state, deeply understand CAP, latency, throughput, and durability requirements “Without practical experience working on successful—and failed—systems, most engineers take a "hopefully it works" approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
  • 16.
  • 17. Mesos – open source datacenter computing a common substrate for cluster computing mesos.apache.org heterogenous assets in your datacenter or cloud 
 made available as a homogenous set of resources • top-level Apache project • scalability to 10,000s of nodes • obviates the need for virtual machines • isolation (pluggable) for CPU, RAM, I/O, FS, etc. • fault-tolerant leader election based on Zookeeper • APIs in C++, Java/Scala, Python, Go, Erlang, Haskell • web UI for inspecting cluster state • available for Linux, OpenSolaris, Mac OSX
  • 18. What are the costs of Virtualization? benchmark type OpenVZ improvement mixed workloads 210%-300% LAMP (related) 38%-200% I/O throughput 200%-500% response time order magnitude more pronounced 
 at higher loads
  • 19. What are the costs of Single Tenancy? 0% 25% 50% 75% 100% RAILS CPU LOAD MEMCACHED CPU LOAD 0% 25% 50% 75% 100% HADOOP CPU LOAD 0% 25% 50% 75% 100% t t 0% 25% 50% 75% 100% Rails Memcached Hadoop COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP)
  • 20. Arguments for Datacenter Computing rather than running several specialized clusters, each 
 at relatively low utilization rates, instead run many 
 mixed workloads obvious benefits are realized in terms of: • scalability, elasticity, fault tolerance, performance, utilization • reduced equipment capex, Ops overhead, etc. • reduced licensing, eliminating need forVMs or potential 
 vendor lock-in subtle benefits – arguably, more important for Enterprise IT: • reduced time for engineers to ramp up new services at scale • reduced latency between batch and services, enabling new 
 high ROI use cases • enables Dev/Test apps to run safely on a Production cluster
  • 22. Prior Practice: Dedicated Servers • low utilization rates • longer time to ramp up new services DATACENTER
  • 23. Prior Practice: Virtualization DATACENTER PROVISIONED VMS • even more machines to manage • substantial performance decrease 
 due to virtualization • VM licensing costs
  • 24. Prior Practice: Static Partitioning STATIC PARTITIONING • even more machines to manage • substantial performance decrease 
 due to virtualization • VM licensing costs • failures make static partitioning 
 more complex to manage DATACENTER
  • 25. MESOS Mesos: One Large Pool of Resources “We wanted people to be able to program 
 for the datacenter just like they program 
 for their laptop." ! Ben Hindman DATACENTER
  • 26. Frameworks Integrated with Mesos Continuous Integration:
 Jenkins, GitLab Big Data:
 Hadoop, Spark, Storm, Kafka Python workloads:
 DPark, Exelixi Meta-Frameworks / HA Services:
 Aurora, Marathon Orchestration:
 Singularity
 Distributed Cron:
 Chronos Data Storage:
 ElasticSearch, Cassandra,
 Hypertable Containers:
 Docker, Deimos Parallel Processing:
 Chapel, MPI
  • 27. ! Fault-tolerant distributed systems… …written in 100-300 lines of 
 C++, Java/Scala, Python, Go, etc. …building blocks, if you will ! Q: required lines of network code? A: probably none
  • 28. Kernel Apps servicesbatch Frameworks Python JVM C ++ Workloads distributed file system Chronos DFS distributed resources: CPU, RAM, I/O, FS, rack locality, etc. Cluster Storm Kafka JBoss Django RailsSharkImpalaScalding Marathon SparkHadoopMPI MySQL Mesos – architecture
  • 29. Mesos – architecture HDFS, distrib file system Mesos, distrib kernel meta-frameworks: Aurora, Marathon frameworks: Spark, Storm, MPI, Jenkins, etc. task schedulers: Chronos, etc. APIs: C++, JVM, Py, Go apps: HA services, web apps, batch jobs, scripts, etc. Linux: libcgroup, libprocess, libev, etc.
  • 30. Mesos – dynamics Mesos distrib kernel Marathon distrib init.d Chronos distrib cron distrib frameworks HA services scheduled apps
  • 31. Mesos – dynamics resource offers distributed framework Scheduler Executor Executor Executor Mesos slave Mesos slave Mesos slave distributed kernel available resources Mesos slave Mesos slave Mesos slave Mesos masterMesos master
  • 32. Example: Resource Offer in a Two-Level Scheduler mesos.apache.org/documentation/latest/mesos-architecture/
  • 33. M Master Docker Registry index.docker.io Local Docker Registry ( optional ) M M S S S S S S marathon docker docker docker Mesos master servers Mesos slave servers Marathon can launch and monitor service containers from one or more Docker registries, using the Docker executor for Mesos S S S S S S … … … …… … … Example: Docker on Mesos mesosphere.io/2013/09/26/docker-on-mesos/
  • 34. Mesos Master Server init | + mesos-master | + marathon | Mesos Slave Server init | + docker | | | + lxc | | | + (user task, under container init system) | | | + mesos-slave | | | + /var/lib/mesos/executors/docker | | | | | + docker run … | | | The executor, monitored by the Mesos slave, delegates to the local Docker daemon for image discovery and management. The executor communicates with Marathon via the Mesos master and ensures that Docker enforces the specified resource limitations. Example: Docker on Mesos mesosphere.io/2013/09/26/docker-on-mesos/
  • 35. Mesos Master Server init | + mesos-master | + marathon | Mesos Slave Server init | + docker | | | + lxc | | | + (user task, under container init system) | | | + mesos-slave | | | + /var/lib/mesos/executors/docker | | | | | + docker run … | | | Docker Registry When a user requests a container… Mesos, LXC, and Docker are tied together for launch 2 1 3 4 5 6 7 8 Example: Docker on Mesos mesosphere.io/2013/09/26/docker-on-mesos/
  • 37. Quasar+Mesos @ Stanford, Twitter, etc.… Quasar: Resource-Efficient and QoS-Aware Cluster Management
 Christina Delimitrou, Christos Kozyrakis
 stanford.edu/~cdel/2014.asplos.quasar.pdf
  • 38. Quasar+Mesos @ Stanford, Twitter, etc.… Improving Resource Efficiency with Apache Mesos
 Christina Delimitrou
 youtu.be/YpmElyi94AA
  • 39. Quasar+Mesos @ Stanford, Twitter, etc.… Consider that for datacenter computing at scale, surge in workloads implies: • large cap-ex investment, long lead-time to build • utilities cannot supply the power requirements Even for large players that achieve 2x beyond typical industry DC util rates, those factors become show-stoppers. Even so, high rates of over-provisioning are typical, so there’s much room to improve. Experiences with Quasar+Mesos showed: • 88% apps get >95% performance • ~10% overprovisioning instead of 500% • up to 70% cluster util at steady state • 23% shorter scenario completion
  • 40. What’s New in Mesos 0.18 ? mesos.apache.org/blog/mesos-0-18-0-released/ • Containerizer API • Isolator API • Cgroup layout change ! ! Containerization and Docker: “The first change is in terminology.The Mesos Slave now uses a Containerizer to provide an environment for each executor and its tasks to run in. Containerization includes resource isolation but is a more general concept that can encompass such things as packaging.”
  • 43. Built-in /
 bare metal Hypervisors Solaris Zones Linux CGroups Opposite Ends of the Spectrum, One Common Substrate
  • 44. Opposite Ends of the Spectrum, One Common Substrate Request /
 Response Batch
  • 45. Case Study: Twitter (bare metal / on premise) “Mesos is the cornerstone of our elastic compute infrastructure – 
 it’s how we build all our new services and is critical forTwitter’s
 continued success at scale. It's one of the primary keys to our
 data center efficiency." Chris Fry, SVP Engineering blog.twitter.com/2013/mesos-graduates-from-apache-incubation wired.com/gadgetlab/2013/11/qa-with-chris-fry/ ! • key services run in production: analytics, typeahead, ads • Twitter engineers rely on Mesos to build all new services • instead of thinking about static machines, engineers think 
 about resources like CPU, memory and disk • allows services to scale and leverage a shared pool of 
 servers across datacenters efficiently • reduces the time between prototyping and launching
  • 46. Case Study: Airbnb (fungible cloud infrastructure) “We think we might be pushing data science in the field of travel 
 more so than anyone has ever done before… a smaller number 
 of engineers can have higher impact through automation on 
 Mesos." Mike Curtis,VP Engineering
 gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data... • improves resource management and efficiency • helps advance engineering strategy of building small teams 
 that can move fast • key to letting engineers make the most of AWS-based 
 infrastructure beyond just Hadoop • allowed company to migrate off Elastic MapReduce • enables use of Hadoop along with Chronos, Spark, Storm, etc.
  • 47. Case Study: eBay (continuous integration) eBay PaaS Team
 ebaytechblog.com/2014/04/04/delivering-ebays-ci- solution-with-apache-mesos-part-i/ • cluster management (PaaS core framework services) for CI • integration of: OpenStack, Jenkins, Zookeeper, Mesos, Marathon,Ansible In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master instance.This Jenkins instance runs within a dedicatedVM, and over time the result has beenVM sprawl and poor resource utilization.We started looking at solutions to maximize our resource utilization and reduce theVM footprint while still preserving the individual CI instance model.After much deliberation, we chose Apache Mesos for a POC.This post shares the journey of how we approached this challenge and accomplished our goal.
  • 48. Case Study: HubSpot (cluster management) Tom Petr
 youtu.be/ROn14csiikw • 500 deployable objects; 100 deploys/day to production; 90 engineers; 3 devops on Mesos cluster • “Our QA cluster is now a fixed $10K/month — that used to fluctuate”
  • 49. Resources Apache Mesos Project
 mesos.apache.org Twitter
 @ApacheMesos Mesosphere
 mesosphere.io Tutorials
 mesosphere.io/learn Documentation
 mesos.apache.org/documentation 2011 USENIX Research Paper
 usenix.org/legacy/event/nsdi11/tech/full_papers/ Hindman_new.pdf Collected Notes/Archives
 goo.gl/jPtTP
  • 50. DIY
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58. Worker DN Worker DN Worker DN Worker DN Master 2 NN ZK Master 1 NN ZK Master 3 NN ZK Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Worker DN Elastic Mesos
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 65. GlueCon
 Broomfield, May 21
 gluecon.com/2014/ DockerCon
 SF, Jun 9
 dockercon.com Spark Summit
 SF, Jun 30
 spark-summit.org/2014 OSCON
 PDX, Jul 20
 oscon.com/oscon2014/ Strata NYC + Hadoop World
 NYC, Oct 15
 strataconf.com/stratany2014 Calendar:
  • 66. New Book: “Just Enough Math”, 
 with Allen Day @MapR Asia ! advanced math for business people,
 to leverage open source for Big Data galleys: July 2014 @ OSCON
 oscon.com/oscon2014/public/ schedule/detail/34873
  • 67. Enterprise DataWorkflows with Cascading O’Reilly, 2013 shop.oreilly.com/product/0636920028536.do ! monthly newsletter for updates, 
 events, conference summaries, etc.: liber118.com/pxn/