Augury and Omens Aside, Part 1:

The Business Case for Apache Mesos


Apache Mesos NYC Meetup @Shutterstock

2014-06-11


Paco Nathan 

http://liber118.com/pxn/

@pacoid
meetup.com/Apache-Mesos-NYC-Meetup/events/187583352/
Disclaimer
The following content results from research, use case analysis, industry
observations, plus personal perspectives and opinions – presented by
a speaker who is an independent author/consultant.	

The following content does not in any way represent the opinions or
official messaging for any clients of Liber 118, Apache Foundation,
United Nations, Area 51, S.P.E.C.T.R.E., etc.

Except, perhaps, for the smarter ones who nurture an ample sense
of humor, which unfortunately may disqualify much of Silicon Valley…
Recent News
Apache releases Mesos 0.19

mesos.apache.org/blog/mesos-0-19-0-released/	

Program announced for inaugural #MesosCon

events.linuxfoundation.org/events/mesoscon	

Mesosphere takes $10.5M in funding

techcrunch.com/2014/06/09/mesosphere-grabs-10m-
in-series-a-funding-to-transform-datacenters/	

Google releases part of Borg/Omega as OSS

wired.com/2014/06/google-kubernetes/
seriously, can’t top that
A Big Idea
From Business Use Cases To Bare Metal
Datacenter Computing
Data Workflow Abstractions
Functional Programming
Paradigm shifts can be observed at three levels of the tech stack for
cluster computing. Each implies orders of magnitude in cost savings
over prior best results, based on substantive changes in software
engineering practices…
In other words, now that we have Mesos, Docker, and Spark, 

why do we need Hadoop legacy software?
hard problems?
• latency	

• aggregation	

• parallelism	

• data rates
Countdown: Augury and Omens Aside, Part 3…
hard problems => solutions
• applicative systems	

• leveraging semigroup structure	

• lazy evaluation aka combinator graph reduction	

• probabilistic data structures
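As a minimal sketch of how the last two items combine: a Count-Min sketch is one well-known probabilistic data structure (chosen here for illustration; the slide does not name a specific one). Its counters merge by cell-wise addition, which is associative and commutative, so sketches built on separate partitions can be combined in any grouping. The class name, width, and depth are assumptions, and `hashlib` stands in for the pairwise-independent hash family a production sketch would use.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: estimates can overcount (hash collisions) but never undercount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One deterministic hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest counter across rows suffers the fewest collisions.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def merge(self, other):
        # Cell-wise addition is associative and commutative,
        # so partial sketches can be combined in any grouping or order.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [[a + b for a, b in zip(r1, r2)]
                        for r1, r2 in zip(self.table, other.table)]
        return merged

# Two partitions sketched independently, e.g. on different nodes.
left, right = CountMinSketch(), CountMinSketch()
for word in ["mesos", "spark", "mesos"]:
    left.add(word)
for word in ["mesos", "docker"]:
    right.add(word)

combined = left.merge(right)
print(combined.estimate("mesos") >= 3)  # True
```

Because merging is just addition, the merged sketch is identical to one built over all the data in a single pass.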
hard problems?
• process, data, and metadata in silos	

• BI + data modeling legacy culture	

• CAP theorem vs. ACID

• accidental complexity	

• propagating schema and lineage	

• learning curve inertia	

• managing risk vs. innovation
Countdown: Augury and Omens Aside, Part 2…
hard problems => solutions
• interdisciplinary teams	

• generalize across batch + real-time + etc.	

• separation of concerns	

• pattern language	

• compiler => query planner
hard problems?
• commodity hardware failure rates	

• scheduling batch is simple; scheduling services is expensive

• no getting around it: building a distributed system

• static partitioning => cost of cluster computing

• monolithic controllers vs. shared state

• low utilization rates => upside-down in power availability
Countdown: Augury and Omens Aside, Part 1…
hard problems => solutions
• isolation	

• containerization	

• mixed workloads	

• data locality	

• service+framework architecture	

• predictive scheduling
Why Does

This Matter?
IoT Data Rates:
technologyreview.com/...
Tools and techniques that served well for ad tech will not
necessarily apply at “Industrial Internet” data rates… we
must retool; the power requirements alone would boil the oceans
Some History,

Part 3
Theory, Eight Decades Ago:	

Haskell Curry, known for seminal work on
combinatory logic (1927)	

Alonzo Church, known for lambda calculus
(1936) and much more! 	

Both sought formal answers to the question,
“What can be computed?”
Narrative Arc: Lambda Somethingorother
Haskell Curry

haskell.org
Alonzo Church

wikipedia.org
Praxis, Four Decades Ago:	

Leveraging lambda calculus, combinators, etc., to
increase parallelism of apps as applicative systems
John Backus

acm.org
David Turner

wikipedia.org
“Can Programming Be Liberated from the von Neumann

Style? A Functional Style and Its Algebra of Programs”

ACM Turing Award (1977)

stanford.edu/class/cs242/readings/backus.pdf
“A new implementation technique for applicative languages”

Turner, D.A. (1979)

Softw: Pract. Exper., 9: 31–49. doi: 10.1002/spe.4380090105
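Turner's 1979 technique compiles applicative programs into combinators such as S and K, then evaluates them by graph reduction. A minimal sketch of the combinators themselves, as curried Python functions: Python evaluates eagerly, so this shows only the algebra (S K K behaves as the identity I; S (K S) K behaves as composition B), not Turner's lazy graph-reduction machine. The helper names `inc` and `double` are illustrative.

```python
def S(f):
    """S f g x = f x (g x)"""
    return lambda g: lambda x: f(x)(g(x))

def K(x):
    """K x y = x: keep the first argument, discard the second."""
    return lambda _y: x

# I = S K K, since S K K x = K x (K x) = x
I = S(K)(K)

# B = S (K S) K, since B f g x = f (g x)
B = S(K(S))(K)

inc = lambda n: n + 1
double = lambda n: n * 2
print(I(42))              # 42
print(B(inc)(double)(5))  # 11, i.e. inc(double(5))
```

Every function above is built from S and K alone; that closure property is what let Turner reduce a whole language to a tiny fixed set of rewrite rules.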
Today:	

Add ALL the Things:

Abstract Algebra Meets Analytics

infoq.com/presentations/abstract-algebra-analytics

Avi Bryant, Strange Loop (2013)	

• grouping doesn’t matter (associativity)	

• ordering doesn’t matter (commutativity)	

• zeros get ignored	

In other words, while partitioning data at scale
is quite difficult, you can let the math allow your
code to be flexible at scale
Avi Bryant

@avibryant
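A minimal sketch of those three properties, using the running mean as an assumed example (not one taken from the talk): a plain average is not associative, but the pair (sum, count) is, with (0, 0) as the zero. Partitioning the data differently, or merging partial results in a shuffled order, yields the same answer.

```python
import random
from functools import reduce

ZERO = (0, 0)  # identity: combining with it changes nothing ("zeros get ignored")

def combine(a, b):
    # (sum, count) pairs add component-wise: associative and commutative.
    return (a[0] + b[0], a[1] + b[1])

def lift(x):
    return (x, 1)

def mean(values, partitions):
    # Partition arbitrarily (round-robin here), aggregate each partition,
    # then merge the partials in shuffled order.
    chunks = [values[i::partitions] for i in range(partitions)]
    partials = [reduce(combine, map(lift, chunk), ZERO) for chunk in chunks]
    random.shuffle(partials)
    total, count = reduce(combine, partials, ZERO)
    return total / count

data = list(range(1, 101))
print(mean(data, 1), mean(data, 7), mean(data, 250))  # 50.5 50.5 50.5
```

With 250 partitions, 150 of them are empty and contribute only `ZERO`, which the merge ignores.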
Algebra for Analytics

speakerdeck.com/johnynek/algebra-for-analytics

Oscar Boykin, Strata SC (2014)
Oscar Boykin

@posco
[Figure: A + B + C + … + P summed two ways: as a balanced tree starting from the pairs (A + B), (C + D), …, (O + P), versus as a left-to-right chain (A + B) + C + D + … + P]
• “Associativity allows parallelism in reducing” 

by letting you put the () where you want	

• “Lack of associativity increases latency exponentially”
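The figure's point can be checked with a small sketch, assuming an in-process stand-in for a cluster: a left fold over n values forms a dependency chain of n - 1 steps, while pairwise (tree) reduction needs only ⌈log₂ n⌉ rounds, and the operations within each round could run in parallel. Only associativity is required for the two to agree; the tree version keeps the original order, so commutativity is not needed here. Function names are illustrative.

```python
import operator
import string

def linear_reduce(items, op):
    """Left fold: every step depends on the previous result."""
    acc, steps = items[0], 0
    for item in items[1:]:
        acc = op(acc, item)
        steps += 1
    return acc, steps  # steps = length of the sequential dependency chain

def tree_reduce(items, op):
    """Pairwise reduction: ops within one round are independent of each other."""
    level, rounds = list(items), 0
    while len(level) > 1:
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover is carried, unchanged, to the next round
            paired.append(level[-1])
        level, rounds = paired, rounds + 1
    return level[0], rounds

letters = list(string.ascii_uppercase[:16])  # A .. P, as in the figure
print(linear_reduce(letters, operator.add))  # ('ABCDEFGHIJKLMNOP', 15)
print(tree_reduce(letters, operator.add))    # ('ABCDEFGHIJKLMNOP', 4)
```

String concatenation is associative but not commutative, so matching results here come from associativity alone.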
???
That, plus oh so much more math fun in store!
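[Figure: Bayesian inference, with the prior (past decisions), the evidence (the data), and the posterior (current decision); a singular value decomposition M = U Σ V^H; a linear program with objective row z - cᵀx and constraints Ax = b; a neural network with input, hidden, and output layers]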
Some History,

Part 2
wikipedia.org/wiki/Firefly

businessweek.com/1996/41/b349690.htm

pubs.media.mit.edu/pubs/papers/32paper.ps
• Firefly, an early commercial recommender system	

• intent: the volume of data about things is more
than any person can digest	

• leveraged similarity within a network	

• an evolution of intelligent agents into web apps	

• collect machine data about consumer interests
• people communicating with each other and 

with machines
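The “similarity within a network” idea can be illustrated with a minimal user-to-user collaborative-filtering sketch. Cosine similarity is a stand-in chosen for brevity; Firefly's actual algorithms are not reproduced here, and the users, items, and ratings are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: score}. Not data from Firefly.
ratings = {
    "ann":  {"jazz": 5, "blues": 4, "techno": 1},
    "bob":  {"jazz": 4, "blues": 5, "folk": 3},
    "cara": {"techno": 5, "house": 4, "jazz": 1},
}

def cosine(u, v):
    # Missing ratings count as zero, so only shared items add to the dot product.
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, k=1):
    # Score unseen items by other users' ratings, weighted by similarity.
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, score in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann"))  # ['folk']
```

Ann's tastes overlap strongly with Bob's, so Bob's “folk” outranks Cara's “house” even though Cara rated her item higher.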
Narrative Arc: Data Workflow Abstractions
Pattie Maes

MIT Media Lab
machine data about
cognitive social systems
Q3 1997 inflection point: four independent teams working toward
horizontal scale-out of workflows based on commodity hardware	

This effort prepared the way for huge Internet successes during

the 1997 holiday season… 	

AMZN, EBAY, Inktomi (YHOO Search), then GOOG	

MapReduce on clusters of commodity hardware and the 

Apache Hadoop open source stack emerged from this context
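The MapReduce pattern that grew out of this context can be sketched in-process, assuming a single machine stands in for the cluster: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. In a real deployment each phase is spread across commodity nodes; the function names here are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (key, value) pairs; each line could be mapped on a different node.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

lines = ["Mesos runs Spark", "Spark runs on Mesos", "Hadoop runs MapReduce"]
mapped = chain.from_iterable(map(map_phase, lines))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["runs"], counts["mesos"], counts["spark"])  # 3 2 2
```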
Amazon	

“Early Amazon: Splitting the website” – Greg Linden	

glinden.blogspot.com/2006/02/early-amazon-splitting-website.html	

eBay	

“The eBay Architecture” – Randy Shoup, Dan Pritchett	

addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html	

addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf	

Inktomi (YHOO Search)	

“Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff)	

youtu.be/E91oEn1bnXM	

Google	

“Underneath the Covers at Google” – Jeff Dean (0:06:54 ff)	

youtu.be/qsan-GQaeyk	

perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx	
[Figure: data workflow diagram with components RDBMS, SQL Query, result sets, recommenders + classifiers, Web Apps, customer transactions, Algorithmic Modeling, Logs, event history, aggregation, dashboards, DW ETL, Middleware, servlets, and models; roles shown: Product, Engineering, UX, Stakeholder, Customers]
[The same diagram, annotated “data products”]
See extended discussion + scorecard:

www.slideshare.net/pacoid/data-workflows-for-machine-learning-33341183
MapReduce: General Batch Processing
Specialized Systems (iterative, interactive, streaming, graph, etc.): Pregel, Giraph, Dremel, Drill, Tez, Impala, GraphLab, Storm, S4
2002 MapReduce @ Google
2004 MapReduce paper
2006 Hadoop @ Yahoo!
2008 Hadoop Summit
2010 Spark paper
2014 Apache Spark top-level
The State of Spark, and
Where We're Going Next	

Matei Zaharia
Spark Summit (2013)	

youtu.be/nU6vO2EJAb4
[Figure: RDD lineage: transformations derive new RDDs from existing RDDs; an action returns a value]
How about a generalized engine for distributed,
applicative systems – apps sharing code across
multiple use cases: batch, iterative, streaming, etc.
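The RDD idea above (transformations recorded lazily, actions forcing evaluation) can be sketched with a hypothetical single-machine class. `MiniRDD` is not Spark's API: it only records a lineage of `map`/`filter` steps and replays it from the source whenever an action runs, which mirrors how Spark can recompute lost partitions from lineage rather than replicating them.

```python
import functools
import operator

class MiniRDD:
    """Hypothetical stand-in for an RDD: transformations only extend the
    lineage; actions replay the lineage against the source."""

    def __init__(self, source, steps=()):
        self.source = source       # zero-argument callable yielding base data
        self.steps = tuple(steps)  # recorded transformations (the lineage)

    # Transformations: return a new MiniRDD, compute nothing.
    def map(self, fn):
        return MiniRDD(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self.source, self.steps + (("filter", pred),))

    def _iterate(self):
        data = iter(self.source())
        for kind, fn in self.steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: force evaluation of the whole lineage.
    def collect(self):
        return list(self._iterate())

    def reduce(self, op):
        return functools.reduce(op, self._iterate())


reads = []

def source():
    reads.append("read")  # record each time the base data is scanned
    return range(10)

squares_even = MiniRDD(source).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(reads)                              # []
print(squares_even.collect())             # [0, 4, 16, 36, 64]
print(squares_even.reduce(operator.add))  # 120
print(len(reads))                         # 2: each action re-read the source
```

Without caching, each action re-scans the source; Spark's `cache()`/`persist()` exist to avoid exactly that cost for iterative workloads.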
Some History,

Part 1
Lessons

from Google
Datacenter Computing	

Google has been doing datacenter computing for years, 

to address the complexities of large-scale data workflows:	

• leveraging the modern kernel: isolation in lieu of VMs	

• “most (>80%) jobs are batch jobs, but the majority 

of resources (55–80%) are allocated to service jobs”	

• mixed workloads, multi-tenancy	

• relatively high utilization rates	

• JVM FTW? not so much…	

• reality: scheduling batch is simple; 

scheduling services is hard/expensive
The Modern Kernel: Top Linux Contributors…	

arstechnica.com/information-technology/2013/09/...
“Return of the Borg”	

Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon

Cade Metz

wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos	

The Datacenter as a Computer: An Introduction 
to the Design of Warehouse-Scale Machines	

Luiz André Barroso, Urs Hölzle	

research.google.com/pubs/pub35290.html	

2011 GAFS Omega

John Wilkes, et al.

youtu.be/0ZFMlO98Jkc
Google describes the technology…	

Omega: flexible, scalable schedulers for large compute clusters	

Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes	

eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
Google describes the business case…	

Taming Latency Variability

Jeff Dean

plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
Commercial OS Cluster Schedulers	

• IBM Platform Symphony

• Microsoft Autopilot	



Arguably, some grid controllers 

are quite notable in-category:	

• Univa Grid Engine (formerly SGE)

• Condor	

• etc.
Emerging

at Berkeley
Beyond Hadoop	

Hadoop – an open source solution for fault-tolerant
parallel processing of batch jobs at scale, based on
commodity hardware… however, other priorities have
emerged for the analytics lifecycle:	

• apps require integration beyond Hadoop	

• multiple topologies, mixed workloads, multi-tenancy	

• significant disruptions in h/w cost/performance
curves	

• higher utilization	

• lower latency	

• highly-available, long running services	

• more than “Just JVM” – e.g., Py adoption, etc.
Just No Getting Around It	

“There's Just No Getting Around It: You're Building a Distributed System”

Mark Cavage

ACM Queue (2013-05-03)

queue.acm.org/detail.cfm?id=2482856	

key takeaways on architecture:	

• decompose the business application into discrete services on the
boundaries of fault domains, scaling, and data workload	

• make as many things as possible stateless	

• when dealing with state, deeply understand CAP, latency, throughput,
and durability requirements
“Without practical experience working on successful—and failed—systems, most engineers
take a "hopefully it works" approach and attempt to string together off-the-shelf software,
whether open source or commercial, and often are unsuccessful at building a resilient,
performant system. In reality, building a distributed system requires a methodical approach
to requirements along the boundaries of failure domains, latency, throughput, durability,
consistency, and desired SLAs for the business application at all aspects of the application.”
Mesos – open source datacenter computing	

a common substrate for cluster computing	

mesos.apache.org	

heterogeneous assets in your datacenter or cloud 

made available as a homogeneous set of resources	

• top-level Apache project	

• scalability to 10,000s of nodes	

• obviates the need for virtual machines	

• isolation (pluggable) for CPU, RAM, I/O, FS, etc.	

• fault-tolerant leader election based on ZooKeeper	

• APIs in C++, Java/Scala, Python, Go, Erlang, Haskell	

• web UI for inspecting cluster state	

• available for Linux, OpenSolaris, Mac OS X
What are the costs of Virtualization?
benchmark type     OpenVZ improvement
mixed workloads    210%–300%
LAMP (related)     38%–200%
I/O throughput     200%–500%
response time      order of magnitude, more pronounced at higher loads
What are the costs of Single Tenancy?
[Figure: CPU load (0–100%) over time for Rails, Memcached, and Hadoop, each on its own cluster, and their combined CPU load (Rails, Memcached, Hadoop) on a single cluster]
Arguments for Datacenter Computing	

rather than running several specialized clusters, each 

at relatively low utilization rates, instead run many 

mixed workloads 	

obvious benefits are realized in terms of:	

• scalability, elasticity, fault tolerance, performance, utilization	

• reduced equipment capex, Ops overhead, etc.	

• reduced licensing, eliminating the need for VMs or potential 

vendor lock-in	

subtle benefits – arguably, more important for Enterprise IT:	

• reduced time for engineers to ramp up new services at scale	

• reduced latency between batch and services, enabling new 

high ROI use cases	

• enables Dev/Test apps to run safely on a Production cluster
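The utilization argument can be made concrete with a small sketch. The demand figures below are hypothetical, invented to resemble the staggered peaks in the single-tenancy figure; they are not measurements from the talk. Dedicated clusters must each be sized for their own peak, whereas a shared pool only needs to cover the peak of the combined load.

```python
# Hypothetical CPU demand (cores) per two-hour slot over one day.
# Illustrative numbers only; not measurements from the talk.
rails     = [30, 25, 20, 20, 40, 80, 90, 85, 60, 50, 40, 35]
memcached = [20, 20, 15, 15, 25, 45, 50, 45, 35, 30, 25, 20]
hadoop    = [90, 95, 90, 80, 40, 10, 5, 10, 30, 60, 85, 90]

def capacity_static(*workloads):
    # Each dedicated cluster must be sized for its own peak.
    return sum(max(w) for w in workloads)

def capacity_pooled(*workloads):
    # A shared cluster only needs to cover the peak of the combined load.
    return max(sum(slot) for slot in zip(*workloads))

def mean_demand(*workloads):
    # Average total demand per slot across all workloads.
    return sum(map(sum, workloads)) / len(workloads[0])

static = capacity_static(rails, memcached, hadoop)
pooled = capacity_pooled(rails, memcached, hadoop)
avg = mean_demand(rails, memcached, hadoop)
print(static, pooled)                               # 235 150
print(f"{avg / static:.0%} vs {avg / pooled:.0%}")  # 57% vs 89%
```

This toy model ignores isolation overhead, packing fragmentation, and safety headroom, each of which would narrow the gap somewhat.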
Analogies and
Architecture
Prior Practice: Dedicated Servers	

• low utilization rates	

• longer time to ramp up new services
DATACENTER
Prior Practice: Virtualization	

DATACENTER PROVISIONED VMS
• even more machines to manage	

• substantial performance decrease 

due to virtualization	

• VM licensing costs
Prior Practice: Static Partitioning
STATIC PARTITIONING
• even more machines to manage	

• substantial performance decrease 

due to virtualization	

• VM licensing costs	

• failures make static partitioning 

more complex to manage
DATACENTER
MESOS
Mesos: One Large Pool of Resources	

“We wanted people to be able to program 

for the datacenter just like they program 

for their laptop."	

Ben Hindman
DATACENTER
Fault-tolerant distributed systems…	

…written in 100-300 lines of 

C++, Java/Scala, Python, Go, etc.	

…building blocks, if you will	

Q: required lines of network code?	

A: probably none
Mesos – architecture	

HDFS, distrib file system
Mesos, distrib kernel
meta-frameworks: Aurora, Marathon
frameworks: Spark, Storm, MPI, Jenkins, etc.
task schedulers: Chronos, etc.
APIs: C++, JVM, Py, Go
apps: HA services, web apps, batch jobs, scripts, etc.
Linux: libcgroup, libprocess, libev, etc.
Mesos – dynamics	

Mesos: distrib kernel
Marathon: distrib init.d
Chronos: distrib cron
workloads: distrib frameworks, HA services, scheduled apps
Mesos – dynamics	

[Figure: Mesos masters (the distributed kernel) send resource offers describing the available resources on Mesos slaves to a distributed framework's Scheduler, whose tasks run in Executors on those slaves]
Example: Resource Offer in a Two-Level Scheduler
mesos.apache.org/documentation/latest/mesos-architecture/
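A toy simulation of the two-level model, assuming simplified data types: the master (first level) decides which framework sees which offers, while each framework's scheduler (second level) decides which tasks to launch on the offered resources and hands back what it does not use. All names are hypothetical and do not mirror the real Mesos scheduler API. In this toy, frameworks simply receive the leftovers in list order, whereas Mesos delegates that decision to a pluggable allocation module (Dominant Resource Fairness by default).

```python
from dataclasses import dataclass, field

@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int

@dataclass
class Framework:
    """Hypothetical framework scheduler that packs identical tasks into offers."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending: int                                  # tasks not yet launched
    launched: list = field(default_factory=list)  # agent of each launched task

    def resource_offers(self, offers):
        # Second level: the framework chooses what to run on each offer.
        unused = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            while self.pending and cpus >= self.task_cpus and mem >= self.task_mem_mb:
                cpus -= self.task_cpus
                mem -= self.task_mem_mb
                self.pending -= 1
                self.launched.append(offer.agent)
            unused.append(Offer(offer.agent, cpus, mem))
        return unused  # declined remainder goes back to the master

def master_round(agents, frameworks):
    # First level: the master turns free agent resources into offers and
    # decides who sees them (here, naively, each framework in list order).
    offers = [Offer(name, cpus, mem) for name, (cpus, mem) in agents.items()]
    for fw in frameworks:
        offers = fw.resource_offers(offers)
    return offers  # resources left idle after this round

agents = {"agent1": (4.0, 8192), "agent2": (2.0, 4096)}
spark = Framework("spark", task_cpus=1.0, task_mem_mb=2048, pending=4)
web = Framework("web", task_cpus=0.5, task_mem_mb=512, pending=3)
leftover = master_round(agents, [spark, web])
print(spark.launched)  # ['agent1', 'agent1', 'agent1', 'agent1']
print(web.launched)    # ['agent2', 'agent2', 'agent2']
```

The design choice the sketch highlights: the master never needs to understand task placement rules, so new frameworks can bring their own scheduling logic without changes to the kernel.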
Frameworks Integrated with Mesos	

Continuous Integration:

Jenkins, GitLab	

Big Data:

Hadoop, Spark, Storm,
Kafka, Hama
Python workloads:

DPark, Exelixi
Meta-Frameworks / HA Services:

Aurora, Marathon
Orchestration:

Singularity

Distributed Cron:

Chronos, JobServer	

Data Storage:

ElasticSearch, Cassandra,

Hypertable
Containers:

Docker, Deimos, GearD
Parallel Processing:

Chapel, MPI, Torque
Looking
Ahead…
Quasar+Mesos @ Stanford, Twitter, etc.…	

Quasar: Resource-Efficient and QoS-Aware Cluster Management

Christina Delimitrou, Christos Kozyrakis

stanford.edu/~cdel/2014.asplos.quasar.pdf
Quasar+Mesos @ Stanford, Twitter, etc.…	

Improving Resource Efficiency with Apache Mesos

Christina Delimitrou

youtu.be/YpmElyi94AA
Quasar+Mesos @ Stanford, Twitter, etc.…	

Consider that for datacenter computing at scale, a surge in 

workloads implies:	

• large cap-ex investment, long lead-time to build	

• utilities cannot supply the power requirements	

Even for large players that achieve 2x the typical industry datacenter
utilization rates, those factors become show-stoppers. Even so, high rates
of over-provisioning are typical, so there’s much room to improve.	

Experiences with Quasar+Mesos showed:	

• 88% of apps get >95% performance	

• ~10% overprovisioning instead of 500%	

• up to 70% cluster util at steady state	

• 23% shorter scenario completion
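To see what the overprovisioning figures imply, here is a small worked calculation under one assumed definition: an application with overprovisioning fraction p reserves (1 + p) times what it actually uses. This reading of the slide's numbers is an assumption, and per-application reservation efficiency is not the same quantity as the cluster-wide steady-state utilization also reported.

```python
def reservation_efficiency(p):
    """Fraction of reserved resources actually used, assuming
    reserved = used * (1 + p). The definition is an assumption."""
    return 1 / (1 + p)

for label, p in [("500% overprovisioning", 5.0), ("~10% overprovisioning", 0.10)]:
    print(f"{label}: {reservation_efficiency(p):.1%} of reserved resources used")
# 500% overprovisioning: 16.7% of reserved resources used
# ~10% overprovisioning: 90.9% of reserved resources used
```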
Because…

Use Cases
Production Deployments (public)
Built-in /

bare metal
Hypervisors
Solaris Zones
Linux CGroups
Opposite Ends of the Spectrum, One Common Substrate

[Figure: Request / Response workloads at one end, Batch at the other]
Case Study: Twitter (bare metal / on premise)	

“Mesos is the cornerstone of our elastic compute infrastructure – 

it’s how we build all our new services and is critical for Twitter’s

continued success at scale. It's one of the primary keys to our

data center efficiency."	

Chris Fry, SVP Engineering	

blog.twitter.com/2013/mesos-graduates-from-apache-incubation	

wired.com/gadgetlab/2013/11/qa-with-chris-fry/	

• key services run in production: analytics, typeahead, ads	

• Twitter engineers rely on Mesos to build all new services	

• instead of thinking about static machines, engineers think 

about resources like CPU, memory and disk	

• allows services to scale and leverage a shared pool of 

servers across datacenters efficiently	

• reduces the time between prototyping and launching
Case Study: Airbnb (fungible cloud infrastructure)	

“We think we might be pushing data science in the field of travel 

more so than anyone has ever done before… a smaller number 

of engineers can have higher impact through automation on 

Mesos."	

Mike Curtis, VP Engineering

gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data...	

• improves resource management and efficiency	

• helps advance engineering strategy of building small teams 

that can move fast	

• key to letting engineers make the most of AWS-based 

infrastructure beyond just Hadoop	

• allowed company to migrate off Elastic MapReduce	

• enables use of Hadoop along with Chronos, Spark, Storm, etc.
Case Study: eBay (continuous integration)	

eBay PaaS Team

ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/	

• cluster management (PaaS core framework
services) for CI 	

• integration of: OpenStack, Jenkins, ZooKeeper,
Mesos, Marathon, Ansible
In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master
instance. This Jenkins instance runs within a dedicated VM, and over time the
result has been VM sprawl and poor resource utilization. We started looking at
solutions to maximize our resource utilization and reduce the VM footprint while
still preserving the individual CI instance model. After much deliberation, we
chose Apache Mesos for a POC. This post shares the journey of how we
approached this challenge and accomplished our goal.
Case Study: HubSpot (cluster management)	

Tom Petr

youtu.be/ROn14csiikw	

mesosphere.io/resources/mesos-case-study-hubspot/	

• 500 deployable objects; 100 deploys/day to production; 90
engineers; 3 devops on Mesos cluster	

• “Our QA cluster is now a fixed $10K/month — that used to
fluctuate”
DIY

http://elastic.mesosphere.io

http://mesosphere.io/learn	

Summary

Question
Given the points about Part 3, Part 2, Part 1…
Given the history from Church and Curry 

to BDAS and Twitter OSS… Given the needs,
e.g., IoT preferably not boiling the oceans…	

Why do we still see proto-legacy systems like
Tez? Or, for that matter, why do we find notable
experts stating that “Hadoop is an OS”?	

It’s time to set the legacy of YHOO circa 2009 

aside, to step up to contemporary challenges with
better understanding of the underlying math and 

CS theory => solving business use cases at scale	

To paraphrase author William Gibson, the future is
already here – it’s just not very evenly distributed, 

nor is it google-able
Summary Question:
IoT Data Rates:
???
ありがとうございました (Thank you very much)
monthly newsletter for updates, 

events, conf summaries, etc.:
liber118.com/pxn/
Enterprise Data Workflows with Cascading
O’Reilly, 2013
shop.oreilly.com/product/0636920028536.do
Just Enough Math
O’Reilly, 2014
oreilly.com/go/enough_math/

preview: youtu.be/TQ58cWgdCpA
Spark Summit

SF, Jun 30 15% code: Paco2014

spark-summit.org/2014
OSCON 2014

PDX, Jul 20 20% code: PACOID

oscon.com/oscon2014/
#MesosCon

Chicago, Aug 21

events.linuxfoundation.org/events/mesoscon
Strata NYC + Hadoop World

NYC, Oct 15

strataconf.com/stratany2014
Data Day Texas

Austin, Jan 10

datadaytexas.com
calendar:

 

Viewers also liked (20)

CI and CD at Scale: Scaling Jenkins with Docker and Apache Mesos
CI and CD at Scale: Scaling Jenkins with Docker and Apache MesosCI and CD at Scale: Scaling Jenkins with Docker and Apache Mesos
CI and CD at Scale: Scaling Jenkins with Docker and Apache Mesos
 
#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a profession
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunities
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningMicroservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
A New Year in Data Science: ML Unpaused
A New Year in Data Science: ML UnpausedA New Year in Data Science: ML Unpaused
A New Year in Data Science: ML Unpaused
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 

Similar to Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos

Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning ProductsAndrew Musselman
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
Open source for customer analytics
Open source for customer analyticsOpen source for customer analytics
Open source for customer analyticsMatthias Funke
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
Open source ai_technical_trend
Open source ai_technical_trendOpen source ai_technical_trend
Open source ai_technical_trendMario Cho
 
Ibm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortIbm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortISSIP
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Visualization for Software Analytics
Visualization for Software AnalyticsVisualization for Software Analytics
Visualization for Software AnalyticsMargaret-Anne Storey
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software DatasetsTao Xie
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
Ai4 space 20180617 v6
Ai4 space 20180617 v6Ai4 space 20180617 v6
Ai4 space 20180617 v6ISSIP
 
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...Keiichiro Ono
 

Similar to Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos (20)

Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Open source for customer analytics
Open source for customer analyticsOpen source for customer analytics
Open source for customer analytics
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Open source ai_technical_trend
Open source ai_technical_trendOpen source ai_technical_trend
Open source ai_technical_trend
 
Ibm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortIbm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 short
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
NoSQL (Not Only SQL)
NoSQL (Not Only SQL)NoSQL (Not Only SQL)
NoSQL (Not Only SQL)
 
Visualization for Software Analytics
Visualization for Software AnalyticsVisualization for Software Analytics
Visualization for Software Analytics
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Ai4 space 20180617 v6
Ai4 space 20180617 v6Ai4 space 20180617 v6
Ai4 space 20180617 v6
 
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat...
 

More from Paco Nathan

Human in the loop: a design pattern for managing teams working with ML
Human in the loop: a design pattern for managing  teams working with MLHuman in the loop: a design pattern for managing  teams working with ML
Human in the loop: a design pattern for managing teams working with MLPaco Nathan
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLPaco Nathan
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLPaco Nathan
 
Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIPaco Nathan
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryPaco Nathan
 
Computable Content
Computable ContentComputable Content
Computable ContentPaco Nathan
 
Computable Content: Lessons Learned
Computable Content: Lessons LearnedComputable Content: Lessons Learned
Computable Content: Lessons LearnedPaco Nathan
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learningPaco Nathan
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEPaco Nathan
 

More from Paco Nathan (11)

Human in the loop: a design pattern for managing teams working with ML
Human in the loop: a design pattern for managing  teams working with MLHuman in the loop: a design pattern for managing  teams working with ML
Human in the loop: a design pattern for managing teams working with ML
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage ML
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage ML
 
Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AI
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industry
 
Computable Content
Computable ContentComputable Content
Computable Content
 
Computable Content: Lessons Learned
Computable Content: Lessons LearnedComputable Content: Lessons Learned
Computable Content: Lessons Learned
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learning
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICME
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos

  • 1. Augury and Omens Aside, Part 1:
 The Business Case for Apache Mesos 
 Apache Mesos NYC Meetup @Shutterstock
 2014-06-11 
 Paco Nathan 
 http://liber118.com/pxn/
 @pacoid meetup.com/Apache-Mesos-NYC-Meetup/events/187583352/
  • 2. Disclaimer The following content results from research, use case analysis, industry observations, plus personal perspectives and opinions, presented by a speaker who is an independent author/consultant. The following content does not in any way represent the opinions or official messaging for any clients of Liber 118, Apache Foundation, United Nations, Area 51, S.P.E.C.T.R.E., etc. Except, perhaps, for the smarter ones who nurture an ample sense of humor, which unfortunately may disqualify much of Silicon Valley…
  • 3. Recent News:
 Apache releases Mesos 0.19: mesos.apache.org/blog/mesos-0-19-0-released/
 Program announced for inaugural #MesosCon: events.linuxfoundation.org/events/mesoscon
 Mesosphere takes $10.5M in funding: techcrunch.com/2014/06/09/mesosphere-grabs-10m-in-series-a-funding-to-transform-datacenters/
 Google releases part of Borg/Omega as OSS: wired.com/2014/06/google-kubernetes/
  • 4. Recent News (the same four items as slide 3), annotated: seriously, can’t top that
  • 6. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Paradigm shifts can be observed at three levels of the tech stack for cluster computing. Each implies orders-of-magnitude cost savings over prior best results, based on substantive changes in software engineering practices…
  • 7. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. In other words, now that we have Mesos, Docker, and Spark, why do we need legacy Hadoop software?
  • 8. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems? • latency • aggregation • parallelism • data rates. Countdown: Augury and Omens Aside, Part 3…
  • 9. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems => solutions • applicative systems • leveraging semigroup structure • lazy evaluation, aka combinator graph reduction • probabilistic data structures. Countdown: Augury and Omens Aside, Part 3…
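Slide 9 pairs semigroup structure with probabilistic data structures. As a hedged illustration (not code from the talk), here is a toy Bloom filter whose merge is a bitwise OR: merging is associative and commutative, and an empty filter acts as the zero, so per-partition filters can be combined in any grouping or order. The class name, the bit width `m=256`, the `k=3` hash positions, and the SHA-256-based hashing are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative sizes): m bits, k hash positions per item."""

    def __init__(self, m=256, k=3, bits=0):
        self.m, self.k, self.bits = m, k, bits  # bits: a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p
        return self

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def merge(self, other):
        # Semigroup operation: bitwise OR is associative and commutative.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k to be merged")
        return BloomFilter(self.m, self.k, self.bits | other.bits)


def build(items):
    bf = BloomFilter()
    for x in items:
        bf.add(x)
    return bf


# One filter per (hypothetical) data partition, merged in different groupings.
partitions = [["a", "b"], ["c"], ["d", "e"]]
f1, f2, f3 = (build(p) for p in partitions)
left = f1.merge(f2).merge(f3)
right = f1.merge(f2.merge(f3))
reordered = f3.merge(f1).merge(f2)

print(left.bits == right.bits == reordered.bits)    # True
print(all(left.might_contain(x) for x in "abcde"))  # True
```

Because the merge only needs associativity (plus commutativity and an empty identity here), each partition can be summarized independently and the partial filters combined wherever and whenever they arrive.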
  • 10. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems? • process, data, and metadata in silos • BI + data modeling legacy culture • CAP theorem vs. ACID • accidental complexity • propagating schema and lineage • learning curve inertia • managing risk vs. innovation. Countdown: Augury and Omens Aside, Part 2…
  • 11. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems => solutions • interdisciplinary teams • generalize across batch + real-time + etc. • separation of concerns • pattern language • compiler => query planner. Countdown: Augury and Omens Aside, Part 2…
  • 12. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems? • commodity hardware failure rates • scheduling batch jobs is simple; scheduling services is expensive • no getting around it: you are building a distributed system • static partitioning => cost of cluster computing • monolithic controllers vs. shared state • low utilization rates => upside-down in power availability. Countdown: Augury and Omens Aside, Part 1…
  • 13. From Business Use Cases To Bare Metal: Datacenter Computing / Data Workflow Abstractions / Functional Programming. Hard problems => solutions • isolation • containerization • mixed workloads • data locality • service+framework architecture • predictive scheduling. Countdown: Augury and Omens Aside, Part 1…
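Slides 12 and 13 contrast static partitioning with mixed workloads sharing one cluster through a service+framework architecture. In Mesos this takes the form of two-level scheduling: the master offers the free resources on each worker node (called slaves in Mesos 0.19, agents in later releases), and each framework's own scheduler decides which offers to use. The sketch below is a toy, in-process simulation of that idea only; the `Offer`, `Framework`, and `allocate` names, the fixed offer order, and the resource numbers are hypothetical and are not the Mesos API (the real master delegates offer order to a pluggable allocator, DRF by default).

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one worker node, offered by the (toy) master."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework scheduler whose pending tasks are all the same size."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending_tasks: int
    launched_on: list = field(default_factory=list)

    def on_offer(self, offer):
        # The framework, not the master, decides how much of the offer to use.
        used_cpus, used_mem = 0.0, 0
        while (self.pending_tasks > 0
               and used_cpus + self.task_cpus <= offer.cpus
               and used_mem + self.task_mem_mb <= offer.mem_mb):
            used_cpus += self.task_cpus
            used_mem += self.task_mem_mb
            self.pending_tasks -= 1
            self.launched_on.append(offer.node)
        return used_cpus, used_mem  # whatever is unused is implicitly declined


def allocate(free, frameworks):
    """Toy master: offer each node's free resources to frameworks in list order."""
    for node, (cpus, mem) in list(free.items()):
        for fw in frameworks:
            if cpus <= 0 or mem <= 0:
                break
            used_cpus, used_mem = fw.on_offer(Offer(node, cpus, mem))
            cpus, mem = cpus - used_cpus, mem - used_mem
        free[node] = (cpus, mem)
    return free


# A batch framework and a service framework share the same two nodes.
free = {"node-1": (4.0, 8192), "node-2": (4.0, 8192)}
batch = Framework("batch", task_cpus=1.0, task_mem_mb=1024, pending_tasks=5)
service = Framework("service", task_cpus=2.0, task_mem_mb=2048, pending_tasks=2)
allocate(free, [batch, service])

print(batch.launched_on)    # ['node-1', 'node-1', 'node-1', 'node-1', 'node-2']
print(service.launched_on)  # ['node-2']
print(free)                 # {'node-1': (0.0, 4096), 'node-2': (1.0, 5120)}
```

In this run the second service task does not fit in what remains, so it simply waits for a later round of offers; deciding who sees offers first (fairness) is exactly the job the real allocator takes over from this naive list order.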
  • 16. IoT Data Rates: technologyreview.com/... Tools and techniques that served well for ad-tech will not necessarily apply to “Industrial Internet” data rates … we must retool; power requirements alone would boil the oceans
  • 18. Theory, Eight Decades Ago: Haskell Curry, known for seminal work on combinatory logic (1927); Alonzo Church, known for lambda calculus (1936) and much more! Both sought formal answers to the question, “What can be computed?” Narrative Arc: Lambda Somethingorother
 Haskell Curry: haskell.org
 Alonzo Church: wikipedia.org
  • 19. Praxis, Four Decades Ago: Leveraging lambda calculus, combinators, etc., to increase parallelism of apps as applicative systems Narrative Arc: Lambda Somethingorother
 John Backus (acm.org): “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs”, ACM Turing Award (1977), stanford.edu/class/cs242/readings/backus.pdf
 David Turner (wikipedia.org): “A new implementation technique for applicative languages”, Turner, D.A. (1979), Softw: Pract. Exper., 9: 31–49. doi: 10.1002/spe.4380090105
  • 20. Today: Add ALL the Things:
 Abstract Algebra Meets Analytics
 infoq.com/presentations/abstract-algebra-analytics
 Avi Bryant (@avibryant), Strange Loop (2013) • grouping doesn’t matter (associativity) • ordering doesn’t matter (commutativity) • zeros get ignored In other words, while partitioning data at scale is quite difficult, the math lets your code stay indifferent to how the data gets partitioned and combined at scale Narrative Arc: Lambda Somethingorother
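Taken together, the three properties on this slide (associativity, commutativity, and a zero that "gets ignored") describe a commutative monoid. A minimal sketch, assuming a hypothetical word-count aggregation in plain Python (not Algebird or any library from the talk): because merging partial counts is associative and commutative, and the empty count is an identity, the total comes out the same however the input is split into partitions and in whatever order the partial results arrive.

```python
from functools import reduce
import random

def merge(a: dict, b: dict) -> dict:
    """Combine two partial word counts; associative and commutative, never mutates inputs."""
    out = dict(a)
    for word, n in b.items():
        out[word] = out.get(word, 0) + n
    return out

ZERO: dict = {}  # identity element: merge(x, ZERO) == x

def count(words):
    """Count one partition of the data, starting from the identity."""
    return reduce(merge, ({w: 1} for w in words), ZERO)

def partitioned_count(words, n_parts, seed):
    """Split the input arbitrarily, count each part, then merge partials in shuffled order."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n_parts)]
    for w in words:
        parts[rng.randrange(n_parts)].append(w)
    partials = [count(p) for p in parts]  # empty partitions simply yield ZERO
    rng.shuffle(partials)                 # arrival order of partial results does not matter
    return reduce(merge, partials, ZERO)

words = "to be or not to be that is the question".split()
expected = count(words)
for seed in range(5):
    assert partitioned_count(words, n_parts=3, seed=seed) == expected
print(expected["to"], expected["be"])  # 2 2
```

Dropping either property has a cost: without commutativity the partials must be merged in their original order, and without associativity they cannot be regrouped at all.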
  • 21. Algebra for Analytics
 speakerdeck.com/johnynek/algebra-for-analytics
 Oscar Boykin (@posco), Strata SC (2014) [Diagram: A + B + C + … + P, regrouped as (A + B) (C + D) (E + F) (G + H) (I + J) (K + L) (M + N) (O + P) and combined pairwise, versus (A + B) + C + D + … + P] • “Associativity allows parallelism in reducing” by letting you put the () where you want • “Lack of associativity increases latency exponentially” Narrative Arc: Lambda Somethingorother
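To make the latency point on this slide concrete, here is a minimal sketch in plain Python (the function names are hypothetical) that reduces the slide's sixteen terms A through P two ways and reports the depth, i.e. the number of dependent rounds of the operator. A left-to-right fold needs n - 1 sequential steps; grouping neighbors pairwise, which associativity permits, needs only ⌈log2 n⌉ rounds, and every combination within a round could run in parallel. String concatenation is used as the operator because it is associative but not commutative, which shows that the regrouping only needs associativity as long as order is preserved.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def linear_reduce(xs: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Fold left to right: ((A + B) + C) + ...; each step waits on the previous one."""
    acc, depth = xs[0], 0
    for x in xs[1:]:
        acc, depth = op(acc, x), depth + 1
    return acc, depth

def tree_reduce(xs: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine neighbors pairwise, (A + B) (C + D) ...; each round is parallelizable."""
    level, depth = list(xs), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover element carries over to the next round
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

def concat(a: str, b: str) -> str:
    """Associative but not commutative, so order must be preserved."""
    return a + b

letters = [chr(c) for c in range(ord("A"), ord("P") + 1)]  # A .. P, as on the slide
print(linear_reduce(letters, concat))  # ('ABCDEFGHIJKLMNOP', 15)
print(tree_reduce(letters, concat))    # ('ABCDEFGHIJKLMNOP', 4)
```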
  • 22. That, plus oh so much more math fun in store! Narrative Arc: Lambda Somethingorother [Figures: Bayesian inference (The Prior: past decisions; The Evidence: the data; The Posterior: current decision); singular value decomposition (M = UΣV^H); a linear-programming tableau; a neural network with input, hidden, and output layers]
  • 24. wikipedia.org/wiki/Firefly
 businessweek.com/1996/41/b349690.htm
 pubs.media.mit.edu/pubs/papers/32paper.ps • Firefly, an early commercial recommender system • intent: the volume of data about things is more than any person can digest • leveraged similarity within a network • an evolution of intelligent agents into web apps • collect machine data about consumer interests • people communicating with each other and with machines Narrative Arc: Data Workflow Abstractions
 Pattie Maes, MIT Media Lab: machine data about cognitive social systems
  • 25. Q3 1997 inflection point: four independent teams working toward horizontal scale-out of workflows based on commodity hardware. This effort prepared the way for huge Internet successes during the 1997 holiday season… AMZN, EBAY, Inktomi (YHOO Search), then GOOG. MapReduce on clusters of commodity hardware and the Apache Hadoop open source stack emerged from this context. Narrative Arc: Data Workflow Abstractions
  • 26. Amazon: “Early Amazon: Splitting the website” – Greg Linden glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
 eBay: “The eBay Architecture” – Randy Shoup, Dan Pritchett addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
 Inktomi (YHOO Search): “Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff) youtu.be/E91oEn1bnXM
 Google: “Underneath the Covers at Google” – Jeff Dean (0:06:54 ff) youtu.be/qsan-GQaeyk perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
 Narrative Arc: Data Workflow Abstractions
  • 27. [Diagram components: Customers; Web Apps; Middleware; servlets, models; RDBMS; customer transactions; Logs; event history; ETL; DW; SQL Query; result sets; aggregation; dashboards; Algorithmic Modeling; recommenders + classifiers; Product; Engineering; UX; Stakeholder] Narrative Arc: Data Workflow Abstractions
  • 28. [Same diagram as slide 27, with the outputs labeled “data products”] Narrative Arc: Data Workflow Abstractions
  • 29. See extended discussion + scorecard:
 www.slideshare.net/pacoid/data-workflows-for-machine-learning-33341183
  • 30. General Batch Processing: MapReduce. Specialized Systems (iterative, interactive, streaming, graph, etc.): Pregel, Giraph, Dremel, Drill, Tez, Impala, GraphLab, Storm, S4. Narrative Arc: Data Workflow Abstractions
  • 31. Timeline: 2002 MapReduce @ Google • 2004 MapReduce paper • 2006 Hadoop @ Yahoo! • 2008 Hadoop Summit • 2010 Spark paper • 2014 Apache Spark top-level. [Diagram: RDD → transformations → RDD → RDD → RDD → action → value] “The State of Spark, and Where We're Going Next”, Matei Zaharia, Spark Summit (2013) youtu.be/nU6vO2EJAb4 How about a generalized engine for distributed, applicative systems – apps sharing code across multiple use cases: batch, iterative, streaming, etc.? Narrative Arc: Data Workflow Abstractions
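The RDD diagram on this slide separates transformations (which build new datasets) from actions (which produce a value). A minimal single-process sketch of that idea, assuming a hypothetical `MiniRDD` class; this is not the Spark API and omits partitioning, distribution, and fault recovery. Transformations only record a lineage and compose lazy generators; nothing is computed until an action such as `collect` or `reduce` runs.

```python
from typing import Callable, Iterable, Iterator, List

class MiniRDD:
    """Hypothetical toy dataset: transformations record lineage, actions evaluate it.
    Not Spark's API; a single-process illustration of lazy evaluation only."""

    def __init__(self, source: Callable[[], Iterable], lineage: List[str]):
        self._source = source    # zero-arg function that (re)produces the data
        self.lineage = lineage   # names of the steps applied so far

    @staticmethod
    def parallelize(data: Iterable) -> "MiniRDD":
        items = list(data)
        return MiniRDD(lambda: iter(items), ["parallelize"])

    # transformations: return a new MiniRDD and compute nothing yet
    def map(self, f: Callable) -> "MiniRDD":
        src = self._source
        return MiniRDD(lambda: (f(x) for x in src()), self.lineage + ["map"])

    def filter(self, pred: Callable) -> "MiniRDD":
        src = self._source
        return MiniRDD(lambda: (x for x in src() if pred(x)), self.lineage + ["filter"])

    # actions: force evaluation of the whole lineage and return a plain value
    def collect(self) -> list:
        return list(self._source())

    def reduce(self, op: Callable):
        it: Iterator = iter(self._source())
        acc = next(it)  # assumes a non-empty dataset
        for x in it:
            acc = op(acc, x)
        return acc

calls = []

def square(x):
    calls.append(x)  # records when the transformation actually runs
    return x * x

rdd = MiniRDD.parallelize(range(1, 6)).map(square).filter(lambda y: y % 2 == 1)
print(calls)                           # [] : nothing has run yet
print(rdd.lineage)                     # ['parallelize', 'map', 'filter']
print(rdd.collect())                   # [1, 9, 25]
print(rdd.reduce(lambda a, b: a + b))  # 35
```

Because each action replays the lineage from the source, the same recorded steps can also be used to recompute lost results, which is the intuition behind lineage-based fault tolerance.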
  • 34. Datacenter Computing Google has been doing datacenter computing for years, 
 to address the complexities of large-scale data workflows: • leveraging the modern kernel: isolation in lieu of VMs • “most (>80%) jobs are batch jobs, but the majority 
 of resources (55–80%) are allocated to service jobs” • mixed workloads, multi-tenancy • relatively high utilization rates • JVM FTW? not so much… • reality: scheduling batch is simple; 
 scheduling services is hard/expensive
  • 35. The Modern Kernel: Top Linux Contributors… arstechnica.com/information-technology/2013/09/...
  • 36. “Return of the Borg” Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon
 Cade Metz
 wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
 The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
 Luiz André Barroso, Urs Hölzle research.google.com/pubs/pub35290.html
 2011 GAFS Omega
 John Wilkes, et al.
 youtu.be/0ZFMlO98Jkc
  • 37. Google describes the technology… Omega: flexible, scalable schedulers for large compute clusters Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
  • 38. Google describes the business case… Taming Latency Variability
 Jeff Dean
 plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
  • 39. Commercial OS Cluster Schedulers
 • IBM Platform Symphony
 • Microsoft Autopilot
 Arguably, some grid controllers are quite notable in-category:
 • Univa Grid Engine (formerly SGE)
 • Condor
 • etc.
  • 41. Beyond Hadoop Hadoop – an open source solution for fault-tolerant parallel processing of batch jobs at scale, based on commodity hardware… however, other priorities have emerged for the analytics lifecycle: • apps require integration beyond Hadoop • multiple topologies, mixed workloads, multi-tenancy • significant disruptions in h/w cost/performance curves • higher utilization • lower latency • highly-available, long-running services • more than “Just JVM” – e.g., Python adoption, etc.
  • 42. Just No Getting Around It “There's Just No Getting Around It: You're Building a Distributed System”
 Mark Cavage
 ACM Queue (2013-05-03)
 queue.acm.org/detail.cfm?id=2482856 key takeaways on architecture: • decompose the business application into discrete services on the boundaries of fault domains, scaling, and data workload • make as many things as possible stateless • when dealing with state, deeply understand CAP, latency, throughput, and durability requirements “Without practical experience working on successful—and failed—systems, most engineers take a "hopefully it works" approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
  • 44. Mesos – open source datacenter computing: a common substrate for cluster computing mesos.apache.org heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources • top-level Apache project • scalability to 10,000s of nodes • obviates the need for virtual machines • isolation (pluggable) for CPU, RAM, I/O, FS, etc. • fault-tolerant leader election based on ZooKeeper • APIs in C++, Java/Scala, Python, Go, Erlang, Haskell • web UI for inspecting cluster state • available for Linux, OpenSolaris, Mac OS X
  • 45. What are the costs of Virtualization?
 benchmark type: OpenVZ improvement
 mixed workloads: 210%-300%
 LAMP (related): 38%-200%
 I/O throughput: 200%-500%
 response time: an order of magnitude, more pronounced at higher loads
  • 46. What are the costs of Single Tenancy? [Figure: CPU load (0%–100%) over time t for Rails, Memcached, and Hadoop, each plotted separately, alongside the COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP)]
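The point of the figure can be checked with a little arithmetic: when services peak at different times, a shared cluster only needs capacity for the peak of the combined load, while dedicated clusters must each be sized for their own peak. A minimal sketch in Python; the hourly demand numbers below are hypothetical and chosen only for illustration (they are not read off the slide), and the shared case assumes any workload can run on any machine, which is what isolation is meant to make safe.

```python
# Hypothetical hourly CPU demand (in cores); illustrative numbers, not measurements.
rails     = [10, 12, 40, 60, 55, 20]   # busy during the day
memcached = [ 8, 10, 25, 30, 28, 12]   # tracks web traffic
hadoop    = [70, 65, 15, 10, 20, 60]   # batch jobs mostly overnight

combined = [r + m + h for r, m, h in zip(rails, memcached, hadoop)]
total_used = sum(combined)  # core-hours actually consumed

# Single tenancy: each service gets its own cluster sized for its own peak.
dedicated_capacity = max(rails) + max(memcached) + max(hadoop)  # sum of peaks
# Shared pool: one cluster sized for the peak of the combined load.
shared_capacity = max(combined)                                  # peak of sums

hours = len(combined)
dedicated_util = total_used / (dedicated_capacity * hours)
shared_util = total_used / (shared_capacity * hours)

print(f"dedicated: {dedicated_capacity} cores, {dedicated_util:.0%} average utilization")
print(f"shared:    {shared_capacity} cores, {shared_util:.0%} average utilization")
# dedicated: 160 cores, 57% average utilization
# shared:    103 cores, 89% average utilization
```

The peak of a sum can never exceed the sum of the peaks, so pooling never needs more capacity; how much it saves depends on how far apart the individual peaks fall.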
  • 47. Arguments for Datacenter Computing: rather than running several specialized clusters, each at relatively low utilization rates, instead run many mixed workloads. Obvious benefits are realized in terms of: • scalability, elasticity, fault tolerance, performance, utilization • reduced equipment capex, Ops overhead, etc. • reduced licensing, eliminating the need for VMs or potential vendor lock-in. Subtle benefits – arguably more important for Enterprise IT: • reduced time for engineers to ramp up new services at scale • reduced latency between batch and services, enabling new high ROI use cases • enables Dev/Test apps to run safely on a Production cluster
  • 49. Prior Practice: Dedicated Servers • low utilization rates • longer time to ramp up new services DATACENTER
  • 50. Prior Practice: Virtualization DATACENTER PROVISIONED VMS • even more machines to manage • substantial performance decrease 
 due to virtualization • VM licensing costs
  • 51. Prior Practice: Static Partitioning STATIC PARTITIONING • even more machines to manage • substantial performance decrease 
 due to virtualization • VM licensing costs • failures make static partitioning 
 more complex to manage DATACENTER
  • 52. MESOS Mesos: One Large Pool of Resources “We wanted people to be able to program for the datacenter just like they program for their laptop.” Ben Hindman DATACENTER
  • 53. Fault-tolerant distributed systems… …written in 100-300 lines of C++, Java/Scala, Python, Go, etc. …building blocks, if you will. Q: required lines of network code? A: probably none
  • 54. Mesos – architecture [Diagram components: apps: HA services, web apps, batch jobs, scripts, etc. • APIs: C++, JVM, Py, Go • meta-frameworks: Aurora, Marathon • frameworks: Spark, Storm, MPI, Jenkins, etc. • task schedulers: Chronos, etc. • Mesos, distrib kernel • HDFS, distrib file system • Linux: libcgroup, libprocess, libev, etc.]
  • 55. Mesos – dynamics: Mesos (distrib kernel) • Marathon (distrib init.d) for HA services • Chronos (distrib cron) for scheduled apps • other distrib frameworks
  • 56. Mesos – dynamics [Diagram: Mesos masters (the distributed kernel) track the available resources on many Mesos slaves and send resource offers to a distributed framework, whose Scheduler runs tasks through Executors on the slaves]
  • 57. Example: Resource Offer in a Two-Level Scheduler mesos.apache.org/documentation/latest/mesos-architecture/
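A two-level scheduler splits the decision in two: the master decides which framework is offered which free resources, and each framework's scheduler decides which of its own tasks to launch on an offer, declining the rest. A minimal simulation in Python, assuming hypothetical `Offer`, `Framework`, and `Master` classes; this is not the Mesos scheduler API, it uses a naive "offer to frameworks in list order" policy in place of Mesos's pluggable allocation module, and it ignores failures and rescinded offers. The numbers (one slave with 4 CPUs and 4 GB free; tasks needing 2 CPUs / 1 GB and 1 CPU / 2 GB) are chosen to echo the walkthrough at the documentation URL above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Task = Tuple[str, float, float]  # (name, cpus, mem_gb)

@dataclass
class Offer:
    slave: str
    cpus: float
    mem_gb: float

@dataclass
class Framework:
    """Hypothetical framework scheduler holding its own queue of pending tasks."""
    name: str
    pending: List[Task]
    launched: List[Tuple[str, str]] = field(default_factory=list)  # (task, slave)

    def resource_offer(self, offer: Offer) -> List[Task]:
        """Second level: pick the pending tasks that fit; anything unused is declined."""
        cpus, mem = offer.cpus, offer.mem_gb
        accepted = []
        for task in list(self.pending):
            _, c, m = task
            if c <= cpus and m <= mem:
                accepted.append(task)
                self.pending.remove(task)
                cpus, mem = cpus - c, mem - m
        return accepted

class Master:
    """First level: decide which framework is offered which free resources."""
    def __init__(self, free: Dict[str, Tuple[float, float]]):
        self.free = dict(free)  # slave -> (free cpus, free mem_gb)

    def offer_round(self, frameworks: List[Framework]) -> None:
        for fw in frameworks:  # naive policy: frameworks in list order
            for slave, (cpus, mem) in list(self.free.items()):
                for name, c, m in fw.resource_offer(Offer(slave, cpus, mem)):
                    cpus, mem = cpus - c, mem - m
                    fw.launched.append((name, slave))
                self.free[slave] = (cpus, mem)  # leftovers can be offered to the next framework

master = Master({"slave1": (4.0, 4.0)})
fw1 = Framework("fw1", [("task1", 2.0, 1.0), ("task2", 1.0, 2.0)])
fw2 = Framework("fw2", [("task3", 1.0, 1.0)])
master.offer_round([fw1, fw2])
print(fw1.launched)  # [('task1', 'slave1'), ('task2', 'slave1')]
print(fw2.launched)  # [('task3', 'slave1')]
print(master.free)   # {'slave1': (0.0, 0.0)}
```

The design choice this illustrates: the master never needs to understand a framework's task constraints (data locality, gang scheduling, etc.); it only hands out offers, and each framework applies its own placement logic.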
  • 58. Frameworks Integrated with Mesos
 Continuous Integration: Jenkins, GitLab
 Big Data: Hadoop, Spark, Storm, Kafka, Hama
 Python workloads: DPark, Exelixi
 Meta-Frameworks / HA Services: Aurora, Marathon
 Orchestration: Singularity
 Distributed Cron: Chronos, JobServer
 Data Storage: ElasticSearch, Cassandra, Hypertable
 Containers: Docker, Deimos, GearD
 Parallel Processing: Chapel, MPI, Torque
  • 60. Quasar+Mesos @ Stanford, Twitter, etc.… Quasar: Resource-Efficient and QoS-Aware Cluster Management
 Christina Delimitrou, Christos Kozyrakis
 stanford.edu/~cdel/2014.asplos.quasar.pdf
  • 61. Quasar+Mesos @ Stanford, Twitter, etc.… Improving Resource Efficiency with Apache Mesos
 Christina Delimitrou
 youtu.be/YpmElyi94AA
  • 62. Quasar+Mesos @ Stanford, Twitter, etc.… Consider that for datacenter computing at scale, a surge in workloads implies: • large capex investment, long lead time to build • utilities cannot supply the power requirements. Even for large players that achieve 2x the typical industry datacenter utilization rates, those factors become show-stoppers. Even so, high rates of over-provisioning are typical, so there’s much room to improve. Experiences with Quasar+Mesos showed: • 88% of apps get >95% performance • ~10% overprovisioning instead of 500% • up to 70% cluster utilization at steady state • 23% shorter scenario completion
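Why overprovisioning matters so much can be shown with one line of arithmetic. The slide does not define the percentage, so this sketch assumes "X% overprovisioning" means each app reserves (1 + X/100) times the resources it actually uses; under that assumption, reservations alone cap cluster utilization at 1 / (1 + X/100).

```python
def max_utilization(overprovision_pct: float) -> float:
    """Upper bound on utilization if every app reserves (1 + pct/100) x what it uses.
    Assumes overprovisioning = (reserved - used) / used; the slide does not define it."""
    return 1.0 / (1.0 + overprovision_pct / 100.0)

for pct in (500, 10):
    print(f"{pct}% overprovisioning -> at most {max_utilization(pct):.0%} utilization")
# 500% overprovisioning -> at most 17% utilization
# 10% overprovisioning -> at most 91% utilization
```

Under this reading, the reported ~70% steady-state utilization sits comfortably below the ~91% ceiling implied by ~10% overprovisioning, and far above what 500% would allow.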
  • 65. Opposite Ends of the Spectrum, One Common Substrate [Diagram labels: Built-in / bare metal; Hypervisors; Solaris Zones; Linux CGroups]
  • 66. Opposite Ends of the Spectrum, One Common Substrate [Diagram labels: Request / Response; Batch]
  • 67. Case Study: Twitter (bare metal / on premise) “Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It's one of the primary keys to our data center efficiency.” Chris Fry, SVP Engineering blog.twitter.com/2013/mesos-graduates-from-apache-incubation wired.com/gadgetlab/2013/11/qa-with-chris-fry/ • key services run in production: analytics, typeahead, ads • Twitter engineers rely on Mesos to build all new services • instead of thinking about static machines, engineers think about resources like CPU, memory and disk • allows services to scale and leverage a shared pool of servers across datacenters efficiently • reduces the time between prototyping and launching
  • 68. Case Study: Airbnb (fungible cloud infrastructure) “We think we might be pushing data science in the field of travel more so than anyone has ever done before… a smaller number of engineers can have higher impact through automation on Mesos.” Mike Curtis, VP Engineering
 gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data... • improves resource management and efficiency • helps advance engineering strategy of building small teams that can move fast • key to letting engineers make the most of AWS-based infrastructure beyond just Hadoop • allowed company to migrate off Elastic MapReduce • enables use of Hadoop along with Chronos, Spark, Storm, etc.
  • 69. Case Study: eBay (continuous integration) eBay PaaS Team
 ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/ • cluster management (PaaS core framework services) for CI • integration of: OpenStack, Jenkins, Zookeeper, Mesos, Marathon, Ansible “In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master instance. This Jenkins instance runs within a dedicated VM, and over time the result has been VM sprawl and poor resource utilization. We started looking at solutions to maximize our resource utilization and reduce the VM footprint while still preserving the individual CI instance model. After much deliberation, we chose Apache Mesos for a POC. This post shares the journey of how we approached this challenge and accomplished our goal.”
  • 70. Case Study: HubSpot (cluster management) Tom Petr
 youtu.be/ROn14csiikw mesosphere.io/resources/mesos-case-study-hubspot/ • 500 deployable objects; 100 deploys/day to production; 90 engineers; 3 devops on Mesos cluster • “Our QA cluster is now a fixed $10K/month — that used to fluctuate”
  • 71. DIY
  • 83. Summary Question: Given the points about Part 3, Part 2, Part 1… Given the history from Church and Curry to BDAS and Twitter OSS… Given the needs, e.g., IoT preferably not boiling the oceans… Why do we still see proto-legacy systems like Tez? Or, for that matter, why do we find notable experts stating that “Hadoop is an OS”? It’s time to set the legacy of YHOO circa 2009 aside and step up to contemporary challenges with a better understanding of the underlying math and CS theory => solving business use cases at scale. To paraphrase author William Gibson, the future is already here – it’s just not very evenly distributed, nor is it google-able.
  • 86. monthly newsletter for updates, 
 events, conf summaries, etc.: liber118.com/pxn/ Enterprise Data Workflows with Cascading O’Reilly, 2013 shop.oreilly.com/product/0636920028536.do Just Enough Math O’Reilly, 2014 oreilly.com/go/enough_math/
 preview: youtu.be/TQ58cWgdCpA
  • 87. calendar:
 Spark Summit, SF, Jun 30 (15% code: Paco2014) spark-summit.org/2014
 OSCON 2014, PDX, Jul 20 (20% code: PACOID) oscon.com/oscon2014/
 #MesosCon, Chicago, Aug 21 events.linuxfoundation.org/events/mesoscon
 Strata NYC + Hadoop World, NYC, Oct 15 strataconf.com/stratany2014
 Data Day Texas, Austin, Jan 10 datadaytexas.com