Presentation for MesosCon 2015, Seattle, WA
http://sched.co/3BZR

Mesos at OpenTable

  1. Mesos at OpenTable: Pablo Delgado, Senior Data Engineer, OpenTable (@pablete). MesosCon 2015, Seattle, WA
  2. OpenTable, the world’s leading provider of online restaurant reservations
     • Over 32,000 restaurants worldwide
     • More than 760 million diners seated since 1998, representing more than $30 billion spent at partner restaurants
     • Over 16 million diners seated every month
     • OpenTable has seated over 190 million diners via a mobile device; almost 50% of our reservations are made via a mobile device
     • OpenTable currently has a presence in the US, Canada, Mexico, the UK, Germany and Japan
     • OpenTable has nearly 600 partners, including Facebook, Google, TripAdvisor, Urbanspoon, Yahoo and Zagat
  3. At OpenTable we aim to power the best dining experiences!
  4. Service-Oriented Architecture
  5. From monolith to microservices
  6. Apache Mesos
     • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
       PAPER: http://mesos.berkeley.edu/mesos_tech_report.pdf
     • Omega: flexible, scalable schedulers for large compute clusters
       PAPER: http://research.google.com/pubs/pub41684.html
  7. Apache Mesos
     • Mesos slaves connect to masters and offer resources like CPU, disk, and memory.
     • Masters take those offers and decide which registered framework, such as Singularity, to offer the resources to.
     • Frameworks in turn choose which resource offers to use, and run tasks on the slaves.
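The offer cycle above can be sketched as a toy model in Scala. Everything here is an assumption for illustration. The case classes are not the Mesos API, and the framework uses a simple greedy first-fit policy. Real frameworks such as Singularity apply their own placement rules.

```scala
// Toy model of Mesos two-level scheduling; these are NOT Mesos API types.
final case class Resources(cpus: Double, memMb: Int) {
  def fits(need: Resources): Boolean = need.cpus <= cpus && need.memMb <= memMb
  def minus(used: Resources): Resources = Resources(cpus - used.cpus, memMb - used.memMb)
}
final case class Offer(slaveId: String, resources: Resources) // what the master hands a framework
final case class TaskRequest(name: String, needs: Resources)  // work the framework wants to run
final case class Launch(task: String, slaveId: String)        // framework's reply: run task on slave

// Framework side (hypothetical greedy policy): walk the offers in order and pack
// pending tasks into each one. Leftover resources in an offer are simply not used,
// i.e. they go back to the master as declined.
def acceptOffers(offers: Seq[Offer], pending: Seq[TaskRequest]): (Seq[Launch], Seq[TaskRequest]) =
  offers.foldLeft((Vector.empty[Launch], pending)) { case ((launched, waiting), offer) =>
    val (_, taken, skipped) =
      waiting.foldLeft((offer.resources, Vector.empty[TaskRequest], Vector.empty[TaskRequest])) {
        case ((free, acc, rest), task) =>
          if (free.fits(task.needs)) (free.minus(task.needs), acc :+ task, rest)
          else (free, acc, rest :+ task)
      }
    (launched ++ taken.map(t => Launch(t.name, offer.slaveId)), skipped)
  }
```

Tasks that fit in no current offer stay pending until the master sends new offers.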
  8. Apache Mesos deployment across availability zones 2a, 2b and 2c (diagram)
     • Three Zookeeper nodes, each managed by Netflix’s Exhibitor
     • One active Mesos master and two standby masters
     • Six Mesos slaves, each running Docker
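This layout uses three Zookeeper nodes and three masters. The sketch below shows the reasoning behind that choice. It assumes the usual majority-quorum rule and the common Zookeeper leader-election recipe: each candidate creates an ephemeral sequential znode, and the lowest sequence number wins. The master names and sequence numbers are hypothetical.

```scala
// Majority quorum arithmetic for a Zookeeper ensemble of n servers.
def quorumSize(n: Int): Int = n / 2 + 1
// How many servers may fail while a majority is still reachable.
def toleratedFailures(n: Int): Int = n - quorumSize(n)

// Simplified leader-election recipe (an assumption, not Mesos source code):
// every master registers an ephemeral sequential znode; the lowest live sequence
// number is the leader. When the leader's session dies, its znode disappears
// and the next-lowest standby takes over.
def electLeader(znodes: Map[String, Long]): Option[String] =
  if (znodes.isEmpty) None else Some(znodes.minBy(_._2)._1)
```

A fourth Zookeeper node would not help here, since four nodes still tolerate only one failure.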
  9. HubSpot’s Singularity Scheduler
  10. Singularity Features
      • Native Docker support
      • JSON REST API and Java client
      • Fully featured web application (replaces and improves on the Mesos master UI)
      • Deployments, automatic rollbacks, and health checks
      • Configurable email alerts to service owners
  11. HubSpot’s Singularity
      Process types:
      • Web services
      • Workers
      • Scheduled (CRON-type) jobs
      • On-demand processes
      Slave placement:
      • GREEDY
      • SEPARATE_BY_DEPLOY
      • SEPARATE_BY_REQUEST
      • OPTIMISTIC
      Executors:
      • Mesos executor
      • Singularity executor
      • Docker executor
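The four slave-placement modes can be read as constraints on which slave a new task may land on. The sketch below is our simplified interpretation, not Singularity's code. In particular, Singularity's real OPTIMISTIC logic is more involved than the hypothetical rule used here, which is "prefer a slave not yet running this request, else accept any".

```scala
// Simplified, hypothetical model of Singularity's slave-placement modes.
sealed trait SlavePlacement
case object Greedy extends SlavePlacement
case object SeparateByDeploy extends SlavePlacement
case object SeparateByRequest extends SlavePlacement
case object Optimistic extends SlavePlacement

final case class RunningTask(requestId: String, deployId: String)

// Hard constraint: may a task of (requestId, deployId) join the tasks already on a slave?
def allowed(p: SlavePlacement, requestId: String, deployId: String, onSlave: Seq[RunningTask]): Boolean =
  p match {
    case Greedy | Optimistic => true
    case SeparateByDeploy    => !onSlave.exists(t => t.requestId == requestId && t.deployId == deployId)
    case SeparateByRequest   => !onSlave.exists(_.requestId == requestId)
  }

// Pick a slave (sorted by id for determinism). OPTIMISTIC only *prefers* spreading.
def chooseSlave(p: SlavePlacement, requestId: String, deployId: String,
                slaves: Map[String, Seq[RunningTask]]): Option[String] = {
  val ok = slaves.toSeq.sortBy(_._1).filter { case (_, ts) => allowed(p, requestId, deployId, ts) }
  val preferred =
    if (p == Optimistic) ok.filterNot { case (_, ts) => ts.exists(_.requestId == requestId) }
    else Seq.empty
  (if (preferred.nonEmpty) preferred else ok).headOption.map(_._1)
}
```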
  12. Linux Containers
  13. Docker
      • Immutability
      • Portability
      • Isolation
  14. Service Discovery
  15. Service Discovery
      Services no longer live at a well-known address/port, so we needed a registry or some other dynamic way to find them. It also had to be Mesos-agnostic.
      • Services announce their presence to the Discovery Server
      • Services subscribe to changes in their dependencies’ announcements
      • Services unannounce on termination, or time out after a crash
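Here is a minimal in-memory sketch of the announce, subscribe and unannounce contract. It assumes that crashed instances are removed through lease expiry. The `Registry` class, its method names and the lease length are hypothetical. This is not the OpenTable Discovery Server API.

```scala
import scala.collection.mutable

// Hypothetical in-memory registry; times are plain millisecond counters.
final case class Announcement(serviceType: String, uri: String, expiresAt: Long)

final class Registry(leaseMillis: Long) {
  private val byId = mutable.Map.empty[String, Announcement]
  private val subscribers = mutable.Map.empty[String, List[Seq[String] => Unit]]

  // Announcing (or re-announcing, to renew the lease) makes an instance discoverable.
  def announce(id: String, serviceType: String, uri: String, now: Long): Unit = {
    byId(id) = Announcement(serviceType, uri, now + leaseMillis)
    publish(serviceType, now)
  }

  // Clean shutdown: remove the instance right away.
  def unannounce(id: String, now: Long): Unit =
    byId.remove(id).foreach(a => publish(a.serviceType, now))

  // Crash: the instance stops renewing, its lease runs out and it is dropped.
  def expire(now: Long): Unit = {
    val dead = byId.toList.collect { case (id, a) if a.expiresAt <= now => (id, a.serviceType) }
    dead.foreach { case (id, _) => byId.remove(id) }
    dead.map(_._2).distinct.foreach(t => publish(t, now))
  }

  def lookup(serviceType: String, now: Long): Seq[String] =
    byId.values.filter(a => a.serviceType == serviceType && a.expiresAt > now)
      .map(_.uri).toSeq.sorted

  // Dependents register a callback that receives the current instance list on every change.
  def subscribe(serviceType: String)(callback: Seq[String] => Unit): Unit =
    subscribers(serviceType) = callback :: subscribers.getOrElse(serviceType, Nil)

  private def publish(serviceType: String, now: Long): Unit =
    subscribers.getOrElse(serviceType, Nil).foreach(cb => cb(lookup(serviceType, now)))
}
```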
  16. Service Discovery (diagram)
      • One Zookeeper node and one Discovery Server in each of availability zones 2a, 2b and 2c
      • Instances of service A announce themselves; service B discovers them and subscribes to changes
  17. Service Discovery API
  18. FrontDoor
  19. FrontDoor
      • Routes external traffic to internal services
      • Simple discovery-aware proxy
      • Dynamic configuration
      • Developer-friendly configuration via a Git repo, e.g.:
        REQUEST_URI=/api/timezone* passthru timezone
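One plausible reading of the FrontDoor rule `REQUEST_URI=/api/timezone* passthru timezone` is sketched below. The request URI is matched against the pattern, with a trailing `*` treated as a prefix wildcard. The request is then proxied to an instance of the named service found through discovery. The rule grammar, the `passthru` semantics and the `discover` function are all assumptions.

```scala
// Hypothetical model of one FrontDoor routing rule.
final case class Rule(pattern: String, action: String, service: String) {
  // Assumption: a trailing `*` is the only wildcard and means "prefix match".
  def matches(uri: String): Boolean =
    if (pattern.endsWith("*")) uri.startsWith(pattern.dropRight(1)) else uri == pattern
}

// Parse "REQUEST_URI=<pattern> <action> <service>"; anything else is rejected.
def parseRule(line: String): Option[Rule] = line.trim.split("\\s+") match {
  case Array(cond, action, service) if cond.startsWith("REQUEST_URI=") =>
    Some(Rule(cond.stripPrefix("REQUEST_URI="), action, service))
  case _ => None
}

// First matching passthru rule wins; its service name is resolved via discovery.
def route(rules: Seq[Rule], uri: String, discover: String => Seq[String]): Option[String] =
  rules.find(_.matches(uri))
    .filter(_.action == "passthru")
    .flatMap(r => discover(r.service).headOption)
    .map(base => base + uri)
```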
  20. Monitoring
  21. Monitoring: https://github.com/opentable/mesos_stats
      • Finds your service name by parsing the task names
      • Includes a Grafana dashboard
      • Runs inside Mesos
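Extracting the service name from a task name and rolling samples up per service can be sketched as below. The slide does not give the task-name layout. The layout used here is a hypothetical assumption, and so is the Graphite-style metric path.

```scala
// Hypothetical task-name layout (an assumption for this sketch):
//   <service>-<deployId>-<startedAtMillis>-<instanceNo>-<host>-<rack>
// Only <service> may contain dashes, so we strip the five trailing fields.
def serviceName(taskName: String): Option[String] = {
  val parts = taskName.split('-')
  if (parts.length >= 6) Some(parts.dropRight(5).mkString("-")) else None
}

// Roll per-task CPU samples up into per-service points (hypothetical metric path).
def perService(samples: Seq[(String, Double)]): Map[String, Double] =
  samples
    .flatMap { case (task, cpus) => serviceName(task).map(_ -> cpus) }
    .groupBy(_._1)
    .map { case (svc, xs) => s"mesos.$svc.cpus_used" -> xs.map(_._2).sum }
```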
  22. All together
  23. Overview (diagram): GitHub, Continuous Integration, Docker Registry, Singularity, three Mesos masters each paired with Zookeeper, Discovery, six Mesos slaves running Docker, and FrontDoor
  24. Developer’s Concerns (GitHub, Continuous Integration, Docker Registry, Singularity)
      • Initialize projects with a continuous integration template
      • Enable monitoring/logging of application-level errors
      • Build the project as an immutable Docker image
      • Deploy to Mesos through Singularity using a REST API
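The last step, in which a CI job asks Singularity to deploy an image, can be sketched with the JDK 11 `java.net.http` client. The base URL, request ID, deploy ID, image name and resource numbers are hypothetical. The JSON field names follow our reading of Singularity's deploy API; verify them against the Singularity version you run.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical CI step: deploy an immutable Docker image through Singularity's REST API.
// Values are NOT JSON-escaped, which is acceptable only for simple IDs and image names.
def deployBody(requestId: String, deployId: String, image: String,
               cpus: Double, memoryMb: Int): String =
  s"""{
     |  "deploy": {
     |    "requestId": "$requestId",
     |    "id": "$deployId",
     |    "containerInfo": {"type": "DOCKER", "docker": {"image": "$image"}},
     |    "resources": {"cpus": $cpus, "memoryMb": $memoryMb}
     |  }
     |}""".stripMargin

// Build the POST; `baseUrl` (e.g. including a /singularity context path) is an assumption.
def deployRequest(baseUrl: String, body: String): HttpRequest =
  HttpRequest.newBuilder(URI.create(s"$baseUrl/api/deploys"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build()

// Send it and return the HTTP status code (requires a reachable Singularity server).
def send(request: HttpRequest): Int =
  HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).statusCode()
```

A new deploy ID per build keeps each image tag tied to exactly one deploy, which fits the immutability goal.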
  25. Operational Concerns (Mesos masters and slaves, Zookeeper, Discovery, FrontDoor, Docker Registry, Singularity)
      • Provide Mesos with resources
      • Monitor and maintain external traffic routing
      • Monitor and replace failing resources
  26. Stateless Simplicity: a stateless Mesos cluster, with state kept in
      • Datastores: MySQL, PostgreSQL, MongoDB
      • Caches: Redis, Memcached
      • Other: Zookeeper, Amazon S3
  27. Deployment environments (diagram): US Data Center, EU Data Center, AWS us-west-2, AWS eu-west-1 and AWS us-west-2, hosting PROD (shown four times), QA and DATA PROCESSING
  28. The same environments with Kafka added (Kafka appears five times in the diagram)
  29. Data Processing
  30. Distributed Multitenant Data Processing
  31. Spark’s Approach
      • Generalize MapReduce in order to support new apps in the same engine
      • General DAGs and data sharing
      • Unification benefits both the engine, which becomes more efficient, and the user, for whom it is simpler
      • Handles batch, interactive and online processing
      • APIs available for Java, Scala, Python, SQL and R
  32. Spark RDDs
      Resilient Distributed Datasets (RDDs) are fault-tolerant distributed collections. They exist in the form of:
      • Parallelized collections
      • External datasets: distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc.
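The sketch below shows how a parallelized collection is split into partitions, as a dependency-free function. It mirrors the index arithmetic we understand Spark's `ParallelCollectionRDD` to use, where partition `i` of `k` covers indices `[i*n/k, (i+1)*n/k)`. It is not Spark code.

```scala
// Split a local collection into numSlices partitions, the way we understand
// sc.parallelize does: partition i gets indices [i*n/k, (i+1)*n/k).
def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
  require(numSlices >= 1, "numSlices must be positive")
  val length = data.length.toLong // Long avoids overflow in i * length
  (0 until numSlices).map { i =>
    val start = ((i * length) / numSlices).toInt
    val end = (((i + 1) * length) / numSlices).toInt
    data.slice(start, end)
  }
}
```

With this arithmetic, partition sizes differ by at most one. Some partitions are empty when there are more slices than elements.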
  33. RDD Graph (diagram)
      • Dataset-level view: file = HadoopRDD(path = hdfs://...); errors = FilteredRDD(func = _.contains(…), shouldCache = true), derived from file
      • Partition-level view: each partition of the file and errors RDDs is processed by its own task (Task 1, Task 2, Task 3, …, Task n)
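A toy model can illustrate the two views in the diagram: one logical dataset, and one task per partition. It also shows the lineage-based recovery that puts the "resilient" in RDD. The classes below only borrow the slide's names and are not Spark's implementation.

```scala
import scala.collection.mutable

// Toy lineage graph (NOT Spark's classes): a source RDD and a filtered, cached RDD.
sealed trait ToyRDD {
  def numPartitions: Int
  def compute(p: Int): Seq[String] // what one task does for partition p
}

final class SourceRDD(partitions: Vector[Seq[String]]) extends ToyRDD {
  val numPartitions: Int = partitions.length
  def compute(p: Int): Seq[String] = partitions(p)
}

final class FilteredRDD(parent: ToyRDD, f: String => Boolean, shouldCache: Boolean) extends ToyRDD {
  val numPartitions: Int = parent.numPartitions
  val cache = mutable.Map.empty[Int, Seq[String]]
  var computeCount = 0 // how many partition tasks actually ran the filter

  // Serve from cache if present; otherwise recompute this partition from its lineage.
  def compute(p: Int): Seq[String] = cache.get(p) match {
    case Some(rows) => rows
    case None =>
      computeCount += 1
      val rows = parent.compute(p).filter(f)
      if (shouldCache) cache(p) = rows
      rows
  }
}

// One "task" per partition; results gathered in partition order.
def collect(rdd: ToyRDD): Seq[String] = (0 until rdd.numPartitions).flatMap(rdd.compute)
```

Losing a cached partition costs only a recomputation of that partition, not of the whole dataset.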
  34. Scheduling Process: lifetime of a job
      rdd1.join(rdd2).groupBy(…).filter(…)
      • RDD Objects: build the operator DAG and hand it to the DAGScheduler
      • DAGScheduler: split the graph into stages of tasks and submit each stage as it becomes ready, as a TaskSet (agnostic to operators)
      • TaskScheduler: launch tasks via the cluster manager and retry failed or straggling tasks; a failed stage is reported back to the DAGScheduler (doesn’t know about stages)
      • Worker: execute tasks in threads; store and serve blocks through the block manager
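The DAGScheduler step can be sketched as a graph walk. Narrow dependencies are pipelined into one stage, and every shuffle dependency closes off a parent stage. This is a simplified, hypothetical model of the idea, not Spark's code. The example assumes both sides of the `join` need a shuffle; in Spark, a join of co-partitioned inputs can be narrow.

```scala
import scala.collection.mutable

// Toy operator graph (hypothetical names, not Spark classes).
sealed trait Dep { def parent: Node }
final case class Narrow(parent: Node) extends Dep  // e.g. map, filter: pipelined
final case class Shuffle(parent: Node) extends Dep // e.g. groupBy, most joins: stage boundary
final case class Node(name: String, deps: Seq[Dep] = Nil)

final case class Stage(id: Int, nodes: Seq[String], parents: Seq[Int])

// Cut the graph ending at `finalNode` into stages. Parent stages get lower ids,
// matching the order in which they must run.
def buildStages(finalNode: Node): Seq[Stage] = {
  val stages = mutable.ArrayBuffer.empty[Stage]
  val stageFor = mutable.Map.empty[String, Int] // shuffle-boundary node -> its stage id

  def newStage(root: Node): Int = {
    var members = Vector.empty[String]
    var parentIds = Vector.empty[Int]
    def visit(n: Node): Unit =
      if (!members.contains(n.name)) {
        members :+= n.name
        n.deps.foreach {
          case Narrow(p) => visit(p) // same stage
          case Shuffle(p) =>         // new (or already built) parent stage
            val id = stageFor.get(p.name) match {
              case Some(existing) => existing
              case None =>
                val created = newStage(p)
                stageFor(p.name) = created
                created
            }
            if (!parentIds.contains(id)) parentIds :+= id
        }
      }
    visit(root)
    stages += Stage(stages.length, members, parentIds)
    stages.length - 1
  }

  newStage(finalNode)
  stages.toSeq
}
```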
  38. Alternating Least Squares (ALS) in MLlib
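Below is a minimal illustration of alternating least squares, reduced to rank 1 so that each update is a scalar closed form. With item factors fixed, x_u = Σ r_ui·y_i / (Σ y_i² + λ), and the item update is symmetric. Each half-step exactly minimizes a convex subproblem, so the regularized loss never increases. This is a toy, not MLlib's implementation. MLlib works with rank-k factor blocks and scales λ by each user's or item's number of ratings.

```scala
// Rank-1 ALS on explicit ratings (toy; not MLlib code).
final case class Rating(user: Int, item: Int, value: Double)

// Regularized squared error over the observed ratings.
def loss(rs: Seq[Rating], x: Map[Int, Double], y: Map[Int, Double], lambda: Double): Double =
  rs.map(r => math.pow(r.value - x(r.user) * y(r.item), 2)).sum +
    lambda * (x.values.map(v => v * v).sum + y.values.map(v => v * v).sum)

// Re-solve every factor on one side, holding the other side's factors fixed.
def solve(rs: Seq[Rating], key: Rating => Int, otherKey: Rating => Int,
          other: Map[Int, Double], lambda: Double): Map[Int, Double] =
  rs.groupBy(key).map { case (k, group) =>
    val num = group.map(r => r.value * other(otherKey(r))).sum
    val den = group.map(r => other(otherKey(r)) * other(otherKey(r))).sum + lambda
    k -> num / den
  }

def als(rs: Seq[Rating], lambda: Double, iterations: Int): (Map[Int, Double], Map[Int, Double], Seq[Double]) = {
  var y = rs.map(r => r.item -> 1.0).toMap // start all item factors at 1
  var x = solve(rs, _.user, _.item, y, lambda)
  val history = scala.collection.mutable.ArrayBuffer(loss(rs, x, y, lambda))
  for (_ <- 1 to iterations) {
    y = solve(rs, _.item, _.user, x, lambda) // fix users, solve items
    x = solve(rs, _.user, _.item, y, lambda) // fix items, solve users
    history += loss(rs, x, y, lambda)
  }
  (x, y, history.toSeq)
}
```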
  39. Running Spark (diagram): a driver program holding the SparkContext works with a cluster manager; each worker node runs an executor with tasks and a cache
  40. Mesos Coarse Grained Framework (diagram): driver program (SparkContext); the Mesos master acts as the cluster manager; on each worker node a Mesos executor hosts a Spark executor that runs several tasks and keeps a cache
  41. Mesos Fine Grained Framework (diagram): driver program (SparkContext); the Mesos master acts as the cluster manager; on each worker node the Mesos executor runs every task with its own executor
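The practical difference between the two modes is how long CPU is held. The back-of-the-envelope model below rests on two assumptions. Coarse-grained executors keep their cores, up to a cap like `spark.cores.max`, for the whole application. Fine-grained Mesos executors hold only a fixed per-executor overhead plus one slot per running task. All numbers are hypothetical.

```scala
// Back-of-the-envelope CPU accounting for one Spark application on Mesos.
// Assumptions (hypothetical numbers, simplified behavior):
//  - coarse-grained: executors take `coresMax` cores up front and keep them
//    for the application's lifetime, even while idle;
//  - fine-grained: each Spark task is its own Mesos task, so task cores are held
//    only while tasks run, plus a fixed overhead per long-lived Mesos executor.
def coarseCoresHeld(runningTasks: Seq[Int], coresMax: Int): Seq[Int] =
  runningTasks.map(_ => coresMax)

def fineCoresHeld(runningTasks: Seq[Int], cpusPerTask: Int,
                  executors: Int, executorOverheadCores: Int): Seq[Int] =
  runningTasks.map(n => n * cpusPerTask + executors * executorOverheadCores)

// Running-task counts sampled once per time step: busy phase, idle time, small tail.
val timeline = Seq(8, 8, 0, 0, 2)
val coarseTotal = coarseCoresHeld(timeline, coresMax = 8).sum
val fineTotal = fineCoresHeld(timeline, cpusPerTask = 1, executors = 2, executorOverheadCores = 1).sum
```

Fine-grained mode shares idle cores with other frameworks but pays a per-task launch cost. Coarse-grained mode keeps task latency low at the price of holding cores.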
  42. Pull Requests (maybe merged)
      • [SPARK-7373] Add docker support for launching drivers in mesos cluster mode
      • [SPARK-5338] Add cluster mode support for Mesos
      • [SPARK-5095] Support capping cores and launch multiple executors in coarse mode
      • [SPARK-6707] Mesos Scheduler should allow the user to specify constraints based on slave attributes
      • [SPARK-6287] Add dynamic allocation to the coarse-grained Mesos scheduler
  43. Ideal data processing stack
      • Memory-centric distributed storage system (cache)
      • Distributed file system
      • General engine for large-scale data processing
      • Kernel for the datacenter
  44. Other frameworks
      We are also using:
      • KAFKA on Mesos https://github.com/mesos/kafka
      • SAMZA on Mesos https://github.com/banno/samza-mesos
      • PHOENIX (secor on Mesos) https://github.com/stealthly/phoenix
      • CASSANDRA on Mesos https://github.com/mesosphere/cassandra-mesos
      We are considering:
      • CHRONOS https://github.com/mesos/chronos
      • MARATHON https://github.com/mesosphere/marathon
  45. Data pipeline (diagram): Kafka (user activity, backups); Query/Processing Layer with Spark SQL, Spark MLlib and Spark Streaming; ETL; JSON; Data Products
  46. ETL with Spark/Spark SQL
      Sample event:
      {"userId":"xxxxxxxx","event":"personalizer_search","query_longitude":-77.16816,"latitude":38.918159,"req_attribute_tag_ids":["pizza"],"req_geo_query":"Current Location","sort_by":"best","longitude":-77.168156,"query_latitude":38.91816,"req_forward_minutes":30,"req_party_size":2,"req_backward_minutes":30,"req_datetime":"2015-06-02T12:00","req_time":"12:00","res_num_results":784,"calculated_radius":5.466253405962307,"req_date":"2015-06-02"},"type":"track","messageId":"b4f2fafc-dd4a-45e3-99ed-4b83d1e42dcd","timestamp":"2015-06-02T10:02:34.323Z"}
  47. Matrix Factorization with Spark MLlib
      • Collaborative filtering
      • Topic modeling
      • Restaurant demand analysis
  48. Early explorations with Word2Vec: find synonyms for “Sushi”
      We use Apache Spark’s implementation of Word2Vec (skip-gram model).
      Results: nigiri, sashimi, gari, maki, roku, rolls, roll, godzilla, chirashi, robata, zushi, omakase, yellowtail, unagi, samba, toro, gyoza, aburi, spider, starburst, nakazawa, shabu, sasa, katana, sake, hapa, maguro, tsunami, raku, kappo, yasuda, otoro, seki, tamari, ra, teppanyaki, caterpillar, japan, shashimi, hamasaku
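A "find synonyms" query over word vectors ranks every other word by cosine similarity to the query word. Spark MLlib's `Word2VecModel.findSynonyms(word, num)` returns this kind of ranking. The sketch below reimplements the ranking over made-up 3-dimensional vectors. Real vectors come from training and have far more dimensions.

```scala
// Cosine similarity between two dense vectors (0.0 if either is all zeros).
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot = a.zip(b).map { case (u, v) => u * v }.sum
  val norms = math.sqrt(a.map(v => v * v).sum) * math.sqrt(b.map(v => v * v).sum)
  if (norms == 0.0) 0.0 else dot / norms
}

// Rank all other words by similarity to `word`, most similar first.
def findSynonyms(vectors: Map[String, Array[Double]], word: String, num: Int): Seq[(String, Double)] = {
  val query = vectors(word)
  vectors.toSeq
    .collect { case (w, v) if w != word => w -> cosine(query, v) }
    .sortBy(p => -p._2)
    .take(num)
}
```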
  49. A restaurant like your favorite one, but in a different city: find the “synonyms” of the restaurant in question, then filter by location!
      Example: starting from Akiko’s (San Francisco), a downtown upscale sushi experience with a sushi bar, the slide shows Sushi of Gari and Gari Columbus (NYC), Masaki Sushi (Chicago) and Sansei Seafood Restaurant & Sushi Bar (Maui)
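The same idea applied to restaurants: rank by similarity to the favorite, then filter by city. Only the restaurant names and cities come from the slide. The embedding vectors are invented, and "NY pizza place" is a hypothetical entry.

```scala
// Hypothetical restaurant embeddings; vectors are made up for illustration.
final case class Restaurant(name: String, city: String, vector: Array[Double])

def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot = a.zip(b).map { case (u, v) => u * v }.sum
  val norms = math.sqrt(a.map(v => v * v).sum) * math.sqrt(b.map(v => v * v).sum)
  if (norms == 0.0) 0.0 else dot / norms
}

// Step 1: rank everything by similarity to the favorite ("find synonyms").
// Step 2: keep only candidates in the target city, then take the top `num`.
def similarIn(all: Seq[Restaurant], favorite: Restaurant, city: String, num: Int): Seq[String] =
  all.sortBy(r => -cosine(favorite.vector, r.vector))
    .filter(r => r.city == city && r.name != favorite.name)
    .take(num)
    .map(_.name)
```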
  50. Keep in touch: @pablete
