Scalable Eventing Over Apache Mesos

Presentation given at Big Data Montreal #40, focusing on our work with Apache Mesos, Docker and Ochopod.

Published in: Engineering

  1. © 2015 Autodesk. Scalable Eventing Over Mesos. Olivier Paugam, SW Architect / Autodesk Cloud. Big Data Montreal.
  2. Goals & Challenges
  3. The Mission
       • General purpose, high-volume eventing system.
       • Batch oriented I/O.
       • Target audience: 20+ teams within Autodesk.
       • Must be active/active across multiple data-centers.
       • Must be able to scale at any time.
       • Must be able to absorb traffic spikes.
       • Must be accessible via a single API.
       • Must be secure (transport + data at rest).
       • Must not be tied to a specific provider.
  4. A Few Use Cases
       • Application log pre-aggregation transport.
       • Metering updates from our Platform API.
       • Analytics transport prior to indexing.
       • Event transport for Search, Activity & other services.
       • Identity updates down to our IT systems.
       • Editing increments for large 3D model collaboration.
  5. Our 5 Technical Commandments
       • Must use Docker.
       • Must run on Apache Mesos + Marathon.
       • Must leverage Apache Kafka.
       • Must be as autonomous & low-maintenance as possible.
       • No automation scripting allowed (Chef, Salt, Ansible…).
  6. Introducing Ochopod
  7. Ochopod
       • 100% Open Source!
       • Novel container-centric orchestration model.
       • Mix between a discovery & an init system.
       • No need for dedicated frameworks.
       • Direct Peer To Peer HTTP I/O.
       • Can run on Mesos, K8S, etc.
       • Relies on ZK.
  8. The Stack
  9. How Does It Work?
       • Source of truth: Zookeeper.
       • Each container belongs to a “cluster”.
       • A “leader” is picked per cluster.
       • Leaders manage their peers via HTTP I/O.
       • Settings passed via environment variables.
       • Eventually consistent.
  10. A Quick DIY Mini-PaaS
       • Proxy approach.
       • 100% Mesos + Ochopod.
       • Used for CI/CD as well.
       • Proxy running on an edge node.
       • Could easily factor OAuth2 in.
       • Access via direct HTTPS or using a CLI.
       • Toolkit to deploy, list, query, kill & update containers.
  11. Building verticals at scale
  12. Architecture
  13. Phone Switch & State Machines
  14. Going Global
  15. Shooting For Higher Scales
       • Unit of scale == 1 Kafka topic.
       • Keep the pressure on each broker constant.
       • Every sub-system can be scaled independently.
       • API protocol designed to account for nodes shutting down.
       • Mix of horizontal scaling & sharding via RabbitMQ.
       • Checkpoints + idempotency + state-machines.
       • Ochopod is critical to enable scaling.
  16. Conclusion
  17.
       • 6 man-month effort.
       • 6 open-source third-party components (Kafka, Zookeeper, RabbitMQ...).
       • 3 deployments over 2 data-centers, using DCOS.
       • 36+ c3.2xlarge CoreOS slaves on AWS/EC2 + VPC.
       • ~20 Kafka brokers, ~40 Play! nodes.
       • ~150 live containers.
       • ~500 live streaming sessions at any time.
       • ~30M events / ~65M API hits a day.
       • < 5 minor incidents, no major incident to date.
       • 1 single dev/op (!).
  18. Issues & Next Steps
       • What does one do if a slave goes offline?
       • Need for better placement constraints.
       • Need for better storage schemes.
       • The K8S “pod” concept is cool after all...
       • We could invest in a dedicated Mesos framework.
       • What about Spot instances?
  19. https://github.com/autodesk-cloud/ochopod
  20. Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2015 Autodesk
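
Slide 9 describes the orchestration model only in outline: a shared source of truth, containers grouped into clusters, one leader per cluster, and settings passed through the environment. The sketch below is a rough in-memory illustration of that idea, not Ochopod's actual API. The `Registry` class stands in for Zookeeper, the "lowest sequence number wins" rule is an assumption borrowed from the common Zookeeper election recipe, and `DEMO_CLUSTER` is a hypothetical variable name.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Registry:
    """In-memory stand-in for the source of truth (Zookeeper in the talk)."""
    clusters: dict = field(default_factory=dict)  # cluster -> {sequence: container id}
    counter: int = 0

    def register(self, cluster: str, container_id: str) -> int:
        """Record a container under its cluster and return its sequence number."""
        self.counter += 1
        self.clusters.setdefault(cluster, {})[self.counter] = container_id
        return self.counter

    def unregister(self, cluster: str, sequence: int) -> None:
        """Forget a container, e.g. after it shuts down."""
        self.clusters.get(cluster, {}).pop(sequence, None)

    def leader(self, cluster: str):
        """Assumed rule: the member with the lowest sequence number leads."""
        members = self.clusters.get(cluster, {})
        return members[min(members)] if members else None

    def peers(self, cluster: str) -> list:
        """All members of a cluster, oldest first."""
        members = self.clusters.get(cluster, {})
        return [members[seq] for seq in sorted(members)]


def cluster_from_env(default: str = "default") -> str:
    """Read the cluster name from the environment (hypothetical variable)."""
    return os.environ.get("DEMO_CLUSTER", default)


registry = Registry()
first = registry.register("kafka", "container-a")
registry.register("kafka", "container-b")
registry.register("web", "container-c")

print(registry.leader("kafka"))   # container-a
registry.unregister("kafka", first)
print(registry.leader("kafka"))   # container-b
```

When the leading container goes away, the next oldest member takes over, which is one simple way to get the eventually consistent behaviour the slide mentions.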

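Slide 15 lists "checkpoints + idempotency" as one of the ingredients for surviving nodes shutting down. The talk does not show how this is implemented, so the following is only a minimal sketch of the general technique: a consumer records the offset of the last event it applied and skips anything it has already seen when events are re-delivered. All names here (`IdempotentConsumer`, `replay`, the toy running total) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each event at most once and remembers how far it got."""
    checkpoint: int = -1                        # offset of the last applied event
    seen_ids: set = field(default_factory=set)  # ids of events already applied
    total: int = 0                              # toy state: running sum of payloads

    def handle(self, offset: int, event_id: str, payload: int) -> bool:
        """Apply one event; return False if it was a replay and got skipped."""
        if offset <= self.checkpoint or event_id in self.seen_ids:
            return False
        self.total += payload
        self.seen_ids.add(event_id)
        self.checkpoint = offset
        return True


def replay(consumer: IdempotentConsumer, log: list) -> int:
    """Re-deliver the whole log and count how many events were really applied."""
    applied = 0
    for offset, (event_id, payload) in enumerate(log):
        if consumer.handle(offset, event_id, payload):
            applied += 1
    return applied


log = [("e1", 10), ("e2", 5), ("e3", 7)]

consumer = IdempotentConsumer()
consumer.handle(0, "e1", 10)
consumer.handle(1, "e2", 5)

# Simulated restart: the full log is delivered again from the beginning.
applied = replay(consumer, log)
print(applied, consumer.total, consumer.checkpoint)  # 1 22 2
```

Only the third event is applied on replay, so re-delivery after a restart does not double-count anything.
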
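
The daily figures on slide 17 are easier to picture as per-second averages. The short calculation below uses only the numbers quoted on that slide. It assumes traffic is spread evenly over the day and evenly across brokers, which real traffic will not be, and treats the "~" and "36+" values as exact, so the results are rough orders of magnitude at best.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on slide 17 (approximate in the original).
events_per_day = 30_000_000
api_hits_per_day = 65_000_000
live_containers = 150
slaves = 36
kafka_brokers = 20


def per_second(daily_count: int) -> float:
    """Average rate per second, assuming a perfectly even spread over the day."""
    return daily_count / SECONDS_PER_DAY


print(f"events/s            ~ {per_second(events_per_day):.0f}")                  # 347
print(f"API hits/s          ~ {per_second(api_hits_per_day):.0f}")                # 752
print(f"containers per slave ~ {live_containers / slaves:.1f}")                   # 4.2
print(f"events/s per broker ~ {per_second(events_per_day) / kafka_brokers:.1f}")  # 17.4
```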