Tales of Modern Systems: Distributed / Containerized Systems

manabiya.tech

Published in: Software

  1. 1. Tales of Modern Systems: Distributed / Containerized Systems MANABIYA.tech 2018/03/24 Satoshi Tagomori (@tagomoris) Treasure Data, Inc.
  2. 2. Satoshi Tagomori (@tagomoris) Fluentd, MessagePack-Ruby, Norikra, Woothee, ... Treasure Data, Inc.
  3. 3. Today's Session is ... 1. Patterns of Modern Systems (1.1 Containerized Systems, 1.2 Distributed Systems) 2. Tales of Distribution/Containerization 3. Storing Data in Modern Systems
  4. 4. Patterns of Modern Systems
  5. 5. "Modern System" ? • Many "Pattern"s in various white papers... • Distributed systems • Microservices • Containers • Serverless systems / FaaS • Common things: • Free from bare metal servers • Short process lifetime • Focusing on application deployment & Resource efficiency • Without permanent local filesystems as datastore
  7. 7. Distributed Systems • Long history of distributed systems ... • in papers, as proprietary systems, etc. • Recent middleware: • Distributed Storage: Cassandra, Hadoop HDFS, HBase, Riak, ... • Data Processing: Hadoop MapReduce, Spark, Presto, ... • Process Scheduler: Hadoop YARN, Mesos (, Kubernetes) • ( Service Discovery: Zookeeper, Etcd, Consul, ... ) • Distributing More and More: Stored data → Data processor → Processing resource
  8. 8. Containerized Systems • Long history of containers too ... • FreeBSD Jail, Solaris Containers, Linux Containers, etc. • Recent movement: Docker and Kubernetes • Docker: Dockerfile and diff-based image files • Kubernetes: Container orchestration • Containerize More and More: Development → Testing → Production → ...
  9. 9. Patterns of Modern Systems • Patterns: • General purpose services: Containerized applications • Data processing platforms / Storages: Distributed systems • Common things: • Free from bare metal servers • Short process lifetime • Focusing on application deployment & Resource efficiency • Without permanent local filesystems as datastore • From "Physical" infrastructure to "Logical" infrastructure on software
  10. 10. Patterns of Modern Systems • Patterns: • General purpose services: Containerized applications • Data processing platforms / Storages: Distributed systems • Common things: • Free from bare metal servers • Short process lifetime • Focusing on application deployment & Resource efficiency • Without permanent local filesystems as datastore • From "Physical" infrastructure to "Logical" infrastructure on software • Questions: Where is the app running? Where can the app store data? Which process takes the lead? How many resources can the app use?
  11. 11. Big Modern Issues on Modern Systems • Storage management • Bringing configuration • State management • Resource assignment • Leader election
  12. 12. Tales of Distribution / Containerization
  13. 13. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era
  14. 14. The Beginning of Hadoop (2006) • Papers from Google about distributed systems • "The Google File System" (2003) • "MapReduce: Simplified Data Processing on Large Clusters" (2004) • Hadoop 0.1.0 on 2006-04-01 • with HDFS, MapReduce • Yahoo! deploys a 300-node cluster and sorts 1.8 TB on 188 nodes in 47.9 hours • Many applications in various companies, with side projects (Hive, Pig, HBase, ...) • Hadoop 1.0 on 2011-12-27
  15. 15. Hadoop 0.1.0 as a Distributed System • Compile an application to tasks of "map", "shuffle" and "reduce" • Tasks are executed on nodes separately, as JVM processes (containers) • Configurations are brought from master to slave at job deployment • Data is stored in HDFS • Nodes are listed in a text file • A single Master node manages everything • NameNode for HDFS • JobTracker for MapReduce • Annotations: Container separation; Non-local storage
  16. 16. Hadoop 0.1.0 as a Distributed System • Compile an application to tasks of "map", "shuffle" and "reduce" • Tasks are executed on nodes separately, as JVM processes (containers) • Configurations are brought from master to slave at job deployment • Data is stored in HDFS • Nodes are listed in a text file • A single Master node manages everything • NameNode for HDFS • JobTracker for MapReduce • Annotations: Fixed configuration; To fixed nodes; Single Points of Failure!; Container separation; Non-local storage
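The "map", "shuffle" and "reduce" task types named on the Hadoop 0.1.0 slides can be sketched in a single process. This is only an illustration of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and everything that makes Hadoop distributed (task placement on nodes, HDFS, the JobTracker) is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order, in one process (no distribution)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster each `map_phase` call would run as a separate task near its input block, and the shuffle would move data between nodes; here the three phases are plain function calls.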
  17. 17. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Hadoop 0.1.0; Hadoop 2.2.0
  18. 18. Hadoop as a Distributed System Platform • MapReduce is NOT the only framework for large scale computation • DAG (directed acyclic graph) based processing (Spark, Tez) • Machine Learning • Distributed Database on DFS (HBase) • Stream processing (Storm) • Widely used middleware - More Stability & Scalability • Need to solve some serious issues • Non-flexible configurations • Fixed list of nodes • SPOFs
  19. 19. Hadoop 2.x as a Distributed System • Compile an application to "attempts" of a YARN application • Attempts are executed on nodes separately, as JVM processes (or Docker containers) • Configurations are brought from AppMaster to attempts at job deployment, or are fetched from Zookeeper • Data is stored in HDFS • Nodes can join clusters • NameNode HA for HDFS, ResourceManager HA for YARN, using QJM (Quorum Journal Manager) and Zookeeper • Annotation: Distributed Key-Value store for configuration, state management and leader election
  20. 20. Patterns of Distributed Systems • Processing: • distribute processing to nodes efficiently • Data: • store data on distributed filesystem / database • store/fetch states/configurations from distributed key-value store • Environment Separation: • run processing in containers • High Availability: • route to a server of HA nodes using leader election
  21. 21. Patterns of Distributed Systems • Processing: • distribute processing to nodes efficiently: YARN • Data: • store data on distributed filesystem / database: HDFS, HBase • store/fetch states/configurations from distributed key-value store: Zookeeper • Environment Separation: • run processing in containers: JVM processes • High Availability: • route to a server of HA nodes using leader election: Zookeeper
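The leader-election pattern on this slide can be sketched as a lease held in a key-value store: a node becomes leader by writing its name under a key with an expiry time, and another node can take over only after the lease expires. The sketch below is a minimal, assumption-laden illustration: `LeaseStore`, `Lease`, and `try_acquire` are hypothetical names, the store is a plain in-memory dict rather than Zookeeper, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str        # name of the node that currently leads
    expires_at: float  # time after which the lease is no longer valid


class LeaseStore:
    """Hypothetical in-memory stand-in for a distributed key-value store."""

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def try_acquire(self, key: str, node: str, now: float, ttl: float) -> bool:
        """Take or renew the lease if it is free, expired, or already ours."""
        current = self._leases.get(key)
        if current is None or current.expires_at <= now or current.holder == node:
            self._leases[key] = Lease(holder=node, expires_at=now + ttl)
            return True
        return False

    def leader(self, key: str, now: float) -> Optional[str]:
        """Return the current leader, or None if nobody holds a valid lease."""
        current = self._leases.get(key)
        if current is not None and current.expires_at > now:
            return current.holder
        return None
```

A real store has to make the check-and-write step atomic across machines and cope with clock differences; that coordination is what systems such as Zookeeper provide and what this single-process sketch omits.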
  22. 22. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Hadoop 0.1.0; Hadoop 2.2.0
  23. 23. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Hadoop 0.1.0; Hadoop 2.2.0; Docker first release
  24. 24. Docker as a Container Builder/Runtime • Docker runtime • Linux Containers at first, then libcontainer • Docker image • Very simple Dockerfile to build a Docker image • An image from some sparse files • DockerHub central repository - reusability / shared images • Became popular - for development / testing • Many issues in production: deployment, container configuration, resource management, storage, request routing, logging, ...
  25. 25. Docker as a Container Builder/Runtime • Docker runtime • Linux Containers at first, then libcontainer • Docker image • Very simple Dockerfile to build a Docker image • An image from some sparse files • DockerHub central repository - reusability / shared images • Became popular - for development / testing • Many issues in production: deployment, container configuration, resource management, storage, request routing, logging, ... • Annotation: Orchestration
  26. 26. Patterns of Containerized Systems • Processing: • distribute processing to nodes efficiently (Deployment, Resource Management) • Data: • store data on distributed filesystem / database (Storage) • store/fetch states/configurations from distributed key-value store (Configuration) • Environment Separation: • run processing in containers: Docker • High Availability: • route to a server of HA nodes using leader election (Request Routing)
  27. 27. Many Containers Workload • Issues to be solved: • Free from bare metal servers • Short process lifetime • Focusing on application deployment & Resource efficiency • Without permanent local filesystems as datastore • Almost the same issues as Distributed Systems! • We need: • Resource Manager (Resource Scheduler) • Distributed Key-Value Storage • Distributed Filesystem / Database • Leader Election + Request Routing
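The "Resource Manager (Resource Scheduler)" item can be illustrated with the simplest possible placement policy, first-fit: walk the node list and put the task on the first node with enough free CPU and memory. This is a toy sketch, not how YARN or Kubernetes schedule; `Node`, `schedule`, and the resource fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float      # unallocated CPU cores
    free_mem_mb: int     # unallocated memory in MB
    tasks: List[str] = field(default_factory=list)


def schedule(nodes: List[Node], task: str, cpu: float, mem_mb: int) -> Optional[str]:
    """First-fit placement: return the chosen node name, or None if nothing fits."""
    for node in nodes:
        if node.free_cpu >= cpu and node.free_mem_mb >= mem_mb:
            # Reserve the resources so later tasks see the reduced capacity.
            node.free_cpu -= cpu
            node.free_mem_mb -= mem_mb
            node.tasks.append(task)
            return node.name
    return None
```

Production schedulers add much more on top of this idea (priorities, preemption, affinity, releasing resources when a task ends), but the core bookkeeping of "how much can the app use, and where" looks like this.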
  28. 28. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Docker first release
  29. 29. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Docker first release; Cloud Native Computing Foundation; Kubernetes
  30. 30. Patterns of Containerized Systems • Processing: • distribute processing to nodes efficiently: Kubernetes • Data: • store data on distributed filesystem / database (Storage) • store/fetch states/configurations from distributed key-value store: Etcd • Environment Separation: • run processing in containers: Docker • High Availability: • route to a server of HA nodes using leader election: Etcd + Kubernetes
  31. 31. Patterns of Containerized Systems • Processing: • distribute processing to nodes efficiently: Kubernetes • Data: • store data on distributed filesystem / database: RDBMS + Object Storage • store/fetch states/configurations from distributed key-value store: Etcd • Environment Separation: • run processing in containers: Docker • High Availability: • route to a server of HA nodes using leader election: Etcd + Kubernetes
  32. 32. Timeline (2003 to 2018; marked years: 2006, 2012, 2013, 2015, 2017) • Distributed Storage/Processing • Various Distributed Systems • Containerization / Orchestration • existing container era • Events: Docker first release; Cloud Native Computing Foundation; Kubernetes; Docker Swarm; Docker bundles Kubernetes
  33. 33. Storing Data in Modern Systems
  34. 34. Data Stored • Mind the type of data • Configurations? States of processes/nodes? • Permanent data for applications? • Mind the format of data for applications • Large text or binary? • Customer relation, purchase record, payment or ...? • Ad bidding log, customer behavior reports or ...?
  35. 35. Data Stored • Mind the type of data • Configurations? States of processes/nodes? → Distributed Key-Value Storage • Permanent data for applications? • Mind the format of data for applications • Large text or binary? → Distributed Filesystem • Customer relation, purchase record, payment or ...? → RDBMS • Ad bidding log, customer behavior reports or ...? → Logging Service
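The data-to-storage mapping on this slide can be written down as a small lookup table. The sketch below only restates the slide's four arrows in code; the `DataKind` enum and `pick_storage` function are hypothetical names, and the category labels are a paraphrase of the slide's questions.

```python
from enum import Enum


class DataKind(Enum):
    CONFIG_OR_STATE = "configurations, states of processes/nodes"
    LARGE_BLOB = "large text or binary"
    TRANSACTIONAL = "customer relation, purchase record, payment"
    EVENT_LOG = "ad bidding log, customer behavior reports"


# One entry per arrow on the slide.
STORAGE_FOR = {
    DataKind.CONFIG_OR_STATE: "Distributed Key-Value Storage",
    DataKind.LARGE_BLOB: "Distributed Filesystem",
    DataKind.TRANSACTIONAL: "RDBMS",
    DataKind.EVENT_LOG: "Logging Service",
}


def pick_storage(kind: DataKind) -> str:
    """Return the storage type the slide suggests for this kind of data."""
    return STORAGE_FOR[kind]
```

For example, `pick_storage(DataKind.TRANSACTIONAL)` gives `"RDBMS"`.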
  36. 36. Managing Storages, or Using Services • Managing storage is extraordinarily expensive • storage lifetime is much longer than processing lifetime • storage service level should be much better than processing • ... and we need various types of storage :( • Cloud platforms provide various services • Distributed Filesystems: Object Storages like S3, GCS, ... • Relational Databases: RDS, Cloud SQL, Aurora, Cloud Spanner, ... • Logging Service: CloudWatch Logs, Stackdriver, (Fluentd + other storages), ...
  37. 37. Or, Other New Trends?
  38. 38. Watch Trends and History: Something New May Come From There Thanks! @tagomoris