
Containerized Hadoop beyond Kubernetes

Supporting Hadoop in containers takes much more than the very primitive support Docker provides using the Storage Plugin. A production-scale Hadoop deployment inside containers needs to honor anti/affinity, fault-domain and data-locality policies. Kubernetes alone, with primitives such as StatefulSets and PersistentVolumeClaims, is not sufficient to support a complex data-heavy application such as Hadoop. One needs to think about this problem more holistically across the container, networking and storage stacks. Also, the constructs around deployment, scaling, upgrade, etc. in traditional orchestration platforms are designed for applications that have adopted a microservices philosophy, which doesn't fit most Big Data applications across the ingest, store, process, serve and visualization stages of the pipeline. Come to this technical session to learn how to run and manage the lifecycle of containerized Hadoop and other applications in the data analytics pipeline efficiently and effectively, far and beyond simple container orchestration. #BigData, #NoSQL, #Hortonworks, #Cloudera, #Kafka, #Tensorflow, #Cassandra, #MongoDB, #Kudu, #Hive, #HBase, PARTHA SEETALA, CTO, Robin Systems.

Published in: Technology


  1. 1. Partha Seetala CTO, Robin Systems Containerized Hadoop beyond Kubernetes
  2. 2. Who am I? CTO of Robin Systems, before that Distinguished Engineer at Veritas/Symantec. We have solved some fundamental problems to enable containers and Kubernetes for running complex Big Data, NoSQL, Database and AI/ML workloads. Robin is The Kubernetes platform for big data, databases and AI/ML. SAMPLE CUSTOMER DEPLOYMENTS: 11 billion security events ingested and analyzed a day (Elasticsearch, Logstash, Kibana, Kafka); 6 Petabytes under active management in a single Robin cluster (Cloudera, Impala, Kafka, Druid); 400 Oracle RAC databases managed by a single Robin cluster (Oracle, Oracle RAC).
  3. 3. Why containerize Big Data, NoSQL, and Databases?
  4. 4. Why containerize: the developer's perspective. Developers: 1. Dislike opening an IT ticket and waiting weeks for their apps to be ready for use. They want their apps to be available now. 2. Want to experiment with tools, but dislike the complexity of setting them up. For example, which of RDBMS, NoSQL, DocumentDB, or GraphDB is better for the app? 3. Want to run apps where it makes the most sense: their laptop, on-prem datacenter, public cloud, etc. Containers offer deployment agility and infrastructure independence
  5. 5. Why containerize: infrastructure perspective? 40% Resource utilization is pretty low on Big Data clusters
  6. 6. Why containerize – infrastructure perspective? $34 K Utilization worsens with every hardware refresh $141 K $25 K 4 years ago Today CPU 20 Cores 40 Cores Memory 128 GiB 512 GiB Storage 48 TiB 144 TiB Network 2x10 GbE 2x40 GbE CPU 24 Cores Memory 256 GiB Storage 540 TiB Network 4x40 GbE CPU 36 Cores Memory 768 GiB Storage 122 TiB Network 2x100 GbE GPU 8x NVIDIA V100 Modern hardware offers a lot more resources per rack unit, which must be kept busy to realize ROI. Containers allow you to maximize infrastructure utilization
  7. 7. Why Containers, not Virtualization? › Get the benefits of virtualization without any of its overhead › Containers run applications directly on baremetal without virtualizing hardware › A resource given to a hypervisor is a resource that is taken away from your Big Data application › Applications are being packaged and shipped as container images, not VMs › You must leverage and adapt to this shift in application packaging › Containers avoid the need for specialized storage stacks for deduplicating VM images
  8. 8. What are the challenges with containerizing Big Data, NoSQL and Databases?
  9. 9. Challenges with containers Incomplete cgroups virtualization causes many Big Data and Database applications to misbehave CPU › Contiguous core IDs, CPU ID mapping (Kudu), accurate threads:cores mapping (DB) › NUMA-aware assignment (HANA) Memory › JVM sees entire host memory even if you cap the memory for the container (any JVM app) › Memory allocation inconsistencies (hugepages, shared page cache) (Oracle) Storage › Apps that need raw block devices need correct WWN management (e.g., Oracle, MapR) › blkio cgroups setting is useless to avoid noisy neighbor problems (all apps) Confidential – Restricted Distribution
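The JVM memory point above can be illustrated with a minimal sketch. It assumes a Linux host, the standard `MemTotal` line of `/proc/meminfo`, and the cgroup v2 `memory.max` file format (a byte count or the word `max`). The helper names are hypothetical, and the functions take file contents as strings so the logic can be checked without a container. An application that sizes itself from the host total alone ignores the container cap; taking the smaller of the two values avoids that.

```python
from typing import Optional


def parse_meminfo_total(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo contents, converted to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16384256 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_cgroup_limit(limit_text: str) -> Optional[int]:
    """Return the cgroup v2 memory.max value in bytes, or None if unlimited."""
    value = limit_text.strip()
    if value == "max":
        return None
    return int(value)


def effective_memory(host_total: int, cgroup_limit: Optional[int]) -> int:
    """Memory an application should size itself against inside a container."""
    if cgroup_limit is None:
        return host_total
    return min(host_total, cgroup_limit)


# Illustrative inputs: a 1 GiB host and a 512 MiB container cap.
host = parse_meminfo_total("MemTotal:        1048576 kB\nMemFree:  100 kB\n")
cap = parse_cgroup_limit("536870912\n")
usable = effective_memory(host, cap)
```

On a real host the two strings would come from reading `/proc/meminfo` and the container's `memory.max` file; the path of the latter depends on how the cgroup hierarchy is mounted.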
  10. 10. Challenges with container orchestration platforms Very opinionated and architected with a microservices-oriented philosophy › Expect that apps can be brought up trivially within milliseconds during crash recovery › Scale by adding more containers and registering with a load balancer to spread load around › Recommend modeling your app as a collection of stateless containers, each serving a single service But you are dealing with applications that have decades of built-in assumptions › Big Data and databases are not written as microservices applications › You can’t stop and restart them rapidly › You have to worry about both storage and network state for ensuring high availability › There is significant investment in custom scripting that assumes SSH access to hosts running apps
  11. 11. Storage and Networking challenges 2018 CNCF survey says Storage and Networking are the biggest challenges in Kubernetes https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform 48% 44%
  12. 12. Storage and Networking challenges › Latest 2018 CNCF survey: 48% say Storage is a big challenge, 44% say Networking is a challenge in Kubernetes › There are 27 Storage vendors and 21 Network vendors providing Storage & Networking solutions for containers and Kubernetes [1] Despite so many vendor solutions, why is it still a challenge for so many people? Storage vendors Network vendors [1] https://github.com/cncf/landscape
  13. 13. Operational challenges to overcome Storage › Performance unpredictability when consolidating Big Data and Database apps › Data locality requirements (both performance and datacenter network bandwidth constraints) › Anti/affinity and isolation constraints Networking › Services running inside K8S are often consumed by applications running in different L3 subnets › Putting a load balancer in between apps and services is unnecessary and less performant for most Big Data, NoSQL, and Database applications › 90% of the apps being used in real life require the IP address to be preserved during restarts Spending time setting up Storage and Networking is a drag on user productivity
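To make the anti/affinity and data-locality constraints above concrete, here is a minimal placement sketch. It is an illustration under stated assumptions, not how Kubernetes or Robin schedules workloads: the `Node` type and `place_replicas` function are hypothetical, a rack stands in for a fault domain, and "data-local" simply means the node already holds the data the application reads.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Node:
    name: str
    rack: str        # fault domain used for the anti-affinity rule
    has_data: bool   # True if the node already stores the app's data


def place_replicas(nodes: List[Node], count: int) -> List[Node]:
    """Pick `count` nodes, at most one per rack, preferring data-local nodes."""
    if count <= 0:
        return []
    chosen: List[Node] = []
    used_racks: Set[str] = set()
    # sorted() is stable: data-local nodes come first, ties keep input order.
    for node in sorted(nodes, key=lambda n: not n.has_data):
        if node.rack in used_racks:
            continue  # anti-affinity: never two replicas in one fault domain
        chosen.append(node)
        used_racks.add(node.rack)
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough racks to satisfy anti-affinity")


cluster = [
    Node("a", "r1", False),
    Node("b", "r1", True),
    Node("c", "r2", False),
    Node("d", "r3", True),
]
placement = [n.name for n in place_replicas(cluster, 3)]
```

The sketch treats anti-affinity as a hard rule and data locality as a preference, which is one reasonable reading of the slide; a real scheduler would also weigh capacity and isolation.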
  14. 14. Don’t miss the forest for the trees Users Applications Infrastructure Most vendors are looking at the problem in this direction Whereas we should be looking at it in this direction Focus on User- App Interaction Let apps drive infrastructure to meet user requirements Focus is on Infra components StatefulSets, Deployments, Persistent Volume Claims, Services, CSI, CNI, HPA
  15. 15. Can Kubernetes alone get us to the promised land? MANAGEMENT (kubectl, helm) SERVICES (Ingress, Proxy, LB) STORAGE (CSI) NETWORKING (CNI, Overlay) MONITORING (Heapster, HPA) CONFIGURATION (ConfigMap, Secrets) UI SERVER INFRA (Baremetal, On-prem VM, AWS, Azure, GCP) DATA (PV, PVC) TROUBLESHOOTING (Logging, Events) CONTAINERS (docker, LXC)
  16. 16. Time to reframe our thinking Let applications drive infrastructure to meet user requirements (in this model application workflows configure Kubernetes, Networking and Storage)
  17. 17. Robin is The Kubernetes platform for big data, databases and AI/ML Integrated App-aware Storage Docker, LxC, Kubernetes Integrated Networking Application-aware Workflow Manager + + + Application workflows configure Kubernetes, Networking and Storage
  18. 18. When you elevate your thinking to Applications You do less of this › Deployments, ReplicaSets, and StatefulSets › Persistent Volumes and Claims › Service endpoints, proxy › Ingress and Egress routes › Secrets and Configmaps › Heapster, CSI, CNI, and CRI And do more of this › Time-travel application states › Clone entire Applications with their data › Backup and restore entire apps, any app › Upgrade applications in a failsafe manner › Control QoS of apps to meet performance SLAs › Make applications and data mobile across clouds
  19. 19. Give your users a managed service experience SPECIFY DATA-LOCALITY, ANTI/AFFINITY CONSTRAINTS AND PLACEMENT HINTS ENABLE SERVICE COMPONENTS SPECIFY COMPUTE SPECIFY STORAGE SPECIFY SCALE Just minutes from click to use: a 64-node Hadoop cluster with 1408 CPU cores, 4.5 TB of memory and 1.5 PB of storage takes just 23 mins Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, namenode HA K8S components auto created (StatefulSets, PVC, Services, …) Data-locality, anti/affinity policies enforced Any Big Data, NoSQL, Database, AI/ML app
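As a rough sense of scale for the cluster quoted above, the totals can be divided across the 64 nodes. This is back-of-the-envelope arithmetic only: it assumes resources are spread evenly across nodes and that TB and PB are binary multiples (1 TB = 1024 GB, 1 PB = 1024 TB), neither of which the slide states.

```python
# Totals as quoted on the slide.
NODES = 64
CPU_CORES = 1408
MEMORY_TB = 4.5
STORAGE_PB = 1.5


def per_node(total: float, nodes: int = NODES) -> float:
    """Even split of a cluster-wide total across all nodes (an assumption)."""
    return total / nodes


cores_per_node = per_node(CPU_CORES)                # cores per node
memory_gb_per_node = per_node(MEMORY_TB * 1024)     # GB per node
storage_tb_per_node = per_node(STORAGE_PB * 1024)   # TB per node
```

Under these assumptions each node would contribute 22 cores, 72 GB of memory and 24 TB of storage.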
  20. 20. Adjust resources to meet changing priorities › Application priorities change with time › Faster ingest during daytime › Faster querying for end-of-quarter reporting › Trade resources between adjacent applications dynamically › Adjust CPU, Memory, GPU, Network and IOPS dynamically › Scaling resources vs scaling entire service › K8S’ Horizontal Pod Autoscaler (HPA) is not suitable for data applications: › Works by adding more Pods to scale horizontally › Great for stateless apps › Not so good for Big Data, NoSQL and Databases › Results in data rebalancing, which is costly and permanent. Scaling down is very hard. Kafka Hadoop1 Hadoop2 Druid Assign each app its own resource quota (CPU, Mem, IOPS) Shift resources from Hadoop2 to Hadoop1 with 1-Click Shift resources from Hadoop1 to Hadoop2 with 1-Click 8 AM 3 PM 12 AM 11 PM
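The idea of trading resources between adjacent applications, instead of adding Pods, can be sketched as moving quota from one application to another while the cluster total stays fixed. The function name, the application names and the core counts below are illustrative assumptions; this does not show Robin's or Kubernetes' actual API.

```python
from typing import Dict


def shift_quota(quotas: Dict[str, int], src: str, dst: str, amount: int) -> Dict[str, int]:
    """Return a new quota table with `amount` units moved from src to dst."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if quotas[src] < amount:
        raise ValueError(f"{src} does not have {amount} units to give")
    updated = dict(quotas)   # leave the caller's table untouched
    updated[src] -= amount
    updated[dst] += amount
    return updated


# Illustrative CPU-core quotas for two adjacent Hadoop clusters.
daytime = {"hadoop1": 16, "hadoop2": 16}
reporting = shift_quota(daytime, src="hadoop2", dst="hadoop1", amount=8)
```

Because no instance is added or removed, nothing in this model triggers data rebalancing, which is the contrast with horizontal scaling that the slide draws.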
  21. 21. Application-centric resource management and QoS › We enhanced K8S’ cgroups management capabilities › More comprehensive procfs, and sysfs virtualization › Virtualize sysinfo(2) system call › We implemented an application-topology aware MIN and MAX storage QoS › Predictable performance for mission-critical workloads › Eliminate noisy-neighbor challenges when consolidating workloads Robin Application-aware Storage Hadoop Mongo DB Kafka MySQL Postgres Mongo DB IO IO IO Postgres
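One way to picture MIN and MAX storage QoS is as an allocation problem: every application first receives its guaranteed minimum, and any spare capacity is then shared out without exceeding each application's maximum. The sketch below is a generic illustration with hypothetical names and numbers; it is not Robin's algorithm and ignores application topology.

```python
from typing import Dict, Tuple


def allocate_iops(capacity: int, limits: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Allocate IOPS given per-app (min_iops, max_iops) limits."""
    alloc = {app: lo for app, (lo, _hi) in limits.items()}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("capacity cannot cover the guaranteed minimums")
    # Hand out spare capacity in equal shares to apps still below their max.
    while spare > 0:
        open_apps = [a for a in alloc if alloc[a] < limits[a][1]]
        if not open_apps:
            break  # everyone is at max; the rest stays unallocated
        share = max(spare // len(open_apps), 1)
        for app in open_apps:
            grant = min(share, limits[app][1] - alloc[app], spare)
            alloc[app] += grant
            spare -= grant
            if spare == 0:
                break
    return alloc


limits = {"kafka": (2000, 3000), "hadoop": (3000, 10000), "mysql": (1000, 1500)}
result = allocate_iops(10000, limits)
```

The minimum protects a mission-critical workload from noisy neighbors, while the maximum keeps any one workload from consuming all spare capacity.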
  22. 22. Operational challenges for Big Data, NoSQL, Databases extend beyond just provisioning and scaling
  23. 23. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Backup and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to cloud
  24. 24. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Backup and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to cloud 1-click Application-consistent Snapshots Snapshot 1 Snapshot 2 Snapshot 3 Snapshot 4 Current
  25. 25. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Backup and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to cloud Snapshot 1 (4 months ago) Snapshot 2 (2 weeks ago) Snapshot 3 (3 days ago) Snapshot 4 (yesterday) Current (now) 1-click Ready-to-use Clones RoW-based cloning (ultra fast) Clone gets a different network identity
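The "RoW based cloning" mentioned above refers to redirect-on-write. A toy model, with hypothetical `BlockStore` and `Volume` classes, shows why such clones are fast: creating a clone copies only the block map, and later writes are redirected to fresh blocks so the source stays unchanged. This is a teaching sketch, not a description of Robin's storage layer.

```python
from typing import Dict, List, Optional


class BlockStore:
    """Append-only pool of physical blocks shared by all volumes."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []

    def put(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1   # physical block id

    def get(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class Volume:
    """Maps logical block numbers to physical block ids in a BlockStore."""

    def __init__(self, store: BlockStore, block_map: Optional[Dict[int, int]] = None) -> None:
        self.store = store
        self.block_map: Dict[int, int] = dict(block_map or {})

    def write(self, lbn: int, data: bytes) -> None:
        # Redirect-on-write: never overwrite in place; store the data in a
        # fresh physical block and repoint this volume's map at it.
        self.block_map[lbn] = self.store.put(data)

    def read(self, lbn: int) -> bytes:
        return self.store.get(self.block_map[lbn])

    def clone(self) -> "Volume":
        # Only the map is copied; all data blocks stay shared.
        return Volume(self.store, self.block_map)


store = BlockStore()
source = Volume(store)
source.write(0, b"alpha")
source.write(1, b"beta")
copy = source.clone()
blocks_after_clone = len(store.blocks)
copy.write(1, b"changed")
```

After the clone, the store still holds two physical blocks; the write to the clone adds a third and leaves the source's view of block 1 intact.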
  28. 28. See demos at booth G3 Robin is The Kubernetes platform for big data, databases and AI/ML www.RobinSystems.com 1-click Provision 1-click Scale 1-click QoS Control 1-click Snapshots 1-click Clones 1-click Backup 1-click Upgrade 1-click Migrate
