Ravikumar Alluboyina, Tushar Doshi
Robin Systems
Deliver Big Data, Database and AI/ML as-a-Service anywhere
Who are we?
SAMPLE CUSTOMER DEPLOYMENTS
› 11 billion security events ingested and analyzed a day (Elasticsearch, Logstash, Kibana, Kafka)
› 6 petabytes under active management in a single Robin cluster (Cloudera, Impala, Kafka, Druid)
› 400 Oracle RAC databases managed by a single Robin cluster (Oracle, Oracle RAC)
We have solved several fundamental problems so that containers and Kubernetes can run complex Big Data, NoSQL, database and AI/ML workloads
Robin is The Kubernetes platform for big data, databases and AI/ML
What are the challenges in deploying Big Data, NoSQL and databases?
Container placement (DN1, DN2, DN3)
Node fault tolerance (adds: compute anti-affinity)
Rack fault tolerance (adds: location awareness, rack / DC)
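To make the two fault-tolerance rules concrete, here is a minimal Python sketch (not Robin's scheduler) that places replicas such as DN1 to DN3 on distinct nodes (compute anti-affinity) while spreading them across racks (location awareness). The node-to-rack map and the function name `place` are hypothetical.

```python
from collections import Counter


def place(replicas, node_racks):
    """Assign each replica to a distinct node, spreading replicas across racks.

    node_racks: hypothetical topology, a dict mapping node name -> rack name.
    Raises ValueError if there are fewer nodes than replicas, because
    compute anti-affinity forbids two replicas on the same node.
    """
    if len(replicas) > len(node_racks):
        raise ValueError("not enough nodes to satisfy anti-affinity")
    free = dict(node_racks)   # nodes not yet used by any replica
    rack_load = Counter()     # number of replicas already placed per rack
    placement = {}
    for replica in replicas:
        # Pick the free node whose rack holds the fewest replicas so far;
        # ties are broken by node name so the result is deterministic.
        node = min(free, key=lambda n: (rack_load[free[n]], n))
        placement[replica] = node
        rack_load[free[node]] += 1
        del free[node]        # anti-affinity: this node is now taken
    return placement


if __name__ == "__main__":
    nodes = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack2", "n5": "rack3"}
    print(place(["DN1", "DN2", "DN3"], nodes))
```

With five nodes in three racks, each replica lands in a different rack; with fewer racks than replicas, the sketch still keeps replicas on distinct nodes and only then starts reusing racks.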
Storage placement
Storage fault tolerance (adds: storage & compute affinity)
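Storage & compute affinity means running a pod on a node that holds a replica of its volume, so reads stay local, without giving up anti-affinity. A minimal sketch under a hypothetical data model (the set of nodes holding the volume's replicas and the set of nodes already running a sibling pod); `pick_node` is an illustrative name, not a Robin API.

```python
def pick_node(volume_replicas, occupied, candidates):
    """Choose a node for a pod.

    volume_replicas: nodes that hold a replica of the pod's volume (hypothetical).
    occupied: nodes already running a sibling pod (excluded by anti-affinity).
    candidates: all schedulable nodes.
    Returns a node name, or None if anti-affinity leaves no node.
    """
    # Compute anti-affinity: never co-locate with a sibling pod.
    allowed = sorted(n for n in candidates if n not in occupied)
    # Storage & compute affinity: prefer nodes with a local copy of the data.
    local = [n for n in allowed if n in volume_replicas]
    if local:
        return local[0]
    if allowed:
        return allowed[0]  # fallback: data must be read over the network
    return None
```

Falling back to a remote node keeps the pod schedulable; a stricter policy could instead return `None` and leave the pod pending until a local node frees up.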
Storage performance (ZK1, ZK2, ZK3 added to the deployment)
Workload types and QoS enforcement (adds: IO patterns / QoS)
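The QoS item can be illustrated with a token bucket, one common technique for capping a volume's IOPS while still allowing short bursts. This is a generic sketch with hypothetical parameters (`rate`, `burst`), not a description of Robin's QoS implementation; timestamps are passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Admit at most `rate` IOs per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)     # tokens (IOs) added per second
        self.burst = float(burst)   # bucket capacity, i.e. maximum burst size
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the last refill, in seconds

    def allow(self, now, cost=1):
        """Return True and consume tokens if an IO of `cost` may proceed at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False            # caller should queue or delay this IO


if __name__ == "__main__":
    bucket = TokenBucket(rate=100, burst=10)
    print(sum(bucket.allow(0.0) for _ in range(15)))  # IOs admitted at t=0
```

With `rate=100` and `burst=10`, 15 IOs arriving at t=0 split into 10 admitted and 5 throttled; one second later the bucket is full again. Giving each tenant its own bucket is what keeps one noisy workload from consuming another's IO budget.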
Unprotected components (CM)
Storage replication and failover (adds: high availability)
Complete deployment (adds two NMs, two GWs, HBase, Hive, three Kudu and three KuduM instances, and Solr)
Big data deployment and management challenges
› Storage & compute affinity
› Location awareness (rack / DC)
› Compute anti-affinity
› Scale-out compute and storage
› Storage workload types (IO patterns / QoS)
› High availability
› Data protection (backup / DR)
› Snapshot / rollback
Kubernetes landscape
Storage and Networking challenges
› Latest CNCF survey (2018): 48% of respondents say storage is a big challenge and 44% say networking is a challenge in Kubernetes
› There are 27 storage vendors and 21 network vendors providing storage and networking solutions for containers and Kubernetes¹
¹ https://github.com/cncf/landscape
Despite so many vendor solutions, why is it still a challenge for so many people?
[Figure: CNCF landscape logos of storage vendors and network vendors]
Challenges with containers
Incomplete cgroups virtualization causes many Big Data applications and databases to misbehave
CPU
› Contiguous core IDs and CPU ID mapping (Kudu); accurate threads-to-cores mapping (databases)
› NUMA-aware assignment (HANA)
Memory
› The JVM sees the entire host's memory even when the container's memory is capped (any JVM app)
› Memory allocation inconsistencies (hugepages, shared page cache) (Oracle)
Storage
› Apps that need raw block devices need correct WWN management (e.g., Oracle, MapR)
› blkio cgroup settings are ineffective at preventing noisy-neighbor problems (all apps)
Confidential – Restricted Distribution
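The CPU and memory items come down to a container seeing host-wide values (for example `/proc/meminfo`) instead of its own cgroup limits. JDK 10 (backported to 8u191) added container support via `-XX:+UseContainerSupport`; older JVMs size their defaults from host memory. The minimal sketch below works on the text you would read from those files and computes the limits a container-aware runtime should honor. The file paths in the comments assume a standard Linux cgroup v1 or v2 layout; the function names are hypothetical.

```python
def parse_cpuset(spec):
    """Expand a cpuset list such as '0-3,8,10-11' into sorted core IDs.

    On cgroup v2 the text comes from /sys/fs/cgroup/cpuset.cpus.effective;
    on cgroup v1 from /sys/fs/cgroup/cpuset/cpuset.cpus.
    """
    cores = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cores.update(range(int(first), int(last) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def is_contiguous(cores):
    """True if the core IDs form one unbroken range, e.g. [4, 5, 6, 7]."""
    return bool(cores) and cores == list(range(cores[0], cores[-1] + 1))


def host_mem_bytes(meminfo_text):
    """Host RAM from /proc/meminfo, which is NOT limited by the container's cgroup."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # value is reported in kB
    raise ValueError("MemTotal not found")


def cgroup_mem_limit(limit_text):
    """Parse cgroup v2 memory.max or cgroup v1 memory.limit_in_bytes.

    Returns None when v2 reports 'max' (no limit). An unlimited v1 cgroup
    reports a very large number, which min() below handles naturally.
    """
    value = limit_text.strip()
    return None if value == "max" else int(value)


def effective_mem_bytes(meminfo_text, limit_text):
    """Memory the container may actually use: the smaller of host RAM and the cgroup cap."""
    host = host_mem_bytes(meminfo_text)
    limit = cgroup_mem_limit(limit_text)
    return host if limit is None else min(host, limit)
```

An application that sizes its heap or thread pools from `effective_mem_bytes` and `parse_cpuset` rather than from host-wide values avoids the oversubscription described above; `is_contiguous` flags cpusets that break applications expecting a contiguous block of core IDs.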
Time to reframe our thinking
Let applications drive infrastructure to meet user requirements
(in this model, application workflows configure Kubernetes, networking and storage)
Robin is The Kubernetes platform for big data, databases and AI/ML
www.robin.io
1-click Provision
1-click Scale
1-click QoS Control
1-click Snapshots
1-click Clones
1-click Backup
1-click Upgrade
1-click Migrate
Robin Software Stack
› Kubernetes with an Application Workflow Manager: 1-click application deploy, snapshot, clone, scale, upgrade, backup; application workflows configure Kubernetes, storage & networking
› App-aware storage: Robin's built-in enterprise-grade storage stack (snapshots, clones, QoS, replication, backup, data rebalancing, tiering, thin provisioning, encryption, compression)
› Virtual networking: built-in flexible networking (OVS, Calico, VLAN, overlay networking, persistent IPs)
› Works anywhere: on-prem (bare metal, VM) or public cloud
Robin Software Stack
› Node types: converged nodes, compute-only nodes, storage-only nodes
› Robin application-aware scale-out storage: Robin's built-in enterprise-grade storage stack (snapshots, clones, QoS, replication, backup, data rebalancing, tiering, thin provisioning, encryption, compression)
› Application-aware networking: built-in flexible networking (OVS, Calico, VLAN, overlay networking, persistent IPs)
› Control plane: K8S Master and Robin RCM; each node runs a Robin Agent and the Kubelet
› Interfaces: kubectl, helm, robin
› Robin programs Kubernetes: StatefulSets, Persistent Volumes, Claims, Services, etc., are auto-created to meet application needs
› Shown running: GPUs; TensorFlow, Kafka, Mongo, Spark, Hortonworks, Oracle RAC, Elasticsearch
› Works anywhere: on-prem (bare metal, VM) or public cloud
› From install to deploying apps in 15 minutes
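The "Robin programs Kubernetes" point can be illustrated with a hand-written generator. This is a hypothetical sketch, not Robin's code: it emits a StatefulSet whose pods use required pod anti-affinity on `kubernetes.io/hostname` (one pod per node) and a `volumeClaimTemplates` entry (one PersistentVolumeClaim per pod), plus the headless Service that the StatefulSet's `serviceName` refers to. The app name, image, port and storage size are placeholders.

```python
import json


def statefulset_manifest(name, image, replicas, storage, port):
    """Build a StatefulSet with per-pod volumes and one-pod-per-node anti-affinity."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Compute anti-affinity: no two pods of this app on one node.
                    "affinity": {"podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [{
                            "labelSelector": {"matchLabels": labels},
                            "topologyKey": "kubernetes.io/hostname",
                        }]
                    }},
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def headless_service(name, port):
    """Service with clusterIP None, as required by a StatefulSet's serviceName."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"clusterIP": "None", "selector": {"app": name},
                 "ports": [{"port": port}]},
    }


if __name__ == "__main__":
    items = [headless_service("example-app", 8080),
             statefulset_manifest("example-app", "example/app:latest", 3, "100Gi", 8080)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

JSON is valid YAML, so the printed output can be piped to `kubectl apply -f -`. Spreading across racks instead of nodes would mean changing the `topologyKey` to a rack or zone label.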
Enough talk. Demo time…
Thank you!
http://bit.ly/gorobin