Kubernetes Clusters At Scale
Managing Hundreds of Apache Pinot Kubernetes Clusters
Inside Each End User’s Own Cloud Infrastructure
Xiaoman Dong
DevOps, Software Engineer @StarTree
Apache Pinot
• OLAP Datastore
• Columnar, Indexed storage
• Real-time, low-latency analytics
• Distributed – highly available, reliable, scalable
• Lambda architecture
• SQL Interface
• Open Source - Apache TLP
Typical/Traditional SaaS
● K8S Owned by SaaS Company
● Data Stays in the SaaS Company’s Virtual Private Cloud
We Do Delegated Management Solution
● K8S Owned by Customer
● Data Stays inside the Customer’s Virtual Private Cloud
● Fully Managed by Us
Design Context throughout the Talk
The 3 Major Constraints
● Cloud Boundaries
● Optimized for Apache Pinot
● Scale to hundreds or more
We will focus on how these 3 make our system special
How do we design such a system?
(My job is safe from ChatGPT ... for now)
The journey: design such a system
• We are going to start small, automate, and dive deeper
• Always think about our context: customer’s cloud, our backend
Step 1: Creating the Clusters
• Each customer will be able to create and see their own clusters
• Self-serve provisioning via UI
• Multi-cloud support (AWS, GCP, Azure)
Step 1: Provisioning
The Manual Way
Automate this!
● Log into the AWS console with credentials provided by the customer
● Create Account, Networking, Kubernetes Cluster
● ❌ Bash scripts around aws eks cluster creation
● ✅ Write your own microservice
- Use the AWS client libraries
- Terraform
Step 1: Provisioning (Cont’d)
- How do we scale to 1k customers?
Step 1: Provisioning - Orchestration
Orchestration Engine
Workflow Needed:
1. Create Account
2. Create Network
3. Create NodeGroup
4. Create K8S
5. Create …
6. Notify Finished
Retry in each step, report status
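The workflow above can be sketched as a small orchestration loop: ordered steps, each retried a few times, with status reported after every attempt. This is an illustrative sketch only; the step names and the `report_status` callback are assumptions, not the actual orchestration engine's API.

```python
import time

def run_workflow(steps, report_status, max_retries=3, backoff_s=0):
    """Run each (name, fn) step in order; retry failed steps, report status."""
    for name, fn in steps:
        for attempt in range(1, max_retries + 1):
            try:
                fn()
                report_status(name, "done")
                break
            except Exception as exc:
                report_status(name, f"attempt {attempt} failed: {exc}")
                if attempt == max_retries:
                    report_status(name, "failed")
                    return False
                time.sleep(backoff_s)  # simple fixed backoff between retries
    report_status("workflow", "finished")
    return True
```

Per-step retry with status reporting is what lets one engine drive hundreds of provisioning runs: a transient cloud API error retries in place instead of failing the whole workflow.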
Step 2: Installing Applications
Goal: Customers need to access their clusters with Pinot running
Step 2: Installing Applications
The Manual Way
Automate This
● ❌ kubectl apply -f all-apps.yaml
● ✅ helm upgrade --install startree-platform …
● Build our own helm charts
● Run our own private helm repo (or pay for AWS ECR)
● All applications deployed via Helm Chart
● Call helm libraries in our code
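A minimal sketch of how a microservice could assemble the idempotent install-or-upgrade invocation per customer cluster. The release, chart, and context names here are hypothetical placeholders, not the real deployment values.

```python
def helm_upgrade_cmd(release, chart, version, values_file, kube_context):
    """Return argv for `helm upgrade --install`, pinned to one version."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--version", version,          # keep chart version == release version
        "--values", values_file,       # per-customer customizations
        "--kube-context", kube_context,
        "--atomic",                    # auto-rollback if the upgrade fails
        "--wait",
    ]
```

Because `upgrade --install` is idempotent, first-time installation and later upgrades share one code path, which matters when the same pipeline drives hundreds of clusters.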
K8S Cluster Runs as a Platform, Applications are Pluggable
Charts and Docker images owned by separate teams 😍
Step 3: Networking
A huge topic worth a dedicated session
Public facing vs. “Internal” facing (VPC Peering)
Kubernetes Has Good Network Modeling and Ecosystem
● Ingress - We choose Traefik, easy for teams to define ingress
● LoadBalancer by Each Cloud Provider
● Extra VPC Peering on demand
● Multi-Zone High Availability
Step 4: TLS and Certificates - Problem
Secure connection is required nearly everywhere
● Even within a VPC/firewall, customers request it
● Manual certificate generation will not scale
Certificates have expiration dates
● Automated renewal is needed
● First Time Creation == Future Renewal
Step 4: TLS and Certificates - Knowledge
Facts of Certificates
- Proves that you properly own a DNS name
- To generate a certificate, we must complete a DNS-based challenge to prove ownership
- Established by a chain of trust
- Issued by well-known/pre-installed 3rd-party issuers like ZeroSSL
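The "First Time Creation == Future Renewal" idea can be made concrete with a single decision function: one code path that only asks whether a certificate is missing or close to expiry. A minimal sketch, assuming a 30-day renewal window (an illustrative choice, not the actual production setting):

```python
from datetime import datetime, timedelta

def needs_issue(not_after, now, renew_before=timedelta(days=30)):
    """True if no certificate exists yet, or it expires inside the window."""
    if not_after is None:                     # first-time creation
        return True
    return not_after - now <= renew_before    # future renewal
```

Running this check on a schedule means creation and renewal are the same operation, so the renewal path is exercised from day one instead of failing silently months later.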
Step 4: TLS and Certificates: Centralized
Option 1: Centralized solution
✅ Better Security
❌ Harder to Scale
Step 4: TLS and Certificates: Distributed
Decentralized Certificate Renewal
❌ Less Secure
✅ Easier to Scale Up
Special Part for Delegated Management Solution
Step 5,6,7…
The Usual DevOps stuff
● OIDC for AuthZ/AuthN
● Prometheus + AlertManager for Observability
● Logging, Debugging
● Backup and Disaster Recovery
● Metrics push to centralized monitoring and/or customer’s metrics storage
● Backup to customer’s deep store
Checkpoint 1: Kubernetes Fleet Management
Architecture So Far: a mini version of a multi-cloud Kubernetes fleet management system, like KubeSphere
Wait, What About Apache Pinot?
Pinot Kubernetes Operator
Configuration/Customization
Templated Environment Creation
● Some customers like to enable Groovy in queries, some don't
● Customizations/configurations are applied onto templates
● Customizations are applied like a Visitor pattern from the classic Design Patterns
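The visitor-style customization above can be sketched as a shared environment template plus per-customer "visitors" that each adjust one concern. All keys and helper names here are hypothetical placeholders, not real Pinot configuration:

```python
import copy

BASE_TEMPLATE = {
    "pinot": {"queryConfig": {"allowGroovy": False}},
    "replicas": {"server": 3},
}

def enable_groovy(template):
    template["pinot"]["queryConfig"]["allowGroovy"] = True

def set_server_replicas(n):
    def visit(template):
        template["replicas"]["server"] = n
    return visit

def render(customizations):
    """Apply each customization to a fresh copy of the base template."""
    template = copy.deepcopy(BASE_TEMPLATE)
    for visit in customizations:
        visit(template)
    return template
```

Each visitor touches only its own concern, so per-customer differences stay composable while the base template remains the single source of defaults.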
Are we there yet?
“Ops” part of DevOps!
* Image courtesy https://devopedia.org/devops
Version and Upgrades
Version and Upgrades (Cont’d)
The Version Matrix
Lessons Learnt
● Create good release pipeline with tests
● Discipline: avoid releasing versions with
breaking changes
● Keep helm chart and image tag the
same as release version
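The "one version everywhere" rule above can be enforced by a simple release gate: every chart's version and image tag must equal the release version. The manifest shape below is an assumption for the sketch:

```python
def release_gate(release_version, charts):
    """Return the names of charts whose chart version or image tag drifted."""
    return [name for name, c in sorted(charts.items())
            if c["chartVersion"] != release_version
            or c["imageTag"] != release_version]
```

A check like this in the release pipeline turns the versioning discipline into something a machine rejects, rather than something a reviewer has to remember.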
Efficiency and Reliability
Efficiency and Reliability are key to scaling up
● Discipline in DevOps is important
● No architecture is bulletproof
● Fewer Outages == Better Efficiency
● DevOps exists for end-to-end ownership
Efficiency and Reliability - Cont’d
Best Practices
● Build Good Infra Integration/Regression Test
● Trunk-Based Release Pipelines
○ Always release from master
○ Say no to release branches
● Do not customize via kubectl commands
Operations and OnCalls
There is no silver bullet for OnCall
• Discipline and Process
- Root Cause every outage
- Follow up on every outage
• Effective Alerts
- Differentiate alerts from signals
- Review and Keep Improving
- Build metrics to measure effectiveness
Lessons Learnt
Security design in provisioned clusters is hard
• Centralized Control, less Scalability
• Decentralized Control, harder to protect credentials
• Build good debugging support on TLS certificates
Do not run complicated Terraform configurations
• Complicated state leads to bugs and unwanted resource recreation
• Terraform's internal state is hard to keep track of
Lessons Learnt (cont’d)
A Certificate Issuer like ZeroSSL may partially go down for half a day
• No new customer can onboard during that downtime
A 3rd-party Helm repo going down blocks customer cluster upgrades
• Serve Helm Charts from your own repo, like JFrog Artifactory
What’s Ahead
• Improving Design For Layering
• Improve Resource Efficiency
• No Downtime Upgrade
• Cluster Federation
• …
Questions?
Thank you!
Reach me via https://www.linkedin.com/in/xiaoman/
