What Problem Do We Address?
K8s Strengths
• Self-Healing
• Automated Deploy/Rollback
• Auto Scaling
• Load Balancing
• Application Centric
K8s Weaknesses
• Backup
• Disaster Recovery
• Infrastructure Management
• Hybrid Cloud Mobility
• Security
Managed Kubernetes - Comparison
Amazon EKS | Azure AKS | Google Cloud GKE
$0.10/hr | Pay-as-you-go: standard costs of node VMs and other resources | $0.10/hr
2018 | 2018 | 2015
CloudFormation Stack | Azure Resource Manager | Auto-Pilot / Standard Window
EBS / CSI | Azure Disk / Azure File (CSI) | GCE Persistent Disk / CSI
… | … | …
… | … | …
Agnostic Approach
• Kubernetes Data Landscape / Adoption
  • Users should be free to adopt whichever managed Kubernetes service best suits their requirements
• Storage Providers / Classes
  • Users should be free to adopt whatever persistent storage volume classes they need and to protect data without overthinking migration between infrastructures
• Different storage classes
  • Azure Block / Azure File
  • Amazon EBS / CSI
  • Google Compute Engine Persistent Disk
Supported Capabilities of Current CSI Drivers
• https://kubernetes-csi.github.io/docs/drivers.html
An Interesting Case Of Azure File CSI Driver
• https://docs.microsoft.com/en-us/azure/aks/azure-files-csi
Find some more details about automated restore
• https://cloudcasa.io/blog/automating-azure-files-restore-in-aks/
Cross-Cluster and Cross-Cloud Migration Challenges
Reasons:
• Reduce the load on an existing cluster by moving some of its workloads to another cluster
• Clone the environment for testing purposes
• Migrate to a different data center or cloud
Challenges:
• Storage and network isolation
• Different storage options for each cloud provider
• No native Kubernetes support for operations such as copying or migrating data
Tools:
• Velero and Restic
• CloudCasa by Catalogic Software
Challenges with Velero
• Infrastructure – installs on each cluster
• Cloud credentials are required to talk to the cluster
• Setup differs for each cloud provider
• No central logging or storage credentials management
• There is no single place to manage all clusters
• Works if you have a handful of clusters, but leaves a lot to be desired at Kubernetes scale
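To make the per-cluster overhead listed above concrete, here is a minimal sketch in Python that only builds (and does not run) one Velero schedule command per cluster. The context names, the schedule name `daily`, and the cron expression are hypothetical; the sketch assumes the Velero CLI's `schedule create` subcommand with its `--schedule` flag and the global `--kubecontext` flag, so check them against your installed version.

```python
import shlex


def velero_schedule_commands(contexts, name="daily", cron="0 2 * * *"):
    """Build one `velero schedule create` command per kube context.

    Nothing is executed here; the point is to show that every cluster
    needs its own invocation (and, in practice, its own credentials).
    """
    commands = []
    for ctx in contexts:
        commands.append([
            "velero", "schedule", "create", name,
            "--schedule", cron,
            "--kubecontext", ctx,  # hypothetical context name, one per cluster
        ])
    return commands


# Hypothetical fleet of 200 clusters named cluster-001 .. cluster-200.
contexts = [f"cluster-{i:03d}" for i in range(1, 201)]
cmds = velero_schedule_commands(contexts)

print(len(cmds), "separate commands to run and keep in sync")
print(shlex.join(cmds[0]))
```

Even when such a loop is scripted, each cluster still holds its own schedule and storage credentials, which is the lack of central management the slide describes.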
Issues With Restic
• Poor restore speed
"Restic incremental restore speed continued to be extremely slow because 1 and sometimes 2 Restic threads are using 100% CPU. Only 15 GB was restored in 16 hours at 0.25 MB/s [..] it will take 2 months to restore 1.5 TB."
• Performed poorly with parallelism
• Performed poorly with small and large files
Alternatives:
• Kopia
• Borg (BorgBackup)
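The figures in the Restic quote above can be sanity-checked with a few lines of arithmetic. This is a rough sketch that assumes binary units (1 GB = 1024 MB, 1 TB = 1024 GB) and a constant transfer rate; the function names are illustrative only.

```python
def throughput_mb_per_s(restored_gb: float, hours: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB."""
    return restored_gb * 1024 / (hours * 3600)


def days_to_restore(total_tb: float, mb_per_s: float) -> float:
    """Days needed to restore total_tb at a constant rate (1 TB = 1024 * 1024 MB)."""
    total_mb = total_tb * 1024 * 1024
    return total_mb / mb_per_s / 86400


rate = throughput_mb_per_s(restored_gb=15, hours=16)
days = days_to_restore(total_tb=1.5, mb_per_s=0.25)

print(f"observed rate: {rate:.2f} MB/s")
print(f"1.5 TB at 0.25 MB/s: {days:.0f} days")
```

Under these assumptions, 15 GB in 16 hours works out to about 0.27 MB/s, and 1.5 TB at 0.25 MB/s takes about 73 days, which is consistent with the "2 months" estimate in the quote.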
Restoring from Backup
• Leveraging data copies for more than just insurance policies
• Spin up a cluster from scratch (vs. having a standby cluster ready and available waiting to receive data)
• Automating restore processes
• Migrate between managed Kubernetes service providers (cloud, on-prem)
• Restore with production data / customizations
• Remap storage / namespaces
• Cross cluster (on-prem / cloud)
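As an illustration of the "Remap storage / namespaces" point above, here is a minimal Python sketch that rewrites a PersistentVolumeClaim manifest (as a plain dict) for a different target cluster. It is not how any particular tool implements remapping; the storage class names (`gp2`, `managed-csi`) and namespaces are example values, and dropping `spec.volumeName` is an assumption made so the target cluster provisions a fresh volume.

```python
import copy


def remap_pvc(pvc: dict, storage_class_map: dict, namespace_map: dict) -> dict:
    """Return a copy of a PVC manifest with namespace and storage class remapped."""
    out = copy.deepcopy(pvc)  # never mutate the backed-up manifest

    meta = out.setdefault("metadata", {})
    namespace = meta.get("namespace")
    if namespace in namespace_map:
        meta["namespace"] = namespace_map[namespace]

    spec = out.setdefault("spec", {})
    storage_class = spec.get("storageClassName")
    if storage_class in storage_class_map:
        spec["storageClassName"] = storage_class_map[storage_class]

    # Assumption: the source volume binding is meaningless on the target cluster.
    spec.pop("volumeName", None)
    return out


# Example values only: an EBS-backed claim restored onto an Azure Disk class.
source_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data", "namespace": "shop"},
    "spec": {
        "storageClassName": "gp2",
        "volumeName": "pvc-1234",
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "20Gi"}},
    },
}

restored = remap_pvc(
    source_pvc,
    storage_class_map={"gp2": "managed-csi"},
    namespace_map={"shop": "shop-restore"},
)
print(restored["metadata"]["namespace"], restored["spec"]["storageClassName"])
```

The manifest is only half of a cross-cloud restore: the volume data itself still has to be copied by a data mover, since the snapshot formats of different providers are not interchangeable.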
Cluster Recoveries
• Protect your data but also your infrastructure
• Integrate with different cloud providers without unnecessary credentials management
Keep Your Backup Data Private
• Bring Your Own Storage
• Keep your data isolated from the public Internet with Private Link
Azure Private Link
• Reduces exposure to brute-force and DDoS attacks
• Azure Private Link enables users to access Azure services over a private endpoint in a virtual network
Did you know about Azure Private Link?
You can back up your data to S3 without traversing the Internet.
[Diagram: data transfer over Azure Private Link versus the public Internet]
Azure Private Link allows you to provide private connectivity between your clusters and storage, without exposing your traffic to the public Internet.
Data leakage is a common concern when using SaaS services, but Azure Private Link shuts the door on exfiltration.
Trusted worldwide by large organizations such as Discover, Salesforce, Autodesk and Goldman Sachs.
CloudCasa supports Private Endpoints to direct backup traffic to your Azure Blob Storage through this service.
The Local Proxy AKS Cluster is used to route all S3 bucket and storage management operations privately.
3-2-1-1 Ransomware Protection Rule
• Maintain at least 3 copies of your data
• Keep 2 copies on different media
• Store at least 1 copy at an off-site location, ideally air-gapped
• Keep 1 verified copy for recovery
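The 3-2-1-1 rule above can be expressed as a simple check over an inventory of data copies. The sketch below is illustrative only: the `BackupCopy` fields and the example inventory are hypothetical, and it assumes the primary copy counts toward the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str      # e.g. "block", "object", "tape"
    offsite: bool   # stored at an off-site location
    verified: bool  # a restore from this copy has been tested


def check_3_2_1_1(copies):
    """Evaluate each part of the 3-2-1-1 rule for a list of copies."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media": len({c.media for c in copies}) >= 2,
        "1 off-site": any(c.offsite for c in copies),
        "1 verified": any(c.verified for c in copies),
    }


# Hypothetical inventory: primary volume, local snapshot, off-site object copy.
inventory = [
    BackupCopy("primary-volume", media="block", offsite=False, verified=False),
    BackupCopy("local-snapshot", media="block", offsite=False, verified=False),
    BackupCopy("cloud-bucket", media="object", offsite=True, verified=True),
]

result = check_3_2_1_1(inventory)
for rule, ok in result.items():
    print(f"{rule}: {'ok' if ok else 'MISSING'}")
```

Reporting each condition separately, rather than a single pass/fail, makes it clear which part of the rule a given cluster's protection plan is missing.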
Centralized Management, Cross-Platform and Composable Recoveries
Think about RTO and Compliance
Auto Discovery and Compliance
• One-step onboarding of all clusters
• Never miss a new cluster
• Multi-cluster and multi-cloud
Cloud-Aware Config. Protection
• Back up your AKS/EKS cluster configuration
• Over 40 configuration properties collected
• Run security scans on your clusters
Cross-Region Compliance
• Back up to another region, cloud or account
• One-click, tamper-proof SafeLock on your backups
Recovery Through Code
• Create an AKS or EKS cluster on the fly
• No need to maintain a standby cluster
Cloud-Aware Backup-as-a-Service for Kubernetes
• 50+ CSI Drivers
• Managed Storage
• CSI Snapshots
• Choice of 20+ Regions
• BYO Storage
• And more…
In the Cloud / On-Premises
• Self Service Backup & Recovery
• Multi-Cluster and Multi-Cloud
• Centralized Discovery & Mgmt
• Proactive Security Scans
• Tamper-Proof Backups
• Cross Region & Cloud Resilience
• App-Aware Templates
Screenshot (Dashboard)
Discovery from Cloud Accounts

Overcoming challenges with protecting and migrating data in multi-cloud K8s environments - DoK Talks #149


Editor's Notes

  1. Not have all eggs in one basket Differences to be aware of, role in what you choose Orienting the rest of these 3 platforms, multi-cloud neutral to multi-cloud environments Integration into cloud providers. All the challenges we solve X% of Kubernetes users are running in Cloud, gist of what we want to communicate in the slide. Everyone has so many different accounts
  2. Want to choose a data protection scheme that doesn’t vendor lock you in to doing data protection a certain way Amazon snapshots Azure snapshots on block/file storage CSI snapshots on prem storage Security concerns (discussion from Sebastian, one per cluster, service account/custom role, bind to one project; GKE (organization-project-cluster, repeat) Discovered cloud agnostic approach allows customer to choose the best managed service to suit their requirements Auto-discovery of all clusters - native cloud services integration Cloud Formation Stack, Azure Resource Manager Easy automated deployment for on-prem. Deploy an agent Manage everything the same way from the Cloud We have multiple cloud accounts in each of these providers, customer is storing their data into each of these platforms. Inventorying is first big problem Cloud Accounts – no idea how many clusters in an environment of that size. Everything created by code, large ISP with 100+ cloud accounts (Concurrency – 30 down to 4 minutes, ) What would it take to perform Velero backup in one cluster. Management Octeto talking In addition to maintaining infrastructure, other concerns you would need to be aware of when using a basic toolset. (streamline processes from an end-user perspective)
  3. Every single Velero install is at a cluster level. Cloud credentials are needed in order to talk to cloud storage. Cloud connectivity is set up separately. Manage schedules and everything else separately: 200 times for 200 clusters. No policy to distribute to all clusters. No centralized monitoring / management: a simple agent that runs on the CLI, with no management plane across clusters, so you do this for every single cluster. And even if you automate, there is no centralized management. Works great if you have a handful of clusters. Just not something that scales with Kubernetes. Open Source Options. Scripting backups and restores. Deployment concerns (Charter anecdote here). …How many clusters do you have and manage? Centralized management.
  6. Working with account – how many clusters (?), how many accounts?
  7. Where is the snapshot? Local to storage copied to a separate location? What will it take to restore the snapshot? Infrastructure requirements,
  8. Mention Proxy servers here in this slide
  9. The 3-2-1 rule is critical when considering backup solutions for recovering from ransomware. DPX can provide you at least three (3) copies of your data, on two (2) different storage media types, with one (1) air-gapped copy offsite or in the cloud: one copy on vStor stored in immutable snapshots, one copy replicated using vStor replication technology, and one air-gapped copy in the cloud or on tape. When ransomware hits, attackers also attempt to remove all backups so that one must pay to get the data back. With at least 2 copies, one in immutable snapshots and one off-site, you can rest easy that the data can be recovered.
  10. AWS inventory, concurrency 30 minutes inventory to 4 minutes. EKS offers no backups, AKS offers preview mode (what does GKE support) 3-2-1 rule in play. Option today is Restic a bit of the pain and Velero now promoting Kopia (each vendor write your own data mover). Velero creating a pluggable data mover for upcoming releases (Sebastian) Performed very poorly with lot of small files (and large files) (Sebastian) Performed poorly with parallelism (which is why we ended up using Kopia)
  11. Let’s face it, backups are a necessary evil… no one wants to do backups, but need to do backups Snapshot or Backup (Copy) Automate the backup environment (Sebastian – REST API, Automation, CICD pipelines (don’t have access to the UI) (Screenshot documentation / swagger / example / api-docs) BYOS – where to backup to Isolated environments that cannot speak to the Internet (Sebastian, private links, Azure/AWS, does not traverse internet)