Srikumar Venugopal
DoK Day Europe 2022 @ KubeCon
Datashim - a framework for declarative management of datasets on Kubernetes
Data Science on Kubernetes
Introducing Datashim
Cloud-native data access abstraction
Open source (LF AI & Data Foundation incubation): https://datashim.io
Operational Flow
Kubeflow Pipeline Example
```yaml
kind: Dataset
metadata:
  name: "my-dataset"
spec:
  local:
    type: "COS"
    accessKeyID: ...
    secretAccessKey: ...
```
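The slide elides the credentials and omits the `apiVersion`. As a minimal sketch, the same Dataset can be generated programmatically as a plain dict. The helper `build_cos_dataset` is hypothetical, and the `endpoint` and `bucket` fields, like the default `apiVersion`, are assumptions to verify against the Dataset CRD installed in your cluster.

```python
from typing import Any, Dict

# Assumption: the Dataset API group/version has varied between Datashim
# releases; confirm it against the installed CRD before applying.
DEFAULT_API_VERSION = "datashim.io/v1alpha1"


def build_cos_dataset(
    name: str,
    access_key_id: str,
    secret_access_key: str,
    endpoint: str,
    bucket: str,
    api_version: str = DEFAULT_API_VERSION,
) -> Dict[str, Any]:
    """Return a Dataset manifest (as a dict) for an S3-compatible (COS) bucket.

    Hypothetical helper: the endpoint and bucket field names are assumptions.
    """
    if not name:
        raise ValueError("a Dataset needs a name")
    return {
        "apiVersion": api_version,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",
                "accessKeyID": access_key_id,
                "secretAccessKey": secret_access_key,
                "endpoint": endpoint,  # assumed field name
                "bucket": bucket,  # assumed field name
            }
        },
    }
```

Because `kubectl apply -f` accepts JSON as well as YAML, `json.dumps(build_cos_dataset(...))` can be written to a file and applied directly.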
```python
import kfp
import kfp.dsl as dsl
from kfp.dsl import PipelineVolume  # KFP v1 SDK; removed in KFP v2
...
def volume_op_dag():
    # Mount the PVC that Datashim created for the "my-dataset" Dataset
    dataset = PipelineVolume(pvc="my-dataset")

    # step1 writes a file into the Dataset
    step1 = dsl.ContainerOp(
        name="step1",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo 1|tee /data/file1"],
        pvolumes={"/data": dataset},
    )

    # step2 reuses step1's volume, which also makes it run after step1
    step2 = dsl.ContainerOp(
        name="step2",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["cp /data/file1 /data/file2"],
        pvolumes={"/data": step1.pvolume},
    )
...
```
Datashim exposes the Dataset as a PVC named my-dataset, which both pipeline steps mount at /data.
Example from: https://github.com/datashim-io/datashim/wiki/PVCs-for-Pipelines-SDK
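Outside Kubeflow, the same PVC can be mounted by any pod. The sketch below builds such a Pod manifest as a Python dict using only standard Kubernetes volume fields; the helper name `pod_mounting_dataset`, the container name and the `ls` command are illustrative assumptions.

```python
from typing import Any, Dict


def pod_mounting_dataset(
    pod_name: str, image: str, dataset: str, mount_path: str = "/data"
) -> Dict[str, Any]:
    """Pod manifest that mounts the PVC backing the Dataset `dataset`.

    Hypothetical helper: relies on the Dataset being exposed as a PVC of the
    same name, so an ordinary persistentVolumeClaim volume reaches the data.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # List the Dataset contents once, then exit
                    "command": ["sh", "-c", f"ls -l {mount_path}"],
                    "volumeMounts": [{"name": "dataset", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "dataset", "persistentVolumeClaim": {"claimName": dataset}}
            ],
        },
    }
```

For example, `pod_mounting_dataset("reader", "library/bash:4.4.23", "my-dataset")` yields a pod that sees the same files the pipeline steps read and write under /data.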
Pipeline Simplification
[Figure: a genomics pipeline before and after Datashim. Before: the inputs (human reference genome, g1k_queries, g1k_genomes) come from FTP and S3 sources and pass through several PVCs, with results stored in S3. After: the human reference genome, g1k_genomes and the results are Datashim Datasets (DS) consumed by Samtools steps, one labelled as a sidecar.]
PVC – Persistent Volume Claim
DS – Datashim Dataset
Y. Gkoufas, D.Y. Yuan, C. Pinto, P. Koutsovasilis, S. Venugopal, "Datashim and Its Applications in Bioinformatics", Proceedings of International Conference on High Performance Computing
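If the figure's labels were reused directly as Dataset names, some would be rejected: Dataset and PVC names must be DNS-1123 subdomains (lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric, at most 253 characters), so `g1k_genomes` is invalid. A hypothetical helper, `to_resource_name`, for deriving valid names from such labels might look like this:

```python
import re

# Any run of characters outside the DNS-1123 subdomain alphabet
_INVALID = re.compile(r"[^a-z0-9.-]+")
# DNS-1123 subdomain: dot-separated labels, each alphanumeric at both ends
_VALID = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$")
MAX_LEN = 253


def to_resource_name(label: str) -> str:
    """Map a free-form label to a DNS-1123 subdomain, or raise ValueError."""
    name = _INVALID.sub("-", label.strip().lower())
    # Trim separators from both ends, truncate, then trim again
    name = name.strip("-.")[:MAX_LEN].strip("-.")
    if not name or not _VALID.match(name):
        raise ValueError(f"cannot derive a valid name from {label!r}")
    return name
```

With this mapping, "human reference genome" becomes `human-reference-genome` and `g1k_genomes` becomes `g1k-genomes`. Failing loudly on labels it cannot repair (for example, "a..b") seems safer than silently inventing a name.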
Declarative Caching
P. Koutsovasilis, S. Venugopal, Y. Gkoufas and C. Pinto, "A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning," in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD)
Roadmap
Ephemeral volume support for S3
Integration with COSI (when finalised)
Auto-discovery of CSI implementation capabilities
Support for more frameworks (Tekton, Flyte)
Focus on observability (design phase)
Acknowledgments
Yiannis Gkoufas
Christian Pinto
Panagiotis (Panos) Koutsovasilis
and many other contributors