Datashim - a framework for declarative management of datasets on Kubernetes

Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as the Container Storage Interface (CSI) have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access to run their pipelines on Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a Kubernetes custom resource (defined through a CustomResourceDefinition) that represents a source of data. Datashim takes care of the details of data access, so Kubernetes pods can access the data declaratively by referencing a Dataset in their specifications. This talk describes Datashim and the Dataset object, discusses its use in ML pipelines, and shows how its pluggable architecture supports the development of caching, scheduling and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation. The talk was given by Srikumar Venugopal at DoK Day Europe @ KubeCon 2022.
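To make the declarative model concrete, here is a minimal sketch in Python (the language of the pipeline example later in the deck) that builds the two objects involved as plain dictionaries: a Dataset pointing at an S3-compatible (COS) bucket, and a Pod that asks for that Dataset through labels alone. The `apiVersion`, the `endpoint`/`bucket` fields and the `dataset.0.id`/`dataset.0.useas` label keys are assumptions based on our reading of the Datashim README, not taken from the slides; check them against the release you run. All endpoint, bucket and credential values are placeholders.

```python
import json

# Assumption: API group/version used by Datashim releases around 2022;
# newer releases may use a different group, so check your installed CRD.
DATASET_API_VERSION = "com.ie.ibm.hpsys/v1alpha1"


def dataset_manifest(name: str, endpoint: str, bucket: str,
                     access_key_id: str, secret_access_key: str) -> dict:
    """Return a Dataset custom resource for an S3-compatible (COS) bucket."""
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",
                "accessKeyID": access_key_id,
                "secretAccessKey": secret_access_key,
                # Assumed field names for the bucket location.
                "endpoint": endpoint,
                "bucket": bucket,
            }
        },
    }


def pod_using_dataset(pod_name: str, dataset_name: str,
                      image: str = "nginx") -> dict:
    """Return a Pod that requests a Dataset through labels only.

    The pod spec never names a volume or PVC; the (assumed) label keys
    tell Datashim which Dataset to attach and how to expose it.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "labels": {
                "dataset.0.id": dataset_name,
                "dataset.0.useas": "mount",
            },
        },
        "spec": {"containers": [{"name": "app", "image": image}]},
    }


if __name__ == "__main__":
    # Placeholder endpoint, bucket and credentials.
    items = [
        dataset_manifest("example-dataset", "https://s3.example.com",
                         "my-bucket", "<access-key-id>", "<secret-access-key>"),
        pod_using_dataset("reader", "example-dataset"),
    ]
    # Wrap both objects in a v1 List so the output is a single manifest.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items},
                     indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. The design point is that the Pod carries only a reference to the Dataset; in Datashim's design as we understand it, an admission webhook reads those labels and adds the actual volume, so credentials and storage details stay in the Dataset object.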

Srikumar Venugopal
DoK Day Europe 2022 @ KubeCon
Datashim - a framework for declarative management of datasets on Kubernetes

Data Science on Kubernetes

Introducing Datashim
Cloud-native data access abstraction
Open source (LF AI & Data Foundation incubation project): https://datashim.io

Operational Flow

Kubeflow Pipeline Example

    kind: Dataset
    metadata:
      name: "my-dataset"
    spec:
      local:
        type: "COS"
        accessKeyID: ...
        secretAccessKey: ...

    import kfp
    import kfp.dsl as dsl
    from kfp.dsl import PipelineVolume
    ...
    def volume_op_dag():
        dataset = PipelineVolume("my-dataset")
        step1 = dsl.ContainerOp(
            name="step1",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["echo 1|tee /data/file1"],
            pvolumes={"/data": dataset}
        )
        step2 = dsl.ContainerOp(
            name="step2",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["cp /data/file1 /data/file2"],
            pvolumes={"/data": step1.pvolume}
        )
    ...

PVC: my-dataset

Example from: https://github.com/datashim-io/datashim/wiki/PVCs-for-Pipelines-SDK
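The "PVC: my-dataset" annotation reflects that Datashim exposes each Dataset as a PVC with the same name, so each pipeline step ends up as a pod that mounts an ordinary PVC. The sketch below, in plain Python with no kfp dependency, builds roughly that pod spec for the two steps. The helper names, the volume name `data` and `restartPolicy: Never` are illustrative choices; the exact spec KFP generates will differ in its details.

```python
def pvc_volume(claim_name: str, mount_path: str, volume_name: str = "data"):
    """Return a (volume, volumeMount) pair that mounts an existing PVC."""
    volume = {"name": volume_name,
              "persistentVolumeClaim": {"claimName": claim_name}}
    mount = {"name": volume_name, "mountPath": mount_path}
    return volume, mount


def step_pod(name: str, shell_cmd: str, claim_name: str,
             mount_path: str = "/data",
             image: str = "library/bash:4.4.23") -> dict:
    """Approximate the pod a ContainerOp with pvolumes={mount_path: ...} runs."""
    volume, mount = pvc_volume(claim_name, mount_path)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                "command": ["sh", "-c"],
                "args": [shell_cmd],
                "volumeMounts": [mount],
            }],
            "volumes": [volume],
        },
    }


# Both steps mount the PVC that Datashim created for the Dataset "my-dataset".
step1 = step_pod("step1", "echo 1|tee /data/file1", "my-dataset")
step2 = step_pod("step2", "cp /data/file1 /data/file2", "my-dataset")
```

What this sketch does not capture is ordering: in the KFP version, passing `step1.pvolume` to step2 makes step2 wait for step1, which is why step2 can read `/data/file1`. Two standalone pods like these would need that dependency enforced by the pipeline engine.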

Pipeline Simplification
[Figure: a genomics pipeline before and after Datashim. Before: the inputs (human reference genome, g1k_queries, g1k_genomes) arrive over FTP and S3 and are handed between steps through several PVCs, with results written to S3. After: the human reference genome, g1k_genomes and results are Datashim Datasets (DS) that the Samtools steps and a sidecar access directly.]
PVC – Persistent Volume Claim
DS – Datashim Dataset
Y. Gkoufas, D.Y. Yuan, C. Pinto, P. Koutsovasilis, S. Venugopal, "Datashim and Its Applications in Bioinformatics", Proceedings of International Conference on High Performance Computing.
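In the simplified pipeline, each step names the Datasets it needs instead of wiring up FTP downloads, S3 copies and intermediate PVCs. A minimal sketch of how a step's pod labels could be generated for several Datasets follows. It assumes Datashim's indexed `dataset.<n>.id` / `dataset.<n>.useas` label convention and a default mount path of `/mnt/datasets/<name>`, both as we recall them from the project README; verify them for your release. The Dataset names are hypothetical, hyphenated versions of the figure's labels, because Kubernetes object names cannot contain underscores.

```python
import re

# Conservative name check: a DNS-1123 label (lowercase alphanumerics and '-',
# starting and ending with an alphanumeric, at most 63 characters).
_NAME_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def dataset_labels(dataset_names, use_as="mount"):
    """Build indexed Datashim-style pod labels for a list of Dataset names."""
    labels = {}
    for index, name in enumerate(dataset_names):
        if len(name) > 63 or not _NAME_RE.fullmatch(name):
            raise ValueError(f"invalid Dataset name: {name!r}")
        labels[f"dataset.{index}.id"] = name
        labels[f"dataset.{index}.useas"] = use_as
    return labels


def mount_paths(dataset_names):
    """Assumed default mount location for useas=mount; verify for your release."""
    return {name: f"/mnt/datasets/{name}" for name in dataset_names}


if __name__ == "__main__":
    # Hypothetical Dataset names for one Samtools step.
    step_datasets = ["human-reference-genome", "g1k-genomes", "results"]
    print(dataset_labels(step_datasets))
    print(mount_paths(step_datasets))
```

Validating names up front surfaces the underscore problem when the pipeline is defined rather than when the API server rejects the Dataset.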

Declarative Caching
P. Koutsovasilis, S. Venugopal, Y. Gkoufas and C. Pinto, "A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning," in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD).

Roadmap
Ephemeral volume support for S3
Integration with COSI (when finalised)
Auto-discovery of CSI implementation capabilities
Support for more frameworks (Tekton, Flyte)
Focus on observability (Design phase)

Acknowledgments
Yiannis Gkoufas
Christian Pinto
Panagiotis (Panos) Koutsovasilis
and many other contributors

StatefulSets in K8s - DoK Talks #154

Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...

Analytics with Apache Superset and ClickHouse - DoK Talks #151

Overcoming challenges with protecting and migrating data in multi-cloud K8s e...

Evaluating Cloud Native Storage Vendors - DoK Talks #147

Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...

We will Dok You! - The journey to adopt stateful workloads on k8s

Mastering MongoDB on Kubernetes, the power of operators

Recently uploaded

Data Cloud, More than a CDP by Matt Robison

Anna Loughnan Colquhoun

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024

The Digital Insurer

Imagine a world where information flows as swiftly as thought itself, making decision-making as fluid as the data driving it. Every moment is critical, and the right tools can significantly boost your organization’s performance. The power of real-time data automation through FME can turn this vision into reality. Aimed at professionals eager to leverage real-time data for enhanced decision-making and efficiency, this webinar will cover the essentials of real-time data and its significance. We’ll explore: FME’s role in real-time event processing, from data intake and analysis to transformation and reporting An overview of leveraging streams vs. automations FME’s impact across various industries highlighted by real-life case studies Live demonstrations on setting up FME workflows for real-time data Practical advice on getting started, best practices, and tips for effective implementation Join us to enhance your skills in real-time data automation with FME, and take your operational capabilities to the next level.

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

Safe Software

Axa Assurance Maroc - Insurer Innovation Award 2024

The Digital Insurer

Partners Life - Insurer Innovation Award 2024

The Digital Insurer

Increase engagement and revenue with Muvi Live Paywall! In this presentation, we will explore the five key benefits of using Muvi Live Paywall to monetize your live streams. You'll learn how Muvi Live Paywall can help you: Monetize your live content easily: Set up pay-per-view access to your live streams and start generating revenue from your content. Increase audience engagement: Provide exclusive, premium content behind the paywall to keep your viewers engaged. Gain valuable viewer insights: Track viewer data and analytics to better understand your audience and tailor your content accordingly. Reduce content piracy: Muvi Live Paywall's security features help protect your content from unauthorized distribution. Streamline your workflow: The all-in-one platform simplifies the process of managing and monetizing your live streams. With Muvi Live Paywall, you can take control of your live stream monetization and create a sustainable business model for your content. Learn more about Muvi Live Paywall and start generating revenue from your live streams today!

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams

Roshan Dwivedi

MySQL Webinar, presented on the 25th of April, 2024. Summary: MySQL solutions enable the deployment of diverse Database Architectures tailored to specific needs, including High Availability, Disaster Recovery, and Read Scale-Out. With MySQL Shell's AdminAPI, administrators can seamlessly set up, manage, and monitor these solutions, ensuring efficiency and ease of use in their administration. MySQL Router, on the other hand, provides transparent routing from the application traffic to the backend servers in the architectures, requiring minimal configuration. Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises of all sizes for their business-critical applications. In this presentation, we'll delve into various database architecture solutions to help you choose the right one based on your business requirements. Focusing on technical details and the latest features to maximize the potential of these solutions.

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Miguel Araújo

Strategies for Landing an Oracle DBA Job as a Fresher

Remote DBA Services

Webinar Recording: https://www.panagenda.com/webinars/why-teams-call-analytics-is-critical-to-your-entire-business Nothing is as frustrating and noticeable as being in an important call and being unable to see or hear the other person. Not surprising then, that issues with Teams calls are among the most common problems users call their helpdesk for. Having in depth insight into everything relevant going on at the user’s device, local network, ISP and Microsoft itself during the call is crucial for good Microsoft Teams Call quality support. To ensure a quick and adequate solution and to ensure your users get the most out of their Microsoft 365. But did you know that ‘bad calls’ are also an excellent indicator of other problems arising? Precisely because it is so noticeable!? Like the canary in the mine, bad calls can be early indicators of problems. Problems that might otherwise not have been noticed for a while but can have a big impact on productivity and satisfaction. Join this session by Christoph Adler to learn how true Microsoft Teams call quality analytics helped other organizations troubleshoot bad calls and identify and fix problems that impacted Teams calls or the use of Microsoft365 in general. See what it can do to keep your users happy and productive! In this session we will cover - Why CQD data alone is not enough to troubleshoot call problems - The importance of attributing call problems to the right call participant - What call quality analytics can do to help you quickly find, fix-, and prevent problems - Why having retrospective detailed insights matters - Real life examples of how others have used Microsoft Teams call quality monitoring to problem shoot problems with their ISP, network, device health and more.

Why Teams call analytics are critical to your entire business

panagenda

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Rafal Los

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

Following the popularity of "Cloud Revolution: Exploring the New Wave of Serverless Spatial Data," we're thrilled to announce this much-anticipated encore webinar. In this sequel, we'll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support for these new formats, including COGs, COPC, FlatGeoBuf, GeoParquet, STAC, and ZARR. Building on the foundation laid by industry leaders Michelle Roby of Radiant Earth and Chris Holmes of Planet in the first webinar, this second part offers an in-depth look at the real-world application and behind-the-scenes dynamics of these cutting-edge formats. We will spotlight specific use-cases and workflows, showcasing their efficiency and relevance in practical scenarios. Discover the vast possibilities each format holds, highlighted through detailed discussions and demonstrations. Our expert speakers will dissect the key aspects and provide critical takeaways for effective use, ensuring attendees leave with a thorough understanding of how to apply these formats in their own projects. Elevate your understanding of how FME supports these cutting-edge technologies, enhancing your ability to manage, share, and analyze spatial data. Whether you're building on knowledge from our initial session or are new to the serverless spatial data landscape, this webinar is your gateway to mastering cloud-native formats in your workflows.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Safe Software

Building Digital Trust in a Digital Economy Veronica Tan, Director - Cyber Security Agency of Singapore Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

apidays

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

Product Anonymous

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Martijn de Jong

presentation ICT roal in 21st century education

jfdjdjcjdnsjd

AWS Community Day CPH - Three problems of Terraform

Andrey Devyatkin

With more memory available, system performance of three Dell devices increased, which can translate to a better user experience Conclusion When your system has plenty of RAM to meet your needs, you can efficiently access the applications and data you need to finish projects and to-do lists without sacrificing time and focus. Our test results show that with more memory available, three Dell PCs delivered better performance and took less time to complete the Procyon Office Productivity benchmark. These advantages translate to users being able to complete workflows more quickly and multitask more easily. Whether you need the mobility of the Latitude 5440, the creative capabilities of the Precision 3470, or the high performance of the OptiPlex Tower Plus 7010, configuring your system with more RAM can help keep processes running smoothly, enabling you to do more without compromising performance.

Boost PC performance: How more available memory can improve productivity

Principled Technologies

💉💊+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI}}+971581248768 +971581248768 Mtp-Kit (500MG) Prices » Dubai [(+971581248768**)] Abortion Pills For Sale In Dubai, UAE, Mifepristone and Misoprostol Tablets Available In Dubai, UAE CONTACT DR.Maya Whatsapp +971581248768 We Have Abortion Pills / Cytotec Tablets /Mifegest Kit Available in Dubai, Sharjah, Abudhabi, Ajman, Alain, Fujairah, Ras Al Khaimah, Umm Al Quwain, UAE, Buy cytotec in Dubai +971581248768''''Abortion Pills near me DUBAI | ABU DHABI|UAE. Price of Misoprostol, Cytotec” +971581248768' Dr.DEEM ''BUY ABORTION PILLS MIFEGEST KIT, MISOPROTONE, CYTOTEC PILLS IN DUBAI, ABU DHABI,UAE'' Contact me now via What's App…… abortion Pills Cytotec also available Oman Qatar Doha Saudi Arabia Bahrain Above all, Cytotec Abortion Pills are Available In Dubai / UAE, you will be very happy to do abortion in Dubai we are providing cytotec 200mg abortion pill in Dubai, UAE. Medication abortion offers an alternative to Surgical Abortion for women in the early weeks of pregnancy. We only offer abortion pills from 1 week-6 Months. We then advise you to use surgery if its beyond 6 months. Our Abu Dhabi, Ajman, Al Ain, Dubai, Fujairah, Ras Al Khaimah (RAK), Sharjah, Umm Al Quwain (UAQ) United Arab Emirates Abortion Clinic provides the safest and most advanced techniques for providing non-surgical, medical and surgical abortion methods for early through late second trimester, including the Abortion By Pill Procedure (RU 486, Mifeprex, Mifepristone, early options French Abortion Pill), Tamoxifen, Methotrexate and Cytotec (Misoprostol). The Abu Dhabi, United Arab Emirates Abortion Clinic performs Same Day Abortion Procedure using medications that are taken on the first day of the office visit and will cause the abortion to occur generally within 4 to 6 hours (as early as 30 minutes) for patients who are 3 to 12 weeks pregnant. 
When Mifepristone and Misoprostol are used, 50% of patients complete in 4 to 6 hours; 75% to 80% in 12 hours; and 90% in 24 hours. We use a regimen that allows for completion without the need for surgery 99% of the time. All advanced second trimester and late term pregnancies at our Tampa clinic (17 to 24 weeks or greater) can be completed within 24 hours or less 99% of the time without the need surgery. The procedure is completed with minimal to no complications. Our Women's Health Center located in Abu Dhabi, United Arab Emirates, uses the latest medications for medical abortions (RU-486, Mifeprex, Mifegyne, Mifepristone, early options French abortion pill), Methotrexate and Cytotec (Misoprostol). The safety standards of our Abu Dhabi, United Arab Emirates Abortion Doctors remain unparalleled. They consistently maintain the lowest complication rates throughout the nation. Our Physicians and staff are always available to answer questions and care for women in one of the most difficult times in their lives. The decision to have an abortion at the Abortion Cl

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@

Artificial Intelligence Chap.5 : Uncertainty

Khushali Kathiriya

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

Axa Assurance Maroc - Insurer Innovation Award 2024

Partners Life - Insurer Innovation Award 2024

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Strategies for Landing an Oracle DBA Job as a Fresher

Why Teams call analytics are critical to your entire business

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Boost Fertility New Invention Ups Success Rates.pdf

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

2024: Domino Containers - The Next Step. News from the Domino Container commu...

presentation ICT roal in 21st century education

AWS Community Day CPH - Three problems of Terraform

Boost PC performance: How more available memory can improve productivity

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

Artificial Intelligence Chap.5 : Uncertainty


1. Datashim - a framework for declarative management of datasets on Kubernetes
Srikumar Venugopal, DoK Day Europe 2022 @ KubeCon

2. Data Science on Kubernetes

3. Introducing Datashim
Cloud-native data access abstraction
Open-Source (LF Data and AI Foundation Incubation): https://datashim.io
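A Dataset is an ordinary Kubernetes custom resource, so it can be generated programmatically like any other manifest. The sketch below builds one for an S3/COS bucket as a plain Python dict. Only `type`, `accessKeyID` and `secretAccessKey` appear in this talk; the `apiVersion` string and the `endpoint`, `bucket` and `readonly` fields are assumptions based on the project's public examples, so check them against the CRD installed in your cluster.

```python
import json

# Assumption: the CRD group/version. Datashim has published its Dataset CRD under
# different groups over time (e.g. "com.ie.ibm.hpsys/v1alpha1", later
# "datashim.io/v1alpha1"); use whatever `kubectl get crd` shows in your cluster.
DATASET_API_VERSION = "datashim.io/v1alpha1"


def make_cos_dataset(name, endpoint, bucket, access_key_id, secret_access_key,
                     readonly=True):
    """Build a Dataset custom resource (as a plain dict) for an S3/COS bucket.

    `endpoint`, `bucket` and `readonly` are assumed field names; the slide only
    shows `type`, `accessKeyID` and `secretAccessKey`.
    """
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",
                "accessKeyID": access_key_id,
                "secretAccessKey": secret_access_key,
                "endpoint": endpoint,
                "bucket": bucket,
                # assumed to be a string flag rather than a YAML boolean
                "readonly": "true" if readonly else "false",
            }
        },
    }


if __name__ == "__main__":
    manifest = make_cos_dataset(
        name="my-dataset",
        endpoint="https://s3.example.com",  # hypothetical endpoint
        bucket="my-bucket",                 # hypothetical bucket
        access_key_id="<ACCESS_KEY_ID>",
        secret_access_key="<SECRET_ACCESS_KEY>",
    )
    # kubectl accepts JSON manifests, e.g. `kubectl apply -f dataset.json`
    print(json.dumps(manifest, indent=2))
```

In practice the credentials would usually come from a Kubernetes Secret rather than being written inline; they are inline here only to mirror the slide.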

4. Operational Flow
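The Kubeflow example that follows shows a Dataset named "my-dataset" being consumed through a PVC of the same name. Assuming that naming holds, any pod can reach the data with a standard `persistentVolumeClaim` volume, with no storage-specific details in its spec. The helper below is a hypothetical sketch that builds such a Pod manifest; the pod name, container name and mount path are illustrative choices.

```python
import json


def pod_mounting_dataset(pod_name, image, dataset_name, mount_path="/data"):
    """Build a Pod that mounts the PVC backing a Datashim Dataset.

    Assumes the PVC carries the Dataset's name, as in the "PVC: my-dataset"
    annotation of the Kubeflow example.
    """
    volume_name = "dataset-volume"  # arbitrary; only needs to match the mount below
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "reader",
                "image": image,
                # list the mounted data, then exit
                "command": ["sh", "-c", "ls " + mount_path],
                "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": volume_name,
                "persistentVolumeClaim": {"claimName": dataset_name},
            }],
        },
    }


if __name__ == "__main__":
    pod = pod_mounting_dataset("list-data", "library/bash:4.4.23", "my-dataset")
    print(json.dumps(pod, indent=2))
```

The point of the design is that the pod only names the Dataset; endpoints and credentials stay in the Dataset object.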

5. Kubeflow Pipeline Example

A Dataset backed by a Cloud Object Storage (COS) bucket:

    kind: Dataset
    metadata:
      name: "my-dataset"
    spec:
      local:
        type: "COS"
        accessKeyID: ...
        secretAccessKey: ...

A Kubeflow Pipelines (SDK v1) pipeline whose steps share that Dataset through its PVC:

    import kfp
    import kfp.dsl as dsl
    from kfp.dsl import PipelineVolume
    ...
    def volume_op_dag():
        # Refer to the PVC that Datashim created for the Dataset "my-dataset"
        dataset = PipelineVolume("my-dataset")
        step1 = dsl.ContainerOp(
            name="step1",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["echo 1|tee /data/file1"],
            pvolumes={"/data": dataset}
        )
        # Mount the volume via step1.pvolume, which also makes step2 run after step1
        step2 = dsl.ContainerOp(
            name="step2",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["cp /data/file1 /data/file2"],
            pvolumes={"/data": step1.pvolume}
        )
        ...

PVC: my-dataset
Example from: https://github.com/datashim-io/datashim/wiki/PVCs-for-Pipelines-SDK

6. Pipeline Simplification
[Figure: a bioinformatics pipeline whose inputs (human reference genome, g1k_queries, g1k_genomes) come from FTP and S3 and are staged through several PVCs, with results written to S3; alongside it, the simplified pipeline in which the inputs and results are accessed as Datashim Datasets (DS). Diagram labels also include Samtools and Sidecar.]
PVC: Persistent Volume Claim; DS: Datashim Dataset
Y. Gkoufas, D.Y. Yuan, C. Pinto, P. Koutsovasilis, S. Venugopal, "Datashim and Its Applications in Bioinformatics", Proceedings of International Conference on High Performance Computing
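One way to read the simplified pipeline is that each step just declares which Datasets it needs, instead of wiring one PVC per input by hand. The sketch below assumes Datashim's pod-label convention from its public examples (`dataset.<i>.id` naming the Dataset and `dataset.<i>.useas: mount`), with each Dataset then mounted by Datashim (assumed path `/mnt/datasets/<name>`). The Dataset names are hypothetical, adapted from the figure's labels to valid Kubernetes names, and the image is a placeholder.

```python
import json
from typing import Dict, List


def dataset_labels(dataset_names: List[str], use_as: str = "mount") -> Dict[str, str]:
    """Index-numbered pod labels asking Datashim to attach each Dataset.

    Assumption: the `dataset.<i>.id` / `dataset.<i>.useas` convention from
    Datashim's examples; check the README of the version you deploy.
    """
    labels: Dict[str, str] = {}
    for index, name in enumerate(dataset_names):
        labels[f"dataset.{index}.id"] = name
        labels[f"dataset.{index}.useas"] = use_as
    return labels


def samtools_step(pod_name: str, image: str, dataset_names: List[str]) -> dict:
    """A pipeline step that only declares its Datasets; it defines no volumes itself."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": dataset_labels(dataset_names)},
        "spec": {
            "restartPolicy": "Never",
            # Assumption: each Dataset is mounted under /mnt/datasets/<name>
            "containers": [{"name": "samtools", "image": image,
                            "command": ["sh", "-c", "ls /mnt/datasets"]}],
        },
    }


if __name__ == "__main__":
    # Hypothetical Dataset names modelled on the figure's inputs and outputs
    step = samtools_step("samtools-step", "<samtools-image>",
                         ["human-reference-genome", "g1k-genomes", "results"])
    print(json.dumps(step, indent=2))
```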

7. Declarative Caching
P. Koutsovasilis, S. Venugopal, Y. Gkoufas and C. Pinto, "A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning," in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD)
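The slide gives only the title and a reference. One reading of "declarative caching" is that users still just declare a Dataset, while the framework decides to serve reads through a cache close to the compute. The sketch below is a generic read-through cache that illustrates that behaviour; it is not Datashim's caching plugin, and the remote store is simulated with a dict.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Generic read-through cache in front of a slower remote object store.

    Illustration only: this is not Datashim's caching implementation.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # e.g. a GET against the bucket
        self._local: Dict[str, bytes] = {}  # stands in for a cluster-local cache
        self.remote_fetches = 0

    def read(self, key: str) -> bytes:
        # Miss: pull the object from the remote store once and keep a copy.
        if key not in self._local:
            self._local[key] = self._fetch_remote(key)
            self.remote_fetches += 1
        # Hit (or freshly filled): serve from the local copy.
        return self._local[key]


if __name__ == "__main__":
    bucket = {"genome.fa": b"ACGT"}       # simulated remote bucket
    cache = ReadThroughCache(lambda key: bucket[key])
    for _ in range(3):                    # three pipeline steps read the same object
        cache.read("genome.fa")
    print(cache.remote_fetches)           # 1
```

Because consumers only call `read`, the cache can be added or removed without changing them, which is the property a declarative caching layer aims for.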

8. Roadmap
- Ephemeral volume support for S3
- Integration with COSI (when finalised)
- Auto-discovery of CSI implementation capabilities
- Support for more frameworks (Tekton, Flyte)
- Focus on observability (design phase)

9. Acknowledgments
Yiannis Gkoufas, Christian Pinto, Panagiotis (Panos) Koutsovasilis and many other contributors
