Introduction to Container Storage Interface (CSI) - Idan Atias
Among the cool stuff we do at Silk, my colleagues and I develop the Silk CSI Plugin for customers who use our system as the storage layer for their Kubernetes workloads.
Before deep diving into the code, and as part of my ramp-up on this subject, I prepared some slides that cover some basic and important information on the topic.
These slides start by recapping basic storage principles in containers and Kubernetes, continue with some more advanced use cases (including an "offline demo" of persisting Redis data on EBS volumes), and end with detailed information on the CSI solution itself.
IMHO, reviewing these slides can improve your understanding of this topic and get you started implementing your own CSI plugin.
The main sources of information I used for preparing these slides are:
* Official CSI docs
* Kubernetes Storage Lingo 101 - Saad Ali, Google
* Container Storage Interface: Present and Future - Jie Yu, Mesosphere, Inc.
In Pravega's first community meeting as a CNCF project, we overviewed experimental features of Pravega:
* Schema Registry - preserving the structure of data in an unstructured storage system and controlling for safe schema evolution
* Consumption-Based Retention - stream truncation based on subscriber positions
* Simplified Long-Term Storage (SLTS) - abstracting the distributed management of segments while removing complicated problems such as fencing
* SLTS Plugin for BookKeeper - an implementation of the SLTS interfaces for BlobIt! object stores on BookKeeper: https://github.com/diegosalvi/pravega-blobit-chunkmanager
StorPool presents at Cloud Field Day - the leading technology event focused on the impact of cloud technologies on enterprise IT. During the event, the high-performance block storage specialist will showcase how its storage technology allows cloud builders to easily outperform cloud titans like AWS, Microsoft Azure and GCP.
Performance is of major importance for modern applications and workloads. Whether you run a private cloud or deliver public cloud services for customers, you need to ensure excellent performance for the workloads running on the cloud. Often misunderstood, storage has a direct impact not only on the reliability of cloud services, but also on the performance of the entire cloud.
https://storpool.com/news/storpool-presents-at-cloud-field-day-9
How to power microservices with MariaDB - MariaDB plc
Adoption of microservices is continuing at a rapid pace, but many deployments struggle when it comes to the database topology and data modeling. This session covers the pros and cons of different approaches (e.g., giving every microservice its own database or its own schema on a shared database) and various strategies for providing a consolidated view of data when different data is managed by different microservices.
How to Protect Big Data in a Containerized Environment - BlueData, Inc.
Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. Transparent Data Encryption refers to the process whereby data is transparently encrypted by the big data application writing the data; it is not decrypted again until it is accessed by another application. The data is encrypted during its entire lifespan—in transit and at rest—except when it is being specifically accessed by a processing application.
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session at the Strata Data Conference in March 2018 (by Thomas Phelan, co-founder and chief architect at BlueData) offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments.
You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63763
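For readers who want to see what the configuration described above looks like in practice, here is a minimal, hedged sketch of creating an HDFS encryption zone. It assumes a Hadoop cluster with a KMS already configured and the standard hadoop/hdfs CLIs on the PATH; the key name and zone path are illustrative only and are not taken from the talk.

import subprocess

def run(cmd):
    # Run a CLI command and fail loudly if it errors out.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_encryption_zone(key_name="demo-key", zone_path="/data/encrypted"):
    # 1. Create an encryption key in the KMS (requires a configured key provider).
    run(["hadoop", "key", "create", key_name])
    # 2. The zone directory must exist (and be empty) before the zone is created.
    run(["hdfs", "dfs", "-mkdir", "-p", zone_path])
    # 3. Mark the directory as an encryption zone backed by that key.
    run(["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path])
    # 4. Verify; files written under zone_path are now transparently encrypted.
    run(["hdfs", "crypto", "-listZones"])

if __name__ == "__main__":
    create_encryption_zone()

In a Kerberized, cross-realm setup like the one the session discusses, these commands would additionally require valid credentials (kinit) for a principal authorized in both the KMS ACLs and HDFS.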
Tech Preview: Kubernetes on Mesosphere DC/OS 1.10 - Mesosphere Inc.
Kubernetes is an amazing technology, but getting it up and running in your data center or VMs is challenging. Mesosphere is excited to deliver Kubernetes on DC/OS 1.10, bringing you point-and-click simplicity for container orchestration on your choice of infrastructure, on-premise or cloud.
These slides discuss the benefits of container orchestrators and answer frequently asked questions. Topics include:
1. Live demo showing how to deploy and manage a 100% pure Kubernetes distribution on DC/OS
2. How to run multiple Kubernetes clusters (of different versions) alongside each other
3. How to run both stateless and stateful workloads on the same infrastructure
4. Live Q&A
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS - John Burwell
Software is eating infrastructure. By pulling reliability and scalability responsibilities up the stack from hardware into software, object stores such as Basho's Riak CS and cloud orchestration platforms such as Apache CloudStack increase the utilization of compute and storage resources by dynamically shifting workloads based on demand. Together, those platforms can saturate compute and storage of 1000s of hosts with strong operational visibility and end-user self-service.
This talk will cover the following topics to explore private cloud design principles and best practices:
* Why Private Cloud?
* Anatomy of a Private Cloud
* Building an Apache CloudStack Compute Offering
* Large Object Storage using Riak CS
O'Reilly Webcast: Architecting Applications For The Cloud - O'Reilly Media
This presentation analyzes aspects of the Amazon EC2 IaaS cloud environment that differ from a traditional data center and introduces general best practices for ensuring data privacy, storage persistence, and reliable DBMS backup. Presented by Jorge Noa, CTO of Hyperstratus
Azure Virtual Machines Deployment Scenarios - Brian Benz
Architecture and scenarios for deploying database and middleware applications on Azure Virtual Machines, including SQL Server, Oracle, Hadoop, and others.
Cloudian HyperStore offers 100% S3 compatibility for low-cost, scalable smart object storage.
With HyperStore 6.0, we are focused on bringing down operational costs so that you can more effectively track, manage, and optimize your data storage as you scale.
Data is as critical as ever. Storage costs are lower but we have more and more data to store. This is where Microsoft Azure Data Storage solutions come in. This slide deck provides an overview of the most important data storage options available in Azure.
Note: I did not create this deck. I instead combined slides from the Microsoft Azure-Readiness/DevCamp repo on GitHub (https://github.com/Azure-Readiness/DevCamp) while adding additional material from a slide deck of David Chappell's.
This talk was given at Cloud Camp Kitchener 2015.
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas (OpenStack)
Audience Level
All levels
Synopsis
OpenStack offers many advantages for organisations building out their cloud environments, including flexibility and community-driven innovation. However, enterprises looking to deploy OpenStack in production typically find its storage management capabilities wanting from the perspective of management complexity and business resiliency. Enterprises are also challenged when it comes to ensuring protection of their data and providing the necessary performance – especially for their tier one applications. Meeting these fundamental needs is critical for enterprises to proceed confidently with their OpenStack deployments.
Veritas HyperScale for OpenStack is a software-defined storage management solution uniquely developed for OpenStack-based clouds. It leverages direct-attached storage (DAS) and provides enterprise-strength capabilities that enable robust, production-scale deployment while meeting performance and data protection needs. Learn how this innovative solution, coupled with other relevant Veritas offerings, solves the remaining issues around implementing OpenStack within the enterprise.
Speaker Bio:
Tarso dos Santos works as a Technical Account Manager at Veritas, directly engaging with customers to develop strategies, architectures and solutions with focus on Cloud – Openstack, Containers, Data Protection, High Availability and Compliance.
He has over 21 years in the IT industry architecting, delivering, and positioning solutions such as private clouds, distributed systems, HPC, storage, and highly available platforms.
Tarso has a great interest in distributed systems performance and in scientific organizations that push the boundaries of existing technologies but also need to link them into the enterprise.
In his career, Tarso has enjoyed working on some amazing projects, ranging from mission-critical systems protecting Australian lives to IT infrastructure projects that look at the sky and discover new planets out in space.
OpenStack Australia Day Melbourne 2017
https://events.aptira.com/openstack-australia-day-melbourne-2017/
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje... - Maginatics
How did Maginatics build a strongly consistent and secure distributed file system? Niraj Tolia, Chief Architect at Maginatics, gave this presentation on the design of MagFS at the Storage Developer Conference on September 16, 2013.
For more information about MagFS—The File System for the Cloud, visit maginatics.com or contact us directly at info@maginatics.com.
Solving enterprise challenges through scale out storage & big compute final - Avere Systems
Google Cloud Platform, Avere Systems, and Cycle Computing experts will share best practices for advancing solutions to big challenges faced by enterprises with growing compute and storage needs. In this “best practices” webinar, you’ll hear how these companies are working to improve results that drive businesses forward through scalability, performance, and ease of management.
These slides are from a webinar presented on January 24, 2017. The audience learned:
- How enterprises are using Google Cloud Platform to gain compute and storage capacity on-demand
- Best practices for efficient use of cloud compute and storage resources
- Overcoming the need for file systems within a hybrid cloud environment
- Understand how to eliminate latency between cloud and data center architectures
- Learn how to best manage simulation, analytics, and big data workloads in dynamic environments
- Look at market dynamics drawing companies to new storage models over the next several years
The presenters laid out a foundation for building infrastructure that supports ongoing demand growth.
Slides: Accelerating Queries on Cloud Data Lakes - DATAVERSITY
Using “zero-copy” hybrid bursting on remote data to solve data lake analytics capacity and performance problems.
Data scientists want answers on demand. But in today’s enterprise architectures, the reality is that most data remains on-prem, despite the promise of cloud-based analytics. Moving all that data to the cloud has typically not been possible for many reasons including cost, latency, and technical difficulty. So, what if there was a technology that would connect these on-prem environments to any major cloud platform, enabling high-powered computing without the need to move massive amounts of data?
Join us for this webinar where Alex Ma of Alluxio, an open-source data orchestration platform, will discuss how a data orchestration approach offers a solution for connecting traditional on-prem data centers and cloud data lakes with other clouds and data centers. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers and data lakes with computing frameworks in other locations, enabling them to offload, compute, and leverage the flexibility, scalability, and power of the cloud for their remote data.
Caching for Microservices Architectures: Session I - VMware Tanzu
In this 60-minute webinar, we will cover the key areas of consideration for data layer decisions in a microservices architecture, and how a caching layer satisfies these requirements. You’ll walk away from this webinar with a better understanding of the following concepts:
- How microservices are easy to scale up and down, so both the service layer and the data layer need to support this elasticity.
- Why microservices simplify and accelerate the software delivery lifecycle by splitting up effort into smaller isolated pieces that autonomous teams can work on independently. Event-driven systems promote autonomy.
- Where microservices can be distributed across availability zones and data centers for addressing performance and availability requirements. Similarly, the data layer should support this distribution of workload.
- How microservices can be part of an evolution that includes your legacy applications. Similarly, the data layer must accommodate this graceful on-ramp to microservices.
Presenter: Jagdish Mirani, a Product Marketing Manager in charge of Pivotal’s in-memory products.
Leveraging the Cloud for Big Data Analytics 12.11.18 - Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you'll see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Estimating the Total Costs of Your Cloud Analytics Platform - DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) - Denodo
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
A Successful Journey to the Cloud with Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3mPLIlo
A shift to the cloud is a common element of any current data strategy. However, a successful transition to the cloud is not easy and can take years. It comes with security challenges, changes in downstream and upstream applications, and new ways to operate and deploy software. An abstraction layer that decouples data access from storage and processing can be a key element to enable a smooth journey to the cloud.
Attend this webinar to learn more about:
- How to use Data Virtualization to gradually change data systems without impacting business operations
- How Denodo integrates with the larger cloud ecosystems to enable security
- How simple it is to create and manage a Denodo cloud deployment
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWS - Amazon Web Services
AWS provides a platform that is ideally suited for deploying highly available and reliable systems that can scale with a minimal amount of human interaction. This talk describes a set of architectural patterns that support highly available services that are also scalable, low cost, low latency and allow for agile development practices. We walk through the various architectural decisions taken for each tier and explain our choices for appropriate AWS services and building blocks to ensure the security, scale, availability and reliability of the application.
Big data journey to the cloud 5.30.18 - Asher Bartch, Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
What Does Real World Mass Adoption of Decentralized Tech Look Like? - All Things Open
Presented at All Things Open 2023
Presented by Karl Mozurkewich - Storj
Title: What Does Real World Mass Adoption of Decentralized Tech Look Like?
Abstract: We delve into the transformative potential of decentralized technology. Beginning with a brief overview of the rise of centralization with the advent of the internet, and the counter-shift marked by blockchain, we explore the intrinsic characteristics of decentralized and distributed systems, such as trustless operations, peer-to-peer networks, and enterprise application scalability. Various sectors, including finance, supply chains, media and entertainment, data science, and cloud infrastructure, are on the brink of disruption. The societal implications are vast, with the potential for greater individual empowerment, a greener planet, and more viable resource utilization, but concerns about data security persist.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Cloud Migration headache? Ease the pain with Data Virtualization! (EMEA) - Denodo
Watch full webinar here: https://bit.ly/3CWIBzd
Moving data to the Cloud is a priority for many organizations. Benefits - in terms of flexibility, agility, and cost savings - are driving Cloud adoption. This journey to the Cloud is not easy: moving applications and data to the Cloud can be challenging and entails disruption to the business when not carefully managed.
When systems are being migrated, the resulting hybrid (or even multi-) Cloud architecture is, by definition, more complex, making it harder and more costly to retrieve the data we need.
Data Virtualization can help organizations at all stages of a Cloud journey - during migration as well as in our “new hybrid multi-Cloud reality”.
Watch on-demand this webinar to learn how Data Virtualization can:
- Help organizations manage risk and minimize the disruption caused as systems are moved to the Cloud
- Provide a single point of access for data that is both on-premise and in the Cloud, making it easier for users to find and access the data that they need
- Provide a secure layer to protect and manage data when it's distributed across hybrid or multi-Cloud architectures
… and watch a live demo of how to ease the migration.
Dremio, a simple and high-performance architecture for your data lakehouse.
In the data world, Dremio is hard to classify! It is at once a data delivery platform, a powerful SQL engine built on Apache Arrow, Apache Calcite, and Apache Parquet, an active data catalog, and an open Data Lakehouse. After an introduction to the platform, we will look at how Dremio helps organizations meet their data management and governance challenges, making it easier to run their analytics in the cloud (and/or on-premises) without the cost, complexity, and lock-in of data warehouses.
SpringPeople - Introduction to Cloud Computing - SpringPeople
Cloud computing is no longer a passing fad; it is for real and is perhaps the most talked-about subject. Various players in the cloud ecosystem have provided definitions closely aligned to their sweet spot, be it infrastructure, platforms, or applications.
This presentation provides exposure to a variety of cloud computing techniques, architectures, and technology options, and familiarizes participants with cloud fundamentals in a holistic manner, spanning dimensions such as cost, operations, and technology.
Data Lake and the rise of the microservices - Bigstep
By simply looking at structured and unstructured data, Data Lakes enable companies to understand correlations between existing and new external data - such as social media - in ways traditional Business Intelligence tools cannot.
For this you need to find out the most efficient way to store and access structured or unstructured petabyte-sized data across your entire infrastructure.
In this meetup we’ll answer the following questions:
1. Why would someone use a Data Lake?
2. Is it hard to build a Data Lake?
3. What are the main features that a Data Lake should bring in?
4. What’s the role of the microservices in the big data world?
A review on techniques and modelling methodologies used for checking electrom... - nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automotives. In this paper, the authors present a non-exhaustive review of research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Hierarchical Digital Twin of a Naval Power System - Kerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
6th International Conference on Machine Learning & Applications (CMLA 2024) - ClaraZara1
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of Machine Learning.
Understanding Inductive Bias in Machine Learning - SUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on the Binary Heap data structure. It is similar to selection sort: we repeatedly select the extreme element and move it into its final position, then repeat the process for the remaining elements. A minimal sketch follows below.
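Below is a minimal Python sketch of the technique described, using a max-heap built bottom-up (heapify) so the array ends up sorted in ascending order; it works on an ordinary dynamic array (a Python list).

def sift_down(a, start, end):
    # Restore the max-heap property for the subtree rooted at `start` within a[0:end].
    root = start
    while 2 * root + 1 < end:
        child = 2 * root + 1                  # left child
        if child + 1 < end and a[child] < a[child + 1]:
            child += 1                        # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heap_sort(a):
    n = len(a)
    # Build a max-heap bottom-up, starting from the last internal node.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(a, start, n)
    # Repeatedly move the current maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a

print(heap_sort([5, 1, 9, 3, 7, 2]))          # [1, 2, 3, 5, 7, 9]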
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
1. Big Data on Cloud Native Platform
Rajesh Balamohan
Sunil Govindan
2. Speaker Bio
Rajesh Balamohan
Principal Engineer 2 @ Cloudera
Apache Hive, ORC Committer & Apache Tez PMC and Committer
@rajeshbalamohan
Sunil Govindan
Engineering Manager @ Cloudera
Apache Hadoop, Submarine, YuniKorn PMC member & Committer
@sunilgovind
3. Agenda
● Why Big Data workloads need to migrate to Cloud
● Aspects of Enterprise Ready Cloud Platform
● Challenges of Big Data on Cloud Platform
4. Why do Big Data workloads need to migrate to the cloud?
5. About (Big) Data itself...
Key thoughts from customers about today's data:
“Ability to consistently extract accurate business propositions from data”
“Data will grow over time - probably exponentially”
“Data analytics returns profound business insights only when you have access to more data”
So how do we keep data available as needed (to get value from that data)?
6. Data Architecture Evolution: Gen 1
Data volumes are growing exponentially and on-prem is not cost effective & scalable!
7. Cloud Adoption Trend
“The worldwide infrastructure as a service (IaaS) market grew 37.3% in 2019 to total $44.5 billion, up from $32.4 billion in 2018, according to Gartner, Inc.”
Cloud adoption is growing at a rapid pace. Why?
“Cloud computing offers access to data storage and compute on a more scalable, flexible and cost-effective basis than can be achieved with an on-premises deployment”
8. Why Big Data workloads need Cloud?
Some high level advantages:
● Pay as you go: No hardware acquisitions, thus Zero CAPEX
● Self Serve: Easier Accessibility
● Cost Effective & On-Demand
● Highly Elastic: Can scale 100s of nodes up/down easily
● No more installation/upgrade hassles
● Disaggregated Storage
10. Big Data in Cloud
Hadoop: “Decade Two, Day Zero”
Philosophy towards a modern Data Architecture
● Disaggregate storage, compute, security and governance
● Build for extremely large-scale using distributed systems
● Leverage open source for open standards and community scale
● Continuously evolve the ecosystem for innovation at every layer, independently
13. Critical Aspects of Enterprise Cloud Platform
● Manage and monitor multiple clusters
● Secure data via a single window
● Authentication & Authorization via a single window
● Replicate data across multiple clusters on a need basis
● Profile and debug queries across multiple clusters via a single window
● Multiple experiences depending on the user (Data Engineering, Streaming, Fast Analytics, Data Profiling, etc.)
Classic Clusters (Optional)
17. Challenges in the dimensions of:
- Storage
- Network
- Compute
- Throttling
- Security
- Hardware Specs
* These are some of the dimensions that we would like to cover in today’s talk.
18. Consistency & Latency Issues with ObjectStores
● Eventual Consistency Issues
○ Certain ObjectStores provide eventual consistency (e.g. S3)
■ New files may not be visible for listing (until safely propagated internally).
■ Opening a deleted file may be possible due to consistency issues
○ S3Guard
■ Uses “DynamoDB” to persist metadata changes. Provides a consistent view of S3 objects for processing.
■ Supports DynamoDB on-demand (i.e. no need to explicitly set capacity limits).
● Renames can be expensive
○ Rename = “Copy + Delete” in ObjectStores like S3 (sketched below)
○ Need to build a stack which reduces rename operations or favours direct writes to the destination
● OS page cache is not leveraged as data is read over the network
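To make the cost concrete, here is a minimal sketch (assuming boto3; bucket and key names are illustrative) of what a "rename" actually is on S3-like ObjectStores:

import boto3

s3 = boto3.client("s3")

def rename_object(bucket, src_key, dst_key):
    # There is no rename primitive: copy the whole object to the new key first
    # (server-side, but still proportional to object size)...
    s3.copy_object(Bucket=bucket,
                   CopySource={"Bucket": bucket, "Key": src_key},
                   Key=dst_key)
    # ...and only then delete the original key.
    s3.delete_object(Bucket=bucket, Key=src_key)

Renaming a whole directory means repeating this for every object under the prefix, which is why query engines and output committers try to write directly to the final destination instead.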
19. Intelligent Caching for Query Performance
● Avoid reading the same data from ObjectStores (see the sketch below)
○ Systems like Hive/LLAP and Impala cache data locally to improve query performance.
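As a rough illustration of the idea (not the actual Hive/LLAP or Impala implementation), a read-through local cache can be sketched in a few lines; boto3 is assumed, and the cache directory, bucket, and keys are illustrative:

import hashlib
import os
import boto3

CACHE_DIR = "/tmp/objectstore-cache"
s3 = boto3.client("s3")

def cached_read(bucket, key):
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_path = os.path.join(
        CACHE_DIR, hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest())
    if os.path.exists(cache_path):
        # Cache hit: no network round trip to the ObjectStore.
        with open(cache_path, "rb") as f:
            return f.read()
    # Cache miss: fetch once, then populate the local cache for later reads.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with open(cache_path, "wb") as f:
        f.write(body)
    return body

Real caches add eviction and cache at finer granularity than whole objects, but the latency argument is the same.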
20. Reduce Network Latency
● Reduce the number of SSL connections to ObjectStores
○ Added a lazySeek implementation to reduce connection breakages (sketched below)
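The lazySeek idea can be illustrated with a small sketch: seek() only records the desired position, and the expensive SSL-backed connection is repositioned only when a read actually happens, so seek-heavy access patterns stop churning connections. The RawStore-style interface below is illustrative, not a real client API:

class LazySeekStream:
    def __init__(self, raw):
        self.raw = raw                 # object exposing read_at(pos, n)
        self.next_read_pos = 0         # position we will actually read from

    def seek(self, pos):
        # No network activity here; just remember where to read next.
        self.next_read_pos = pos

    def read(self, n):
        # Reposition (possibly reopening the connection) only now.
        data = self.raw.read_at(self.next_read_pos, n)
        self.next_read_pos += len(data)
        return data

class BytesStore:
    # Toy backing store standing in for a remote object.
    def __init__(self, data):
        self.data = data
    def read_at(self, pos, n):
        return self.data[pos:pos + n]

stream = LazySeekStream(BytesStore(b"0123456789"))
stream.seek(2)
stream.seek(5)                         # two seeks, zero connections touched
print(stream.read(3))                  # b'567' -- a single positioned read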
21. AutoScaling
● Determining the right cluster size can be challenging.
● AutoScaling helps scale instances up/down depending on the workload (see the sketch below)
○ Concurrency-based AutoScaling
■ Helps control the number of parallel queries
○ Query Isolation
■ When queries scan beyond a certain limit, new clusters are automatically spun up.
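A concurrency-based scaling decision is essentially a small function of the current query load. The sketch below is illustrative only; the queries-per-cluster capacity and the min/max bounds are made-up knobs, not values from the talk:

import math

QUERIES_PER_CLUSTER = 10
MIN_CLUSTERS, MAX_CLUSTERS = 1, 8

def desired_clusters(running_queries, queued_queries):
    # Scale the cluster count with total demand, clamped to sane bounds.
    demand = running_queries + queued_queries
    want = math.ceil(demand / QUERIES_PER_CLUSTER) if demand else MIN_CLUSTERS
    return max(MIN_CLUSTERS, min(MAX_CLUSTERS, want))

print(desired_clusters(running_queries=7, queued_queries=0))    # 1
print(desired_clusters(running_queries=25, queued_queries=14))  # 4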
22. Affinity Policies for better Network Throughput
- AutoScaling policies allow you to spin up instances across different availability zones
- By default, cloud providers tend to spread instances across AZs for availability.
- This impacts network throughput for nodes with 10 Gbps links
- Set an affinity policy to keep the instances in the same availability zone (see the sketch below)
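One way to express such an affinity policy on AWS is to pin the autoscaling group to a single availability zone. This is a hedged sketch assuming boto3; the group name, launch template, and zone are illustrative, and keeping everything in one AZ trades throughput against resilience to an AZ outage:

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="bigdata-workers",
    LaunchTemplate={"LaunchTemplateName": "bigdata-worker-template",
                    "Version": "$Latest"},
    MinSize=3,
    MaxSize=100,
    # Affinity: keep every instance in one AZ instead of spreading across AZs,
    # so shuffle/replication traffic stays on AZ-local links.
    AvailabilityZones=["us-east-1a"],
)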
23. Spin up Time
● Cluster/compute spin-up time plays a crucial role in adoption and in reducing cost.
● Containerized deployments on K8S help reduce spin-up time significantly
○ 10s of seconds as opposed to minutes
24. K8S: Pods can have the same hostname/port
● Pods can have the same hostname/port after a restart
● This causes trouble for processes that track nodes based on hostname/port
● Added flexibility in the stack to take care of this situation (see the sketch below)
○ E.g. TEZ-4179: [Kubernetes] Extend NodeId in tez to support unique worker identity
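Tez itself is Java, but the identity scheme from TEZ-4179 is easy to illustrate: since a restarted pod may reuse the same hostname and port, the worker ID has to include something unique to the incarnation. The environment variable and port below are illustrative:

import os
import socket
import time
import uuid

def worker_id(port=4141):
    host = socket.gethostname()
    # Any per-incarnation value works: the pod UID exposed via the downward
    # API, the process start time, or a random UUID generated at startup.
    incarnation = os.environ.get("POD_UID") or f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    return f"{host}:{port}:{incarnation}"

print(worker_id())   # e.g. worker-0:4141:1716123456-3fa2b1c9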
25. Throttling
● Cloud services throttle requests
○ Throttling limits vary across cloud vendors
● Critical to monitor throttling metrics (see the backoff sketch below)
○ Desirable to enable metrics logging in the ObjectStore
○ Accuracy is limited to per-minute granularity in most ObjectStores
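On the client side, throttled requests (HTTP 503 / "SlowDown") are usually handled with exponential backoff plus jitter, and the throttle count is worth exporting as a metric. A minimal sketch; ThrottledError and do_request are illustrative stand-ins for the real client calls:

import random
import time

class ThrottledError(Exception):
    pass

throttle_events = 0   # export this counter to your metrics system

def with_backoff(do_request, max_attempts=6, base_delay=0.5):
    global throttle_events
    for attempt in range(max_attempts):
        try:
            return do_request()
        except ThrottledError:
            throttle_events += 1
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to base_delay * 2^attempt.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

Naive retries without backoff can make things worse; as noted in the presenter notes at the end of this deck, resending the same encrypted payloads over and over drives up CPU usage as well.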
27. Security
● Perimeter Security
● Encrypted data at rest
● Transfer of intermediate data encrypted
● Need to use optimised libs for improving transport security
28. Hardware Specs across Cloud Vendors
● Watch out for hardware specs across cloud vendors.
○ E.g. SSDs in Azure can have different performance characteristics than in AWS
● OS settings have to be tweaked accordingly
○ E.g. network and disk settings
● Choose the optimal instance type for the workload
○ E.g. instances with high-density disks may not be needed, as data is stored in the ObjectStore
○ Too little disk space can hurt intermediate data being written out.
29. Tomorrow ...
● Plenty of challenges to run Big Data workloads in the cloud
○ Great efforts from the open source community!
● Users need “no vendor lock-in”
○ An open data layer for multi-cloud (SODA, CSI, etc., with infinite possibilities)
○ Network standards across clouds (CNI)
○ Data lineage and governance for users (Apache Atlas)
○ Security and access as an open standard (Apache Ranger)
● Users are looking for an Open Data Architecture for multiple clouds which is enterprise ready!
30. Thank You
● References
○ Cloudera Data Platform (Multi Cloud): https://docs.cloudera.com/cdp/latest/index.html
○ Hadoop: Decade two, Day zero: https://blog.cloudera.com/hadoop-decade-two-day-zero/
● Cloudera careers
For a true enterprise-ready cloud platform:
We need a way to register, manage, and control multiple clusters in a central place.
We need a way to handle security policies via a central place.
We need to provide different user experiences depending on the data processing requirements, like “Machine Learning”, “Data Warehouse”, “Data Engineering”, and so on.
We observed this in Azure, where throttling can have an adverse impact on CPU utilization.
The system was sending a good amount of data to the Azure ObjectStore and got throttled with 503 exceptions. Due to the retry logic, the system kept retrying and resending the same data over the wire.
This caused high CPU usage due to encryption.
Hardware specs across different cloud vendors can be very different. For instance, an SSD in AWS gave around 288 MB/s, whereas in Azure it gave 89 MB/s.
We would recommend measuring performance before choosing instances (a small measurement sketch follows below).
OS settings need to be tweaked accordingly as well. For example, we recently had to disable certain disk settings to avoid unwanted kernel calls, since we were on SSD.
It would be good to choose the optimal instance type for the workload.
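As a companion to the "measure performance first" advice above, here is a minimal sketch of a sequential-write throughput check; the path and sizes are illustrative, and you would run it on each candidate instance type and disk:

import os
import time

def sequential_write_mbps(path="/data/benchfile", total_mb=1024, chunk_mb=8):
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())           # make sure the data actually hit disk
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed

print(f"sequential write: {sequential_write_mbps():.0f} MB/s")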