Presentation at the International Industry-Academia Workshop on Cloud Reliability and Resilience. 7-8 November 2016, Berlin, Germany.
Organized by EIT Digital and Huawei GRC, Germany.
Twitter: @CloudRR2016
Wido den Hollander is a Ceph and CloudStack consultant who has contributed code to integrate Ceph storage with CloudStack. Ceph is an open source distributed object store that provides features like auto recovery from hardware failures and scaling capacity by adding new nodes. It uses commodity hardware and RADOS block devices to provide reliable primary storage for virtual machines in CloudStack. Future plans include adding RBD write caching and using Ceph for secondary storage. Help from the community is welcome to improve integration between Ceph, libvirt, and CloudStack.
A generic layer that can be used with many key-value storage engines, such as Redis, Memcached, and LMDB
Focus: performance, cross-datacenter active-active replication and high availability
Features: node warmup (cold bootstrapping), tunable consistency, S3 backups/restores
Status: Open source, fully integrated with existing NetflixOSS ecosystem
The document summarizes several presentations from KubeCon US 2021 and RookCon 2021. Key topics included the evolution of Twitter's edge network architecture using Envoy, running dedicated infrastructure for customers in a multitenant Kubernetes environment at Adobe, challenges of adopting Envoy at Tinder, and various sessions on storage solutions like Rook, database deployments on Kubernetes, and networking with Multus and eBPF.
How To Build A Scalable Storage System with OSS at TLUG Meeting 2008/09/13 Gosuke Miyashita
The document discusses Gosuke Miyashita's goal of building a scalable storage system for his company's web hosting service. He is exploring the use of several open source technologies including cman, CLVM, GFS2, GNBD, DRBD, and DM-MP to create a storage system that provides high availability, flexible I/O distribution, and easy extensibility without expensive hardware. He outlines how each technology works and shows some example configurations, but notes that integrating many components may introduce issues around complexity, overhead, performance, stability and compatibility with non-Red Hat Linux.
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Community
Big Data Analytics on Ceph* Object Storage
The document discusses using Ceph* object storage for big data analytics workloads on OpenStack. It covers deployment considerations for analytics clusters using options like VMs, containers, or bare metal. It details the design of using Ceph* RADOS Gateway (RGW) with an SSD cache tier for storage, and developing an RGW file system adapter and proxy for scheduling. Sample performance testing showed container overhead of 1.46x and VM overhead of 2.19x compared to bare metal. The next steps are to complete development and performance testing of the Ceph*/RGW solution.
Swift Architecture and Practice, by Alex Yang Hui Cheng
The document discusses the principles, architecture, and practice of Swift object storage at SinaAppEngine. It describes how Swift provides high reliability through consistent hashing, replication across multiple zones, and an eventual consistency model. It outlines SinaAppEngine's implementation of Swift for storage, including authentication, quotas, and domain remapping. Problems addressed include improving replication efficiency and SQLite performance as well as controlling rsync bandwidth. The document provides an overview of how Swift storage works at scale.
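The consistent hashing and multi-zone replication mentioned above can be illustrated with a toy hash ring. This is a minimal sketch under stated assumptions, not Swift's actual ring implementation: the node names, zone names, virtual-node count, and the `HashRing` class are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is used here only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes and zone-aware replica choice."""

    def __init__(self, nodes, vnodes=64):
        # nodes: dict of node name -> zone name (hypothetical names)
        self.zone_of = dict(nodes)
        # Each node appears on the ring at `vnodes` pseudo-random points.
        self.ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in self.zone_of
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def replicas(self, obj_name, count=3):
        """Walk clockwise from the object's hash, picking nodes in distinct zones."""
        start = bisect.bisect(self.points, _hash(obj_name))
        chosen, zones = [], set()
        for step in range(len(self.ring)):
            _, node = self.ring[(start + step) % len(self.ring)]
            zone = self.zone_of[node]
            if zone not in zones:
                chosen.append(node)
                zones.add(zone)
            if len(chosen) == count:
                break
        return chosen


ring = HashRing({"n1": "z1", "n2": "z2", "n3": "z3", "n4": "z1"})
print(ring.replicas("account/container/object"))
```

Because placement depends only on the object name and the ring contents, any proxy can compute the same replica set without a central lookup, and adding a node moves only the keys adjacent to its ring points.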
Orchestrating stateful applications with PKS and Portworx VMware Tanzu
This document provides an overview of Portworx, including:
1. Portworx is a leader in providing stateful container orchestration that works across any cloud or scheduler.
2. It has an experienced team and investors, with headquarters in Los Altos, CA and 70 employees globally.
3. Portworx allows applications to run across different infrastructure types and clouds with a portable cloud stack that provides high availability, replication, security and data mobility features.
Manta: a new internet-facing object storage facility that features compute by... Hakka Labs
As the amount of unstructured data has greatly exceeded a single computer's ability to process it, data has become increasingly isolated from the compute elements. The resulting haul from stores of record (e.g., SAN, NAS, S3) to transient compute (e.g., Hadoop, EC2) creates needless mechanical work and human labor. Is there a better way? In this talk, we'll explore the coming convergence of data and compute in the cloud, focusing in particular on Joyent's Manta, a new internet-facing object storage facility that features compute. We will describe the design principles for Manta, the engineering challenges in building it, and more generally, the opportunities presented by the convergence of compute and data.
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ... MayaData Inc
In this webinar experts from DataStax - the lead developer of Cassandra - and from MayaData - the lead developer of OpenEBS and LitmusChaos - will discuss and demonstrate ways to ensure the ease of use and resilience of Cassandra on Kubernetes.
Topics to be discussed and demonstrated include:
Provisioning underlying storage - how to make it consistent irrespective of the underlying hardware or cloud? Are there ever reasons to have the storage replicate across nodes, or is dynamic LocalPV the best choice in all cases?
Cass Operator - DataStax Kubernetes Operator for Apache Cassandra
Resilience - how to proactively assess the overall environment including the underlying Kubernetes with the help of Litmus
Brent Compton and Kyle Bader of Red Hat took the stage at Red Hat Storage Day New York on 1/19/16 to share with attendees best practices and lessons learned for architecting solutions with Red Hat Ceph Storage.
This document discusses cloud computing concepts including origins, definitions, characteristics, deployment models, service models, virtualization benefits, and economic principles. Specifically:
- Cloud computing originated from telecommunications networks and focuses on abstracting internal components while defining overall architecture and interfaces.
- Key characteristics include on-demand access to pooled resources over the network, rapid elasticity, and metered usage-based billing.
- Virtualization provides abstraction between hardware and software for location-independent and flexible resource allocation.
- Economic principles demonstrate how cloud computing can reduce costs compared to traditional fixed infrastructure models through pay-per-use pricing and scaling to fluctuating demand.
The OpenEBS Hangout #4 was held on 22nd December 2017 at 11:00 AM (IST and PST), where a live demo of cMotion was shown. Storage policies of OpenEBS 0.5 were also explained.
AIST Super Green Cloud: lessons learned from the operation and the performanc... Ryousei Takano
This document discusses lessons learned from operating the AIST Super Green Cloud (ASGC), a fully virtualized high-performance computing (HPC) cloud system. It summarizes key findings from the first six months of operation, including performance evaluations of SR-IOV virtualization and HPC applications. It also outlines conclusions and future work, such as improving data movement efficiency across hybrid cloud environments.
OSCON 15 Building Opensource with Open Source Susan Wu
This document discusses how Midokura builds its virtualization software for networking using open source technologies. It explains that Midokura uses Zookeeper to provide consistency for tracking changes to the virtual network topology and state, and uses Cassandra for high write volumes to back up stateful connection tracking information like flow state and metrics. The document also describes how Midokura leverages distributed intelligence at the edge by pushing SDN intelligence to agents, and how it must optimize consistency, availability, and partition tolerance differently for different types of data.
Scalable Storage for Massive Volume Data Systems Lars Nielsen
This document discusses scalable storage solutions for massive volumes of data. It introduces the concept of generalized deduplication as an extension of classic deduplication that can further reduce storage needs. Several research projects are described that utilize generalized deduplication, including MinervaFS, a file system, Alexandria, a cloud storage system, and Hermes, a data transfer protocol. MinervaFS was found to reduce storage usage for various datasets by up to 63.73% compared to other techniques like compression and classic deduplication. Alexandria demonstrated storage reductions of up to 14.49% in cloud storage configurations. Hermes aims to reduce data transmission costs through in-network deduplication.
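The idea behind generalized deduplication can be sketched as follows: each chunk is split into a base and a small deviation, only the bases are deduplicated, and each deviation is kept next to a reference to its base. The sketch below is an assumption-laden illustration, not the transformation used by MinervaFS, Alexandria, or Hermes. The split rule (everything but the last byte is the base), the 8-byte chunk size, and the function names are all hypothetical.

```python
import hashlib

CHUNK = 8  # bytes per chunk; deliberately tiny for illustration


def split(chunk: bytes):
    """Hypothetical transformation: base = all but the last byte, deviation = last byte."""
    return chunk[:-1], chunk[-1:]


def store(data: bytes):
    """Return (bases, recipe): unique bases by fingerprint, plus per-chunk references."""
    bases = {}   # fingerprint -> base bytes, stored once
    recipe = []  # one (fingerprint, deviation) pair per chunk, in order
    for offset in range(0, len(data), CHUNK):
        base, deviation = split(data[offset:offset + CHUNK])
        fingerprint = hashlib.sha256(base).hexdigest()
        bases.setdefault(fingerprint, base)
        recipe.append((fingerprint, deviation))
    return bases, recipe


def restore(bases, recipe):
    """Rebuild the original bytes from the deduplicated bases and the deviations."""
    return b"".join(bases[fp] + deviation for fp, deviation in recipe)


data = b"sensor-A" + b"sensor-B" + b"sensor-C"
bases, recipe = store(data)
print(len(bases), len(recipe))
```

Classic deduplication would treat the three chunks above as distinct because they differ in one byte; splitting off the deviation lets near-identical chunks share a single stored base.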
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS Ceph Community
The presentation shows the setup of an unlimited file service cluster running Samba CTDB and using CephFS as the backend storage. It will show that using clustered Samba can achieve very high redundancy combined with the near limitless storage size of a Ceph storage system.
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY Ceph Community
Deploying Ceph can be a hassle. And once it's up and running it can be time consuming to add servers: install an operating system, configure the network, configure Ceph, ... But it doesn't have to be that hard. We've built a full Ceph management suite to help you with these tasks. With croit, you just need to plug in your server and our zero touch provisioning takes care of the rest. croit boots a customized Linux image that is ready to be used directly from memory: no installation necessary! We'll show a live demo, deploying a cluster from scratch and performing a few typical admin tasks, all without using the command line at all. In addition, we show you how croit can help you save a lot of money.
inwinSTACK - ceph integrate with kubernetes inwin stack
This document summarizes a presentation about integrating Ceph with Kubernetes. It discusses key updates in the Ceph Luminous release, including Bluestore, erasure coding overwrite, multiple MDS, and the Ceph MGR dashboard. It then explains why Kubernetes and Ceph are good choices for storage. Finally, it covers different approaches to integrating Ceph and Kubernetes, such as using CephFS or CephRBD, and hyperconverged vs hyperscale architectures.
The document discusses architectural description based overlay networks. It proposes using architectural description documents that define the roles and relationships of nodes in an overlay network. These documents allow heterogeneous networks to work collaboratively by dynamically changing the roles nodes play and integrating multiple overlay networks. Nodes can switch between different overlays by exchanging and executing architectural description documents.
Redis Conf 2019--Container Attached Storage for Redis OpenEBS
Kubernetes and containerized applications allow development teams to iterate fast, deploy efficiently and operate at scale. Kubernetes allows you to orchestrate containers that are highly available. However, in the case of container reschedule, Kubernetes does not provide a great set of primitives to manage your persistent data along with your application containers. In this talk, we will present some of the challenges associated with managing persistent data in Kubernetes and how we can make day 2 operations easier to manage. We will talk about a couple of approaches to solving data persistence problems in multi-cloud environments. During the demos, we will showcase how we address data replication and data encryption challenges.
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center Ryousei Takano
The document describes Iris, an inter-cloud resource integration system that enables elastic cloud data centers. Iris uses nested virtualization technologies including nested KVM to construct a virtual infrastructure spanning multiple distributed data centers. It provides a new Hardware as a Service (HaaS) model for inter-cloud federation at the infrastructure provider level. The authors demonstrate Apache CloudStack can seamlessly manage resources across emulated inter-cloud environments using Iris.
This document discusses the design and implementation of a messaging queue for distributed systems using NATS and Elasticsearch. It describes how NATS is used for pub/sub messaging and request/reply patterns. Elasticsearch provides central data storage. Components communicate by publishing and subscribing to messages containing a standardized Msg struct. The queue abstracts messaging and caching interfaces to provide a common API for components.
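The publish/subscribe and request/reply patterns described above can be sketched with a small in-process bus. This is only a stand-in for illustration: it does not use the NATS client or Elasticsearch, and the `Msg` fields, the `Bus` class, and the subject names are hypothetical rather than taken from the presentation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical standardized message exchanged between components."""
    subject: str
    payload: dict = field(default_factory=dict)


class Bus:
    """In-memory bus exposing one common API for pub/sub and request/reply."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> list of handlers

    def subscribe(self, subject, handler):
        self._subscribers[subject].append(handler)

    def publish(self, msg):
        """Deliver msg to every subscriber of its subject; collect non-None replies."""
        handlers = self._subscribers.get(msg.subject, [])
        replies = [handler(msg) for handler in handlers]
        return [reply for reply in replies if reply is not None]

    def request(self, msg):
        """Request/reply: return the first reply, or fail if nobody answered."""
        replies = self.publish(msg)
        if not replies:
            raise LookupError(f"no responder for subject {msg.subject!r}")
        return replies[0]


bus = Bus()
bus.subscribe("jobs.status", lambda m: Msg("jobs.status.reply", {"id": m.payload["id"], "state": "done"}))
print(bus.request(Msg("jobs.status", {"id": 7})).payload)
```

Keeping components behind a small interface like this means the transport could later be swapped for a real broker without changing the callers.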
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma... OpenEBS
The OpenEBS project has taken a different approach to storage when it comes to containers. Instead of using existing storage systems and making them work with containers, what if you were to redesign something from scratch using the same paradigms used in the container world? This resulted in the effort of containerizing the storage controller. Also, as the applications that consume storage are changing, do we still need scale-out distributed storage systems?
Ceph is an open-source distributed storage system that provides object, block, and file storage in a single unified platform. It has become popular for OpenStack deployments due to its ability to provide scalable storage using commodity hardware. Ceph uses a pseudo-random placement algorithm called CRUSH to distribute data and replicas across its clusters to provide fault tolerance and load balancing.
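Deterministic pseudo-random placement of the kind described above can be illustrated with rendezvous (highest-random-weight) hashing. To be clear, this is a swapped-in, much simpler technique and not the CRUSH algorithm itself: it ignores the cluster hierarchy, weights, and failure domains that CRUSH handles. The device names and function names below are hypothetical.

```python
import hashlib


def score(obj: str, device: str) -> int:
    """Pseudo-random but repeatable score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj}:{device}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, devices, replicas=3):
    """Pick the `replicas` devices with the highest scores for this object."""
    ranked = sorted(devices, key=lambda device: score(obj, device), reverse=True)
    return ranked[:replicas]


devices = [f"osd.{i}" for i in range(6)]  # hypothetical device names
print(place("rbd_data.1234", devices))
```

Every client that knows the device list computes the same placement without consulting a lookup table, and removing a device that did not hold an object leaves that object's placement unchanged.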
HPC in the cloud provides opportunities to improve resource utilization and reduce costs through elasticity and pay-as-you-go models. However, HPC applications often perform poorly in clouds due to communication overhead, multi-tenancy, and heterogeneity. Bridging this gap requires making clouds more HPC-aware through application-aware scheduling, dynamic load balancing, and enabling malleable jobs. This allows improving both HPC performance and cloud utilization.
An Integrated Cloud Computing Architectural Stack Zara Tariq
Cloud computing is the classification of those computing which are built on personal devices to handle applications.
The cloud computing architectural stack aims at facilitating communication about different cloud technologies and services.
How the Development Bank of Singapore solves on-prem compute capacity challen... Alluxio, Inc.
The Development Bank of Singapore (DBS) has evolved its data platforms over three generations to address big data challenges and the explosion of data. It now uses a hybrid cloud model with Alluxio to provide a unified namespace across on-prem and cloud storage for analytics workloads. Alluxio enables "zero-copy" cloud bursting by caching hot data and orchestrating analytics jobs between on-prem and cloud resources like AWS EMR and Google Dataproc. This provides dynamic scaling of compute capacity while retaining data locality. Alluxio also offers intelligent data tiering and policy-driven data migration to cloud storage over time for cost efficiency and management.
As enterprises adopt cloud native infrastructure to run their applications, data security and compliance are becoming a crucial area of interest. When you run your containers in a public cloud, you want to make sure that the data being accessed is secure and that there are no bread crumbs left behind once the container exits. A common mistake many people make is to host-mount a volume directly inside a container, which leaves the container's data behind directly on the host.
In this session, we focus on the best practices for ensuring the security and compliance of your applications’ persistent volumes. But ensuring security is an on-going exercise. Ideally you would deploy intelligent software that can constantly monitor and audit the application environment for security holes and breaches.
Autopilot is an automated application runtime management engine built for Kubernetes, and is an open source project sponsored by Portworx: https://github.com/libopenstorage/autopilot
Presented by Gunjan Patel, Gou Rao, and Aditya Dani, January 2019. More details here: https://www.meetup.com/openstack/events/258284618/
Slides: Accelerating Queries on Cloud Data Lakes DATAVERSITY
Using “zero-copy” hybrid bursting on remote data to solve data lake analytics capacity and performance problems.
Data scientists want answers on demand. But in today’s enterprise architectures, the reality is that most data remains on-prem, despite the promise of cloud-based analytics. Moving all that data to the cloud has typically not been possible for many reasons including cost, latency, and technical difficulty. So, what if there was a technology that would connect these on-prem environments to any major cloud platform, enabling high-powered computing without the need to move massive amounts of data?
Join us for this webinar where Alex Ma of Alluxio, an open-source data orchestration platform, will discuss how a data orchestration approach offers a solution for connecting traditional on-prem data centers and cloud data lakes with other clouds and data centers. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers and data lakes with computing frameworks in other locations, enabling them to offload, compute, and leverage the flexibility, scalability, and power of the cloud for their remote data.
The document discusses cloud storage and file systems. It provides an overview of cloud storage, noting that data is stored across multiple servers and locations managed by hosting companies. Customers can purchase storage capacity as needed. File systems for cloud computing allow many clients shared access to data partitioned across chunks stored on remote machines. Popular distributed file systems like GFS and HDFS are designed to handle large datasets across thousands of servers for applications requiring massive parallel processing. Load balancing is important to efficiently distribute workloads.
Alluxio Use Cases and Future DirectionsAlluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Data Orchestration for Analytics and AI in the Cloud Era
Calvin Jia, Founding Engineer (Alluxio)
Bin Fan, Founding Engineer, VP of Open Source (Alluxio)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackShapeBlue
Adam's slides from his talk at the CloudStack European User group meetup, March 13, London. To provide tighter integration between the S3 compatible object store and CloudStack, Cloudian has developed a connector to allow users and their applications to utilize the object store directly from within the CloudStack platform in a single sign-on manner with self-service provisioning. Additionally, CloudStack templates and snapshots are centrally stored within the object store and managed through the CloudStack service. The object store offers protection of these templates and snapshots across data centres using replication or erasure coding.
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
This document provides an overview of cloud computing, including what it is, examples of cloud computing applications and services, how it works, characteristics, types of cloud computing (public, private, hybrid, community cloud), advantages like cost efficiency and unlimited storage, and disadvantages like security, privacy, and loss of control. The document contains 13 sections that cover topics such as what is cloud computing, uses of cloud computing, working of cloud computing, types of cloud computing, advantages and disadvantages.
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
Alluxio Webinar
April 6, 2021
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Building realtime data applications that can seamlessly run and integrate data across On Prem, and multiple public cloud vendors. How Hybrid Cloud can help tackle regulatory requirements for Data Sovereignty, Stressed Exit, and operational resilience.
This document defines cloud computing as using remote servers on the Internet rather than local servers. It discusses how cloud computing is hosted in large data centers by major providers and offers savings over maintaining one's own servers by paying only for utilized resources. The document outlines the three main forms of cloud platforms - SaaS, PaaS, and IaaS - and compares cloud to locally hosted servers, noting the cost savings and reduced management needs of cloud. It provides three tips for using cloud, including understanding its benefits, architecting for cloud, and taking advantage of free trial offerings.
This document discusses microservices architecture compared to a monolithic architecture. A microservices architecture breaks an application into smaller, independent services that each perform discrete functions. This allows for more rapid development and improved scalability. However, a microservices architecture is also more complex to deploy and manage. The document provides an example of how a VoIP application could use a microservices approach by breaking components like billing, fraud detection, and call analytics into separate services. It also discusses using Docker containers and services to deploy and scale the microservices architecture.
This document summarizes the key features and disadvantages of cloud computing according to a student. The main features discussed are resource pooling, on-demand self-service, easy maintenance, scalability, cost-effectiveness, measurement and reporting services, security, automation, and resilience. The main disadvantages are internet connectivity issues, vendor lock-in, limited user control, and security risks from sending sensitive data to third-party cloud providers. The document was written by Rittik Deb, a computer science student at Bhairab Ganguly College, for a 5th semester cloud computing course.
This document discusses cloud computing and compares it to building and maintaining infrastructure on-premises. It notes that cloud computing allows companies to avoid the large upfront costs and complexity of managing their own infrastructure by paying only for the computing resources they use. It also discusses the benefits of scaling resources easily in the cloud without having to purchase and set up new hardware. Finally, it addresses common concerns about security, privacy, and control when using cloud services and outlines steps cloud providers take to isolate customer data and ensure its security.
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld
The document discusses the future of software-defined storage in 3 years. It predicts that storage media will continue to advance with higher capacities and lower latencies using technologies like 3D NAND and NVDIMMs. Networking and interconnects like NVMe over Fabrics will allow disaggregated storage resources to be pooled and shared across servers. Software-defined storage platforms will evolve to provide common services for distributed data platforms beyond just block storage, with advanced data placement and policy controls to optimize different workloads.
OpenStack on SmartOS allows running OpenStack on the SmartOS hypervisor platform. SmartOS provides an efficient and secure hypervisor through its use of zones, KVM, ZFS, and DTrace. The presentation outlines work to integrate OpenStack Nova compute and network services with SmartOS, with plans to integrate Quantum network virtualization and leverage ZFS and DTrace for monitoring. The goal is to provide an optimized OpenStack deployment on SmartOS' efficient and flexible virtualization architecture.
Dependable Storage and Computing using Multiple Cloud Providers
1. 1
Dependable Storage and Computing
using Multiple Cloud Providers
Alysson Bessani
www.di.fc.ul.pt/~bessani
International Industry-Academia Workshop on
Cloud Reliability and Resilience
Berlin, 7-8 November 2016
2. 3
Cloud Sec. & Dep.
• Secure and dependable services are a
necessary condition for the long-term
existence of a cloud provider
– Clients need to trust providers
– Providers need to justify this trust
• To stay in the market, a provider needs
to invest in Sec. & Dep., however…
8. 9
Important!
To the best of my knowledge, there is no work in the
existing technical literature saying that using the cloud
is less secure than having everything on premises
Cloud services work pretty well…
but they are not perfect!
9. 10
Our view
• The best way to use the cloud in a
dependable and secure way is to not
rely on a single provider
– Don’t base your service on a single
provider, where the application works only if the
provider it is running on is correct
– Instead, consider distributed trust: an
application is correct if at most f out of n
services/providers are faulty
10. 12
Cloud-of-clouds Service
• Two fundamental characteristics
1. No modification to existing cloud services
2. No collaboration between providers
13. 15
Fundamentals
• Let’s consider the problem of
implementing a cloud-of-clouds object
storage tolerating one cloud failure
• Assume you cannot know for sure if an
unresponsive cloud is faulty or not
14. 16
Writing Data
One needs to write at least two copies of the data!
What if one of the clouds takes too
long to acknowledge the write?
Use three clouds, and wait for at least two acks to complete the write
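The write rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryCloud` and `quorum_write` are hypothetical names, the clouds are in-memory stand-ins rather than real providers, and only unavailable (not malicious) clouds are modelled.

```python
import concurrent.futures as cf


class InMemoryCloud:
    """Hypothetical stand-in for one provider's object store."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # simulate an unavailable cloud
        self.objects = {}

    def put(self, key, version, value):
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        self.objects[key] = (version, value)
        return True


def quorum_write(clouds, key, version, value, quorum=2):
    """Send the write to every cloud and return True once `quorum` acks arrive."""
    acks = 0
    pool = cf.ThreadPoolExecutor(max_workers=len(clouds))
    futures = [pool.submit(c.put, key, version, value) for c in clouds]
    try:
        for fut in cf.as_completed(futures):
            try:
                fut.result()
                acks += 1
            except Exception:
                continue          # a failed cloud simply contributes no ack
            if acks >= quorum:
                return True       # do not wait for the slowest cloud
        return False
    finally:
        pool.shutdown(wait=False)


clouds = [InMemoryCloud("A"), InMemoryCloud("B"), InMemoryCloud("C", fail=True)]
ok = quorum_write(clouds, "x", 1, b"data")
```

With one of the three clouds down, the write still completes because two acknowledgements are enough.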
15. 17
Reading Data
How many clouds do you need to access for reading?
One cloud: might access an outdated/empty cloud
Two clouds: always access the last complete write
[Figure: with three clouds, a write quorum and a read quorum of two clouds each always share at least one cloud]
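A minimal Python sketch of why reading from two of three clouds is enough: any two clouds include at least one that took part in the last complete write, so the reader can pick the reply with the highest version. The `Cloud` class, the `(version, value)` tuples, and `quorum_read` are assumptions made for illustration; a real client would query the clouds in parallel.

```python
import itertools


class Cloud:
    """Hypothetical stand-in for one provider's object store."""

    def __init__(self):
        self.store = {}
        self.down = False

    def put(self, key, version, value):
        self.store[key] = (version, value)

    def get(self, key):
        if self.down:
            raise ConnectionError("cloud unavailable")
        return self.store.get(key, (0, None))   # version 0 means "empty"


def quorum_read(clouds, key, quorum=2):
    """Collect `quorum` replies and return the (version, value) with the highest version."""
    replies = []
    for cloud in clouds:
        try:
            replies.append(cloud.get(key))
        except ConnectionError:
            continue
        if len(replies) >= quorum:
            break
    if len(replies) < quorum:
        raise RuntimeError("read quorum not reached")
    return max(replies, key=lambda reply: reply[0])


a, b, c = Cloud(), Cloud(), Cloud()
a.put("x", 1, b"old"); b.put("x", 1, b"old")   # first write reached A and B
b.put("x", 2, b"new"); c.put("x", 2, b"new")   # second write reached B and C

# Whatever two clouds answer first, the newest complete write is returned.
results = {quorum_read(list(order), "x") for order in itertools.permutations([a, b, c])}
```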
19. 21
Challenges for implementing
Updatable Objects
• How to implement an efficient
replication protocol using only
passive storage nodes?
• How to make it affordable?
20. 22
DepSky Write
[Figure: the client sends WRITE DATA (D) to Cloud A, Cloud B, Cloud C, and Cloud D and collects ACKs, then sends WRITE METADATA to the same clouds and collects ACKs]
21. 23
DepSky Read
[Figure: the client sends READ METADATA to Cloud A, Cloud B, Cloud C, and Cloud D, selects the metadata with the highest version number, then sends READ DATA and receives the data (D)]
Data will be fetched from other clouds if needed.
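The read path on this slide (read metadata, take the highest version number, then fetch the data, trying other clouds if needed) can be sketched as follows. This is a simplified illustration, not the DepSky library's API: the `Cloud` class and `read_unit` are hypothetical, the metadata is assumed to carry a SHA-256 digest of the data, and the sketch does not authenticate metadata, which a real Byzantine-tolerant protocol would have to do.

```python
import hashlib


class Cloud:
    """Hypothetical passive storage cloud holding metadata and data objects."""

    def __init__(self):
        self.metadata = {}   # unit -> (version, sha256 hex digest)
        self.data = {}       # (unit, version) -> bytes

    def put(self, unit, version, value):
        self.data[(unit, version)] = value   # data is written before metadata
        self.metadata[unit] = (version, hashlib.sha256(value).hexdigest())


def read_unit(clouds, unit, f=1):
    """Read metadata from n - f clouds, then fetch data matching the newest version."""
    needed = len(clouds) - f
    replies = [c.metadata[unit] for c in clouds if unit in c.metadata][:needed]
    if len(replies) < needed:
        raise RuntimeError("metadata quorum not reached")
    version, digest = max(replies, key=lambda meta: meta[0])
    for cloud in clouds:                     # try other clouds if needed
        value = cloud.data.get((unit, version))
        if value is not None and hashlib.sha256(value).hexdigest() == digest:
            return version, value
    raise RuntimeError("no cloud returned data matching the metadata")


clouds = [Cloud() for _ in range(4)]
for c in clouds:
    c.put("u", 1, b"old")
for c in clouds[:3]:                         # version 2 reached only n - f clouds
    c.put("u", 2, b"new")
clouds[0].data[("u", 2)] = b"corrupted"      # one cloud returns bad data

result = read_unit([clouds[3], clouds[0], clouds[1], clouds[2]], "u")
```

The stale cloud and the corrupted copy are both skipped because neither matches the digest recorded for the highest version.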
22. 24
DepSky Confidentiality and Storage Efficiency
[Figure: a full copy of Data is stored in each of Cloud A, Cloud B, Cloud C, and Cloud D]
Limitations:
1. Data is accessible by cloud providers
2. Requires n×|Data| storage space
23. 25
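DepSky Confidentiality and Storage Efficiency
[Figure: Data is encrypted with a generated key K. The encrypted data is dispersed into fragments F1, F2, F3, F4, and K is shared as S1, S2, S3, S4. Cloud A stores F1 and S1, Cloud B stores F2 and S2, Cloud C stores F3 and S3, and Cloud D stores F4 and S4.]
Inverse process for reading from f+1 shares/fragments.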
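The "share" step for the key K can be illustrated with Shamir's secret sharing, where any f+1 of the n shares recover the key and fewer reveal nothing about it. The slide does not name the scheme, so Shamir's scheme, the prime modulus, and the function names below are assumptions chosen for illustration; the erasure-coding ("disperse") step is not shown.

```python
import itertools
import secrets

PRIME = 2**521 - 1   # a Mersenne prime, comfortably larger than a 256-bit key


def split_secret(secret, n, threshold):
    """Split an integer secret into n shares; any `threshold` of them recover it."""
    assert 0 <= secret < PRIME and 1 <= threshold <= n
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for coeff in reversed(coeffs):       # Horner evaluation modulo PRIME
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


f = 1
n = 3 * f + 1                                # four clouds, as on the slide
key = secrets.randbits(256)                  # the generated key K
shares = split_secret(key, n, threshold=f + 1)   # S1..S4, one per cloud
recovered = [recover_secret(list(pair)) for pair in itertools.combinations(shares, f + 1)]
```

Every pair of shares yields the same key, which matches the slide's statement that reading needs f+1 shares. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.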
24. 27
Practical Considerations
• Read/write latency ≈ accessing a single cloud
• Storage costs roughly 50% higher
• DepSky is a Java programming library
http://cloud-of-clouds.github.io/depsky
• It does not support concurrent writers to the
same object without using locks
– Recent extensions for multi-writer storage (2016)
http://github.com/cloud-of-clouds/mwmr-registers
• How to build a complete system around it?
25. 28
SCFS: Shared Cloud-backed File System
[Figure: SCFS architecture. SCFS Agents, each with a local cache, store file data in the storage clouds (cloud storage) and use a coordination service, running on computing clouds, that provides the lock service, access control, and metadata.]
[USENIX ATC 2014]
26. 29
SCFS Consistency on Close
fd = open(“x”,…);
…
read(fd,…);
write(fd,…);
…
fsync(fd);
…
write(fd,…);
…
close(fd);
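[Figure: the calls above mapped to storage levels: READ and WRITE act on main memory, fsync reaches local storage, and open and close involve the coordination service and the cloud(-of-clouds) persistent storage, in steps labelled 1 and 2]
If “x” is cached, it is only validated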
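A rough Python sketch of consistency-on-close, as read from this slide: writes stay local until close, and a later open elsewhere either validates its cached copy against the coordination service or fetches the new version from the cloud. All class and method names here are hypothetical, version numbers stand in for whatever metadata SCFS actually keeps, the upload-then-publish order in `close` is an assumption, and locking between concurrent writers is left out.

```python
class Coordination:
    """Hypothetical coordination service: file name -> latest version."""
    def __init__(self):
        self.versions = {}


class CloudStore:
    """Hypothetical cloud(-of-clouds) storage: (name, version) -> bytes."""
    def __init__(self):
        self.blobs = {}
        self.reads = 0

    def get(self, name, version):
        self.reads += 1
        return self.blobs[(name, version)]

    def put(self, name, version, data):
        self.blobs[(name, version)] = data


class Agent:
    def __init__(self, coord, cloud):
        self.coord, self.cloud = coord, cloud
        self.cache = {}   # name -> (version, data), validated on open
        self.disk = {}    # name -> data made durable locally by fsync

    def open(self, name):
        version = self.coord.versions.get(name, 0)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == version:
            data = cached[1]                  # cached copy is only validated
        elif version == 0:
            data = b""                        # file does not exist yet
        else:
            data = self.cloud.get(name, version)
            self.cache[name] = (version, data)
        return OpenFile(self, name, version, data)


class OpenFile:
    def __init__(self, agent, name, version, data):
        self.agent, self.name, self.version = agent, name, version
        self.buffer = bytearray(data)         # main-memory copy

    def read(self):
        return bytes(self.buffer)

    def write(self, data):
        self.buffer += data                   # visible only to this agent

    def fsync(self):
        self.agent.disk[self.name] = bytes(self.buffer)   # local durability only

    def close(self):
        data, new_version = bytes(self.buffer), self.version + 1
        self.agent.cloud.put(self.name, new_version, data)        # upload data
        self.agent.coord.versions[self.name] = new_version        # then publish it
        self.agent.cache[self.name] = (new_version, data)


coord, cloud = Coordination(), CloudStore()
writer, reader = Agent(coord, cloud), Agent(coord, cloud)
f = writer.open("x")
f.write(b"hello")
f.fsync()
before_close = reader.open("x").read()   # other agents do not see the write yet
f.close()
after_close = reader.open("x").read()    # now fetched from the cloud
```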
27. 30
SCFS Backends
• SCFS can use different backends
– i.e., different cloud storage and a coordination service plugin
• Operation: blocking, non-blocking and non-sharing
[Paper excerpt] […] means that data security is ensured even if f out of 3f + 1 of the cloud providers suffer arbitrary faults, which encompasses unavailability and data deletion, corruption or creation [15]. Although cloud providers have their means to ensure the dependability of their services, the recurring occurrence of outages, security incidents (with internal or external origins) and data corruptions [19, 24] justifies the need for this sort of backend in several scenarios.
Figure 5: SCFS with Amazon Web Services (AWS) and Cloud-of-Clouds (CoC) backends.
[Figure labels: AWS backend: SCFS Agent, S3, EC2, DS. CoC backend: SCFS Agent, DepSky, BFT-SMaRt, DS replicas, S3, RS, WA, GS.]
28. 32
Sharing Latency: SCFS vs Dropbox
[Figure: sharing a file (DATA) through SCFS, backed by Amazon S3, Google Storage, Rackspace Files, and Windows Azure Blob, compared with sharing it through Dropbox]
29. 34
Practical Considerations
• SCFS improves the transparency of cloud-of-clouds(-backed) storage
• It is implemented as a Linux FUSE FS:
http://cloud-of-clouds.github.io/SCFS
• Limitations:
– Does not work well with big files
– Requires computing instances to run the
coordination service (e.g., ZooKeeper replicas)
• Can we do better?
32. 38
Avoiding write-write Conflicts without External Coordination
[Figure: an f-tolerant Byzantine lease (lease, release, renew) built from lease objects on f+1 clouds. Services shown under the lease objects: Queue, Queue, DynamoDB, Data Store. Operations shown: enqueue, dequeue, peekAll; put, get, del, cas; atomic transactions.]
33. 40
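Confidentiality & Storage-Efficiency
[Figure: a Big File is split into Chunk 1 … Chunk N and each chunk is compressed. The Compressed Chunk is encrypted with a randomly generated key K and erasure-coded into fragments F1, F2, F3, while K is secret-shared into S1, S2, S3. Each pair (F1 S1, F2 S2, F3 S3) goes to a different cloud among Cloud A, Cloud B, Cloud C, and Cloud D. One step in the figure is marked Optional.]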
37. 46
Disaster Recovery/Tolerance
• Services in cloud A can have a backup in cloud B
• For less than €1/month it is possible to keep up to
30GB of data in another cloud with an RPO of 3 min
[Figure: DBMS failover]
Recovery takes less than 40 s per GB
38. 47
Permissioned Blockchains
and Smart Contracts
• Permanent decentralized ledger used for recording
transactions; requires a practical BFT consensus
• Ex.: Hyperledger, Symbiont Assembly, Ethereum, …
[Figure: five operation/result exchanges]
39. 48
Geo-replicated Services
• Largely used in (single domain) internet-scale apps
• Wide-area Crash or Byzantine fault tolerant
replication protocols are needed
[Figure: four Service Replicas]
40. 49
Final Remarks
• In 2010
– Paxos and ZooKeeper were not very popular
– Raft didn’t exist
– Bitcoin was a crazy new idea
– Blockchain and smart contracts were mostly ignored
• The world is changing fast, and advanced
distributed applications are here to stay...
• What will happen in 2020?
41. 50
Further reading…
Storage
• Bessani, Correia, Quaresma, André, Sousa. DepSky: Dependable and Secure Storage in the Cloud of Clouds. ACM Transactions on Storage. 2013. (Preliminary version in ACM EuroSys’11.)
• Oliveira, Mendes, Bessani. Exploring Key-Value Stores in Multi-Writer Byzantine-Resilient Register Emulations. OPODIS’16.
• Mendes, Oliveira, Cogo, Neves, Bessani. Charon: A Dependable Cloud-Backed System for Storing and Sharing Big Data. Under Submission. 2016.
• Bessani, Mendes, Oliveira, Neves, Correia, Pasin, Verissimo. SCFS: A Shared Cloud-backed File System. USENIX ATC’14.
Computing
• Sousa, Bessani. Separating the WHEAT from the Chaff: An Empirical Design for Geo-Replicated State Machines. IEEE SRDS’15.
• Bessani, Sousa, Alchieri. State Machine Replication for the Masses with BFT-SMaRt. IEEE/IFIP DSN’14.