This slide delivered by Zuoyan Qin, Chief engineer from XiaoMi Cloud Storage Team, was for talk at Arch summit Beijing-2016 regarding how Pegasus was designed.
This document provides an overview of Virtual SAN design and architecture. It discusses Virtual SAN components such as disk groups, datastores, and objects. It describes how data is distributed across disks groups and hosts using techniques like striping and mirroring. It also covers storage policies and how they determine the layout and number of components for distributed objects. Use cases like all-flash configurations, ROBO solutions, and stretched clusters are explained at a high level.
This document provides an overview of VMware Virtual SAN 6.0, including:
- Virtual SAN can be deployed with a hybrid or all-flash architecture to provide high performance.
- Virtual SAN is embedded in the vSphere kernel for simple management and integration.
- Virtual SAN 6.0 provides 4x performance, 2x scale, and new features like snapshots and encryption.
- Case studies show Virtual SAN can reduce storage costs by 60% and management time by 90%.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Dell Technologies è un’esclusiva famiglia di aziende che offre alle organizzazioni l’infrastruttura necessaria per costruire il loro futuro digitale, favorire l’IT Transformation e proteggere le loro risorse più importanti: le informazioni.
In particolare per il settore dell’Education di livello superiore, Dell EMC ha studiato un catalogo di soluzioni in aree quali:
Converged Infrastructure
Storage e Protection dei dati
Servizi di didattica digitale
In questo ciclo di webinar illustreremo le soluzioni Dell EMC più all'avanguardia, attualmente oggetto di studio da parte della Fondazione CRUI per un possibile contratto in convenzione.
The document proposes a HCI solution using VxRail for Suntory to address their current environment challenges. It includes an analysis of their current physical and virtual environment, an approach using a VxRail solution, and a project delivery plan. The proposed VxRail solution would consist of an E560 1U1N node with Intel CPUs, memory, SSDs to support all of Suntory's existing and new VMs on a single converged infrastructure platform. Dimension Data would handle the installation, configuration, migration, and provide 3 years of maintenance and support.
This document provides an overview of Virtual SAN design and architecture. It discusses Virtual SAN components such as disk groups, datastores, and objects. It describes how data is distributed across disks groups and hosts using techniques like striping and mirroring. It also covers storage policies and how they determine the layout and number of components for distributed objects. Use cases like all-flash configurations, ROBO solutions, and stretched clusters are explained at a high level.
This document provides an overview of VMware Virtual SAN 6.0, including:
- Virtual SAN can be deployed with a hybrid or all-flash architecture to provide high performance.
- Virtual SAN is embedded in the vSphere kernel for simple management and integration.
- Virtual SAN 6.0 provides 4x performance, 2x scale, and new features like snapshots and encryption.
- Case studies show Virtual SAN can reduce storage costs by 60% and management time by 90%.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Dell Technologies è un’esclusiva famiglia di aziende che offre alle organizzazioni l’infrastruttura necessaria per costruire il loro futuro digitale, favorire l’IT Transformation e proteggere le loro risorse più importanti: le informazioni.
In particolare per il settore dell’Education di livello superiore, Dell EMC ha studiato un catalogo di soluzioni in aree quali:
Converged Infrastructure
Storage e Protection dei dati
Servizi di didattica digitale
In questo ciclo di webinar illustreremo le soluzioni Dell EMC più all'avanguardia, attualmente oggetto di studio da parte della Fondazione CRUI per un possibile contratto in convenzione.
The document proposes a HCI solution using VxRail for Suntory to address their current environment challenges. It includes an analysis of their current physical and virtual environment, an approach using a VxRail solution, and a project delivery plan. The proposed VxRail solution would consist of an E560 1U1N node with Intel CPUs, memory, SSDs to support all of Suntory's existing and new VMs on a single converged infrastructure platform. Dimension Data would handle the installation, configuration, migration, and provide 3 years of maintenance and support.
This document provides an overview and introduction to VMware Virtual SAN (VSAN). It discusses the VSAN architecture which uses SSDs for caching and HDDs for storage. It also covers how VSAN can be configured through storage policies assigned at the VM level. The document outlines how VSAN provides a software-defined storage solution that is hardware agnostic and can elastically scale storage performance and capacity by adding servers and disks.
Ceph is an open source distributed storage system that is highly scalable, self-managing, and provides multiple access methods including block, file, and object storage. It uses CRUSH to intelligently distribute data and replicas across clusters. Ceph Storage Clusters contain OSD, MON, and optionally MDS daemons. OSDs store data objects, MONs maintain cluster maps and state, and MDS provides metadata for CephFS. Ceph can be deployed with CloudStack to provide the backend storage for virtual machine volumes.
This document provides an overview of considerations for choosing container storage for Kubernetes. It discusses the existing storage landscape, including open-source and commercial options. Key factors to consider include supported storage types, data protection and replication capabilities, dynamic provisioning, and whether container-native storage (CNS) or container-attached storage (CAS) is preferable based on workload needs. Performance benchmarks show Piraeus and StorageOS providing the best performance. The document aims to help users determine the right storage solution for their specific Kubernetes applications and workloads.
DPDK greatly improves packet processing performance and throughput by allowing applications to directly access hardware and bypass kernel involvement. It can improve performance by up to 10 times, allowing over 80 Mbps throughput on a single CPU or double that with two CPUs. This enables telecom and networking equipment manufacturers to develop products faster and with lower costs. DPDK achieves these gains through techniques like dedicated core affinity, userspace drivers, polling instead of interrupts, and lockless synchronization.
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
The document describes Cisco's OpenSOC, an open source security operations center that can analyze 1.2 million network packets per second in real time. It discusses the business need for such a solution given how breaches often go undetected for months. The solution architecture utilizes big data technologies like Hadoop, Kafka and Storm to enable real-time processing of streaming data at large scale. It also provides lessons learned around optimizing the performance of components like Kafka, HBase and Storm topologies.
The document discusses Virtio, an interface for virtualized I/O devices. It introduces Virtio's architecture, which involves Virtqueues and Vrings to facilitate communication between guest drivers and hypervisor-level device emulators. It outlines the five Virtio APIs for adding, kicking, getting buffers and enabling/disabling callbacks. It also provides an overview of steps for adding new Virtio devices and drivers.
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
This document discusses different methods for virtualizing I/O in virtual machines. It covers virtual I/O approaches like virtio, PCI passthrough, and SR-IOV. It also explains the role of the VMM/hypervisor in managing I/O between VMs and physical devices using techniques like VT-d, Open vSwitch, and single root I/O virtualization. Finally, it discusses emerging standards for virtual switching like virtual Ethernet bridging.
The document provides step-by-step instructions for building and running Intel DPDK sample applications on a test environment with 3 virtual machines connected by 10G NICs. It describes compiling and running the helloworld, L2 forwarding, and L3 forwarding applications, as well as using the pktgen tool for packet generation between VMs to test forwarding performance. Key steps include preparing the Linux kernel for DPDK, compiling applications, configuring ports and MAC addresses, and observing packet drops to identify performance bottlenecks.
The document discusses Dell EMC VxRail, a hyper-converged appliance that combines servers, storage, and networking into a single system. It is presented as the standard in hyper-converged infrastructure and focuses on enabling business innovation through consumption-based buying which allows customers to focus resources on differentiating their business instead of IT integration. VxRail offers various configurations and scale options to match different use cases from small to large environments.
CloudStack - Top 5 Technical Issues and TroubleshootingShapeBlue
Cloudstack Top 5 technical issues and troubleshooting. Cloudstack is a mature product in use by companies world-wide. While being associated with CloudStack development for over 5 years, Abhi has come across some technical issues that once in a while affect the CloudStack deployment. This presentation is an effort to put together top 5 such issues, analyze their symptoms, see them from CloudStack architecture perspective and from the distributed nature of cloud orchestration, then look at ways to avoid them and finally be able to troubleshoot if they occur.
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
Kerberos is the ubiquitous authentication mechanism when it comes to secure any Hadoop Services. With recent updates in Hadoop core and various Apache Hadoop components, inherent Kerberos support has matured and has come a long way.
Understanding & configuring Kerberos is still a challenge but even more painful & frustrating is troubleshooting a Kerberos issue. There are lot of things (small & big) that can go wrong (and will go wrong!). This talk covers the Kerberos debugging part in detail and discusses the tools & tricks that can be used to narrow down any Kerberos issue.
Rather than discussing the issues and their resolution, we will focus on how to approach a Kerberos problem and do's / dont's in Kerberos scene. This talk will provide a step by step guide that will equip the audience for troubleshooting future Kerberos problems.
Agenda is to discuss:
- Systematic approach to Kerberos troubleshooting
- Kerberos Tools available in Hadoop arsenal
- Tips & Tricks to narrow down Kerberos issues quickly
- Some nasty Kerberos issues from Support trenches
Some prior knowledge on Kerberos basics will be appreciated but is not a prerequisite.
Speaker:
Vipin Rathor, Sr. Product Specialist (HDP Security), Hortonworks
Open vSwitch Offload: Conntrack and the Upstream KernelNetronome
Offloading all or part of the Open vSwitch datapath to SmartNICs has been shown to not only release CPU resources on the server, but improve traffic processing performance. Recently steps have been made to support such offloading in the upstream Linux kernel. This has focused on creating an OVS datapath using the TC flower filter and utilizing the offload hooks already present here. This presentation focuses on how Connection Tracking (Conntrack) may fit into this model. It describes current work being undertaken with the Netfilter community to allow offloading of Conntrack entries. It continues to link this work with the offloading of Conntrack rules within OVS-TC.
Redis is an in-memory key-value store that is often used as a database, cache, and message broker. It supports various data structures like strings, hashes, lists, sets, and sorted sets. While data is stored in memory for fast access, Redis can also persist data to disk. It is widely used by companies like GitHub, Craigslist, and Engine Yard to power applications with high performance needs.
DPDK in depth
This document provides an overview of DPDK (Data Plane Development Kit):
1. DPDK is an open source project for data plane programming and network acceleration. It started at Intel in 2010 and is now maintained by the Linux Foundation.
2. DPDK provides poll mode drivers (PMDs), libraries, and sample applications for fast packet processing. It uses hugepages and avoids kernel involvement for high performance.
3. The document outlines several DPDK projects, libraries, PMDs, advantages and disadvantages, development process, and demonstrates a simple DPDK application (l2fwd) and the testpmd tool.
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
Apache Kafka is a distributed publish-subscribe messaging system that allows both publishing and subscribing to streams of records. It uses a distributed commit log that provides low latency and high throughput for handling real-time data feeds. Key features include persistence, replication, partitioning, and clustering.
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
Kubernetes As of Spark 2.3, Spark can run on clusters managed by Kubernetes. we will describes the best practices about running Spark SQL on Kubernetes upon Tencent cloud includes how to deploy Kubernetes against public cloud platform to maximum resource utilization and how to tune configurations of Spark to take advantage of Kubernetes resource manager to achieve best performance. To evaluate performance, the TPC-DS benchmarking tool will be used to analysis performance impact of queries between configurations set.
Speakers: Junjie Chen, Junping Du
This document provides an overview and introduction to VMware Virtual SAN (VSAN). It discusses the VSAN architecture which uses SSDs for caching and HDDs for storage. It also covers how VSAN can be configured through storage policies assigned at the VM level. The document outlines how VSAN provides a software-defined storage solution that is hardware agnostic and can elastically scale storage performance and capacity by adding servers and disks.
Ceph is an open source distributed storage system that is highly scalable, self-managing, and provides multiple access methods including block, file, and object storage. It uses CRUSH to intelligently distribute data and replicas across clusters. Ceph Storage Clusters contain OSD, MON, and optionally MDS daemons. OSDs store data objects, MONs maintain cluster maps and state, and MDS provides metadata for CephFS. Ceph can be deployed with CloudStack to provide the backend storage for virtual machine volumes.
This document provides an overview of considerations for choosing container storage for Kubernetes. It discusses the existing storage landscape, including open-source and commercial options. Key factors to consider include supported storage types, data protection and replication capabilities, dynamic provisioning, and whether container-native storage (CNS) or container-attached storage (CAS) is preferable based on workload needs. Performance benchmarks show Piraeus and StorageOS providing the best performance. The document aims to help users determine the right storage solution for their specific Kubernetes applications and workloads.
DPDK greatly improves packet processing performance and throughput by allowing applications to directly access hardware and bypass kernel involvement. It can improve performance by up to 10 times, allowing over 80 Mbps throughput on a single CPU or double that with two CPUs. This enables telecom and networking equipment manufacturers to develop products faster and with lower costs. DPDK achieves these gains through techniques like dedicated core affinity, userspace drivers, polling instead of interrupts, and lockless synchronization.
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
The document describes Cisco's OpenSOC, an open source security operations center that can analyze 1.2 million network packets per second in real time. It discusses the business need for such a solution given how breaches often go undetected for months. The solution architecture utilizes big data technologies like Hadoop, Kafka and Storm to enable real-time processing of streaming data at large scale. It also provides lessons learned around optimizing the performance of components like Kafka, HBase and Storm topologies.
The document discusses Virtio, an interface for virtualized I/O devices. It introduces Virtio's architecture, which involves Virtqueues and Vrings to facilitate communication between guest drivers and hypervisor-level device emulators. It outlines the five Virtio APIs for adding, kicking, getting buffers and enabling/disabling callbacks. It also provides an overview of steps for adding new Virtio devices and drivers.
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
This document discusses different methods for virtualizing I/O in virtual machines. It covers virtual I/O approaches like virtio, PCI passthrough, and SR-IOV. It also explains the role of the VMM/hypervisor in managing I/O between VMs and physical devices using techniques like VT-d, Open vSwitch, and single root I/O virtualization. Finally, it discusses emerging standards for virtual switching like virtual Ethernet bridging.
The document provides step-by-step instructions for building and running Intel DPDK sample applications on a test environment with 3 virtual machines connected by 10G NICs. It describes compiling and running the helloworld, L2 forwarding, and L3 forwarding applications, as well as using the pktgen tool for packet generation between VMs to test forwarding performance. Key steps include preparing the Linux kernel for DPDK, compiling applications, configuring ports and MAC addresses, and observing packet drops to identify performance bottlenecks.
The document discusses Dell EMC VxRail, a hyper-converged appliance that combines servers, storage, and networking into a single system. It is presented as the standard in hyper-converged infrastructure and focuses on enabling business innovation through consumption-based buying which allows customers to focus resources on differentiating their business instead of IT integration. VxRail offers various configurations and scale options to match different use cases from small to large environments.
CloudStack - Top 5 Technical Issues and TroubleshootingShapeBlue
Cloudstack Top 5 technical issues and troubleshooting. Cloudstack is a mature product in use by companies world-wide. While being associated with CloudStack development for over 5 years, Abhi has come across some technical issues that once in a while affect the CloudStack deployment. This presentation is an effort to put together top 5 such issues, analyze their symptoms, see them from CloudStack architecture perspective and from the distributed nature of cloud orchestration, then look at ways to avoid them and finally be able to troubleshoot if they occur.
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
Kerberos is the ubiquitous authentication mechanism when it comes to secure any Hadoop Services. With recent updates in Hadoop core and various Apache Hadoop components, inherent Kerberos support has matured and has come a long way.
Understanding & configuring Kerberos is still a challenge but even more painful & frustrating is troubleshooting a Kerberos issue. There are lot of things (small & big) that can go wrong (and will go wrong!). This talk covers the Kerberos debugging part in detail and discusses the tools & tricks that can be used to narrow down any Kerberos issue.
Rather than discussing the issues and their resolution, we will focus on how to approach a Kerberos problem and do's / dont's in Kerberos scene. This talk will provide a step by step guide that will equip the audience for troubleshooting future Kerberos problems.
Agenda is to discuss:
- Systematic approach to Kerberos troubleshooting
- Kerberos Tools available in Hadoop arsenal
- Tips & Tricks to narrow down Kerberos issues quickly
- Some nasty Kerberos issues from Support trenches
Some prior knowledge on Kerberos basics will be appreciated but is not a prerequisite.
Speaker:
Vipin Rathor, Sr. Product Specialist (HDP Security), Hortonworks
Open vSwitch Offload: Conntrack and the Upstream KernelNetronome
Offloading all or part of the Open vSwitch datapath to SmartNICs has been shown to not only release CPU resources on the server, but improve traffic processing performance. Recently steps have been made to support such offloading in the upstream Linux kernel. This has focused on creating an OVS datapath using the TC flower filter and utilizing the offload hooks already present here. This presentation focuses on how Connection Tracking (Conntrack) may fit into this model. It describes current work being undertaken with the Netfilter community to allow offloading of Conntrack entries. It continues to link this work with the offloading of Conntrack rules within OVS-TC.
Redis is an in-memory key-value store that is often used as a database, cache, and message broker. It supports various data structures like strings, hashes, lists, sets, and sorted sets. While data is stored in memory for fast access, Redis can also persist data to disk. It is widely used by companies like GitHub, Craigslist, and Engine Yard to power applications with high performance needs.
DPDK in depth
This document provides an overview of DPDK (Data Plane Development Kit):
1. DPDK is an open source project for data plane programming and network acceleration. It started at Intel in 2010 and is now maintained by the Linux Foundation.
2. DPDK provides poll mode drivers (PMDs), libraries, and sample applications for fast packet processing. It uses hugepages and avoids kernel involvement for high performance.
3. The document outlines several DPDK projects, libraries, PMDs, advantages and disadvantages, development process, and demonstrates a simple DPDK application (l2fwd) and the testpmd tool.
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
Apache Kafka is a distributed publish-subscribe messaging system that allows both publishing and subscribing to streams of records. It uses a distributed commit log that provides low latency and high throughput for handling real-time data feeds. Key features include persistence, replication, partitioning, and clustering.
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
Kubernetes As of Spark 2.3, Spark can run on clusters managed by Kubernetes. we will describes the best practices about running Spark SQL on Kubernetes upon Tencent cloud includes how to deploy Kubernetes against public cloud platform to maximum resource utilization and how to tune configurations of Spark to take advantage of Kubernetes resource manager to achieve best performance. To evaluate performance, the TPC-DS benchmarking tool will be used to analysis performance impact of queries between configurations set.
Speakers: Junjie Chen, Junping Du
How do we manage more than one thousand of Pegasus clusters - backend partacelyc1112009
A presentation in Apache Pegasus meetup in 2021 from Wang Dan.
Know more about Pegasus https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
The Construction and Practice of Apache Pegasus in Offline and Online Scenari...acelyc1112009
A presentation in Apache Pegasus meetup in 2022 from Wei Wang.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Know more about Pegasus https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
How does Apache Pegasusused in SensorsDataacelyc1112009
A presentation in COSCon (China Open Source Conference) 2023 from Guohao Li.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Know more about Pegasus https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
Similar to Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016) (20)
33. • Built on rDSN (Robust Distributed System Nucleus) framework
• https://github.com/Microsoft/rDSN
Microkernel Architecture
Most Things Are Pluggable
Friendly for Testing,
Debugging and Monitoring
Lock RPC AIO Timer
Logging Replication … …
Engineering …
34. • Unit Test
• Integration Test
• Simulation Test
• Simulate cluster in single process
• Processing can be replayed
• Scenario Test
• Declarative language
• Construct different scenarios (executing paths / corner cases)
等待状态
发起动作
注入错误
设置属性
How to Test
等待状态
36. Why
Pegasus是对HBase的有力补充
• 更优异的性能:C++、No GC、Data Locality
• 更高的可用性:No GC、Faster Failover
• 更轻量的部署:No HDFS、No ZooKeeper (in the future)
• 更简单的接口:Key-Value
How WhatHow
Many choices …
Fit is the best