Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016) 涛 吴
These slides, delivered by Zuoyan Qin, chief engineer on the XiaoMi Cloud Storage Team, accompanied a talk at Arch Summit Beijing 2016 on how Pegasus was designed.
Meta/Facebook's database serving social workloads runs on MyRocks (MySQL on RocksDB), so our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Ceph scale testing with 10 Billion Objects Karan Singh
In this performance test, we ingested 10 billion objects into the Ceph Object Storage system and measured its performance. We observed deterministic performance; see the presentation for details.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... Odinot Stanislas
After a short introduction to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks: sequential tests, random tests, and above all a comparison of results before and after optimization. The configuration parameters changed and the optimizations applied (large page numbers, omap data on a separate disk, ...) yield at least a 2x performance gain.
Sina Weibo is the most popular microblogging platform in China. It has more than 100 million users and tens of millions of daily updates. These slides explain the performance challenges of the Weibo platform.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa... StreamNative
Apache Hudi is an open data lake platform designed around the streaming data model. At its core, Hudi provides transactions, upserts, and deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster, and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-the-box support for streaming data from event systems into lake storage in near real time.
In this talk, we will walk through an end-to-end use case for change data capture from a relational database, starting with capturing changes using the Pulsar CDC connector and then demonstrating how you can use the Hudi deltastreamer tool to apply these changes to a table on the data lake. We will discuss various tips for operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects, including a native Hudi/Pulsar connector and Hudi tiered storage.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
How does Apache Pegasus used in SensorsData acelyc1112009
A presentation at COSCon (China Open Source Conference) 2023 by Guohao Li.
Apache Pegasus is a horizontally scalable, strongly consistent, high-performance key-value store.
Learn more about Pegasus: https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
50. Advanced usage: container support
Pegasus itself does not support container types,
but its HashKey + SortKey data model can be used to emulate containers.

Container   HashKey   SortKey   Value
map         Map ID    Key       Value
set         Set ID    Key       Null
list        List ID   Index     Value
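
The container mapping on this slide can be illustrated with a minimal sketch. This is not the Pegasus client API: `HashSortStore` is a hypothetical in-memory stand-in for a HashKey + SortKey -> Value store, and every helper name below is an assumption made for illustration. Two further assumptions: the set's Null value is represented as empty bytes, and list indexes are encoded as fixed-width big-endian integers so that byte order matches numeric order.

```python
from typing import Dict, List, Optional, Tuple


class HashSortStore:
    """Hypothetical in-memory stand-in for a (HashKey, SortKey) -> Value store."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, Dict[bytes, bytes]] = {}

    def set(self, hash_key: bytes, sort_key: bytes, value: bytes) -> None:
        self._rows.setdefault(hash_key, {})[sort_key] = value

    def get(self, hash_key: bytes, sort_key: bytes) -> Optional[bytes]:
        return self._rows.get(hash_key, {}).get(sort_key)

    def scan(self, hash_key: bytes) -> List[Tuple[bytes, bytes]]:
        # Return all (sort_key, value) pairs under one hash key, ordered by sort key.
        return sorted(self._rows.get(hash_key, {}).items())


# map: HashKey = map ID, SortKey = key, Value = value
def map_put(store: HashSortStore, map_id: bytes, key: bytes, value: bytes) -> None:
    store.set(map_id, key, value)


def map_get(store: HashSortStore, map_id: bytes, key: bytes) -> Optional[bytes]:
    return store.get(map_id, key)


# set: HashKey = set ID, SortKey = member, Value = null (empty bytes here)
def set_add(store: HashSortStore, set_id: bytes, member: bytes) -> None:
    store.set(set_id, member, b"")


def set_members(store: HashSortStore, set_id: bytes) -> List[bytes]:
    return [sort_key for sort_key, _ in store.scan(set_id)]


# list: HashKey = list ID, SortKey = index, Value = element
def list_set(store: HashSortStore, list_id: bytes, index: int, value: bytes) -> None:
    # Fixed-width big-endian encoding keeps byte order equal to numeric order.
    store.set(list_id, index.to_bytes(8, "big"), value)


def list_items(store: HashSortStore, list_id: bytes) -> List[bytes]:
    return [value for _, value in store.scan(list_id)]
```

Because every element of one container shares a HashKey, a scan over that HashKey returns the whole container in SortKey order, which is what makes the list emulation work.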