Unified read-only cache proposal
Design goals & current status
• A standalone SSD caching library that can be reused by both librbd and RGW
• Current status:
  • librbd read-only cache: caches block contents on SSD
    • librbd parent/clone images: the parent RBD image's contents are cached on SSD, and all cloned images can read from the parent image cache until COW happens
    • PR will be sent out soon
  • RGW immutable caching: caches RADOS objects on SSD
    • A small CDN farm behind the RGW cluster
    • PR against Jewel is ready (#13144) but needs cleanup
General architecture
• Libcachefile: common library that handles reads and writes on SSD
• Sparse-file based cache
• Policy: controls cache promotion/demotion and cache sizing
• Simple LRU based
• librbd/librgw hooks: call the libcachefile API
[Architecture diagram: RBD_0, RBD_1 and RBD_2 (librbd, FileImageCache) and three RGW_civetweb instances (librgw, RGW_DataCache) attach through hooks and policy to libCacheStore on SSD; librbd and librados sit above RADOS.]
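The policy layer described above (promotion, demotion and cache sizing, "simple LRU based") could look roughly like the sketch below. It is only an illustration: `LRUPolicy`, `touch` and `capacity_blocks` are hypothetical names, not part of libcachefile, and the real policy may track more state.

```python
from collections import OrderedDict


class LRUPolicy:
    """Hypothetical LRU policy: tracks cached block ids and picks victims
    once the configured size is exceeded. Names are illustrative only."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self._blocks = OrderedDict()  # block_id -> None, least recent first

    def touch(self, block_id):
        """Record an access and return the list of demoted block ids."""
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # hit: mark as most recent
            return []
        self._blocks[block_id] = None  # miss: promote the block
        evicted = []
        while len(self._blocks) > self.capacity_blocks:
            victim, _ = self._blocks.popitem(last=False)  # demote the oldest
            evicted.append(victim)
        return evicted

    def cached(self):
        return list(self._blocks)


policy = LRUPolicy(capacity_blocks=2)
policy.touch(1)
policy.touch(2)
policy.touch(1)            # block 1 becomes the most recently used
evicted = policy.touch(3)  # cache is full, so the oldest block is demoted
```

Keeping the policy separate from the file I/O matches the split in the slide: the hooks ask the policy what to promote or demote, and the common library only moves bytes.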
Shared rbd read-only SSD cache
Shared Read-only cache for RBD – rbd clone flow
[Diagram: RBD_0 (template image) → RBD_0@snap1 (protected snapshot) → RBD_1, RBD_2, … RBD_N (cloned images). Annotation: "This is the shared image content".]
Shared Read-only cache for RBD -- overview
• There will be a shared cache (from the parent image) on each compute node
• A cloned image reads from the shared cache unless COW has happened
[Diagram: each compute node holds local caches and a shared cache on an SSD backend and serves read and write I/O; the compute nodes sit above RADOS (OSDs).]
Shared Read-only cache for RBD -- Cache metadata
Each cloned image has its own COW cache mapping:
- Each read hit is served either from the shared cache or from the image's own cache
- Cache mapping bits for COWed data
- Updated when COW happens
Mapping entry layout:
- 2 bits: state (not_in_cache, in_shared_cache, in_cache)
- 62 bits: block_id
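As a minimal sketch of the 2-bit state plus 62-bit block_id layout, the entry can be packed into one 64-bit integer. The slide does not say which end holds the state bits or what numeric value each state has, so both choices below (state in the top two bits, values 0 to 2) are assumptions, as are the helper names.

```python
from enum import IntEnum

BLOCK_ID_BITS = 62
BLOCK_ID_MASK = (1 << BLOCK_ID_BITS) - 1


class CacheState(IntEnum):
    # Numeric values are assumed; the slide only names the three states.
    NOT_IN_CACHE = 0
    IN_SHARED_CACHE = 1
    IN_CACHE = 2


def pack_entry(state, block_id):
    """Pack (state, block_id) into a single 64-bit mapping entry."""
    if not 0 <= block_id <= BLOCK_ID_MASK:
        raise ValueError("block_id does not fit in 62 bits")
    return (int(state) << BLOCK_ID_BITS) | block_id


def unpack_entry(entry):
    """Split a 64-bit mapping entry back into (state, block_id)."""
    return CacheState(entry >> BLOCK_ID_BITS), entry & BLOCK_ID_MASK


entry = pack_entry(CacheState.IN_SHARED_CACHE, 2)
state, block_id = unpack_entry(entry)
```

When a block is COWed, only the state bits change, so an update is a single 64-bit store.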
Shared Read-only cache for RBD – IO flow (read)
[Diagram: on a compute node, RBD_1 (cloned) and RBD_2 (cloned) each go through librbd and FileImageCache, with a per-image cache file for COW data on SSD; RBD_0 (parent) goes through image_store to the shared cache file.]
1. Shared cache file for RBD_0 (parent), filled from RADOS (fully promoted on first cloned image open)
2. Cache lookup, block in shared cache:
- Read from shared cache
2’. Cache lookup, block in COW cache:
- Read from COW cache
COW cache mapping (RBD_1):
rbd_id  lba      length
rbd_1   8192     4096
rbd_1   1048576  4096
COW cache mapping (RBD_2):
rbd_id  lba      length
rbd_2   8192     4096
rbd_2   1048576  4096
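The read lookup above can be sketched as follows. This is a simplified model under stated assumptions: the dictionaries stand in for the COW mapping table and the cache files on SSD, `read_block` and `fake_rados_read` are hypothetical helpers, and the fallback to RADOS on a miss is assumed rather than shown on the slide.

```python
def read_block(rbd_id, lba, cow_mapping, cow_cache, shared_cache, rados_read):
    """Return (source, data) for one block read issued by a cloned image.

    cow_mapping is the per-clone COW table; cow_cache and shared_cache
    model the cache files on SSD. All of them are illustrative stand-ins.
    """
    key = (rbd_id, lba)
    if key in cow_mapping:
        # The clone has overwritten this block, so the parent copy is stale.
        if key in cow_cache:
            return "cow_cache", cow_cache[key]      # step 2'
        return "rados", rados_read(rbd_id, lba)     # assumed miss path
    if lba in shared_cache:
        return "shared_cache", shared_cache[lba]    # step 2
    return "rados", rados_read(rbd_id, lba)         # assumed miss path


def fake_rados_read(rbd_id, lba):
    # Stand-in for a real RADOS read.
    return f"{rbd_id}@{lba} from RADOS".encode()


cow_mapping = {("rbd_1", 8192): 4096, ("rbd_1", 1048576): 4096}
cow_cache = {("rbd_1", 8192): b"rbd_1 private data"}
shared_cache = {0: b"parent block 0", 8192: b"parent block 8192"}

hit_cow = read_block("rbd_1", 8192, cow_mapping, cow_cache, shared_cache, fake_rados_read)
hit_shared = read_block("rbd_2", 8192, cow_mapping, cow_cache, shared_cache, fake_rados_read)
miss = read_block("rbd_1", 1048576, cow_mapping, cow_cache, shared_cache, fake_rados_read)
```

The same LBA (8192) resolves differently for the two clones: rbd_1 has COWed it and reads its own copy, while rbd_2 still reads the shared parent data.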
Shared Read-only cache for RBD – IO flow (write)
[Diagram: on a compute node, RBD_1 (cloned) and RBD_2 (cloned) each go through librbd and FileImageCache, with a per-image cache file for COW data on SSD; RBD_0 (parent) goes through image_store to the shared cache file.]
1. Shared cache file for RBD_0 (parent), filled from RADOS (fully promoted on first cloned image open)
2. Cache lookup, block in shared cache:
- Create entry in COW mapping table
- Write to RADOS
2’. Cache lookup, block in COW cache:
- Invalidate the chunk in the cache file
- Write to RADOS
COW cache mapping (RBD_1):
rbd_id  lba      length
rbd_1   8192     4096
rbd_1   1048576  4096
COW cache mapping (RBD_2):
rbd_id  lba      length
rbd_2   81920    4096
rbd_2   1048576  4096
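The write flow above can be sketched in the same style. Again this is an illustration under assumptions: `write_block` is a hypothetical helper, the dictionaries stand in for the COW mapping table, the COW cache file and RADOS, and the mapping value is simply the write length.

```python
def write_block(rbd_id, lba, data, cow_mapping, cow_cache, rados):
    """Apply one block write from a cloned image (hypothetical helper)."""
    key = (rbd_id, lba)
    if key in cow_mapping:
        # Step 2': already COWed, so invalidate the cached chunk if present.
        cow_cache.pop(key, None)
    else:
        # Step 2: first write to a block that was served by the shared cache.
        cow_mapping[key] = len(data)
    rados[key] = data  # in both cases the data is written to RADOS


cow_mapping = {("rbd_1", 8192): 4096}
cow_cache = {("rbd_1", 8192): b"old"}
rados = {}

write_block("rbd_1", 8192, b"a" * 4096, cow_mapping, cow_cache, rados)     # step 2'
write_block("rbd_1", 1048576, b"b" * 4096, cow_mapping, cow_cache, rados)  # step 2
```

Nothing is written into the cache on this path; the cache is only filled by reads, which keeps it read-only from the client's point of view.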
Shared Read-only cache for RBD – initial results
4k Rand Read                             Op_Size  Op_Type   QD    Runtime(sec)  IOPS   BW(MB/s)  Latency(ms)  99.99% Latency(ms)
Baseline (w/o cache)                     4k       randread  qd32  300           12927  50.5      2.437        8.89
Read-only cache                          4k       randread  qd32  300           52351  204.5     0.555        3.95
Independent read-only cache (2 volumes)  4k       randread  qd32  300           70079  273.74    0.860        5.56
Shared read-only cache (2 volumes)       4k       randread  qd32  300           68612  268       0.875        2.98 (?)
PR will be sent out soon.
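As a quick consistency check on the table, bandwidth should equal IOPS times the 4k operation size, and the speedup is the ratio of IOPS to the baseline. The sketch below assumes 4k means 4096 bytes and that the reported MB/s are binary megabytes (1024 × 1024 bytes), which is what makes the numbers line up; the dictionary keys are just labels for the table rows.

```python
BLOCK_BYTES = 4096        # assumed meaning of "4k"
MIB = 1024 * 1024         # assumed unit behind the reported "MB/s"

iops = {
    "baseline": 12927,
    "read_only_cache": 52351,
    "independent_2_volumes": 70079,
    "shared_2_volumes": 68612,
}

# Bandwidth implied by the IOPS column, in MiB/s.
bandwidth = {name: value * BLOCK_BYTES / MIB for name, value in iops.items()}

# Speedup of each configuration relative to the uncached baseline.
speedup = {name: value / iops["baseline"] for name, value in iops.items()}

for name in iops:
    print(f"{name}: {bandwidth[name]:.2f} MiB/s, {speedup[name]:.2f}x baseline")
```

Under these assumptions the computed bandwidths agree with the BW column, and the single-volume read-only cache comes out at roughly four times the baseline IOPS.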
Shared RGW read-only SSD cache
Shared Read-only cache for RGW
• A CDN cluster behind the RGW clusters
• L1 cache: reads are served from the SSD cache of the local RGW instance
• L2 cache (configurable): reads may be served from the SSD cache of other, remote RGW instances
• Each object/chunk has a unique ID
• A centralized distributed K/V store is needed to hold the mapping, since chunks may be spread across different RGW instances
Mapping example:
chunk_id:         7e21a6b2-89b9-4de6-869e-1ddc0198a82b.5228.1__shadow_.TzkbVV_syqJ2vumnFe8uAaiL9j6ghtC_34
RGW instance id:  Rgw_1
Cache_chunk_id:   7e21a6b2-89b9-4de6-869e-1ddc0198a82b.5228.1__shadow_.TzkbVV_syqJ2vumnFe8uAaiL9j6ghtC_34
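The L1/L2 lookup described above could be sketched as below. This is only a model: `locate_chunk` is a hypothetical helper, a plain dict stands in for the centralized distributed K/V store, and the chunk ids are shortened made-up values rather than real RGW object names.

```python
def locate_chunk(chunk_id, local_instance, local_cache, directory, l2_enabled=True):
    """Decide where a chunk should be read from.

    local_cache: set of chunk ids cached on this RGW instance's SSD (L1).
    directory:   chunk_id -> RGW instance id, standing in for the K/V store.
    Returns (tier, instance), where tier is "L1", "L2" or "RADOS".
    """
    if chunk_id in local_cache:
        return ("L1", local_instance)
    if l2_enabled:
        owner = directory.get(chunk_id)
        if owner is not None and owner != local_instance:
            return ("L2", owner)  # read from a remote instance's SSD cache
    return ("RADOS", None)        # not cached anywhere we may read from


directory = {"chunk-a": "rgw_1", "chunk-b": "rgw_2"}
rgw_1_cache = {"chunk-a"}

local_hit = locate_chunk("chunk-a", "rgw_1", rgw_1_cache, directory)
remote_hit = locate_chunk("chunk-b", "rgw_1", rgw_1_cache, directory)
l2_off = locate_chunk("chunk-b", "rgw_1", rgw_1_cache, directory, l2_enabled=False)
unknown = locate_chunk("chunk-c", "rgw_1", rgw_1_cache, directory)
```

The `l2_enabled` flag mirrors the "configurable" note on the L2 bullet: with it off, any miss in the local SSD cache goes straight to RADOS.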
Shared Read-only cache for RGW
[Diagram: rgw_1 and rgw_2 each expose the S3 API and Swift API through rgw_frontend, with rgw_rados, rgw_cache, datacache and policy underneath; each has a local Immutable Cache (L1 local, L2 remote) and reaches RADOS through librados.]
Issues
• Different caching semantics for block and object?
  • Promotion at block level (default 8k) for librbd
  • Promotion at object level for RGW
• #13144 does not compile
  • https://github.com/maniaabdi/engage1.git
  • Jewel based, needs to be rebased against master
• Currently the logic lives inside rgw_rados and needs to be decoupled to fit our design (libcachefile + policy)
RGW datacache (PR #13144)
[Diagram: rgw_1 and rgw_2, each with rgw_frontend (S3 API, Swift API), rgw_rados, rgw_cache, datacache and policy, plus a local Immutable Cache (L1/L2), connected to RADOS through librados.]

Editor's Notes

  1. How to maintain the librbd parent/clone image table?
  2. When to promote the shared cache file? -> When the first cloned image is opened, the cache is promoted to local; this could be optimized. What data should we promote? -> parent_image@snapshot. librbd caching promotes at block-size level (4k default). What is the cache file format? -> Sparse-file based.
  3. Promote only on read. Writes go to the OSD directly and invalidate the cache on a cache hit.
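To illustrate the sparse-file cache format mentioned in note 2, the sketch below writes a promoted block at its natural offset in a cache file, so unpromoted ranges remain holes on filesystems that support sparse files. The helper names, the file name and the 4096-byte block size (taken from the "4k default" in the note) are assumptions for illustration, not the actual cache file layout.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed, following the "4k default" in the note


def promote(cache_path, block_id, data):
    """Write one block at offset block_id * BLOCK_SIZE in the cache file."""
    mode = "r+b" if os.path.exists(cache_path) else "w+b"
    with open(cache_path, mode) as f:
        f.seek(block_id * BLOCK_SIZE)  # seeking past EOF leaves a hole
        f.write(data)


def read_cached(cache_path, block_id):
    """Read one block back from the cache file."""
    with open(cache_path, "rb") as f:
        f.seek(block_id * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rbd_0.cache")      # hypothetical file name
    promote(path, 256, b"x" * BLOCK_SIZE)        # block at offset 1 MiB
    logical_size = os.path.getsize(path)
    block = read_cached(path, 256)
    hole = read_cached(path, 0)                  # never promoted
```

A hole reads back as zeros, so it cannot be told apart from a real all-zero block by content alone; that is one reason a separate mapping with per-block state bits is still needed next to the cache file.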