SlideShare a Scribd company logo
1 of 14
Ceph Design Summit – Jewel
-- Hadoop over RGW with SSD Cache Status update
yuan.zhou@intel.com
7/2015
Content
• Hadoop over RGW with SSD cache design
• Status update since Infernalis
• Performance of Hadoop over Swift
Rack 2
Server 1
RGW
(vanilla)
RGWFS
FileSystem
Interface
M/R
Server 2
RGW
(vanilla)
RGWFS
FileSystem
Interface
M/R
RGW-Proxy
(NEW!)
RGWFS
FileSystem
Interface
Scheduler
1
2
3
RGW
Service
1. Scheduler ask RGW service
where a particular block
locates (control path)
• RGW-Proxy returns the closest
active RGW instance(s)
2. Scheduler allocates a task on
the server that is near to the
data
3. Task access data from
nearby (data path)
4. RGW get/put data from the
CT, and CT would get/put
data from BT if necessary
(data path)
Ceph RGW with SSD cache
OSD OSD OSD OSD OSD OSD
Ceph
RADOS
Rack 1
4
Gateway
Gateway
OSD(SSD) OSD(SSD) OSD(SSD) OSD(SSD) Cache
Tier
Base
Tier
Isolated
network MON
Status update
• RGW-Proxy(done)
• Restful service based on Python WSGI
• Give out the block location(s) on restful request, where the locations are sorted by the
distance between RGW instances and the data OSD
• curl http://RGW-proxy/con/test1/1
• http://192.168.6.115/con/test1, http://192.168.6.114/con/test1
• RGWFS(70% done)
1. Forked from SwiftFS, RGWFS can talk to single RGW
• But only to single RGW instance
• Each Get/Put will need to go through the RGW
2. Now with RGW-proxy, it can talk to multiple RGW instances
3. We also add ‘block level location aware read’ feature
• Performance testing
• Baseline performance with HDFS and Swift is done
• New filesystem URL rgw://
1. Forked from Hadoop-8545(SwiftFS)
2. Hadoop is able to talk to a RGW cluster with this plugin
3. A new ‘block concept’ was added since Swift doesn’t
support blocks
• Thus scheduler could use multiple tasks to access the same file
• Based on the location, RGWFS is able to read from the closest RGW
through range GET API
• But for PUT all the traffic still goes through single RGW
RGWFS – a new adaptor for HCFS
RGWFS
FileSystem
Interface
Scheduler
RGW://
RGW
RADOS
RGW
file1
1. Before get/put, RGWFS would try to get the
location of each block from RGW-Proxy
1. One topology file of the cluster is generated
2. RGW-Proxy would get the manifest from the head
object first(librados + getxattr)
3. Then based on the crushmap RGW-proxy can get
the location of each object block(ceph osd map)
4. RGW-proxy could get the closest RGW the data
osd info and the topology file(simple lookup in the
topology file)
RGW-Proxy
(NEW!)
RGW-Proxy – Give out the closest RGW instance
RGWFS
FileSystem
Interface
Scheduler
RGW://
RGW
RADOS
RGW
file1
1. Setting up RGW on a Cache Tier thus we could
use SSD as the cache.
1. With some dedicated chunk size: e.g., 64MB
considering the data are quite big usually
2. rgw_max_chunk_size, rgw_obj_strip_size
2. Based on the account/container/object name,
RGW could get/put the content.
1. Using Range Read to get each chunk
3. We’ll use write-through mode here as a start
point to bypass the data consistency issue.
RGW(vanillia) – Serve the data requests
RGWFS
FileSystem
Interface
Scheduler
RGW
(modified)
RGW
(modified)
RGW-Proxy
(NEW!)RGW
Service
OSD OSD OSD OSD OSD OSD
Ceph
RADOS
Gateway
Gateway
OSD OSD OSD OSD
Cache
Tier
Base
Tier
MON
HDFS vs Swift
HDFS
Swift
with
list-enpoint
Swift
without
list-enpoint
Host
Data Node
MapReduce
Host
Data Node
MapReduce
…
Host
Object-Server
MapReduce
Host
Object-Server
MapReduce
Host
Data Node
MapReduce
Host
Object-Server
MapReduce
…
Proxy-Server
Host
Object-Server
MapReduce
Host
Object-Server
MapReduce
Host
Object-Server
MapReduce
…
Proxy-Server
Name Node
IMPACT
• List Endpoint impact is huge
• Swift overhead comes from “Rename”
1X
1.25X
1.67X
Less is better
Rename in Reduce Task
• The output of the reduce function is written to a temporary location in
HDFS. After completing, the output will automatically renamed from its
temporary location to its final location.
• Object storage cannot support rename, swiftfs use “copy and delete” for
rename function.
HDFS Rename -> Change METADATA in Name Node
Swift Rename -> Copy new object and Delete the older one in Swift
Next Step
• Finish the development(70% done) and complete the performance
testing work
• Based on the performance of Swift, we’ll need to solve the heavy
rename issue, may need to patch RGW
• Open source code repo(WIP)
• https://github.com/intel-bigdata/MOC
10
Q&A
23
Deployment Consideration Matrix
12
Storage
Compute
Distro/Plugin
Data Processing API
Vanilla CDH HDP MapRSpark
VM Container Bare-metal
Tenant vs. Admin
provisioned
Disaggregated vs. Collocated HDFS vs. other options
Traditional
EDP
(Sahara native)
3rd party
APIs
Storm
Performance results in the next section
Storage Architecture Tenant provisioned (in VM)
 HDFS in the same VMs of computing tasks vs. in the
different VMs
 Ephemeral disk vs. Cinder volume
 Admin provided
 Logically disaggregated from computing tasks
 Physical collocation is a matter of deployment
 For network remote storage, Neutron DVR is very
useful feature
 A disaggregated (and centralized) storage system has
significant values
 No data silos, more business opportunities
 Could leverage Manila service
 Allow to create advanced solutions (.e.g. in-memory
overlayer)
 More vendor specific optimization opportunities
13
#2 #4#3#1
Host
HDFS
VM
Comput-
ing Task
VM
Comput-
ing Task
Host
HDFS
VM
Comput-
ing Task
VM
HDFS
Host
HDFS
VM
Comput-
ing Task
VM
Comput-
ing Task
Legacy
NFS
GlusterFS Ceph*
External
HDFS Swift
HDFS
Scenario #1: computing and data service collocate in the VMs
Scenario #2: data service locates in the host world
Scenario #3: data service locates in a separate VM world
Scenario #4: data service locates in the remote network
Compute Engine
14
Pros Cons
VM
• Best support in OpenStack
• Strong security
• Slow to provision
• Relatively high runtime performance overhead
Container
• Light-weight, fast provisioning
• Better runtime performance than VM
• Nova-docker readiness
• Cinder volume support is not ready yet
• Weaker security than VM
• Not the ideal way to use container
Bare-Metal
• Best performance and QoS
• Best security isolation
• Ironic readiness
• Worst efficiency (e.g. consolidation of workloads with
different behaviors)
• Worst flexibility (e.g. migration)
• Worst elasticity due to slow provisioning
 Container seems to be promising but still need better support
 Determining the appropriate cluster size is always a challenge to tenants
 e.g. small flavor with more nodes or large flavor with less nodes

More Related Content

What's hot

BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSHSage Weil
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleJames Saint-Rossy
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about cephEmma Haruka Iwao
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InSage Weil
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016John Spray
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookDanny Al-Gaaf
 
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ontico
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Sage Weil
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014Kyle Bader
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonSage Weil
 
Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldSage Weil
 

What's hot (19)

BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
MySQL on Ceph
MySQL on CephMySQL on Ceph
MySQL on Ceph
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at Scale
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)
 
Intorduce to Ceph
Intorduce to CephIntorduce to Ceph
Intorduce to Ceph
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit Boston
 
Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud world
 

Viewers also liked

Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Jens Hadlich
 
Mellanox High Performance Networks for Ceph
Mellanox High Performance Networks for CephMellanox High Performance Networks for Ceph
Mellanox High Performance Networks for CephMellanox Technologies
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph clusterMirantis
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Ceph Community
 
Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Jens Hadlich
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
 
SUSE Enterprise Storage - a Gentle Introduction
SUSE Enterprise Storage - a Gentle IntroductionSUSE Enterprise Storage - a Gentle Introduction
SUSE Enterprise Storage - a Gentle IntroductionGábor Nyers
 
High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstackDeepak Mane
 
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_sing
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_singC cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_sing
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_singJohn Sing
 
Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]Joshua Harlow
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)Lars Marowsky-Brée
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph SolutionsRed_Hat_Storage
 
Zimbra Forum France 2016 - Karine and StarXpert
Zimbra Forum France 2016 - Karine and StarXpertZimbra Forum France 2016 - Karine and StarXpert
Zimbra Forum France 2016 - Karine and StarXpertZimbra
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migrationopenstackindia
 
OpenStack Storage Buddy Ceph
OpenStack Storage Buddy CephOpenStack Storage Buddy Ceph
OpenStack Storage Buddy Cephopenstackindia
 
Zimbra Forum France 2016 - Beezim and Ceph
Zimbra Forum France 2016 - Beezim and CephZimbra Forum France 2016 - Beezim and Ceph
Zimbra Forum France 2016 - Beezim and CephZimbra
 

Viewers also liked (20)

Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
 
Mellanox High Performance Networks for Ceph
Mellanox High Performance Networks for CephMellanox High Performance Networks for Ceph
Mellanox High Performance Networks for Ceph
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
 
Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)Ceph at Spreadshirt (June 2016)
Ceph at Spreadshirt (June 2016)
 
Cloudinit
CloudinitCloudinit
Cloudinit
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
SUSE Enterprise Storage - a Gentle Introduction
SUSE Enterprise Storage - a Gentle IntroductionSUSE Enterprise Storage - a Gentle Introduction
SUSE Enterprise Storage - a Gentle Introduction
 
High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstack
 
CloudInit Introduction
CloudInit IntroductionCloudInit Introduction
CloudInit Introduction
 
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_sing
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_singC cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_sing
C cloud organizational_impacts_big_data_on-prem_vs_off-premise_john_sing
 
Ceph on rdma
Ceph on rdmaCeph on rdma
Ceph on rdma
 
Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)
 
Ceph Object Store
Ceph Object StoreCeph Object Store
Ceph Object Store
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph Solutions
 
Zimbra Forum France 2016 - Karine and StarXpert
Zimbra Forum France 2016 - Karine and StarXpertZimbra Forum France 2016 - Karine and StarXpert
Zimbra Forum France 2016 - Karine and StarXpert
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
 
OpenStack Storage Buddy Ceph
OpenStack Storage Buddy CephOpenStack Storage Buddy Ceph
OpenStack Storage Buddy Ceph
 
Zimbra Forum France 2016 - Beezim and Ceph
Zimbra Forum France 2016 - Beezim and CephZimbra Forum France 2016 - Beezim and Ceph
Zimbra Forum France 2016 - Beezim and Ceph
 

Similar to Hadoop over rgw

Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Community
 
20171101 taco scargo luminous is out, what's in it for you
20171101 taco scargo   luminous is out, what's in it for you20171101 taco scargo   luminous is out, what's in it for you
20171101 taco scargo luminous is out, what's in it for youTaco Scargo
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisNoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisAnkit Singhal
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and BeyondSage Weil
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Community
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Community
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High AvailabilityCloudera, Inc.
 
Ceph at salesforce ceph day external presentation
Ceph at salesforce   ceph day external presentationCeph at salesforce   ceph day external presentation
Ceph at salesforce ceph day external presentationSameer Tiwari
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...Equnix Business Solutions
 
How to configure the cluster based on Multi-site (WAN) configuration
How to configure the clusterbased on Multi-site (WAN) configurationHow to configure the clusterbased on Multi-site (WAN) configuration
How to configure the cluster based on Multi-site (WAN) configurationAkihiro Kitada
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewMarcel Hergaarden
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 

Similar to Hadoop over rgw (20)

Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
 
20171101 taco scargo luminous is out, what's in it for you
20171101 taco scargo   luminous is out, what's in it for you20171101 taco scargo   luminous is out, what's in it for you
20171101 taco scargo luminous is out, what's in it for you
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisNoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
 
Ceph at salesforce ceph day external presentation
Ceph at salesforce   ceph day external presentationCeph at salesforce   ceph day external presentation
Ceph at salesforce ceph day external presentation
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
How to configure the cluster based on Multi-site (WAN) configuration
How to configure the clusterbased on Multi-site (WAN) configurationHow to configure the clusterbased on Multi-site (WAN) configuration
How to configure the cluster based on Multi-site (WAN) configuration
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) Overview
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 

Recently uploaded

Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Intellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxIntellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxBipin Adhikari
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 

Recently uploaded (20)

Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Intellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxIntellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptx
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 

Hadoop over rgw

  • 1. Ceph Design Summit – Jewel -- Hadoop over RGW with SSD Cache Status update yuan.zhou@intel.com 7/2015
  • 2. Content • Hadoop over RGW with SSD cache design • Status update since Infernalis • Performance of Hadoop over Swift
  • 3. Rack 2 Server 1 RGW (vanilla) RGWFS FileSystem Interface M/R Server 2 RGW (vanilla) RGWFS FileSystem Interface M/R RGW-Proxy (NEW!) RGWFS FileSystem Interface Scheduler 1 2 3 RGW Service 1. Scheduler ask RGW service where a particular block locates (control path) • RGW-Proxy returns the closest active RGW instance(s) 2. Scheduler allocates a task on the server that is near to the data 3. Task access data from nearby (data path) 4. RGW get/put data from the CT, and CT would get/put data from BT if necessary (data path) Ceph RGW with SSD cache OSD OSD OSD OSD OSD OSD Ceph RADOS Rack 1 4 Gateway Gateway OSD(SSD) OSD(SSD) OSD(SSD) OSD(SSD) Cache Tier Base Tier Isolated network MON
  • 4. Status update • RGW-Proxy(done) • Restful service based on Python WSGI • Give out the block location(s) on restful request, where the locations are sorted by the distance between RGW instances and the data OSD • curl http://RGW-proxy/con/test1/1 • http://192.168.6.115/con/test1, http://192.168.6.114/con/test1 • RGWFS(70% done) 1. Forked from SwiftFS, RGWFS can talk to single RGW • But only to single RGW instance • Each Get/Put will need to go through the RGW 2. Now with RGW-proxy, it can talk to multiple RGW instances 3. We also add ‘block level location aware read’ feature • Performance testing • Baseline performance with HDFS and Swift is done
  • 5. • New filesystem URL rgw:// 1. Forked from Hadoop-8545(SwiftFS) 2. Hadoop is able to talk to a RGW cluster with this plugin 3. A new ‘block concept’ was added since Swift doesn’t support blocks • Thus scheduler could use multiple tasks to access the same file • Based on the location, RGWFS is able to read from the closest RGW through range GET API • But for PUT all the traffic still goes through single RGW RGWFS – a new adaptor for HCFS RGWFS FileSystem Interface Scheduler RGW:// RGW RADOS RGW file1
  • 6. 1. Before get/put, RGWFS would try to get the location of each block from RGW-Proxy 1. One topology file of the cluster is generated 2. RGW-Proxy would get the manifest from the head object first(librados + getxattr) 3. Then based on the crushmap RGW-proxy can get the location of each object block(ceph osd map) 4. RGW-proxy could get the closest RGW the data osd info and the topology file(simple lookup in the topology file) RGW-Proxy (NEW!) RGW-Proxy – Give out the closest RGW instance RGWFS FileSystem Interface Scheduler RGW:// RGW RADOS RGW file1
  • 7. 1. Setting up RGW on a Cache Tier thus we could use SSD as the cache. 1. With some dedicated chunk size: e.g., 64MB considering the data are quite big usually 2. rgw_max_chunk_size, rgw_obj_strip_size 2. Based on the account/container/object name, RGW could get/put the content. 1. Using Range Read to get each chunk 3. We’ll use write-through mode here as a start point to bypass the data consistency issue. RGW(vanillia) – Serve the data requests RGWFS FileSystem Interface Scheduler RGW (modified) RGW (modified) RGW-Proxy (NEW!)RGW Service OSD OSD OSD OSD OSD OSD Ceph RADOS Gateway Gateway OSD OSD OSD OSD Cache Tier Base Tier MON
  • 8. HDFS vs Swift HDFS Swift with list-enpoint Swift without list-enpoint Host Data Node MapReduce Host Data Node MapReduce … Host Object-Server MapReduce Host Object-Server MapReduce Host Data Node MapReduce Host Object-Server MapReduce … Proxy-Server Host Object-Server MapReduce Host Object-Server MapReduce Host Object-Server MapReduce … Proxy-Server Name Node IMPACT • List Endpoint impact is huge • Swift overhead comes from “Rename” 1X 1.25X 1.67X Less is better
  • 9. Rename in Reduce Task • The output of the reduce function is written to a temporary location in HDFS. After completing, the output will automatically renamed from its temporary location to its final location. • Object storage cannot support rename, swiftfs use “copy and delete” for rename function. HDFS Rename -> Change METADATA in Name Node Swift Rename -> Copy new object and Delete the older one in Swift
  • 10. Next Step • Finish the development(70% done) and complete the performance testing work • Based on the performance of Swift, we’ll need to solve the heavy rename issue, may need to patch RGW • Open source code repo(WIP) • https://github.com/intel-bigdata/MOC 10
  • 12. Deployment Consideration Matrix 12 Storage Compute Distro/Plugin Data Processing API Vanilla CDH HDP MapRSpark VM Container Bare-metal Tenant vs. Admin provisioned Disaggregated vs. Collocated HDFS vs. other options Traditional EDP (Sahara native) 3rd party APIs Storm Performance results in the next section
  • 13. Storage Architecture Tenant provisioned (in VM)  HDFS in the same VMs of computing tasks vs. in the different VMs  Ephemeral disk vs. Cinder volume  Admin provided  Logically disaggregated from computing tasks  Physical collocation is a matter of deployment  For network remote storage, Neutron DVR is very useful feature  A disaggregated (and centralized) storage system has significant values  No data silos, more business opportunities  Could leverage Manila service  Allow to create advanced solutions (.e.g. in-memory overlayer)  More vendor specific optimization opportunities 13 #2 #4#3#1 Host HDFS VM Comput- ing Task VM Comput- ing Task Host HDFS VM Comput- ing Task VM HDFS Host HDFS VM Comput- ing Task VM Comput- ing Task Legacy NFS GlusterFS Ceph* External HDFS Swift HDFS Scenario #1: computing and data service collocate in the VMs Scenario #2: data service locates in the host world Scenario #3: data service locates in a separate VM world Scenario #4: data service locates in the remote network
  • 14. Compute Engine 14 Pros Cons VM • Best support in OpenStack • Strong security • Slow to provision • Relatively high runtime performance overhead Container • Light-weight, fast provisioning • Better runtime performance than VM • Nova-docker readiness • Cinder volume support is not ready yet • Weaker security than VM • Not the ideal way to use container Bare-Metal • Best performance and QoS • Best security isolation • Ironic readiness • Worst efficiency (e.g. consolidation of workloads with different behaviors) • Worst flexibility (e.g. migration) • Worst elasticity due to slow provisioning  Container seems to be promising but still need better support  Determining the appropriate cluster size is always a challenge to tenants  e.g. small flavor with more nodes or large flavor with less nodes

Editor's Notes

  1. A new adaptor for RGW(RGWFS): basically it will follow SwiftFS using a restful requests to do read/write the data. [Java] Need to change the path.. A RGW-Proxy to do the master selection: it’s just like the Swift-Proxy which will give out the RGW instance and the object address(ip, port, account/container/obj) [python or Java] Based on the container/object name, this daemon would give out the closest RGW instance(thus we know the closest rack). We’ll use the SSD as the cache layer here. It’s actually one Cache Tier of below Base Tier. A RGW cache coherence algorithm: it’s working among the RGW instances which will keep the cache coherence. [C++] Based on the container/object name, RGW instance would read the object out from Ceph cluster.