SlideShare a Scribd company logo
1 of 16
Download to read offline
2015.9.14
Ceph IO 路径
和性能分析
王豪迈
I’m Haomai
✤ Ceph 开发者
✤ 之前做 OpenStack, Ceph
✤ 现在关注容器,Ceph
✤ 涉猎数据库,存储,⽂文件系统
IO 层次
API Interface
✤ rados_aio_write(rados_ioctx_t io, const char *oid,
rados_completion_t completion, const char *buf, size_t len,
uint64_t off)
✤ rados_aio_read(rados_ioctx_t io, const char *oid,
rados_completion_t completion, char *buf, size_t len, uint64_t off);
✤ 特点:
✤ 类 POSIX 但没有⽬目录结构
✤ 丰富的 callback 语意,异步通知
Session Layer
✤ Objecter::read(const object_t& oid, const object_locator_t& oloc, uint64_t off,
uint64_t len, snapid_t snap, bufferlist *pbl, int flags, Context
*onfinish,version_t *objver, ObjectOperation *extra_ops)
✤ Objecter::write(const object_t& oid, const object_locator_t& oloc, uint64_t off,
uint64_t len, const SnapContext& snapc, const bufferlist &bl, utime_t mtime,
int flags, Context *onack, Context *oncommit, version_t *objver,
ObjectOperation *extra_ops)
✤ 特点:
✤ 增加位置和版本信息
✤ 合并操作
Messenger Layer
✤ Connection::send_message(Message *m)
✤ Dispatcher::ms_dispatch(Message *m)
✤ 特点:
✤ Ops -> Message(连续字节流)
✤ ⾼高度封装,处理重传,⽹网络错误
Dispatcher Layer
✤ OSD::handle_op(OpRequestRef& op, OSDMapRef&
osdmap)
✤ 特点:
✤ Message -> OpRequest
✤ 基本信息校验(合法性,权限,完整性,正确性)
✤ 重定向
Replication Layer
✤ ReplicatedPG::execute_ctx(OpContext *ctx)
✤ 特点:
✤ OpRequest -> OpContext
✤ 请求解析,上下⽂文组建
✤ 读类型请求执⾏行
✤ 写请求组建事务(Transaction)
✤ 写请求增加⼀一致性操作⽇日志(Undo Log)
PG Backend Layer
✤ 特点:
✤ OpContext -> RepGather
✤ 不同回调实现
✤ 实现写操作⽅方式
Store Backend
✤ ObjectStore::queue_transactions(Sequencer *osr,
list<Transaction*>& tls, Context *onreadable, Context *ondisk,
Context *onreadable_sync, TrackedOpRef op,
ThreadPool::TPHandle *handle)
✤ 特点:
✤ Repop -> Transaction
✤ 保证原⼦子性
✤ 异步 IO
任何计算机问题总能通过加层解决
- Someone
性能问题
✤ 低延迟⾼高 IOPS 问题
✤ 复杂的分层模型
✤ ⽹网络
✤ 存储
✤ ⾼高吞吐设备利⽤用问题
✤ 存储
✤ 数据附加 Overhead
Store
✤ FileStore
✤ KeyValueStore
✤ DB
✤ 硬件 API
✤ MemStore
✤ NewStore
Messenger
✤ Simple
✤ poll
✤ 线程调度
✤ Async
✤ select/epoll/kqueue/evport …
✤ 事件驱动
✤ Xio
✤ tcp/rdma
✤ incompatible
Pipeline
VCPU
Thread
Qemu Main
Thread
Pipe::Writer Pipe::Reader DispatchThreader OSD::OpWQ FileJournal::Writer FileJournal->finisher
FileStore::OpWQFileStore::SyncThreadPipe::WriterPipe::ReaderDispatchThreader FileStore->op_finisherFileStore->ondisk_finisherRadosClient->finisher
谢谢!

More Related Content

What's hot

Python包管理工具介绍
Python包管理工具介绍Python包管理工具介绍
Python包管理工具介绍Young King
 
Strace debug
Strace debugStrace debug
Strace debugluo jing
 
常用Mac/Linux命令分享
常用Mac/Linux命令分享常用Mac/Linux命令分享
常用Mac/Linux命令分享Yihua Huang
 
Analysis on tcp ip protocol stack
Analysis on tcp ip protocol stackAnalysis on tcp ip protocol stack
Analysis on tcp ip protocol stackYueshen Xu
 
Head first in xmemcached yanf4j
Head first in xmemcached yanf4jHead first in xmemcached yanf4j
Head first in xmemcached yanf4jwavefly
 
X64服务器 lamp服务器部署标准 new
X64服务器 lamp服务器部署标准 newX64服务器 lamp服务器部署标准 new
X64服务器 lamp服务器部署标准 newYiwei Ma
 
Linux安全配置终极指南
Linux安全配置终极指南Linux安全配置终极指南
Linux安全配置终极指南wensheng wei
 
Mac os Terminal 常用指令與小技巧
Mac os Terminal 常用指令與小技巧Mac os Terminal 常用指令與小技巧
Mac os Terminal 常用指令與小技巧Chen Liwei
 
Apache+php+mysql在Linux下的安装与配置
Apache+php+mysql在Linux下的安装与配置Apache+php+mysql在Linux下的安装与配置
Apache+php+mysql在Linux下的安装与配置wensheng wei
 
Windows 10 install mysql 8.0.16
Windows 10 install mysql 8.0.16Windows 10 install mysql 8.0.16
Windows 10 install mysql 8.0.16songwenxuan2020
 
compiler fuzzer : prog fuzz介紹
compiler fuzzer : prog fuzz介紹compiler fuzzer : prog fuzz介紹
compiler fuzzer : prog fuzz介紹fdgkhdkgh tommy
 
BASH 漏洞深入探討
BASH 漏洞深入探討BASH 漏洞深入探討
BASH 漏洞深入探討Tim Hsu
 
Introduce to Linux command line
Introduce to Linux command lineIntroduce to Linux command line
Introduce to Linux command lineWen Liao
 
Node.js 异步任务的四种模式
Node.js 异步任务的四种模式Node.js 异步任务的四种模式
Node.js 异步任务的四种模式Stackia Jia
 

What's hot (15)

Python包管理工具介绍
Python包管理工具介绍Python包管理工具介绍
Python包管理工具介绍
 
Strace debug
Strace debugStrace debug
Strace debug
 
常用Mac/Linux命令分享
常用Mac/Linux命令分享常用Mac/Linux命令分享
常用Mac/Linux命令分享
 
Analysis on tcp ip protocol stack
Analysis on tcp ip protocol stackAnalysis on tcp ip protocol stack
Analysis on tcp ip protocol stack
 
Head first in xmemcached yanf4j
Head first in xmemcached yanf4jHead first in xmemcached yanf4j
Head first in xmemcached yanf4j
 
X64服务器 lamp服务器部署标准 new
X64服务器 lamp服务器部署标准 newX64服务器 lamp服务器部署标准 new
X64服务器 lamp服务器部署标准 new
 
Linux安全配置终极指南
Linux安全配置终极指南Linux安全配置终极指南
Linux安全配置终极指南
 
Mac os Terminal 常用指令與小技巧
Mac os Terminal 常用指令與小技巧Mac os Terminal 常用指令與小技巧
Mac os Terminal 常用指令與小技巧
 
Apache+php+mysql在Linux下的安装与配置
Apache+php+mysql在Linux下的安装与配置Apache+php+mysql在Linux下的安装与配置
Apache+php+mysql在Linux下的安装与配置
 
Windows 10 install mysql 8.0.16
Windows 10 install mysql 8.0.16Windows 10 install mysql 8.0.16
Windows 10 install mysql 8.0.16
 
compiler fuzzer : prog fuzz介紹
compiler fuzzer : prog fuzz介紹compiler fuzzer : prog fuzz介紹
compiler fuzzer : prog fuzz介紹
 
BASH 漏洞深入探討
BASH 漏洞深入探討BASH 漏洞深入探討
BASH 漏洞深入探討
 
Php
PhpPhp
Php
 
Introduce to Linux command line
Introduce to Linux command lineIntroduce to Linux command line
Introduce to Linux command line
 
Node.js 异步任务的四种模式
Node.js 异步任务的四种模式Node.js 异步任务的四种模式
Node.js 异步任务的四种模式
 

Viewers also liked

Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Hang Geng
 
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Hang Geng
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Jens Hadlich
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016John Spray
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed_Hat_Storage
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
Divein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupDivein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupJiaying Ren
 
Rgw multisite-overview v2
Rgw multisite-overview v2Rgw multisite-overview v2
Rgw multisite-overview v2Jiaying Ren
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turkbuildacloud
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSHSage Weil
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 

Viewers also liked (16)

Aio
AioAio
Aio
 
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
 
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
 
Intorduce to Ceph
Intorduce to CephIntorduce to Ceph
Intorduce to Ceph
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere Mortals
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Divein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupDivein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetup
 
Rgw multisite-overview v2
Rgw multisite-overview v2Rgw multisite-overview v2
Rgw multisite-overview v2
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turk
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 

Similar to Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05

Node.js长连接开发实践
Node.js长连接开发实践Node.js长连接开发实践
Node.js长连接开发实践longhao
 
Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeAngel Boy
 
PHP Coding Standard and 50+ Programming Skills
PHP Coding Standard and 50+ Programming SkillsPHP Coding Standard and 50+ Programming Skills
PHP Coding Standard and 50+ Programming SkillsHo Kim
 
Ria的强力后盾:rest+海量存储
Ria的强力后盾:rest+海量存储 Ria的强力后盾:rest+海量存储
Ria的强力后盾:rest+海量存储 zhen chen
 
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509tidesq
 
Node.js在淘宝的应用实践
Node.js在淘宝的应用实践Node.js在淘宝的应用实践
Node.js在淘宝的应用实践taobao.com
 
COSCUP 2010 - node.JS 於互動式網站之應用
COSCUP 2010 - node.JS 於互動式網站之應用COSCUP 2010 - node.JS 於互動式網站之應用
COSCUP 2010 - node.JS 於互動式網站之應用ericpi Bi
 
Redis 介绍 -田琪
Redis 介绍 -田琪Redis 介绍 -田琪
Redis 介绍 -田琪Shaoning Pan
 
使用Lua提高开发效率
使用Lua提高开发效率使用Lua提高开发效率
使用Lua提高开发效率gowell
 
千呼萬喚始出來的Java SE 7
千呼萬喚始出來的Java SE 7千呼萬喚始出來的Java SE 7
千呼萬喚始出來的Java SE 7javatwo2011
 
應用Ceph技術打造軟體定義儲存新局
應用Ceph技術打造軟體定義儲存新局應用Ceph技術打造軟體定義儲存新局
應用Ceph技術打造軟體定義儲存新局Alex Lau
 
D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版Jackson Tian
 
为啥别读HotSpot VM的源码(2012-03-03)
为啥别读HotSpot VM的源码(2012-03-03)为啥别读HotSpot VM的源码(2012-03-03)
为啥别读HotSpot VM的源码(2012-03-03)Kris Mok
 
IOS入门分享
IOS入门分享IOS入门分享
IOS入门分享zenyuhao
 
MySQL源码分析.01.代码结构与基本流程
MySQL源码分析.01.代码结构与基本流程MySQL源码分析.01.代码结构与基本流程
MySQL源码分析.01.代码结构与基本流程Lixun Peng
 

Similar to Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05 (20)

Node.js长连接开发实践
Node.js长连接开发实践Node.js长连接开发实践
Node.js长连接开发实践
 
Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledge
 
Tcfsh bootcamp day2
 Tcfsh bootcamp day2 Tcfsh bootcamp day2
Tcfsh bootcamp day2
 
big web site
big web site big web site
big web site
 
Baidu Cloud Foundry
Baidu Cloud FoundryBaidu Cloud Foundry
Baidu Cloud Foundry
 
PHP Coding Standard and 50+ Programming Skills
PHP Coding Standard and 50+ Programming SkillsPHP Coding Standard and 50+ Programming Skills
PHP Coding Standard and 50+ Programming Skills
 
Ria的强力后盾:rest+海量存储
Ria的强力后盾:rest+海量存储 Ria的强力后盾:rest+海量存储
Ria的强力后盾:rest+海量存储
 
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509
Linux c++ 编程之链接与装载 -基础篇--v0.3--20120509
 
Nio trick and trap
Nio trick and trapNio trick and trap
Nio trick and trap
 
Node.js在淘宝的应用实践
Node.js在淘宝的应用实践Node.js在淘宝的应用实践
Node.js在淘宝的应用实践
 
COSCUP 2010 - node.JS 於互動式網站之應用
COSCUP 2010 - node.JS 於互動式網站之應用COSCUP 2010 - node.JS 於互動式網站之應用
COSCUP 2010 - node.JS 於互動式網站之應用
 
Redis 介绍 -田琪
Redis 介绍 -田琪Redis 介绍 -田琪
Redis 介绍 -田琪
 
使用Lua提高开发效率
使用Lua提高开发效率使用Lua提高开发效率
使用Lua提高开发效率
 
千呼萬喚始出來的Java SE 7
千呼萬喚始出來的Java SE 7千呼萬喚始出來的Java SE 7
千呼萬喚始出來的Java SE 7
 
Node分享 展烨
Node分享 展烨Node分享 展烨
Node分享 展烨
 
應用Ceph技術打造軟體定義儲存新局
應用Ceph技術打造軟體定義儲存新局應用Ceph技術打造軟體定義儲存新局
應用Ceph技術打造軟體定義儲存新局
 
D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版D2_node在淘宝的应用实践_pdf版
D2_node在淘宝的应用实践_pdf版
 
为啥别读HotSpot VM的源码(2012-03-03)
为啥别读HotSpot VM的源码(2012-03-03)为啥别读HotSpot VM的源码(2012-03-03)
为啥别读HotSpot VM的源码(2012-03-03)
 
IOS入门分享
IOS入门分享IOS入门分享
IOS入门分享
 
MySQL源码分析.01.代码结构与基本流程
MySQL源码分析.01.代码结构与基本流程MySQL源码分析.01.代码结构与基本流程
MySQL源码分析.01.代码结构与基本流程
 

Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05