Submit Search
Upload
Ceph Day Shanghai - The Scrub and Repair in Jewel
•
0 likes
•
105 views
Ceph Community
Follow
Kefu Chai, Red Hat
Read less
Read more
Technology
Report
Share
Report
Share
1 of 17
Download now
Download to read offline
Recommended
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Community
London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress
Ceph Community
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Community
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions
Ceph Community
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Community
London Ceph Day: Ceph Performance and Optimization
London Ceph Day: Ceph Performance and Optimization
Ceph Community
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Community
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Community
Recommended
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Community
London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress
Ceph Community
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Community
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions
Ceph Community
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Community
London Ceph Day: Ceph Performance and Optimization
London Ceph Day: Ceph Performance and Optimization
Ceph Community
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Community
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Community
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Community
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Community
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Community
Ceph Day Shanghai - On the Productization Practice of Ceph
Ceph Day Shanghai - On the Productization Practice of Ceph
Ceph Community
Transforming the Ceph Integration Tests with OpenStack
Transforming the Ceph Integration Tests with OpenStack
Ceph Community
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Ceph Community
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools
Ceph Community
Ceph Day 2015 - Erasure Coding
Ceph Day 2015 - Erasure Coding
Ceph Community
Ceph Day LA: Ceph Ecosystem Update
Ceph Day LA: Ceph Ecosystem Update
Ceph Community
Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive
Ceph Community
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Community
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Ceph Community
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
Ceph Community
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Community
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Community
Ceph Day NYC: Developing With Librados
Ceph Day NYC: Developing With Librados
Ceph Community
Ceph Day Beijing: Containers and Ceph
Ceph Day Beijing: Containers and Ceph
Ceph Community
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Community
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Community
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Ceph Community
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
yiditushe
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
yiditushe
More Related Content
Viewers also liked
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Community
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Community
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Community
Ceph Day Shanghai - On the Productization Practice of Ceph
Ceph Day Shanghai - On the Productization Practice of Ceph
Ceph Community
Transforming the Ceph Integration Tests with OpenStack
Transforming the Ceph Integration Tests with OpenStack
Ceph Community
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Ceph Community
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools
Ceph Community
Ceph Day 2015 - Erasure Coding
Ceph Day 2015 - Erasure Coding
Ceph Community
Ceph Day LA: Ceph Ecosystem Update
Ceph Day LA: Ceph Ecosystem Update
Ceph Community
Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive
Ceph Community
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Community
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Ceph Community
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
Ceph Community
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Community
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Community
Ceph Day NYC: Developing With Librados
Ceph Day NYC: Developing With Librados
Ceph Community
Ceph Day Beijing: Containers and Ceph
Ceph Day Beijing: Containers and Ceph
Ceph Community
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Community
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Community
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Ceph Community
Viewers also liked
(20)
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Day Beijing: Optimizations on Ceph Cache Tiering
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Day Shanghai - VSM (Virtual Storage Manager) - Simplify Ceph Management ...
Ceph Day Shanghai - On the Productization Practice of Ceph
Ceph Day Shanghai - On the Productization Practice of Ceph
Transforming the Ceph Integration Tests with OpenStack
Transforming the Ceph Integration Tests with OpenStack
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day 2015 - Erasure Coding
Ceph Day 2015 - Erasure Coding
Ceph Day LA: Ceph Ecosystem Update
Ceph Day LA: Ceph Ecosystem Update
Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day NYC: Developing With Librados
Ceph Day NYC: Developing With Librados
Ceph Day Beijing: Containers and Ceph
Ceph Day Beijing: Containers and Ceph
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Day New York 2014: Ceph and the Open Ethernet Drive Architecture
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Ceph Day Beijing: Ceph-Dokan: A Native Windows Ceph Client
Similar to Ceph Day Shanghai - The Scrub and Repair in Jewel
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
yiditushe
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
yiditushe
尽管去做
尽管去做
hievan
Direct show
Direct show
guest3d7034
Ucd火花集
Ucd火花集
quizasdodo
Ucd火花集
Ucd火花集
quizasdodo
Cite space中文手册
Cite space中文手册
cueb
51 cto下载 2010-ccna实验手册
51 cto下载 2010-ccna实验手册
poker mr
神经网络与深度学习
神经网络与深度学习
Xiaohu ZHU
尽管去做——无压工作的艺术
尽管去做——无压工作的艺术
Jim Liu
080620-16461915
080620-16461915
tutorialsruby
Ext 中文手册
Ext 中文手册
donotbeevil
080620-16461915
080620-16461915
tutorialsruby
Similar to Ceph Day Shanghai - The Scrub and Repair in Jewel
(13)
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
My Eclipse 6 Java Ee开发中文手册
尽管去做
尽管去做
Direct show
Direct show
Ucd火花集
Ucd火花集
Ucd火花集
Ucd火花集
Cite space中文手册
Cite space中文手册
51 cto下载 2010-ccna实验手册
51 cto下载 2010-ccna实验手册
神经网络与深度学习
神经网络与深度学习
尽管去做——无压工作的艺术
尽管去做——无压工作的艺术
080620-16461915
080620-16461915
Ext 中文手册
Ext 中文手册
080620-16461915
080620-16461915
Ceph Day Shanghai - The Scrub and Repair in Jewel
1.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Scrub and
Repair in Jewel Kefu Cai Red Hat October 18, 2015 The Scrub and Repair in Jewel Kefu Cai Red Hat October 18, 2015 2015-10-18 The Scrub and Repair in Jewel 谢谢⼤家现在还醒着
2.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Scrub and
Repair in Jewel Kefu Chai Red Hat October 18, 2015 The Scrub and Repair in Jewel Kefu Chai Red Hat October 18, 2015 2015-10-18 The Scrub and Repair in Jewel
3.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Outline scrub 101 pain in
scrub scrub in jewel Q & A Outline scrub 101 pain in scrub scrub in jewel Q & A 2015-10-18 The Scrub and Repair in Jewel Outline 1. 接下来想和诸位聊⼀下 ceph ⾥⾯的 scrub 和 repair
4.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . $ sudo ceph
health detail HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 17.1c1 is active+clean+inconsistent, acting [21,25,30] 2 scrub errors $ sudo ceph health detail HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 17.1c1 is active+clean+inconsistent, acting [21,25,30] 2 scrub errors 2015-10-18 The Scrub and Repair in Jewel 有的时候需要检查 osd 的 log,才能知道具体是哪个 object 有了问题,进⽽⼿动解决。
5.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . $ sudo ceph
health detail HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 17.1c1 is active+clean+inconsistent, acting [21,25,30] 2 scrub errors $ ceph pg repair 17.1c1 $ sudo ceph health detail HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 17.1c1 is active+clean+inconsistent, acting [21,25,30] 2 scrub errors $ ceph pg repair 17.1c1 2015-10-18 The Scrub and Repair in Jewel 有的时候需要检查 osd 的 log,才能知道具体是哪个 object 有了问题,进⽽⼿动解决。
6.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub why ▶ fix the
data lost/corruption when we can scrub why ▶ fix the data lost/corruption when we can 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 由于硬盘出错导致的 EIO,使得数据不可读。 或者更可怕的是,数据出错,同时客户端没有觉察, 把错误的数据当作正确的数据处理, 从⽽造成灾难性的损失。
7.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub when ▶ least impact
to production scrub when ▶ least impact to production 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. 限定在 begin hour 和 end hour 之间 2. 在系统负载超过阈值的时候也不会贸然启动 3. 以 osd 为单位,同时进⾏的 scrub 数量是可以配置的
8.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub when ▶ least impact
to production ▶ scheduled periodically scrub when ▶ least impact to production ▶ scheduled periodically 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. min/max interval 可配置
9.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub when ▶ least impact
to production ▶ scheduled periodically ▶ distributed randomly in configured time range scrub when ▶ least impact to production ▶ scheduled periodically ▶ distributed randomly in configured time range 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. 试图缓解因为 max interval 导致的⼤量 scrub 任务堆积
10.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub how ▶ pg based
and initialized by the primary ▶ scan in batch scrub how ▶ pg based and initialized by the primary ▶ scan in batch 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. 每次以 pg 为单位,由 primary 发起,要求所有的 peered 的 pg 参与。 分批次处理 pg ⾥⾯的所有 object,以 object 的 hash 为界。这就是所谓的 chunky scrub。把收集到的 object 信息,组织成⼀个 scrub map,同时也向 replica 请求相应的 scrub map。很明显,scrub map 会包含 object, xattr, omap 的信息, 以供交叉⽐较。
11.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub how ▶ pg based
and initialized by the primary ▶ scan in batch ▶ select authoritive copy scrub how ▶ pg based and initialized by the primary ▶ scan in batch ▶ select authoritive copy 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. 和⺴购⼀样,⼀个 object 也有好评差评之分。⽐如, object 对应 hash 读不出来了; omap 读不出, object 读出来的⼤⼩或者 digest 不⼀致; 干脆读取的时候出现 EIO。 都会导致⼀个 replica 失去成为权威拷⻉的机会。 只有通过这些考验的副本才能成为权威拷⻉。⽽出问题的情况, 不外乎读取失败,数据缺失,和数据不⼀致⼏种。
12.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . scrub how ▶ pg based
and initialized by the primary ▶ scan in batch ▶ select authoritive copy ▶ write down the results ▶ fix it (later) scrub how ▶ pg based and initialized by the primary ▶ scan in batch ▶ select authoritive copy ▶ write down the results ▶ fix it (later) 2015-10-18 The Scrub and Repair in Jewel scrub 101 scrub 1. 把 scrub 的结果通过 cluster log 发给 monitor 记录下来。 这些结果包括⼀些 metadata 本⾝不⼀致的情况,⽐如 head 和 snapdir 共存,有 clone, 但是 snapset ⾥⾯的 seq 是空的, 也有 snapset 本⾝出现不⼀致,⽐如说⾃⼰觉得有 3 个 snap,但是发现只有 2 个。 2. 对于 digest 缺失的情况,就直接⽤已知的 digest 重写所有的 replica。 如果数据本⾝不⼀致, 或者缺失,那么就把它放到 missing 列表⾥⾯去。 如果在以后处理 io 请求的时候,正好碰到了缺少的 object,就⽴即恢复它。 即随机选择⼀个 replica,发送 pull 请求, ⽤收到的 push 数据重写本地有错误,或者缺少的 object。
13.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pain in scrub ▶
ceph pg repair does not always work. ▶ scrub/repair is not programmable/scriptable. pain in scrub ▶ ceph pg repair does not always work. ▶ scrub/repair is not programmable/scriptable. 2015-10-18 The Scrub and Repair in Jewel pain in scrub pain in scrub 现有的机制还⾮常简单,⽆法应对绝⼤多数情形。 有时候需要观察⽇志,删除出错的 replica。 没有实现更智能、更复杂的策略。 ⽐如说少数服从多数的算法,虽然直觉,但是⺫前没有这个设计。 如果 librados 能为 scrub 提供更丰富的接⼝, 那么客户端就能灵活地制定策略,选择有效的副本,指定正确的数据, 甚⾄恢复 snapset/clone,当然这需要对 snapset 有⼀些理解。
14.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . new APIs query ▶ get_inconsistent_pools()
=> pool[] ▶ get_inconsistent_pgs() => pg[] ▶ get_inconsistent(pg) => epoch, inconsistent[] new APIs query ▶ get_inconsistent_pools() => pool[] ▶ get_inconsistent_pgs() => pg[] ▶ get_inconsistent(pg) => epoch, inconsistent[] 2015-10-18 The Scrub and Repair in Jewel scrub in jewel new APIs 在 scrub 过程中,osd 会把缺失或者不⼀致的信息收集起来。保存在临时 object ⾥⾯,供 scrub API 查询。在同⼀个 pg interval 的期间,可以根据这次 scrub 的结果 进⾏查询,读取,进⾏恢复的操作。 但是如果 interval 过了,scrub API 会返回 EAGAIN 。 inconsistent 会是⼀个带有版本号的 json,⽅便⽇后扩展。 是 osd 到 shard_info_t 的 map, shard_info_t 包含 replica 是否存在, omap_sha1, data_sha1, size 等 metadata。 基于这些数据,⽤户可以实现各种策略,⽐如说投票。
15.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . new APIs read ▶ operate_on_shard(epoch,
osd, shard, op) new APIs read ▶ operate_on_shard(epoch, osd, shard, op) 2015-10-18 The Scrub and Repair in Jewel scrub in jewel new APIs 其中,op 可能是 read 操作,这些操作允许⽤户检查数据。 从⽽选择有效的数据副本,同时跳过 OSD ⼀致性的检查。
16.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . new APIs choose allows user
to ▶ specify the good/bad shard/xattr/omap replica ▶ specify the metadata (snapset in particular) new APIs choose allows user to ▶ specify the good/bad shard/xattr/omap replica ▶ specify the metadata (snapset in particular) 2015-10-18 The Scrub and Repair in Jewel scrub in jewel new APIs 前者本⾝就是数据,没有和其它对象有互相引⽤的关系。 但是 snapset 和 clone 有其⾃⾝的⼀致性问题。所以具体的接⼝还在讨论。 如果操作不当,可能会产⽣更多的不⼀致,有些可能是预料之外的。
17.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Q & A Q
& A 2015-10-18 The Scrub and Repair in Jewel Q & A Q & A 段经理在早上,说到 call for action。在这⾥也希望得到⼤家的宝贵意⻅。
Download now