OpenInfra Days Korea
Ceph Issue Case Study
2019.07.11
Youngju Lee, Open Source Consulting
( yjlee@osci.kr )
Contents
01. Architecture
02. Issue Occurrence
03. Resolution Process
Architecture
01
01. Architecture
●
Overall architecture
Controller Node
Compute Node Storage Node
Deploy
FireWall
Router
01. Architecture
●
Ceph architecture
ceph-osd1
...
ceph-osd2
...
ceph-osd3
...
...osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8
ceph-mon1 ceph-mon2 ceph-mon3
01. Architecture
●
Ceph object flow
PG: Placement Group
A group of OSDs used to store an object.
The number of members depends on the replica count.
OSD: Object Storage Daemon
Where an object is ultimately stored.
Monitor: the component that monitors changes
in the Ceph OSDs and builds the CRUSH map.
[root@ceph-osd01 ~]# rados ls -p vms
rbd_data.1735e637a64d5.0000000000000000
rbd_header.1735e637a64d5
rbd_directory
rbd_children
rbd_info
rbd_data.1735e637a64d5.0000000000000003
rbd_data.1735e637a64d5.0000000000000002
rbd_id.893f4f3d-f6d9-4521-997c-72caa861ac24_disk
rbd_data.1735e637a64d5.0000000000000001
rbd_object_map.1735e637a64d5
[root@ceph-osd01 ~]#
The default object size is 4 MB.
CRUSH: Controlled Replication Under
Scalable Hashing
An algorithm for distributing and storing objects.
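As a quick illustration of this flow (object → PG → OSDs), the mapping of a single object can be printed with ceph osd map. A hedged sketch using the vms pool and an object name from the rados ls output above; the output line is illustrative, not captured from this cluster:
# Show which PG the object hashes to and which OSDs (up/acting set) serve it.
ceph osd map vms rbd_data.1735e637a64d5.0000000000000000
# osdmap e123 pool 'vms' (2) object 'rbd_data.1735e637a64d5.0000000000000000' -> pg 2.1a2b3c4d (2.4d) -> up ([3,7,1], p3) acting ([3,7,1], p3)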
Issue Occurrence
02
02. Issue Occurrence
Ceph architecture
ceph-osd1
...
ceph-osd2
...
ceph-osd3
...
...osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8
ceph-mon1 ceph-mon2 ceph-mon3
One of the OSDs hit 90% utilization,
so reads/writes to the cluster stopped working.
[root@osc-ceph01 ~]# ceph pg dump |grep -i full_ratio
dumped all in format plain
full_ratio 0.9
nearfull_ratio 0.8
[root@osc-ceph01 ~]# ceph daemon mon.`hostname` config show |grep -i osd_full_ratio
"mon_osd_full_ratio": "0.9",
[root@osc-ceph01 ~]#
02. Issue Occurrence
- Ceph community troubleshooting guide
Reference: http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space
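The guide essentially comes down to freeing space on, or rebalancing data away from, the full OSD. A hedged sketch of commands commonly used for that, assuming osd.8 (id 8) is the full one as in the diagrams here; the 0.85 weight is only an example:
# Per-OSD utilization: find the OSD that crossed full_ratio (0.9).
ceph osd df
# Temporarily lower the override weight of the full OSD so some of its PGs are remapped to other OSDs.
ceph osd reweight 8 0.85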
02. Issue Occurrence
Ceph architecture
ceph-osd1
...
ceph-osd2
...
ceph-osd3
...
...osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8
ceph-mon1 ceph-mon2 ceph-mon3
Deleted pg 1.11f from osd.8
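A hedged sketch of how one OSD's copy of a PG is usually removed with ceph-objectstore-tool (the OSD must be stopped first, an export should be kept as a backup, and exact options can differ between Ceph releases):
systemctl stop ceph-osd@8
# Keep a backup of the PG before touching it.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 1.11f --op export --file /root/pg-1.11f.export
# Remove this OSD's local copy of the PG.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 1.11f --op remove
systemctl start ceph-osd@8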
02. Issue Occurrence
[root@ceph-mon02 ~]# ceph -s
cluster f5078395-0236-47fd-ad02-8a6daadc7475
health HEALTH_ERR
1 pgs are stuck inactive for more than 300 seconds
162 pgs backfill_wait
37 pgs backfilling
322 pgs degraded
1 pgs down
2 pgs peering
4 pgs recovering
119 pgs recovery_wait
1 pgs stuck inactive
322 pgs stuck unclean
199 pgs undersized
recovery 592647/43243812 objects degraded (1.370%)
recovery 488046/43243812 objects misplaced (1.129%)
1 mons down, quorum 0,2 ceph-mon01,ceph-mon03
monmap e1: 3 mons at {ceph-mon01=10.10.50.201:6789/0,ceph-mon02=10.10.50.202:6789/0,ceph-mon03=10.10.50.203:6789/0}
election epoch 480, quorum 0,2 ceph-mon01,ceph-mon03
osdmap e27606: 128 osds: 125 up, 125 in; 198 remapped pgs
flags sortbitwise
pgmap v58287759: 10240 pgs, 4 pools, 54316 GB data, 14076 kobjects
157 TB used, 71440 GB / 227 TB avail
592647/43243812 objects degraded (1.370%)
488046/43243812 objects misplaced (1.129%)
9916 active+clean
162 active+undersized+degraded+remapped+wait_backfill
119 active+recovery_wait+degraded
37 active+undersized+degraded+remapped+backfilling
4 active+recovering+degraded
1 down+peering
1 peering
1 PG that has not responded for more than 300 seconds (1.11f) ...
162 PGs waiting for backfill because OSDs went down
37 PGs backfilling because the changes fall
outside the range covered by the pglog
322 PGs degraded because they are missing one of their 3 copies
The 1 problematic down PG... (1.11f)
2 PGs whose state is still being decided
(recovery or backfill)
119 PGs waiting for recovery
4 PGs recovering from the pglog
(I/O to those PGs is blocked)
1 PG inactive because none of its OSDs are up
(1.11f)
322 PGs that do not meet 3-way replication
199 PGs below the pool's replica count
1 monitor down
3 OSDs holding pg 1.11f went down
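A hedged sketch of the commands typically used alongside ceph -s to see exactly which PGs are stuck and on which OSDs (not taken from the original session; output omitted):
# Per-PG detail for everything that is not active+clean.
ceph health detail
# Only the PGs stuck in a given state.
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean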
02. Issue Occurrence
Architecture
ceph-osd1
...
ceph-osd2
...
ceph-osd3
...
...osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8
ceph-mon1 ceph-mon2 ceph-mon3
Images pool
Holds the OpenStack images.
Volumes pool
pg 1.11f holds a small piece of
every OpenStack volume.
If even one PG goes down, none of the data
in that pool can be used.
[root@osc-ceph01 ~]# ceph pg dump |head
dumped all in format plain
...
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary
last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.11f 0 0 0 0 0 0 3080 3080 active+clean 2019-07-10 08:12:46.623592 921'8580 10763:10591 [8,4,7] 8 [8,4,7] 8
921'8580 2019-07-10 08:12:46.623572 921'8580 2019-07-07 19:44:32.652377
...
The primary PG is responsible for all I/O.
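The up/acting set and the primary of a PG can also be read directly from the PG map; a small sketch, where the commented line mirrors the [8,4,7] acting set with primary osd.8 from the pg dump above and is illustrative only:
ceph pg map 1.11f
# osdmap e10763 pg 1.11f (1.11f) -> up [8,4,7] acting [8,4,7]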
Resolution Process
03
03. Resolution Process
writeout_from: 30174'649621, trimmed:
-1> 2018-10-24 15:28:44.487997 7fb622e2d700 5 write_log with: dirty_to: 0'0,
dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0,
writeout_from: 30174'593316, trimmed:
0> 2018-10-24 15:28:44.502006 7fb61de23700 -1 osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&,
const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
thread 7fb61de23700 time 2018-10-24 15:28:44.497739
osd/SnapMapper.cc: 228: FAILED assert(r == -2)
Analysis result...
The three replicated copies of the PG conflict with each other,
which brings down the OSDs holding that PG.
This is fixed in Red Hat Ceph 3.1 (Luminous),
so upgrade!!
However...
- Red Hat OpenStack 9 (Mitaka) does not support Red Hat Ceph 3.1.
- Before upgrading to Red Hat Ceph 3.1, OpenStack must first be
upgraded to 10 (Newton).
- Red Hat OpenStack 9 is deployed with TripleO.
(The upgrade process is extremely complex...)
- The Red Hat Ceph upgrade would have to be done while the cluster is in an error state.
03. Resolution Process
OpenStack upgrade
- Failed...
- Reinstalled, then recovered every VM
Ceph 3.1 upgrade
- Upgraded manually, without using ceph-ansible.
03. Resolution Process
ceph-osd1
...
ceph-osd2
...
ceph-osd3
...
...osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 osd.8
vms pool
Stores the VM images created by Nova
12345_disk: existing VM RBD
67890_disk: new VM RBD
New VM1
ID=67890
Existing VM1
ID=12345
Recovery procedure
- Create a new VM (ID 67890)
- Delete the RBD 67890_disk from the vms pool
- Rename 12345_disk to 67890_disk
- Apply this to every VM... (see the loop sketch after the commands below)
[root@ceph01 ~]# rbd list -p vms
12345_disk
67890_disk
[root@ceph01 ~]# rbd rm -p vms 67890_disk
Removing image: 100% complete...done.
[root@ceph01 ~]#
[root@ceph01 ~]# rbd mv -p vms 12345_disk 67890_disk
[root@ceph01 ~]# rbd ls -p vms
67890_disk
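A minimal sketch of applying the same rename to every VM, assuming a hypothetical mapping file old_new.txt with one "OLD_ID NEW_ID" pair per line; the file and the loop are not part of the original procedure:
# old_new.txt (hypothetical), e.g.: 12345 67890
while read OLD NEW; do
  rbd rm -p vms ${NEW}_disk             # drop the empty disk created with the new VM
  rbd mv -p vms ${OLD}_disk ${NEW}_disk # reuse the old data under the new VM's name
done < old_new.txt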
03. Resolution Process
After the Red Hat Ceph 3.1 upgrade ...
- A similar problem occurred.
- The OSDs holding pg 1.11f kept flapping up and down.
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
5 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12200k objects, 46865 GB
usage: 137 TB used, 97628 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent
io:
client: 0 B/s rd, 1232 kB/s wr, 19 op/s rd, 59 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; 5 scrub errors; Possible data
damage: 1 pg inconsistent
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 1.11f is active+clean+inconsistent, acting [113,105,10]
[root@ceph-mon01 osc]#
OTL...
03. Resolution Process
However, the problematic object could be pinpointed.
[root@ceph-mon01 ~]# rados list-inconsistent-obj 1.11f --format=json-pretty
{
"epoch": 34376,
"inconsistents": [
{
"object": {
"name": "rbd_data.39edab651c7b53.0000000000003600",
"nspace": "",
"locator": "",
03. Resolution Process
Object rbd_data.39edab651c7b53.0000000000003600 turned out to belong to the root filesystem volume of a customer's DB service VM.
Fortunately the DB data itself was unaffected...
The RBD image holding the root filesystem of the affected DB VM was deleted. But the cluster was still HEALTH_ERR ...
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
4 scrub errors
Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12166k objects, 46731 GB
usage: 136 TB used, 98038 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent+snaptrim_error
io:
client: 0 B/s rd, 351 kB/s wr, 15 op/s rd, 51 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent, 1 pg
snaptrim_error
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
pg 1.11f is active+clean+inconsistent+snaptrim_error, acting [113,105,10]
[root@ceph-mon01 osc]#
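For reference, removing an RBD image that still carries snapshots generally means purging the snapshots first. A hedged sketch with hypothetical names, not the exact commands used here:
rbd snap ls volumes/volume-XXXX
rbd snap unprotect volumes/volume-XXXX@snap-YYYY   # only needed for protected snapshots
rbd snap purge volumes/volume-XXXX
rbd rm volumes/volume-XXXX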
03. Resolution Process
- Snapshot id 54 (0x36) of the problematic object is what keeps triggering the errors.
- ??? But it was already deleted??
2018-11-16 18:45:00.163319 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 10: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163330 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 105: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163333 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 113: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
$ printf "%d\n" 0x36
54
[root@ceph-osd08 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-113 --pgid 1.11f --op list | grep 39edab651c7b53
Error getting attr on : 1.11f_head,#-3:f8800000:::scrub_1.11f:head#, (61) No data available
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":54,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":63,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":-2,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
03. Resolution Process
- Let's find the RBD image that contains the problematic object!
[root@ceph-mon01 osc]# rbd info volume-13076ffc-6520-4db8-b238-1ba6108bfe52 -p volumes
rbd image 'volume-13076ffc-6520-4db8-b238-1ba6108bfe52':
size 53248 MB in 13312 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.62cb510d494de
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
[root@ceph-mon01 osc]#
[root@ceph-mon01 osc]# cat rbd-info.sh
#!/bin/bash
for i in `rbd list -p volumes`
do
rbd info volumes/$i |grep rbd_data.39edab651c7b53
echo --- $i done ----
done
[root@ceph-mon01 osc]# bash rbd-info.sh
The object prefix of an image
can be seen in rbd info.
A script that searches every
RBD image for the
problematic object.
[root@ceph-mon01 osc]# bash rbd-info.sh
--- rbdtest done ----
--- volume-00b0de1a-bfab-40e0-a444-b6c2d0de3905 done ----
--- volume-02d9c884-fc30-4700-87fd-950855ae361d done ----
...
[root@ceph-mon01 osc]#
The result ...
again, nothing...
03. Resolution Process
- Since the volume holding that snapshot had been deleted, the condition causing the error no longer existed.
- We were told to try running repair again.
[root@ceph-mon01 ~]# date ; ceph pg repair 1.11f
Wed Nov 28 18:16:25 KST 2018
instructing pg 1.11f on osd.113 to repair
[root@ceph-mon01 ~]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; Possible data damage: 1 pg repair
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
PG_DAMAGED Possible data damage: 1 pg repair
pg 1.11f is active+clean+scrubbing+deep+repair, acting [113,105,10]
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
Possible data damage: 1 pg repair
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47365 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+scrubbing+deep+repair
io:
client: 598 kB/s rd, 1145 kB/s wr, 18 op/s rd, 63 op/s wr
Repairing pg 1.11f
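The noscrub/nodeep-scrub flags visible above had been set cluster-wide, presumably to keep scrubbing quiet while debugging; a hedged sketch of how such flags are set and cleared:
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... and once the repair is done ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub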
03. Resolution Process
- Check the Ceph log.
[root@ceph-mon01 ~]# ceph -w
...
2018-11-28 18:21:26.654955 osd.113 [ERR] 1.11f repair stat mismatch, got 3310/3312 objects, 91/92 clones, 3243/3245
dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 67/68 whiteouts, 13579894784/13584089088 bytes, 0/0 hit_set_archive bytes.
2018-11-28 18:21:26.655657 osd.113 [ERR] 1.11f repair 1 errors, 1 fixed
2018-11-28 18:19:28.979704 mon.ceph-mon01 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg repair)
2018-11-28 18:20:30.652593 mon.ceph-mon01 [WRN] Health check update: nodeep-scrub flag(s) set (OSDMAP_FLAGS)
2018-11-28 18:20:35.394445 mon.ceph-mon01 [INF] Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2018-11-28 18:20:35.394457 mon.ceph-mon01 [INF] Cluster is now healthy
Huh..?! Fixed???
03. Resolution Process
- HEALTH_OK
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47366 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10216 active+clean
24 active+clean+scrubbing+deep
io:
client: 424 kB/s rd, 766 kB/s wr, 18 op/s rd, 72 op/s wr
Q&A
Youngju Lee, Open Source Consulting
( yjlee@osci.kr )
Thank you
Cloud & Collaboration
T. 02-516-0711 E. sales@osci.kr
5F, 32, Teheran-ro 83-gil, Gangnam-gu, Seoul (Samseong-dong, Narakium Samseong-dong A Building)
www.osci.kr