Reliability Study of Open-Source Distributed Storage Implementations
Chu, Hua-Rong @ TANET 2018
Speaker
● CHTTL Cloud Lab. / mogilefs-moji team member
● Maintainer / contributor to several storage-related projects
● http://www.github.com/hrchu
Distributed File System
A.K.A.
- RESTful storage, object storage, distributed storage system, software-defined storage...
Characteristics
- Accessible: flat namespace; files accessed via a RESTful API, SNIA CDMI, or other protocols
- Scalable/Available: files are written to multiple drives spread across servers and data centers
- Reliable/Durable: data replication and integrity are ensured across the cluster
Typical Implementations
- AWS S3 / Azure Blob Storage / Google Cloud Storage
- MogileFS / OpenStack Swift / Ceph radosgw
How reliable is your DFS setup?
● Hard to measure through testing strategies alone
● Most existing work addresses theoretical and conceptual frameworks
We investigated open-source DFSs [1] and found that:
1. Points of concern and characteristics vary among the DFSs we surveyed
2. Reliability risks exist in DFS implementations:
Case study
● Implementation and version: MogileFS 2.57+
● Storage nodes: 147
● Hard drives: 1,547
● Storage capacity: ~3 PB
● Number of files: 396,043,825
● Read/write ratio: 0.34
● Data center layout: two data centers
Study Result
[1] 分散式儲存可靠度的實務性研究 (A practical study of distributed storage reliability), 電信研究期刊 (Telecommunications Research Journal), Vol. 47, No. 1
Regarding MogileFS (Responding to reviewers)
One of the widely adopted DFSs today
- Implemented by Brad Fitzpatrick (1999)
- Simple architecture design
- Users: KKBOX, Dreamwidth Studios, Six Apart...
Serves as the case study in this work
[2] MogileFS 簡約可靠的儲存方案 (MogileFS: a simple and reliable storage solution), TWJUG 2016
What we proposed in this work
Based on insights from previous work, we propose two methods to address these risks:
1. Simple Majority Write
2. Routine Deep FSCK
Outcome
● Code: available on GitHub
● Reliability evaluation: presented in the following sections
Reliability Analysis
Goal
● Mean time to data loss (MTTDL)
○ a classic metric in reliability studies
● Approximate Reliability
○ needed by our customers and my boss
○ also a key factor in service level agreements (SLAs)
Modeling
● Continuous-time Markov chain (CTMC)
○ number of replicas as the state
○ disks fail independently
○ failure process: Poisson process with rate λ
○ repair time: exponentially distributed with mean 1/μ
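The modeling assumptions above can be turned into a small numeric solver. This is a sketch of our own, not the authors' code: it assumes a replication target of n copies, that in state k each of the k surviving replicas fails at rate λ (so the chain moves to k-1 at rate kλ), and that one missing replica is re-created at rate μ while k < n. It solves the standard first-step equations q_k T_k - sum_j q_kj T_j = 1, with T_0 = 0, for the expected time to data loss T_k from every state. The name `mttdl_by_state` is hypothetical.

```python
from typing import Dict, List


def mttdl_by_state(n_replicas: int, lam: float, mu: float) -> Dict[int, float]:
    """Expected time to data loss (absorption in state 0) from states 1..n.

    Assumed birth-death model (illustrative, not taken from the slides):
      state k -> k-1 at rate k * lam   (any one of k replicas fails)
      state k -> k+1 at rate mu        (re-replication, only while k < n)
    """
    size = n_replicas
    a: List[List[float]] = [[0.0] * size for _ in range(size)]
    b: List[float] = [1.0] * size
    # First-step equations: (k*lam + mu_k) T_k - k*lam T_{k-1} - mu_k T_{k+1} = 1
    for k in range(1, n_replicas + 1):
        i = k - 1
        fail = k * lam
        repair = mu if k < n_replicas else 0.0
        a[i][i] = fail + repair
        if k > 1:  # for k == 1 the T_{k-1} term vanishes because T_0 = 0
            a[i][i - 1] = -fail
        if k < n_replicas:
            a[i][i + 1] = -repair
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    t = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = b[r] - sum(a[r][c] * t[c] for c in range(r + 1, size))
        t[r] = acc / a[r][r]
    return {k: t[k - 1] for k in range(1, n_replicas + 1)}


if __name__ == "__main__":
    lam = 1 / 500_000  # assumed mean replica lifetime: 500,000 hours
    mu = 1.0           # assumed mean re-creation time: 1 hour
    for n in (2, 3):
        print(n, mttdl_by_state(n, lam, mu))
```

A direct linear solve keeps the sketch dependency-free; for larger chains, a library solver would be the natural replacement.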
1. Simple Majority Write
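AS-IS vs. TO-BE [state-transition diagrams not preserved]
1. Simple Majority Write - MTTDL
Steps (1)-(5), equations not preserved: (1) mean time in state 1; (2) P(1->2) and P(1->0); (5) MTTDL starting from state 1
1. Simple Majority Write - MTTDL (continued)
Steps (4)-(5), equations not preserved: (5) MTTDL starting from state 1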
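The MTTDL steps can be sketched in closed form under an assumed two-replica target: in state 1 the last replica fails at rate λ or a second copy is created at rate μ, and in state 2 either of two replicas fails at rate 2λ. Writing T_k for the expected time to data loss from state k, first-step analysis gives the following (the slides' own formulas may differ):

```latex
\begin{aligned}
\bar{t}_1 &= \frac{1}{\lambda+\mu}, \qquad
P(1\to 2) = \frac{\mu}{\lambda+\mu}, \qquad
P(1\to 0) = \frac{\lambda}{\lambda+\mu} \\
T_1 &= \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2, \qquad
T_2 = \frac{1}{2\lambda} + T_1 \\
\mathrm{MTTDL}_{\text{from }1} = T_1 &= \frac{2\lambda+\mu}{2\lambda^2}, \qquad
\mathrm{MTTDL}_{\text{from }2} = T_2 = \frac{3\lambda+\mu}{2\lambda^2}
\end{aligned}
```

In this sketch T_2 - T_1 = 1/(2λ): acknowledging a write only once two replicas exist adds half a replica lifetime, with no dependence on μ, which is in line with the summary's remark that the Simple Majority Write gain depends only on mean replica lifetime.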
1. Simple Majority Write - Reliability
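Assuming a mean replica lifetime of 500,000 hours and a replica creation time of 1 hour:
● 5-year retention reliability: 99.99996%
● 10-year retention reliability: 99.99990%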
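A quick numeric check can plug the slide's assumptions (1/λ = 500,000 h, 1/μ = 1 h) into a two-replica chain. This is a sketch under stated assumptions: AS-IS acknowledges a write with one replica (start in state 1), TO-BE with two (start in state 2), and the time to loss is approximated as exponential with mean MTTDL, which is reasonable when μ ≫ λ. The names `mttdl_two_replicas` and `reliability` are hypothetical, and the absolute figures need not match the slide's exactly.

```python
import math
from typing import Tuple

HOURS_PER_YEAR = 8760  # 365-day year


def mttdl_two_replicas(lam: float, mu: float) -> Tuple[float, float]:
    """MTTDL in hours starting from 1 and from 2 replicas (assumed two-replica target)."""
    from_one = (2 * lam + mu) / (2 * lam ** 2)
    from_two = from_one + 1 / (2 * lam)  # extra half replica lifetime, independent of mu
    return from_one, from_two


def reliability(years: float, mttdl_hours: float) -> float:
    """P(no data loss within `years`), treating time to loss as exponential."""
    return math.exp(-years * HOURS_PER_YEAR / mttdl_hours)


if __name__ == "__main__":
    lam = 1 / 500_000  # mean replica lifetime 500,000 h (slide assumption)
    mu = 1 / 1.0       # replica creation time 1 h (slide assumption)
    as_is, to_be = mttdl_two_replicas(lam, mu)
    for years in (5, 10):
        print(f"{years} years: AS-IS {reliability(years, as_is):.7%}, "
              f"TO-BE {reliability(years, to_be):.7%}")
```

Because the TO-BE gain is the constant 1/(2λ) while MTTDL itself grows roughly like μ/(2λ²), the relative improvement from majority writes shrinks as repairs get faster.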
2. Routine Deep FSCK
AS-IS: check hosts/disks regularly
TO-BE: check each file's length and checksum
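The TO-BE check can be sketched as follows. It verifies one replica against a hypothetical metadata record holding the expected length and an MD5 hex digest; `ReplicaRecord` and `check_replica` are illustrative names, not MogileFS's fsck API, and MD5 is an assumed checksum algorithm. Length is compared first because it needs only file metadata, whereas the checksum requires reading the whole file.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaRecord:
    """Hypothetical per-replica metadata, e.g. as a tracker database might store it."""
    path: str
    expected_length: int
    expected_md5: Optional[str]  # lowercase hex digest, or None if no checksum was stored


def check_replica(rec: ReplicaRecord, chunk_size: int = 1 << 20) -> str:
    """Classify one replica as 'ok', 'missing', 'bad_length' or 'bad_checksum'."""
    try:
        size = os.path.getsize(rec.path)
    except FileNotFoundError:
        return "missing"
    if size != rec.expected_length:
        return "bad_length"  # cheap check: only needs file metadata
    if rec.expected_md5 is None:
        return "ok"  # length matches and there is no checksum to compare
    digest = hashlib.md5()
    with open(rec.path, "rb") as f:
        # Stream the file in chunks so large objects do not have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "ok" if digest.hexdigest() == rec.expected_md5 else "bad_checksum"
```

A routine job would iterate over all replica records and queue any non-"ok" result for repair; every broken replica it finds is one that would otherwise stay in an undetected state.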
Extended from the previous model:
● c: detection rate of file damage (coverage)
● state 1': one replica is broken but the damage is undetected
2. Routine Deep FSCK - MTTDL
(1) Mean time in each state; (2) P(1->2) and P(1->0) [equations not preserved]
2. Routine Deep FSCK - MTTDL (continued)
(3) Possible paths through the chain; (4) expected time of each path pattern; (5) MTTDL [equations not preserved]
2. Routine Deep FSCK - MTTDL (extreme cases)
(5) MTTDL, with extreme cases (6) c = 1 and (7) c = 0 [equations not preserved]
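One way to add coverage c to a two-replica chain is sketched below; it is our assumption, not necessarily the slides' model. From state 2, a failure is detected with probability c (go to state 1, repairable at rate μ) or goes unnoticed with probability 1 - c (go to state 1', which is never repaired and is lost when the remaining good replica fails at rate λ). First-step analysis then gives:

```latex
\begin{aligned}
T_{1'} &= \frac{1}{\lambda}, \qquad
T_1 = \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2, \qquad
T_2 = \frac{1}{2\lambda} + c\,T_1 + (1-c)\,T_{1'} \\
\mathrm{MTTDL} = T_2 &= \frac{\lambda+\mu}{\lambda+(1-c)\mu}
  \left[\frac{1}{2\lambda} + \frac{1-c}{\lambda} + \frac{c}{\lambda+\mu}\right] \\
c = 1:\quad T_2 &= \frac{3\lambda+\mu}{2\lambda^2}, \qquad
c = 0:\quad T_2 = \frac{3}{2\lambda}, \qquad
\lim_{\mu\to\infty} T_2 = \frac{3-2c}{2\lambda(1-c)} \quad (c<1)
\end{aligned}
```

The limit illustrates why coverage dominates in this sketch: whenever 1 - c > 0, MTTDL stays bounded no matter how fast repairs are, whereas raising c increases it directly.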
2. Routine Deep FSCK - Reliability
Assuming detection coverage rates of:
● AS-IS: 72%
● TO-BE: 81%
Reliability difference:
● 5-year file retention: 0.78%
● 10-year file retention: 1.5%
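The coverage comparison can be checked numerically with a sketch under assumed modeling choices: starting from two healthy replicas, a failure is detected with probability c (and then repaired at rate μ), otherwise it leaves an undetected broken replica that is never repaired; time to loss is treated as exponential. The lifetime and repair inputs reuse 1/λ = 500,000 h and 1/μ = 1 h from the earlier slide. All function names are hypothetical and the authors' model may differ; with these inputs the sketch gives differences of roughly 0.7 and 1.4 percentage points, the same order as the figures above.

```python
import math

HOURS_PER_YEAR = 8760  # 365-day year


def mttdl_with_coverage(lam: float, mu: float, c: float) -> float:
    """MTTDL (hours) from two healthy replicas, given detection coverage c (assumed model)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("coverage c must be in [0, 1]")
    # Closed form from first-step analysis; an undetected broken replica is never repaired.
    bracket = 1 / (2 * lam) + (1 - c) / lam + c / (lam + mu)
    return (lam + mu) / (lam + (1 - c) * mu) * bracket


def reliability(years: float, mttdl_hours: float) -> float:
    """P(no data loss within `years`), treating time to loss as exponential."""
    return math.exp(-years * HOURS_PER_YEAR / mttdl_hours)


def coverage_gain(years: float, lam: float, mu: float, c_old: float, c_new: float) -> float:
    """Absolute reliability difference from raising coverage c_old -> c_new."""
    return (reliability(years, mttdl_with_coverage(lam, mu, c_new))
            - reliability(years, mttdl_with_coverage(lam, mu, c_old)))


if __name__ == "__main__":
    lam, mu = 1 / 500_000, 1.0  # assumed: 500,000 h lifetime, 1 h repair
    for years in (5, 10):
        print(f"{years} years: +{coverage_gain(years, lam, mu, 0.72, 0.81):.2%}")
    # Halving repair time (doubling mu) at fixed coverage, for comparison
    print(mttdl_with_coverage(lam, mu, 0.72), mttdl_with_coverage(lam, 2 * mu, 0.72))
```

In this model, doubling μ at c = 0.72 changes MTTDL by well under 0.1%, while raising c to 0.81 increases it by about 30%, which mirrors the ordering "coverage > repair time" in the summary.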
Summary
Contributions
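● Presented the reliability characteristics of open-source distributed storage implementations and an evaluation of tuning options
○ Disclosed reliability data from the production operation of an open-source distributed storage implementation
○ Provided DFS developers with a method for evaluating reliability mechanisms
○ In practice, DFS developers/users should prioritize verification coverage and reducing repair time (Amdahl's Law)
● Factors ranked by impact on reliability: integrity-check coverage > repair time > synchronous write
○ Compared with shortening file repair time, Simple Majority Write yields only limited improvement
○ Its improvement depends only on the mean replica lifetime, not on repair time
○ Routine FSCK raises anomaly-detection coverage and improves reliability significantly more than shortening repair time does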
Limitation and future work
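● The assumption that disk failures are independent does not hold in practice
○ powered by more magical math?
○ simulation, e.g., the Monte Carlo method
● The evaluation could not be validated further, because reliability data is hard to obtain in the short term
○ Strengthen data collection from storage clusters
○ Collect operational data from clusters running other implementations, e.g., OpenStack Swift and Ceph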
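As a starting point for the simulation route, a minimal Monte Carlo sketch of the replica-count chain is shown below. It deliberately keeps independent exponential failures and repairs, so its estimate can first be checked against the closed-form two-replica MTTDL, (3λ+μ)/(2λ²) from two replicas under these assumptions, before correlated failures (for example, replicas sharing a host or rack) are modeled. The two-replica target and the function names are assumptions; holding times come from Python's `random.Random.expovariate`.

```python
import random
from typing import Optional

TARGET = 2  # assumed replication target


def time_to_loss(lam: float, mu: float, start: int, rng: random.Random) -> float:
    """Simulate one path of the replica-count chain until no replica is left."""
    t, k = 0.0, start
    while k > 0:
        fail_rate = k * lam                      # each replica fails independently
        repair_rate = mu if k < TARGET else 0.0  # re-replicate while below target
        total = fail_rate + repair_rate
        t += rng.expovariate(total)              # exponential holding time in state k
        k += -1 if rng.random() < fail_rate / total else 1
    return t


def estimate_mttdl(lam: float, mu: float, start: int = TARGET,
                   runs: int = 20_000, seed: Optional[int] = 1) -> float:
    """Average simulated time to data loss over `runs` independent paths."""
    rng = random.Random(seed)
    return sum(time_to_loss(lam, mu, start, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # Small rates keep paths short; the closed form gives 2.0 from state 2 when lam = mu = 1.
    print(estimate_mttdl(1.0, 1.0))
```

With a realistic ratio such as μ/λ = 500,000, each path would visit state 1 roughly half a million times before absorbing, so naive simulation becomes expensive and rare-event techniques (e.g., importance sampling) would likely be needed.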

TANET 2018 - Insights into the reliability of open-source distributed file system (DFS)
