When IaaS Meets DFS
Storage Component Considerations and Requirements in an IaaS Platform
Huang Chih-Chieh (soem) @ NEET
Outline
• What is IaaS
• What is OpenStack
• Storage Types in IaaS
• Ceph
– Issues
• Summary
WHAT IS IAAS
Cloud Service Models Overview
• What if you want to have an IT department?
– Similar to building a new house in the previous analogy
• You can rent virtualized infrastructure and build up your own IT system on those resources, which you fully control.
• Technically speaking, you use an Infrastructure as a Service (IaaS) solution.
– Similar to buying an empty house in the previous analogy
• You can develop your IT system directly on a cloud platform, without caring about any lower-level resource management.
• Technically speaking, you use a Platform as a Service (PaaS) solution.
– Similar to living in a hotel in the previous analogy
• You can directly use existing IT system solutions provided by a cloud application service provider, without knowing any technical detail about how the service is achieved.
• Technically speaking, you use a Software as a Service (SaaS) solution.
From IaaS to PaaS
(Stack diagram comparing Traditional IT, IaaS, and PaaS across the layers Networking, Storage, Servers, Virtualization, OS, Middleware, Runtime, Data, and Applications. In Traditional IT you manage the entire stack; in IaaS the provider manages Networking through Virtualization and you manage the OS and everything above it; in PaaS the provider manages everything except your Data and Applications.)
Service Model Overview
WHAT IS OPENSTACK
OpenStack Storage
(diagram: cinder-volume)
STORAGE TYPES IN IAAS
OpenStack Storage
• Instance Storage Provider
– Off Compute Node Storage: shared file system
– On Compute Node Storage: shared file system
– On Compute Node Storage: non-shared file system
• Image Repository
– Glance
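As a concrete illustration of the "shared file system" option above, the minimal sketch below mounts an NFS export at nova's default instances path; the server name and export path are assumptions, not part of the original slides.

# Shared-file-system instance storage: every compute node mounts the same
# backend at nova's default instances path (server and export are illustrative).
mount -t nfs nfs-server:/export/nova /var/lib/nova/instances

# The non-shared variant keeps /var/lib/nova/instances on local disk,
# so live migration has to copy disks (block migration) instead of sharing them.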
OpenNebula Storage
SSCloud Storage
• Properties
– File System vs. Block Device
– Shared vs. Non-shared
• Four types:
– Shared File System
– Non-Shared File System
– Shared Block Device
– Non-Shared Block Device
SSCloud Storage
• File System
– Bootable image / small image
• Small image
– FS cache in host memory
– Random access
– Type
• Shared
– DFS (Ceph), NFS (nfsd)
• Non-Shared
– Local filesystem + scp
SSCloud Storage
• Block device (with LVM)
– Additional space / large image
– Heavily accessed images
• Large image
– No FS cache in host memory => saves memory
– Large chunk access
» Hadoop (64 MB~128 MB per file)
– Type
• Shared
– iSCSI + LVM
• Non-Shared
– LVM
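A minimal sketch of the shared block-device path above (iSCSI + LVM), assuming the tgt target framework; the volume group, device, and IQN names are made up for illustration.

# Carve a logical volume out of a local disk (device and names are illustrative).
pvcreate /dev/sdb
vgcreate vg_vm /dev/sdb
lvcreate -L 100G -n vm_disk vg_vm

# Export it over iSCSI with tgt so other compute nodes can attach it.
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2013-01.com.example:vm-disk
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vg_vm/vm_disk
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

# The non-shared variant stops after lvcreate and attaches the LV locally.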
Storage Systems
• File Based
– NFS
– DFS
• Lustre
• GlusterFS
• MooseFS
• Ceph
Storage Systems
• Block Based
– iSCSI + LVM
– DRBD
– VastSky
– KPS: Kernel-based Programmable Storage System
– Ceph
Storage Systems
• Object Based
– OpenStack Swift
– Hadoop HDFS
• with WebHDFS (1.0.4-stable) or HTTPFS (2.0.3-alpha)
– Ceph
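To show what "object based" access looks like for the HDFS option above, here is a hedged curl sketch against the WebHDFS REST API; the namenode host, the paths, and the 1.x default port 50070 are assumptions.

# List a directory through WebHDFS (HttpFS exposes the same API, typically on port 14000).
curl -i "http://namenode:50070/webhdfs/v1/user/hadoop?op=LISTSTATUS"

# Read a file; WebHDFS answers with a redirect to a datanode, hence -L.
curl -i -L "http://namenode:50070/webhdfs/v1/user/hadoop/data.txt?op=OPEN"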
CEPH
CEPH: THE FUTURE OF STORAGE™
Ceph
• Overview
– Ceph is a free software distributed file system.
– Ceph's main goals are to be POSIX-compatible, and
completely distributed without a single point of failure.
– The data is seamlessly replicated, making it fault tolerant.
• Release
– On July 3, 2012, the Ceph development team released Argonaut, the first release of Ceph with long-term support.
Ceph
• Introduction
– Ceph is a distributed file system that provides
excellent performance, reliability, and scalability.
– Object-based storage.
– Ceph separates data and metadata operations by eliminating file allocation tables and replacing them with generating functions.
– Ceph utilizes a highly adaptive distributed metadata cluster, improving scalability.
– Clients use the OSDs to access data directly, giving high performance.
Ceph
• Object-based Storage
Ceph
• Goal
– Scalability
• Storage capacity, throughput, client performance. Emphasis on HPC.
– Reliability
• Failures are the norm rather than the exception, so the system must have fault detection and recovery mechanisms.
– Performance
• Dynamic workloads → load balancing.
Ceph
• Ceph Filesystem
– POSIX
• File based
• Ceph Block Device
– RBD
• Block based
• Ceph Object Gateway
– Swift / S3 RESTful API
• Object based
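The block-based interface (RBD) is the one the summary comes back to, so here is a minimal sketch of creating and mapping an image; the pool and image names are illustrative.

# Create a pool and a 10 GB RBD image inside it (names are illustrative).
ceph osd pool create volumes 128
rbd create volumes/vm-disk --size 10240

# Map the image on a client host; it then appears as a regular block device
# (under /dev/rbd/<pool>/<image> with the udev rules installed, or /dev/rbd0).
rbd map volumes/vm-disk
mkfs.ext4 /dev/rbd/volumes/vm-disk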
Ceph
• Three main components
– Clients: a near-POSIX file system interface.
– Cluster of OSDs: stores all data and metadata.
– Metadata server (MDS) cluster: manages the namespace (file names).
Three Fundamental Designs
1. Separating Data and Metadata
– File metadata management is separated from data storage.
– Metadata operations are collectively managed by a metadata server cluster.
– Clients use the metadata to access the OSDs directly for data.
– Ceph removes data allocation lists entirely.
– CRUSH assigns objects to storage devices.
Separating Data and Metadata
• Ceph separates data and metadata operations
Separating Data and Metadata
• Data Distribution with CRUSH
– To avoid imbalance (idle or empty OSDs) and load asymmetries (hot data on new devices), new data is distributed pseudo-randomly.
– Ceph maps objects into placement groups (PGs); PGs are assigned to OSDs by CRUSH.
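Because CRUSH is a deterministic function of the cluster map, you can ask the cluster where any object would be placed; a small sketch, with the pool and object names chosen for illustration:

# Show which placement group and which OSDs a given object maps to.
ceph osd map rbd my-object

# Dump all placement groups and their acting OSD sets for the whole cluster.
ceph pg dump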
Dynamic Distributed Metadata Management
2. Dynamic Distributed Metadata Management
– Ceph utilizes a metadata cluster architecture based on dynamic subtree partitioning (for workload balance).
– Dynamic Subtree Partitioning
• Most file systems use static subtree partitioning (or a simple hash function) → imbalanced workloads.
• Ceph's MDS cluster is based on dynamic subtree partitioning → balanced workloads.
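A quick way to see the metadata cluster at work, assuming the standard ceph CLI of that era; output formats differ between versions.

# Show how many MDS daemons are active and how many are standing by.
ceph mds stat

# The overall status summary also carries an mdsmap line in releases of this era.
ceph -s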
Reliable Distributed Object Storage
3. Reliable Autonomic Distributed Object Storage
– Replication.
– Failure detection and recovery.
Client
• Client Operation
– File I/O and capabilities
• The client sends an open request to the MDS, which translates the file name into an inode (inode number, file owner, mode, size, …), checks permissions, and returns the inode number.
• The client maps file data into objects with CRUSH and then accesses the OSDs directly.
Client
• Client Synchronization
– If multiple clients (readers and writers) use the same file, any previously issued read and write capabilities are revoked until the OSDs acknowledge the writes.
• Traditional approach: serialize updates → poor performance.
• Ceph: adopting the HPC (high-performance computing) community's extensions, clients can read and write different parts of the same file (different objects) → better performance.
Metadata
• Dynamically Distributed Metadata
– MDSs use journaling
• Repetitive metadata updates handled in memory.
• Optimized on-disk layout for read access.
– Each MDS has its own journal; when an MDS fails, another node can quickly recover using that journal.
– Inodes are embedded directly within directories.
– Each directory's content is written to the OSD cluster using the same striping and distribution strategy as metadata journals and file data.
Replica
• Replication
– Data is replicated in terms of PGs.
– Clients send all writes to the first non-failed OSD
in an object's PG (the primary), which assigns a new version number for the object and PG and forwards the write to any additional replica OSDs.
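Replication being a per-pool property, the sketch below adjusts the replica count of one of the default pools of that era; the value 3 is only an example.

# Each placement group keeps "size" copies of every object in the pool.
ceph osd pool set data size 3
ceph osd pool get data size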
Failure detection
• Failure detection
– When an OSD does not respond → it is marked "down".
– Its role passes to the next OSD in the PG.
– If the first OSD does not recover → it is marked "out".
– Another OSD joins the PG in its place.
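The down/out states above can be inspected, and forced, from the CLI; a minimal sketch (osd.3 is an arbitrary example id):

# Summary of how many OSDs are up and in.
ceph osd stat

# Manually mark an OSD out so its PGs re-peer onto other OSDs, then bring it back.
ceph osd out 3
ceph osd in 3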
Failure Recovery
• Recovery and Cluster Updates
– If OSD1 crashes → it is marked "down".
– OSD2 takes over as primary.
– If OSD1 recovers → it is marked "up".
– OSD2 receives the update request and sends the new version of the data to OSD1.
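While the re-peering described above happens, recovery can be watched from any admin node; a small sketch using standard status commands:

# Per-OSD up/down and in/out state, arranged by the CRUSH hierarchy.
ceph osd tree

# Which PGs are degraded or recovering, and why the cluster is not HEALTH_OK.
ceph health detail

# Stream cluster events (peering, backfill, recovery progress) as they happen.
ceph -w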
EVERYTHING LOOKS GOOD, BUT…
Issues
• Still under heavy development
– 0.48
• Monitors waste CPU
• Recovery can end in an inconsistent state
– 0.56
• Bugs in file-extend behavior
– Qcow2 images get I/O errors in the VM's kernel,
» but everything looks fine in Ceph's logs.
– 0.67
• ceph-deploy
Issues
• Keep the clocks correct
– 0.56
• OSDs waste CPU
– ntpdate tock.stdtime.gov.tw
– 0.67
• health HEALTH_WARN clock skew detected on mon.1
– ntpdate tock.stdtime.gov.tw
– Run an NTP server
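A hedged sketch of the clock fix, using the NTP server named on the slide; the Debian/Ubuntu package and service names are assumptions.

# One-shot correction on every monitor and OSD host.
ntpdate tock.stdtime.gov.tw

# Keep clocks in sync continuously so the HEALTH_WARN clock skew does not come back.
apt-get install ntp
echo "server tock.stdtime.gov.tw iburst" >> /etc/ntp.conf
service ntp restart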
Issues
• CephFS is not stable
– New systems can use Ceph RBD
– Traditional systems can only use the POSIX interface
• 0.56
– Operations in a folder can freeze
» if that folder is under heavy load.
– Bugs in file-extend behavior
REF: http://www.sebastien-han.fr/blog/2013/06/24/what-i-think-about-cephfs-in-openstack/
Issues
• Mount ceph with
– Kernel module
• mount -t ceph …
– FUSE
• ceph-fuse -c /etc/ceph/ceph.conf …
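Filled-out versions of the two mount commands above, with an assumed monitor address, user name, and secret file; adjust to your own cluster.

# Kernel client: talk to a monitor directly and authenticate with a cephx secret.
mount -t ceph 192.168.1.10:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client: read monitors and keys from ceph.conf instead.
ceph-fuse -c /etc/ceph/ceph.conf -m 192.168.1.10:6789 /mnt/ceph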
Issues
root@SSCloud-01:/# cephfs /mnt/dev set_layout -p 5
Segmentation fault

cephfs is not a super-friendly tool right now - sorry! :( I believe you will find it works correctly if you specify all the layout parameters, not just one of them.

root@SSCloud-01:/# cephfs -h
not enough parameters!
usage: cephfs path command [options]*
Commands:
show_layout -- view the layout information on a file or dir
set_layout -- set the layout on an empty file, or the default layout on a directory
show_location -- view the location information on a file
map -- display file objects, pgs, osds
Options:
Useful for setting layouts:
--stripe_unit, -u: set the size of each stripe
--stripe_count, -c: set the number of objects to stripe across
--object_size, -s: set the size of the objects to stripe across
--pool, -p: set the pool to use
Useful for getting location data:
--offset, -l: the offset to retrieve location data for
root@SSCloud-01:/# cephfs /mnt/dev set_layout -u 4194304 -c 1 -s 4194304 -p 5
root@SSCloud-01:/# cephfs /mnt/dev show_layout
layout.data_pool: 5
layout.object_size: 4194304
layout.stripe_unit: 4194304
layout.stripe_count: 1
SUMMARY
Summary
• There are three types of storage in IaaS
– File-based, block-based, object-based
• Ceph is a good choice for IaaS
– OpenStack can store images in the Ceph Block Device (RBD)
– Cinder or nova-volume can boot a VM
• using a copy-on-write clone of an image
• CephFS is still under heavy development
– However, newer versions are better.
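A sketch of the copy-on-write clone workflow the last point refers to, assuming RBD format 2 images and illustrative pool/image names; on the OpenStack side, Cinder's RBD driver (cinder.volume.drivers.rbd.RBDDriver, configured with rbd_pool, rbd_user, and rbd_secret_uuid in cinder.conf) performs the same steps for you.

# Cloning requires format 2 images (older rbd versions spell the flag --format 2).
rbd create images/ubuntu-12.04 --size 10240 --image-format 2

# Snapshot the golden image, protect the snapshot, then clone it per VM.
rbd snap create images/ubuntu-12.04@base
rbd snap protect images/ubuntu-12.04@base
rbd clone images/ubuntu-12.04@base volumes/vm-0001-disk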