Ceph Intro and Architectural Overview by Ross Turk
Speaker notes
  • Hi, welcome to my talk. I’m really happy that you chose to join me for this, given your many other choices. Believe me, I’m going to tell you things that will literally tear your head off. Ok, not literally. That would be really messy.
  • Working through a computer means that we can store more information, and we can store it more quickly. But it also means that we’re separated from the information we’ve created.
  • Ceph was designed to be self-managing. Lots of distributed storage systems require operator intervention when something goes wrong.
  • RADOS is a distributed object store, and it’s the foundation for Ceph. On top of RADOS, the Ceph team has built three applications that allow you to store data and do fantastic things. But before we get into all of that, let’s start at the beginning of the story.
  • But that’s a lot to digest all at once. Let’s start with RADOS.
  • Remember all that metadata we talked about in the beginning? Feels so long ago. It has to be stored somewhere! Something has to keep track of who created files, when they were created, and who has the right to access them. And something has to remember where they live within a tree. Enter MDS, the Ceph Metadata Server. Clients accessing Ceph FS data first make a request to an MDS, which provides what they need to get files from the right OSDs.
  • If you aren’t running Ceph FS, you don’t need to deploy metadata servers.
  • So now that you know what Ceph is, I’m going to tell you what makes it different.
  • All of that metadata for Ceph FS has to be stored somewhere. It’s a giant diary, keeping track of where everything is and who owns it.
  • MDSs store all of their data within RADOS itself, but there’s still a problem…
  • There are multiple MDSs!
  • So how do you have one tree and multiple servers?
  • If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.
  • When the second one comes along, it will intelligently partition the work by taking a subtree.
  • When the third MDS arrives, it will attempt to split the tree again.
  • Same with the fourth.
  • An MDS can even take just a single directory or file, if the load is high enough. This all happens dynamically based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.
  • Transcript

    • 1. Ceph Intro & Architectural Overview. Ross Turk, VP Community, Inktank
    • 2. ME ME ME ME ME ME. I made a slide today. It’s all about me. Ross Turk, VP Community, Inktank. ross@inktank.com, @rossturk, inktank.com | ceph.com
    • 3. CLOUD SERVICES: COMPUTE, NETWORK, STORAGE. The future of storage™
    • 4. [Diagram: HUMAN → COMPUTER → TAPE; HUMAN → ROCK; HUMAN → INK → PAPER]
    • 5. [Diagram: HUMAN → COMPUTER → TAPE]
    • 6. [Diagram: YOU → TECHNOLOGY → YOUR DATA]
    • 7. How Much Store Things All Human History?! [Timeline: carving, writing, paper, computers, distributed storage, cloud computing, gaaaaaaaaahhhh!!!!!!]
    • 8. [Diagram: three HUMANs → one COMPUTER → seven DISKs]
    • 9. [Diagram: many HUMANs → one COMPUTER → many DISKs]
    • 10. [Diagram: many HUMANs → one GIANT SPENDY COMPUTER → many DISKs]
    • 11. [Diagram: HUMANs → many individual COMPUTER + DISK pairs]
    • 12. [Diagram: HUMANs → many COMPUTER + DISK pairs, no central computer]
    • 13. [Diagram: eight COMPUTER + DISK pairs behind a “STORAGE APPLIANCE” boundary]
    • 14. Storage Appliance [Photo: Michael Moll, Wikipedia / CC BY-SA 2.0]
    • 15. [Diagram: the appliance model: PROPRIETARY HARDWARE (COMPUTER + DISK nodes), PROPRIETARY SOFTWARE, SUPPORT AND MAINTENANCE. 34% of 2012 revenue (5.2 billion dollars); 1.1 billion in R&D spent in 2012; 1.6 million square feet of manufacturing space]
    • 16. THE CLOUD [binary-digit background]
    • 17. [Diagram: PROPRIETARY HARDWARE / PROPRIETARY SOFTWARE / SUPPORT AND MAINTENANCE versus STANDARD HARDWARE / OPEN SOURCE SOFTWARE / ENTERPRISE SUBSCRIPTION (optional)]
    • 18. [Image slide, no text]
    • 19. Philosophy: OPEN SOURCE, COMMUNITY-FOCUSED. Design: SCALABLE, NO SINGLE POINT OF FAILURE, SOFTWARE BASED, SELF-MANAGING
    • 20. 8 years & 20,000 commits later…
    • 21. CEPH STORAGE CLUSTER: a reliable, easy to manage, next-generation distributed object store that provides storage of unstructured data for applications. On top of it: CEPH OBJECT GATEWAY (OBJECTS), a powerful S3- and Swift-compatible gateway that brings the power of the Ceph Object Store to modern applications; CEPH BLOCK DEVICE (VIRTUAL DISKS), a distributed virtual block device that delivers high-performance, cost-effective storage for virtual machines and legacy applications; CEPH FILESYSTEM (FILES & DIRECTORIES), a distributed, scale-out filesystem with POSIX semantics that provides storage for legacy and modern applications.
    • 22. [Stack diagram] RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes. LIBRADOS: a library allowing apps (APP) to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW: a bucket-based REST gateway, compatible with S3 and Swift (APP). RBD: a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver (HOST/VM). CEPH FS: a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE (CLIENT).
    • 23. [The same stack diagram, repeated to introduce RADOS]
    • 24. [Diagram: each OSD runs on a filesystem (btrfs, xfs, or ext4) on a DISK; three monitors (M M M) alongside]
    • 25. [Diagram: a HUMAN and the three monitors (M M M)]
    • 26. Monitors (M): maintain cluster membership and state; provide consensus for distributed decision-making; small, odd number; these do not serve stored objects to clients. OSDs: 10s to 10000s in a cluster; one per disk (or one per SSD, RAID group…); serve stored objects to clients; intelligently peer to perform replication and recovery tasks.
    • 27. [The same stack diagram, repeated to introduce LIBRADOS]
    • 28. [Diagram: APP → LIBRADOS → socket → cluster (M M M)]
    • 29. LIBRADOS: provides direct access to RADOS for applications; C, C++, Python, PHP, Java, Erlang; direct access to storage nodes; no HTTP overhead.
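
A minimal sketch of that direct access from Python, assuming the rados module shipped with Ceph, a reachable cluster configured in /etc/ceph/ceph.conf, and a pool named "data" (the pool name is illustrative):

```python
import rados

# Connect using the local cluster config and keyring.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context on a pool ("data" is an assumed example name).
ioctx = cluster.open_ioctx('data')

# Write and read an object directly -- no HTTP gateway in the path.
ioctx.write_full('hello', b'Hello, RADOS!')
print(ioctx.read('hello'))

ioctx.close()
cluster.shutdown()
```
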
    • 30. [The same stack diagram, repeated to introduce RADOSGW]
    • 31. [Diagram: APP → REST → RADOSGW → LIBRADOS → socket → cluster (M M M)]
    • 32. RADOS Gateway: REST-based object storage proxy; uses RADOS to store objects; API supports buckets, accounts; usage accounting for billing; compatible with S3 and Swift applications.
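
Because the gateway speaks the S3 protocol, a stock S3 client can talk to it unchanged. A hedged sketch using boto3; the endpoint, credentials, and bucket name are all placeholders for a real RGW deployment:

```python
import boto3

# A standard S3 client pointed at the RADOS Gateway instead of AWS.
# Endpoint and keys are placeholders for a real RGW deployment.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='demo')
s3.put_object(Bucket='demo', Key='hello.txt', Body=b'Hello via RGW!')
print(s3.get_object(Bucket='demo', Key='hello.txt')['Body'].read())
```
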
    • 33. [The same stack diagram, repeated to introduce RBD]
    • 34. [Diagram: VM on a VIRTUALIZATION CONTAINER → LIBRBD → LIBRADOS → cluster (M M M)]
    • 35. [Diagram: a VM moving between two VIRTUALIZATION CONTAINERs, each with its own LIBRBD + LIBRADOS, against the same cluster (M M M)]
    • 36. [Diagram: HOST → KRBD (KERNEL MODULE) → LIBRADOS → cluster (M M M)]
    • 37. RADOS Block Device: storage of disk images in RADOS; decouples VMs from host; images are striped across the cluster (pool); snapshots; copy-on-write clones; support in: mainline Linux kernel (2.6.39+), Qemu/KVM, native Xen coming soon, OpenStack, CloudStack, Nebula, Proxmox.
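
A short sketch of creating and writing an RBD image via the rbd Python bindings; the pool and image names are illustrative, and the cluster connection follows the earlier librados sketch:

```python
import rados
import rbd

# Reuse the librados connection pattern from the earlier sketch.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')  # "rbd" is the conventional image pool

# Create a 1 GiB image; its objects are striped across the cluster.
rbd.RBD().create(ioctx, 'vm-disk-1', 1024 ** 3)

# Write through librbd, as QEMU/KVM would on behalf of a VM.
with rbd.Image(ioctx, 'vm-disk-1') as image:
    image.write(b'first bytes of a disk image', 0)

ioctx.close()
cluster.shutdown()
```
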
    • 38. [The same stack diagram, repeated to introduce CEPH FS]
    • 39. [Diagram: CLIENT exchanges metadata with the metadata servers and data (0110) directly with the cluster (M M M)]
    • 40. Metadata Server: manages metadata for a POSIX-compliant shared filesystem (directory hierarchy; file metadata: owner, timestamps, mode, etc.); stores metadata in RADOS; does not serve file data to clients; only required for shared filesystem.
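
For a feel of the split between the metadata and data paths, a hedged sketch using the libcephfs Python bindings (the paths and config file location are assumptions; most deployments use the kernel client or a FUSE mount instead):

```python
import cephfs

# libcephfs bindings; the kernel client or FUSE mount is more common.
fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()

# mkdir/stat are metadata operations handled by the MDS ...
fs.mkdir(b'/demo', 0o755)

# ... while file contents flow directly between client and OSDs.
fd = fs.open(b'/demo/hello.txt', 'w', 0o644)
fs.write(fd, b'Hello, Ceph FS!', 0)
fs.close(fd)
print(fs.stat(b'/demo/hello.txt'))

fs.unmount()
fs.shutdown()
```
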
    • 41. What Makes Ceph Unique? Part one: CRUSH
    • 42. [Diagram: APP facing many nodes (DC), asking “??” where its data should go]
    • 43. How Long Did It Take You To Find Your Keys This Morning? [Photo: azmeen, Flickr / CC BY 2.0]
    • 44. [Diagram: APP connected to every node (DC) directly]
    • 45. Dear Diary: Today I Put My Keys on the Kitchen Counter [Photo: Barnaby, Flickr / CC BY 2.0]
    • 46. [Diagram: APP → nodes (DC) partitioned by name ranges A-G, H-N, O-T, U-Z, F*]
    • 47. I Always Put My Keys on the Hook By the Door [Photo: vitamindave, Flickr / CC BY 2.0]
    • 48. HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
    • 49. The Answer: CRUSH!!!!! [Photo: pasukaru76, Flickr / CC SA 2.0]
    • 50. [Diagram: hash(object name) % num_pg maps an object to a placement group; CRUSH(pg, cluster state, rule set) maps the placement group to OSDs]
    • 51. [The same diagram, shown without labels]
    • 52. CRUSH: pseudo-random placement algorithm (fast calculation, no lookup; repeatable, deterministic); statistically uniform distribution; stable mapping (limited data migration on change); rule-based configuration (infrastructure topology aware; adjustable replication; weighting).
    • 53. [Diagram: CLIENT asking “??”]
    • 54. [Worked example: an OBJECT with NAME “foo” in POOL “bar”. hash(“foo”) % 256 = 0x23, and “bar” = pool 3, so the object maps to PLACEMENT GROUP 3.23; CRUSH then maps placement group 3.23 to its target OSDs]
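
The two-stage mapping is easy to sketch. A toy Python illustration: the real system uses the rjenkins hash and the full CRUSH topology walk, so the hash function, PG count, and OSD selection below are simplified stand-ins:

```python
import hashlib
import random

PG_COUNT = 256          # placement groups in the pool (the slide's 256)
POOL_ID = 3             # the slide resolves pool "bar" to id 3
OSDS = list(range(40))  # a pretend cluster of 40 OSDs

def object_to_pg(name):
    """Stage 1: hash(object name) % num_pg -> placement group id."""
    # Ceph uses the rjenkins hash; md5 here is only a stand-in.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return '%d.%x' % (POOL_ID, h % PG_COUNT)

def pg_to_osds(pg, replicas=3):
    """Stage 2: CRUSH(pg, cluster state, rule set) -> target OSDs.
    Real CRUSH walks the cluster topology by rule; this stand-in just
    makes a deterministic, repeatable pick from the OSD list."""
    rng = random.Random(pg)  # same pg -> same OSDs, no lookup table
    return rng.sample(OSDS, replicas)

pg = object_to_pg('foo')
print(pg, '->', pg_to_osds(pg))  # a stable pg id and three OSD ids
```

The point the slides make is visible here: placement is computed, never looked up, so any client can find any object knowing only the object name, the pool, and the cluster map.
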
    • 55. [Image slide, no text]
    • 56. [Image slide, no text]
    • 57. [Diagram: CLIENT asking “??”]
    • 58. What Makes Ceph Unique? Part two: thin provisioning
    • 59. [Diagram: VM on a VIRTUALIZATION CONTAINER → LIBRBD → LIBRADOS → cluster (M M M)]
    • 60. HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
    • 61. [Diagram: a 144-unit base image; four instant copies add 0, 0, 0, 0; total = 144]
    • 62. [Diagram: a CLIENT writes to one clone; only the changed 4 units are stored; total = 148]
    • 63. [Diagram: CLIENT reads are served from the clone’s 4 units or fall through to the 144-unit base image; total = 148]
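
Copy-on-write clones are how thousands of VM disks can appear instantly. A hedged sketch with the rbd Python bindings; it assumes a base image named "golden-image" already exists in the "rbd" pool (both names are illustrative):

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# Snapshot and protect the base image so it can be cloned from.
with rbd.Image(ioctx, 'golden-image') as base:
    base.create_snap('base-snap')
    base.protect_snap('base-snap')

# Each clone is an instant copy-on-write overlay: it stores nothing
# until a client writes, so clones of a 144-unit image still total 144.
for i in range(4):
    rbd.RBD().clone(ioctx, 'golden-image', 'base-snap',
                    ioctx, 'vm-%d' % i,
                    features=rbd.RBD_FEATURE_LAYERING)

ioctx.close()
cluster.shutdown()
```
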
    • 64. What Makes Ceph Unique? Part three: clustered metadata
    • 65. POSIX Filesystem Metadata [Photo: Barnaby, Flickr / CC BY 2.0]
    • 66. [Diagram: CLIENT, cluster (M M M), data (0110)]
    • 67. [Diagram: cluster (M M M)]
    • 68. [Diagram: one tree, three metadata servers. ??]
    • 69. [Diagram: a single MDS manages metadata for the entire tree]
    • 70. [Diagram: a second MDS takes a subtree]
    • 71. [Diagram: a third MDS splits the tree again]
    • 72. [Diagram: a fourth MDS takes another subtree]
    • 73. DYNAMIC SUBTREE PARTITIONING
    • 74. Getting Started With Ceph. Read about the latest version of Ceph: the latest stuff is always at http://ceph.com/get. Deploy a test cluster using ceph-deploy: read the quick-start guide at http://ceph.com/qsg. Deploy a test cluster on the AWS free tier using Juju: read the guide at http://ceph.com/juju. Read the rest of the docs: find docs for the latest release at http://ceph.com/docs. Have a working cluster up quickly.
    • 75. Getting Involved With Ceph. Most project discussion happens on the mailing list: join or view archives at http://ceph.com/list. IRC is a great place to get help (or help others!): find details and historical logs at http://ceph.com/irc. The tracker manages our bugs and feature requests: register and start looking around at http://ceph.com/tracker. Doc updates and suggestions are always welcome: learn how to contribute docs at http://ceph.com/docwriting. Help build the best storage system around!
    • 76. Ceph Cuttlefish (v0.61.x): 1. new ceph-deploy provisioning tool; 2. new Chef cookbooks; 3. fully-tested packages for RHEL (in EPEL); 4. RGW authentication management API; 5. RADOS pool quotas; 6. new ceph df; 7. RBD incremental snapshots. Best Ceph ever.
    • 77. Questions? Ross Turk, VP Community, Inktank. ross@inktank.com, @rossturk, inktank.com | ceph.com
