Ceph Day NYC: Ceph Fundamentals

Ross Turk provides an overview of the Ceph architecture at Ceph Day NYC.


  • RADOS is a distributed object store, and it’s the foundation for Ceph. On top of RADOS, the Ceph team has built three applications that allow you to store data and do fantastic things. But before we get into all of that, let’s start at the beginning of the story.
  • But that’s a lot to digest all at once. Let’s start with RADOS.
  • MDSs store all of their data within RADOS itself, but there’s still a problem…
  • There are multiple MDSs!
  • So how do you have one tree and multiple servers?
  • If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.
  • When the second one comes along, it will intelligently partition the work by taking a subtree.
  • When the third MDS arrives, it will attempt to split the tree again.
  • Same with the fourth.
  • An MDS can even take just a single directory or file, if the load is high enough. This all happens dynamically based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.

Ceph Day NYC: Ceph Fundamentals Presentation Transcript

  • Ceph Fundamentals Ross Turk VP Community, Inktank
  • ME ME ME ME ME ME. 2 Ross Turk VP Community, Inktank ross@inktank.com @rossturk inktank.com | ceph.com
  • Ceph Architectural Overview Ah! Finally, 32 slides in and he gets to the nerdy stuff. 3
  • 4 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT
  • 5 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT
  • 6 DISK FS DISK DISK OSD DISK DISK OSD OSD OSD OSD FS FS FS FS btrfs xfs ext4 M M M
  • 7 M M M HUMAN
  • 8 Monitors: • Maintain cluster membership and state • Provide consensus for distributed decision-making • Small, odd number • These do not serve stored objects to clients M OSDs: • 10s to 10000s in a cluster • One per disk • (or one per SSD, RAID group…) • Serve stored objects to clients • Intelligently peer to perform replication and recovery tasks
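    A minimal python-rados sketch of that division of labor: connecting contacts a monitor for the current cluster map, while the capacity and object totals it reports are aggregated from the OSDs. The config path and client name below are assumptions about a typical deployment, not something from the slides.

      import rados  # python-rados bindings shipped with Ceph

      # Connecting reaches a monitor and pulls down the current cluster map.
      cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', name='client.admin')
      cluster.connect()

      # Capacity and object counts are aggregated from the OSDs.
      stats = cluster.get_cluster_stats()
      print("kB used:", stats['kb_used'], "objects:", stats['num_objects'])

      cluster.shutdown()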
  • 9 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT
  • LIBRADOS M M M 10 APP socket
  • L LIBRADOS • Provides direct access to RADOS for applications • C, C++, Python, PHP, Java, Erlang • Direct access to storage nodes • No HTTP overhead
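    To illustrate that direct, no-HTTP access path, here is a short librados sketch in Python that writes and reads a single object. The pool name 'data' and the config path are placeholders for whatever your cluster actually uses.

      import rados

      cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
      cluster.connect()

      # An I/O context is bound to one pool; operations go straight to the OSDs.
      ioctx = cluster.open_ioctx('data')           # assumes a pool named 'data'
      ioctx.write_full('greeting', b'hello ceph')  # store an object by name
      print(ioctx.read('greeting'))                # read it back: b'hello ceph'

      ioctx.close()
      cluster.shutdown()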
  • 12 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT
  • 13 M M M LIBRADOS RADOSGW APP socket REST
  • 14 RADOS Gateway: • REST-based object storage proxy • Uses RADOS to store objects • API supports buckets, accounts • Usage accounting for billing • Compatible with S3 and Swift applications
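    Because the gateway is S3-compatible, any S3 client library can talk to it. A hedged sketch using the third-party boto3 library; the endpoint URL and credentials are placeholders for your own RGW instance and the keys created for its users.

      import boto3

      # Point a standard S3 client at the RADOS Gateway instead of AWS.
      s3 = boto3.client(
          's3',
          endpoint_url='http://rgw.example.com:7480',  # placeholder RGW endpoint
          aws_access_key_id='ACCESS_KEY',              # placeholder credentials
          aws_secret_access_key='SECRET_KEY',
      )

      s3.create_bucket(Bucket='demo')
      s3.put_object(Bucket='demo', Key='hello.txt', Body=b'stored in RADOS')
      print(s3.get_object(Bucket='demo', Key='hello.txt')['Body'].read())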
  • 15 M x12 x12 x12 x12 x12 LOAD BALANCER M M LOAD BALANCER
  • 16 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
  • 17 M M M VM LIBRADOS LIBRBD HYPERVISOR
  • LIBRADOS 18 M M M LIBRBD HYPERVISOR LIBRADOS LIBRBD HYPERVISOR VM
  • LIBRADOS 19 M M M KRBD (KERNEL MODULE) HOST
  • 20 RADOS Block Device: • Storage of disk images in RADOS • Decouples VMs from host • Images are striped across the cluster (pool) • Snapshots • Copy-on-write clones • Support in: • Mainline Linux Kernel (2.6.39+) • Qemu/KVM, native Xen coming soon • OpenStack, CloudStack, Nebula, Proxmox
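    The snapshot and copy-on-write clone features above are also exposed through the python-rbd bindings. A rough sketch, assuming a pool named 'rbd' and image names chosen purely for illustration:

      import rados
      import rbd

      cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
      cluster.connect()
      ioctx = cluster.open_ioctx('rbd')                 # assumes the default 'rbd' pool

      rbd_inst = rbd.RBD()
      rbd_inst.create(ioctx, 'golden', 10 * 1024**3,    # 10 GiB base image
                      old_format=False,                 # format 2 images...
                      features=rbd.RBD_FEATURE_LAYERING)  # ...support clones

      with rbd.Image(ioctx, 'golden') as image:
          image.create_snap('base')       # snapshot the image
          image.protect_snap('base')      # clones require a protected snapshot

      # Copy-on-write clone: 'vm-0001' shares unmodified data with 'golden'.
      rbd_inst.clone(ioctx, 'golden', 'base', ioctx, 'vm-0001')

      ioctx.close()
      cluster.shutdown()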
  • 21 RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOSGW A bucket-based REST gateway, compatible with S3 and Swift APP APP HOST/VM CLIENT
  • 22 M M M CLIENT 01 10 data metadata
  • 23 Metadata Server • Manages metadata for a POSIX-compliant shared filesystem • Directory hierarchy • File metadata (owner, timestamps, mode, etc.) • Stores metadata in RADOS • Does not serve file data to clients • Only required for shared filesystem
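    For completeness, a rough sketch of reaching the same filesystem through the libcephfs Python bindings rather than the kernel or FUSE clients; metadata operations here are handled by an MDS, while file data would go to the OSDs directly. The config path and directory name are assumptions.

      import cephfs

      fs = cephfs.LibCephFS()
      fs.conf_read_file('/etc/ceph/ceph.conf')  # assumed config location
      fs.mount()                                # session with monitors and MDS

      fs.mkdir('/demo', 0o755)                  # directory creation goes to the MDS
      print(fs.stat('/demo'))                   # owner, timestamps, mode, ...

      fs.unmount()
      fs.shutdown()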
  • What Makes Ceph Unique? Part one: it never, ever remembers where it puts stuff. 24
  • 25 APP ?? DC DC DC DC DC DC DC DC DC DC DC DC
  • How Long Did It Take You To Find Your Keys This Morning? azmeen, Flickr / CC BY 2.0 26
  • 27 APP DC DC DC DC DC DC DC DC DC DC DC DC
  • Dear Diary: Today I Put My Keys on the Kitchen Counter Barnaby, Flickr / CC BY 2.0 28
  • 29 APP DC DC DC DC DC DC DC DC DC DC DC DC A-G H-N O-T U-Z F*
  • I Always Put My Keys on the Hook By the Door vitamindave, Flickr / CC BY 2.0 30
  • HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING? 31
  • The Answer: CRUSH!!!!! pasukaru76, Flickr / CC SA 2.0 32
  • 33 OBJECT 10 10 01 01 10 10 01 11 01 10 hash(object name) % num pg CRUSH(pg, cluster state, rule set)
  • 34 OBJECT 10 10 01 01 10 10 01 11 01 10
  • 35 CRUSH • Pseudo-random placement algorithm • Fast calculation, no lookup • Repeatable, deterministic • Statistically uniform distribution • Stable mapping • Limited data migration on change • Rule-based configuration • Infrastructure topology aware • Adjustable replication • Weighting
  • 36 CLIENT ??
  • 37 NAME:"foo" POOL:"bar" 0101 1111 1001 0011 1010 1101 0011 1011 "bar" = 3 hash("foo") % 256 = 0x23 OBJECT PLACEMENT GROUP 24 3 12 CRUSH TARGET OSDs PLACEMENT GROUP 3.23 3.23
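    The hash("foo") % 256 = 0x23 step on this slide can be mimicked in a few lines of Python. This is only a toy to show the shape of the mapping: Ceph uses its own rjenkins-based hash rather than crc32, and CRUSH then maps the resulting placement group to OSDs using the cluster map and rule set.

      import zlib

      def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
          # hash(object name) % pg_num  ->  placement group within the pool
          pg = zlib.crc32(object_name.encode()) % pg_num
          return f"{pool_id}.{pg:x}"    # printed the way Ceph does, e.g. "3.23"

      print(object_to_pg(3, "foo", 256))
      # CRUSH(pg, cluster map, rule set) would then pick the target OSDs.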
  • 38
  • 39
  • 40 CLIENT ??
  • What Makes Ceph Unique? Part two: it has smart block devices for all those impatient, selfish VMs. 41
  • LIBRADOS 42 M M M VM LIBRBD HYPERVISOR
  • HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY? 43
  • 144 44 0 0 0 0 instant copy = 144
  • 4 144 45 CLIENT write write write = 148 write
  • 4 144 46 CLIENT read read read = 148
  • What Makes Ceph Unique? Part three: it has powerful friends, ones you probably already know. 47
  • 48 M M M APACHE CLOUDSTACK HYPER- VISOR PRIMARY STORAGE POOL SECONDARY STORAGE POOL snapshots templates images
  • 49 M M M OPENSTACK KEYSTONE API SWIFT API CINDER API GLANCE API NOVA API HYPER- VISOR RADOSGW
  • What Makes Ceph Unique? Part four: clustered metadata 50
  • 51 M M M CLIENT 01 10
  • 52 M M M
  • 53 one tree three metadata servers ??
  • 54
  • 55
  • 56
  • 57
  • 58 DYNAMIC SUBTREE PARTITIONING
  • Now What? 59
  • 60 8 years & 26,000 commits
  • Getting Started With Ceph Read about the latest version of Ceph. • The latest stuff is always at http://ceph.com/get Deploy a test cluster using ceph-deploy. • Read the quick-start guide at http://ceph.com/qsg Deploy a test cluster on the AWS free-tier using Juju. • Read the guide at http://ceph.com/juju Read the rest of the docs! • Find docs for the latest release at http://ceph.com/docs 61 Have a working cluster up quickly.
  • Getting Involved With Ceph Most project discussion happens on the mailing list. • Join or view archives at http://ceph.com/list IRC is a great place to get help (or help others!) • Find details and historical logs at http://ceph.com/irc The tracker manages our bugs and feature requests. • Register and start looking around at http://ceph.com/tracker Doc updates and suggestions are always welcome. • Learn how to contribute docs at http://ceph.com/docwriting 62 Help build the best storage system around!
  • Questions? 63 Ross Turk VP Community, Inktank ross@inktank.com @rossturk inktank.com | ceph.com