KEEPING OPENSTACK STORAGE TRENDY
WITH CEPH AND CONTAINERS
SAGE WEIL, HAOMAI WANG
OPENSTACK SUMMIT - 2015.05.20
2
AGENDA
● Motivation
● Block
● File
● Container orchestration
● Summary
MOTIVATION
4
(Diagram: a web application spread across several app servers, drawing on cloud storage)
A CLOUD SMORGASBORD
● Compelling clouds offer options
● Compute
– VM (KVM, Xen, …)
– Containers (lxc, Docker, OpenVZ, ...)
● Storage
– Block (virtual disk)
– File (shared)
– Object (RESTful, …)
– Key/value
– NoSQL
– SQL
5
WHY CONTAINERS?
Technology
● Performance
– Shared kernel
– Faster boot
– Lower baseline overhead
– Better resource sharing
● Storage
– Shared kernel → efficient IO
– Small image → efficient deployment
Ecosystem
● Emerging container host OSs
– Atomic – http://projectatomic.io
● os-tree (s/rpm/git/)
– CoreOS
● systemd + etcd + fleet
– Snappy Ubuntu
● New app provisioning model
– Small, single-service containers
– Standalone execution environment
● New open container application spec: Nulecule
– https://github.com/projectatomic/nulecule
6
WHY NOT CONTAINERS?
Technology
● Security
– Shared kernel
– Limited isolation
● OS flexibility
– Shared kernel limits OS choices
● Inertia
Ecosystem
● New models don't capture many
legacy services
7
WHY CEPH?
● All components scale horizontally
● No single point of failure
● Hardware agnostic, commodity hardware
● Self-manage whenever possible
● Open source (LGPL)
● Move beyond legacy approaches
– client/cluster instead of client/server
– avoid ad hoc HA
8
CEPH COMPONENTS
RGW: a web services gateway for object storage, compatible with S3 and Swift
LIBRADOS: a library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD: a reliable, fully-distributed block device with cloud platform integration
CEPHFS: a distributed file system with POSIX semantics and scale-out metadata management
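As a quick feel for the layers above, a sketch (pool, object, and image names are made up, and a working cluster is assumed):

    # RADOS: store an object directly in a pool
    rados -p rbd put greeting ./hello.txt
    # RBD: create and list block images in the same cluster
    rbd create --size 10240 rbd/myimage
    rbd ls rbd
    # CephFS: mount the file system with the kernel client
    sudo mount -t ceph 192.0.2.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # RGW: exposes the same cluster over S3/Swift-compatible HTTP, so any S3 client works against it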
BLOCK STORAGE
10
EXISTING BLOCK STORAGE MODEL
● VMs are the unit of cloud compute
● Block devices are the unit of VM storage
– ephemeral: not redundant, discarded when VM dies
– persistent volumes: durable, (re)attached to any VM
● Block devices are single-user
● For shared storage,
– use objects (e.g., Swift or S3)
– use a database (e.g., Trove)
– ...
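As a concrete illustration of the persistent-volume flow above, the usual CLI dance looks roughly like this (a sketch; instance/volume IDs and the device name are placeholders):

    # create a 10 GB persistent volume and attach it to a running instance
    cinder create 10 --display-name vol1
    nova volume-attach <instance-uuid> <volume-uuid> /dev/vdb
    # detach it and re-attach elsewhere; the data outlives any one VM
    nova volume-detach <instance-uuid> <volume-uuid>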
11
KVM + LIBRBD.SO
● Model
– Nova → libvirt → KVM → librbd.so
– Cinder → rbd.py → librbd.so
– Glance → rbd.py → librbd.so
● Pros
– proven
– decent performance
– good security
● Cons
– performance could be better
● Status
– most common deployment model
today (~44% in latest survey)
(Diagram: Nova and Cinder drive QEMU/KVM with librbd in the VM's host process, talking to the RADOS cluster and its monitors)
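For this path, the Ceph credentials have to be registered with libvirt so qemu/librbd can authenticate. A typical setup sketch (the UUID and the client.cinder user are placeholder assumptions):

    # register the cinder user's cephx key as a libvirt secret
    cat > secret.xml <<EOF
    <secret ephemeral='no' private='no'>
      <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
      <usage type='ceph'>
        <name>client.cinder secret</name>
      </usage>
    </secret>
    EOF
    sudo virsh secret-define --file secret.xml
    sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
        --base64 $(ceph auth get-key client.cinder)
    # cinder.conf's RBD backend then references the same pool, user, and secret UUID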
12
MULTIPLE CEPH DRIVERS
● librbd.so
– qemu-kvm
– rbd-fuse (experimental)
● rbd.ko (Linux kernel)
– /dev/rbd*
– stable and well-supported on modern kernels and distros
– some feature gap
● no client-side caching
● no “fancy striping”
– performance delta
● more efficient → more IOPS
● no client-side cache → higher latency for some workloads
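The difference between the two drivers is mostly where the RBD client runs; roughly (pool/image names are placeholders):

    # librbd.so: qemu opens the image itself, no block device appears on the host
    qemu-system-x86_64 -drive format=raw,file=rbd:rbd/myimage:id=admin ...
    # rbd.ko: the host maps the image to a kernel block device and hands that out
    sudo rbd map rbd/myimage --id admin     # -> /dev/rbd0
    sudo rbd unmap /dev/rbd0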
13
LXC + CEPH.KO
● The model
– libvirt-based lxc containers
– map kernel RBD on host
– pass the host device through libvirt into the container
● Pros
– fast and efficient
– implement existing Nova API
● Cons
– weaker security than VM
● Status
– lxc is maintained
– lxc is less widely used
– no prototype
(Diagram: Nova drives an lxc container on a Linux host; rbd.ko on the host maps the device from the RADOS cluster)
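A sketch of the host-side steps this model implies (image name, file system, and paths are assumptions):

    # map the image on the host and put a file system on it
    sudo rbd map rbd/vol1 --id admin        # -> /dev/rbd0
    sudo mkfs.xfs /dev/rbd0
    sudo mount /dev/rbd0 /var/lib/containers/vol1
    # libvirt's lxc driver can then expose the device, or bind the mounted
    # directory, into the container definition ("pass host device to libvirt")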
14
NOVA-DOCKER + CEPH.KO
● The model
– docker container as mini-host
– map kernel RBD on host
– pass RBD device to container, or
– mount RBD, bind dir to container
● Pros
– buzzword-compliant
– fast and efficient
● Cons
– different image format
– different app model
– only a subset of docker feature set
● Status
– no prototype
– nova-docker is out of tree (https://wiki.openstack.org/wiki/Docker)
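Outside of Nova, the two variants in the model above look roughly like this with plain Docker (image name and paths are placeholders):

    # variant A: pass the mapped RBD device straight into the container
    sudo rbd map rbd/vol1 --id admin                    # -> /dev/rbd0
    docker run --device /dev/rbd0:/dev/rbd0 myapp
    # variant B: mount it on the host and bind the directory into the container
    sudo mkfs.xfs /dev/rbd0 && sudo mount /dev/rbd0 /mnt/vol1
    docker run -v /mnt/vol1:/data myapp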
15
IRONIC + CEPH.KO
● The model
– bare metal provisioning
– map kernel RBD directly from guest image
● Pros
– fast and efficient
– traditional app deployment model
● Cons
– guest OS must support rbd.ko
– requires agent
– boot-from-volume tricky
● Status
– Cinder and Ironic integration is a hot topic at
summit
● 5:20p Wednesday (cinder)
– no prototype
● References
– https://wiki.openstack.org/wiki/Ironic/blueprints/cinder-integration
(Diagram: rbd.ko on the bare-metal Linux host maps the device directly from the RADOS cluster)
16
BLOCK - SUMMARY
● But
– block storage is same old boring
– volumes are only semi-elastic (grow, not shrink; tedious to resize)
– storage is not shared between guests

                     | performance | efficiency | VM | client cache | striping | same images? | exists
kvm + librbd.so      | best        | good       | X  | X            | X        | yes          | X
lxc + rbd.ko         | good        | best       |    |              |          | close        |
nova-docker + rbd.ko | good        | best       |    |              |          | no           |
ironic + rbd.ko      | good        | best       |    |              |          | close?       | planned!
FILE STORAGE
18
MANILA FILE STORAGE
● Manila manages file volumes
– create/delete, share/unshare
– tenant network connectivity
– snapshot management
● Why file storage?
– familiar POSIX semantics
– fully shared volume – many clients can mount and share data
– elastic storage – amount of data can grow/shrink without explicit
provisioning
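In CLI terms, the lifecycle Manila manages looks roughly like this (share name, size, protocol, and the tenant CIDR are placeholder assumptions):

    # create a 1 GB NFS share, grant a tenant subnet access, find where to mount it
    manila create NFS 1 --name share1
    manila access-allow share1 ip 10.0.0.0/24
    manila show share1        # the export_location field is what guests mount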
19
MANILA CAVEATS
● Last mile problem
– must connect storage to guest network
– somewhat limited options (focus on Neutron)
● Mount problem
– Manila makes it possible for guest to mount
– guest is responsible for actual mount
– ongoing discussion around a guest agent …
● Current baked-in assumptions about both of these
20
APPLIANCE DRIVERS
● Appliance drivers
– tell an appliance to export NFS to guests
– map appliance IP into tenant network
(Neutron)
– boring (closed, proprietary, expensive, etc.)
● Status
– several drivers from usual suspects
– security punted to vendor
21
GANESHA DRIVER
● Model
– service VM running nfs-ganesha server
– mount file system on storage network
– export NFS to tenant network
– map IP into tenant network
● Status
– in-tree, well-supported
(Diagram: Manila drives an nfs-ganesha service VM that exports NFS to the tenant network; the backing file system is left open)
22
KVM + GANESHA + LIBCEPHFS
● Model
– existing Ganesha driver, backed by
Ganesha's libcephfs FSAL
● Pros
– simple, existing model
– security
● Cons
– extra hop → higher latency
– service VM is a single point of failure
– service VM consumes resources
● Status
– Manila Ganesha driver exists
– untested with CephFS
(Diagram: Manila drives a Ganesha service VM whose libcephfs speaks native Ceph to the RADOS cluster; the guest mounts the export over NFS via nfs.ko)
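A minimal sketch of what the Ganesha side of this might look like; the export ID, paths, and config location are assumptions, and this is untested with the Manila driver:

    # /etc/ganesha/ganesha.conf: export CephFS through Ganesha's Ceph FSAL
    sudo tee -a /etc/ganesha/ganesha.conf <<EOF
    EXPORT {
        Export_Id = 100;
        Path = "/";
        Pseudo = "/cephfs";
        Access_Type = RW;
        FSAL { Name = CEPH; }
    }
    EOF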
23
KVM + CEPH.KO (CEPH-NATIVE)
● Model
– allow tenant access to storage network
– mount CephFS directly from tenant VM
● Pros
– best performance
– access to full CephFS feature set
– simple
● Cons
– guest must have modern distro/kernel
– exposes tenant to Ceph cluster
– must deliver mount secret to client
● Status
– no prototype
– CephFS isolation/security is work-in-progress
(Diagram: ceph.ko inside the tenant KVM guest speaks native Ceph directly to the RADOS cluster; Manila manages the shares)
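From inside the tenant VM this is just a kernel CephFS mount, assuming the monitor address, share path, and secret have been delivered somehow (all placeholders here):

    # guest-side mount; the secret file must reach the guest out of band
    sudo mount -t ceph 192.0.2.10:6789:/manila/<share-uuid> /mnt/share \
        -o name=manila,secretfile=/etc/ceph/manila.secret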
24
NETWORK-ONLY MODEL IS LIMITING
● Current assumption of NFS or CIFS sucks
● Always relying on guest mount support sucks
– mount -t ceph -o what?
● Even assuming storage connectivity is via the network sucks
● There are other options!
– KVM virtfs/9p
● fs pass-through to host
● 9p protocol
● virtio for fast data transfer
● upstream; not widely used
– NFS re-export from host
● mount and export fs on host
● private host/guest net
● avoid network hop from NFS service VM
– containers and 'mount --bind'
25
NOVA “ATTACH FS” API
● Mount problem is ongoing discussion by Manila team
– discussed this morning
– simple prototype using cloud-init
– Manila agent? leverage Zaqar tenant messaging service?
● A different proposal
– expand Nova to include “attach/detach file system” API
– analogous to current attach/detach volume for block
– each Nova driver may implement function differently
– “plumb” storage to tenant VM or container
● Open question
– Would API do the final “mount” step as well? (I say yes!)
26
KVM + VIRTFS/9P + CEPHFS.KO
● Model
– mount kernel CephFS on host
– pass-through to guest via virtfs/9p
● Pros
– security: tenant remains isolated from
storage net + locked inside a directory
● Cons
– require modern Linux guests
– 9p not supported on some distros
– “virtfs is ~50% slower than a native
mount?”
● Status
– Prototype from Haomai Wang
(Diagram: Nova drives the KVM guest; the host mounts CephFS via ceph.ko and exposes the share to the VM over virtfs/9p; host-to-cluster traffic is native Ceph)
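A rough sketch of the plumbing behind this model (paths, mount tag, and monitor address are placeholder assumptions):

    # host: mount CephFS, then export the share directory to the guest via virtfs
    sudo mount -t ceph 192.0.2.10:6789:/ /srv/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    qemu-system-x86_64 ... \
        -virtfs local,path=/srv/cephfs/manila/<share-uuid>,mount_tag=share0,security_model=passthrough
    # guest: mount the 9p tag over virtio
    sudo mount -t 9p -o trans=virtio,version=9p2000.L share0 /mnt/share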
27
KVM + NFS + CEPHFS.KO
● Model
– mount kernel CephFS on host
– pass-through to guest via NFS
● Pros
– security: tenant remains isolated
from storage net + locked inside a
directory
– NFS is more standard
● Cons
– NFS has weak caching consistency
– NFS is slower
● Status
– no prototype
(Diagram: the host mounts CephFS via ceph.ko and re-exports it to the KVM guest over NFS; host-to-cluster traffic is native Ceph)
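Sketch of the host-side re-export, assuming a private host/guest network (addresses and paths are placeholders):

    # host: mount the share from CephFS and export it over NFS to the guest net
    sudo mount -t ceph 192.0.2.10:6789:/manila/<share-uuid> /srv/share -o name=admin,secretfile=/etc/ceph/admin.secret
    echo "/srv/share 192.168.122.0/24(rw,no_subtree_check)" | sudo tee -a /etc/exports
    sudo exportfs -ra
    # guest: an ordinary NFS mount of the host's export
    sudo mount -t nfs 192.168.122.1:/srv/share /mnt/share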
28
(LXC, NOVA-DOCKER) + CEPHFS.KO
● Model
– host mounts CephFS directly
– mount --bind share into
container namespace
● Pros
– best performance
– full CephFS semantics
● Cons
– rely on container for security
● Status
– no prototype
(Diagram: the host mounts CephFS via ceph.ko and bind-mounts the share into the container; host-to-cluster traffic is native Ceph)
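Sketch of what the host would do for each container (paths are placeholder assumptions):

    # host: mount CephFS once, then bind only the tenant's share into the container
    sudo mount -t ceph 192.0.2.10:6789:/ /srv/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    sudo mount --bind /srv/cephfs/manila/<share-uuid> /var/lib/lxc/guest1/rootfs/mnt/share
    # with nova-docker the equivalent is a volume:
    # docker run -v /srv/cephfs/manila/<share-uuid>:/mnt/share ...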
29
IRONIC + CEPHFS.KO
● Model
– mount CephFS directly from bare
metal “guest”
● Pros
– best performance
– full feature set
● Cons
– rely on CephFS security
– networking?
– agent to do the mount?
● Status
– no prototype
– no suitable (ironic) agent (yet)
(Diagram: ceph.ko on the bare-metal "guest" speaks native Ceph directly to the RADOS cluster; Manila manages the shares)
30
THE MOUNT PROBLEM
● Containers may break the current 'network fs' assumption
– mounting becomes driver-dependent; harder for tenant to do the right thing
● Nova “attach fs” API could provide the needed entry point
– KVM: qemu-guest-agent
– Ironic: no guest agent yet...
– containers (lxc, nova-docker): use mount --bind from host
● Or, make tenant do the final mount?
– Manila API to provide command (template) to perform the mount
● e.g., “mount -t ceph $cephmonip:/manila/$uuid $PATH -o ...”
– Nova lxc and docker
● bind share to a “dummy” device /dev/manila/$uuid
● API mount command is 'mount --bind /dev/manila/$uuid $PATH'
31
SECURITY: NO FREE LUNCH
● (KVM, Ironic) + ceph.ko
– access to storage network relies on Ceph security
● KVM + (virtfs/9p, NFS) + ceph.ko
– better security, but
– pass-through/proxy limits performance
● (by how much?)
● Containers
– security (vs a VM) is weak at baseline, but
– host performs the mount; tenant locked into their share directory
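CephX capabilities give at least pool-level scoping today; a sketch of handing a tenant a restricted identity (names are placeholders, and per-directory CephFS restrictions are the work-in-progress part):

    # create a key that can only read the monitors and touch one data pool
    ceph auth get-or-create client.tenant42 \
        mon 'allow r' \
        osd 'allow rw pool=manila-data'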
32
PERFORMANCE
● 2 nodes
– Intel E5-2660
– 96GB RAM
– 10GbE NIC
● Server
– 3 OSD (Intel S3500)
– 1 MON
– 1 MDS
● Client VMs
– 4 cores
– 2GB RAM
● iozone, file size 2x available RAM
● CephFS native
– VM ceph.ko → server
● CephFS 9p/virtfs
– VM 9p → host ceph.ko → server
● CephFS NFS
– VM NFS → server ceph.ko → server
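The slides don't give the exact iozone invocation; a run matching the description (sequential and random tests, file size twice the VM's 2GB RAM) would look something like:

    # write/rewrite, read/reread, and random read/write; 4k records, 4 GB file
    iozone -i 0 -i 1 -i 2 -r 4k -s 4g -f /mnt/share/iozone.tmp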
33
SEQUENTIAL (iozone results chart)
34
RANDOM (iozone results chart)
35
SUMMARY MATRIX

                          | performance | consistency | VM | gateway | net hops | security | agent | mount agent | prototype
kvm + ganesha + libcephfs | slower (?)  | weak (nfs)  | X  | X       | 2        | host     |       | X           | X
kvm + virtfs + ceph.ko    | good        | good        | X  | X       | 1        | host     |       | X           | X
kvm + nfs + ceph.ko       | good        | weak (nfs)  | X  | X       | 1        | host     |       | X           |
kvm + ceph.ko             | better      | best        | X  |         | 1        | ceph     |       | X           |
lxc + ceph.ko             | best        | best        |    |         | 1        | ceph     |       |             |
nova-docker + ceph.ko     | best        | best        |    |         | 1        | ceph     |       |             | (see IBM talk, Thurs 9am)
ironic + ceph.ko          | best        | best        |    |         | 1        | ceph     | X     | X           |
CONTAINER ORCHESTRATION
37
CONTAINERS ARE DIFFERENT
● nova-docker implements a Nova view of a (Docker) container
– treats container like a standalone system
– does not leverage most of what Docker has to offer
– Nova == IaaS abstraction
● Kubernetes is the new hotness
– higher-level orchestration for containers
– draws on years of Google experience running containers at scale
– vibrant open source community
38
KUBERNETES SHARED STORAGE
● Pure Kubernetes – no OpenStack
● Volume drivers
– Local
● hostPath, emptyDir
– Unshared
● iSCSI, GCEPersistentDisk, Amazon EBS, Ceph RBD – local fs on top of existing device
– Shared
● NFS, GlusterFS, Amazon EFS, CephFS
● Status
– Ceph drivers under review
● Finalizing model for secret storage, cluster parameters (e.g., mon IPs)
– Drivers expect pre-existing volumes
● volumes are recycled; missing a REST API to create/destroy volumes
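As a sketch of how a pod would consume such a share: the Ceph plugins were still under review at this point, so treat the field names below as illustrative of the shape being discussed rather than a final API (monitor address, secret name, and image are placeholders):

    cat <<EOF | kubectl create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: cephfs-test
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: share
          mountPath: /usr/share/nginx/html
      volumes:
      - name: share
        cephfs:
          monitors:
          - 192.0.2.10:6789
          user: admin
          secretRef:
            name: ceph-secret
    EOF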
39
KUBERNETES ON OPENSTACK
● Provision Nova VMs
– KVM or ironic
– Atomic or CoreOS
● Kubernetes per tenant
● Provision storage devices
– Cinder for volumes
– Manila for shares
● Kubernetes binds into pod/container
● Status
– Prototype Cinder plugin for Kubernetes: https://github.com/spothanis/kubernetes/tree/cinder-vol-plugin
(Diagram: Nova provisions KVM instances running a Kube master with a volume controller and Kube nodes hosting nginx and mysql pods; Cinder and Manila provide the underlying volumes and shares)
40
WHAT NEXT?
● Ironic agent
– enable Cinder (and Manila?) on bare metal
– Cinder + Ironic
● 5:20p Wednesday (Cinder)
● Expand breadth of Manila drivers
– virtfs/9p, ceph-native, NFS proxy via host, etc.
– the last mile is not always the tenant network!
● Nova “attach fs” API (or equivalent)
– simplify tenant experience
– paper over VM vs container vs bare metal differences
THANK YOU!
Sage Weil
CEPH PRINCIPAL ARCHITECT
Haomai Wang
FREE AGENT
sage@redhat.com
haomaiwang@gmail.com
@liewegas
42
FOR MORE INFORMATION
● http://ceph.com
● http://github.com/ceph
● http://tracker.ceph.com
● Mailing lists
– ceph-users@ceph.com
– ceph-devel@vger.kernel.org
● irc.oftc.net
– #ceph
– #ceph-devel
● Twitter
– @ceph
