Presentation held at the "Deutsche OpenStack Tage 2015" in Frankfurt, Germany, about Ceph security in an OpenStack cloud.

Ceph in a security critical OpenStack cloud
Danny Al-Gaaf (Deutsche Telekom)
Deutsche OpenStack Tage 2015 - Frankfurt
Overview
● Ceph and OpenStack
● Secure NFV cloud at DT
● Attack surface
● Proactive countermeasures
  ○ Setup
  ○ Vulnerability prevention
  ○ Breach mitigation
● Reactive countermeasures
  ○ 0-days, CVEs
  ○ Security support SLA and lifecycle
● Conclusions
Ceph and OpenStack
Ceph Architecture
Ceph and OpenStack
Secure NFV Cloud @ DT
NFV Cloud @ Deutsche Telekom
● Datacenter design
  ○ BDCs
    ■ few, but classic DCs
    ■ high SLAs for infrastructure and services
    ■ for private/customer data and services
  ○ FDCs
    ■ small, but many
    ■ close to the customer
    ■ lower SLAs, can fail at any time
    ■ NFVs:
      ● spread over many FDCs
      ● failures are handled by the services, not the infrastructure
● Run telco core services on OpenStack/KVM/Ceph
Fundamentals - The CIA Triad
● CONFIDENTIALITY: protecting sensitive data against unauthorized access
● INTEGRITY: maintaining consistency, accuracy, and trustworthiness of data
● AVAILABILITY: protecting systems against disruption of services and of access to information
High Security Requirements
● Multiple security placement zones (PZ)
  ○ e.g. EHD, DMZ, MZ, SEC, Management
  ○ TelcoWG “Security Segregation” use case
● Separation between PZs required for:
  ○ compute
  ○ networks
  ○ storage
● Protect against many attack vectors
● Enforced and reviewed by the security department
Solutions for storage separation
● Physical separation
  ○ Large number of clusters (>100)
  ○ Large hardware demand (compute and storage)
  ○ High maintenance effort
  ○ Less flexibility
● RADOS pool separation
  ○ Much more flexible
  ○ Efficient use of hardware
● Question:
  ○ Can we get the same security as physical separation?
Separation through Placement Zones
● One RADOS pool for each security zone
  ○ Limit access using Ceph capabilities (see the sketch after this list)
● OpenStack AZs as PZs
  ○ Cinder
    ■ Configure one backend/volume type per pool (with its own key)
    ■ Need to map between AZs and volume types via policy
  ○ Glance
    ■ Lacks separation between control and compute/storage layer
    ■ Separate read-only vs. management endpoints
  ○ Manila
    ■ Currently not planned for production use with CephFS
    ■ May use RBD via NFS
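To make the capability point concrete, here is a minimal sketch (not from the deck) using the rados Python bindings: a per-zone Cinder key that only carries `osd 'allow rwx pool=volumes-az1'` can write into its own pool but not into the pool of another placement zone. All names (volumes-az1, volumes-az2, client.cinder-az1, keyring path) are illustrative assumptions.

```python
import rados

# Illustrative names - the real pool/key layout is deployment specific.
ZONE_KEY = 'client.cinder-az1'   # assumed caps: mon 'allow r', osd 'allow rwx pool=volumes-az1'
OWN_POOL = 'volumes-az1'
FOREIGN_POOL = 'volumes-az2'     # pool belonging to another placement zone

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                      name=ZONE_KEY,
                      conf={'keyring': '/etc/ceph/ceph.client.cinder-az1.keyring'})
cluster.connect()

# Writing into the zone's own pool is permitted by the key's capabilities.
ioctx = cluster.open_ioctx(OWN_POOL)
ioctx.write_full('cap-smoke-test', b'ok')
ioctx.close()

# The same key must not be able to touch another zone's pool; depending on
# the caps granted, either the pool lookup or the write itself is refused.
try:
    ioctx = cluster.open_ioctx(FOREIGN_POOL)
    ioctx.write_full('cap-smoke-test', b'should not succeed')
    ioctx.close()
    print('WARNING: cross-zone write succeeded - check the key capabilities!')
except rados.Error as exc:
    print('cross-zone access rejected as expected:', exc)

cluster.shutdown()
```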
Attack Surface
RadosGW attack surface
● S3/Swift
  ○ Network access to the gateway only
  ○ No direct access for the consumer to other Ceph daemons
● Single API attack surface
RBD librbd attack surface
● Protection through the hypervisor block layer
  ○ transparent for the guest
  ○ No network access or CephX keys needed at guest level
● Issue:
  ○ the hypervisor is software and therefore not 100% secure…
    ■ breakouts are no myth
    ■ e.g., Virtunoid, SYSENTER, Venom!
RBD.ko attack surface
● RBD kernel module
  ○ e.g. used with Xen or on bare metal
  ○ Requires direct access to the Ceph public network
  ○ Requires CephX keys/secret at guest level
● Issue:
  ○ no separation between cluster and guest
CephFS attack surface
● Pure CephFS tears a big hole in hypervisor separation
  ○ Requires direct access to the Ceph public network
  ○ Requires CephX keys/secret at guest level
  ○ Complete file system visible to the guest
    ■ Separation currently only via POSIX user/group
Host attack surface
● If KVM is compromised, the attacker ...
  ○ has access to neighbor VMs
  ○ has access to local Ceph keys
  ○ has access to the Ceph public network and Ceph daemons
● Firewalls, deep packet inspection (DPI), ...
  ○ partly impractical due to the protocols used
  ○ implications for performance and cost
● Bottom line: Ceph daemons must resist attack
  ○ C/C++ is harder to secure than e.g. Python
  ○ Homogeneous: if one daemon is vulnerable, all in the cluster are!
Network attack surface
● Sessions are authenticated
  ○ Attacker cannot impersonate clients or servers
  ○ Attacker cannot mount man-in-the-middle attacks
● Client/cluster sessions are not encrypted
  ○ A sniffer can recover any data read or written
Denial of Service
● Attack against:
  ○ Ceph cluster:
    ■ Submit many / large / expensive IOs
    ■ Open many connections
    ■ Use flaws to crash Ceph daemons
    ■ Identify non-obvious but expensive features of the client/OSD interface
  ○ Ceph cluster hosts:
    ■ Crash complete cluster hosts, e.g. through flaws in the kernel network layer
  ○ VMs on the same host:
    ■ Saturate the network bandwidth of the host
Proactive Countermeasures
Deployment and Setup
● Network
  ○ Always use separated cluster and public networks (see the sketch below)
  ○ Always separate your control nodes from other networks
  ○ Don’t expose the cluster to the open internet
  ○ Encrypt inter-datacenter traffic
● Avoid hyper-converged infrastructure
  ○ Don’t mix
    ■ compute and storage resources - isolate them!
    ■ OpenStack and Ceph control nodes
  ○ Scale resources independently
  ○ Risk mitigation if daemons are compromised or DoS’d
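As a small aid for the first network bullet, a sketch (my own, not part of the deck) that checks a node's ceph.conf for properly separated public and cluster networks. It relies only on the standard `public network` / `cluster network` options; the file path and warning texts are placeholders.

```python
import configparser
import ipaddress

# ceph.conf is INI-style, so configparser can read it well enough for this check.
cfg = configparser.ConfigParser(strict=False)
cfg.read('/etc/ceph/ceph.conf')

def first_network(option):
    # Ceph accepts both "public network" and "public_network" spellings.
    for key in (option, option.replace(' ', '_')):
        if cfg.has_option('global', key):
            value = cfg.get('global', key).split(',')[0].strip()
            return ipaddress.ip_network(value, strict=False)
    return None

public = first_network('public network')
cluster = first_network('cluster network')

if public is None or cluster is None:
    print('WARNING: public and/or cluster network not configured - traffic is not separated')
elif public.overlaps(cluster):
    print('WARNING: public and cluster networks overlap:', public, cluster)
else:
    print('OK: public', public, 'and cluster', cluster, 'are separate networks')
```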
Deploying RadosGW
● Big and easy target through the HTTP(S) protocol
● Small appliance per tenant with
  ○ Separate network
  ○ SSL-terminating proxy forwarding requests to radosgw
  ○ WAF (mod_security) to filter requests
  ○ Placed in a secure/managed zone
  ○ A different type of web server than the one used by RadosGW
● Don’t share buckets/users between tenants
Ceph security: CephX
● Monitors are trusted key servers
  ○ Store copies of all entity keys
  ○ Each key has an associated “capability”
    ■ Plaintext description of what the key user is allowed to do
● What you get
  ○ Mutual authentication of client + server (see the sketch below)
  ○ Extensible authorization w/ “capabilities”
  ○ Protection from man-in-the-middle, TCP session hijacking
● What you don’t get
  ○ Secrecy (encryption over the wire)
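For completeness, a short sketch of how a client insists on CephX for every session via the standard auth_* options; the entity name and keyring path are illustrative, and this is only meant to show which knobs the mutual authentication relies on.

```python
import rados

# Require CephX everywhere: between daemons, from daemons to clients,
# and from clients to daemons. Entity name and keyring are examples.
cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    name='client.glance',
    conf={
        'keyring': '/etc/ceph/ceph.client.glance.keyring',
        'auth_cluster_required': 'cephx',
        'auth_service_required': 'cephx',
        'auth_client_required': 'cephx',
    })
cluster.connect()
print('authenticated session established, cluster fsid:', cluster.get_fsid())
cluster.shutdown()
```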
Ceph security: CephX take-aways
● Monitors must be secured
  ○ Protect the key database
● Key management is important
  ○ Separate key for each Cinder backend/AZ (see the sketch below)
  ○ Restrict the capabilities associated with each key
  ○ Limit administrators’ power
    ■ use ‘allow profile admin’ and ‘allow profile readonly’
    ■ restrict role-definer or ‘allow *’ keys
  ○ Careful key distribution (Ceph and OpenStack nodes)
● To do:
  ○ Thorough CephX code review by security experts
  ○ Audit OpenStack deployment tools’ key distribution
  ○ Improve security documentation
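A hedged sketch of the "separate, capability-restricted key per Cinder backend/AZ" take-away, driving the standard `ceph auth get-or-create` command from Python. The entity and pool names are invented for the example; the capability strings follow the usual CephX syntax from the Ceph documentation.

```python
import subprocess

# One capability-limited key per Cinder backend / availability zone.
# Entity and pool names are illustrative only.
BACKENDS = {
    'client.cinder-az1': 'volumes-az1',
    'client.cinder-az2': 'volumes-az2',
}

for entity, pool in BACKENDS.items():
    # Read-only monitor access plus read/write/execute on exactly one pool.
    keyring_entry = subprocess.check_output([
        'ceph', 'auth', 'get-or-create', entity,
        'mon', 'allow r',
        'osd', 'allow rwx pool={}'.format(pool),
    ])
    # Distribute the resulting keyring entry only to the nodes that
    # actually serve this backend - never to every compute node.
    print(keyring_entry.decode())
```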
Preventing Breaches - Defects
● Static Code Analysis (SCA)
  ○ Buffer overflows and other code flaws
  ○ Regular Coverity scans
    ■ 996 fixed, 284 dismissed; 420 outstanding
    ■ defect density 0.97
  ○ cppcheck
  ○ LLVM: clang/scan-build
● Runtime analysis
  ○ valgrind memcheck
● Plan
  ○ Reduce backlog of low-priority issues (e.g., issues in test code)
  ○ Automated reporting of new SCA issues on pull requests (see the sketch below)
  ○ Improve code reviewer awareness of security defects
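The "automated reporting of new SCA issues on pull requests" item could be approximated with something like the following sketch, which runs cppcheck over the C/C++ files a branch touches and exits non-zero if anything is reported. Base branch, file globs and cppcheck options are assumptions, not the project's actual CI configuration.

```python
import subprocess
import sys

BASE = 'origin/master'   # placeholder for the pull request's base branch

# C/C++ sources changed by the branch under test.
changed = subprocess.check_output([
    'git', 'diff', '--name-only', BASE, '--',
    '*.c', '*.cc', '*.cpp', '*.h', '*.hpp',
]).decode().splitlines()

if not changed:
    print('no C/C++ changes, nothing to scan')
    sys.exit(0)

# '--error-exitcode=1' turns any cppcheck finding into a failed check.
result = subprocess.run(['cppcheck', '--enable=warning,performance,portability',
                         '--inconclusive', '--error-exitcode=1'] + changed)
sys.exit(result.returncode)
```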
Preventing Breaches - Hardening
● Pen-testing
  ○ human attempt to subvert security, generally guided by code review
● Fuzz testing
  ○ computer attempt to subvert or crash, by feeding garbage input
● Hardened build flags
  ○ -fpie -fpic
  ○ -fstack-protector-strong
  ○ -Wl,-z,relro,-z,now
  ○ -D_FORTIFY_SOURCE=2 -O2 (?)
  ○ Check for performance regressions!
Mitigating Breaches
● Run non-root daemons (WIP: PR #4456)
  ○ Prevent escalating privileges to get root
  ○ Run as ‘ceph’ user and group
  ○ Pending for Infernalis
● MAC (Mandatory Access Control)
  ○ SELinux / AppArmor
  ○ Profiles for daemons and tools planned for Infernalis
● Run (some) daemons in VMs or containers
  ○ Monitor and RGW - less resource intensive
  ○ MDS - maybe
  ○ OSD - prefers direct access to hardware
● Separate MON admin network
Encryption: Data at Rest
● Encryption at application vs. cluster level
● Some deployment tools support dm-crypt (see the sketch below)
  ○ Encrypt the raw block device (OSD and journal)
  ○ Allows disks to be safely discarded if the key remains secret
● Key management is still very simple
  ○ Encryption key stored on disk via LUKS
  ○ LUKS key stored in /etc/ceph/keys
● Plan
  ○ Petera, a new key escrow project from Red Hat
    ■ https://github.com/npmccallum/petera
  ○ Alternative: simple key management via the monitor (CDS blueprint)
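For the dm-crypt bullet, a minimal sketch of how an encrypted OSD was typically prepared at the time, wrapped in Python for consistency with the other examples. The device path is a placeholder, and the key directory shown is ceph-disk's default rather than the /etc/ceph/keys location mentioned on the slide.

```python
import subprocess

OSD_DEVICE = '/dev/sdb'   # placeholder; run once per OSD data disk

# 'ceph-disk prepare --dmcrypt' sets up dm-crypt for the OSD data and
# journal partitions. The unlock key material lives in the key directory
# on the host, so a pulled or discarded disk alone stays unreadable as
# long as that directory (and its backups) are protected.
subprocess.check_call([
    'ceph-disk', 'prepare',
    '--dmcrypt',
    '--dmcrypt-key-dir', '/etc/ceph/dmcrypt-keys',
    OSD_DEVICE,
])
```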
Encryption: On Wire
● Goal
  ○ Protect data from someone listening in on the network
  ○ Protect administrator sessions configuring client keys
● Plan
  ○ Generate per-session keys based on existing tickets
  ○ Selectively encrypt monitor administrator sessions
  ○ Alternative: make use of IPSec (performance and management implications)
Denial of Service attacks
● Limit load from the client
  ○ Use the qemu IO throttling features - set a safe upper bound (see the sketch below)
● To do:
  ○ Limit max open sockets per OSD
  ○ Limit max open sockets per source IP
    ■ handle in Ceph or in the network layer?
  ○ Throttle operations per session or per client (vs. just globally)?
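One concrete way to "set a safe upper bound" per guest disk is libvirt's block I/O tuning, shown here as a sketch driving virsh from Python; domain name, device and limits are illustrative. In an OpenStack deployment the same limits would usually be applied through Nova flavor extra specs instead of by hand.

```python
import subprocess

DOMAIN = 'instance-00000042'   # illustrative libvirt domain name
DEVICE = 'vda'                 # guest block device backed by RBD

# Cap both bandwidth and IOPS so a single guest cannot saturate the
# Ceph cluster or the host's network; the numbers are only examples.
subprocess.check_call([
    'virsh', 'blkdeviotune', DOMAIN, DEVICE,
    '--total-bytes-sec', str(200 * 1024 * 1024),   # ~200 MB/s
    '--total-iops-sec', '2000',
    '--live', '--config',
])
```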
CephFS
● No standard virtualization layer (unlike block)
  ○ Filesystem passthrough (9p/virtfs) to the host
  ○ Proxy through a gateway (NFS?)
  ○ Allow direct access from the tenant VM (least secure)
● Granularity of access control is harder
  ○ No simple mapping to RADOS objects
● Work in progress
  ○ root_squash (Infernalis blueprint)
  ○ Restrict mount to a subtree
  ○ Restrict mount to a user
Reactive Countermeasures
Reactive Security Process
● Community
  ○ Single point of contact: security@ceph.com
    ■ Core development team
    ■ Red Hat, SUSE, Canonical security teams
  ○ Security-related fixes are prioritized and backported
  ○ Releases may be accelerated on an ad hoc basis
  ○ Security advisories go to ceph-announce@ceph.com
● Red Hat Ceph
  ○ Strict SLA on issues raised with the Red Hat security team
  ○ Escalation process to Ceph developers
  ○ Red Hat security team drives the CVE process
  ○ Hot fixes distributed via Red Hat’s CDN
Detecting and Preventing Breaches
● Brute force attacks
  ○ Good logging of any failed authentication
  ○ Monitoring is easy via existing tools, e.g. Nagios (see the sketch below)
● To do:
  ○ Automatic blacklisting of IPs/clients after n failed attempts at the Ceph level (Jewel blueprint)
● Unauthorized injection of keys
  ○ Monitor the audit log
    ■ trigger alerts for auth events -> monitoring
  ○ Periodic comparison with a signed backup of the auth database?
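A toy sketch of the brute-force monitoring idea: count failed CephX authentications per source IP in a monitor/audit log and print an alert that a Nagios-style check could pick up. The log path and the regular expression are assumptions; real log wording differs between Ceph versions, so the pattern would need to be adapted.

```python
import re
from collections import Counter

LOG_FILE = '/var/log/ceph/ceph.audit.log'   # placeholder path
THRESHOLD = 10                              # failed attempts before alerting

# Assumed wording: a line that mentions a cephx authentication failure
# together with the client's ip:port. Adjust to the actual log format.
pattern = re.compile(
    r'(\d{1,3}(?:\.\d{1,3}){3}):\d+.*cephx.*(?:fail|denied)', re.IGNORECASE)

failures = Counter()
with open(LOG_FILE) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            failures[match.group(1)] += 1

for source_ip, count in failures.most_common():
    if count >= THRESHOLD:
        print('ALERT: {} failed authentications from {}'.format(count, source_ip))
```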
Conclusions
Summary
● Reactive processes are in place
  ○ security@ceph.com, CVEs, downstream product updates, etc.
● Proactive measures in progress
  ○ Code quality improvements (SCA, etc.)
  ○ Unprivileged daemons
  ○ MAC (SELinux, AppArmor)
  ○ Encryption
● Progress on defining security best practices
  ○ Document best practices for security
● An ongoing process
Get involved!
● Ceph
  ○ https://ceph.com/community/contribute/
  ○ ceph-devel@vger.kernel.org
  ○ IRC: OFTC
    ■ #ceph
    ■ #ceph-devel
  ○ Ceph Developer Summit
● OpenStack
  ○ Telco Working Group
    ■ #openstack-nfv
  ○ Cinder, Glance, Manila, ...
THANK YOU!
Danny Al-Gaaf
Senior Cloud Technologist
danny.al-gaaf@telekom.de
IRC: dalgaaf
linkedin.com/in/dalgaaf