DOST: Ceph in a security critical OpenStack cloud

Presentation held at the Deutsche OpenStack Tage 2015 in Frankfurt, Germany, about Ceph security in an OpenStack cloud.

  1. 1. Ceph in a security critical OpenStack cloud Danny Al-Gaaf (Deutsche Telekom) Deutsche OpenStack Tage 2015 - Frankfurt
  2. 2. ● Ceph and OpenStack ● Secure NFV cloud at DT ● Attack surface ● Proactive countermeasures ○ Setup ○ Vulnerability prevention ○ Breach mitigation ● Reactive countermeasures ○ 0-days, CVEs ○ Security support SLA and lifecycle ● Conclusions Overview 2
  3. 3. Ceph and OpenStack
  4. 4. Ceph Architecture 4
  5. 5. Ceph and OpenStack 5
  6. 6. Secure NFV Cloud @ DT
  7. 7. NFV Cloud @ Deutsche Telekom ● Datacenter design ○ BDCs (backend data centers) ■ few but classic DCs ■ high SLAs for infrastructure and services ■ for private/customer data and services ○ FDCs (frontend data centers) ■ small but many ■ close to the customer ■ lower SLAs, can fail at any time ■ NFVs: ● spread over many FDCs ● failures are handled by the services, not the infrastructure ● Run telco core services @ OpenStack/KVM/Ceph 7
  8. 8. Fundamentals - The CIA Triad 8 CONFIDENTIALITY INTEGRITY AVAILABILITY Protecting sensitive data against unauthorized access Maintaining consistency, accuracy, and trustworthiness of data Protecting systems against disruption of services and keeping information available
  9. 9. High Security Requirements ● Multiple security placement zones (PZ) ○ e.g. EHD, DMZ, MZ, SEC, Management ○ TelcoWG “Security Segregation” use case ● Separation between PZs required for: ○ compute ○ networks ○ storage ● Protect against many attack vectors ● Enforced and reviewed by security department 9
  10. 10. Solutions for storage separation ● Physical separation ○ Large number of clusters (>100) ○ Large hardware demand (compute and storage) ○ High maintenance effort ○ Less flexibility ● RADOS pool separation ○ Much more flexible ○ Efficient use of hardware ● Question: ○ Can we get the same security as physical separation? 10
  11. 11. Separation through Placement Zones ● One RADOS pool for each security zone ○ Limit access using Ceph capabilities ● OpenStack AZs as PZs ○ Cinder ■ Configure one backend/volume type per pool (with own key) ■ Need to map between AZs and volume types via policy ○ Glance ■ Lacks separation between control and compute/storage layer ■ Separate read-only vs management endpoints ○ Manila ■ Currently not planned to use in production with CephFS ■ May use RBD via NFS 11
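
To make the pool-per-zone idea concrete, here is a minimal sketch (not from the slides) of what the separation looks like from a client's point of view, using the python-rados bindings. The pool names, client name, and keyring path are hypothetical examples; the assumption is that the key's capabilities only cover the zone's own pool.

```python
# Sketch: a CephX identity whose OSD capability is restricted to one
# placement zone's pool, e.g. created with caps like:
#   mon 'allow r'  osd 'allow rwx pool=volumes-sec'
# All names below are hypothetical examples.
import rados

cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    name='client.cinder-sec',   # per-zone Cinder identity
    conf={'keyring': '/etc/ceph/ceph.client.cinder-sec.keyring'},
)
cluster.connect()

# Access to the zone's own pool works ...
ioctx = cluster.open_ioctx('volumes-sec')
ioctx.write_full('canary', b'zone-local data')
print(ioctx.read('canary'))
ioctx.close()

# ... while an attempt to touch another zone's pool is rejected by the
# cluster, because the key's capabilities do not cover that pool.
try:
    other = cluster.open_ioctx('volumes-dmz')
    other.write_full('canary', b'cross-zone write')
except rados.Error as err:
    print('cross-zone access denied as expected:', err)
finally:
    cluster.shutdown()
```
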
  12. 12. Attack Surface
  13. 13. RadosGW attack surface ● S3/Swift ○ Network access to gateway only ○ No direct access for consumer to other Ceph daemons ● Single API attack surface 13
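
To illustrate how narrow that surface is from the tenant side, the sketch below talks to a RadosGW S3 endpoint purely over HTTP(S); boto3 is just one possible S3 client, and the endpoint URL, bucket name, and credentials are placeholders. Nothing in this path needs CephX keys or network reachability of MONs/OSDs.

```python
# Sketch: tenant-side view of RadosGW - only the S3 (or Swift) HTTP API
# is exposed. Endpoint, bucket and credentials are placeholders.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='https://rgw.example.net',    # SSL-terminating proxy in front of radosgw
    aws_access_key_id='TENANT_ACCESS_KEY',
    aws_secret_access_key='TENANT_SECRET_KEY',
)

s3.create_bucket(Bucket='tenant-bucket')
s3.put_object(Bucket='tenant-bucket', Key='hello.txt', Body=b'via the gateway only')
print([b['Name'] for b in s3.list_buckets()['Buckets']])
```
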
  14. 14. RBD librbd attack surface ● Protection by the hypervisor block layer ○ transparent for the guest ○ No network access or CephX keys needed at guest level ● Issue: ○ the hypervisor is software and therefore not 100% secure… ■ breakouts are not a myth ■ e.g., Virtunoid, SYSENTER, Venom! 14
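
For comparison, this is roughly what the hypervisor side of an RBD-backed volume looks like, shown here through the python-rbd bindings purely for illustration (in practice qemu links librbd directly); pool, image, and client names are made up. The point is that the CephX key and access to the Ceph public network live only in this layer, never inside the guest.

```python
# Sketch: hypervisor-side access to an RBD image via the librbd bindings.
# The guest only ever sees the resulting virtual block device.
# Pool, image and client names are made-up examples.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', name='client.nova-compute')
cluster.connect()
ioctx = cluster.open_ioctx('vms')

image = rbd.Image(ioctx, 'instance-00000001_disk')
print('image size (bytes):', image.size())
first_sector = image.read(0, 512)    # raw block-level read, performed by the host
image.close()

ioctx.close()
cluster.shutdown()
```
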
  15. 15. RBD.ko attack surface ● RBD kernel module ○ e.g. used with XEN or on bare metal ○ Requires direct access to Ceph public network ○ Requires CephX keys/secret at guest level ● Issue: ○ no separation between cluster and guest 15
  16. 16. CephFS attack surface ● pure CephFS tears a big hole in hypervisor separation ○ Requires direct access to Ceph public network ○ Requires CephX keys/secret at guest level ○ Complete file system visible to guest ■ Separation currently only via POSIX user/group 16
  17. 17. Host attack surface ● If KVM is compromised, the attacker ... ○ has access to neighbor VMs ○ has access to local Ceph keys ○ has access to the Ceph public network and Ceph daemons ● Firewalls, deep packet inspection (DPI), ... ○ partly impractical due to the protocols used ○ implications for performance and cost ● Bottom line: Ceph daemons must resist attack ○ C/C++ is harder to secure than e.g. Python ○ Homogeneous: if one daemon is vulnerable, all in the cluster are! 17
  18. 18. Network attack surface ● Sessions are authenticated ○ Attacker cannot impersonate clients or servers ○ Attacker cannot mount man-in-the-middle attacks ● Client/cluster sessions are not encrypted ○ Sniffer can recover any data read or written 18
  19. 19. Denial of Service ● Attack against: ○ Ceph Cluster: ■ Submit many / large / expensive IOs ■ Open many connections ■ Use flaws to crash Ceph daemons ■ Identify non-obvious but expensive features of client/OSD interface ○ Ceph Cluster hosts: ■ Crash complete cluster hosts e.g. through flaws in kernel network layer ○ VMs on same host: ■ Saturate the network bandwidth of the host 19
  20. 20. Proactive Countermeasures
  21. 21. Deployment and Setup ● Network ○ Always use separated cluster and public networks ○ Always separate your control nodes from other networks ○ Don’t expose cluster to the open internet ○ Encrypt inter-datacenter traffic ● Avoid hyper-converged infrastructure ○ Don’t mix ■ compute and storage resources, isolate them! ■ OpenStack and Ceph control nodes ○ Scale resources independently ○ Risk mitigation if daemons are compromised or DoS’d 21
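
As a small, hedged illustration of the network-separation point: the snippet below reads the running configuration via python-rados and warns when public and cluster networks are not actually distinct. The option names are the standard ceph.conf keys; the admin identity and conffile path are the usual defaults and may differ per deployment.

```python
# Sketch: sanity-check that client-facing and replication traffic are
# configured on separate networks.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', name='client.admin')
cluster.connect()
public_net = cluster.conf_get('public_network')
cluster_net = cluster.conf_get('cluster_network')
cluster.shutdown()

print('public network :', public_net)
print('cluster network:', cluster_net)
if not cluster_net or cluster_net == public_net:
    print('WARNING: replication traffic shares the public network - not separated')
```
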
  22. 22. Deploying RadosGW ● Big and easy target through the HTTP(S) protocol ● Small appliance per tenant with ○ Separate network ○ SSL-terminated proxy forwarding requests to radosgw ○ WAF (mod_security) to filter ○ Placed in secure/managed zone ○ A different type of webserver than the one used by RadosGW ● Don’t share buckets/users between tenants 22
  23. 23. Ceph security: CephX ● Monitors are trusted key servers ○ Store copies of all entity keys ○ Each key has an associated “capability” ■ Plaintext description of what the key user is allowed to do ● What you get ○ Mutual authentication of client + server ○ Extensible authorization w/ “capabilities” ○ Protection from man-in-the-middle, TCP session hijacking ● What you don’t get ○ Secrecy (encryption over the wire) 23
  24. 24. Ceph security: CephX take-aways ● Monitors must be secured ○ Protect the key database ● Key management is important ○ Separate key for each Cinder backend/AZ ○ Restrict capabilities associated with each key ○ Limit administrators’ power ■ use ‘allow profile admin’ and ‘allow profile readonly’ ■ restrict role-definer or ‘allow *’ keys ○ Careful key distribution (Ceph and OpenStack nodes) ● To do: ○ Thorough CephX code review by security experts ○ Audit OpenStack deployment tools’ key distribution ○ Improve security documentation 24
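
A sketch of what "a separate, capability-restricted key per Cinder backend/AZ" can look like, driven here through the monitor command interface of python-rados instead of the ceph CLI. The entity name, pool name, and capability strings are illustrative only (a deployment might, for example, prefer RBD profiles where available).

```python
# Sketch: create a per-AZ Cinder key limited to that AZ's volume pool.
# Entity/pool names and capability strings are illustrative examples.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', name='client.admin')
cluster.connect()

cmd = json.dumps({
    'prefix': 'auth get-or-create',
    'entity': 'client.cinder-az1',
    'caps': ['mon', 'allow r',
             'osd', 'allow rwx pool=volumes-az1'],
})
ret, out, errs = cluster.mon_command(cmd, b'')
cluster.shutdown()

if ret != 0:
    raise RuntimeError('auth get-or-create failed: %s' % errs)
print(out.decode())   # keyring entry, to be distributed only to the AZ1 backend
```
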
  25. 25. ● Static Code Analysis (SCA) ○ Buffer overflows and other code flaws ○ Regular Coverity scans ■ 996 fixed, 284 dismissed; 420 outstanding ■ defect density 0.97 ○ cppcheck ○ LLVM: clang/scan-build ● Runtime analysis ○ valgrind memcheck ● Plan ○ Reduce backlog of low-priority issues (e.g., issues in test code) ○ Automated reporting of new SCA issues on pull requests ○ Improve code reviewer awareness of security defects Preventing Breaches - Defects 25
  26. 26. ● Pen-testing ○ human attempt to subvert security, generally guided by code review ● Fuzz testing ○ computer attempt to subvert or crash, by feeding garbage input ● Harden build ○ -fpie -fpic ○ -fstack-protector-strong ○ -Wl,-z,relro,-z,now ○ -D_FORTIFY_SOURCE=2 -O2 (?) ○ Check for performance regression! Preventing Breaches - Hardening 26
  27. 27. Mitigating Breaches ● Run non-root daemons (WIP: PR #4456) ○ Prevent escalating privileges to get root ○ Run as ‘ceph’ user and group ○ Pending for Infernalis ● MAC ○ SELinux / AppArmor ○ Profiles for daemons and tools planned for Infernalis ● Run (some) daemons in VMs or containers ○ Monitor and RGW - less resource intensive ○ MDS - maybe ○ OSD - prefers direct access to hardware ● Separate MON admin network 27
  28. 28. Encryption: Data at Rest ● Encryption at application vs cluster level ● Some deployment tools support dm-crypt ○ Encrypt raw block device (OSD and journal) ○ Allow disks to be safely discarded if key remains secret ● Key management is still very simple ○ Encryption key stored on disk via LUKS ○ LUKS key stored in /etc/ceph/keys ● Plan ○ Petera, a new key escrow project from Red Hat ■ https://github.com/npmccallum/petera ○ Alternative: simple key management via monitor (CDS blueprint) 28
  29. 29. ● Goal ○ Protect data from someone listening in on network ○ Protect administrator sessions configuring client keys ● Plan ○ Generate per-session keys based on existing tickets ○ Selectively encrypt monitor administrator sessions ○ alternative: make use of IPSec (performance and management implications) Encryption: On Wire 29
  30. 30. ● Limit load from client ○ Use qemu IO throttling features - set a safe upper bound ● To do: ○ Limit max open sockets per OSD ○ Limit max open sockets per source IP ■ handle in Ceph or in the network layer? ○ Throttle operations per-session or per-client (vs just globally)? Denial of Service attacks 30
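
One way such an upper bound can be applied to a running guest is libvirt's block I/O tuning interface, which is what qemu IO throttling typically boils down to in a KVM deployment; in OpenStack the same limits are normally expressed as flavor extra specs (quota:disk_*) and applied by Nova. Domain name, device name, and limit values below are made-up examples.

```python
# Sketch: cap the I/O a single guest can push towards the Ceph cluster.
# Domain/device names and limit values are made-up examples.
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000042')

dom.setBlockIoTune(
    'vda',
    {
        'total_bytes_sec': 100 * 1024 * 1024,   # at most 100 MB/s
        'total_iops_sec': 2000,                  # and at most 2000 IOPS
    },
    libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG,
)
conn.close()
```
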
  31. 31. CephFS ● No standard virtualization layer (unlike block) ○ Filesystem passthrough (9p/virtfs) to host ○ Proxy through gateway (NFS?) ○ Allow direct access from tenant VM (least secure) ● Granularity of access control is harder ○ No simple mapping to RADOS objects ● Work in progress ○ root_squash (Infernalis blueprint) ○ Restrict mount to subtree ○ Restrict mount to user 31
  32. 32. Reactive Countermeasures
  33. 33. ● Community ○ Single point of contact: security@ceph.com ■ Core development team ■ Red Hat, SUSE, Canonical security teams ○ Security-related fixes are prioritized and backported ○ Releases may be accelerated on an ad hoc basis ○ Security advisories to ceph-announce@ceph.com ● Red Hat Ceph ○ Strict SLA on issues raised with Red Hat security team ○ Escalation process to Ceph developers ○ Red Hat security team drives CVE process ○ Hot fixes distributed via Red Hat’s CDN Reactive Security Process 33
  34. 34. Detecting and Preventing Breaches ● Brute force attacks ○ Good logging of any failed authentication ○ Monitoring is easy via existing tools, e.g. Nagios ● To do: ○ Automatic blacklisting of IPs/clients after n failed attempts at the Ceph level (Jewel blueprint) ● Unauthorized injection of keys ○ Monitor the audit log ■ trigger alerts for auth events -> monitoring ○ Periodic comparison with a signed backup of the auth database? 34
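
As an illustration of the "easy via existing tools" point, below is a minimal Nagios-style check that counts failed CephX authentications in a monitor log. The log path and the matched phrase are assumptions about the local logging setup and will need adjusting; the thresholds are arbitrary and the exit codes follow the usual Nagios plugin convention.

```python
#!/usr/bin/env python
# Sketch of a Nagios-style check for brute-force attempts against CephX.
# Log path, matched phrase and thresholds are assumptions/examples.
import re
import sys

LOG = '/var/log/ceph/ceph-mon.log'                              # assumed log location
PATTERN = re.compile(r'cephx.*(denied|failed)', re.IGNORECASE)  # assumed wording
WARN, CRIT = 10, 100                                            # example thresholds

try:
    with open(LOG) as logfile:
        failures = sum(1 for line in logfile if PATTERN.search(line))
except IOError as err:
    print('UNKNOWN: cannot read %s: %s' % (LOG, err))
    sys.exit(3)

if failures >= CRIT:
    print('CRITICAL: %d failed CephX authentications' % failures)
    sys.exit(2)
if failures >= WARN:
    print('WARNING: %d failed CephX authentications' % failures)
    sys.exit(1)
print('OK: %d failed CephX authentications' % failures)
sys.exit(0)
```
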
  35. 35. Conclusions
  36. 36. Summary ● Reactive processes are in place ○ security@ceph.com, CVEs, downstream product updates, etc. ● Proactive measures in progress ○ Code quality improves (SCA, etc.) ○ Unprivileged daemons ○ MAC (SELinux, AppArmor) ○ Encryption ● Progress defining security best-practices ○ Document best practices for security ● Ongoing process 36
  37. 37. Get involved ! ● Ceph ○ https://ceph.com/community/contribute/ ○ ceph-devel@vger.kernel.org ○ IRC: OFTC ■ #ceph, ■ #ceph-devel ○ Ceph Developer Summit ● OpenStack ○ Telco Working Group ■ #openstack-nfv ○ Cinder, Glance, Manila, ... 37
  38. 38. THANK YOU! Danny Al-Gaaf, Senior Cloud Technologist ● danny.al-gaaf@telekom.de ● IRC: dalgaaf ● linkedin.com/in/dalgaaf
