
Big data security



  1. 1. Big Data Security Joey Echeverria | Principal Solutions Architect | @fwiffo ©2013 Cloudera, Inc.
  2. 2. Big Data Security EARLY DAYS
  3. 3. Hadoop File Permissions • Added in HADOOP-1298 • Hadoop 0.16 • Early 2008 • Authorization without authentication • POSIX-like RWX bits
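A minimal sketch of what "POSIX-like RWX bits" with "authorization without authentication" means in practice. The `FileStatus` class and `is_allowed` function are hypothetical names used for illustration, not Hadoop's API. The check is sound, but the user name is whatever the client claims it is.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # the classic rwx bit values


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's ownership and mode."""
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def is_allowed(status, user, groups, wanted):
    """Select the owner, group, or other bits, then test the wanted bits."""
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="analysts", mode=0o640)
print(is_allowed(report, "alice", set(), READ | WRITE))   # owner: rw-
print(is_allowed(report, "bob", {"analysts"}, WRITE))     # group: r--
print(is_allowed(report, "carol", {"interns"}, READ))     # other: ---
```

Nothing in this check verifies that the caller really is "alice". That is the gap strong authentication later closes.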
  4. 4. MapReduce ACLs • Added in HADOOP-3698 • Hadoop 0.19 • Late 2008 • ACLs per job queue • Set a list of allowed users or groups per operation (job submission, job administration) • No authentication
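The per-queue ACL idea can be sketched as a lookup keyed by queue and operation. The queue name, the operation names `submit` and `administer`, and the `Acl` class are assumptions made for this example, not Hadoop configuration keys.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Hypothetical ACL: a set of allowed users and a set of allowed groups."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)

    def permits(self, user, user_groups):
        return user in self.users or bool(self.groups & set(user_groups))


# One ACL per (queue, operation); names are illustrative only.
QUEUE_ACLS = {
    "etl": {
        "submit": Acl(users={"alice"}, groups={"etl-devs"}),
        "administer": Acl(users={"alice"}),
    },
}


def can(queue, operation, user, user_groups=()):
    """Deny by default when the queue or operation has no ACL."""
    acl = QUEUE_ACLS.get(queue, {}).get(operation)
    return acl is not None and acl.permits(user, user_groups)


print(can("etl", "submit", "bob", ["etl-devs"]))      # allowed via group
print(can("etl", "administer", "bob", ["etl-devs"]))  # not an administrator
```

As with file permissions, the user and group names are taken on trust, so this limits accidents rather than attackers.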
  5. 5. Securing a Cluster Through a Gateway • Hadoop cluster runs on a private network • Gateway server dual-homed (Hadoop network and public network) • Users SSH onto gateway • Optionally, an SSH proxy lets jobs be submitted from the client machine • Provides a minimum level of protection
  6. 6. Big Data Security WHY SECURITY MATTERS
  7. 7. Prevent Accidental Access • Don’t let users shoot themselves in the foot • Main driver for early features • Not security per se, but a critical first step • Doesn’t require strong authentication
  8. 8. Stop Malicious Users • Early features were necessary, but not sufficient • Security has to get real • Hadoop runs arbitrary code • Implicit trust doesn’t prevent the insider threat
  9. 9. Co-mingle All Your Data • Often overlooked • Big data means getting rid of stovepipes • Scalability and flexibility are only 50% of the problem • Trust your data in a multi-tenant environment • Most critical driver
  10. 10. Big Data Security AN EVOLVING STORY
  11. 11. Authorization • Files • MapReduce/YARN job queues • Service-level authorization • Whitelists and blacklists of hosts and users
  12. 12. Authentication • HADOOP-4487 • Hadoop 0.22 and 0.20.205 • Late 2010 • Based on Kerberos and internal delegation tokens • Provides strong user authentication • Also used for service-to-service authentication • [Slide background: design-document excerpt, section 2.2 "High Level Use Cases", with Figure 1: HDFS High-level Dataflow]
  13. 13. Encryption • Over-the-wire encryption for some socket connections • RPC encryption added soon after Kerberos • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1 • HDFS block streamer encryption added in Hadoop 2.0.2-alpha • Volume-level encryption for data at rest
  14. 14. Big Data Security SECURITY FOR KEY VALUE STORES
  15. 15. Apache Accumulo • Robust, scalable, high-performance data storage and retrieval system • Built by NSA, now an Apache project • Based on Google’s BigTable • Built on top of HDFS, ZooKeeper and Thrift • Iterators for server-side extensions • Cell labels for flexible security models
  16. 16. Data Model • Multi-dimensional, persistent, sorted map • Key/Value store with a twist • A single primary key (Row ID) • Secondary key (Column) internal to a row, made up of Family and Qualifier • Per-cell timestamp
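A rough in-memory sketch of the sorted-map data model: keys are (row, family, qualifier, timestamp) and a scan walks them in sorted order. The class name `SortedCellMap` is hypothetical, and sorting timestamps newest-first within a column is an assumption of this sketch; persistence and distribution are left out entirely.

```python
import bisect


class SortedCellMap:
    """Toy sorted map keyed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        self._keys = []     # kept sorted at all times
        self._values = {}   # sort key -> value

    @staticmethod
    def _sort_key(row, family, qualifier, timestamp):
        # Negating the timestamp makes newer versions sort first.
        return (row, family, qualifier, -timestamp)

    def put(self, row, family, qualifier, timestamp, value):
        key = self._sort_key(row, family, qualifier, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Yield every cell of one row in sorted order."""
        start = bisect.bisect_left(self._keys, (row,))
        for key in self._keys[start:]:
            if key[0] != row:
                break
            r, family, qualifier, neg_ts = key
            yield (r, family, qualifier, -neg_ts, self._values[key])


cells = SortedCellMap()
cells.put("row1", "attr", "name", 1, "old")
cells.put("row1", "attr", "name", 2, "new")
cells.put("row1", "attr", "age", 1, "30")
for cell in cells.scan_row("row1"):
    print(cell)
```

Because the map is sorted, reading one row is a single seek followed by a sequential scan, which is what makes the single primary key efficient to query.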
  17. 17. Cell-Level Security • Labels stored per cell • Labels consist of Boolean expressions (AND, OR, nesting) • Labels associated with each user • Cell labels checked against user’s labels with a built-in iterator
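To make the label check concrete, here is a small evaluator for Boolean cell labels. It is a sketch, not Accumulo's implementation: the function names are hypothetical, `&` and `|` are assumed as the AND and OR operators, mixed operators are assumed to need parentheses, and an empty label is assumed to be visible to everyone.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(label, authorizations):
    """Return True if the user's authorizations satisfy the cell label."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: unlabeled cells are visible to all
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations

    def expression():
        nonlocal pos
        value = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = term()  # always parse the right side before combining
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(visible("secret&(ops|intel)", {"secret", "intel"}))
print(visible("secret&(ops|intel)", {"secret"}))
```

A server-side filter would run a check like this on every cell during a scan and silently drop the cells that fail it.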
  18. 18. Pluggable Authentication • Currently supports username/password authentication backed by ZooKeeper • ACCUMULO-259 • Targeted for Accumulo 1.5.0 • Authentication info replaced with generic tokens • Supports multiple implementations (e.g. Kerberos)
  19. 19. Application Level • Accumulo often paired with application-level authentication/authorization • Accumulo users created per application • Each application is granted the access level of its most permitted user • Application authenticates users, grabs user authorizations, passes user labels with requests
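The application-level pattern can be sketched as three steps: authenticate the end user, look up that user's labels, and pass only those labels with the request. Everything here is hypothetical (the user table, the label names, the function names), and cells carry a plain set of required labels instead of full Boolean expressions to keep the example short.

```python
import hashlib
import hmac

# Labels granted to the application's own store account (assumed).
APP_AUTHORIZATIONS = frozenset({"public", "finance", "hr"})


def _digest(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user table owned by the application, not by the data store.
USERS = {
    "alice": {
        "salt": b"demo-salt",
        "hash": _digest("wonderland", b"demo-salt"),
        "labels": frozenset({"public", "finance"}),
    },
}


def authenticate(user, password):
    record = USERS.get(user)
    if record is None:
        return False
    return hmac.compare_digest(record["hash"], _digest(password, record["salt"]))


def scan_authorizations(user, password):
    """Labels to pass with the request: the user's, capped by the app's."""
    if not authenticate(user, password):
        raise PermissionError("authentication failed")
    return USERS[user]["labels"] & APP_AUTHORIZATIONS


def handle_request(user, password, cells):
    """Return the values of the cells whose required labels the user holds."""
    auths = scan_authorizations(user, password)
    return [value for value, required in cells if required <= auths]


cells = [
    ("q1 summary", frozenset({"public"})),
    ("salaries", frozenset({"hr"})),
    ("budget", frozenset({"finance", "public"})),
]
print(handle_request("alice", "wonderland", cells))
```

The application account can see everything its most permitted user can, so the filtering by per-user labels is the only thing keeping less privileged users out.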
  20. 20. Apache HBase • Also based on Google’s BigTable • Started as a Hadoop contrib project • Supports column-level ACLs • Kerberos for authentication • Discussion and early prototypes of cell-level security ongoing
  21. 21. Big Data Security FUTURE
  22. 22. Encryption for Data at Rest • Need multiple levels of granularity • Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs) • APIs for file-level, block-level, or record-level encryption
  23. 23. Hive Security • Column-level ACLs • Kerberos authentication • AccessServer