Big data security

Transcript

  • 1. Big Data Security • Joey Echeverria | Principal Solutions Architect • joey@cloudera.com | @fwiffo • ©2013 Cloudera, Inc.
  • 2. Big Data Security: EARLY DAYS
  • 3. Hadoop File Permissions • Added in HADOOP-1298 • Hadoop 0.16 • Early 2008 • Authorization without authentication • POSIX-like RWX bits
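
As a rough illustration of the "POSIX-like RWX bits" and "authorization without authentication" points on slide 3, here is a minimal sketch of an owner/group/other permission check. The `FileStatus` class and `is_allowed` function are hypothetical names for this example, not Hadoop APIs, and the logic is a simplification of what a real file system does.

```python
from dataclasses import dataclass

# Permission bits within one rwx triple.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class FileStatus:
    """Hypothetical stand-in for a file's ownership and mode."""
    owner: str
    group: str
    mode: int  # octal mode such as 0o640


def is_allowed(status, user, groups, wanted):
    """Return True if `user` may perform `wanted` (READ, WRITE, or EXECUTE).

    The user name is taken at face value: nothing here proves the caller
    really is `user`, which is what "authorization without authentication"
    means in practice.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 7   # owner triple
    elif status.group in groups:
        bits = (status.mode >> 3) & 7   # group triple
    else:
        bits = status.mode & 7          # other triple
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="finance", mode=0o640)
print(is_allowed(report, "alice", [], WRITE))          # owner may write
print(is_allowed(report, "bob", ["finance"], READ))    # group may read
print(is_allowed(report, "carol", ["interns"], READ))  # others may not
```

Because the check trusts whatever name the client supplies, anyone who claims to be `alice` gets the owner's rights. That is enough to prevent accidents, but not to stop a malicious user.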
  • 4. MapReduce ACLs • Added in HADOOP-3698 • Hadoop 0.19 • Late 2008 • ACLs per job queue • Set a list of allowed users or groups per operation (job submission, job administration) • No authentication
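
The per-queue, per-operation ACLs on slide 4 can be sketched as a lookup table of allowed users and groups. The queue names, the `QUEUE_ACLS` table, and both functions below are hypothetical. The ACL string layout (comma-separated users, a space, comma-separated groups, with `*` meaning everyone) follows the convention Hadoop uses for ACL values, but this is an illustrative sketch rather than Hadoop's implementation.

```python
def parse_acl(spec):
    """Parse 'user1,user2 group1,group2'. Returns None when everyone is allowed."""
    if spec.strip() == "*":
        return None
    # Everything before the first space is users, everything after is groups.
    users_part, _, groups_part = spec.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.split(",") if g}
    return users, groups


# Hypothetical configuration: one ACL per queue and operation.
QUEUE_ACLS = {
    "default": {"submit-job": "*", "administer-jobs": "hadoop-admin ops"},
    "research": {"submit-job": "alice,bob analysts", "administer-jobs": " ops"},
}


def queue_allows(queue, operation, user, groups):
    """Check whether `user` (with `groups`) may perform `operation` on `queue`."""
    spec = QUEUE_ACLS.get(queue, {}).get(operation, "")
    acl = parse_acl(spec)
    if acl is None:
        return True
    allowed_users, allowed_groups = acl
    return user in allowed_users or bool(allowed_groups & set(groups))


print(queue_allows("research", "submit-job", "carol", ["analysts"]))
print(queue_allows("research", "administer-jobs", "alice", []))
```

As with file permissions, the user and group names are whatever the client reports, so these ACLs only constrain well-behaved users until authentication is added.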
  • 5. Securing a Cluster Through a Gateway • Hadoop cluster runs on a private network • Gateway server dual-homed (Hadoop network and public network) • Users SSH onto gateway • Optionally can create an SSH proxy for jobs to be submitted from the client machine • Provides a minimum level of protection
  • 6. Big Data Security: WHY SECURITY MATTERS
  • 7. Prevent Accidental Access • Don’t let users shoot themselves in the foot • Main driver for early features • Not security per se, but a critical first step • Doesn’t require strong authentication
  • 8. Stop Malicious Users • Early features were necessary, but not sufficient • Security has to get real • Hadoop runs arbitrary code • Implicit trust doesn’t prevent the insider threat
  • 9. Co-mingle All Your Data • Often overlooked • Big data means getting rid of stovepipes • Scalability and flexibility are only 50% of the problem • Trust your data in a multi-tenant environment • Most critical driver
  • 10. Big Data Security: AN EVOLVING STORY
  • 11. Authorization • Files • MapReduce/YARN job queues • Service-level authorization • Whitelists and blacklists of hosts and users
  • 12. Authentication • HADOOP-4487 • Hadoop 0.22 and 0.20.205 • Late 2010 • Based on Kerberos and internal delegation tokens • Provides strong user authentication • Also used for service-to-service authentication
    [Slide background image: excerpt headed "2.2 High Level Use Cases", use case 1, "Applications accessing files on HDFS clusters": non-MapReduce applications, including hadoop fs, access files stored on one or more HDFS clusters and should only be able to access the files and services they are authorized to access, either (a) directly using the HDFS protocol or (b) indirectly through HDFS proxy servers via the HFTP FileSystem or HTTP get. Figure 1: HDFS High-level Dataflow, showing the NameNode, DataNode, application, and MapReduce task exchanging Kerberos credentials, delegation tokens, and block tokens.]
  • 13. Encryption • Over-the-wire encryption for some socket connections • RPC encryption added soon after Kerberos • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1 • HDFS block streamer encryption added in Hadoop 2.0.2-alpha • Volume-level encryption for data at rest
  • 14. Big Data Security: SECURITY FOR KEY VALUE STORES
  • 15. Apache Accumulo • Robust, scalable, high performance data storage and retrieval system • Built by NSA, now an Apache project • Based on Google’s BigTable • Built on top of HDFS, ZooKeeper and Thrift • Iterators for server-side extensions • Cell labels for flexible security models
  • 16. Data Model • Multi-dimensional, persistent, sorted map • Key/Value store with a twist • A single primary key (Row ID) • Secondary key (Column) internal to a row, made up of a Family and a Qualifier • Per-cell timestamp
  • 17. Cell-Level Security • Labels stored per cell • Labels consist of Boolean expressions (AND, OR, nesting) • Labels associated with each user • Cell labels checked against user’s labels with a built-in iterator
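
To make the label check on slide 17 concrete, the sketch below evaluates a cell's Boolean label expression against the set of labels held by a user. It is a simplified stand-in, not Accumulo's own parser: the function names are hypothetical, `&` and `|` are assumed as the AND and OR operators, `&` is assumed to bind tighter than `|` when parentheses are omitted, and an empty expression is treated as visible to everyone.

```python
import re

# A token is a label name, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a label expression into tokens, rejecting unknown characters."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"bad label expression at offset {pos}: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, authorizations):
    """Return True if a user holding `authorizations` satisfies `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to all
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations  # a bare label: does the user hold it?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(is_visible("admin&(audit|ops)", {"admin", "ops"}))
print(is_visible("admin&(audit|ops)", {"admin"}))
```

In a real deployment this filtering runs server-side for every cell scanned, so a client never receives cells whose labels its authorizations do not satisfy.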
  • 18. Pluggable Authentication • Currently supports username/password authentication backed by ZooKeeper • ACCUMULO-259 • Targeted for Accumulo 1.5.0 • Authentication info replaced with generic tokens • Supports multiple implementations (e.g. Kerberos)
  • 19. Application Level • Accumulo often paired with application-level authentication/authorization • Accumulo users created per application • Each application granted access level of most permitted user • Application authenticates users, grabs user authorizations, passes user labels with requests
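
The application-level pattern on slide 19 can be sketched as follows: the application's own account holds the labels of its most permitted user, and each request carries only the labels of the end user who made it. The `USER_AUTHS` and `APP_AUTHS` tables and the `scan_authorizations` function are hypothetical names for illustration. Real applications would look these up in their own identity system and pass the result to the data store's scan API.

```python
# Hypothetical user directory maintained by the application.
USER_AUTHS = {
    "alice": {"public", "finance"},
    "bob": {"public"},
}

# The application's account is granted the labels of its most permitted user.
APP_AUTHS = {"public", "finance"}


def scan_authorizations(username, authenticated):
    """Return the labels to attach to a request made on behalf of `username`.

    `authenticated` stands in for whatever login check the application
    performs. The result is capped at the application's own labels, so a
    request can never carry more than the application account holds.
    """
    if not authenticated:
        raise PermissionError(f"user {username!r} is not authenticated")
    user_labels = USER_AUTHS.get(username, set())
    return user_labels & APP_AUTHS


print(sorted(scan_authorizations("alice", authenticated=True)))
print(sorted(scan_authorizations("bob", authenticated=True)))
```

The design puts the burden of authenticating end users on the application. The data store only sees the application account and trusts the labels it sends with each request.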
  • 20. Apache HBase • Also based on Google’s BigTable • Started as a Hadoop contrib project • Supports column-level ACLs • Kerberos for authentication • Discussion and early prototypes of cell-level security ongoing
  • 21. Big Data Security: FUTURE
  • 22. Encryption for Data at Rest • Need multiple levels of granularity • Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs) • APIs for file-level, block-level, or record-level encryption
  • 23. Hive Security • Column-level ACLs • Kerberos authentication • AccessServer
