Security in a Hadoop Cluster: Overview of Approaches


Putting all your data in one place makes it a target for bad guys.

The news offers plenty of examples of major data breaches, and no one wants to be responsible for one.

It is best to consider security up front, at design time.

It’s the right thing to do.

Published in: Technology

  1. 1. SECURITY IN A HADOOP CLUSTER: Overview of Approaches
  2. 2. Security: Why You Should Care
     Hadoop is a powerful technology that lets us do amazing things… but “with great power comes great responsibility” - Voltaire / Uncle Ben
       • Putting all your data in one place makes it a target for bad guys
       • Plenty of examples in the news of major data breaches; no one wants to be responsible for that
       • Best to consider security up front, at design time
       • It’s the right thing to do
     © 2012 Cloudera. All Rights Reserved.
  3. 3. Types of Security
     Type: Example
       • Access: Physical (lock and key), Virtual (firewalls, VLANs)
       • Authentication: Logins; verify users are who they say they are
       • Authorization: Permissions; verify what a user can do
       • Encryption at Rest: Data protection for files on disk
       • Encryption in Transport: Data protection on the wire
       • Auditing: Keep track of who accessed what
       • Policy / Procedure: Protect against human error and social engineering
  4. 4. Hadoop Ecosystem Security: What is supported today?
     Approach: Benefit
       • Network Based Isolation: Restrict network access to only authorized users of the cluster
       • HDFS File Ownership & Permissions: Configure access permissions (ACLs) to files in HDFS for users and groups
       • Kerberos Authentication & Authorization: Strong authentication of both clients and servers, so that tasks can be run under a job submitter's own account
       • Combination of Kerberos & HDFS ACLs enable lockdowns: Configure user and group lockdowns for read / write / execute of files and jobs. Prevents user impersonation
       • HBase & Accumulo Security: Offers table & column / row & cell security, respectively, for users and groups
       • Some Encryption:
           • At rest: via 3rd party, at the OS layer
           • In transport, internal: for HDFS and MapReduce (new in CDH 4.1)
           • In transport, external: for HttpFS via SSL, Sqoop via native DB driver encryption (not yet for Flume)
       • TLS between Cloudera Manager and Agents: Provides encryption and authentication in the communications to prevent snooping
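The file ownership and permissions row above can be illustrated with a minimal sketch of a POSIX-style owner / group / other check, which is the model HDFS file permissions follow. The `FileStatus` class, the `is_allowed` function, and the sample users are hypothetical names made up for this illustration; they are not Hadoop APIs, and the sketch ignores the HDFS superuser.

```python
from dataclasses import dataclass

# Permission bits, as in POSIX file modes.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for the owner, group and mode of one file."""
    owner: str
    group: str
    mode: int  # octal mode such as 0o640


def is_allowed(status, user, user_groups, wanted):
    """Return True if `user` holds every bit in `wanted` on the file.

    Exactly one permission class applies: owner first, then group,
    then other. A later class is never consulted as a fallback.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return bits & wanted == wanted


# Illustrative file: owner may read and write, group may read, others get nothing.
report = FileStatus(owner="alice", group="analysts", mode=0o640)
```

With this model, `is_allowed(report, "bob", {"analysts"}, READ)` succeeds while the same call with `WRITE` fails, which is the kind of lockdown the slide describes.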
  5. 5. Kerberos Overview
       • Kerberos: A computer network authentication protocol that works on the basis of tickets, allowing nodes to prove their identity to each other in a secure manner, with extensive use of encryption
       • Messages are exchanged between:
           • Client
           • Server
           • Kerberos Key Distribution Center (KDC)
               • Note this is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
       • Passwords are not sent across the network; instead, passwords are used to compute encryption keys
       • Authentication status is cached (no need to send credentials with each request)
       • Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
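Two of the points above (keys computed from passwords, and the reliance on timestamps) can be sketched in a few lines. This is a toy model under stated assumptions, not the Kerberos protocol: real Kerberos encrypts a timestamp with the derived key, whereas here an HMAC stands in for that step. The PBKDF2 parameters, the salt format, and the 300-second tolerance are assumptions chosen for illustration, and all function names are hypothetical.

```python
import hashlib
import hmac

# Assumed tolerance; a five-minute allowance is a common default, but check your KDC.
MAX_CLOCK_SKEW_SECONDS = 300


def derive_key(password, salt):
    """Compute a key from the password, so the password itself never travels."""
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt.encode(), 4096, dklen=32)


def make_authenticator(key, timestamp):
    """Client side: bind the current time to the key (HMAC stands in for encryption)."""
    message = str(int(timestamp)).encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, timestamp, tag, now):
    """Server side: reject stale or future timestamps, then check the tag."""
    if abs(now - timestamp) > MAX_CLOCK_SKEW_SECONDS:
        return False  # clocks too far apart, or a replayed old message
    expected = make_authenticator(key, timestamp)
    return hmac.compare_digest(expected, tag)
```

The skew check is why unsynchronized clocks break authentication: a correct password still fails `verify` when `now` and `timestamp` differ by more than the tolerance.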
  6. 6. HBase and Accumulo Overview
     Both systems:
       • Are open source, distributed NoSQL key/value stores that run on Hadoop, based on Google's BigTable design
       • Provide real-time read/write access to HDFS
       • Scale to 1000s of nodes and petabytes of data
       • Provide real-time and bulk APIs for loading of data
       • Support application-level extensions to the core (in HBase they're called coprocessors, in Accumulo they're called iterators and aggregators)
       • Run at scale in production environments
       • Can run on CDH!
     Primary differences (in terms of security):
       • HBase supports Kerberos authentication and ACLs on tables and column families
       • Accumulo supports username/password authentication (Kerberos-based authentication is under development), table-level permissions, and ACLs on individual cells
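The difference in granularity can be made concrete with a toy model. This sketch assumes a simplified data layout and uses hypothetical function names; it does not use the HBase or Accumulo client APIs. Accumulo's real cell visibility expressions allow boolean combinations of labels, while this model only handles the case where a user must hold every label on a cell.

```python
# Each cell: (row, column family, qualifier, value, labels required to see it).
CELLS = [
    ("patient1", "info", "name", "Ann", frozenset()),
    ("patient1", "info", "diagnosis", "flu", frozenset({"medical"})),
    ("patient1", "billing", "balance", "120", frozenset({"finance"})),
]


def scan_with_family_acl(cells, family_acl, user):
    """HBase-style: access is decided per column family, all or nothing."""
    return [c for c in cells if user in family_acl.get(c[1], set())]


def scan_with_cell_labels(cells, user_labels):
    """Accumulo-style: each cell is filtered against the user's labels."""
    return [c for c in cells if c[4] <= user_labels]


# Hypothetical ACL: the nurse may read the whole "info" family.
FAMILY_ACL = {"info": {"nurse"}, "billing": {"clerk"}}
```

Under the family ACL the nurse sees both `info` cells, including the diagnosis. With cell labels, a user holding no labels sees only the name, even though both cells live in the same column family.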
  7. 7. Configuring Security in Hadoop
       • Hadoop security configuration is a specialized topic
       • Many specifics depend on:
           • Version of Hadoop
           • Type of Kerberos being used (AD or MIT)
           • Operating system and distribution
       • Little room for misconfiguration
           • Must follow instructions exactly
           • Mistakes often result in vague “access denied” errors
           • May need to work around version-specific bugs
       • The can help configure a secure system
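Because misconfiguration tends to surface only as vague errors, a small pre-flight check can help. The sketch below parses a `core-site.xml` style document and reports whether two properties are set the way a Kerberos-secured cluster expects: `hadoop.security.authentication` and `hadoop.security.authorization`. The function names and the sample XML are made up for this illustration, and a real deployment needs many more settings that vary by Hadoop version and distribution.

```python
import xml.etree.ElementTree as ET

# Values expected on a Kerberos-secured cluster (only a small subset of what is needed).
REQUIRED = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def read_properties(xml_text):
    """Collect name/value pairs from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def find_problems(xml_text):
    """Return one readable message per missing or unexpected setting."""
    props = read_properties(xml_text)
    problems = []
    for name, expected in REQUIRED.items():
        actual = props.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    return problems


# Illustrative input: authentication left at "simple", authorization missing.
SAMPLE = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
</configuration>
"""
```

Calling `find_problems(SAMPLE)` returns two messages, one per setting, which is easier to act on than an “access denied” error at job submission time.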
  8. 8. Hadoop Security Landscape: Future Requirements and Features
       • Forward-Deployed Systems
           • Not just in the comfy confines of the datacenter anymore
           • Encryption at rest becoming a major requirement
       • Mixed-Level Security
           • Analytics in environments with multiple levels of trusted users and multiple levels of sensitive data
           • Joining low-sensitive data with med-sensitive data can equal high-sensitive data
           • Which users can see, use, join/merge, analyze, and derive insight from which data?
       • Wireless
           • PDA access
           • Wireless clusters
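The mixed-level point, that joining low-sensitivity data with medium-sensitivity data can produce high-sensitivity data, can be sketched as a labeling rule. Everything here is hypothetical: the three levels, the dataset names, and the idea of a policy table listing combinations that escalate are assumptions for illustration, not a feature of Hadoop.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical policy: dataset pairs whose combination reveals more than either alone.
ESCALATING_PAIRS = {frozenset({"zip_codes", "purchase_history"})}


def join_level(left_name, left_level, right_name, right_level):
    """Label a join result: at least the stricter input, one step higher if the pair escalates."""
    level = max(left_level, right_level)
    if frozenset({left_name, right_name}) in ESCALATING_PAIRS:
        level = Level(min(level + 1, Level.HIGH))
    return level


def can_read(user_clearance, data_level):
    """A user may read data at or below their clearance."""
    return user_clearance >= data_level
```

A user cleared for `Level.MEDIUM` can read both inputs in the escalating pair but not their join, which is labeled `Level.HIGH`. That is the access question the slide raises.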
  9. 9. CONCLUSION
       • “…with great power comes great responsibility” - Voltaire / Uncle Ben
       • Security = Policy + Implementation
           • Implementation is both technical and human
           • Weaknesses and breakdowns are inevitable (ask any hacker)
           • Must do everything to limit the severity of a breakdown by implementing multiple levels of security
       • I’m not the expert, your local hacker is
           • Pay attention to their community
           • Keep current on hacking techniques, news of breaches, etc.
           • Maintain good OpSec
       • The can configure a secure system
  10. 10. Questions?