Hadoop Security: Today and Tomorrow
Vinay Shukla
Hortonworks
  Speaker notes:
  • Background: Hortonworks-led initiative. Useful for connecting to Hadoop from outside the cluster, when more client-language flexibility is required (i.e. a Java binding is not an option). Not intended for RPC calls. Call it a REST API Gateway for Hadoop. Don't call it a firewall: firewalls are at the network layer. Don't call it perimeter security: perimeter security is getting discredited as an incomplete security solution.
  • Note: the arrows to the Hadoop cluster are simplifications. In practice there will be multiple arrows, one per port open between Knox and the Hadoop services it supports (WebHDFS, WebHCat, HiveServer2, HBase, Oozie), with more in the future.
  • Functions as an HTTP reverse proxy. Rewrites URLs to protect the internal network topology. Knox Gateway embeds a Jetty container. Reads/writes HTTP.
  • Transcript of "Hadoop Security Today and Tomorrow"

    1. Hadoop Security: Today and Tomorrow
        Vinay Shukla, Hortonworks
    2. Hadoop Security Today & Tomorrow
        Amsterdam - April 3rd, 2014
        Vinay Shukla
        Twitter: @NeoMythos
        © Hortonworks Inc. 2014
    3. Agenda
        • What is Hadoop Security?
          – 4 Security Pillars & Rings of Defense
        • What security elements exist today?
          – Authentication
          – Authorization
          – Audit
          – Data Protection
        • What is on the security roadmap?
          – Coming soon
          – Longer-term projects
        • Securing Hadoop with Apache Knox Gateway
          – Knox overview
          – Demo
        • How to get involved
    4. What is Apache Hadoop Security?
        Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.
    5. Two Reasons for Security in Hadoop
        1. Hadoop Contains Sensitive Data
          – As Hadoop adoption grows, so too have the types of data organizations look to store. Often the data is proprietary or personal, and it must be protected.
          – In this context, Hadoop is governed by the same security requirements as any data center platform.
        2. Hadoop Is Subject to Compliance Adherence
          – Organizations are often required to comply with regulations such as HIPAA, PCI DSS and FISMA that require protection of personal information.
          – Adherence to other corporate security policies.
    6. Security: Rings of Defense
        • Perimeter Level Security
          – Network Security (e.g. firewalls)
          – Apache Knox (gateways)
        • Data Protection
          – Core Hadoop
          – Partners
        • Authentication
          – Kerberos
        • OS Security
        • Authorization
          – MR ACLs
          – HDFS Permissions
          – HDFS ACLs
          – Hive ATZ-NG
          – HBase ACLs
          – Accumulo Label Security
    7. Authentication in Hadoop Today…
        • Authentication (Who am I/prove it? Control access to cluster.)
          – Kerberos in native Apache Hadoop
          – Perimeter Security with Apache Knox Gateway
        • Authorization (Restrict access to explicit data)
        • Audit (Understand who did what)
        • Data Protection (Encrypt data at rest & in motion)
    8. Kerberos Authentication in Hadoop
        For more than 20 years, Kerberos has been the de-facto standard for strong authentication. …no other option exists.
        The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O'Malley in 2010.
        What does Kerberos do?
        • Establishes identity for clients, hosts and services
        • Prevents impersonation; passwords are never sent over the wire
        • Integrates with enterprise identity management tools such as LDAP & Active Directory
        • More granular auditing of data access/job execution
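The slide says Kerberos establishes identity for clients, hosts and services. That identity is expressed as principal names of the form `primary/instance@REALM`. A user is typically `alice@EXAMPLE.COM`, while a service runs as something like `nn/nn01.example.com@EXAMPLE.COM`. The Python sketch below parses such names. It then derives a short local user name the way Hadoop's `DEFAULT` `hadoop.security.auth_to_local` rule does for principals in the default realm.

This is a simplified illustration, not Hadoop's implementation:
- It ignores escaping.
- It treats everything after the first `/` as the instance.
- The realm and host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    primary: str             # user name ("alice") or service name ("nn")
    instance: Optional[str]  # usually a host name for service principals; None for users
    realm: str               # Kerberos realm, e.g. the hypothetical "EXAMPLE.COM"

    @property
    def is_service(self) -> bool:
        # Simplification: any principal with an instance part is treated as a service/host identity.
        return self.instance is not None


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts (no escape handling)."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    left, realm = name.split("@")
    primary, slash, instance = left.partition("/")
    if not primary or not realm or (slash and not instance):
        raise ValueError(f"malformed principal name: {name!r}")
    return Principal(primary, instance if slash else None, realm)


def short_name(principal: Principal, default_realm: str) -> str:
    """Simplified stand-in for Hadoop's DEFAULT auth_to_local rule:
    principals in the default realm map to their first component."""
    if principal.realm != default_realm:
        raise ValueError(f"no mapping rule for realm {principal.realm!r}")
    return principal.primary
```

For example, `parse_principal("nn/nn01.example.com@EXAMPLE.COM")` yields a service principal whose short name in realm `EXAMPLE.COM` is `nn`.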
    9. Perimeter Security with Apache Knox
        Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.
        Single, simple point of access for a cluster
        • Single Hadoop access point
        • REST API hierarchy
        • Consolidated API calls
        • Multi-cluster support
        • Eliminates SSH "edge node"
        Central controls ensure consistency across one or more clusters
        • Central API management
        • Central audit control
        • Simple service-level authorization
        Integrated with existing systems to simplify identity maintenance
        • SSO integration – SiteMinder, API Key*, OAuth* & SAML*
        • LDAP & AD integration
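"Simple service-level authorization" means the gateway decides, per exposed service, whether an authenticated user may use that service at all. This check happens before any Hadoop-level permission check.

Below is a minimal Python sketch of the idea. It assumes a hypothetical in-memory table of allowed users and groups per service. The `ServiceAcl` type and the `*` wildcard convention are illustrative only; they are not Knox's configuration format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Mapping


@dataclass(frozen=True)
class ServiceAcl:
    users: FrozenSet[str] = frozenset()   # allowed user names; "*" allows any authenticated user
    groups: FrozenSet[str] = frozenset()  # allowed group names; "*" allows any authenticated user

    def allows(self, user: str, user_groups: Iterable[str]) -> bool:
        if "*" in self.users or user in self.users:
            return True
        if "*" in self.groups:
            return True
        return not self.groups.isdisjoint(user_groups)


def authorize(acls: Mapping[str, ServiceAcl], service: str,
              user: str, user_groups: Iterable[str]) -> bool:
    """Return True if `user` may call `service` through the gateway."""
    acl = acls.get(service)
    if acl is None:
        # Deny by default: a service without an ACL entry is not reachable.
        return False
    return acl.allows(user, user_groups)
```

Consider `acls = {"webhdfs": ServiceAcl(users=frozenset({"guest"})), "oozie": ServiceAcl(groups=frozenset({"etl"}))}`. With that table, `guest` may reach WebHDFS but not Oozie, and every unlisted service is refused. Denying unknown services by default keeps a newly added service closed until someone writes a rule for it.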
    10. Authorization & Audit in Hadoop Today…
        • Authentication (Who am I/prove it? Control access to cluster.)
          – Kerberos in native Apache Hadoop
          – Perimeter Security with Apache Knox Gateway
        • Authorization (Restrict access to explicit data)
          – Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
          – Cell-level access control in Apache Accumulo
        • Audit (Understand who did what)
          – Native in Apache Hadoop: Process Execution audit trail
        • Data Protection (Encrypt data at rest & in motion)
    11. Authorization: Who can do what in Hadoop?
        • Access Control Services exist for each of the Hadoop components
          – HDFS has file permissions
          – YARN, MapReduce and HBase have Access Control Lists (ACLs)
          – Accumulo provides more granular label/cell-level security
        • Improvements to these services are being led by the Hortonworks team:
          – HDFS improvements: extended ACLs, made more flexible by allowing multiple policies on the same file or directory
          – Hive improvements: a Hortonworks initiative called Hive ATZ-NG; better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secured
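HDFS file permissions follow the POSIX owner/group/other model. The extended ACLs mentioned above add named-user and named-group entries, which are filtered through a mask. The Python sketch below evaluates a request against that model.

It is a simplified illustration, not the NameNode's code:
- It ignores the superuser, the sticky bit, default ACLs and path traversal.
- It keeps the owning-group permission and the mask as separate fields. In real HDFS, when an extended ACL is present, the group bits of the mode display the mask instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

READ, WRITE, EXECUTE = 4, 2, 1  # rwx bits within one permission class


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # classic permission bits, e.g. 0o750
    named_users: Dict[str, int] = field(default_factory=dict)   # extended ACL: user -> rwx
    named_groups: Dict[str, int] = field(default_factory=dict)  # extended ACL: group -> rwx
    mask: Optional[int] = None  # ACL mask; None when the inode has no extended ACL


def _grants(bits: int, want: int) -> bool:
    return (bits & want) == want


def check_access(inode: Inode, user: str, groups: Iterable[str], want: int) -> bool:
    """Decide whether `user` (a member of `groups`) is granted all bits in `want`."""
    owner_bits = (inode.mode >> 6) & 7
    group_bits = (inode.mode >> 3) & 7
    other_bits = inode.mode & 7
    if user == inode.owner:
        return _grants(owner_bits, want)  # the owner entry is never masked
    mask = 7 if inode.mask is None else inode.mask
    if user in inode.named_users:
        return _grants(inode.named_users[user] & mask, want)
    member_of = set(groups)
    matching = [group_bits] if inode.group in member_of else []
    matching += [bits for g, bits in inode.named_groups.items() if g in member_of]
    if matching:
        # Any matching group entry may grant access; if none does, "other" is NOT consulted.
        return any(_grants(bits & mask, want) for bits in matching)
    return _grants(other_bits, want)  # the "other" entry is not masked either
```

The last rule is the one most often missed. Take a file with mode `0o704`: a member of the owning group is denied read access, even though "other" may read, because a matching group entry stops the evaluation.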
    12. Data Protection in Hadoop Today…
        • Authentication (Who am I/prove it? Control access to cluster.)
          – Kerberos in native Apache Hadoop
          – Perimeter Security with Apache Knox Gateway
        • Authorization (Restrict access to explicit data)
          – Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
          – Cell-level access control in Apache Accumulo
        • Audit (Understand who did what)
          – Native in Apache Hadoop: Process Execution audit trail
        • Data Protection (Encrypt data at rest & in motion)
          – Wire encryption in native Apache Hadoop
          – Orchestrated encryption with 3rd-party tools
    13. Data Protection in Hadoop
        Data protection must be applied at three different layers in Apache Hadoop:
        • Storage: encrypt data while it is at rest
          Direct data flows "into" and "out of" 3rd-party encryption tools and/or rely upon hardware-specific techniques (e.g. drive-level encryption).
        • Transmission: encrypt data as it is in motion
          Native Apache Hadoop 2.0 provides wire encryption.
        • Upon Access: apply restrictions when data is accessed
          Direct data flows "into" and "out of" 3rd-party encryption tools.
    14. Data Protection – Details – Today
        • Encryption of Data at Rest
          – Option 1: OS or Hardware Level Encryption (Out of the Box)
          – Option 2: Custom Development
          – Option 3: Certified Partners
          – Work is underway for encryption in Hive, HDFS and HBase as core platform capabilities.
        • Encryption of Data on the Wire
          – All wire protocols can be encrypted by the HDP platform (2.x). Wire-level encryption enhancements are led by the HWX (Hortonworks) team.
        • Column Level Encryption
          – No current out-of-the-box support in Hadoop.
          – Certified Partners provide these capabilities.
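In Hadoop 2.x, encrypting "all wire protocols" is largely a matter of configuration. As a sketch, the Python snippet below writes the relevant property for each channel into Hadoop's `<configuration>`/`<property>` XML format.

The property names are standard Hadoop 2.x and Hive settings:
- `hadoop.rpc.protection`
- `dfs.encrypt.data.transfer`
- `dfs.http.policy`
- `mapreduce.shuffle.ssl.enabled`
- `hive.server2.use.SSL`

The chosen values are one illustrative "encrypt everything" profile, not a recommended baseline. A working setup also needs Kerberos, keystores/truststores and SSL server settings, none of which are shown here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# One illustrative wire-encryption profile, grouped by the file each property lives in.
WIRE_ENCRYPTION: Dict[str, Dict[str, str]] = {
    "core-site.xml": {
        "hadoop.rpc.protection": "privacy",      # SASL RPC with authentication, integrity and encryption
    },
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",      # encrypt the DataNode block-transfer protocol
        "dfs.http.policy": "HTTPS_ONLY",          # NameNode/DataNode web UIs and WebHDFS over HTTPS
    },
    "mapred-site.xml": {
        "mapreduce.shuffle.ssl.enabled": "true",  # MapReduce shuffle over HTTPS
    },
    "hive-site.xml": {
        "hive.server2.use.SSL": "true",           # HiveServer2 client (JDBC/ODBC) connections over SSL
    },
}


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop-style <configuration> document, sorted by name."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in WIRE_ENCRYPTION.items():
        print(f"<!-- {filename} -->")
        print(to_hadoop_xml(props))
```

Keeping the profile as data, separate from the rendering, makes it easy to diff against a cluster's existing `*-site.xml` files before changing anything.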
    15. What can be done today?
        • Authentication (Who am I/prove it? Control access to cluster.)
          – Kerberos in native Apache Hadoop
          – Perimeter Security with Apache Knox Gateway
        • Authorization (Restrict access to explicit data)
          – Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
          – Cell-level access control in Apache Accumulo
          – Service-level Authorization with Knox
        • Audit (Understand who did what)
          – Native in Apache Hadoop: Process Execution audit trail
          – Access Audit with Knox
        • Data Protection (Encrypt data at rest & in motion)
          – Wire encryption in native Apache Hadoop
          – Wire encryption with Knox
          – Orchestrated encryption with 3rd-party tools
    16. Hadoop Security
        Hortonworks is Delivering Secure Hadoop for the Enterprise
        Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks.
        For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our security labs page.
        [Diagram: HDP stack layers – Governance & Integration, Security, Operations, Data Access, Data Management]
        HDP 2.1 New: Apache Knox
        • Perimeter security for Hadoop
        • A common place to perform authentication across Hadoop and all related projects
        • Integrated with LDAP and AD
        • Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
        • Broad community effort, incubated with Microsoft, broad set of developers involved
        Security Investments
        • Security Phase 3:
          – Audit event correlation and Audit viewer
          – Data Encryption in HDFS, Hive & HBase
          – Knox for HDFS HA, Ambari & Falcon
          – Support Token-Based AuthN beyond Kerberos
        • Security Phase 2:
          – ACLs for HDFS
          – Knox: Hadoop REST API Security
          – SQL-style Hive AuthZ (GRANT, REVOKE)
          – SSL support for HiveServer2
          – SSL for DN/NN UI & WebHDFS
          – PAM support for Hive
        • Security Phase 1:
          – Strong AuthN with Kerberos
          – HBase, Hive, HDFS basic AuthZ
          – Encryption with SSL for NN, JT, etc.
          – Wire encryption with Shuffle, HDFS, JDBC
    17. Hadoop Security: Phase 2
        HDP 2.1 Features
        Release Theme: REST API Security, Improve AuthZ, Wire Encryption
        Specific Features:
        • Hadoop REST API Security with Apache Knox
          – Eliminates SSH edge node
          – Single Hadoop access point
          – LDAP, AD based Authentication
          – Service-level Authorization
          – Audit support for REST access
        • SQL-style Hive Authorization with fine-grained access
        • HDFS Access Control Lists
        • SSL support in HiveServer2
        • SSL support in NN/DN UI & WebHDFS
        • Pluggable Authentication Module (PAM) in Hive
        Included Components: Apache Knox, Hive, HDFS
    18. Why Knox?
        (Image from fb.com/hadoopmemes)
        Apache Knox Gateway
        • REST/HTTP API security for Hadoop
        • Eliminates SSH edge node
        • Single REST API access point
        • Centralized Authentication, Authorization, and Audit for Hadoop REST/HTTP services
        • LDAP/AD Authentication, Service Authorization, Audit etc.
        Knox Eliminates
        • The client's need for intimate knowledge of cluster topology
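Removing the client's need to know the cluster topology comes down to URL mapping. Clients only see gateway URLs, such as the `https://localhost:8443/gateway/sandbox/webhdfs/v1/...` ones used in the demo. The gateway translates them into internal service addresses.

Below is a minimal Python sketch of the inbound half of that translation, assuming a hypothetical topology table:
- The internal host names are invented.
- 50070 and 50111 are the usual Hadoop 2.x defaults for the NameNode HTTP port (WebHDFS) and for WebHCat.

Real Knox also rewrites responses, for example redirect `Location` headers that point at DataNodes. This sketch leaves that out.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

GATEWAY_PREFIX = "/gateway/"

# Hypothetical topology "sandbox": external service name -> internal base URL.
# Only the gateway holds this table; clients never see internal hosts or ports.
TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "sandbox": {
        "webhdfs": "http://namenode.internal:50070/webhdfs",
        "templeton": "http://webhcat.internal:50111/templeton",
    },
}


def rewrite_inbound(external_url: str,
                    topologies: Dict[str, Dict[str, str]] = TOPOLOGIES) -> str:
    """Map https://<gw>/gateway/<topology>/<service>/<rest>?<query> to the internal URL."""
    parts = urlsplit(external_url)
    if not parts.path.startswith(GATEWAY_PREFIX):
        raise ValueError(f"not a gateway URL: {external_url}")
    segments = parts.path[len(GATEWAY_PREFIX):].split("/", 2)
    if len(segments) < 2:
        raise ValueError(f"missing topology or service: {external_url}")
    topology, service = segments[0], segments[1]
    rest = segments[2] if len(segments) == 3 else ""
    try:
        base = urlsplit(topologies[topology][service])
    except KeyError:
        raise LookupError(f"unknown service {topology}/{service}") from None
    path = base.path + ("/" + rest if rest else "")
    # Keep the client's query string (e.g. op=MKDIRS) unchanged; drop any fragment.
    return urlunsplit((base.scheme, base.netloc, path, parts.query, ""))
```

Because the table lives only on the gateway, moving the NameNode or WebHCat server changes one entry there, and none of the client URLs.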
    19. Knox Deployment with Hadoop Cluster
        [Diagram: Application Tier and Web Tier behind a load balancer (LB); Knox and the Hadoop CLIs in the DMZ; Hadoop cluster with master nodes (NN, SNN) in Rack 1 and slave nodes (DN) in Racks 2 to N, each rack behind its own switch.]
    20. Hadoop REST API Security: Drill-Down
        [Diagram: a REST client reaches load-balanced (LB) Knox Gateway instances (GW) in a DMZ between two firewalls over HTTP. Knox authenticates users against the enterprise identity provider (LDAP/AD) over LDAP and forwards HTTP requests to Hadoop Cluster 1 and Hadoop Cluster 2. Each cluster has masters and slaves running RM, NN, WebHCat, Oozie, HS2, DN, NM and HBase. An edge node with the Hadoop CLIs reaches the clusters over RPC.]
    21. What is Knox? (Client > Knox > Hadoop Cluster)
        [Diagram: Knox Gateway request pipeline from REST client to Hadoop service]
        • Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS)
        • Service Selector: selects the appropriate service filter chain based on request URL mapping rules
        • Service-Specific Filter Chain:
          – AuthN Filter: challenges the client for credentials and authenticates it, or validates an SSO token (Enterprise Identity Provider; Enterprise/Cloud SSO Provider)
          – Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request
          – Rewrite Filter: translates URLs in the request and response between external and internal URLs based on service-specific rules
          – Dispatch: streams the request and response to and from the Hadoop service based on the rewritten URLs
        Service filter chains are composed and configured at deployment time by service-specific plugins.
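The request pipeline above can be sketched in Python as a list of filter functions. Each filter either short-circuits with a response or hands the (possibly modified) request to the next handler.

This is an illustrative model only, not Knox's Java implementation:
- The in-memory `directory` stands in for LDAP/AD.
- `echo_dispatch` returns the rewritten request instead of streaming it to Hadoop.
- The internal host is hypothetical.
- Identity assertion is modeled as adding Hadoop's `doAs` proxy-user query parameter.

```python
import base64
import hmac
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    user: Optional[str] = None  # filled in by the AuthN filter


@dataclass(frozen=True)
class Response:
    status: int
    body: str


Handler = Callable[[Request], Response]
Filter = Callable[[Request, Handler], Response]


def authn_filter(directory: Dict[str, str]) -> Filter:
    """Challenge for HTTP Basic credentials; `directory` is a stand-in for LDAP/AD."""
    def run(req: Request, nxt: Handler) -> Response:
        auth = req.headers.get("Authorization", "")
        if not auth.startswith("Basic "):
            return Response(401, "credentials required")
        try:
            decoded = base64.b64decode(auth[6:], validate=True).decode("utf-8")
        except ValueError:  # invalid base64 (binascii.Error) or invalid UTF-8
            return Response(401, "malformed credentials")
        user, _, password = decoded.partition(":")
        stored = directory.get(user)
        if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
            return Response(401, "invalid credentials")
        return nxt(replace(req, user=user))
    return run


def identity_asserter(req: Request, nxt: Handler) -> Response:
    """Propagate the authenticated user to Hadoop as a doAs query parameter."""
    if req.user is None:
        return Response(401, "unauthenticated")
    parts = urlsplit(req.url)
    # Drop any client-supplied doAs so callers cannot impersonate someone else.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "doAs"]
    query.append(("doAs", req.user))
    return nxt(replace(req, url=urlunsplit(parts._replace(query=urlencode(query)))))


def rewrite_filter(external_prefix: str, internal_prefix: str) -> Filter:
    """Translate the external gateway URL prefix into the internal service URL."""
    def run(req: Request, nxt: Handler) -> Response:
        if not req.url.startswith(external_prefix):
            return Response(404, "no rewrite rule")
        return nxt(replace(req, url=internal_prefix + req.url[len(external_prefix):]))
    return run


def echo_dispatch(req: Request) -> Response:
    """Stand-in for Dispatch: a real gateway would stream the request to Hadoop."""
    return Response(200, f"{req.method} {req.url}")


def _bind(f: Filter, nxt: Handler) -> Handler:
    return lambda req: f(req, nxt)


def build_chain(filters: List[Filter], dispatch: Handler) -> Handler:
    """Compose filters so that filters[0] runs first and dispatch runs last."""
    handler = dispatch
    for f in reversed(filters):
        handler = _bind(f, handler)
    return handler


def service_selector(chains: Dict[str, Handler]) -> Handler:
    """Pick the service-specific chain whose path prefix matches the request URL."""
    def run(req: Request) -> Response:
        path = urlsplit(req.url).path
        for prefix, chain in chains.items():
            if path.startswith(prefix):
                return chain(req)
        return Response(404, "no service matches")
    return run
```

Here `build_chain` assembles one chain per service ahead of time. That mirrors the slide's point that filter chains are composed at deployment time rather than decided per request.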
    22. Knox Gateway in action
        Submit MR job via Knox
    23. HDFS & MR Operations with Knox
        (-i prints response headers, -k skips TLS certificate verification for the sandbox's self-signed certificate, -u supplies the gateway credentials.)
        • Create a few directories
          curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777'
          curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777'
          curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777'
        • Upload files (-L follows the WebHDFS redirect, -T sends the file body)
          curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE'
          curl -iku guest:guest-password -L -T README -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE'
        • Run MR job
          curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount 'https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar'
        • Query the jobs for a user
          curl -iku guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/queue'
        • Query the status of a given job
          curl -iku guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>'
        • Read the output file
          curl -iku guest:guest-password -L -X GET 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN'
        • Remove a directory
          curl -iku guest:guest-password -X DELETE 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true'
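For clients that are not shell scripts, the same gateway calls can be made from any HTTP library. Below is a minimal Python sketch using only the standard library, equivalent to the `MKDIRS` and `DELETE` commands above. The helper names `webhdfs_request` and `send` are hypothetical.

Two caveats apply:
- `send` disables certificate checks (like curl `-k`) only when asked to.
- `urllib` does not re-send a `PUT` across a 307 redirect. The two-step `op=CREATE` upload, which curl handles with `-L`, would therefore need extra handling.

```python
import base64
import ssl
import urllib.request
from typing import Tuple
from urllib.parse import quote, urlencode

# Gateway URL and topology taken from the curl examples above.
GATEWAY = "https://localhost:8443/gateway/sandbox"


def webhdfs_request(path: str, op: str, method: str, user: str, password: str,
                    **params: str) -> urllib.request.Request:
    """Build (but do not send) a WebHDFS call routed through the Knox gateway."""
    query = urlencode({"op": op, **params})
    url = f"{GATEWAY}/webhdfs/v1{quote(path)}?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # HTTP Basic credentials, the equivalent of curl -u user:password.
    return urllib.request.Request(url, method=method,
                                  headers={"Authorization": f"Basic {token}"})


def send(req: urllib.request.Request, verify_tls: bool = True) -> Tuple[int, str]:
    """Send the request and return (status, body); raises HTTPError on 4xx/5xx."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Equivalent of curl -k: only for a sandbox with a self-signed certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status, resp.read().decode("utf-8")
```

For example, the following corresponds to the first curl command:

`send(webhdfs_request("/user/guest/test", "MKDIRS", "PUT", "guest", "guest-password", permission="777"), verify_tls=False)`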
    24. How to Get Involved
        • Security Labs: http://hortonworks.com/labs/security/
        • Security Blogs: http://hortonworks.com/blog/category/innovation/security/
        • Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
        • Need help? http://hortonworks.com/community/forums/forum/security/ or vshukla@hortonworks.com
    25. Thank you!
        Amsterdam - April 3rd, 2014
        Vinay Shukla
        Twitter: @NeoMythos