Hadoop & Security
Past, Present, Future
uweseiler
Page2
About me
Big Data Nerd
Travelpirate
Photography Enthusiast
Hadoop Trainer
Data Architect
Page3
Agenda
Past
Present
Authentication
Authorization
Auditing
Data Protection
Future
Page4
Past
Page5
Hadoop & Security 2010
Owen O'Malley @ Hadoop Summit 2010
http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
Page7
Hadoop & Security (Not that long ago…)
• User reaches the Hadoop Cluster via SSH through a Gateway
• Data is loaded with hadoop fs -put into /user/uwe/
Page8
Present
Page9
Security in Hadoop 2015
Centralized Security Administration
Authentication: Who am I/prove it?
• Kerberos in Native Apache Hadoop
• HTTP/REST API Secured with Apache Knox Gateway
Authorization: Restrict access to explicit data
• Fine grain access control
– HDFS, YARN, MapReduce, Hive & HBase
– Storm & Knox
Audit: Understand who did what
• Centralized audit reporting
• Policy and access history
Data Protection: Encrypt data at rest & in motion
• Wire encryption in Hadoop
• File Encryption
– Built-in since Hadoop 2.6
– Partner tools
Page10
Typical Flow - Hive Access with Beeline CLI
• Components: Beeline Client, HiveServer 2, HDFS (A, B, C)
Page11
Typical Flow - Authenticate through Kerberos
• Components: Beeline Client, HiveServer 2, HDFS (A, B, C), KDC
• Client gets Service Ticket for Hive
• Use Hive, submit query
• Hive gets NameNode (NN) Service Ticket
• Hive creates MapReduce/Tez job using NN
Page12
Typical Flow - Authorization through Ranger
• Components: Beeline Client, HiveServer 2, HDFS (A, B, C), KDC, Ranger
• Client gets Service Ticket for Hive
• Use Hive, submit query
• Hive gets NameNode (NN) Service Ticket
• Hive creates MapReduce/Tez job using NN
Page13
Typical Flow - Perimeter through Knox
• Components: Beeline Client, Knox, HiveServer 2, HDFS (A, B, C), KDC, Ranger
• Original request with user id/password
• Knox gets Service Ticket for Hive
• Knox runs as proxy user using Hive
• Hive gets NameNode (NN) Service Ticket
• Hive creates MapReduce/Tez job using NN
• Client gets query result
Page14
Typical Flow - Wire & File Encryption
• Components: Beeline Client, Knox, HiveServer 2, HDFS (A, B, C), KDC, Ranger
• Original request with user id/password
• Knox gets Service Ticket for Hive
• Knox runs as proxy user using Hive
• Hive gets NameNode (NN) Service Ticket
• Hive creates MapReduce/Tez job using NN
• Client gets query result
• Wire encryption labels on the connections: SSL, SSL, SASL, SSL, SSL
Page15
Authentication
Kerberos
Page16
Kerberos Synopsis
• Client never sends a password
• Sends a username + token instead
• Authentication is centralized
• Key Distribution Center (KDC)
• Client will receive a Ticket-Granting-Ticket
• Allows authenticated client to request access to secured services
• Clients establish a timed session
• Clients establish trust with services by sending KDC-stamped tickets to service
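The last two bullets (timed session, KDC-stamped tickets) can be illustrated with a toy model. This is a minimal sketch and not real Kerberos: an HMAC stands in for the encrypted ticket, the Ticket-Granting-Ticket step is skipped, and every name (ToyKDC, issue_ticket, service_accepts) is hypothetical.

```python
import hashlib
import hmac
import time

# Toy model only: real Kerberos encrypts tickets; here an HMAC plays the
# role of the KDC "stamp". All class and function names are hypothetical.


class ToyKDC:
    def __init__(self):
        # service name -> secret key shared between the KDC and that service
        self.service_keys = {}

    def register_service(self, service, key):
        self.service_keys[service] = key

    def issue_ticket(self, user, service, lifetime_s, now=None):
        """Return a ticket that is only valid for `lifetime_s` seconds."""
        now = time.time() if now is None else now
        expires = int(now + lifetime_s)
        payload = f"{user}|{service}|{expires}".encode()
        stamp = hmac.new(self.service_keys[service], payload, hashlib.sha256).hexdigest()
        return {"user": user, "service": service, "expires": expires, "stamp": stamp}


def service_accepts(ticket, service, service_key, now=None):
    """The service trusts a ticket if the KDC stamp verifies and it has not expired."""
    now = time.time() if now is None else now
    payload = f"{ticket['user']}|{ticket['service']}|{ticket['expires']}".encode()
    expected = hmac.new(service_key, payload, hashlib.sha256).hexdigest()
    return (
        ticket["service"] == service
        and hmac.compare_digest(expected, ticket["stamp"])
        and now < ticket["expires"]
    )
```

The service never contacts the KDC and never sees a password; it only checks the stamp with the key it shares with the KDC, and rejects tickets once the session time has run out.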
Page17
Kerberos + Active Directory/LDAP
• Client authenticates against AD / LDAP; a Cross Realm Trust links it to the KDC of the Hadoop Cluster
• AD / LDAP (User Store)
– Users: seiler@EXAMPLE.COM
– Use existing directory tools to manage users
• KDC
– Hosts: host1@HADOOP.EXAMPLE.COM
– Services: hdfs/host1@HADOOP.EXAMPLE.COM
– Use Kerberos tools to manage host + service principals
Page18
Ambari & Kerberos
• Install & Configure Kerberos
– Server on a single node
– Client on rest of the nodes
• Define Principals & Keytabs
– A keytab (key table) is a file containing a key for a principal
– Since there are a few dozen principals, Ambari can generate keytab data for your entire cluster as a downloadable CSV file
• Configure User Permissions
Page19
Perimeter Security
Apache Knox
Page20
Knox: Core Concept
• Users: Admin / Data Operator, Business User, Hadoop Admin
• Access paths: SSH, RPC Call, JDBC/ODBC, REST/HTTP
• Entry points: Edge Node, Load Balancer
• Data Ingest / ETL: Falcon, Oozie, Sqoop, Flume
• Hadoop Cluster, Application Layer: HDFS, Hive, App X, App C
Page21
Knox: Hadoop REST API
Service   Direct URL                             Knox URL
WebHDFS   http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
WebHCat   http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
Oozie     http://ooziehost:11000/oozie           https://knox-host:8443/oozie
HBase     http://hbasehost:60080                 https://knox-host:8443/hbase
Hive      http://hivehost:10001/cliservice       https://knox-host:8443/hive
YARN      http://yarn-host:yarn-port/ws          https://knox-host:8443/resourcemanager
• Direct URLs: masters could be on many different hosts
• Knox URLs: one host, one port; consistent paths; SSL config at one host
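The mapping in the table can be written down as a small rewrite function. This is only a sketch of the idea, using the slide's placeholder hosts and simplified paths; the names ROUTES and to_knox_url are hypothetical, and real Knox deployments typically add a gateway and topology segment to the path, which the table omits.

```python
from urllib.parse import urlsplit

# Placeholder gateway address from the table, not a real deployment value.
KNOX_BASE = "https://knox-host:8443"

# service -> (path prefix on the direct URL, path prefix behind Knox),
# copied from the table above.
ROUTES = {
    "WebHDFS": ("/webhdfs", "/webhdfs"),
    "WebHCat": ("/templeton", "/templeton"),
    "Oozie": ("/oozie", "/oozie"),
    "HBase": ("", "/hbase"),
    "Hive": ("/cliservice", "/hive"),
    "YARN": ("/ws", "/resourcemanager"),
}


def to_knox_url(service, direct_url):
    """Rewrite a direct service URL to the single Knox host and port."""
    direct_prefix, knox_prefix = ROUTES[service]
    parts = urlsplit(direct_url)
    if not parts.path.startswith(direct_prefix):
        raise ValueError(f"{direct_url} does not look like a {service} URL")
    # Keep whatever follows the service prefix, plus the query string.
    remainder = parts.path[len(direct_prefix):]
    query = f"?{parts.query}" if parts.query else ""
    return f"{KNOX_BASE}{knox_prefix}{remainder}{query}"
```

Whatever master host the direct URL pointed at, the client only ever needs to know the one Knox host and port.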
Page22
Knox: Features
Simplified Access
• Kerberos Encapsulation
• Single Access Point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP / AD integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• SSL for non-SSL services
• WebApp vulnerability filter
Page23
Knox: Architecture
• REST Client and Enterprise Identity Provider
• DMZ between two Firewalls: Load Balancer (LB) in front of several Knox instances
• Edge Node / Hadoop CLIs use RPC; Knox uses HTTP
• Hadoop Cluster 1 and Hadoop Cluster 2, each with Masters (RM, NN, WebHCat, Oozie, HS2, HBase) and Slaves (DN, NM)
Page24
Knox: What’s New in Version 0.6
• Knox support for HDFS HA
• Support for YARN REST API
• Support for SSL to Hadoop Cluster Services (WebHDFS, HBase, Hive & Oozie)
• Knox Management REST API
• Integration with Ranger for Knox Service Level Authorization
• Use Ambari for install/start/stop/configuration
Page26
The Hadoop Layers
Page27
Authorization: Overview
• HDFS
– Permissions
– ACLs
• YARN
– Queue ACLs
• Pig
– No server component to check/enforce ACLs
• Hive
– Column level ACLs
• HBase
– Cell level ACLs
Page28
Authorization: HDFS Permissions
hadoop fs -chown maya:sales /sales-data
hadoop fs -chmod 640 /sales-data
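To see what mode 640 means for /sales-data, here is a minimal sketch of a POSIX-style permission check of the kind HDFS file permissions follow. The function names are hypothetical, the users other than maya are made up for illustration, and the HDFS superuser bypass is deliberately ignored.

```python
def mode_to_string(mode):
    """Render an octal mode such as 0o640 as 'rw-r-----'."""
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0b111
        out.append("".join(ch if bits & flag else "-"
                           for ch, flag in (("r", 4), ("w", 2), ("x", 1))))
    return "".join(out)


def is_allowed(user, groups, owner, group, mode, want):
    """Pick exactly one class (owner, group or other) and test its bits."""
    if user == owner:
        bits = (mode >> 6) & 0b111
    elif group in groups:
        bits = (mode >> 3) & 0b111
    else:
        bits = mode & 0b111
    need = {"r": 4, "w": 2, "x": 1}[want]
    return bool(bits & need)
```

With owner maya, group sales and mode 640, maya can read and write, other members of sales can only read, and everyone else has no access.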
Page29
Authorization: HDFS ACLs
New Requirements:
– Maya, Diana and Clark are allowed to make modifications
– New group execs should be able to read the sales data
Page30
Authorization: HDFS ACLs
hdfs dfs -setfacl -m group:execs:r-- /sales-data
hdfs dfs -getfacl /sales-data
hadoop fs -ls /sales-data
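How a named group entry such as group:execs:r-- changes the access decision can be sketched as follows. This is a simplified model of the ACL check order (owner, named users, group entries filtered by the mask, other), not the NameNode implementation; the dictionary layout, the function names and the users bruce and eve are assumptions for illustration.

```python
PERM = {"r": 4, "w": 2, "x": 1}


def parse(perms):
    """Turn a string such as 'r--' into permission bits."""
    return sum(PERM[c] for c in perms if c != "-")


def acl_allows(user, groups, acl, want):
    need = PERM[want]
    mask = acl["mask"]
    # 1. The owner is checked against the owner bits only (no mask).
    if user == acl["owner"]:
        return bool(acl["owner_perms"] & need)
    # 2. A named user entry is filtered by the mask.
    if user in acl["named_users"]:
        return bool(acl["named_users"][user] & mask & need)
    # 3. File group and named groups: any matching entry may grant access.
    entries = [acl["group_perms"]] if acl["group"] in groups else []
    entries += [p for g, p in acl["named_groups"].items() if g in groups]
    if entries:
        return any(p & mask & need for p in entries)
    # 4. Everyone else falls through to the "other" bits.
    return bool(acl["other_perms"] & need)


# Assumed state of /sales-data (mode 640) after the setfacl command above.
SALES_ACL = {
    "owner": "maya", "owner_perms": parse("rw-"),
    "group": "sales", "group_perms": parse("r--"),
    "named_users": {},
    "named_groups": {"execs": parse("r--")},
    "mask": parse("r--"),
    "other_perms": parse("---"),
}
```

A member of execs can now read the sales data without being added to the sales group, while users outside both groups still get nothing.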
Page31
Authorization: HDFS Best Practices
• Start with traditional HDFS file permissions to implement most permission requirements
• Define a small number of ACLs to handle exceptional cases
• A file/folder with an ACL incurs an additional memory cost in the NameNode compared to a file/folder with traditional permissions
Page33
Authorization: Hive
• Hive has traditionally offered full table access control via HDFS access control
• Solution for column-based control
– Let HiveServer2 check and submit the query execution
– Make the table accessible only to a special (technical) user
– Provide an authorization plugin to restrict UDFs and file formats
• Use standard SQL permission constructs
– GRANT / REVOKE
• Store the ACLs in Hive Metastore
Page34
Authorization: Hive ATZ-NG
Details: https://issues.apache.org/jira/browse/HIVE-5837
Page35
Authorization: Hive
CREATE ROLE sales_role;
GRANT ALL ON DATABASE `sales-data` TO ROLE sales_role;
GRANT SELECT ON DATABASE `marketing-data` TO ROLE sales_role;
CREATE ROLE sales_column_role;
GRANT SELECT(c1, c2, c3) ON TABLE secret_table TO ROLE sales_column_role;
Page36
Authorization: Pig
• There is no Pig (or MapReduce) server to submit and check column-based access
• Pig (and MapReduce) is restricted to full data access via HDFS access control
Page37
Authorization: HBase
• The HBase permission model traditionally supported ACLs defined at the namespace, table, column family and column level
– This is sufficient to meet most requirements
• Cell-based security was introduced with HBase 0.98
– On par with the security model of Accumulo
Page38
Authorization & Auditing
Apache Ranger
Page40
Ranger: Authorization Policies
Page41
Ranger: Auditing
Page42
Ranger: Architecture
Page43
Ranger: What’s New in Version 0.4?
• New Components Coverage
– Storm Authorization & Auditing
– Knox Authorization & Auditing
• Deeper Integration with HDP
– Windows Support
– Integration with Hive Auth API, support grant/revoke commands
– Support grant/revoke commands in HBase
• Enterprise Readiness
– REST APIs for policy manager
– Store audit logs locally in HDFS
– Support Oracle DB
– Ambari support, as part of Ambari 2.0 release
Page44
Data Protection
Encryption
Page45
Encryption: Data in motion
• Hadoop Client to DataNode via Data Transfer Protocol
– Client reads/writes to HDFS over encrypted channel
– Configurable encryption strength
• ODBC/JDBC Client to HiveServer2
– Encryption via SASL Quality of Protection
• Mapper to Reducer during Shuffle/Sort Phase
– Shuffle is over HTTP(S)
– Supports mutual authentication via SSL
– Host name verification enabled
• REST Protocols
– SSL Support
Page46
Encryption: Data at rest
HDFS Transparent Data Encryption
• Install and run KMS on top of HDP 2.2
• Adjust the corresponding HDFS parameters (via Ambari)
• Create encryption key
hadoop key create key1 -size 256
hadoop key list -metadata
• Create an encryption zone using the key
hdfs dfs -mkdir /zone1
hdfs crypto -createZone -keyName key1 -path /zone1
hdfs crypto -listZones
• Details:
– http://hortonworks.com/kb/hdfs-transparent-data-encryption/
Page47
Future
Page48
Apache Atlas: Data Classification
Currently in Incubation
– https://wiki.apache.org/incubator/AtlasProposal
Page49
Apache Atlas: Tag-based Policies
• Components: Beeline Client, HiveServer 2, HDFS (A, B, C), Ranger, Metadata Server
• Data Classification: Table1 | "marketing"
• Tag Policy: Logs IT-Admin Create
• Data Ingestion / ETL from Source Data: Falcon, Oozie, Sqoop, Flume
Page50
Future: More goodies
Dynamic, Attribute-Based Access Control (ABAC)
• Extend Ranger to support data or user attributes in policy decisions
• Example: Use geo-location of users
Enhanced Auditing
• Ranger can stream audit data through Kafka & Storm into multiple stores
• Use Storm for correlation of data
Encryption as First Class Citizen
• Build native encryption support in HDFS, Hive & HBase
• Ranger-based key management to support encryption
Page51
Contact Details
Twitter: @uweseiler
Mail: uwe.seiler@codecentric.de
Phone: +49 176 1076531
XING: https://www.xing.com/profile/Uwe_Seiler

More Related Content

What's hot

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016StampedeCon
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?DataWorks Summit
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 

What's hot (20)

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016Hadoop Security and Compliance - StampedeCon 2016
Hadoop Security and Compliance - StampedeCon 2016
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 

Viewers also liked

Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXAbhishek Mallick
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Peter Wood
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data miningharithavijay94
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authenticationleahculver
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1Amal Abid
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystèmeKhanh Maudoux
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Viewers also liked (18)

Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Hadoop
HadoopHadoop
Hadoop
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystème
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
What is big data?
What is big data?What is big data?
What is big data?
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 

Similar to Hadoop & Security - Past, Present, Future

Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextHellmar Becker
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
August 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopAugust 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopYahoo Developer Network
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Cloudera, Inc.
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization modelsThejas Nair
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAmazon Web Services
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of ViewKaran Alang
 

Similar to Hadoop & Security - Past, Present, Future (20)

Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
August 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopAugust 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for Hadoop
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization models
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a ServiceAWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 

Hadoop & Security - Past, Present, Future

  • 1. Hadoop & Security Past, Present, Future uweseiler
  • 2. Page2 About me Big Data Nerd, Travelpirate, Photography Enthusiast, Hadoop Trainer, Data Architect
  • 5. Page5 Hadoop & Security 2010 Owen O'Malley @ Hadoop Summit 2010 http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  • 6. Page6 Hadoop & Security 2010 Owen O'Malley @ Hadoop Summit 2010 http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  • 7. Page7 Hadoop & Security (Not that long ago…) Hadoop Cluster User SSH hadoop fs -put SSH Gateway /user/uwe/
  • 9. Page9 Security in Hadoop 2015 Centralized Security Administration. Authentication (Who am I/prove it?): • Kerberos in native Apache Hadoop • HTTP/REST API secured with Apache Knox Gateway. Authorization (Restrict access to explicit data): • Fine grain access control • HDFS, YARN, MapReduce, Hive & HBase • Storm & Knox. Audit (Understand who did what): • Centralized audit reporting • Policy and access history. Data Protection (Encrypt data at rest & in motion): • Wire encryption in Hadoop • File encryption • Built-in since Hadoop 2.6 • Partner tools
  • 10. Page10 Typical Flow - Hive Access with Beeline CLI HDFS HiveServer 2 A B C Beeline Client
  • 11. Page11 Typical Flow - Authenticate through Kerberos HDFS HiveServer 2 A B C Beeline Client KDC Use Hive, submit query Hive gets NameNode (NN) Service Ticket Hive creates MapReduce/Tez job using NN Client gets Service Ticket for Hive
  • 12. Page12 Typical Flow - Authorization through Ranger HDFS HiveServer 2 A B C Beeline Client KDC Use Hive, submit query Hive gets NameNode (NN) Service Ticket Hive creates MapReduce/Tez job using NN Client gets Service Ticket for Hive Ranger
  • 13. Page13 Typical Flow - Perimeter through Knox HDFS HiveServer 2 A B C Beeline Client KDC Hive gets NameNode (NN) Service Ticket Knox gets Service Ticket for Hive Ranger Client gets query result Original request with user id/password Knox runs as proxy user using Hive Hive creates MapReduce/Tez job using NN
  • 14. Page14 Typical Flow - Wire & File Encryption HDFS HiveServer 2 A B C Beeline Client KDC Hive gets NameNode (NN) Service Ticket Hive creates MapReduce/Tez job using NN Knox gets Service Ticket for Hive Ranger Knox runs as proxy user using Hive Original request with user id/password Client gets query result SSL SSL SASL SSL SSL
  • 16. Page16 Kerberos Synopsis • Client never sends a password • Sends a username + token instead • Authentication is centralized • Key Distribution Center (KDC) • Client will receive a Ticket-Granting Ticket • Allows authenticated client to request access to secured services • Clients establish a timed session • Clients establish trust with services by sending KDC-stamped tickets to the service
  • 17. Page17 Kerberos + Active Directory/LDAP Cross Realm Trust Client Hadoop Cluster AD / LDAP KDC Hosts: host1@HADOOP.EXAMPLE.COM Services: hdfs/host1@HADOOP.EXAMPLE.COM User Store Use existing directory tools to manage users Use Kerberos tools to manage host + service principals Authentication Users: seiler@EXAMPLE.COM
  • 18. Page18 Ambari & Kerberos • Install & Configure Kerberos: server on a single node, client on the rest of the nodes • Define Principals & Keytabs: a keytab (key table) is a file containing a key for a principal. Since there are a few dozen principals, Ambari can generate keytab data for your entire cluster as a downloadable CSV file • Configure User Permissions
  • 20. Page20 Load Balancer Knox: Core Concept Data Ingest ETL SSH RPC Call Falcon Oozie Sqoop Flume Admin / Data Operator Business User Hadoop Admin JDBC/ODBC REST/HTTP Hadoop Cluster HDFS Hive App X App C Application Layer REST/HTTP Edge Node
  • 21. Page21 Knox: Hadoop REST API
      Service | Direct URL | Knox URL
      WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
      WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
      Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
      HBase | http://hbasehost:60080 | https://knox-host:8443/hbase
      Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive
      YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager
      Direct URLs: masters could be on many different hosts
      Knox URLs: one host, one port; consistent paths; SSL config at one host
  • 22. Page22 Knox: Features Simplified Access • Kerberos Encapsulation • Single Access Point • Multi-cluster support • Single SSL certificate Centralized Control • Central REST API auditing • Service-level authorization • Alternative to SSH “edge node” Enterprise Integration • LDAP / AD integration • SSO integration • Apache Shiro extensibility • Custom extensibility Enhanced Security • Protect network details • SSL for non-SSL services • WebApp vulnerability filter
  • 23. Page23 Knox: Architecture REST Client Enterprise Identity Provider Knox Firewall Firewall DMZ LB Edge Node / Hadoop CLIs RPC HTTP Slaves RM NN WebHCat Oozie DN NM HS2 HBase Knox Knox Masters Slaves Hadoop Cluster 1 Slaves RM NN WebHCat Oozie DN NM HS2 HBase Masters Slaves Hadoop Cluster 2
  • 24. Page24 Knox: What’s New in Version 0.6 • Knox support for HDFS HA • Support for YARN REST API • Support for SSL to Hadoop Cluster Services (WebHDFS, HBase, Hive & Oozie) • Knox Management REST API • Integration with Ranger for Knox Service Level Authorization • Use Ambari for install/start/stop/configuration
  • 27. Page27 Authorization: Overview • HDFS • Permissions • ACLs • YARN • Queue ACLs • Pig • No server component to check/enforce ACLs • Hive • Column level ACLs • HBase • Cell level ACLs
  • 28. Page28 Authorization: HDFS Permissions hadoop fs -chown maya:sales /sales-data hadoop fs -chmod 640 /sales-data
  • 29. Page29 Authorization: HDFS ACLs New Requirements: – Maya, Diana and Clark are allowed to make modifications – New group execs should be able to read the sales data
  • 30. Page30 Authorization: HDFS ACLs hdfs dfs -setfacl -m group:execs:r-- /sales-data hdfs dfs -getfacl /sales-data hadoop fs -ls /sales-data
  • 31. Page31 Authorization: HDFS Best Practices • Start with traditional HDFS file permissions to implement most permission requirements • Define a small number of ACLs to handle exceptional cases • A file/folder with an ACL incurs an additional memory cost in the NameNode compared to a file/folder with traditional permissions
  • 33. Page33 Authorization: Hive • Hive has traditionally offered full table access control via HDFS access control • Solution for column-based control – Let HiveServer2 check and submit the query execution – Make the table accessible only to a special (technical) user – Provide an authorization plugin to restrict UDFs and file formats • Use standard SQL permission constructs – GRANT / REVOKE • Store the ACLs in the Hive Metastore
  • 34. Page34 Authorization: Hive ATZ-NG Details: https://issues.apache.org/jira/browse/HIVE-5837
  • 35. Page35 Authorization: Hive CREATE ROLE sales_role; GRANT ALL ON DATABASE 'sales-data' TO ROLE 'sales_role'; GRANT SELECT ON DATABASE 'marketing-data' TO ROLE 'sales_role'; CREATE ROLE sales_column_role; GRANT 'c1,c2,c3' to 'sales_column_role'; GRANT 'SELECT(c1, c2, c3)' on 'secret_table' to 'sales_column_role';
  • 36. Page36 Authorization: Pig • There is no Pig (or MapReduce) Server to submit and check column-based access • Pig (and MapReduce) is restricted to full data access via HDFS access control
  • 37. Page37 Authorization: HBase • The HBase permission model traditionally supported ACLs defined at the namespace, table, column family and column level – This is sufficient to meet most requirements • Cell-based security was introduced with HBase 0.98 – On par with the security model of Accumulo
  • 39. Page5 Hadoop & Security 2010 Owen O'Malley @ Hadoop Summit 2010 http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  • 43. Page43 Ranger: What’s New in Version 0.4? • New Components Coverage • Storm Authorization & Auditing • Knox Authorization & Auditing • Deeper Integration with HDP • Windows Support • Integration with Hive Auth API, support for grant/revoke commands • Support for grant/revoke commands in HBase • Enterprise Readiness • REST APIs for policy manager • Store audit logs locally in HDFS • Support for Oracle DB • Ambari support, as part of the Ambari 2.0 release
  • 45. Page45 Encryption: Data in motion • Hadoop Client to DataNode via Data Transfer Protocol – Client reads/writes to HDFS over encrypted channel – Configurable encryption strength • ODBC/JDBC Client to HiveServer2 – Encryption via SASL Quality of Protection • Mapper to Reducer during Shuffle/Sort Phase – Shuffle is over HTTP(S) – Supports mutual authentication via SSL – Host name verification enabled • REST Protocols – SSL Support
  • 46. Page46 Encryption: Data at rest HDFS Transparent Data Encryption • Install and run KMS on top of HDP 2.2 • Change the corresponding HDFS parameters (via Ambari) • Create an encryption key: hadoop key create key1 -size 256; hadoop key list -metadata • Create an encryption zone using the key: hdfs dfs -mkdir /zone1; hdfs crypto -createZone -keyName key1 -path /zone1; hdfs crypto -listZones • Details: http://hortonworks.com/kb/hdfs-transparent-data-encryption/
  • 48. Page48 Apache Atlas: Data Classification Currently in Incubation – https://wiki.apache.org/incubator/AtlasProposal
  • 49. Page49 Apache Atlas: Tag-based Policies HDFS HiveServer 2 A B C Beeline Client Ranger Metadata Server Data Classification Table1|"marketing" Tag Policy Logs IT-Admin Create Data Ingestion / ETL Falcon Oozie Source Data Sqoop Flume
  • 50. Page50 Future: More goodies Dynamic, Attribute-Based Access Control (ABAC) • Extend Ranger to support data or user attributes in policy decisions • Example: Use geo-location of users Enhanced Auditing • Ranger can stream audit data through Kafka & Storm into multiple stores • Use Storm for correlation of data Encryption as First Class Citizen • Build native encryption support in HDFS, Hive & HBase • Ranger-based key management to support encryption
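
The HDFS permission and ACL slides (28 to 30) can be illustrated with a small model of how an access check might be evaluated once a named group entry has been added to /sales-data. This is a simplified sketch in Python, not HDFS code: the class AclFile, its fields, and the users diana, ceo and eve are hypothetical names chosen for illustration, and the mask is assumed to be recomputed as the union of the group entries, which is roughly what setfacl -m does by default in POSIX-style ACLs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

READ, WRITE, EXECUTE = 4, 2, 1  # classic rwx permission bits


@dataclass
class AclFile:
    """Hypothetical, simplified model of a file with POSIX-style ACL entries."""
    owner: str
    group: str
    owner_perm: int
    group_perm: int
    other_perm: int
    named_users: Dict[str, int] = field(default_factory=dict)
    named_groups: Dict[str, int] = field(default_factory=dict)

    def mask(self) -> int:
        # Assumption: the mask is the union of the owning-group entry and
        # all named entries (no explicitly narrowed mask is modelled).
        m = self.group_perm
        for perm in list(self.named_users.values()) + list(self.named_groups.values()):
            m |= perm
        return m

    def allows(self, user: str, groups: Set[str], wanted: int) -> bool:
        # 1. The owner is checked against the owner bits only.
        if user == self.owner:
            return (self.owner_perm & wanted) == wanted
        mask = self.mask()
        # 2. A named user entry applies next, filtered by the mask.
        if user in self.named_users:
            return (self.named_users[user] & mask & wanted) == wanted
        # 3. Owning group and named group entries, filtered by the mask.
        candidates = []
        if self.group in groups:
            candidates.append(self.group_perm)
        candidates += [p for g, p in self.named_groups.items() if g in groups]
        if candidates:
            return any((p & mask & wanted) == wanted for p in candidates)
        # 4. Everyone else falls through to the "other" bits.
        return (self.other_perm & wanted) == wanted


# hadoop fs -chown maya:sales /sales-data ; hadoop fs -chmod 640 /sales-data
sales_data = AclFile(owner="maya", group="sales",
                     owner_perm=READ | WRITE, group_perm=READ, other_perm=0)
# hdfs dfs -setfacl -m group:execs:r-- /sales-data
sales_data.named_groups["execs"] = READ

print(sales_data.allows("maya", {"sales"}, READ | WRITE))
print(sales_data.allows("ceo", {"execs"}, READ))
print(sales_data.allows("eve", {"interns"}, READ))
```

The model keeps the best-practice point from slide 31 visible: plain permission bits cover the owner, group and other cases, and the ACL entry is only needed for the additional execs group.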