End-to-End Security and Auditing in a
Big-Data-as-a-Service (BDaaS) Deployment
Abhiraj Butala – BlueData
Nanda Vijaydev - BlueData
Big-Data-as-a-Service (BDaaS)
“A mechanism for the delivery of statistical analysis tools and
information that helps organizations understand and use insights
gained from large information sets in order to gain a competitive
advantage.”
On-Demand, Self-Service, Elastic
Big Data Infrastructure, Applications,
Analytics
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
Multi-Tenant Big-Data-as-a-Service
[Diagram: three tenants (Marketing: 360 Customer View; R&D: Log Analysis; Manufacturing: Predictive Maintenance) run separate compute clusters (Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4) on top of shared Data/Storage (Data Lake, Staging)]
Multiple compute services (Hadoop, BI, Spark)
There is a shared Data Lake (shared HDFS)
Why BDaaS? – Compute Side Of The Story
• Set of applications that interact with
Hadoop keeps growing
• Various versions of the same app/distro
run in parallel
• Enterprises need to scale compute
up and down based on usage
• A model similar to Amazon AWS with S3
as storage and applications on EC2
Why BDaaS? – Data Side Of The Story
• Production cluster access takes time and
is generally restricted
• Staging clusters may not have all the data
• Data exists on other storage systems such
as NFS; Isilon is common
• Users also want to upload arbitrary files
for analysis
Hadoop – A Collection Of Services
Hadoop is a collection of storage and compute services such as HDFS, HBase,
Hive, YARN, Solr, and Kafka
Security In Hadoop
• Authenticate users into the Hadoop ecosystem
– Each service has its own integration with LDAP/AD for
authentication
• Authorize users and limit their actions to selected services.
Authorization is granted separately for each service.
Example:
– Folder “/user/customer” in HDFS grants ‘r-x’ to user ‘alice’ and
‘-wx’ to user ‘bob’
– Enable column-level access to a Hive table: “Customer.Name”
and “Customer.PhoneNumber” are accessible only to selected users
and groups
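The two examples above can be sketched as a toy model in Python. This is only an illustration of per-service authorization, not how HDFS or Hive implement it; the table names `HDFS_PERMS` and `RESTRICTED_COLUMNS`, and the choice of ‘alice’ as the only user allowed to read the restricted columns, are assumptions made for the example.

```python
# Toy model of the two authorization examples; not real HDFS/Hive code.

# HDFS side: per-user permission triplets on a folder (from the slide).
HDFS_PERMS = {"/user/customer": {"alice": "r-x", "bob": "-wx"}}

# Hive side: columns limited to selected users (user list is hypothetical).
RESTRICTED_COLUMNS = {
    "Customer.Name": {"alice"},
    "Customer.PhoneNumber": {"alice"},
}


def hdfs_allowed(path, user, action):
    """Return True if `user` holds `action` ('r', 'w' or 'x') on `path`."""
    triplet = HDFS_PERMS.get(path, {}).get(user, "---")
    return action in triplet


def column_allowed(column, user):
    """Unrestricted columns are open; restricted ones need an explicit grant."""
    allowed_users = RESTRICTED_COLUMNS.get(column)
    return allowed_users is None or user in allowed_users


if __name__ == "__main__":
    print(hdfs_allowed("/user/customer", "alice", "r"))
    print(hdfs_allowed("/user/customer", "bob", "r"))
    print(column_allowed("Customer.PhoneNumber", "bob"))
```

Each service keeps its own table here, which mirrors the point of the slide: without a central framework, the same user has to be authorized separately in every service.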
Ranger – A Pluggable Security Framework
• Ranger works with a common user DB (LDAP/AD) for authentication
• Provides a plug-in for individual Hadoop services to enable
authorization
• Allows users to define policies in a central location, using the web UI or
APIs
• Users can define their own plug-in for a custom service and manage
it centrally via Ranger Admin
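As a rough sketch of what defining a policy through the APIs could look like, the Python below only builds a policy document and prints it; it does not contact a Ranger server. The field names (`service`, `resources`, `policyItems`, `accesses`) follow our understanding of Ranger's JSON policy model and should be checked against the Ranger documentation for the version in use. The service name `datalake_hdfs` and the policy name are hypothetical.

```python
import json


def make_hdfs_policy(service, name, path, user_access):
    """Build a Ranger-style HDFS policy document.

    The structure is an assumption modeled on Ranger's JSON policies;
    `user_access` maps a user name to a list of access types.
    """
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": t, "isAllowed": True} for t in types],
            }
            for user, types in user_access.items()
        ],
    }


if __name__ == "__main__":
    policy = make_hdfs_policy(
        "datalake_hdfs",            # hypothetical Ranger service name
        "marketing-customer-data",  # hypothetical policy name
        "/user/customer",
        {"alice": ["read", "execute"], "bob": ["write", "execute"]},
    )
    print(json.dumps(policy, indent=2))
```

Keeping the policy as plain data makes it easy to review or version before it is submitted through whichever Ranger interface is used.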
Defining HDFS Ranger Policies
HDFS Policy List
Marketing Policy Drill Down
Security Considerations in BDaaS
[Diagram: the multi-tenant BDaaS layout (tenant clusters for Marketing, R&D, and Manufacturing over the shared Data Lake and Staging), annotated with three considerations]
1. User identity within the Data Lake
2. User identity in the application layer
3. User identity propagation to the data layer: prevent data
duplication and maintain user integrity across layers
1. Securing The Data Lake
[Diagram: the multi-tenant BDaaS layout with LDAP and KDC. Step 1: Authentication & Authorization for the Data Lake]
2. Securing The App Layer
[Diagram: the multi-tenant BDaaS layout with LDAP and KDC; users Alice, Bob, and Tom. Step 2: User Identity at the Application Level]
App containers are integrated with LDAP
3. Identity Propagation to Data Layer
[Diagram: the multi-tenant BDaaS layout with LDAP and KDC; users Alice, Bob, and Tom. Step 3: User Identity propagation to the Data Layer]
User Identity Propagation
Two ways:
– Users connect directly to HDFS
• Simple Authentication
• Kerberos Authentication
– Users connect to HDFS via a super-user
(Impersonation)
HDFS Direct Connections
[Diagram: users Alice, Bob, and Tom in the tenant clusters (Marketing, R&D, Manufacturing) connect directly to HDFS in the Data Lake; LDAP and KDC shown]
HDFS Direct Connections..
– hdfs-audit.log
– Ranger policies are enforced for alice and bob as they are
the effective users
HDFS Direct Connections..
• Single Hadoop Setup
– Ideal
• Multi-tenant, Multi-application Setup
– Kerberized HDFS needs kerberized compute and services
– May not want to kerberize Dev/QA setups
– Hadoop versions must be compatible across all clusters
– Data duplication
HDFS Super-user Connections
• Super-users perform actions on behalf of other users
(Impersonation/Proxying)
• Adding a new super-user is easy
– core-site.xml
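A minimal sketch of the core-site.xml entries involved, generated from Python. `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` are the standard Hadoop proxy-user properties; the super-user name `datatap`, the host, and the group names are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(superuser, hosts, groups):
    """Build the <property> entries that let `superuser` impersonate
    members of `groups` when connecting from `hosts`."""
    props = {
        f"hadoop.proxyuser.{superuser}.hosts": ",".join(hosts),
        f"hadoop.proxyuser.{superuser}.groups": ",".join(groups),
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    # Hypothetical super-user, host, and groups.
    root = proxyuser_properties("datatap", ["10.0.0.5"], ["marketing", "rnd"])
    print(ET.tostring(root, encoding="unicode"))
```

Restricting both hosts and groups keeps the impersonation scope narrow: the super-user can only act for listed groups and only from listed machines.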
HDFS Super-user Connections..
[Diagram: users Alice, Bob, and Tom in the tenant clusters (Marketing, R&D, Manufacturing) reach HDFS in the Data Lake through the DataTap Caching Service, which connects via a super-user; LDAP and KDC shown]
HDFS Super-user Connections..
– hdfs-audit.log
– Ranger Authorization policies still enforced, as alice and bob
are effective users
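To make the "effective user" idea concrete, here is a small Python sketch that pulls the effective and real users out of an audit entry. It assumes tab-separated key=value fields after `FSNamesystem.audit:` and assumes an impersonated caller is written as `bob (auth:PROXY) via datatap (auth:SIMPLE)`. Both sample lines are made up for illustration, and `datatap` is a hypothetical super-user name.

```python
import re

# "<effective> (auth:X)" optionally followed by " via <real> (auth:Y)".
UGI_RE = re.compile(
    r"^(?P<effective>\S+) \(auth:\w+\)(?: via (?P<real>\S+) \(auth:\w+\))?$"
)

# Made-up sample entries: a direct connection and a proxied one.
DIRECT = (
    "2016-06-28 10:15:01,123 INFO FSNamesystem.audit: allowed=true\t"
    "ugi=alice (auth:SIMPLE)\tip=/10.0.0.21\tcmd=open\t"
    "src=/user/customer/data.csv\tdst=null\tperm=null\tproto=rpc"
)
PROXIED = (
    "2016-06-28 10:16:44,507 INFO FSNamesystem.audit: allowed=false\t"
    "ugi=bob (auth:PROXY) via datatap (auth:SIMPLE)\tip=/10.0.0.5\tcmd=open\t"
    "src=/user/customer/data.csv\tdst=null\tperm=null\tproto=rpc"
)


def parse_audit_line(line):
    """Extract who acted (effective user) and on whose connection (real user)."""
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = dict(kv.split("=", 1) for kv in payload.strip().split("\t"))
    match = UGI_RE.match(fields["ugi"])
    if match is None:
        raise ValueError(f"unrecognized ugi field: {fields['ugi']!r}")
    return {
        "effective_user": match.group("effective"),
        "real_user": match.group("real"),  # None for direct connections
        "cmd": fields["cmd"],
        "src": fields["src"],
        "allowed": fields["allowed"] == "true",
    }


if __name__ == "__main__":
    for entry in (DIRECT, PROXIED):
        print(parse_audit_line(entry))
```

In both cases the effective user is the end user, which is why authorization policies keyed on alice and bob still apply when the connection comes from a super-user.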
HDFS Super-user Connections..
Multi-tenant, Multi-application Setup
– Works for applications which don’t support Kerberos (yet)
– Dev/Test setups need not be kerberized
– DataTap service can abstract version incompatibilities
– Can help avoid data duplication
– Need tight LDAP/AD integration though!
Ranger in Action
Hue Example
HDFS Permissions on Data Lake
• Set HDFS file
access for
‘/user/secret’ to
strict mode
• Set umask to ‘077’
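The effect of a ‘077’ umask can be worked through with a few lines of Python. The sketch assumes the usual base modes of 666 for new files and 777 for new directories; in HDFS the umask is normally set through the `fs.permissions.umask-mode` property.

```python
def apply_umask(base_mode, umask):
    """Clear every permission bit that is set in the umask."""
    return base_mode & ~umask


def to_rwx(mode):
    """Render a 9-bit mode as an 'rwxrwxrwx'-style string."""
    flags = "rwxrwxrwx"
    return "".join(
        flag if mode & (1 << (8 - i)) else "-" for i, flag in enumerate(flags)
    )


if __name__ == "__main__":
    umask = 0o077
    file_mode = apply_umask(0o666, umask)  # new file
    dir_mode = apply_umask(0o777, umask)   # new directory
    print(oct(file_mode), to_rwx(file_mode))  # 0o600 rw-------
    print(oct(dir_mode), to_rwx(dir_mode))    # 0o700 rwx------
```

With this umask, new files and directories carry no group or other permissions, so any access beyond the owner has to be granted explicitly, for example through a Ranger policy.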
HDFS Ranger Policies
DataTap Caching Service
Create Table via Hue
Query table via Hue - Success
Query table via Hue - Failure
Ranger Audit Logs
Key Takeaways
• BDaaS is more than Hadoop-as-a-Service
– Includes BI / ETL / Analytics + Data Science tools
• Security is an important consideration in BDaaS
• Data duplication is not an option
• Global user authentication using a centralized DB like
LDAP/AD is a must
• Apache Ranger helps in enforcing global policies,
provided user identities are propagated correctly
Q & A
www.bluedata.com
Nanda Vijaydev
@nandavijaydev
Abhiraj Butala
@abhirajbutala

Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Summit 2016

 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
Willis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdfWillis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdf
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
ch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ssch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ss
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
Biometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdfBiometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdf
 

Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Summit 2016

  • 1. End-to-End Security and Auditing in a Big-Data-as-a-Service (BDaaS) Deployment Abhiraj Butala – BlueData Nanda Vijaydev - BlueData
  • 2. “A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.” On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification Big-Data-as-a-Service (BDaaS)
  • 3. Multi-Tenant Big-Data-as-a-Service Data/Storage Prod 2.2 Dev/Test 2.4 POC 2.3 Prod 2.3 Dev/Test 2.4 MARKETING R&D MANUFACTURING 360 Customer View Log Analysis Predictive Maintenance Data LakeStaging Multiple compute services (Hadoop, BI, Spark) There is a shared Data Lake (Shared HDFS)
  • 4. Why BDaaS? – Compute Side Of The Story • Set of applications that interact with Hadoop keeps growing • Various versions of the same app/distro run in parallel • Enterprises need to scale compute up and down based on usage • A model similar to Amazon AWS with S3 as storage and applications on EC2
  • 5. Why BDaaS? – Data Side Of The Story • Production cluster access takes time and is generally restricted • Staging clusters may not have all the data • Data exists on other storage systems such as NFS; Isilon is common • Users also want to upload arbitrary files for analysis
  • 6. Hadoop – A Collection Of Services Hadoop is a collection of storage and compute services such as HDFS, HBase, Hive, Yarn, Solr, Kafka
  • 7. Security In Hadoop • Authenticate user into Hadoop ecosystem – Each service has its own integration with LDAP/AD for authentication • Authorize and limit their actions to selected services. Authorization is granted separately for each service. Example: – Folder “/user/customer” in HDFS has ‘r-x’ to user ‘alice’, and ‘-wx’ to user ‘bob’ – Enable column level access to a Hive Table. “Customer.Name” & “Customer.PhoneNumber” are only accessible by some users and groups
  • 8. Ranger – A Pluggable Security Framework • Ranger works with a common user DB (LDAP/AD) for authentication • Provides a plug-in for individual Hadoop services to enable authorization • Allows users to define policies in a central location, using WEB UI or APIs • Users can define their own plug-in for a custom service and manage them centrally via Ranger Admin
  • 9. Defining HDFS Ranger Policies HDFS Policy List Marketing Policy Drill Down
  • 10. Security Considerations in BDaaS [Diagram labels: Data/Storage; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; Data Lake, Staging] 1. User identity within a Data Lake 2. User identity in application layer 3. Prevent data duplication & maintain user integrity across layers
  • 11. 1. Securing The Data Lake [Diagram labels: LDAP, KDC; Data/Storage; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; Data Lake, Staging] 1. Authentication & Authorization – Data Lake 2. User Identity - Application Level 3. User Identity propagation to Data Layer
  • 12. 2. Securing The App Layer [Diagram labels: LDAP, KDC; Data/Storage; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; Data Lake, Staging; users Alice, Bob, Tom] 1. Authentication & Authorization – Data Lake 2. User Identity - Application Level 3. User Identity propagation to Data Layer App containers are integrated with LDAP, KDC
  • 13. 3. Identity Propagation to Data Layer [Diagram labels: LDAP, KDC; Data/Storage; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; Data Lake, Staging; KDC; users Alice, Bob, Tom] 1. Authentication & Authorization – Data Lake 2. User Identity - Application Level 3. User Identity propagation to Data Layer
  • 14. User Identity Propagation Two Ways –Users connect directly to HDFS • Simple Authentication • Kerberos Authentication –Users connect to HDFS via a Super-user (Impersonation)
  • 15. HDFS Direct Connections [Diagram labels: LDAP, KDC; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; KDC; users Alice, Bob, Tom; HDFS Data Lake]
  • 16. HDFS Direct Connections.. – hdfs-audit.log – Ranger policies are enforced for alice and bob as they are the effective users
  • 17. HDFS Direct Connections.. • Single Hadoop Setup – Ideal • Multi-tenant, Multi-application Setup – Kerberized HDFS needs kerberized compute and services – May not want to kerberize Dev/QA setups – Hadoop versions should be compatible all across – Data duplication
  • 18. HDFS Super-user Connections • Super-users perform actions on behalf of other users (Impersonation/Proxying) • Adding a new super-user is easy – core-site.xml
  • 19. HDFS Super-user Connections.. [Diagram labels: LDAP, KDC; Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, Dev/Test 2.4; MARKETING, R&D, MANUFACTURING; 360 Customer View, Log Analysis, Predictive Maintenance; KDC; users Alice, Bob, Tom; HDFS Data Lake; DataTap Caching Service via super-user]
  • 20. HDFS Super-user Connections.. – hdfs-audit.log – Ranger Authorization policies still enforced, as alice and bob are effective users
  • 21. HDFS Super-user Connections.. Multi-tenant, Multi-application Setup – Works for applications which don’t support Kerberos (yet) – Dev/Test setups need not be kerberized – DataTap service can abstract version incompatibilities – Can help avoid data duplication – Need tight LDAP/AD integration though!
  • 23. HDFS Permissions on Data Lake • Set HDFS file access for ‘/user/secret’ to strict mode • Set umask to ‘077’
  • 27. Query table via Hue - Success
  • 28. Query table via Hue - Failure
  • 30. Key Takeaways • BDaaS is more than Hadoop-as-a-Service – Includes BI / ETL / Analytics + Data Science tools • Security is an important consideration in BDaaS • Data duplication is not an option • Global user authentication using a centralized DB like LDAP/AD is a must • Apache Ranger helps in enforcing global policies, provided user identities are propagated correctly
  • 31. Q & A www.bluedata.com Nanda Vijaydev @nandavijaydev Abhiraj Butala @abhirajbutala
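
Slide 18 says a new super-user is added through core-site.xml. As a rough sketch of what that involves: Hadoop reads proxy-user settings from properties named `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups`. The Python snippet below only generates such a fragment. The super-user name `datatap`, the host name, and the group names are hypothetical placeholders, not values from the talk.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(superuser, hosts, groups):
    """Build a <configuration> element holding Hadoop proxy-user settings.

    hosts:  hosts from which the super-user may impersonate others.
    groups: groups whose members the super-user may impersonate.
    """
    config = ET.Element("configuration")
    settings = {
        f"hadoop.proxyuser.{superuser}.hosts": ",".join(hosts),
        f"hadoop.proxyuser.{superuser}.groups": ",".join(groups),
    }
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return config


# Hypothetical values, for illustration only.
config = proxyuser_properties(
    superuser="datatap",
    hosts=["gateway1.example.com"],
    groups=["marketing", "rnd"],
)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

Listing explicit hosts and groups, rather than the wildcard `*`, limits where impersonation can originate and whom the super-user can act as, which matters in the multi-tenant setup the slides describe.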

Editor's Notes

  1. Tom There are many definitions of BDaaS. Some say it is the combination of software and data, which can be hard to grasp. We say it is a functionality stack:
  2. This is what the audit logs for direct connections look like. Bob and Alice will have entries as highlighted above. Ranger authorization policies are enforced.
  3. Finally, to summarize the use of direct HDFS connections. Works best in a single Hadoop setup. Single Hadoop distro, Kerberos everywhere, tight coupling. May not want to kerberize Dev/QA setups. May not be practical.
  4. Standard feature supported by Hadoop ecosystem components to access HDFS data. A super-user performs operations on behalf of other users. Also known as impersonation. Typical configuration.
  5. This is what the audit logs for connections via super-users look like. Bob and Alice will have entries as highlighted above. Please note that Ranger policies are still enforced for Bob and Alice, as they are the effective users!
  6. Finally, let's look at the pros and cons of using super-users.
  7. Finally, let's demonstrate all this using Hue as an example. Here, Hue is running on one of the compute nodes in a multi-tenant environment. It is trying to access data from HDFS, for which Ranger policies are enforced. Also note that Hue is LDAP integrated.
  8. Here, HDFS path /user/secret has restricted access. Also, the HDFS umask is set to 077, so only the owner can access the data.
  9. This is how Ranger policies are defined for HDFS. We are defining who can access /user/secret path. Describe users nanda, abhiraj
  10. In our product, the HDFS caching service (DataTap), also supports impersonation. We won’t go into its details for the purpose of this talk. Typically, it is used to load remote HDFS backends as DataTaps, as shown in this picture.
  11. Using the Hive Editor in Hue, we create a table using the path provided. Explain dtap:// path. The user here is nanda, who has read/write permissions. This will succeed as Ranger policies will allow it.
  12. Now, the same user nanda queries the table and it succeeds. Note that even though the permissions are 000, Ranger allows access to nanda, so the query goes through.
  13. Next, the same operation is performed by user abhiraj. Here it fails, because Ranger does not allow abhiraj to read. Thus, Ranger policies are enforced.
  14. Finally, this is what the audit logs look like. As you can see, nanda is allowed read access and abhiraj is denied access. This shows that even though we use impersonation from remote clusters, the policies are still enforced, because the effective users are still ‘nanda’ and ‘abhiraj’.
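
The notes on audit logs turn on the difference between the effective user and the super-user that carries the request. The Python sketch below pulls both out of audit entries. The two sample lines are illustrative, written in the general style of `hdfs-audit.log` (tab-separated `key=value` fields after `FSNamesystem.audit:`), and are not copied from the talk. The super-user name `datatap`, the IP address, and the file path are hypothetical.

```python
import re

# Illustrative entries only; real logs will differ in detail.
SAMPLE_LINES = [
    "2016-06-28 10:15:02,123 INFO FSNamesystem.audit: allowed=true\t"
    "ugi=nanda (auth:PROXY) via datatap (auth:SIMPLE)\tip=/10.0.0.5\t"
    "cmd=open\tsrc=/user/secret/t1\tdst=null\tperm=null\tproto=rpc",
    "2016-06-28 10:16:40,456 INFO FSNamesystem.audit: allowed=false\t"
    "ugi=abhiraj (auth:PROXY) via datatap (auth:SIMPLE)\tip=/10.0.0.5\t"
    "cmd=open\tsrc=/user/secret/t1\tdst=null\tperm=null\tproto=rpc",
]

# "alice (auth:X)" optionally followed by " via superuser (auth:Y)".
UGI_RE = re.compile(
    r"^(?P<effective>\S+) \(auth:\w+\)(?: via (?P<real>\S+) \(auth:\w+\))?"
)


def parse_audit_line(line):
    """Extract the decision, effective user, and real (super) user."""
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = dict(item.split("=", 1) for item in payload.split("\t"))
    match = UGI_RE.match(fields["ugi"])
    return {
        "allowed": fields["allowed"] == "true",
        "effective_user": match.group("effective"),
        "real_user": match.group("real"),  # None for a direct connection
        "cmd": fields["cmd"],
        "src": fields["src"],
    }


records = [parse_audit_line(line) for line in SAMPLE_LINES]
for record in records:
    print(record)
```

In this sketch the access decision is recorded against `nanda` and `abhiraj`, while `datatap` appears only as the carrier of the request, which mirrors the point that policies apply to the effective users.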