Copyright 2016-2018 Northrop Grumman Systems Corp.
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company
June 2018
Leon Li, PhD
Platform Architect
About Northrop Grumman
Leading global security company
Approximately 70,000 employees in all 50 states and 25+ countries
Technology Heritage
• 1927 - Spirit of St. Louis, which Charles Lindbergh flew across the Atlantic
• 1946 - First flight of the XB-35 flying wing
• 1953 - Contract to oversee the U.S. Air Force ICBM program
• 1958 - Pioneer 1 becomes the first spacecraft built by an industrial contractor
• 1969 - Apollo Lunar Module carries man to the surface of the moon
• 1983 - Pioneer 10 becomes the first manmade object to leave the solar system
• 1989 - First flight of the B-2 stealth bomber, a descendant of the XB-35 flying wing design
• 1998 - First flight of the RQ-4A Global Hawk, a high-altitude, long-endurance unmanned aerial
reconnaissance system
• NASA’s James Webb Space Telescope: unprecedented resolution and sensitivity space telescope to
observe most distant events and objects in the universe
Northrop Grumman’s Enterprise Data Analytics Platform
• Platform provides analytics capabilities quickly and simply,
allowing users to focus on their needs instead of
infrastructure
• Provides capabilities from basic data handling and
reporting through big data and state-of-the-art machine
learning
• Single, large shared Hadoop cluster with multitenant
security as Data Lake
6/20/2018
Northrop Grumman Does Not Endorse Any of the Products Mentioned in this Presentation
Northrop Grumman’s participation in this user conference and mention of various products is not an endorsement of any product.
This presentation is protected by copyright protections and may not be commercially used. The only permitted uses of this presentation are for personal, non-commercial uses of the user community.
Architecture Decisions and Tradeoffs
1. Authentication and Accounts Management
2. Authorization and Access Control
3. Interfacing Analytics Applications with a Data Lake
Authentication and Accounts Management
Hadoop
• Hadoop requires a large number of Kerberos accounts
– Each user uses a Kerberos account
– Each Hadoop service uses one Kerberos account per machine
• Secure creation of Hadoop service accounts and distribution of keytabs is crucial for security and operations
Enterprise Identity and Access Management (IAM)
• Enterprise IAM systems
o Provisioning and deprovisioning of accounts
o Management of user accounts
o Compliance with enterprise security policies
• Uses Kerberos as underlying technology
• Enterprises have existing IAM systems in place
• New systems for enterprise use need to meet existing IAM policies for security, compliance, and governance
Diagram: the Key Distribution Center (IAM) authenticates the User and Hadoop
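To give a feel for why Hadoop needs so many Kerberos accounts, the sketch below enumerates one service principal per service per machine, using the common `service/host@REALM` naming form. The service names, host names, and realm are hypothetical, and the sketch assumes every service runs on every machine, which a real cluster usually does not.

```python
from itertools import product


def service_principals(services, hosts, realm):
    """Return one principal per (service, host) pair in service/host@REALM form."""
    return [f"{svc}/{host}@{realm}" for svc, host in product(services, hosts)]


# Hypothetical inventory: three services on four machines.
services = ["nn", "dn", "hive"]
hosts = [f"node{i}.example.com" for i in range(1, 5)]
principals = service_principals(services, hosts, "HADOOP.EXAMPLE.COM")

print(len(principals))   # number of service accounts (and keytab entries) to manage
print(principals[0])
```

Each of these principals also needs a keytab delivered securely to its machine, which is why the choice of who creates and distributes them matters in the approaches that follow.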
Approach 1 – Completely isolated accounts management
Benefits:
• No dependency on enterprise accounts management teams
• Faster standup of platform
• Data Lake team has full control over all accounts
• Your Hadoop management tools can easily generate and distribute Hadoop service accounts
Disadvantages:
• Users have to remember separate usernames and passwords
• Must manage user accounts separately (e.g. provisioning, deprovisioning, passwords, account lockouts)
• Establishing separate user accounts for separate systems goes against security best practices
Diagram (labels): Enterprise Systems, Enterprise IAM; Hadoop Cluster, Hadoop IAM; Data Operations; User & Hadoop Service Accounts
Approach 2 – Store all accounts directly into existing Enterprise IAM
Benefits:
• Unified account management within enterprise IT organization
• Improved user experience with SSO
• Improved central auditing and compliance
• No need to manage a separate Hadoop IAM
– Reduces potential points of system failure
Disadvantages:
• Your Hadoop management tools may not have administrative access to Enterprise IAM systems, or may not generate Hadoop service accounts in compliance with Enterprise IAM team rules
• Increased dependency on Enterprise IAM administrator team
• Recommendation: Consider OU delegation to isolate service account generation and management for Hadoop
• Greater performance load on the Enterprise Directory Service as Hadoop grows
Diagram (labels): Enterprise Systems, Enterprise IAM; Hadoop Cluster; Data Operations; Hadoop Service Accounts
Approach 3 – Hybrid Approach: Separate service accounts from user accounts using a domain trust
Benefits:
• Unified user account management within enterprise IT organization
• Improved user experience with SSO
• Improved central auditing and compliance
• Easier administration and maintenance as your Hadoop cluster management tools maintain control of Hadoop internal service accounts
Disadvantages:
• Kerberos trust setup can be complex and requires special skills
• If Enterprise IAM and Hadoop IAM run on different operating systems, incompatibilities can occur
• Applications deployed to the Hadoop IAM realm may run into “Kerberos double-hop” delegation issues when authenticating users to Hadoop services
Diagram (labels): Enterprise Systems, Enterprise IAM; Hadoop Cluster, Hadoop IAM; Data Operations; Hadoop Service Accounts; Kerberos Realm Trust
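As a rough illustration of the hybrid approach, the sketch below shows two pieces that a realm trust typically involves: the cross-realm `krbtgt` principal that lets users from the enterprise realm obtain tickets for services in the Hadoop realm, and the mapping of a full principal to a short user name once its realm is trusted. Realm names are hypothetical, and `short_name` is a simplified stand-in for the principal-to-user mapping Hadoop performs, not its actual rule syntax.

```python
def cross_realm_principal(user_realm, service_realm):
    """Name of the krbtgt principal used when users in user_realm
    request tickets for services that live in service_realm."""
    return f"krbtgt/{service_realm}@{user_realm}"


def short_name(principal, trusted_realms):
    """Map a Kerberos principal to a short user name.

    Principals from realms outside trusted_realms are rejected.
    """
    name, _, realm = principal.partition("@")
    if realm not in trusted_realms:
        raise ValueError(f"untrusted realm: {realm!r}")
    # Drop the host part of service principals such as hive/node1@REALM.
    return name.split("/")[0]


# Hypothetical realms: users live in CORP, Hadoop services in HADOOP.
trusted = {"CORP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM"}
print(cross_realm_principal("CORP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM"))
print(short_name("alice@CORP.EXAMPLE.COM", trusted))
```

The point of the split is visible in the names: user principals stay in the enterprise realm, service principals stay in the Hadoop realm, and only the trust principal spans both.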
Summary: Approaches to Hadoop – Enterprise IAM Integration
1) Completely isolated accounts management
2) Store all accounts directly into existing Enterprise IAM
3) Hybrid Approach: Separate service accounts from user accounts using a domain trust
Architecture Decisions and Tradeoffs
1. Authentication and Accounts Management
2. Authorization and Access Control
3. Interfacing Analytics Applications with a Data Lake
Hadoop Authorization Plugin Systems Architecture
Examples: Ranger, Sentry
• Hadoop authorization systems add plugins to many Hadoop components to control authorization
• An Administration Portal and Administration API provide the ability to control authorizations
* Not every authorization plugin system supports plugins for every Hadoop component
Diagram: Administration Portal and Administration API, with one plugin in each component (HDFS, Hive, HBase, YARN, NiFi, Storm, Kafka, Knox, Solr, Impala, Atlas)
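The plugin architecture can be pictured with a toy model: a central policy store that administrators edit, and a small plugin inside each component that consults the policies for that component. This is only a sketch of the pattern; the class names, fields, and matching rule are hypothetical and are not the API of Ranger or Sentry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    component: str      # e.g. "hive" or "hdfs"
    resource: str       # resource prefix the policy covers
    user: str
    actions: frozenset  # actions the user may perform


class PolicyStore:
    """Stands in for the central administration portal/API."""

    def __init__(self):
        self._policies = []

    def add(self, policy):
        self._policies.append(policy)

    def for_component(self, component):
        return [p for p in self._policies if p.component == component]


class Plugin:
    """Stands in for the plugin embedded in one Hadoop component."""

    def __init__(self, component, store):
        self.component = component
        self.store = store

    def is_allowed(self, user, resource, action):
        # Deny by default; allow only when some policy matches.
        return any(
            p.user == user
            and resource.startswith(p.resource)
            and action in p.actions
            for p in self.store.for_component(self.component)
        )


store = PolicyStore()
store.add(Policy("hive", "sales.", "alice", frozenset({"select"})))

hive_plugin = Plugin("hive", store)
hdfs_plugin = Plugin("hdfs", store)
print(hive_plugin.is_allowed("alice", "sales.orders", "select"))
print(hdfs_plugin.is_allowed("alice", "/data/sales", "read"))
```

A component without a plugin never consults the store at all, which is the practical meaning of the footnote about incomplete plugin coverage.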
Authorization Plug-ins for Hive and Spark
Managing Permissions for Hive and Spark
HDFS permissions for Hive and Spark
• Use only HDFS permissions for both Spark and Hive
• Benefits
o Simple security model, easier to implement
o Consistent user access for Hive and Spark
• Disadvantages
o No fine-grained controls such as column-based security in Hive
HDFS permissions for Spark only
• Use Hive column-based security for Hive, and HDFS security for Spark
• Benefits
o Fine-grained controls such as column-based security in Hive
• Disadvantages
o Spark access is granted separately
o More administrative complexity
o Must address the discrepancy in access control between Hive and Spark (e.g. how to make sure fine-grained control is enforced for Spark users)
*Note: LLAP may improve this situation when it becomes fully enterprise ready
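The discrepancy between Hive and Spark access can be shown with a small model. Here a hypothetical table is stored in one file: Hive applies a per-user column policy, while the Spark path only checks whether the user may read the file. All names and data are invented for illustration, and the two functions merely imitate the two enforcement points.

```python
# Hypothetical table backed by a single HDFS file.
ROWS = [
    {"name": "Ann", "dept": "R&D", "salary": 100},
    {"name": "Bob", "dept": "Ops", "salary": 90},
]
HIVE_ALLOWED_COLUMNS = {"alice": {"name", "dept"}}  # column-level policy
HDFS_READERS = {"alice"}                            # file-level permission


def hive_select(user, columns):
    """Column-level check, as a Hive authorization plugin might enforce."""
    denied = set(columns) - HIVE_ALLOWED_COLUMNS.get(user, set())
    if denied:
        raise PermissionError(f"{user} may not read columns {sorted(denied)}")
    return [{c: row[c] for c in columns} for row in ROWS]


def spark_read_file(user):
    """File-level check only: whoever can read the file sees every column."""
    if user not in HDFS_READERS:
        raise PermissionError(f"{user} may not read the file")
    return [dict(row) for row in ROWS]


print(hive_select("alice", ["name", "dept"]))
print(spark_read_file("alice")[0])  # includes the salary column
```

If Alice needs file-level read access for Spark, the column restriction in Hive no longer protects the salary column, so the two grants have to be designed together.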
Architecture Decisions and Tradeoffs
1. Authentication and Accounts Management
2. Authorization and Access Control
3. Interfacing Analytics Applications with a Data Lake
Hadoop Analytics Tools
Graphical Analytics Tools
…More accessible to general users
Examples
• Data Science Notebooks
• Business Intelligence and Visualization Tools
Command Line Hadoop Tools
…Powerful but Not Intuitive
Security Considerations for Integrating Analytics Tools
• Users only have access to their files, databases, and analytics processes in the Data Lake
• Users run analytics (Hive queries, Pig jobs, Spark jobs, etc.) as themselves in the Data Lake
Hadoop Secure Impersonation
• A Hadoop superuser can submit jobs or access data on behalf of another user
Applications:
• Web based applications
Cautions:
• Limit superusers to trusted applications only
Diagram: Alice presents her credentials to the Web Application; the Web Application calls Hadoop with the application's credentials and "doAs: Alice"; Hadoop applies the authorization controls for Alice
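The caution about limiting superusers maps onto Hadoop's proxy-user settings, which restrict from which hosts a superuser may impersonate and which groups' members it may impersonate. The sketch below imitates that check in plain Python. The property names follow the `hadoop.proxyuser.<superuser>.hosts` and `.groups` convention, but the function itself, the account `webapp`, and the host and group values are hypothetical and simplified.

```python
def _items(value):
    """Split a comma-separated property value into a set of entries."""
    return {v.strip() for v in value.split(",") if v.strip()}


def may_impersonate(conf, superuser, target_groups, request_host):
    """Return True if superuser may act for a user in target_groups
    when the request arrives from request_host."""
    hosts = conf.get(f"hadoop.proxyuser.{superuser}.hosts", "")
    groups = conf.get(f"hadoop.proxyuser.{superuser}.groups", "")
    host_ok = hosts.strip() == "*" or request_host in _items(hosts)
    group_ok = groups.strip() == "*" or bool(_items(groups) & set(target_groups))
    return host_ok and group_ok


# Hypothetical settings: the web application may impersonate analysts,
# and only from its own server.
conf = {
    "hadoop.proxyuser.webapp.hosts": "app1.example.com",
    "hadoop.proxyuser.webapp.groups": "analysts",
}
print(may_impersonate(conf, "webapp", ["analysts"], "app1.example.com"))
print(may_impersonate(conf, "webapp", ["analysts"], "laptop.example.com"))
```

Keeping both lists narrow means a compromised application account can only act for a bounded set of users from a known server.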
Direct Kerberos Authentication
Applications:
• Hadoop command-line tools (hdfs, beeline, Spark shell)
• Some data science notebook tools
Cautions:
• Users may need to be trained to be Kerberos aware
• Can be difficult with some choices of IAM systems in the hybrid identity management approach
Diagram: from Alice's workstation, Alice obtains a TGT and service tickets from the KDC server and presents them to Hadoop, which applies the authorization controls for Alice
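One way to soften the "Kerberos aware" training burden is a preflight check that tells the user to run `kinit` before a tool fails with an obscure error. The sketch below assumes MIT Kerberos client tools, where `klist -s` prints nothing and exits with status 0 only when the credentials cache is readable and unexpired. The function names are hypothetical, and the `run` parameter exists only so the check can be exercised without a real KDC.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if `klist -s` reports a usable credentials cache."""
    try:
        return run(["klist", "-s"]).returncode == 0
    except FileNotFoundError:
        # Kerberos client tools are not installed on this machine.
        return False


def preflight(run=subprocess.run):
    """Return a short message a wrapper script could show before launching a tool."""
    if has_valid_ticket(run):
        return "ok"
    return "No valid Kerberos ticket found; run `kinit` and retry."
```

A wrapper around `beeline` or a notebook launcher could call `preflight()` first and print the message instead of starting the tool.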
Direct Kerberos Authentication + Saved Credentials
Applications:
– Some commercial data science platforms
Benefits:
– Kerberos authentication becomes transparent to the user, improving user experience
– Linux container security simplifies isolation of many analytics applications
Cautions:
• Users’ Kerberos credentials are saved on servers long term
• Understand which servers and persistence stores save these passwords and take security precautions to minimize risks
Diagram: Alice's session reaches the Application Server, which holds Alice's saved password or keytab; the server starts a Linux container instance with Alice's Kerberos tickets (TGT and service tickets from the KDC server); Hadoop applies the authorization controls for Alice
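One basic precaution for saved credentials is to audit the files that hold them. The sketch below checks that a keytab (or similar credential file) is accessible to its owner only and is owned by the account running the check. It assumes a POSIX system, and the function name and the specific findings are illustrative rather than a complete hardening checklist.

```python
import os
import stat


def keytab_problems(path):
    """Return a list of findings for a saved keytab or credential file."""
    st = os.stat(path)
    problems = []
    # Any permission bit for group or others is a finding.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("accessible by group or others; expected owner-only (e.g. 0600)")
    # POSIX only: compare the file owner with the current user.
    if st.st_uid != os.getuid():
        problems.append("owned by a different user than the one running this check")
    return problems
```

Running such a check from the application server's deployment or monitoring scripts gives an early warning when a credential file is copied or restored with loose permissions.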
Edge Proxy Gateway
• Example: Knox
Applications:
- Custom scripts calling Hadoop functionality
- ODBC/JDBC Data Sources
- Self-Service Business Intelligence Tools
Cautions:
• Not every application supports this method
• Performance challenges
Pluggable Auth Providers:
- LDAP
- PAM
- HadoopAuth
- SSO Cookie
- JWT Provider
- Claims (CAS/Auth/SAML/OpenID)
Diagram: Alice connects through the Edge Proxy Gateway to Hadoop, which applies the authorization controls for Alice
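For a custom script, going through an edge gateway mostly means calling a different URL with ordinary HTTP authentication instead of Kerberos. The sketch below builds, without sending, a WebHDFS request routed through a Knox-style gateway. It assumes the gateway's default `gateway` path segment and HTTP Basic authentication backed by a provider such as LDAP. The host, topology name, and credentials are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode


def knox_webhdfs_request(gateway, topology, hdfs_path, user, password, op="LISTSTATUS"):
    """Build (but do not send) a WebHDFS request that goes through the gateway."""
    url = (
        f"https://{gateway}/gateway/{topology}/webhdfs/v1"
        f"{quote(hdfs_path)}?{urlencode({'op': op})}"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical gateway, topology, and account.
req = knox_webhdfs_request("knox.example.com:8443", "default", "/user/alice", "alice", "secret")
print(req.full_url)
```

The script never handles Kerberos tickets itself; the gateway authenticates the caller and forwards the request to the cluster, which is also where the performance caution comes from, since all traffic passes through it.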
Methods of application authentication and impersonation in Hadoop
• Hadoop User Impersonation
• Direct Kerberos Authentication
• Direct Kerberos Authentication + Saved Credentials
• Edge Proxy Gateway
Architecture Decisions and Tradeoffs
1. Authentication and Accounts Management
2. Authorization and Access Control
3. Interfacing Analytics Applications with a Data Lake
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company with Strict IT Security and Diverse Analytics Use Cases

More Related Content

What's hot

What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4DataWorks Summit
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopDataWorks Summit
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationDataWorks Summit
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingDataWorks Summit
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...DataWorks Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on DockerDataWorks Summit
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache AmbariDataWorks Summit
 
Scalable and adaptable typosquatting detection in Apache Metron
Scalable and adaptable typosquatting detection in Apache MetronScalable and adaptable typosquatting detection in Apache Metron
Scalable and adaptable typosquatting detection in Apache MetronDataWorks Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 DataWorks Summit
 

What's hot (20)

Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in Hadoop
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integration
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
 
Scalable and adaptable typosquatting detection in Apache Metron
Scalable and adaptable typosquatting detection in Apache MetronScalable and adaptable typosquatting detection in Apache Metron
Scalable and adaptable typosquatting detection in Apache Metron
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3
 

Similar to How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company with Strict IT Security and Diverse Analytics Use Cases

A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterForgeRock
 
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...Amazon Web Services
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Gcp intro-20160721
Gcp intro-20160721Gcp intro-20160721
Gcp intro-20160721Haeseung Lee
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Hellmar Becker
 
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdf
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdfNirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdf
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdfAmazon Web Services
 
Adapting to Meet Today’s Trends and Technologies– Compliance vs. Enforcement
Adapting to Meet Today’s Trends and Technologies– Compliance vs. EnforcementAdapting to Meet Today’s Trends and Technologies– Compliance vs. Enforcement
Adapting to Meet Today’s Trends and Technologies– Compliance vs. EnforcementFlexera
 
Best Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and ComplianceBest Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and ComplianceRightScale
 
Admin Features Upgraded in Cognos 11.1
Admin Features Upgraded in Cognos 11.1Admin Features Upgraded in Cognos 11.1
Admin Features Upgraded in Cognos 11.1Senturus
 
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...Amazon Web Services
 
Who Broke My Cloud? SaaS Monitoring Best Practices
Who Broke My Cloud? SaaS Monitoring Best PracticesWho Broke My Cloud? SaaS Monitoring Best Practices
Who Broke My Cloud? SaaS Monitoring Best PracticesThousandEyes
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextHellmar Becker
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersDavid Walker
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Amazon Web Services
 
Too Many Tools - How AWS Systems Manager Bridges Operational Models
Too Many Tools - How AWS Systems Manager Bridges Operational ModelsToo Many Tools - How AWS Systems Manager Bridges Operational Models
Too Many Tools - How AWS Systems Manager Bridges Operational ModelsAmazon Web Services
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
 

Similar to How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company with Strict IT Security and Diverse Analytics Use Cases (20)

A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over Perimeter
 
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...
Migrate to a Fully Managed Application Streaming Service on AWS with AppStrea...
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Gcp intro-20160721
Gcp intro-20160721Gcp intro-20160721
Gcp intro-20160721
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)
 
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdf
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdfNirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdf
Nirav Kothari: Well-Architected - Operational Excellence Instructor Led Lab.pdf
 
Adapting to Meet Today’s Trends and Technologies– Compliance vs. Enforcement
Adapting to Meet Today’s Trends and Technologies– Compliance vs. EnforcementAdapting to Meet Today’s Trends and Technologies– Compliance vs. Enforcement
Adapting to Meet Today’s Trends and Technologies– Compliance vs. Enforcement
 
Best Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and ComplianceBest Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and Compliance
 
Admin Features Upgraded in Cognos 11.1
Admin Features Upgraded in Cognos 11.1Admin Features Upgraded in Cognos 11.1
Admin Features Upgraded in Cognos 11.1
 
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...
Getting on C2S: Lessons Learned Migrating Space Operational Systems to the Cl...
 
Who Broke My Cloud? SaaS Monitoring Best Practices
Who Broke My Cloud? SaaS Monitoring Best PracticesWho Broke My Cloud? SaaS Monitoring Best Practices
Who Broke My Cloud? SaaS Monitoring Best Practices
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
 
Too Many Tools - How AWS Systems Manager Bridges Operational Models
Too Many Tools - How AWS Systems Manager Bridges Operational ModelsToo Many Tools - How AWS Systems Manager Bridges Operational Models
Too Many Tools - How AWS Systems Manager Bridges Operational Models
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company with Strict IT Security and Diverse Analytics Use Cases

  • 1. How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Company
       June 2018
       Leon Li, PhD, Platform Architect
       Copyright 2016-2018 Northrop Grumman Systems Corp.
  • 2. About Northrop Grumman
       Leading global security company
       Approximately 70,000 employees in all 50 states and 25+ countries
       Technology Heritage
       • 1927 - Spirit of St. Louis, which Charles Lindbergh flew across the Atlantic
       • 1946 - First flight of the XB-35 flying wing
       • 1953 - Contract to oversee the U.S. Air Force ICBM program
       • 1958 - Pioneer 1 becomes the first spacecraft built by an industrial contractor
       • 1969 - Apollo Lunar Module carries man to the surface of the moon
       • 1983 - Pioneer 10 becomes the first manmade object to leave the solar system
       • 1989 - First flight of the B-2 stealth bomber, a descendant of the XB-35 flying wing design
       • 1998 - First flight of the RQ-4A Global Hawk, a high-altitude, long-endurance unmanned aerial reconnaissance system
       • NASA’s James Webb Space Telescope: a space telescope with unprecedented resolution and sensitivity, built to observe the most distant events and objects in the universe
  • 3. Northrop Grumman’s Enterprise Data Analytics Platform
       • Platform provides analytics capabilities quickly and simply, allowing users to focus on their needs instead of infrastructure
       • Provides capabilities from basic data handling and reporting through big data and state-of-the-art machine learning
       • Single, large shared Hadoop cluster with multitenant security as Data Lake
       6/20/2018
  • 4. Northrop Grumman Does Not Endorse Any of the Products Mentioned in this Presentation
       Northrop Grumman’s participation in this user conference and mention of various products is not an endorsement of any product. This presentation is protected by copyright protections and may not be commercially used. The only permitted uses of this presentation are for personal, non-commercial uses of the user community.
  • 5. Architecture Decisions and Tradeoffs
       1. Authentication and Accounts Management
       2. Authorization and Access Control
       3. Interfacing Analytics Applications with a Data Lake
  • 6. Authentication and Accounts Management
       Hadoop
       • Hadoop requires a large number of Kerberos accounts
         – Each user uses a Kerberos account
         – Each Hadoop service uses one Kerberos account per machine
       • Secure creation of Hadoop service accounts and distribution of keytabs is crucial for security and operations
       Enterprise Identity and Access Management (IAM)
       • Enterprise IAM systems
         o Provisioning and deprovisioning of accounts
         o Management of user accounts
         o Compliance with enterprise security policies
       • Uses Kerberos as underlying technology
       • Enterprises have existing IAM systems in place
       • New systems for enterprise use need to meet existing IAM policies for security, compliance, and governance
       [Diagram: User, Key Distribution Center (IAM), and Hadoop, connected by arrows labeled "Authenticates"]
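To make the "large number of Kerberos accounts" point concrete, here is a minimal Python sketch that enumerates the principals a small cluster would need. It follows the usual `service/host@REALM` naming for service principals. The realm, host names, service list, and both helper functions are hypothetical, and the sketch assumes every service runs on every host, which real clusters rarely do.

```python
from itertools import product

def service_principals(services, hosts, realm):
    """One principal per (service, machine) pair, named service/host@REALM."""
    return [f"{svc}/{host}@{realm}" for svc, host in product(services, hosts)]

def user_principals(users, realm):
    """One principal per user, named user@REALM."""
    return [f"{user}@{realm}" for user in users]

# Hypothetical three-node cluster running three services on every node.
REALM = "HADOOP.EXAMPLE.COM"
services = ["hdfs", "yarn", "hive"]
hosts = [f"node{i}.example.com" for i in range(1, 4)]
users = ["alice", "bob"]

principals = service_principals(services, hosts, REALM) + user_principals(users, REALM)
print(len(principals), "Kerberos accounts needed")
```

Service accounts grow with services times machines, so even a modest cluster quickly needs more accounts (and keytabs) than it has users.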
  • 7. Approach 1 – Completely isolated accounts management
       [Diagram: Enterprise Systems with Enterprise IAM; Hadoop Cluster with its own Hadoop IAM holding User & Hadoop Service Accounts; Data Operations]
       Benefits:
       • No dependency on enterprise accounts management teams
       • Faster standup of platform
       • Data Lake team has full control over all accounts
       • Your Hadoop management tools can easily generate and distribute Hadoop service accounts
       Disadvantages:
       • Users have to remember separate usernames and passwords
       • Must manage user accounts separately (e.g. provisioning, deprovisioning, passwords, account lockouts, etc.)
       • Against security best practices to establish separate user accounts for separate systems
  • 8. Approach 2 – Store all accounts directly into existing Enterprise IAM
       [Diagram: Enterprise Systems with Enterprise IAM holding the Hadoop Service Accounts; Hadoop Cluster; Data Operations]
       Benefits:
       • Unified account management within enterprise IT organization
       • Improved user experience with SSO
       • Improved central auditing and compliance
       • No need to manage a separate Hadoop IAM
         – Reduces potential points of system failure
       Disadvantages:
       • Your Hadoop management tools may not have administrative access to Enterprise IAM systems, or may not generate Hadoop service accounts in compliance with Enterprise IAM team rules
       • Increased dependency on Enterprise IAM administrator team
       • Recommendation: Consider OU delegation to isolate service account generation and management for Hadoop
       • Greater performance load on the Enterprise Directory Service as Hadoop grows
  • 9. Approach 3 – Hybrid Approach: Separate service accounts from user accounts using a domain trust
       [Diagram: Enterprise Systems with Enterprise IAM; Hadoop Cluster with Hadoop IAM holding the Hadoop Service Accounts; a Kerberos Realm Trust between the two IAM systems; Data Operations]
       Benefits:
       • Unified user account management within enterprise IT organization
       • Improved user experience with SSO
       • Improved central auditing and compliance
       • Easier administration and maintenance, as your Hadoop cluster management tools maintain control of Hadoop internal service accounts
       Disadvantages:
       • Kerberos trust setup can be complex and requires special skills
       • If the Enterprise IAM and Hadoop IAM run in mixed operating system environments, incompatibilities can occur
       • Applications deployed to the Hadoop IAM realm may run into “Kerberos double-hop” delegation issues when authenticating users to Hadoop services
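The hybrid approach can be sketched in a few lines of Python. The sketch assumes a one-way trust in which the Hadoop realm accepts users from the enterprise realm. In standard Kerberos cross-realm setups this is represented by a shared `krbtgt/<service realm>@<client realm>` principal. Hadoop then maps full principals to short user names (the job of its `hadoop.security.auth_to_local` rules). The realm names and all three functions are hypothetical and only model the idea; they are not Hadoop's mapping logic.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm)."""
    name, _, realm = principal.rpartition("@")
    primary, _, instance = name.partition("/")
    return primary, instance, realm

def cross_realm_tgt(client_realm, service_realm):
    """Principal shared by both KDCs so client_realm users can reach service_realm."""
    return f"krbtgt/{service_realm}@{client_realm}"

def short_name(principal, accepted_realms):
    """Map a principal to a local user name, rejecting realms that are not trusted."""
    primary, _, realm = parse_principal(principal)
    if realm not in accepted_realms:
        raise PermissionError(f"realm {realm!r} is not trusted")
    return primary

# Hypothetical realms: users live in CORP, Hadoop services in HADOOP.
ENTERPRISE = "CORP.EXAMPLE.COM"
HADOOP = "HADOOP.EXAMPLE.COM"
accepted = {ENTERPRISE, HADOOP}

print(cross_realm_tgt(ENTERPRISE, HADOOP))
print(short_name("alice@CORP.EXAMPLE.COM", accepted))
print(short_name("hive/node1.example.com@HADOOP.EXAMPLE.COM", accepted))
```

User accounts stay in the enterprise realm, service accounts stay in the Hadoop realm, and the trust principal is the only object both sides must agree on.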
  • 10. Summary: Approaches to Hadoop – Enterprise IAM Integration
       1) Completely isolated accounts management
       2) Store all accounts directly into existing Enterprise IAM
       3) Hybrid Approach: Separate service accounts from user accounts using a domain trust
  • 11. Architecture Decisions and Tradeoffs
       1. Authentication and Accounts Management
       2. Authorization and Access Control
       3. Interfacing Analytics Applications with a Data Lake
  • 12. Hadoop Authorization Plugin Systems Architecture
       Examples: Ranger, Sentry
       • Hadoop authorization systems add plugins to many Hadoop components to control authorizations
       • An Administration Portal and Administration API provide the ability to control authorizations
       [Diagram: Administration Portal and Administration API connected to a plugin in each component: HDFS, Hive, HBase, YARN, NiFi, Storm, Kafka, Knox, Solr, Impala, Atlas]
       * Not every authorization plugin system supports plugins for every Hadoop component
  • 13. Authorization Plug-ins for Hive and Spark
  • 14. Managing Permissions for Hive and Spark
       HDFS permissions for Hive and Spark
       • Only use HDFS permissions for both Spark and Hive
       • Benefits
         o Simple security model, easier to implement
         o Consistent user access for Hive and Spark
       • Disadvantages
         o No fine-grained controls like column-based security in Hive
       HDFS permissions for Spark only
       • Use Hive column-based security for Hive, and HDFS security for Spark
       • Benefits
         o Fine-grained controls like column-based security in Hive
       • Disadvantages
         o Spark access is granted separately
         o More administrative complexity
         o Must address the discrepancy in access control between Hive and Spark (e.g. how to make sure fine-grained control is enforced for Spark users)
       *Note: LLAP may improve this situation when it becomes fully enterprise ready
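The discrepancy named in the last disadvantage can be illustrated with a toy Python model. It is not Hive, Spark, Ranger, or Sentry code: the table, the policy dictionaries, and both reader functions are hypothetical stand-ins. The column policy is applied only on the Hive path, while the Spark path is gated by file-level (HDFS-style) read access alone.

```python
# Hypothetical table contents stored as files in the data lake.
TABLE = [
    {"name": "alice", "dept": "eng", "salary": 100},
    {"name": "bob", "dept": "ops", "salary": 90},
]

# Hypothetical policies: column-level for Hive, file-level for Spark.
HIVE_COLUMN_POLICY = {"analyst": {"name", "dept"}}
HDFS_READERS = {"analyst"}

def read_via_hive(user, rows):
    """Hive path: only the columns granted to the user are returned."""
    allowed = HIVE_COLUMN_POLICY.get(user, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

def read_via_spark(user, rows):
    """Spark path: file read access is all-or-nothing, so every column is visible."""
    if user not in HDFS_READERS:
        raise PermissionError(f"{user} cannot read the underlying files")
    return [dict(row) for row in rows]

print(read_via_hive("analyst", TABLE)[0])
print(read_via_spark("analyst", TABLE)[0])
```

In this model the same user sees `salary` through Spark but not through Hive, which is the gap an administrator has to close, for example by withholding file access or by storing restricted columns separately.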
  • 15. Architecture Decisions and Tradeoffs
       1. Authentication and Accounts Management
       2. Authorization and Access Control
       3. Interfacing Analytics Applications with a Data Lake
  • 16. Hadoop Analytics Tools
       Command Line Hadoop Tools
       …Powerful but Not Intuitive
       Graphical Analytics Tools
       …More accessible to general users
       Examples
       • Data Science Notebooks
       • Business Intelligence and Visualization Tools
  • 17. Security Considerations for Integrating Analytics Tools
       • Users only have access to their files, databases, and analytics processes in the Data Lake
       • Users run analytics (Hive queries, Pig jobs, Spark jobs, etc.) as themselves in the Data Lake
  • 18. Hadoop Secure Impersonation
       • A Hadoop superuser can submit jobs or access data on behalf of another user
       [Diagram: Alice sends Alice’s credentials to the Web Application; the Web Application calls Hadoop with the Application’s credentials and "doAs: Alice"; Hadoop applies authorization controls for Alice]
       Applications:
       • Web based applications
       Cautions:
       • Limit superusers to trusted applications only
  • 19. Direct Kerberos Authentication
       [Diagram: Alice, on Alice’s workstation, obtains a TGT and Service Tickets from the KDC Server and presents them to Hadoop; Hadoop applies authorization controls for Alice]
       Applications:
       • Hadoop command-line tools (hdfs, beeline, Spark shell)
       • Some data science notebook tools
       Cautions:
       • Users may need to be trained to be Kerberos aware
       • Can be difficult with some choices of IAM systems in the hybrid identity management approach
  • 20. Direct Kerberos Authentication + Saved Credentials
       [Diagram: Alice gives Alice’s password or keytab to the Application Server (saved password or keytab); the server starts a Linux Container for Alice’s session; the container instance holds Alice’s Kerberos tickets (TGT and Service Tickets from the KDC Server) and accesses Hadoop, which applies authorization controls for Alice]
       Applications:
       – Some commercial data science platforms
       Benefits:
       – Kerberos authentication becomes transparent to the user, improving user experience
       – Linux Container security simplifies isolation of many analytics applications
       Cautions:
       • Users’ Kerberos credentials are saved on servers long term
       • Understand which servers and persistence stores save these passwords and take security precautions to minimize risks
  • 21. Edge Proxy Gateway
       • Example: Knox
       [Diagram: Alice connects to the Edge Proxy Gateway, which authenticates her through pluggable auth providers and forwards requests to Hadoop; Hadoop applies authorization controls for Alice]
       Pluggable Auth Providers: LDAP, PAM, HadoopAuth, SSO Cookie, JWT Provider, Claims (CAS/Auth/SAML/OpenID)
       Applications:
       - Custom scripts calling Hadoop functionality
       - ODBC/JDBC Data Sources
       - Self-Service Business Intelligence Tools
       Cautions:
       • Not every application supports this method
       • Performance challenges
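As an illustration of "custom scripts calling Hadoop functionality" through a gateway, the Python sketch below builds a WebHDFS request aimed at a Knox-style gateway. It assumes the URL layout from the Knox documentation, `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/v1/...`, with the commonly used port 8443 and path `gateway`. The host name, the topology name `default`, the credentials, and both function names are hypothetical. The sketch only constructs the request and sends nothing.

```python
import base64
from urllib.parse import quote, urlencode

def knox_webhdfs_url(host, topology, hdfs_path, op, port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that is routed through the gateway rather than the NameNode."""
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1/{path}?{query}"

def basic_auth_header(username, password):
    """HTTP Basic credentials, as used when the gateway authenticates against LDAP."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# Hypothetical gateway host, topology, and user.
url = knox_webhdfs_url("knox.example.com", "default", "/user/alice", "LISTSTATUS")
headers = basic_auth_header("alice", "not-a-real-password")
print(url)
```

The client never holds a Kerberos ticket in this arrangement. The gateway authenticates the user with one of its pluggable providers and then talks to the cluster on the user's behalf.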
  • 22. Methods of application authentication and impersonation in Hadoop
       • Hadoop User Impersonation
       • Direct Kerberos Authentication
       • Direct Kerberos Authentication + Saved Credentials
       • Edge Proxy Gateway
  • 23. Architecture Decisions and Tradeoffs
       1. Authentication and Accounts Management
       2. Authorization and Access Control
       3. Interfacing Analytics Applications with a Data Lake