1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Owen O’Malley – Co-founder & Technical Fellow
Srikanth Venkat – Senior Director, Product Management
Treat Your Enterprise Data Lake Indigestion: Enterprise-Ready Security and Governance for the Hadoop Ecosystem
Presenters
Owen O’Malley
Co-Founder & Technical Fellow
Hortonworks
Srikanth Venkat
Senior Director of Product Management, Security & Governance
Apache Ranger, Apache Atlas, Apache Knox
Agenda
HDP Security
Authentication (Kerberos, Apache Knox)
Authorization & Audits (Apache Ranger)
Data Protection
HDP Governance: Apache Atlas Overview
HDP Security: Comprehensive, Complete, Extensible
• Administration – Central management and consistent security. Single administrative console to set policy across the entire cluster: Apache Ranger
• Authentication – Authenticate users and systems. Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox
• Authorization – Provision access to data. Consistent authorization controls across all Apache components within HDP: Apache Ranger
• Audit – Maintain a record of data access. Record of data access events across all components that is consistent and accessible: Apache Ranger
• Data Protection – Protect data at rest and in motion. Secure data in motion and data at rest: HDFS TDE with Ranger KMS + HSM, Ranger Data Masking + Row Filtering, Wire encryption + Partner Solutions
Authentication & API Security: Apache Knox
Apache Knox Community Snapshot
• Mar 2013: Entered Incubator
• Oct 2013: 0.1.0 - 0.3.0 Incubator releases
• Feb 2014: Graduates to Apache TLP
• Apr 2014: 0.4.0 TLP release
• Nov 2014: 0.5.0
• May 2015: 0.6.0
• Dec 2015: 0.7.0
• Feb 2016: 0.8.0
• Apr/Aug 2016: 0.9.0/0.9.1
• Nov 2016: 0.10.0
• Dec 2016: 0.11.0
• Mar 2017: 0.12.0
• TBD: 1.0.0 (target release date)
• Committers: 17
• Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon
Apache 0.12.0/HDP 2.6
• Client SDK/DSL Improvements
• Apache Zeppelin Proxying
• YARN RM UI HA Support
• Knox Token Service
• Solr API and UI
Apache 0.11.0
• LDAP Improvements
• Hadoop Group Lookup Support
• Phoenix Server Support (Avatica)
• Management UI
• Metrics
@apache_knox
Apache Knox
Proxying Services
★ Provide access to Hadoop via proxying of HTTP resources
★ Ecosystem APIs and UIs + Hadoop-oriented dispatching for Kerberos + doAs (impersonation), etc.
Authentication Services
★ REST API access, WebSSO flow for UIs
★ LDAP/AD, header-based PreAuth
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through DSL
★ Using Knox Shell classes directly as an SDK
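To illustrate the proxying model, the sketch below prepares a WebHDFS call routed through the Knox gateway instead of directly to the NameNode. It assumes Knox's conventional URL layout `https://<gateway-host>:8443/gateway/<topology>/webhdfs/v1/<path>?op=...` and HTTP Basic credentials checked against LDAP/AD; the host `knox.example.com`, topology `default`, and user `alice` are hypothetical. It only builds the request and never sends it.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(gateway_host: str, topology: str, hdfs_path: str,
                         op: str, user: str, password: str,
                         port: int = 8443) -> urllib.request.Request:
    """Build (but do not send) a WebHDFS request proxied through Knox."""
    # Knox exposes each proxied service under /gateway/<topology>/<service>.
    path = "/gateway/{}/webhdfs/v1/{}".format(
        topology, urllib.parse.quote(hdfs_path.lstrip("/")))
    query = urllib.parse.urlencode({"op": op})
    url = urllib.parse.urlunsplit(
        ("https", "{}:{}".format(gateway_host, port), path, query, ""))

    # Knox checks these credentials at the perimeter (e.g. against LDAP/AD),
    # then dispatches to the cluster service on the user's behalf (doAs).
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


if __name__ == "__main__":
    req = knox_webhdfs_request("knox.example.com", "default", "/user/alice",
                               "LISTSTATUS", "alice", "secret")
    print(req.full_url)
    # https://knox.example.com:8443/gateway/default/webhdfs/v1/user/alice?op=LISTSTATUS
```

Clients only ever see the gateway URL, so cluster host names and ports stay behind the perimeter.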
Authentication: Kerberos
Background: Kerberos
⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably “identify” themselves and have that identity propagated throughout the Hadoop cluster
⬢ Kerberos security in native Apache Hadoop was designed and implemented by Hortonworks co-founder Owen O’Malley!
⬢ Why Kerberos?
⬢ Establishes identity for clients, hosts and services
⬢ Prevents impersonation; passwords are never sent over the wire
⬢ Integrates with enterprise identity management tools such as LDAP and Active Directory
⬢ More granular auditing of data access/job execution
Automated Kerberos Setup with Ambari
• Wizard-driven, automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution to the appropriate hosts, permissions, etc.)
• Removes cumbersome, time-consuming and error-prone administration of Kerberos
• Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks and remove the burden from the operator:
– Add/Delete Host
– Add Service
– Add/Delete Component
– Regenerate Keytabs
– Disable Kerberos
Kerberos + Active Directory: Cross-Realm Trust
Diagram: Client, Hadoop Cluster, AD / LDAP, KDC
• User store (AD / LDAP): use existing directory tools to manage users, e.g. Users: smith@EXAMPLE.COM
• KDC: use Kerberos tools to manage host + service principals, e.g. Hosts: host1@HADOOP.EXAMPLE.COM; Services: hdfs/host1@HADOOP.EXAMPLE.COM
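To make the principal formats above concrete, the sketch below parses a principal of the form `primary[/instance]@REALM` and maps it to a local short name, accepting only realms the cluster trusts (the cluster realm plus the AD realm it has a cross-realm trust with). It is a deliberately simplified stand-in for Hadoop's `hadoop.security.auth_to_local` rules, not their actual rule syntax; the trusted-realm set is an assumption for this example.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "smith" or "hdfs"
    instance: Optional[str]  # host for service principals, e.g. "host1"
    realm: str               # e.g. "EXAMPLE.COM" (AD) or "HADOOP.EXAMPLE.COM" (cluster KDC)


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its components."""
    if name.count("@") != 1:
        raise ValueError("principal must contain exactly one '@': " + name)
    left, realm = name.split("@")
    primary, _, instance = left.partition("/")
    if not primary or not realm:
        raise ValueError("malformed principal: " + name)
    return Principal(primary, instance or None, realm)


# Assumption for this sketch: the cluster realm and the trusted AD realm.
TRUSTED_REALMS = {"HADOOP.EXAMPLE.COM", "EXAMPLE.COM"}


def to_short_name(name: str, trusted=TRUSTED_REALMS) -> str:
    """Map a principal to a local account name (simplified auth_to_local)."""
    p = parse_principal(name)
    if p.realm not in trusted:
        raise PermissionError("realm not trusted: " + p.realm)
    # Drop instance and realm: smith@EXAMPLE.COM -> smith,
    # hdfs/host1@HADOOP.EXAMPLE.COM -> hdfs.
    return p.primary


if __name__ == "__main__":
    for n in ("smith@EXAMPLE.COM", "hdfs/host1@HADOOP.EXAMPLE.COM"):
        print(n, "->", to_short_name(n))
```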
Authorization & Audits: Apache Ranger
Apache Ranger Community Snapshot
• May 2014: XASecure acquisition
• July 2014: Enters Apache Incubation
• Nov 2014: Ranger 0.4.0 release
• July 2015: Ranger 0.5/HDP 2.3
• Aug 2016: Ranger 0.6/HDP 2.5
• Nov 2016: Ranger 0.6.2/HDP 2.5.3
• Jan 2017: Ranger TLP graduation!
• Apr 2017: Ranger 0.7/HDP 2.6
• Jun 2017: Ranger 0.7.1/HDP 2.6.1
• TBD: 1.0.0 (target release date)
• Committers: 22
• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend
Ranger 0.7/HDP 2.6
• Export/import of policies
• $User and macros
• Plugin status tab
• “Show columns” and “describe extended” support
• Incremental LDAP sync
• SmartSense metrics
Ranger 0.6/HDP 2.5
• Classification (tag) based security (ABAC)
• Dynamic column masking & row filtering
• KMS HSM integration (SafeNet)
• Dynamic policies & deny conditions
• LDAP improvements & audit scalability
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support for multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support for HDFS Transparent Data Encryption
• Integration with HSM
– SafeNet Luna
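As an illustration of centrally defined policies, the sketch below builds the JSON body for an HDFS path policy in the shape used by Ranger's public REST API v2 (`POST /service/public/v2/api/policy` on the Ranger Admin server, by default on port 6080). The service name `cl1_hadoop`, the path, the group, and the policy name are hypothetical, and field names should be checked against the Ranger version in use; the code only builds the payload.

```python
import json
from typing import Iterable


def hdfs_path_policy(service: str, name: str, path: str, groups: Iterable[str],
                     accesses: Iterable[str], recursive: bool = True) -> dict:
    """Build a Ranger v2-style policy granting `accesses` on `path` to `groups`."""
    return {
        "service": service,       # Ranger service (repository) used by the HDFS plugin
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,   # matching requests are written to the Ranger audit log
        "resources": {
            "path": {"values": [path], "isRecursive": recursive, "isExcludes": False}
        },
        "policyItems": [{
            "groups": list(groups),
            "users": [],
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            "delegateAdmin": False,
        }],
    }


if __name__ == "__main__":
    policy = hdfs_path_policy("cl1_hadoop", "finance-read", "/data/finance",
                              groups=["finance-analysts"],
                              accesses=["read", "execute"])
    # The body would be sent with HTTP POST to
    # http://<ranger-admin-host>:6080/service/public/v2/api/policy
    print(json.dumps(policy, indent=2))
```

Granting to a group rather than to individual users keeps the policy stable as people join or leave the team.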
Ranger – Attribute-Based Access Control (ABAC) Model
⬢ ABAC model
⬢ Combination of the subject, action, resource, and environment
⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
⬢ Ranger's approach is consistent with NIST SP 800-162
⬢ Avoids role proliferation and manageability issues
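The ABAC model above can be sketched as rules evaluated over subject, action, resource, and environment attributes. This is a minimal illustration, not Ranger's policy engine: it assumes a deny-overrides combining rule and hypothetical attributes (an AD group, an Atlas-style `EU_PII` classification, and the request location).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Request:
    groups: FrozenSet[str]   # subject attributes, e.g. AD groups
    action: str              # e.g. "select"
    tags: FrozenSet[str]     # resource attributes, e.g. Atlas classifications
    location: str            # environment attribute, e.g. "US" or "EU"


@dataclass
class Rule:
    effect: str                          # "allow" or "deny"
    condition: Callable[[Request], bool]


def decide(rules: List[Rule], req: Request) -> bool:
    """Deny-overrides: any matching deny wins; otherwise a matching allow is required."""
    matches = [r.effect for r in rules if r.condition(req)]
    if "deny" in matches:
        return False
    return "allow" in matches


# Hypothetical policies written against attributes rather than per-user roles.
rules = [
    Rule("allow", lambda r: "analyst" in r.groups and r.action == "select"),
    # Data tagged EU_PII may not be read from outside the EU.
    Rule("deny", lambda r: "EU_PII" in r.tags and r.location != "EU"),
]

if __name__ == "__main__":
    us = Request(frozenset({"analyst"}), "select", frozenset({"EU_PII"}), "US")
    eu = Request(frozenset({"analyst"}), "select", frozenset({"EU_PII"}), "EU")
    print(decide(rules, us), decide(rules, eu))  # False True
```

Because the rules test attributes, a new analyst group member or a newly tagged EU table is covered without defining another role, which is the role-proliferation point on the slide.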
Ranger Architecture
• Ranger core: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, Integration API
• Clients: enterprise users; legacy tools and data governance
• Hadoop components, each with a Ranger plugin: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas
HDP – Security & Governance
Industry First: Dynamic Tag-based Security Policies
• Apache Atlas: track metadata and lineage
– Atlas Metastore: tags, assets, entities
– Streams, pipelines, feeds
– Entities in the data lake: Hive tables, HDFS files, HBase tables
• Apache Ranger: manage access policies and audit logs
– Policies based on classification, prohibition, time, location
– Policy decision point (PDP) with a resource cache
• Atlas client: subscribes to a topic and gets metadata updates
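The flow in this diagram, where the policy decision point keeps a local cache of resource tags that is refreshed from metadata notifications, can be sketched as follows. The event shape and the `PII` rule are hypothetical; real deployments use Atlas notifications consumed by Ranger TagSync, whose message formats differ from this toy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagCache:
    """Resource -> classification tags, updated from metadata notifications."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = defaultdict(set)

    def apply(self, event: dict) -> None:
        # Hypothetical event shape: {"op": "ADD" | "DELETE", "resource": ..., "tag": ...}
        if event["op"] == "ADD":
            self._tags[event["resource"]].add(event["tag"])
        elif event["op"] == "DELETE":
            self._tags[event["resource"]].discard(event["tag"])

    def tags_for(self, resource: str) -> Set[str]:
        return set(self._tags.get(resource, ()))


def is_allowed(cache: TagCache, resource: str, user_groups: Iterable[str]) -> bool:
    """Tag-based rule: resources tagged PII are readable only by the 'hr' group."""
    if "PII" in cache.tags_for(resource):
        return "hr" in set(user_groups)
    return True  # no tag policy applies; resource-based policies would decide next


if __name__ == "__main__":
    cache = TagCache()
    table = "default.ww_customers"
    print(is_allowed(cache, table, ["analyst"]))  # True (not yet tagged)
    cache.apply({"op": "ADD", "resource": table, "tag": "PII"})
    print(is_allowed(cache, table, ["analyst"]))  # False: the policy follows the tag
```

The policy itself never changes; classifying the table is what changes the decision, which is what makes tag-based policies dynamic.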
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
Table ww_customers (original data):
Country | National ID | CC No | DOB | MRN | Name | Policy ID
US | 232323233 | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe | nj23j424
US | 333287465 | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe | cadsd984
Germany | T22000129 | 4532786256545550 | 3/5/1963 | 876452830A | Ernie Schwarz | KK-2345909
Ranger Policy Enforcement: the query is rewritten based on dynamic Ranger policies to filter rows by region and apply the relevant column masking.
User 1: Joe (Location: US, Group: Analyst)
Original Query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Country | National ID | CC No | MRN | Name
US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
Users from the US Analyst group see data for US persons, with CC and National ID (SSN) as masked values and MRN nullified.
User 2: Ivanna (Location: EU, Group: HR)
Original Query: SELECT country, nationalid, name, mrn FROM ww_customers
Country | National ID | Name | MRN
Germany | T22000129 | Ernie Schwarz | 876452830A
EU HR policy admins can see unmasked data but are restricted by row-filtering policies to data for EU persons only.
Groups: Analysts, HR, Marketing
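The rewrite above can be reproduced on the sample rows with a short sketch: one row filter and a set of column masks per group. The filters, the EU country subset, and the mask formats are hypothetical, chosen only to reproduce the slide's output; in Ranger these would be row-filter and masking policies that Hive applies when it rewrites the query, and Ranger's built-in mask types may format values differently.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

ROWS: List[Row] = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany", "France", "Spain"}  # illustrative subset, not the full EU list


def show_last_4(v: str) -> str:
    return "x" * (len(v) - 4) + v[-4:]


def show_first_4_grouped(v: str) -> str:
    # Display format used on the slide: first 4 digits, then masked groups of 4.
    return v[:4] + " xxxx xxxx xxxx"


def nullify(v: str) -> Optional[str]:
    return None


# Hypothetical per-group policies: a row filter plus column masks.
POLICIES = {
    "analyst": {
        "row_filter": lambda r: r["country"] == "US",
        "masks": {"nationalid": show_last_4, "ccnumber": show_first_4_grouped,
                  "mrn": nullify},
    },
    "hr": {"row_filter": lambda r: r["country"] in EU_COUNTRIES, "masks": {}},
}


def run_query(group: str, columns: List[str]) -> List[List[Optional[str]]]:
    """Apply the group's row filter, then mask each selected column."""
    policy = POLICIES[group]
    out = []
    for row in ROWS:
        if not policy["row_filter"](row):
            continue  # the rewritten query's WHERE clause drops this row
        masks = policy["masks"]
        out.append([masks.get(c, lambda v: v)(row[c]) for c in columns])
    return out


if __name__ == "__main__":
    for r in run_query("analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]):
        print(r)
    for r in run_query("hr", ["country", "nationalid", "name", "mrn"]):
        print(r)
```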
Agenda
Data Protection
Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop:
• Storage: encrypt data while it is at rest
– Transparent Data Encryption in HDFS, Ranger KMS + HSM, partner products (HPE Voltage, Protegrity, Dataguise)
• Transmission: encrypt data while it is in motion
– Native Apache Hadoop 2.0 provides wire encryption
• Upon access: apply restrictions when data is accessed
– Ranger (dynamic column masking + row filtering), partner masking + encryption
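For the transmission layer, Hadoop's wire encryption is switched on through configuration. The sketch below renders three standard properties in Hadoop's XML configuration format: `hadoop.rpc.protection=privacy` (core-site.xml) encrypts RPC traffic via SASL, `dfs.encrypt.data.transfer=true` (hdfs-site.xml) encrypts the DataNode data transfer protocol, and `dfs.http.policy=HTTPS_ONLY` restricts HDFS web endpoints to TLS. The generator itself is only an illustration; a real cluster (for example one managed by Ambari) also needs keystores and related TLS settings.

```python
import xml.etree.ElementTree as ET
from typing import Dict

WIRE_ENCRYPTION = {
    "core-site.xml": {
        # SASL quality of protection for Hadoop RPC: authentication, integrity, or privacy.
        "hadoop.rpc.protection": "privacy",
    },
    "hdfs-site.xml": {
        # Encrypt block data exchanged between clients and DataNodes.
        "dfs.encrypt.data.transfer": "true",
        # Serve HDFS web UIs and WebHDFS over HTTPS only.
        "dfs.http.policy": "HTTPS_ONLY",
    },
}


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> XML format."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in WIRE_ENCRYPTION.items():
        print("<!-- " + filename + " -->")
        print(to_hadoop_xml(props))
```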
Ranger KMS
Transparent Data Encryption in HDFS
Diagram: an HDFS client and the NameNode (NN), with blocks A, B, C, D stored across three DataNodes (DN); Ranger KMS backed by a SafeNet Luna HSM
Benefits
• Selective encryption of relevant files/folders
• Prevents rogue admin access to sensitive data
• Fine-grained access controls
• Transparent to end applications, with no application changes required
• Ranger KMS integrates with an external HSM (SafeNet Luna), adding to the reliability and security of the KMS
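HDFS TDE uses envelope encryption: each encryption zone has a zone key held by the KMS, every file gets its own data encryption key (DEK), and only the encrypted DEK (EDEK) is stored in NameNode metadata. A client asks the KMS to decrypt the EDEK and then encrypts or decrypts file data itself, so neither the NameNode nor the DataNodes see plaintext keys or data. The sketch below models that flow only. `ToyKMS`, the zone key name, and the HMAC-SHA256 counter-mode keystream are illustrative stand-ins (HDFS uses AES-CTR and the Hadoop/Ranger KMS APIs), and the code is not secure for real use.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Counter-mode encryption using HMAC-SHA256 as the PRF (stand-in for AES-CTR)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        counter = (block // 32).to_bytes(8, "big")
        pad = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)


class ToyKMS:
    """Holds zone keys; issues wrapped per-file DEKs and unwraps them on request."""

    def __init__(self) -> None:
        self._zone_keys: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, zone_key: str) -> Tuple[bytes, bytes]:
        dek, iv = os.urandom(32), os.urandom(16)
        # Only the wrapped (encrypted) DEK leaves the KMS, for the NameNode to store.
        return iv, keystream_xor(self._zone_keys[zone_key], iv, dek)

    def decrypt_edek(self, zone_key: str, iv: bytes, edek: bytes) -> bytes:
        # A real KMS would check the caller's key ACLs (e.g. Ranger KMS policies) here.
        return keystream_xor(self._zone_keys[zone_key], iv, edek)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_key("finance_zone_key")  # hypothetical zone key name
    # NameNode side: on file create, store only (iv, edek) in the file's metadata.
    iv, edek = kms.generate_edek("finance_zone_key")
    # Writer: unwrap the DEK via the KMS, encrypt locally, send ciphertext to DataNodes.
    file_iv = os.urandom(16)
    dek = kms.decrypt_edek("finance_zone_key", iv, edek)
    ciphertext = keystream_xor(dek, file_iv, b"salary data")
    # Reader: same unwrap step, then decrypt; DataNodes never held the key.
    plaintext = keystream_xor(kms.decrypt_edek("finance_zone_key", iv, edek), file_iv, ciphertext)
    print(plaintext)  # b'salary data'
```

Only the small DEK is ever re-encrypted when a zone key changes, which is why envelope encryption scales to large files.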
Agenda
Apache Atlas: Vision & Features Overview
Background: DGI Community becomes Apache Atlas
• Dec 2014: DGI* group kickoff (Global Financial Company)
• May 2015: Apache Atlas incubation
• Aug 2016: Apache 0.7 foundation release
• Apr 2017: Apache 0.8 release
• Jun 2017: Atlas becomes TLP!
* DGI: Data Governance Initiative
Apache Atlas 0.8/HDP 2.6
• Simplified search UI
• Simplified APIs
• Classification-based security for HDFS, Kafka, HBase
• Knox SSO
• Performance/scalability improvements
Apache Atlas 0.7.1/HDP 2.5
• High availability support
• LDAP authentication/authorization
• Classification-based security for Hive
• UI redesign
• Committers: 35
• Code contributors from: Hortonworks, IBM, Aetna, Merck, Target
Apache Atlas Vision: Open Metadata & Governance Services
Diagram: Atlas metadata at the center, connected to Kafka, Storm, Sqoop, Hive, Falcon, Ranger, custom sources and partners, and to structured metadata from traditional RDBMS and MPP appliances
Comprehensive Enterprise Data Catalog
• Lists all of your data: where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrates both on-premises and cloud platforms to provide an enterprise-wide view
Open Enterprise Data Connectors
• Interoperable connector framework that connects your data catalog out of the box with many vendor technologies
• No expensive population of proprietary, siloed metadata repositories
Dynamic Metadata Discovery
• Metadata is added to the catalog automatically as new data is created or existing data is updated
• Extensible discovery processes that characterize and classify the data
Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, and share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically
Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management
Predefined standards for glossaries, data schemas, rules and regulations
Vision:
Metadata-driven foundational governance services for the enterprise data ecosystem
• Open frameworks and APIs
• Agile and secure collaboration around data and advanced analytics
• Reduce operational costs while extracting economic value from data
High-Level Architecture: 4 Key Points
Diagram: Atlas core (type system, repository, graph DB, search and search DSL), REST API, messaging framework, and bridges/connectors for Hive, Storm, Falcon, Sqoop, Kafka and custom sources
1 Data Lineage: the only product that captures lineage across Hadoop components at platform level.
2 Agile Data Modeling: the type system allows custom metadata structures in a hierarchical taxonomy.
3 REST API: modern, flexible access to Atlas services, HDP components, UI & external tools.
4 Exchange: leverage existing metadata/models by importing them from current tools; export metadata to downstream systems.
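To illustrate how the type system supports custom metadata in a hierarchy, the sketch below builds a typedef payload of the kind accepted by Atlas's v2 REST API (`POST /api/atlas/v2/types/typedefs`): a parent `PII` classification, a child classification that inherits from it via `superTypes`, and a custom entity type derived from Atlas's built-in `DataSet` type. The type and attribute names are hypothetical; attribute options should be checked against the Atlas release in use.

```python
import json
from typing import List


def attr(name: str, type_name: str = "string", optional: bool = True) -> dict:
    """One attributeDef entry in Atlas typedef format."""
    return {"name": name, "typeName": type_name, "isOptional": optional,
            "cardinality": "SINGLE", "isUnique": False, "isIndexable": True}


def typedefs_payload() -> dict:
    return {
        "classificationDefs": [
            {"name": "PII", "superTypes": [], "attributeDefs": []},
            # Child classification inheriting from PII via superTypes.
            {"name": "PII_SSN", "superTypes": ["PII"],
             "attributeDefs": [attr("jurisdiction")]},
        ],
        "entityDefs": [
            # Hypothetical custom asset type built on Atlas's DataSet type.
            {"name": "crm_report", "superTypes": ["DataSet"],
             "attributeDefs": [attr("refreshSchedule"), attr("ownerTeam", optional=False)]},
        ],
    }


def classification_ancestors(name: str, defs: List[dict]) -> List[str]:
    """Walk superTypes to list every classification that `name` inherits from."""
    by_name = {d["name"]: d for d in defs}
    result, stack = [], [name]
    while stack:
        current = stack.pop()
        for parent in by_name.get(current, {}).get("superTypes", []):
            if parent not in result:
                result.append(parent)
                stack.append(parent)
    return result


if __name__ == "__main__":
    payload = typedefs_payload()
    print(json.dumps(payload, indent=2))
    print(classification_ancestors("PII_SSN", payload["classificationDefs"]))  # ['PII']
```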
Apache Atlas: Lineage
Lineage
• Where does this data originate (source/provenance)?
• Upstream path: path through all data assets and processes leading up to the current data asset
Impact
• How is this data being used?
• What other (derivative/dependent) data assets does this impact?
• Downstream path: path through all data assets and processes leading out of the current data asset
Used for:
• Forensics
• Impact analysis
• Auditing and compliance
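Upstream (lineage) and downstream (impact) paths are both traversals of a graph whose nodes are data assets and processes. The sketch below computes them with breadth-first search over a small hypothetical pipeline; Atlas keeps this graph in its own repository and serves lineage through its API (for example `GET /api/atlas/v2/lineage/{guid}` with a `direction` parameter), which this code does not model.

```python
from collections import deque
from typing import Dict, List

# Hypothetical pipeline: each edge points from an input to the process or asset it feeds.
EDGES: Dict[str, List[str]] = {
    "mysql:sales.impressions": ["sqoop_import"],
    "sqoop_import": ["hive:raw.impressions"],
    "hive:raw.impressions": ["ctas_daily_summary", "spark_feature_job"],
    "ctas_daily_summary": ["hive:mart.daily_summary"],
    "spark_feature_job": ["hdfs:/features/impressions"],
}


def reverse(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    rev: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def walk(start: str, edges: Dict[str, List[str]]) -> List[str]:
    """Breadth-first traversal; returns reachable nodes in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(asset: str) -> List[str]:    # lineage: where did this come from?
    return walk(asset, reverse(EDGES))


def downstream(asset: str) -> List[str]:  # impact: what does this feed?
    return walk(asset, EDGES)


if __name__ == "__main__":
    print(upstream("hive:mart.daily_summary"))
    print(downstream("hive:raw.impressions"))
```

Running impact analysis on a table tagged as sensitive shows every derived asset that may now also hold sensitive data.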
Apache Atlas: Lineage and Impact
Apache Atlas: Classification
• Categorize and curate data assets for easier discovery
• Associate context with data assets – Governance, Security, Business, …
GOVERNANCE
SECURITY
BUSINESS
Apache Atlas Classification: Use Case – Cross-Component
Classification-based security on cross-component data assets
Metadata Catalog Search: Basic Search
Search for a hive_table classified as ‘PII’ with a name starting with ‘prov’
• Filter by data asset type
• Filter by classification
• Search text
– Wildcards: prov*, *sum*
– Logical expressions: prov* AND *sum*
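The basic search on this slide corresponds to a request against Atlas's v2 search API. The sketch builds a `POST /api/atlas/v2/search/basic` body that filters by type (`hive_table`), classification (`PII`), and a wildcard query. The host, the port (21000 is Atlas's usual default), and the `limit` value are assumptions, and parameter names should be checked against the Atlas release in use; the request is built but not sent.

```python
import json
import urllib.request


def basic_search_request(atlas_url: str, type_name: str, classification: str,
                         query: str, limit: int = 25) -> urllib.request.Request:
    """Build (but do not send) an Atlas v2 basic-search request."""
    body = {
        "typeName": type_name,             # filter by data asset type
        "classification": classification,  # filter by classification
        "query": query,                    # search text; wildcards such as prov* allowed
        "excludeDeletedEntities": True,
        "limit": limit,
        "offset": 0,
    }
    return urllib.request.Request(
        atlas_url.rstrip("/") + "/api/atlas/v2/search/basic",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = basic_search_request("http://atlas.example.com:21000", "hive_table", "PII", "prov*")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```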
Metadata Catalog Search: Advanced
• Filter by data asset type
• DSL search with SQL-like syntax
Example: search for a hive_table named ‘employees’ with owner ‘hive’
Example: select the columns of the impressions table in the raw database, using the DSL query string:
hive_column where table.name='impressions' and table.db.name='raw'
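DSL queries like the one above are sent as a URL-encoded `query` parameter to `GET /api/atlas/v2/search/dsl`. The sketch builds that URL for the slide's column query and for the ‘employees’ example; string literals use plain ASCII quotes, since typographic quotes are a presentation artifact. The host and port are assumptions, and the `hive_table where name='employees' and owner='hive'` string is an assumed DSL rendering of the slide's description rather than a query shown on it.

```python
import urllib.parse


def dsl_search_url(atlas_url: str, dsl: str, limit: int = 25) -> str:
    """Build an Atlas v2 DSL search URL (GET) for the given query string."""
    params = urllib.parse.urlencode({"query": dsl, "limit": limit})
    return atlas_url.rstrip("/") + "/api/atlas/v2/search/dsl?" + params


QUERIES = {
    # Columns of the impressions table in the raw database (from the slide).
    "raw_impressions_columns":
        "hive_column where table.name='impressions' and table.db.name='raw'",
    # hive_table named employees owned by hive (assumed DSL rendering).
    "employees_table": "hive_table where name='employees' and owner='hive'",
}

if __name__ == "__main__":
    for label, dsl in QUERIES.items():
        print(label, dsl_search_url("http://atlas.example.com:21000", dsl))
```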
Key Takeaways
• Secure APIs and UIs in the Hadoop ecosystem using the Apache Knox gateway
• Enforce appropriate security controls to monitor data access across your businesses with Apache Ranger
– Implement fine-grained, policy-based controls to grant and monitor data access
– Track user activity on data using user access audit logging, to support forensic auditing for breach notification purposes
– Protect sensitive data through anonymization and pseudonymization using dynamic masking and row filtering
• Establish an Enterprise Data Catalog with Apache Atlas
– Identify and classify data
– Harvest and maintain metadata
• Track and map the movement of data through your enterprise with Apache Atlas
– Maintain a near-real-time view to track data movement
– Understand data proliferation (especially of sensitive data) with data lineage and impact analysis
More Information…
Coming up next:
BoF session – Security, Governance & Cybersecurity
When: 6:00pm, Thursday September 21st 2017
Where: C4.7
Also check out other sessions on Apache Atlas & Apache Ranger from recent DataWorks Summits:
https://dataworkssummit.com/san-jose-2017/
https://dataworkssummit.com/munich-2017/
Hortonworks Product Pages
https://hortonworks.com/apache/ranger/
https://hortonworks.com/apache/atlas
Hortonworks Community Connection:
https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html
https://community.hortonworks.com/spaces/62/security-track_2.html
Apache Software Foundation
http://ranger.apache.org/
http://atlas.apache.org/

The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
Big Data Spain
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
alanfgates
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Chris Nauroth
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
Rommel Garcia
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
Madhan Neethiraj
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
Hortonworks
 

Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Owen O’Malley – Co-founder & Technical Fellow Srikanth Venkat – Senior Director, Product Management Treat Your Enterprise Data Lake Indigestion: Enterprise Ready Security And Governance For Hadoop Ecosystem
  • 2. Presenters
    • Owen O’Malley, Co-Founder & Technical Fellow, Hortonworks
    • Srikanth Venkat, Senior Director of Product Management, Security & Governance (Apache Ranger, Apache Atlas, Apache Knox)
  • 3. Agenda
    • HDP Security
      • Authentication (Kerberos, Apache Knox)
      • Authorization & Audits (Apache Ranger)
      • Data Protection
    • HDP Governance: Apache Atlas Overview
  • 4. HDP Security: Comprehensive, Complete, Extensible
    • Administration: central management and consistent security. A single administrative console sets policy across the entire cluster (Apache Ranger).
    • Authentication: authenticate users and systems. Authentication for the perimeter and the cluster, integrated with existing Active Directory and LDAP solutions (Kerberos | Apache Knox).
    • Authorization: provision access to data. Consistent authorization controls across all Apache components within HDP (Apache Ranger).
    • Audit: maintain a record of data access. A consistent, accessible record of data access events across all components (Apache Ranger).
    • Data Protection: protect data at rest and in motion. HDFS TDE with Ranger KMS + HSM, Ranger Data Masking + Row Filtering, wire encryption + partner solutions.
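The Python sketch below gives a rough illustration of the Ranger Data Masking + Row Filtering entry on slide 4. It applies a row filter and a partial mask to in-memory rows. Everything in it is an assumption made for illustration. Rows are plain dicts, and `mask_show_last_4` and `apply_policies` are hypothetical helpers. The masking is only similar in spirit to a "show last 4" mask. This is not Ranger's implementation. With Hive, for example, the Ranger plugin supplies the masking and filter expressions, and Hive applies them while it executes the query.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def mask_show_last_4(value: object) -> object:
    """Hypothetical partial mask: keep the last 4 characters; earlier
    upper/lower-case letters become 'X'/'x' and earlier digits become 'n'."""
    if value is None:
        return None
    text = str(value)
    head, tail = text[:-4], text[-4:]
    masked = []
    for ch in head:
        if ch.isupper():
            masked.append("X")
        elif ch.islower():
            masked.append("x")
        elif ch.isdigit():
            masked.append("n")
        else:
            masked.append(ch)  # keep separators such as '-'
    return "".join(masked) + tail


def apply_policies(
    rows: List[Row],
    row_filter: Callable[[Row], bool],
    column_masks: Dict[str, Callable[[object], object]],
) -> List[Row]:
    """Drop rows the user may not see, then mask protected columns."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row filtering: this row is invisible to the user
        visible = dict(row)  # never modify the stored data
        for column, mask in column_masks.items():
            if column in visible:
                visible[column] = mask(visible[column])
        result.append(visible)
    return result


if __name__ == "__main__":
    customers = [
        {"name": "Ann", "region": "EU", "ssn": "123-45-6789"},
        {"name": "Bob", "region": "US", "ssn": "987-65-4321"},
    ]
    # Hypothetical policy for an EU analyst: EU rows only, SSN partially masked.
    print(apply_policies(customers, lambda r: r["region"] == "EU",
                         {"ssn": mask_show_last_4}))
    # → [{'name': 'Ann', 'region': 'EU', 'ssn': 'nnn-nn-6789'}]
```

The filter runs before the mask, so hidden rows are never processed at all. The copy made with `dict(row)` ensures that masking changes only what the user sees.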
  • 5. Authentication & API Security: Apache Knox
  • 6. Apache Knox Community Snapshot
    • Timeline
      • Mar 2013: entered the Incubator
      • Oct 2013: 0.1.0 to 0.3.0 incubator releases
      • Feb 2014: graduated to Apache TLP
      • Apr 2014: 0.4.0 TLP release
      • Nov 2014: 0.5.0
      • May 2015: 0.6.0
      • Dec 2015: 0.7.0
      • Feb 2016: 0.8.0
      • Apr/Aug 2016: 0.9.0/0.9.1
      • Nov 2016: 0.10.0
      • Dec 2016: 0.11.0
      • Mar 2017: 0.12.0
      • TBD: 1.0.0 (target release date)
    • Committers: 17
    • Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon
    • Apache 0.12.0/HDP 2.6: Client SDK/DSL improvements; Apache Zeppelin proxying; YARN RM UI HA support; Knox Token Service; Solr API and UI
    • Apache 0.11.0: LDAP improvements; Hadoop group lookup support; Phoenix Server support (Avatica); Management UI; metrics
    • @apache_knox
  • 7. Apache Knox
    • Proxying Services
      ★ Provide access to Hadoop by proxying HTTP resources
      ★ Ecosystem APIs and UIs, plus Hadoop-oriented dispatching for Kerberos, doAs (impersonation), etc.
    • Authentication Services
      ★ REST API access and a WebSSO flow for UIs
      ★ LDAP/AD, header-based pre-authentication
      ★ Kerberos, SAML, OAuth
    • Client DSL/SDK Services
      ★ Scripting through the DSL
      ★ Using Knox Shell classes directly as an SDK
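Slide 7 describes a proxying model: clients call one gateway URL, and Knox dispatches each call to the backing service. The Python sketch below makes this concrete by building such a request for WebHDFS without sending it. It assumes Knox's commonly documented URL layout, `https://<gateway-host>:8443/gateway/<topology>/webhdfs/v1/<path>`. The host name, the topology name `default`, and the credentials are hypothetical, and the port and gateway path can differ between deployments. The sketch uses HTTP Basic credentials because the slide lists LDAP/AD authentication for REST API access.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_url(gateway_host: str, topology: str, hdfs_path: str,
                     op: str = "LISTSTATUS", port: int = 8443) -> str:
    """Build a WebHDFS URL routed through a Knox gateway.

    Assumes the layout https://<host>:<port>/gateway/<topology>/webhdfs/v1/...
    with port 8443 as the default; adjust both for a real deployment.
    """
    path = urllib.parse.quote(hdfs_path.lstrip("/"))  # keeps '/' separators
    query = urllib.parse.urlencode({"op": op})
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def knox_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request with HTTP Basic credentials,
    which the gateway checks when its topology uses LDAP/AD authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    url = knox_webhdfs_url("knox.example.com", "default", "/tmp/data dir")
    req = knox_request(url, "guest", "guest-password")  # hypothetical account
    print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` also requires the client to trust the gateway's TLS certificate. The client never needs a Kerberos ticket of its own, because Knox handles Kerberos toward the cluster.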
  • 8. Authentication: Kerberos
  • 9. Background: Kerberos
    ⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
    ⬢ Users need to be able to reliably “identify” themselves and have that identity propagated throughout the Hadoop cluster
    ⬢ The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!
    ⬢ Why Kerberos?
      ⬢ Establishes identity for clients, hosts and services
      ⬢ Prevents impersonation; passwords are never sent over the wire
      ⬢ Integrates with enterprise identity management tools such as LDAP & Active Directory
      ⬢ More granular auditing of data access and job execution
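Kerberos identifies users, hosts and services with principals of the form `primary/instance@REALM`. The Kerberos + Active Directory slide later in the deck uses `smith@EXAMPLE.COM` and `hdfs/host1@HADOOP.EXAMPLE.COM` as examples. Below is a minimal Python sketch that splits a principal into those parts. `parse_principal` is a hypothetical helper that ignores Kerberos escaping rules, so treat it as an illustration rather than a substitute for a Kerberos library.

```python
import re
from typing import NamedTuple, Optional

# primary[/instance]@REALM, e.g. "smith@EXAMPLE.COM" or
# "hdfs/host1@HADOOP.EXAMPLE.COM". Escaped '/' or '@' are not supported.
_PRINCIPAL = re.compile(
    r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$"
)


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "hdfs"
    instance: Optional[str]  # usually a host for service principals, else None
    realm: str               # Kerberos realm, e.g. "HADOOP.EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split a fully qualified principal into primary, instance and realm."""
    match = _PRINCIPAL.match(name)
    if match is None:
        raise ValueError(f"not a fully qualified Kerberos principal: {name!r}")
    return Principal(match["primary"], match["instance"], match["realm"])


if __name__ == "__main__":
    print(parse_principal("hdfs/host1@HADOOP.EXAMPLE.COM"))
    print(parse_principal("smith@EXAMPLE.COM"))
```

In both examples the realm names the KDC that vouches for the identity. The instance is what lets each host's copy of a service hold its own credentials.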
  • 10. Automated Kerberos Setup with Ambari. Wizard-driven, automated Kerberos support: Kerberos principal creation for service accounts, keytab generation and distribution to the appropriate hosts, permissions, etc. It removes cumbersome, time-consuming and error-prone Kerberos administration. It works with existing Kerberos infrastructure, including Active Directory, to automate common tasks and remove the burden from the operator: Add/Delete Host; Add Service; Add/Delete Component; Regenerate Keytabs; Disable Kerberos.
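The principal and keytab bookkeeping that the Ambari wizard automates can be sketched as follows. Hadoop service configurations commonly name principals with a `_HOST` placeholder that is replaced by each host's lowercased fully qualified domain name. The component-to-principal table, keytab file names and `/etc/security/keytabs` directory below are illustrative assumptions rather than Ambari's exact defaults, and `plan_keytabs` is a hypothetical helper, not an Ambari API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeytabEntry:
    host: str
    principal: str
    keytab_path: str


# Illustrative component -> (principal template, keytab file) table (assumed names).
SERVICE_PRINCIPALS = {
    "NAMENODE": ("nn/_HOST@{realm}", "nn.service.keytab"),
    "DATANODE": ("dn/_HOST@{realm}", "dn.service.keytab"),
    "HIVE_SERVER": ("hive/_HOST@{realm}", "hive.service.keytab"),
}


def plan_keytabs(layout: dict[str, list[str]], realm: str,
                 keytab_dir: str = "/etc/security/keytabs") -> list[KeytabEntry]:
    """List the service principal and keytab each host needs for its components."""
    plan = []
    for host, components in sorted(layout.items()):
        for component in sorted(components):
            template, keytab_file = SERVICE_PRINCIPALS[component]
            # Expand _HOST to the host's lowercased FQDN: one principal per host.
            principal = template.format(realm=realm).replace("_HOST", host.lower())
            plan.append(KeytabEntry(host, principal, f"{keytab_dir}/{keytab_file}"))
    return plan
```

Re-running the plan after an Add Host or Add Component operation yields the new principals and keytabs to create, which is the kind of repetitive step the wizard takes off the operator.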
  • 11. Kerberos + Active Directory. Diagram: a client, the Hadoop cluster, AD/LDAP (the user store) and a KDC, linked by a cross-realm trust. Users: smith@EXAMPLE.COM. Hosts: host1@HADOOP.EXAMPLE.COM. Services: hdfs/host1@HADOOP.EXAMPLE.COM. Use existing directory tools to manage users; use Kerberos tools to manage host and service principals.
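Once the cross-realm trust lets AD users such as `smith@EXAMPLE.COM` authenticate to the cluster, Hadoop still has to turn each Kerberos principal into a short local user name; it does this with the `hadoop.security.auth_to_local` rules in `core-site.xml`. Below is a simplified Python re-implementation of those rule semantics, for illustration only: `RULE:[n:format](regex)s/pattern/replacement/` fires only for principals with `n` components, builds a string from `format` (`$0` is the realm, `$1`, `$2` the components) and applies the sed-style substitution if the whole string matches `regex`; `DEFAULT` keeps the first component for principals in the default realm. The `/L` lowercasing flag, nested parentheses in the match regex and Java-style backreferences are not supported, and the rules are example values, not recommended settings.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Principal:
    components: list[str]   # e.g. ["hdfs", "host1"] for hdfs/host1@REALM
    realm: str


def parse_principal(name: str) -> Principal:
    """Split primary[/instance]@REALM into its components and realm."""
    body, sep, realm = name.rpartition("@")
    if not sep:
        raise ValueError(f"principal has no realm: {name!r}")
    return Principal(body.split("/"), realm)


# RULE:[n:format](match-regex)s/pattern/replacement/[g]  (simplified grammar)
RULE_SYNTAX = re.compile(r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)s/([^/]*)/([^/]*)/(g?)")


def apply_rule(rule: str, p: Principal) -> Optional[str]:
    m = RULE_SYNTAX.fullmatch(rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    n, fmt, match_regex, pattern, replacement, global_flag = m.groups()
    if int(n) != len(p.components):
        return None                      # rule targets a different component count
    values = [p.realm] + p.components    # $0 = realm, $1.. = components
    base = re.sub(r"\$(\d)", lambda g: values[int(g.group(1))], fmt)
    if re.fullmatch(match_regex, base) is None:
        return None
    return re.sub(pattern, replacement, base, count=0 if global_flag else 1)


def map_to_local(name: str, rules: list[str], default_realm: str) -> str:
    """Return the short user name produced by the first rule that applies."""
    p = parse_principal(name)
    for rule in rules:
        if rule == "DEFAULT":
            if p.realm == default_realm:
                return p.components[0]
            continue
        short = apply_rule(rule, p)
        if short is not None:
            return short
    raise LookupError(f"no auth_to_local rule matched {name!r}")


RULES = [
    r"RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//",               # AD users: smith@EXAMPLE.COM -> smith
    r"RULE:[2:$1@$0](hdfs@HADOOP\.EXAMPLE\.COM)s/.*/hdfs/",  # hdfs/<host> principals -> hdfs
    "DEFAULT",
]
```

With these rules, `smith@EXAMPLE.COM` maps to `smith` and `hdfs/host1@HADOOP.EXAMPLE.COM` maps to `hdfs`, while a principal from an untrusted realm matches nothing and is rejected.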
  • 12. Authorization & Audits: Apache Ranger
  • 13. Apache Ranger Community Snapshot. Timeline: May 2014, XASecure acquisition; July 2014, enters Apache incubation; Nov 2014, Ranger 0.4.0 release; July 2015, Ranger 0.5/HDP 2.3; Aug 2016, Ranger 0.6/HDP 2.5; Nov 2016, Ranger 0.6.2/HDP 2.5.3; Jan 2017, Ranger graduates to TLP; Apr 2017, Ranger 0.7/HDP 2.6; Jun 2017, Ranger 0.7.1/HDP 2.6.1; 1.0.0, target release date TBD. Committers: 22. Contributors from eBay, Microsoft, Huawei, Pandora, Accenture, ING and Talend. Ranger 0.7/HDP 2.6: export/import of policies; $User and macros; plugin status tab; "show columns" and "describe extended" support; incremental LDAP sync; SmartSense metrics. Ranger 0.6/HDP 2.5: classification (tag) based security (ABAC); dynamic column masking and row filtering; KMS HSM integration (SafeNet); dynamic policies and deny conditions; LDAP improvements and audit scalability.
  • 14. Apache Ranger. Authorization: a centralized platform to define, administer and manage security policies consistently across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas); an extensible architecture with custom policy conditions and user context enrichers, making it easy to add new component types for authorization. Auditing: a central audit location for all access requests; support for multiple destinations (HDFS, Solr, etc.); a real-time visual query interface. Ranger KMS: stores and manages encryption keys; supports HDFS Transparent Data Encryption; integrates with HSMs (SafeNet Luna).
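The authorization and auditing roles above can be illustrated with a toy policy engine: a policy matches resources by wildcard, deny items are checked before allow items, a request with no matching allow is refused, and every decision is appended to an audit log. This is a simplified model, not Ranger's plugin API; real Ranger policies also carry exceptions, delegated-admin rights and per-service resource definitions, and for HDFS an unmatched request can fall back to native HDFS permissions. All class and function names are hypothetical.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyItem:
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    accesses: set = field(default_factory=set)   # e.g. {"read", "write"}

    def applies(self, user: str, groups: set, access: str) -> bool:
        return access in self.accesses and (user in self.users or bool(self.groups & groups))


@dataclass
class Policy:
    name: str
    resources: list                      # wildcard patterns, e.g. "/data/finance/*"
    allow: list = field(default_factory=list)
    deny: list = field(default_factory=list)

    def matches(self, resource: str) -> bool:
        return any(fnmatch.fnmatchcase(resource, pattern) for pattern in self.resources)


def authorize(policies, user, groups, resource, access, audit_log) -> bool:
    matching = [p for p in policies if p.matches(resource)]
    # Deny items are consulted first; any match refuses the request.
    denied_by: Optional[str] = next(
        (p.name for p in matching if any(i.applies(user, groups, access) for i in p.deny)), None)
    allowed_by: Optional[str] = None
    if denied_by is None:
        allowed_by = next(
            (p.name for p in matching if any(i.applies(user, groups, access) for i in p.allow)), None)
    allowed = allowed_by is not None
    # Every decision, allowed or not, becomes an audit record.
    audit_log.append({"user": user, "resource": resource, "access": access,
                      "allowed": allowed, "policy": denied_by or allowed_by})
    return allowed
```

For example, a `finance-read` policy on `/data/finance/*` that allows the `analyst` group to read but denies a specific contractor lets `joe` read `/data/finance/q1.csv`, refuses the contractor, and leaves an audit record for both decisions.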
  • 15. Ranger: Attribute-Based Access Control (ABAC) Model. The ABAC model combines the subject, action, resource and environment. It uses descriptive attributes such as AD group, Apache Atlas-based tags or classifications, geo-location, etc. Ranger's approach is consistent with NIST SP 800-162, and it avoids role proliferation and the manageability issues that come with it.
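To make the ABAC model concrete, the sketch below expresses one rule as a predicate over the four attribute groups the slide names (subject, action, resource, environment). The rule, attribute names and values are hypothetical. The point is that a single attribute-based rule covers every analyst and region combination that a role-based design would need a separate role for (for example, `analyst_us_pii`, `analyst_eu_pii`).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    subject: dict = field(default_factory=dict)      # e.g. {"user": "joe", "groups": {"analyst"}}
    action: str = ""                                 # e.g. "select"
    resource: dict = field(default_factory=dict)     # e.g. {"tags": {"PII"}, "region": "US"}
    environment: dict = field(default_factory=dict)  # e.g. {"location": "US"}


Rule = Callable[[Request], bool]


def analysts_read_pii_in_own_region(req: Request) -> bool:
    """Hypothetical rule: analysts may select PII data only for the region they are in."""
    return ("analyst" in req.subject.get("groups", set())
            and req.action == "select"
            and "PII" in req.resource.get("tags", set())
            and req.environment.get("location") == req.resource.get("region"))


def is_permitted(rules: list[Rule], req: Request) -> bool:
    # Permit if any rule grants the request; deny otherwise.
    return any(rule(req) for rule in rules)
```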
  • 16. Ranger Architecture. Diagram: enterprise users work with the Ranger Administration Portal, backed by the Ranger Policy Server and the Ranger Audit Server (audit stores: HDFS, Solr); legacy tools and data governance connect through an integration API; a Ranger plugin runs inside each Hadoop component (HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas).
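In this architecture each Ranger plugin enforces policies inside its host component, so authorization decisions do not need a round trip to a central server. A minimal sketch of that idea, assuming a `fetch` callback that stands in for the download from the Ranger policy server (the real plugin polls the Ranger Admin service periodically and also caches policies on local disk): the plugin keeps serving the last policy set it received when the server is unreachable or reports no change. The class and callback are hypothetical.

```python
from typing import Callable, Optional, Tuple

PolicySnapshot = Tuple[int, list]   # (policy version, policies)


class PluginPolicyCache:
    """Keeps the latest policy snapshot so decisions can be made locally."""

    def __init__(self, fetch: Callable[[int], Optional[PolicySnapshot]]):
        # fetch(known_version) returns a newer snapshot, or None if nothing changed.
        self._fetch = fetch
        self.version = -1
        self.policies: list = []

    def refresh(self) -> bool:
        """Poll once; return True only if a newer snapshot was installed."""
        try:
            snapshot = self._fetch(self.version)
        except OSError:
            # Server unreachable: keep enforcing the cached policies.
            return False
        if snapshot is None or snapshot[0] <= self.version:
            return False
        self.version, self.policies = snapshot
        return True
```

The design choice is availability over freshness: a brief outage of the policy server does not stop HDFS, Hive or Kafka from making authorization decisions.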
  • 17. HDP: Security & Governance. Industry first: dynamic tag-based security policies. Apache Atlas tracks metadata and lineage; its metastore holds tags attached to assets and entities in the data lake, such as Hive tables, HDFS files, HBase tables, streams, pipelines and feeds. Apache Ranger manages access policies and audit logs; its policies can use classification, prohibition, time and location conditions. An Atlas client in Ranger subscribes to a topic to receive metadata updates, which feed the resource cache used by Ranger's policy decision point (PDP).
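Tag-based policies attach the rule to a classification instead of to a table or path, so the rule follows the data into every component that Atlas tags. The sketch below assumes the tags have already been synchronized from Atlas into a resource-to-tags map and evaluates two hypothetical tag policies: a time condition on an `EXPIRES_ON` tag and a group restriction on `PII`. In this sketch tag policies are consulted first, and a result of `not-determined` means the resource-based policies decide. Tag names, attributes and resource keys are made up for illustration.

```python
from datetime import date

# Hypothetical resource -> {tag: tag attributes} map, as synchronized from Atlas.
RESOURCE_TAGS = {
    "hive:finance.ww_customers": {"PII": {}, "EXPIRES_ON": {"expiry_date": date(2017, 12, 31)}},
    "hdfs:/data/raw/clicks": {},
}


def tag_decision(resource: str, user_groups: set, today: date,
                 resource_tags: dict = RESOURCE_TAGS) -> str:
    tags = resource_tags.get(resource, {})
    # Time condition: nobody may access data after its expiry date.
    expires = tags.get("EXPIRES_ON")
    if expires is not None and today > expires["expiry_date"]:
        return "deny"
    # Classification condition: PII is readable only by the HR group.
    if "PII" in tags:
        return "allow" if "hr" in user_groups else "deny"
    # No tag policy applies; resource-based policies make the decision.
    return "not-determined"
```

Tagging a new HDFS file or Kafka topic as `PII` in Atlas would put it under the same rule without editing any Ranger policy.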
  • 18. Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive. Groups shown: Analysts, HR, Marketing.
    Source table ww_customers:
      Country | National ID | CC No | DOB | MRN | Name | Policy ID
      US | 232323233 | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe | nj23j424
      US | 333287465 | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe | cadsd984
      Germany | T22000129 | 4532786256545550 | 3/5/1963 | 876452830A | Ernie Schwarz | KK-2345909
    Ranger policy enforcement: each query is rewritten based on dynamic Ranger policies to filter rows by region and apply the relevant column masking.
    User 1: Joe (location: US; group: Analyst). Original query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
    Result: users from the US Analyst group see data for US persons only, with the CC number and National ID (SSN) masked and the MRN nullified:
      Country | National ID | CC No | MRN | Name
      US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
      US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
    User 2: Ivanna (location: EU; group: HR). Original query: SELECT country, nationalid, name, mrn FROM ww_customers
    Result: HR policy admins see unmasked data but are restricted by row-filtering policies to data for EU persons only:
      Country | National ID | Name | MRN
      Germany | T22000129 | Ernie Schwarz | 876452830A
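The effect shown in this example can be reproduced with a small sketch that applies a row filter and per-column masks to the sample rows before returning them. Ranger achieves this by rewriting the Hive query; this sketch instead post-processes rows in Python, and its mask functions are simple string operations chosen to reproduce the slide's output rather than Ranger's built-in mask types (such as partial masks, hash and nullify). The `EU_COUNTRIES` set and the policy table are assumptions.

```python
# Sample rows from the slide (DOB and Policy ID omitted because no query selects them).
WW_CUSTOMERS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}   # assumption: minimal region lookup for the sample data


def show_last_4(value):          # "232323233" -> "xxxxx3233"
    return "x" * (len(value) - 4) + value[-4:]


def show_first_4_card(value):    # 16-digit card number -> "4539 xxxx xxxx xxxx"
    return value[:4] + " xxxx xxxx xxxx"


def nullify(value):
    return None


def unmasked(value):
    return value


# Hypothetical per-group policies: a row filter plus column masks.
POLICIES = {
    "Analyst": {"row_filter": lambda row: row["country"] == "US",
                "masks": {"nationalid": show_last_4, "ccnumber": show_first_4_card,
                          "mrn": nullify}},
    "HR": {"row_filter": lambda row: row["country"] in EU_COUNTRIES, "masks": {}},
}


def run_query(columns, group, rows=WW_CUSTOMERS):
    """Apply the group's row filter and column masks to SELECT <columns> FROM ww_customers."""
    policy = POLICIES[group]
    result = []
    for row in rows:
        if not policy["row_filter"](row):   # behaves like an extra WHERE clause
            continue
        result.append({c: policy["masks"].get(c, unmasked)(row[c]) for c in columns})
    return result
```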
  • 19. Agenda: Data Protection
  • 20. Data Protection. Data protection in Hadoop must be applied at three different layers. Storage: encrypt data while it is at rest (Transparent Data Encryption in HDFS, Ranger KMS + HSM, and partner products such as HPE Voltage, Protegrity and Dataguise). Transmission: encrypt data while it is in motion (native Apache Hadoop 2.0 provides wire encryption). Upon access: apply restrictions when data is accessed (Ranger dynamic column masking and row filtering, partner masking and encryption).
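For the transmission layer, wire encryption in Apache Hadoop is switched on through configuration rather than code. A minimal sketch that parses a Hadoop `*-site.xml` file and reports which of three common wire-encryption properties are not set to an encrypting value: `hadoop.rpc.protection=privacy` (core-site.xml, encrypted RPC), `dfs.encrypt.data.transfer=true` (hdfs-site.xml, encrypted DataNode data transfer) and `dfs.http.policy=HTTPS_ONLY` (hdfs-site.xml, HTTPS web endpoints). The checklist is illustrative, not exhaustive (for example, it ignores comma-separated protection lists and cipher-suite settings), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property -> value that enables encryption on the wire (illustrative checklist).
WIRE_ENCRYPTION = {
    "hadoop.rpc.protection": "privacy",     # SASL RPC with confidentiality (core-site.xml)
    "dfs.encrypt.data.transfer": "true",    # encrypt DataNode block transfers (hdfs-site.xml)
    "dfs.http.policy": "HTTPS_ONLY",        # HTTPS for HDFS web UIs and REST (hdfs-site.xml)
}


def parse_site_xml(text: str) -> dict[str, str]:
    """Read <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(text)
    conf = {}
    for prop in root.iter("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


def missing_wire_encryption(conf: dict[str, str]) -> list[str]:
    """Names of checklist properties that are absent or not set to the encrypting value."""
    return sorted(k for k, want in WIRE_ENCRYPTION.items() if conf.get(k) != want)
```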
  • 21. Ranger KMS and Transparent Data Encryption in HDFS. Diagram: an HDFS client, the NameNode (NN) and three DataNodes (DN) holding replicated blocks A, B, C and D, with Ranger KMS backed by a SafeNet Luna HSM. Benefits: selective encryption of relevant files/folders; prevents rogue admin access to sensitive data; fine-grained access controls; transparent to end applications, which need no changes; Ranger KMS integrated with an external HSM (SafeNet Luna), adding to the reliability and security of the KMS.
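HDFS TDE uses envelope encryption: each encryption zone has a zone key held in the KMS, each file gets its own data encryption key (DEK), and only the encrypted DEK (EDEK) is stored with the file's metadata on the NameNode. An authorized client asks the KMS to decrypt the EDEK and then encrypts or decrypts the data itself, so DataNodes, and HDFS administrators without access to the key, only ever handle ciphertext. The sketch below models that key hierarchy, with a per-key user ACL, in a toy KMS. To stay within the Python standard library it uses a SHA-256 counter-mode keystream as a stand-in for AES-CTR; that stand-in is NOT secure, and `ToyKMS` and its methods are hypothetical, not the Ranger KMS or Hadoop KeyProvider API.

```python
import hashlib
import os


def toy_keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """NOT SECURE: SHA-256 counter-mode keystream standing in for AES-CTR.
    XOR with the same keystream both encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


class ToyKMS:
    """Holds zone keys and a per-key user ACL; zone keys never leave this object."""

    def __init__(self):
        self._keys = {}   # key name -> (key bytes, allowed users)

    def create_key(self, name: str, allowed_users: set) -> None:
        self._keys[name] = (os.urandom(32), set(allowed_users))

    def generate_edek(self, name: str) -> tuple:
        """Create a fresh DEK and return it only in encrypted form (the EDEK)."""
        zone_key, _ = self._keys[name]
        dek, iv = os.urandom(32), os.urandom(16)
        return (name, iv, toy_keystream_xor(zone_key, iv, dek))

    def decrypt_edek(self, edek: tuple, user: str) -> bytes:
        """Return the plaintext DEK, but only to users on the key's ACL."""
        name, iv, encrypted_dek = edek
        zone_key, allowed = self._keys[name]
        if user not in allowed:
            raise PermissionError(f"{user} may not use key {name}")
        return toy_keystream_xor(zone_key, iv, encrypted_dek)
```

In a write, the EDEK from `generate_edek` would be stored with the file's metadata; the client obtains the DEK through `decrypt_edek` and encrypts the bytes before they reach any DataNode. An administrator who is not on the key's ACL can copy the ciphertext but cannot obtain the DEK.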
  • 22. Agenda: Apache Atlas: Vision & Features Overview
  • 23. Background: DGI Community Becomes Apache Atlas. Timeline: Dec 2014, DGI* group kickoff (Global Financial Company); May 2015, Apache Atlas enters incubation; Aug 2016, Apache Atlas 0.7 foundation release; Apr 2017, Apache Atlas 0.8 release; Jun 2017, Atlas becomes a TLP. *DGI: Data Governance Initiative. Apache Atlas 0.8/HDP 2.6: simplified search UI; simplified APIs; classification-based security for HDFS, Kafka and HBase; Knox SSO; performance and scalability improvements. Apache Atlas 0.7.1/HDP 2.5: high availability support; LDAP authentication/authorization; classification-based security for Hive; UI redesign. Committers: 35. Code contributors from Hortonworks, IBM, Aetna, Merck and Target.
  • 24. Apache Atlas Vision: Open Metadata & Governance Services. Diagram: Atlas metadata at the center, connected to structured sources (traditional RDBMS metadata, MPP appliances), Kafka, Storm, Sqoop, Hive, Falcon, Ranger, and custom and partner integrations. Comprehensive enterprise data catalog: lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality; integrates both on-premises and cloud platforms to provide an enterprise-wide view. Open enterprise data connectors: an interoperable connector framework that connects your data catalog out of the box to many vendor technologies, with no expensive population of proprietary, siloed metadata repositories. Dynamic metadata discovery: metadata is added to the catalog automatically as data is created or updated; extensible discovery processes characterize and classify the data. Enabling collaboration and workflows: subject matter experts locate the data they need quickly and efficiently and share their knowledge about the data and its usage to help others; interested parties and processes are notified automatically. Automated governance processes: metadata-driven access control; auditing, metering and monitoring; quality control and exception management; rights (entitlement) management. Predefined standards for glossaries, data schemas, rules and regulations. Vision: metadata-driven foundational governance services for the enterprise data ecosystem, with open frameworks and APIs, agile and secure collaboration around data and advanced analytics, and reduced operational costs while extracting the economic value of data.
  • 25. High-Level Architecture: 4 Key Points. Components: type system, repository, search DSL, bridges (Hive, Storm, Falcon, Sqoop, custom), REST API, graph DB, search, messaging framework (Kafka) and connectors. 1. Data lineage: the only product that captures lineage across Hadoop components at the platform level. 2. Agile data modeling: the type system allows custom metadata structures in a hierarchical taxonomy. 3. REST API: modern, flexible access to Atlas services, HDP components, the UI and external tools. 4. Exchange: leverage existing metadata and models by importing them from current tools, and export metadata to downstream systems.
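The agile data modeling point rests on Atlas's type system, in which entity types declare attributes and inherit from supertypes. The registry below is a minimal sketch of that inheritance; the base types are loosely modeled on Atlas's built-in `Referenceable`, `Asset` and `DataSet` types with trimmed attribute lists, and `iot_sensor_feed` is a hypothetical custom type. It is not the Atlas typedef REST format.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: dict                                 # attribute name -> type name
    supertypes: list = field(default_factory=list)


class TypeRegistry:
    def __init__(self):
        self._types = {}

    def register(self, entity_type: EntityType) -> None:
        # Supertypes must be registered before the types that extend them.
        for parent in entity_type.supertypes:
            if parent not in self._types:
                raise KeyError(f"unknown supertype {parent!r} for {entity_type.name!r}")
        self._types[entity_type.name] = entity_type

    def all_attributes(self, name: str) -> dict:
        """Own attributes plus everything inherited from supertypes."""
        t = self._types[name]
        attrs = {}
        for parent in t.supertypes:
            attrs.update(self.all_attributes(parent))
        attrs.update(t.attributes)
        return attrs

    def is_a(self, name: str, ancestor: str) -> bool:
        return name == ancestor or any(self.is_a(p, ancestor) for p in self._types[name].supertypes)


registry = TypeRegistry()
registry.register(EntityType("Referenceable", {"qualifiedName": "string"}))
registry.register(EntityType("Asset", {"name": "string", "description": "string", "owner": "string"}))
registry.register(EntityType("DataSet", {}, ["Referenceable", "Asset"]))
# Hypothetical custom type, defined purely as data on top of the base types.
registry.register(EntityType("iot_sensor_feed", {"device_model": "string"}, ["DataSet"]))
```

Because `iot_sensor_feed` is a `DataSet`, anything written against datasets (search, lineage, classification) can treat it like a Hive table or HDFS path.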
  • 26. Apache Atlas: Lineage. Lineage: where does this data originate (source/provenance)? Upstream path: the path through all data assets and processes leading up to the current data asset. Impact: how is this data being used? What other (derivative or dependent) data assets does it affect? Downstream path: the path through all data assets and processes leading out of the current data asset. Used for forensics, impact analysis, and auditing and compliance.
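Upstream (lineage) and downstream (impact) paths are graph traversals over datasets and the processes that connect them. A minimal breadth-first sketch over a hypothetical three-process pipeline (all dataset and process names are made up), assuming each process is recorded with its input and output datasets, in the spirit of how Atlas links datasets through process entities:

```python
from collections import defaultdict, deque

# Hypothetical pipeline: process -> (input datasets, output datasets)
PROCESSES = {
    "sqoop_import_customers": (["mysql.crm.customers"], ["hive.raw.customers"]),
    "hive_ctas_clean": (["hive.raw.customers"], ["hive.clean.customers"]),
    "hive_join_orders": (["hive.clean.customers", "hive.raw.orders"], ["hive.mart.customer_orders"]),
}


def build_edges(processes):
    """Return (downstream, upstream) adjacency maps over datasets and processes."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for proc, (inputs, outputs) in processes.items():
        for ds in inputs:
            downstream[ds].add(proc)
            upstream[proc].add(ds)
        for ds in outputs:
            downstream[proc].add(ds)
            upstream[ds].add(proc)
    return downstream, upstream


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def datasets_only(nodes, processes=PROCESSES):
    return {n for n in nodes if n not in processes}
```

Walking `upstream` from `hive.mart.customer_orders` answers "where did this come from?", reaching back to `mysql.crm.customers`; walking `downstream` from `hive.raw.customers` answers "what breaks if this changes?".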
  • 27. Apache Atlas: Lineage and Impact
  • 28. Apache Atlas: Classification. Categorize and curate data assets for easier discovery. Associate context with data assets: governance, security, business, etc.
  • 29. Apache Atlas Classification Use Case: Cross-Component. Classification-based security on cross-component data assets.
  • 30. Metadata Catalog Search: Basic Search. Example: search for a hive_table classified as 'PII' whose name starts with 'prov'. Filter by data asset type; filter by classification; search text with wildcards (prov*, *sum*) and logical expressions (prov* AND *sum*).
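The same basic search can be issued over Atlas's v2 REST API. A minimal sketch, assuming an Atlas server at the hypothetical address `http://atlas.example.com:21000` (21000 is Atlas's default port): it only builds the `GET /api/atlas/v2/search/basic` URL with the `typeName`, `classification` and `query` parameters. Authentication and sending the request are left out, the helper name is hypothetical, and the `limit` value is an arbitrary choice.

```python
from urllib.parse import urlencode


def atlas_basic_search_url(base_url, type_name=None, classification=None, query=None, limit=25):
    """Build a basic-search URL; parameters left as None are omitted."""
    params = {"typeName": type_name, "classification": classification,
              "query": query, "limit": limit}
    params = {k: v for k, v in params.items() if v is not None}
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"


# The slide's example: hive_table entities classified PII whose name starts with "prov".
url = atlas_basic_search_url("http://atlas.example.com:21000", "hive_table", "PII", "prov*")
```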
  • 31. Metadata Catalog Search: Advanced. Filter by data asset type, using DSL search with SQL-like syntax. Example: search for a hive_table named 'employees' with owner 'hive'. Example: select the columns of the impressions table in the raw database with the DSL query string hive_column where table.name='impressions' and table.db.name='raw'.
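The DSL example from this slide can be sent through Atlas's `GET /api/atlas/v2/search/dsl` endpoint in the same way. A minimal sketch, assuming the hypothetical Atlas address `http://atlas.example.com:21000`; the helper rejects names containing quotes rather than guessing at the DSL's escaping rules, only constructs the URL, and its function names are hypothetical.

```python
from urllib.parse import urlencode


def columns_of_table_query(table: str, db: str) -> str:
    """DSL query for the columns of one Hive table, as in the slide's example."""
    for value in (table, db):
        if "'" in value:
            raise ValueError("names containing quotes are not handled by this sketch")
    return f"hive_column where table.name = '{table}' and table.db.name = '{db}'"


def atlas_dsl_search_url(base_url: str, dsl: str, limit: int = 100) -> str:
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/dsl?{urlencode({'query': dsl, 'limit': limit})}"


query = columns_of_table_query("impressions", "raw")
url = atlas_dsl_search_url("http://atlas.example.com:21000", query)
```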
  • 32. Key Takeaways. Secure APIs and UIs in the Hadoop ecosystem using the Apache Knox gateway. Enforce appropriate security controls to monitor data access across your business with Apache Ranger: implement fine-grained, policy-based controls to grant and monitor data access; track user activity on data with user access audit logging to support forensic auditing for breach notification; protect sensitive data through anonymization and pseudonymization using dynamic masking and row filtering. Establish an enterprise data catalog with Apache Atlas: identify and classify data; harvest and maintain metadata. Track and map the movement of data through your enterprise with Apache Atlas: maintain a near-real-time view of data movement; understand data proliferation (especially of sensitive data) with data lineage and impact analysis.
  • 33. More Information. Coming up next: BoF session on Security, Governance & Cybersecurity. When: 6:00 pm, Thursday, September 21, 2017. Where: C4.7. Also check out other sessions on Apache Atlas and Apache Ranger from recent DataWorks Summits: https://dataworkssummit.com/san-jose-2017/ and https://dataworkssummit.com/munich-2017/. Hortonworks product pages: https://hortonworks.com/apache/ranger/ and https://hortonworks.com/apache/atlas. Hortonworks Community Connection: https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html and https://community.hortonworks.com/spaces/62/security-track_2.html. Apache Software Foundation: http://ranger.apache.org/ and http://atlas.apache.org/