Next Generation of Hadoop
Security & Governance
Apache Atlas + Ranger
Alex Zeltov – Solutions Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks Data Platform Architecture
Apache Ranger + Atlas
Overview
Centralized Platform
• Centralized platform to define, administer and manage security policies consistently
• Define security policy once and apply it to all the applicable components across the stack
Fine-Grained Security Definition
• Administer security for:
– Database
– Table
– Column
– LDAP Groups
– Specific Users
Deep Visibility
• Administrators have complete visibility into the security administration process
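A minimal sketch of what a fine-grained policy definition could look like, assuming a Hive service registered in Ranger under the hypothetical name `cluster_hive`. The dictionary follows the general shape of a Ranger resource policy (resources plus policy items); every service, database, table, column, group and user name here is illustrative and does not come from the slides. Nothing is sent to a server.

```python
import json


def build_hive_policy(service, name, database, table, columns, groups, users, accesses):
    """Build a Ranger-style resource policy as a plain dict (no network call)."""
    return {
        "service": service,          # hypothetical Ranger service name
        "name": name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "groups": list(groups),   # e.g. LDAP groups
                "users": list(users),     # specific users
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
        ],
    }


policy = build_hive_policy(
    service="cluster_hive",
    name="customers-read-nonsensitive",
    database="sales",
    table="customers",
    columns=["id", "region"],
    groups=["analysts"],
    users=["alice"],
    accesses=["select"],
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The same payload could be reused for other components by swapping the resource keys, which is the "define once" idea on this slide.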
Atlas Data Governance
Organizations need data governance to understand their information and to answer questions such as:
• What do we know about our information?
• Where did this data come from and how is it being used?
• Does this data adhere to company policies and rules?
Background: DGI Community becomes Apache Atlas
• Dec 2014: DGI group kickoff
• Feb 2015: Prototype built
• May 2015: Apache Atlas incubation
• July 2015: HDP 2.3 Foundation GA release
First kickoff to GA in 7 months
Global Financial Company
* DGI: Data Governance Initiative
Key Benefits:
• Co-Dev = Built for real customer use cases
• Faster & Safer = Customers know business + HWX knows Hadoop
Apache Atlas
REST API
• Modern, flexible access to Atlas services, HDP components, UI & external tools
Search
• SQL-like DSL (Domain Specific Language)
• Support for keyword, faceted and full-text searches
Data Lineage
• Only product that captures lineage across Hadoop components at platform level
Exchange
• Leverage existing metadata / models by importing them from current tools
• Export metadata to downstream systems
[Diagram: Apache Atlas components]
• Knowledge Store: Type-System, Models, Taxonomies, Policy Rules
• Audit Store
• Tag Based Policies: Data Lifecycle Management, Real Time Tag Based Access Control
• REST API
• Services: Search, Lineage, Exchange
• Healthcare: HIPAA, HL7
• Financial: SOX, Dodd-Frank
• Energy: PPDM
• Retail: PCI, PII
• Other: CWM
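As a rough illustration of the SQL-like search DSL exposed over REST, the sketch below only builds a search URL and sends nothing. The host, port, the `hive_table` query text and the endpoint path are assumptions; the path in particular differs between Atlas versions, so check it against the release in use.

```python
from urllib.parse import urlencode

# Assumed endpoint path; older and newer Atlas releases expose different paths.
DSL_SEARCH_PATH = "/api/atlas/v2/search/dsl"


def dsl_search_url(base_url, query, limit=10):
    """Return a GET URL for a DSL search, with the query safely URL-encoded."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}{DSL_SEARCH_PATH}?{params}"


# Hypothetical host and table name, for illustration only.
url = dsl_search_url(
    "http://atlas.example.com:21000",
    'hive_table where name = "customers"',
)
print(url)
```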
What is Metadata?
Technical Metadata
• Database Name
• Table Name
• Column Name
• Data Type
Business Metadata
• Business Name
• Business Definition
• Business Classification
• Sensitivity Tags
Operational Metadata
• Who (security access)
• What (job information)
• When (logs / audit trails)
• Where (location)
Dynamic Access Policy
Apache Ranger + Atlas Integration
Use cases drive design – high reliability
Atlas
• Metastore: Tags, Assets, Entities
• Notification Framework: Kafka Topics
Ranger
• Atlas Client: subscribes to topic, gets metadata updates
• PDP: Resource Cache
Flow: notification of metadata updates from Atlas to Ranger
• Message durability
• Optimized for speed
• Event-driven updates
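The event-driven flow on this slide can be imitated with a toy in-process simulation. This is only a sketch: a `queue.Queue` stands in for the Kafka topic, a dictionary stands in for the resource cache on the Ranger side, and the event fields (`op`, `resource`, `tag`) and the `TAG_ADD` / `TAG_DELETE` names are hypothetical, not the real notification format.

```python
from collections import defaultdict
from queue import Queue

# Stand-in for the Kafka topic; a real deployment would use a Kafka consumer.
topic = Queue()

# Stand-in for the resource cache: resource name -> set of tags.
resource_tags = defaultdict(set)


def publish(event):
    """Publisher side: enqueue a metadata-change notification."""
    topic.put(event)


def drain(pending, cache):
    """Subscriber side: apply every pending notification in arrival order."""
    applied = 0
    while not pending.empty():
        event = pending.get()
        if event["op"] == "TAG_ADD":
            cache[event["resource"]].add(event["tag"])
        elif event["op"] == "TAG_DELETE":
            cache[event["resource"]].discard(event["tag"])
        applied += 1
    return applied


publish({"op": "TAG_ADD", "resource": "sales.customers.ssn", "tag": "PII"})
publish({"op": "TAG_ADD", "resource": "sales.customers.email", "tag": "PII"})
publish({"op": "TAG_DELETE", "resource": "sales.customers.email", "tag": "PII"})
applied = drain(topic, resource_tags)
print(applied, dict(resource_tags))
```

Because policy decisions read only the local cache, lookups stay fast, and the queue (Kafka in the real design) carries durability.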
Tag-based Access Policy Requirements
• Basic tag policy – PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.
• Time-based policy – Timer for data access, decoupled from deletion of data.
• Prohibitions – Prevent combinations of Hive tables/columns that may pose a risk together.
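The four requirements above can be sketched as a single evaluation function. This is an illustrative model only, not Ranger's policy engine: the `Request` type, the rule inputs, the tag names and the order of checks are all assumptions, and proxy IP substitution is not handled.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network


@dataclass
class Request:
    user_groups: set
    resource_tags: set   # tags on every table/column the query touches
    client_ip: str
    when: datetime


def evaluate(req, *, tag_groups, allowed_networks, expires, prohibited_pairs):
    """Return (allowed, reason). All rule inputs are hypothetical examples."""
    # Prohibitions: tag combinations that must never be read together.
    for a, b in prohibited_pairs:
        if a in req.resource_tags and b in req.resource_tags:
            return False, f"prohibited combination: {a}+{b}"
    # Geo-based: the client IP must fall inside an allowed network.
    ip = ip_address(req.client_ip)
    if not any(ip in ip_network(net) for net in allowed_networks):
        return False, "client IP outside allowed networks"
    # Time-based: access window per tag, independent of data deletion.
    for tag in req.resource_tags:
        if tag in expires and req.when >= expires[tag]:
            return False, f"access window for {tag} has expired"
    # Basic tag policy: each tag must be granted to one of the user's groups.
    for tag in req.resource_tags:
        if not (tag_groups.get(tag, set()) & req.user_groups):
            return False, f"no group grant for tag {tag}"
    return True, "allowed"


RULES = dict(
    tag_groups={"PII": {"compliance"}},
    allowed_networks=["10.0.0.0/8"],
    expires={"PII": datetime(2017, 1, 1)},
    prohibited_pairs=[("PII", "HEALTH")],
)

ok = evaluate(Request({"compliance"}, {"PII"}, "10.1.2.3", datetime(2016, 6, 1)), **RULES)
geo = evaluate(Request({"compliance"}, {"PII"}, "192.168.1.5", datetime(2016, 6, 1)), **RULES)
late = evaluate(Request({"compliance"}, {"PII"}, "10.1.2.3", datetime(2017, 2, 1)), **RULES)
combo = evaluate(Request({"compliance"}, {"PII", "HEALTH"}, "10.1.2.3", datetime(2016, 6, 1)), **RULES)
print(ok, geo, late, combo)
```

Policies refer only to tags, never to individual tables, so tagging a new column as PII brings it under the existing rules without writing a new policy.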
Expanded Native Connectors: Dataset Lineage
[Diagram components: Sqoop, Teradata Connector, Apache Kafka, RDBMS, Custom Activity Reporter, Metadata Repository]
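Dataset lineage can be pictured as a graph in which each connector records which source fed which target. The sketch below is a toy in-memory model, not the Atlas lineage API; the dataset names, process names and the `record` / `upstream` helpers are hypothetical.

```python
from collections import defaultdict, deque

# target dataset -> list of (process, source dataset) pairs
edges = defaultdict(list)


def record(source, process, target):
    """Record that `process` read `source` and wrote `target`."""
    edges[target].append((process, source))


def upstream(dataset):
    """Return every dataset that feeds `dataset`, nearest first (breadth-first)."""
    seen, order, todo = {dataset}, [], deque([dataset])
    while todo:
        current = todo.popleft()
        for _process, source in edges.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                todo.append(source)
    return order


# Hypothetical flows reported by different connectors.
record("rdbms.orders", "sqoop_import", "hdfs:/raw/orders")
record("kafka.clicks", "kafka_ingest", "hdfs:/raw/clicks")
record("hdfs:/raw/orders", "hive_etl", "hive.sales.daily")
record("hdfs:/raw/clicks", "hive_etl", "hive.sales.daily")

lineage = upstream("hive.sales.daily")
print(lineage)
```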
UX prototype: Taxonomy Navigation
• Breadcrumbs for taxonomy context path
• Contents at taxonomy context
DEMO

Atlas and Ranger EPAM meetup


Editor's Notes

  • #2 TALK TRACK Data is powering successful clinical care and successful operations. [NEXT SLIDE]
  • #7 How fast? 7 months!
  • #11 Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use case story has continuity. Use DX procedures to diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  • #12 The point of Atlas is to leverage metadata to drive exchange, agility and scalability in the HDP governance solution. The paradigm shift: in a true data lake with a multi-tenant environment and 10K+ objects, conventional management of entitlement and enforcement will not work, and new patterns must be used. One group cannot both understand the data and manage policy efficiently; the domain is too large. These activities must be decoupled. The data stewards curate the data (tagging), as they are the SMEs, and the policy folks create a policy once based on tags (access rules). In our thinking, this is the ONLY scalable solution. We have it and CDH does not.
  • #14 Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use case story has continuity. Use DX procedures to diagnosis. Bring metadata from external systems into Hadoop and keep it together.