© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Implementing a Data Lake with Enterprise Grade Data Governance
We do Hadoop.
Your speakers
Andrew Ahn
Governance Product Manager, Hortonworks
Oliver Claude
CMO at Waterline
HDP: Data Governance
We Do Hadoop
Enterprise Data Governance Goals
GOAL: Provide a common approach to data governance across all systems and data within the organization
•  Transparent
Governance standards & protocols must be
clearly defined and available to all
•  Reproducible
Recreate the relevant data landscape at a
point in time
•  Auditable
All relevant events and assets must be
traceable with appropriate historical lineage
•  Consistent
Compliance practices must be consistent
[Diagram: a Governance Framework spanning ETL/DQ, BPM, business analytics, visualization & dashboards, ERP, CRM, SCM, MDM, and archive systems]
Data Governance Challenges WITHIN Hadoop
•  No comprehensive governance within
the Hadoop stack
•  Mostly disjoint: each project defines its own future, and there is no common framework
•  Disparate tools, such as HCatalog, Ranger and
Falcon provide pieces of the overall solution
•  No integration with external governance
frameworks
•  Difficult to get right because each project
is autonomous and you need insight into
traditional IT
[Diagram: Apache Pig, Apache Hive, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, and Apache Storm as separate projects]
Data Governance Initiative for Hadoop
[Diagram: a Data Governance Initiative providing a common governance framework that links the Hadoop stack (Apache Pig, Hive, HBase, Accumulo, Solr, Spark, Storm) with enterprise systems (ETL/DQ, BPM, business analytics, visualization & dashboards, ERP, CRM, SCM, MDM, archive)]
Two requirements
1.  Hadoop must snap into the existing frameworks and be a good citizen
2.  Hadoop must also provide governance within its own stack of technologies
A group of companies dedicated to meeting
these requirements in the open
Common Data Governance Use Cases
•  Financial Reporting: chain of custody, lineage narratives
•  Telco: device log management, correlation, analysis, and mitigation
•  Retail: point-of-sale analysis, price optimization
•  Healthcare: 30-day measures reporting
Apache Atlas Overview
We Do Hadoop
New Project Proposal: Apache Atlas
Apache Atlas
Proposed open source project
aimed at solving the Hadoop
data governance challenge in
the open.
Key Capabilities
•  Data Classification
•  Metadata Exchange
•  Centralized Auditing
•  Search & Lineage (Browse)
•  Security & Policy Engine
[Diagram: Apache Atlas architecture: Knowledge Store (models/type system, policy rules, taxonomies), Audit Store, tag-based policies, data lifecycle management, real-time tag-based access control, and a REST API exposing search, lineage, and exchange services; industry vocabularies for Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Energy (PPDM), Retail (PCI, PII), and Other (CWM)]
Essential Timeline
1H 2015 GA
•  REST API
•  Centralized Taxonomy
•  Import / export metadata
•  Basic Policy Rules Engine
•  Real-time access control
•  Column Level Tagging
Phase 2
•  Advanced audit reporting
•  Advanced Policy Engine
•  Row / Column Masking
•  3rd party metadata exchange
Phase 3
•  Collaboration Features
•  Self Service
•  Steward Delegation
•  Profiling & Pattern Analysis
•  Visualization
Apache Atlas Capabilities: Overview
Data Classification
•  Import or define a taxonomy of business-oriented annotations for data
•  Define, annotate, and automate capture of relationships between data sets and underlying
elements including source, target, and derivation processes
•  Export metadata to third-party systems
Centralized Auditing
•  Capture security access information for every application, process, and interaction with data
•  Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
•  Pre-defined navigation paths to explore the data classification and audit information
•  Text-based search quickly and accurately locates relevant data and audit events across the data lake
•  Browse visualization of data set lineage, allowing users to drill down into operational, security, and provenance-related information
Security & Policy Engine
•  Rationalize compliance policy at runtime based on data classification schemes
•  Advanced definition of policies for preventing data derivation based on classification (i.e. re-
identification)
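The classification and lineage capabilities above describe capturing source, target, and derivation-process relationships between data sets, and then browsing them. A small in-memory model makes this concrete. This is a minimal sketch and not how Atlas stores lineage. The dataset and process names are hypothetical, and so is the rule that a classification tag propagates to every derived data set. The sketch shows how recorded relationships let a user walk upstream (provenance) or downstream (impact).

```python
from collections import defaultdict


class Lineage:
    """Toy lineage store: each derivation process links input datasets to output datasets."""

    def __init__(self):
        self.processes = {}                 # process name -> (inputs, outputs)
        self.downstream = defaultdict(set)  # dataset -> datasets derived from it
        self.upstream = defaultdict(set)    # dataset -> datasets it was derived from
        self.tags = defaultdict(set)        # dataset -> classification tags

    def record(self, process, inputs, outputs):
        """Capture one source -> process -> target relationship."""
        self.processes[process] = (tuple(inputs), tuple(outputs))
        for src in inputs:
            for dst in outputs:
                self.downstream[src].add(dst)
                self.upstream[dst].add(src)

    def _walk(self, start, edges):
        """Depth-first traversal returning every dataset reachable from `start`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def provenance(self, dataset):
        return self._walk(dataset, self.upstream)

    def impact(self, dataset):
        return self._walk(dataset, self.downstream)

    def propagate_tag(self, dataset, tag):
        """Assumed rule: a tag on a dataset also applies to everything derived from it."""
        for ds in {dataset} | self.impact(dataset):
            self.tags[ds].add(tag)


# Hypothetical pipeline: a Sqoop import feeding a Hive ETL job.
lin = Lineage()
lin.record("sqoop_import", ["crm.customers"], ["landing.customers"])
lin.record("hive_etl", ["landing.customers", "landing.orders"], ["mart.customer_orders"])
lin.propagate_tag("crm.customers", "PII")
print(sorted(lin.provenance("mart.customer_orders")))
# ['crm.customers', 'landing.customers', 'landing.orders']
print(sorted(lin.tags["mart.customer_orders"]))  # ['PII']
```

Propagating tags along lineage is one way to let a classification scheme drive policy for derived data. Whether a real deployment should propagate every tag is a policy decision.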
[Diagram: Apache Atlas architecture]
Apache Atlas Overview
Knowledge Store
Knowledge store categorized with appropriate business-
oriented taxonomy
•  Data sets & objects
•  Tables / Columns
•  Logical context
•  Source, destination
Supports exchange of metadata between foundation
components and third-party applications/governance tools
Leverages existing Hadoop metastores
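The Knowledge Store organizes metadata through models and a type system. The sketch below shows the general idea: typed definitions with inherited attributes, validated entities, and classifications attached to entities. It is a hypothetical illustration. The type names `dataset` and `hive_table`, their attributes, and the `TypeRegistry` class are assumptions, not the actual Atlas type definitions.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    name: str
    required: dict           # attribute name -> expected Python type
    supertypes: tuple = ()


@dataclass
class Entity:
    type_name: str
    attributes: dict
    classifications: set = field(default_factory=set)


class TypeRegistry:
    """Hypothetical registry: validates entities against registered type definitions."""

    def __init__(self):
        self.types = {}

    def register(self, typedef):
        for sup in typedef.supertypes:
            if sup not in self.types:
                raise ValueError(f"unknown supertype {sup!r}")
        self.types[typedef.name] = typedef

    def all_required(self, type_name):
        # Attributes are inherited from supertypes, then extended by the subtype.
        td = self.types[type_name]
        merged = {}
        for sup in td.supertypes:
            merged.update(self.all_required(sup))
        merged.update(td.required)
        return merged

    def create(self, type_name, **attrs):
        if type_name not in self.types:
            raise KeyError(type_name)
        for attr, typ in self.all_required(type_name).items():
            if not isinstance(attrs.get(attr), typ):
                raise TypeError(f"{type_name}.{attr} must be {typ.__name__}")
        return Entity(type_name, attrs)


reg = TypeRegistry()
reg.register(TypeDef("dataset", {"name": str, "owner": str}))
reg.register(TypeDef("hive_table", {"db": str, "columns": list}, supertypes=("dataset",)))
table = reg.create("hive_table", name="customers", owner="etl", db="sales", columns=["id", "ssn"])
table.classifications.add("PII")  # business-oriented tag from the taxonomy
```

A type hierarchy lets generic tools treat every `hive_table` as a `dataset`, while still checking table-specific attributes.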
[Diagram: Apache Atlas architecture, highlighting the Knowledge Store]
Apache Atlas Overview
Data Lifecycle Management
Leverage existing investment in Apache Falcon with a
focus on:
•  Provenance
•  Multi-cluster replication
•  Data set retention/eviction
•  Late data handling
•  Automation
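Falcon expresses retention declaratively in its feed definitions. The decision underneath is simple: given dated partitions of a data set and a retention window, select the partitions to evict. The sketch below shows only that decision in plain Python. It does not use Falcon. The partition paths and the 7-day window are hypothetical.

```python
from datetime import datetime, timedelta


def partitions_to_evict(partitions, now, retention):
    """Return partition paths whose timestamp is older than now - retention.

    partitions: mapping of partition path -> partition timestamp (assumed layout).
    """
    cutoff = now - retention
    return sorted(path for path, ts in partitions.items() if ts < cutoff)


now = datetime(2015, 6, 1, 12, 0)
# Hypothetical daily partitions for May 20-31, 2015.
parts = {
    f"/data/landing/clicks/2015-05-{day:02d}": datetime(2015, 5, day)
    for day in range(20, 32)
}
expired = partitions_to_evict(parts, now, timedelta(days=7))
```

The cutoff here is 2015-05-25 12:00, so the partitions for May 20 through May 25 are selected and the later ones are kept.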
[Diagram: Apache Atlas architecture, highlighting Data Lifecycle Management]
Apache Atlas Overview
Audit Store
Historical repository for all governance events
•  Security: Access Grant & Deny
•  Operational: Data Provenance & Metrics
•  Indexed and Searchable
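An audit store that is "indexed and searchable" can be pictured as an append-only event list plus an inverted index from field values to events. This is a minimal in-memory sketch. The event fields and the `AuditStore` class are hypothetical and do not reflect the Atlas audit schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    seq: int
    kind: str      # "security" or "operational"
    user: str
    resource: str
    action: str    # e.g. "read", "write"
    result: str    # e.g. "grant" or "deny" for security events


class AuditStore:
    def __init__(self):
        self._events = []               # append-only; events are never updated in place
        self._index = defaultdict(set)  # "field:value" -> positions of matching events

    def append(self, **fields):
        event = AuditEvent(seq=len(self._events), **fields)
        self._events.append(event)
        for name, value in fields.items():
            self._index[f"{name}:{value}"].add(event.seq)
        return event

    def search(self, **criteria):
        """Return events matching every field=value criterion (empty list if none given)."""
        hits = None
        for name, value in criteria.items():
            ids = self._index.get(f"{name}:{value}", set())
            hits = ids if hits is None else hits & ids
        return [self._events[i] for i in sorted(hits or ())]


store = AuditStore()
store.append(kind="security", user="alice", resource="sales.customers", action="read", result="grant")
store.append(kind="security", user="bob", resource="sales.customers", action="read", result="deny")
store.append(kind="operational", user="etl", resource="sales.customers", action="write", result="ok")
denied = store.search(resource="sales.customers", result="deny")
```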
[Diagram: Apache Atlas architecture, highlighting the Audit Store]
Apache Atlas Overview
Security
Integration with HDP Advanced Security investments
to ensure compliance.
Establish global security policies based on data
classification.
Leverages Ranger plug-in architecture for policy
enforcement
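Policies "based on data classification" attach rules to tags rather than to individual tables or columns. Every resource that carries a tag then inherits the tag's rule. The sketch below illustrates the idea. The tag names, groups, column names, and the rule that a user must satisfy every tag on a resource are all assumptions. This is not Ranger's policy model.

```python
# Hypothetical tag policies: tag -> groups allowed to read data carrying that tag.
TAG_POLICIES = {
    "PII": {"compliance", "hr"},
    "PCI": {"compliance"},
}

# Hypothetical classification of columns, as a catalog might record it.
COLUMN_TAGS = {
    "sales.customers.email": {"PII"},
    "sales.customers.card_no": {"PCI", "PII"},
    "sales.customers.region": set(),
}


def can_read(user_groups, resource):
    """Allow only if, for every tag on the resource, the user is in an allowed group.

    Tags without a policy deny by default. Untagged resources are allowed here;
    a stricter deployment might default-deny them instead.
    """
    for tag in COLUMN_TAGS.get(resource, set()):
        allowed = TAG_POLICIES.get(tag, set())
        if not (user_groups & allowed):
            return False
    return True
```

When a steward tags a new column as `PII`, this check covers it immediately, without anyone writing a new per-column rule.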
[Diagram: Apache Atlas architecture, highlighting Security]
Apache Atlas Overview
Policy Engine
Runtime rationalization of policy rules with respect to
data asset combinations and time. Fully extensible.
•  Metadata based
•  Geo based rules
•  Time-based rules
•  Hive Column Prohibitions
•  Preview: Hive Row and Column Masking
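The sketch below combines three of the rule kinds listed above: a geo-based rule, a time-based rule, and column masking. The rule shapes, the country codes, the business-hours window, and the "keep the last four characters" masking style are hypothetical choices. They do not describe the Atlas policy engine.

```python
from datetime import datetime, time


def mask_value(value, keep_last=4):
    """Replace all but the last `keep_last` characters with '*'."""
    s = str(value)
    if len(s) <= keep_last:
        return "*" * len(s)
    return "*" * (len(s) - keep_last) + s[-keep_last:]


def apply_policies(row, user_country, at, masked_columns, allowed_countries, window):
    """Return the row as the user may see it, or None if access is denied.

    - geo rule: the request must come from one of `allowed_countries`
    - time rule: the request must fall inside the daily `window` (start, end)
    - masking: listed columns are partially masked instead of removed
    """
    start, end = window
    if user_country not in allowed_countries:
        return None
    if not (start <= at.time() <= end):
        return None
    return {col: mask_value(v) if col in masked_columns else v for col, v in row.items()}


row = {"name": "Ann Lee", "ssn": "123-45-6789", "state": "CA"}
visible = apply_policies(row, "US", datetime(2015, 6, 10, 14, 30),
                         masked_columns={"ssn"}, allowed_countries={"US"},
                         window=(time(8, 0), time(18, 0)))
```

Masking returns a usable row with the sensitive value obscured, which is often preferable to dropping the column outright.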
[Diagram: Apache Atlas architecture, highlighting the Policy Engine and policy rules]
Apache Atlas Overview
RESTful interface
•  Extensible enterprise classification of data assets, relationships, and policies, organized in a meaningful way and aligned to the business organization
•  Supports exploration via user interface
•  Supports extensibility via API and CLI exposure
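External tools reach the metadata through the RESTful interface. The sketch below uses only the Python standard library to build a search request. The request is built but never sent, so the example runs offline. The host, the port, the search path, and the response shape with a top-level `results` list are all hypothetical placeholders. Check the REST API documentation for the Atlas version you actually deploy.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: substitute the host, port, and search path documented
# for the Atlas version actually deployed.
BASE_URL = "http://atlas.example.com:21000"
SEARCH_PATH = "/api/atlas/search"  # placeholder path, not a documented endpoint


def build_search_request(query, limit=25):
    """Build (but do not send) a GET request asking the metadata service for matching entities."""
    params = urlencode({"query": query, "limit": limit})
    req = Request(f"{BASE_URL}{SEARCH_PATH}?{params}", method="GET")
    req.add_header("Accept", "application/json")
    return req


def parse_results(body):
    """Parse a JSON response body; assumes a top-level 'results' list (hypothetical shape)."""
    return json.loads(body).get("results", [])


req = build_search_request("customers")
# Against a live server: urllib.request.urlopen(req).read()
```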
[Diagram: Apache Atlas architecture]
Coming 2H 2015
Apache Atlas Overview
Enhanced Audit Store
Historical repository for all governance events
•  Immutable file format
•  Events Metadata Taggable
•  Advanced Reporting
•  Security: Access Grant & Deny
•  Operational: Data Provenance & Metrics
•  Indexed and Searchable
[Diagram: Apache Atlas architecture, highlighting the Audit Store]
Governance Ready Certification Program
Curated group of vendor partners to provide rich & complete features
Customers choose the features they want to deploy, a la carte.
Low switching costs!
HDP at core to provide stability and interoperability
Partner categories: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization
Waterline Data improves speed to value and compliance
[Diagram: Data Warehouse Offload, Data Science/Analytics Sandbox, and Data Lake, positioned by cost savings and value creation; goals: Deliver a Business-Ready Data Lake, Accelerate Data Prep Process, Govern Data in Hadoop]
Find, understand and govern data in Hadoop
The Modern Data Architecture
Apache Atlas Capabilities: Overview
[Diagram: Apache Atlas architecture alongside Waterline Data capabilities (Business Glossary, Automated Classification (Tagging), Automated Lineage Discovery, Profiling and Data Quality, Schema Discovery, Change Detection and Audit), with glossary, tags, lineage, and models exchanged over the REST API]
Imagine shopping on Amazon.com
[Diagram: Inventory, Find and Understand, and Provision, all under Governance]
Waterline Data is like Amazon.com for data in Hadoop
[Diagram: Inventory, Find and Understand, and Provision, all under Governance]
Inventory
Find and Understand
Provision
Governance
Find, understand and govern data in Hadoop
•  Big Data IT Architect: Deliver a business-ready data lake
•  Data Engineer/Data Scientist: Accelerate the data prep process
•  CDO/Data Steward: Govern data in Hadoop
Deliver a business-ready data lake
“It’s easy to get data into Hadoop, but it’s not necessarily easy to get data out of Hadoop. There is a need for data as a
service to help the business find, understand, and govern data in Hadoop.”
Joe DosSantos, EMC Big Data Practice Leader
Accelerate data prep process
“80% of Big Data analytics is data prep, and 80% of data prep is inventorying data.”
Data Engineering Director, Financial Services
Accelerate data prep process
“Waterline Data fills a critical gap in big data exploratory analytics by automating the tagging and cataloging of data, which in turn can help analytic teams provision the right data for their analyses.”
Tony Baer, Principal Analyst, Ovum
Govern data in Hadoop
“Data lakes therefore carry substantial risks. The most important is the inability to determine data quality or the lineage of findings by
other analysts or users that have found value, previously, in using the same data in the lake. By its definition, a data lake accepts any
data, without oversight or governance. Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a
data swamp. And without metadata, every subsequent use of data means analysts start from scratch.”
“Gartner Says Beware of the Data Lake Fallacy” post on the Gartner website
Govern data in Hadoop
“The first step to governing Big Data is to build an inventory.”
Sunil Soares, Managing Partner, Information Asset
Best practice approach to implement an enterprise grade data lake
1. Create and populate landing area
2. Build inventory of data
3. Integrate with enterprise metadata repository
4. Protect sensitive data
5. Open up to users
6. Monitor and maintain
Best practices in deployment landscape
1. Create and populate landing area
•  Create the landing directory structure
•  Set up ETL processes, using Falcon to orchestrate them
•  Implement ETL jobs using ETL tools (Syncsort, Talend, Informatica, etc.), Hadoop tools (Sqoop, Flume, etc.), or FTP
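A landing area is easier to govern when incoming data follows a predictable layout. The sketch below assumes a hypothetical convention, `landing/<source>/<dataset>/<yyyy>/<mm>/<dd>`, and creates the directories on the local filesystem with `pathlib`. On a cluster the same paths would be created in HDFS, for example with `hdfs dfs -mkdir -p`.

```python
import tempfile
from datetime import date
from pathlib import Path


def landing_path(root, source, dataset, day):
    """Partition incoming files by source system, dataset, and arrival date (assumed convention)."""
    return Path(root) / "landing" / source / dataset / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def create_landing_dirs(root, feeds, day):
    """Create one dated directory per (source, dataset) feed; safe to re-run."""
    created = []
    for source, dataset in feeds:
        path = landing_path(root, source, dataset, day)
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demo in a throwaway directory; the feed names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    dirs = create_landing_dirs(tmp, [("crm", "customers"), ("pos", "sales")], date(2015, 6, 9))
    created_ok = all(d.is_dir() for d in dirs)
    rerun_same = create_landing_dirs(tmp, [("crm", "customers")], date(2015, 6, 9)) == [dirs[0]]
    relative = [d.relative_to(tmp).as_posix() for d in dirs]
```

Date-partitioned paths also make retention and late-data handling simpler to express later, because each day's arrivals sit in their own directory.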
[Diagram: Falcon]
Best practices in deployment landscape
2. Build inventory of data
•  Crawl the cluster
•  Profile files
•  Automatically discover technical,
business, and compliance
metadata at a field level
•  Create Hive tables as needed
•  Import lineage
•  Export to Atlas
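Profiling files and discovering metadata "at a field level" can start with simple statistics and pattern matching on each column's values. The sketch below computes null and distinct counts and proposes a tag when most non-empty values match a pattern. The tag names, the regular expressions, and the 75% threshold are hypothetical. Production discovery tools use richer techniques than a single regex per tag.

```python
import re
from collections import Counter

# Hypothetical tag rules: a regex per tag, applied to every non-empty value.
TAG_PATTERNS = {
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "US_ZIP": re.compile(r"^\d{5}(-\d{4})?$"),
}


def profile_field(values, min_match=0.75):
    """Profile one field: null count, distinct count, most common value, and tags whose
    pattern matches at least `min_match` of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "top": Counter(non_empty).most_common(1)[0][0] if non_empty else None,
        "tags": [],
    }
    if non_empty:
        for tag, pattern in TAG_PATTERNS.items():
            share = sum(bool(pattern.match(v)) for v in non_empty) / len(non_empty)
            if share >= min_match:
                profile["tags"].append(tag)
    return profile


ssn_column = ["123-45-6789", "987-65-4321", "", "555-12-3456", "222-33-4444", "n/a"]
print(profile_field(ssn_column))
```

The threshold tolerates a few dirty values such as `n/a`. A data steward would still review the proposed tags before they are exported to Atlas.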
[Diagram: Falcon, HCatalog, Atlas]
Best practices in deployment landscape
3. Integrate with enterprise metadata repository
•  Import business glossary terms
and export new tags and updated
definitions
•  Synchronize Atlas and Waterline
Data Inventory
•  Export metadata and lineage from
Hadoop to Enterprise repository
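Synchronizing business glossary terms with tags found in Hadoop is at heart a reconciliation problem. Terms that exist on only one side are copied across. Terms whose definitions disagree go to a person for review. The sketch below models both sides as hypothetical term-to-definition dictionaries. It leaves conflict resolution to a data steward and does not decide it automatically.

```python
def reconcile(enterprise, hadoop):
    """Compare two term -> definition mappings.

    Returns the terms to import into the Hadoop side, the new tags to export to the
    enterprise glossary, and the terms whose definitions disagree (left for a steward).
    """
    to_import = {t: d for t, d in enterprise.items() if t not in hadoop}
    to_export = {t: d for t, d in hadoop.items() if t not in enterprise}
    conflicts = sorted(t for t in enterprise.keys() & hadoop.keys() if enterprise[t] != hadoop[t])
    return to_import, to_export, conflicts


# Hypothetical glossary contents on each side.
enterprise_glossary = {
    "Customer": "A party that has purchased at least one product.",
    "Churn": "A customer with no purchase in 12 months.",
    "Supplier": "A party that provides goods to the company.",
}
hadoop_tags = {
    "Customer": "A party that has purchased at least one product.",
    "Churn": "A customer with no purchase in 6 months.",
    "Clickstream": "Raw web interaction events.",
}
to_import, to_export, conflicts = reconcile(enterprise_glossary, hadoop_tags)
```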
[Diagram: Falcon, HCatalog, Atlas]
Best practices in deployment landscape
4. Protect sensitive data
•  Use Waterline Data Inventory to
find sensitive data
•  Create access privileges in Ranger
•  Encrypt or de-identify
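One common way to de-identify a column is keyed pseudonymization with HMAC-SHA256. The same input always yields the same token, so joins and counts still work, but the token cannot be reversed without the key. This technique is swapped in for illustration; the deck does not say which de-identification method Waterline or Ranger uses. The key below is a placeholder, and key management is out of scope. Pseudonymized data can still be re-identified when combined with other fields, so this does not replace access control.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Deterministic, keyed token for a sensitive value (HMAC-SHA256 hex digest, truncated
    to 16 characters for readability; keep the full digest if collisions matter)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def deidentify_rows(rows, sensitive_columns, key):
    """Replace sensitive columns with tokens; other columns pass through unchanged."""
    return [
        {col: pseudonymize(str(v), key) if col in sensitive_columns else v for col, v in row.items()}
        for row in rows
    ]


KEY = b"replace-with-a-managed-secret"  # placeholder; keep real keys in a key store
rows = [
    {"ssn": "123-45-6789", "state": "CA", "amount": 120},
    {"ssn": "123-45-6789", "state": "NY", "amount": 75},
]
safe = deidentify_rows(rows, {"ssn"}, KEY)
```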
[Diagram: HCatalog, Ranger, Falcon, Atlas]
Best practices in deployment landscape
5. Open up to users
•  Create user accounts with Kerberos, LDAP, etc.
•  Set up ACLs (leverage Ranger)
•  Users can browse securely through Waterline Data Inventory
[Diagram: HCatalog, Ranger, Falcon, Atlas]
Best practices in deployment landscape
6. Monitor and maintain
•  Continue profiling new or changed
files and sync with Atlas
•  Continue monitoring for sensitive
data and use Ranger to protect it
•  Build a folksonomy and
synchronize with business glossary
in Atlas and Enterprise Business
Glossary
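Continuing to profile "new or changed files" means working out what changed since the last crawl. Only those files then need to be re-profiled and re-synced with Atlas. The sketch below compares two snapshots that map each path to a fingerprint of size plus SHA-256 of the content, using local files. The fingerprint choice is an assumption. A crawler on HDFS might instead compare file length and modification time taken from a directory listing, which avoids reading every file.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file under `root` (relative POSIX path) to (size, sha256 of content)."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            result[path.relative_to(root).as_posix()] = (len(data), hashlib.sha256(data).hexdigest())
    return result


def diff_snapshots(old, new):
    """Return (added, changed, removed) path lists between two crawls."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, changed, removed


# Demo with hypothetical files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.csv").write_text("id\n1\n")
    (base / "b.csv").write_text("id\n2\n")
    before = snapshot(tmp)
    (base / "a.csv").write_text("id\n1\n3\n")  # changed
    (base / "b.csv").unlink()                 # removed
    (base / "c.csv").write_text("id\n4\n")    # added
    after = snapshot(tmp)

added, changed, removed = diff_snapshots(before, after)
```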
[Diagram: HCatalog, Ranger, Falcon, Atlas]
Find, understand and govern data in Hadoop
•  CDO/Data Steward: Discover lineage and business metadata automatically, and manage metadata
•  Big Data Architect: Automate cataloging of data assets at scale, with secure provisioning to business users
•  Data Engineer/Data Scientist/Business Analyst: Find and understand the best-suited and most trusted data without having to explore every file manually
Questions and Answers
Next Steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Waterline Data & Hortonworks
http://hortonworks.com/partner/waterline-data
Joint tutorial: bit.ly/DataLakeTutorial
Modern Data Architecture Paper: go.waterlinedata.com/hw-mda
SAN JOSE
June 9-11
BRUSSELS
April 15-16
•  Deep-dive technical content
•  65+ sessions and 5 tracks
•  1,000 attendees
•  Sponsorships Available
•  Including Pre and Post event community meetups
and BOFs
•  Hadoop training available
•  100+ sessions and 7 tracks
•  Deep-dive technical content
•  5,000 attendees
•  Sponsorships Available
•  Including Pre and Post event community meetups
and BOFs
•  Hadoop training available
www.hadoopsummit.org
The Largest Hadoop Community Events in Europe and North America
More Related Content

What's hot

Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Hortonworks
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Hortonworks
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Hortonworks
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
Hortonworks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
DataWorks Summit/Hadoop Summit
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
Hortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Hortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Hortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
Hortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Hortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks
 

What's hot (20)

Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 

Viewers also liked

BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2
Kelvin Chan
 
Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lake
Data Science Thailand
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Implementing a Data Lake with Enterprise Grade Data Governance

  • 1. Implementing a Data Lake with Enterprise Grade Data Governance
    We do Hadoop.
    © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  • 2. Your speakers
    • Andrew Ahn, Governance Product Manager, Hortonworks
    • Oliver Claude, CMO at Waterline
  • 3. HDP: Data Governance
    We Do Hadoop
  • 4. Enterprise Data Governance Goals
    GOAL: Provide a common approach to data governance across all systems and data within the organization.
    • Transparent: Governance standards and protocols must be clearly defined and available to all.
    • Reproducible: The relevant data landscape can be recreated as it stood at any point in time.
    • Auditable: All relevant events and assets must be traceable, with appropriate historical lineage.
    • Consistent: Compliance practices must be consistent.
    Diagram: a single Governance Framework spanning ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, and Archive.
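The "Reproducible" goal implies that metadata is versioned rather than overwritten. The sketch below shows one way to answer "what did the landscape look like at time T?" with an append-only history per asset and a binary search over timestamps. It is not how Apache Atlas stores metadata; `VersionedMetadata` and `landscape_as_of` are hypothetical names, and timestamps are assumed to be integer epoch seconds.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedMetadata:
    """Append-only metadata history for one data asset (hypothetical model)."""
    timestamps: list[int] = field(default_factory=list)  # ascending epoch seconds
    versions: list[dict] = field(default_factory=list)   # metadata valid from timestamps[i]

    def record(self, ts: int, metadata: dict) -> None:
        # History is never overwritten, so any past state stays recoverable.
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("versions must be recorded in increasing time order")
        self.timestamps.append(ts)
        self.versions.append(dict(metadata))

    def as_of(self, ts: int) -> dict | None:
        # Latest version recorded at or before ts; None if the asset did not exist yet.
        i = bisect_right(self.timestamps, ts)
        return self.versions[i - 1] if i else None


def landscape_as_of(catalog: dict[str, VersionedMetadata], ts: int) -> dict[str, dict]:
    """Rebuild the metadata of every asset as it stood at time ts."""
    snapshot = {}
    for name, history in catalog.items():
        meta = history.as_of(ts)
        if meta is not None:
            snapshot[name] = meta
    return snapshot


if __name__ == "__main__":
    catalog = {"sales.orders": VersionedMetadata()}
    catalog["sales.orders"].record(100, {"owner": "finance", "tags": ["PII"]})
    catalog["sales.orders"].record(200, {"owner": "finance", "tags": []})
    print(landscape_as_of(catalog, 150))  # {'sales.orders': {'owner': 'finance', 'tags': ['PII']}}
```

Because past versions are only appended, an auditor can ask which classification applied to a data set on the day a report was produced, even after the tags have since changed.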
  • 5. Data Governance Challenges WITHIN Hadoop
    • No comprehensive governance within the Hadoop stack
    • Mostly disjoint, as each project defines its own future and there is no common framework
    • Disparate tools such as HCatalog, Ranger, and Falcon each provide pieces of the overall solution
    • No integration with external governance frameworks
    • Difficult to get right because each project is autonomous and you need insight into traditional IT
    Diagram: Hadoop stack components (Apache Pig, Apache Hive, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm).
  • 6. Data Governance Initiative for Hadoop
    Diagram: the Data Governance Initiative's common governance framework connecting enterprise systems (ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, Archive) with the Hadoop stack (Apache Pig, Hive, HBase, Accumulo, Solr, Spark, Storm).
    Two requirements:
    1. Hadoop must snap into the existing frameworks and be a good citizen.
    2. Hadoop must also provide governance within its own stack of technologies.
    The Data Governance Initiative is a group of companies dedicated to meeting these requirements in the open.
  • 7. Common Data Governance Use Cases
    •  Financial Reporting: chain of custody, lineage narratives
    •  Telco: device log management, correlation, analysis, and mitigation
    •  Retail: point-of-sale analysis, price optimization
    •  Healthcare: 30-day measures reporting
  • 8. Apache Atlas Overview
    We Do Hadoop
  • 9. New Project Proposal: Apache Atlas
    Apache Atlas: a proposed open source project aimed at solving the Hadoop data governance challenge in the open.
    Key Capabilities
    •  Data Classification
    •  Metadata Exchange
    •  Centralized Auditing
    •  Search & Lineage (Browse)
    •  Security & Policy Engine
    [Diagram: Apache Atlas architecture. Knowledge Store (Models/Type-System, Taxonomies, Policy Rules), Audit Store, Data Lifecycle Management, Tag-Based Policies, Real-Time Tag-Based Access Control, and REST API Services (Search, Lineage, Exchange), with industry models: Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Energy (PPDM), Retail (PCI, PII), Other (CWM)]
    Essential Timeline
    1H 2015 GA
    •  REST API
    •  Centralized Taxonomy
    •  Import / export metadata
    •  Basic Policy Rules Engine
    •  Real-time access control
    •  Column-level tagging
    Phase-2
    •  Advanced audit reporting
    •  Advanced Policy Engine
    •  Row / Column Masking
    •  3rd-party metadata exchange
    Phase-3
    •  Collaboration Features
    •  Self Service
    •  Steward Delegation
    •  Profiling & Pattern Analysis
    •  Visualization
  • 10. Apache Atlas Capabilities: Overview
    Data Classification
    •  Import or define a taxonomy of business-oriented annotations for data
    •  Define, annotate, and automate capture of relationships between data sets and underlying elements, including source, target, and derivation processes
    •  Export metadata to third-party systems
    Centralized Auditing
    •  Capture security access information for every application, process, and interaction with data
    •  Capture operational information for executions, steps, and activities
    Search & Lineage (Browse)
    •  Predefined navigation paths to explore the data classification and audit information
    •  Text-based search locates relevant data and audit events across the data lake quickly and accurately
    •  Browsable visualization of data set lineage, letting users drill down into operational, security, and provenance-related information
    Security & Policy Engine
    •  Rationalize compliance policy at runtime based on data classification schemes
    •  Advanced definition of policies that prevent data derivation based on classification (such as re-identification)
    [Diagram: Apache Atlas architecture, as on slide 9]
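The lineage and classification capabilities above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory model (the `PROCESSES`, `TAGS`, and `FORBIDDEN` structures and every data set name are invented for illustration and are not Atlas's API): tags flow downstream along lineage, and a new derivation is refused when its combined input tags contain a forbidden combination, one simple way to express a re-identification rule.

```python
from collections import deque

# Hypothetical lineage model: each process reads input data sets and writes one output.
PROCESSES = [
    {"name": "clean_claims", "inputs": ["raw_claims"], "output": "claims_clean"},
    {"name": "load_members", "inputs": ["members_raw"], "output": "members"},
    {"name": "summarize", "inputs": ["claims_clean"], "output": "claims_summary"},
]
# Hypothetical classifications attached directly to source data sets.
TAGS = {"raw_claims": {"PHI"}, "members_raw": {"PII"}}
# Tag combinations that must never meet in one derived data set (re-identification risk).
FORBIDDEN = (frozenset({"PII", "PHI"}),)


def parents(dataset):
    """Direct inputs of whichever process produced `dataset`."""
    return [i for p in PROCESSES if p["output"] == dataset for i in p["inputs"]]


def upstream(dataset):
    """Every data set that `dataset` was derived from, found breadth-first."""
    seen, queue = set(), deque(parents(dataset))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents(current))
    return seen


def effective_tags(dataset):
    """Tags on the data set itself plus tags inherited from all of its upstream sources."""
    tags = set(TAGS.get(dataset, set()))
    for source in upstream(dataset):
        tags |= TAGS.get(source, set())
    return tags


def derivation_allowed(inputs, forbidden=FORBIDDEN):
    """Refuse a new process whose combined input tags contain a forbidden combination."""
    combined = set().union(*(effective_tags(i) for i in inputs))
    return not any(combo <= combined for combo in forbidden)


print(derivation_allowed(["claims_summary"]))           # True
print(derivation_allowed(["claims_clean", "members"]))  # False
```

Computing tags from lineage at check time, rather than copying them onto every derived data set, keeps the source classification as the single point of truth.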
  • 11. Apache Atlas Overview: Knowledge Store
    Knowledge store categorized with an appropriate business-oriented taxonomy
    •  Data sets & objects
    •  Tables / Columns
    •  Logical context
    •  Source, destination
    Supports exchange of metadata between foundation components and third-party applications/governance tools
    Leverages existing Hadoop metastores
    [Diagram: Apache Atlas architecture with the Knowledge Store (Models/Type-System, Taxonomies, Policy Rules) highlighted]
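To make the idea of a knowledge store backed by a type system concrete, here is a hedged sketch of metadata types with inheritance and attribute validation. `TypeDef`, `REGISTRY`, and the `DataSet`/`Table` definitions are hypothetical; Atlas's real type system is richer and is reached through its own APIs.

```python
from dataclasses import dataclass


@dataclass
class TypeDef:
    """A hypothetical metadata type: its own attributes plus named supertypes."""
    name: str
    attributes: dict          # attribute name -> expected Python type
    supertypes: tuple = ()


REGISTRY = {}


def register(typedef):
    REGISTRY[typedef.name] = typedef


def all_attributes(type_name):
    """Attributes declared on the type, including those inherited from its supertypes."""
    typedef = REGISTRY[type_name]
    attrs = {}
    for parent in typedef.supertypes:
        attrs.update(all_attributes(parent))
    attrs.update(typedef.attributes)
    return attrs


def validate(type_name, values):
    """List problems with an entity: missing attributes or values of the wrong type."""
    problems = []
    for attr, expected in all_attributes(type_name).items():
        if attr not in values:
            problems.append(f"missing {attr}")
        elif not isinstance(values[attr], expected):
            problems.append(f"{attr} should be {expected.__name__}")
    return problems


# Illustrative types: a generic DataSet and a Table that inherits from it.
register(TypeDef("DataSet", {"name": str, "owner": str}))
register(TypeDef("Table", {"columns": list, "source": str}, supertypes=("DataSet",)))

print(validate("Table", {"name": "claims", "columns": "id", "source": "crm"}))
# ['missing owner', 'columns should be list']
```

Inheritance lets a taxonomy define common attributes (name, owner) once on a base type while specialized types such as tables add their own.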
  • 12. Apache Atlas Overview: Data Lifecycle Management
    Leverage existing investment in Apache Falcon with a focus on:
    •  Provenance
    •  Multi-cluster replication
    •  Data set retention/eviction
    •  Late data handling
    •  Automation
    [Diagram: Apache Atlas architecture with Data Lifecycle Management highlighted]
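Falcon handles retention and late data declaratively; the sketch below only illustrates the underlying decisions under simple assumptions: partitions are identified by their data time, retention is a fixed window, and the late-arrival cutoff is measured from the partition time. The function names are hypothetical and this is not Falcon's implementation.

```python
from datetime import datetime, timedelta


def partitions_to_evict(partitions, now, retention):
    """Partitions whose data time falls before the start of the retention window."""
    cutoff = now - retention
    return sorted(p for p in partitions if p < cutoff)


def accept_late_data(partition, arrival, late_cutoff):
    """Reprocess a partition for late data only if the data arrives within the cutoff."""
    return arrival <= partition + late_cutoff


now = datetime(2015, 3, 10)
daily = [datetime(2015, 3, day) for day in (1, 2, 5, 9)]
print(partitions_to_evict(daily, now, timedelta(days=7)))  # the March 1 and March 2 partitions

hour = datetime(2015, 3, 9, 10)
print(accept_late_data(hour, datetime(2015, 3, 9, 15), timedelta(hours=6)))  # True
print(accept_late_data(hour, datetime(2015, 3, 9, 17), timedelta(hours=6)))  # False
```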
  • 13. Apache Atlas Overview: Audit Store
    Historical repository for all governance events
    •  Security: Access Grant & Deny
    •  Operational: Data Provenance & Metrics
    •  Indexed and Searchable
    [Diagram: Apache Atlas architecture with the Audit Store highlighted]
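An indexed, searchable audit store can be sketched as an append-only event list plus an inverted index. The `AuditStore` class and its field names are assumptions for illustration; a production store would persist events and use a search engine rather than in-memory sets.

```python
from collections import defaultdict


class AuditStore:
    """Hypothetical append-only audit log with an inverted index over selected fields."""

    INDEXED = ("kind", "user", "resource", "result")

    def __init__(self):
        self.events = []               # events in arrival order; never rewritten
        self.index = defaultdict(set)  # (field, value) -> positions in self.events

    def record(self, **event):
        position = len(self.events)
        self.events.append(event)
        for name in self.INDEXED:
            if name in event:
                self.index[(name, event[name])].add(position)
        return position

    def search(self, **criteria):
        """Events matching every criterion (indexed fields only), oldest first."""
        if not criteria:
            return list(self.events)
        matches = [self.index.get((f, v), set()) for f, v in criteria.items()]
        return [self.events[i] for i in sorted(set.intersection(*matches))]


store = AuditStore()
store.record(kind="security", user="alice", resource="claims", result="grant")
store.record(kind="security", user="bob", resource="claims", result="deny")
store.record(kind="operational", user="etl", resource="claims", result="success")
print([e["user"] for e in store.search(resource="claims", result="deny")])  # ['bob']
```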
  • 14. Apache Atlas Overview: Security
    Integration with HDP Advanced Security investments to ensure compliance.
    Establish global security policies based on data classification.
    Leverages the Ranger plug-in architecture for policy enforcement.
    [Diagram: Apache Atlas architecture with Security highlighted]
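Tag-based access control, as described for the Ranger integration, can be sketched as follows. The policy structure, group names, and the choice to deny any tag that has no policy are assumptions; Ranger's own policy model and plug-in API differ.

```python
# Hypothetical tag-based policies: which groups may read data carrying each tag.
POLICIES = {
    "PII": {"allow_groups": {"compliance"}},
    "PCI": {"allow_groups": {"finance"}},
}


def is_allowed(user_groups, resource_tags, policies=POLICIES):
    """Grant access only if every tag on the resource has a policy allowing one of the
    user's groups. A tag with no policy denies access; untagged resources are left to
    other (resource-based) checks, modeled here as allowed."""
    for tag in resource_tags:
        policy = policies.get(tag)
        if policy is None or not (policy["allow_groups"] & set(user_groups)):
            return False
    return True


print(is_allowed({"compliance"}, {"PII"}))         # True
print(is_allowed({"compliance"}, {"PII", "PCI"}))  # False: no group covers PCI
```

Because the decision depends only on tags, a single policy covers every table or column that carries the tag, which is the appeal of classification-based security.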
  • 15. Apache Atlas Overview: Policy Engine
    Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
    •  Metadata-based rules
    •  Geo-based rules
    •  Time-based rules
    •  Hive column prohibitions
    •  Preview: Hive row and column masking
    [Diagram: Apache Atlas architecture with Policy Rules and the Policy Engine highlighted]
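A hedged sketch of how a policy engine might combine geo-based and time-based rules with column masking. The `RULES` layout, the `mask` helper, and the business-hours window are hypothetical, chosen only to show rules being evaluated at query time against column tags.

```python
from datetime import time

# Hypothetical rule: PII columns are readable in clear only from the US, 08:00-18:00.
RULES = [{"tag": "PII", "countries": {"US"}, "hours": (time(8), time(18))}]


def rule_permits(rule, country, at):
    start, end = rule["hours"]
    return country in rule["countries"] and start <= at < end


def mask(value, keep=4):
    """Replace all but the last `keep` characters with 'x' (everything, if too short)."""
    if len(value) <= keep:
        return "x" * len(value)
    return "x" * (len(value) - keep) + value[-keep:]


def apply_policies(row, column_tags, country, at, rules=RULES):
    """Copy of `row` with each tagged column masked when a matching rule does not permit access."""
    result = dict(row)
    for column, tags in column_tags.items():
        if column not in result:
            continue
        if any(r["tag"] in tags and not rule_permits(r, country, at) for r in rules):
            result[column] = mask(str(row[column]))
    return result


row = {"name": "Ann", "ssn": "123-45-6789"}
print(apply_policies(row, {"ssn": {"PII"}}, "US", time(9, 30)))  # ssn shown in clear
print(apply_policies(row, {"ssn": {"PII"}}, "DE", time(9, 30)))  # ssn becomes 'xxxxxxx6789'
```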
  • 16. Apache Atlas Overview: RESTful interface
    •  Extensible enterprise classification of data assets, relationships, and policies, organized in a meaningful way and aligned to the business organization
    •  Supports exploration via a user interface
    •  Supports extensibility via API and CLI exposure
    [Diagram: Apache Atlas architecture]
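The slide predates Atlas's later v2 REST API. Assuming a release that exposes the basic-search endpoint `/api/atlas/v2/search/basic` with HTTP basic authentication (check the REST documentation for your version; the host `atlas.example.com:21000`, the credentials, and the `basic_search_request` helper are placeholders), a request for tables carrying a given classification could be built with only the standard library:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def basic_search_request(base_url, user, password, type_name, classification=None, query=None):
    """Build, without sending, a GET request for entities of `type_name`,
    optionally filtered by classification (tag) and free text."""
    params = {"typeName": type_name}
    if classification:
        params["classification"] = classification
    if query:
        params["query"] = query
    url = f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"})


req = basic_search_request("http://atlas.example.com:21000", "admin", "secret",
                           "hive_table", classification="PII")
print(req.get_full_url())
# http://atlas.example.com:21000/api/atlas/v2/search/basic?typeName=hive_table&classification=PII
```

Passing `req` to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without a server.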
  • 17. Coming 2H 2015
  • 18. Apache Atlas Overview: Enhanced Audit Store
    Historical repository for all governance events
    •  Immutable file format
    •  Event metadata taggable
    •  Advanced reporting
    •  Security: Access Grant & Deny
    •  Operational: Data Provenance & Metrics
    •  Indexed and Searchable
    [Diagram: Apache Atlas architecture with the Audit Store highlighted]
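The slide does not say how the immutable format is achieved. One common technique, swapped in here as an assumption, is a hash chain: each entry stores the SHA-256 digest of its predecessor, so editing any past entry breaks verification. The `append` and `verify` names are hypothetical, and detecting removal of the newest entries would additionally require publishing the latest hash somewhere else.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, event):
    """Add an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """True if every entry still matches its recorded hash and links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"user": "alice", "action": "read", "result": "grant", "tags": ["PII"]})
append(log, {"user": "bob", "action": "read", "result": "deny"})
print(verify(log))                  # True
log[0]["event"]["result"] = "deny"  # tamper with history
print(verify(log))                  # False
```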
  • 19. Summary
  • 21. Governance Ready Certification Program
    •  Curated group of vendor partners to provide rich and complete features
    •  Customers choose the features they want to deploy, à la carte. Low switching costs!
    •  HDP at the core to provide stability and interoperability
    [Diagram: partner capability areas: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization]
  • 22. Waterline Data improves speed to value and compliance
    [Diagram: Data Warehouse Offload, Data Science/Analytics Sandbox, and Data Lake; axes labeled Value Creation and Cost Savings]
    •  Deliver a Business-Ready Data Lake
    •  Accelerate Data Prep Process
    •  Govern Data in Hadoop
  • 23. Find, understand and govern data in Hadoop
  • 24. The Modern Data Architecture
  • 25. Apache Atlas Capabilities: Overview
    [Diagram: Apache Atlas architecture, as on slide 9, connected through the REST API (exchanging Glossary, Tags, Lineage, and Models) to:]
    •  Business Glossary
    •  Automated Classification (Tagging)
    •  Automated Lineage Discovery
    •  Profiling and Data Quality
    •  Schema Discovery
    •  Change Detection and Audit
  • 26. Governance Ready Certification Program
    [Diagram: partner capability areas: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization]
  • 27. Imagine shopping on Amazon.com
    [Diagram: Governance; Inventory; Find and Understand; Provision]
  • 28. Waterline Data is like Amazon.com for data in Hadoop
    [Diagram: Governance; Inventory; Find and Understand; Provision]
  • 29. Inventory
  • 30. Find and Understand
  • 31. Provision
  • 32. Governance
  • 33. Find, understand and govern data in Hadoop
    •  Big Data IT Architect: Deliver a Business-Ready Data Lake
    •  Data Engineer/Data Scientist: Accelerate Data Prep Process
    •  CDO/Data Steward: Govern Data in Hadoop
  • 34. Deliver a business-ready data lake
    “It’s easy to get data into Hadoop, but it’s not necessarily easy to get data out of Hadoop. There is a need for data as a service to help the business find, understand, and govern data in Hadoop.”
    Joe DosSantos, EMC Big Data Practice Leader
  • 36. Accelerate data prep process
    “80% of Big Data analytics is data prep, and 80% of data prep is inventorying data.”
    Data Engineering Director, Financial Services
  • 37. Accelerate data prep process
    “Waterline Data fills a critical gap in big data exploratory analytics by automating the tagging and cataloging of data, which in turn can help analytic teams provision the right data for their analyses.”
    Tony Baer, Principal Analyst, Ovum
  • 38. Govern data in Hadoop
    “Data lakes therefore carry substantial risks. The most important is the inability to determine data quality or the lineage of findings by other analysts or users that have found value, previously, in using the same data in the lake. By its definition, a data lake accepts any data, without oversight or governance. Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a data swamp. And without metadata, every subsequent use of data means analysts start from scratch.”
    “Gartner Says Beware of the Data Lake Fallacy” post on the Gartner website
  • 39. Govern data in Hadoop
    “The first step to governing Big Data is to build an inventory.”
    Sunil Soares, Managing Partner, Information Asset
  • 40. Best practice approach to implement an enterprise-grade data lake
    1. Create and populate landing area
    2. Build inventory of data
    3. Integrate with enterprise metadata repository
    4. Protect sensitive data
    5. Open up to users
    6. Monitor and maintain
  • 41. Best practices in deployment landscape: 1. Create and populate landing area
    •  Create the landing directory structure
    •  Set up ETL processes, using Falcon for orchestration
    •  Implement ETL jobs using ETL tools (Syncsort, Talend, Informatica, etc.), Hadoop tools (Sqoop, Flume, etc.), or FTP
    [Diagram: deployment landscape showing Falcon]
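The slide leaves the landing layout open. Here is a minimal sketch, assuming a hypothetical `/data/landing/<source>/<feed>/<yyyy>/<mm>/<dd>` convention; date partitioning like this lets an orchestrator such as Falcon address each day's delivery by path. `LANDING_ROOT` and both functions are illustrative names.

```python
from datetime import date
from pathlib import PurePosixPath

LANDING_ROOT = PurePosixPath("/data/landing")  # hypothetical root directory


def landing_path(source, feed, day):
    """Directory for one day's delivery of `feed` from `source`."""
    return LANDING_ROOT / source / feed / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def parse_landing_path(path):
    """Recover (source, feed, date) from a path produced by landing_path."""
    source, feed, y, m, d = PurePosixPath(path).relative_to(LANDING_ROOT).parts
    return source, feed, date(int(y), int(m), int(d))


p = landing_path("crm", "accounts", date(2015, 3, 9))
print(p)                      # /data/landing/crm/accounts/2015/03/09
print(parse_landing_path(p))  # ('crm', 'accounts', datetime.date(2015, 3, 9))
```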
  • 42. Best practices in deployment landscape: 2. Build inventory of data
    •  Crawl the cluster
    •  Profile files
    •  Automatically discover technical, business, and compliance metadata at the field level
    •  Create Hive tables as needed
    •  Import lineage
    •  Export to Atlas
    [Diagram: deployment landscape showing Falcon, HCatalog, and Atlas]
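Field-level profiling and automatic tagging can be approximated with per-column statistics and pattern matching. This is a sketch under stated assumptions (CSV input, the hypothetical `PATTERNS` rules, and an 80% match threshold); it is not how Waterline Data Inventory works internally.

```python
import csv
import io
import re

# Hypothetical tagging rules: a tag applies when most non-empty values match its pattern.
PATTERNS = {
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
}


def profile(csv_text, threshold=0.8):
    """Per-column count, nulls, distinct values, and inferred tags for CSV contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    result = {}
    for column in reader.fieldnames or []:
        values = [row.get(column) or "" for row in rows]
        present = [v for v in values if v]
        tags = sorted(
            tag for tag, pattern in PATTERNS.items()
            if present and sum(bool(pattern.match(v)) for v in present) / len(present) >= threshold
        )
        result[column] = {"count": len(values), "nulls": len(values) - len(present),
                          "distinct": len(set(present)), "tags": tags}
    return result


sample = (
    "name,ssn,email\n"
    "Ann,123-45-6789,ann@example.com\n"
    "Bo,987-65-4321,\n"
    "Cy,555-12-3456,cy@example.org\n"
)
print(profile(sample)["email"])
# {'count': 3, 'nulls': 1, 'distinct': 2, 'tags': ['EMAIL']}
```

A threshold rather than an all-or-nothing match tolerates a few dirty values while still flagging a column as likely sensitive.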
  • 43. Best practices in deployment landscape: 3. Integrate with enterprise metadata repository
    •  Import business glossary terms; export new tags and updated definitions
    •  Synchronize Atlas and Waterline Data Inventory
    •  Export metadata and lineage from Hadoop to the enterprise repository
    [Diagram: deployment landscape showing Falcon, HCatalog, and Atlas]
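Synchronizing glossary terms between two catalogs (for example Atlas and Waterline Data Inventory) needs a conflict rule. This sketch assumes each catalog maps a term to a `(definition, updated_at)` pair and that the newer entry wins, with ties favoring the first catalog. The `sync` function is hypothetical and does not call either product's API.

```python
def sync(a, b):
    """Merge two catalogs of {term: (definition, updated_at)}; the newer entry wins.
    Returns the merged catalog plus the updates each side must apply to match it."""
    merged, to_a, to_b = {}, {}, {}
    for term in a.keys() | b.keys():
        entry_a, entry_b = a.get(term), b.get(term)
        candidates = [e for e in (entry_a, entry_b) if e is not None]
        winner = max(candidates, key=lambda e: e[1])  # max keeps the first on ties
        merged[term] = winner
        if entry_a != winner:
            to_a[term] = winner
        if entry_b != winner:
            to_b[term] = winner
    return merged, to_a, to_b


atlas = {"Customer": ("A person who buys", 1), "PII": ("Personal data", 5)}
waterline = {"Customer": ("A party that buys", 3), "Churn": ("Lost customer", 2)}
merged, to_atlas, to_waterline = sync(atlas, waterline)
print(sorted(to_atlas))      # ['Churn', 'Customer']
print(sorted(to_waterline))  # ['PII']
```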
  • 44. Best practices in deployment landscape: 4. Protect sensitive data
    •  Use Waterline Data Inventory to find sensitive data
    •  Create access privileges in Ranger
    •  Encrypt or de-identify
    [Diagram: deployment landscape showing HCatalog, Ranger, Falcon, and Atlas]
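One way to de-identify a column while keeping it joinable is keyed pseudonymization with HMAC-SHA256, swapped in here as an example technique; the slide does not prescribe one. Key storage and rotation are assumed to be handled elsewhere, and the function names and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Deterministic keyed token (HMAC-SHA256): equal inputs map to equal tokens, so joins
    still work. It cannot be inverted directly, but anyone holding the key can test
    guesses, so the key must be protected like any other secret."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows, sensitive_columns, key):
    """Copy of `rows` with every sensitive column replaced by its token."""
    return [
        {col: pseudonymize(val, key) if col in sensitive_columns else val for col, val in row.items()}
        for row in rows
    ]


key = b"replace-with-a-managed-secret"  # hypothetical; load from a key store in practice
rows = [{"id": "1", "ssn": "123-45-6789"}, {"id": "2", "ssn": "123-45-6789"}]
for row in deidentify(rows, {"ssn"}, key):
    print(row["id"], row["ssn"][:12])  # both rows share the same token
```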
  • 45. Best practices in deployment landscape: 5. Open up to users
    •  Create accounts with Kerberos, LDAP, etc.
    •  Set up ACLs (leverage Ranger)
    •  Users can browse securely through Waterline Data Inventory
    [Diagram: deployment landscape showing HCatalog, Ranger, Falcon, and Atlas]
  • 46. Best practices in deployment landscape: 6. Monitor and maintain
    •  Continue profiling new or changed files and sync with Atlas
    •  Continue monitoring for sensitive data; use Ranger to protect it
    •  Build a folksonomy and synchronize it with the business glossary in Atlas and the enterprise business glossary
    [Diagram: deployment landscape showing HCatalog, Ranger, Falcon, and Atlas]
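Continuing to profile only new or changed files requires detecting them between crawls. This sketch assumes a local file system stands in for HDFS and that a `(size, modification time)` fingerprint is good enough to flag changes; content hashing would be stricter but costlier. The function names are hypothetical.

```python
import os


def snapshot(root):
    """Map each file path under `root` to a (size, modification time in ns) fingerprint."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snap[path] = (st.st_size, st.st_mtime_ns)
    return snap


def changes(previous, current):
    """Files to profile (new), re-profile (changed), and drop from the catalog (removed)."""
    new = sorted(p for p in current if p not in previous)
    changed = sorted(p for p in current if p in previous and current[p] != previous[p])
    removed = sorted(p for p in previous if p not in current)
    return new, changed, removed


prev = {"/d/a.csv": (10, 1), "/d/b.csv": (5, 1)}
cur = {"/d/a.csv": (12, 2), "/d/c.csv": (7, 3)}
print(changes(prev, cur))  # (['/d/c.csv'], ['/d/a.csv'], ['/d/b.csv'])
```

In a scheduled job, each run would compare `snapshot(root)` against the snapshot saved by the previous run, then profile only the `new` and `changed` lists.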
  • 47. Find, understand and govern data in Hadoop
    •  CDO/Data Steward: Discover lineage and business metadata automatically, and manage metadata
    •  Big Data Architect: Automate cataloging of data assets at scale, with secure provisioning to business users
    •  Data Engineer/Data Scientist/Business Analyst: Find and understand the best-suited and most trusted data without having to explore every file manually
  • 48. Questions and Answers
  • 49. Next Steps…
    •  Download the Hortonworks Sandbox
    •  Learn Hadoop
    •  Build Your Analytic App
    •  Try Hadoop 2
    More about Waterline Data & Hortonworks: http://hortonworks.com/partner/waterline-data
    Joint tutorial: bit.ly/DataLakeTutorial
    Modern Data Architecture Paper: go.waterlinedata.com/hw-mda
  • 50. The Largest Hadoop Community Events in Europe and North America
    SAN JOSE, June 9-11 | BRUSSELS, April 15-16
    •  65+ sessions and 5 tracks; 1,000 attendees; deep-dive technical content; sponsorships available; pre- and post-event community meetups and BOFs; Hadoop training available
    •  100+ sessions and 7 tracks; 5,000 attendees; deep-dive technical content; sponsorships available; pre- and post-event community meetups and BOFs; Hadoop training available
    www.hadoopsummit.org