Data Governance - Atlas 7.12.2015

Effective data governance is imperative to the success of Data Lake initiatives. Without governance policies and processes, information discovery and analysis are severely impaired. In this session we will provide an in-depth look at the Data Governance Initiative launched collaboratively by Hortonworks and partners from across industries. We will cover the objectives of data governance initiatives and demonstrate key governance capabilities of the Hortonworks Data Platform.

Page 1. Apache Atlas: Data Governance
  July 2015, Partner Solutions
  © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Page 2. Agenda
  Overview
    • Enterprise Goals
    • Data Governance Initiative
  Demo
    • Example: Sqoop
    • Walk-through steps
    • Search Tables / Tags
  Atlas
    • Feature tour
    • Roadmap
    • UI tour

Page 3. Enterprise Data Governance Goals
  GOAL: Provide a common approach to data governance across all systems and data within the organization.
    • Transparent: governance standards and protocols must be clearly defined and available to all.
    • Reproducible: recreate the relevant data landscape at a point in time.
    • Auditable: all relevant events and assets must be traceable with appropriate historical lineage.
    • Consistent: compliance practices must be consistent.
  [Diagram: a Governance Framework spanning ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, and Archive]

Page 4. Data Governance Initiative for Hadoop
  [Diagram: the Data Governance Initiative's common governance framework linking ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, and Archive with Apache Pig, Hive, HBase, Accumulo, Solr, Spark, and Storm]
  Two requirements:
    1. Hadoop must snap into the existing frameworks and be a good citizen.
    2. Hadoop must also provide governance within its own stack of technologies.
  A group of companies (one shown on the slide as "Major Bank") dedicated to meeting these requirements in the open.

Page 5. Apache Atlas Overview
  We Do Hadoop

Page 6. Apache Atlas Vision
  Metadata Services
    • Flexible knowledge store
    • Business catalog / operational data
    • Search & prescriptive lineage
    • Centralized location for all metadata within HDP
    • Interface point for metadata exchange with platforms outside of HDP
  Metadata will enrich every component:
    • Hive – complete lineage, every HiveQL query tracked
    • Ranger – tag- or attribute-based security (ABAC)
    • Falcon – business taxonomy
  [Diagram: Apache Atlas connected to Hive, Ranger, Falcon, Kafka, and Storm]

Page 7. Apache Atlas Capabilities: Overview
  Data Classification
    • Import or define taxonomies and business-oriented annotations for data
    • Define, annotate, and automate capture of relationships between data sets and underlying elements, including source, target, and derivation processes
    • Export metadata to third-party systems
  Centralized Auditing
    • Capture security access information for every application, process, and interaction with data
    • Capture operational information for executions, steps, and activities
  Search & Lineage (Browse)
    • Pre-defined navigation paths to explore the data classification and audit information
    • Text-based search features locate relevant data and audit events across the Data Lake quickly and accurately
    • Browse visualization of data set lineage, allowing users to drill down into operational, security, and provenance-related information
  Security & Policy Engine
    • Rationalize compliance policy at runtime based on data classification schemes
    • Advanced definition of policies for preventing data derivation based on classification (e.g. re-identification)
  [Diagram: Apache Atlas with Knowledge Store, Audit Store, Models / Type System, Policy Rules, Taxonomies, Policy Engine, Data Lifecycle Management, Security, and REST API Services (Search, Lineage, Exchange); example models: Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Custom (CWM), Retail (PCI, PII), Other]
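
The Security & Policy Engine bullets describe access decisions driven by data classification. As an illustration only, here is a minimal Scala sketch of a tag-based read check. The `Asset` and `TagRule` types, the role names, and the rule semantics are hypothetical assumptions. Under those semantics, an asset carrying a ruled tag is readable only by roles that a matching rule allows. None of this is the Atlas or Ranger policy model.

```scala
// Hypothetical asset: a data element (e.g. a Hive column) and the classification tags attached to it.
final case class Asset(qualifiedName: String, tags: Set[String])

// Hypothetical rule: users holding any of `allowedRoles` may read assets tagged with `tag`.
final case class TagRule(tag: String, allowedRoles: Set[String])

object TagPolicy {
  // Grants read access only if every tag on the asset that is covered by at least one rule
  // is satisfied by one of the user's roles; tags with no rule impose no restriction here.
  def canRead(userRoles: Set[String], asset: Asset, rules: Seq[TagRule]): Boolean =
    asset.tags.forall { tag =>
      val applicable = rules.filter(_.tag == tag)
      applicable.isEmpty || applicable.exists(_.allowedRoles.exists(userRoles.contains))
    }
}
```

Take a single rule `TagRule("PII", Set("compliance"))`. A user holding only the `analyst` role cannot read an asset tagged `PII`, but a `compliance` user can. Untagged assets stay readable, which is a deliberately permissive default of this sketch.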

Page 8. Sample Use Case: ETL Offload
  [Diagram labels: Load Wrapper, RDBMS, Traditional EDW, Sqoop, Hive: Landing, Hive: CTAS, New ETL, Hadoop, Atlas, Business Catalog, Metadata, Reporter via REST API]

Page 9. Hive Integration
  [Diagram: Apache Atlas with its REST API, the Hive Bridge (client), and the Hive Hook (post-execution)]
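
The Hive hook runs after a query executes, so it can report which data sets the query read and wrote. Below is a minimal Scala sketch of turning that into a lineage record. It makes three assumptions for illustration: a hypothetical `ExecutedQuery` snapshot, the `"hive_process"` type name, and the rule that queries without both inputs and outputs produce no record. None of these is the hook's actual code. Sending the record to Atlas over the REST API is omitted.

```scala
// Hypothetical snapshot of what a post-execution hook can see about a finished query.
final case class ExecutedQuery(queryText: String, user: String, inputs: Seq[String], outputs: Seq[String])

// Hypothetical lineage record; the type name "hive_process" is an assumption here.
final case class ProcessRecord(typeName: String, queryText: String, user: String,
                               inputs: Seq[String], outputs: Seq[String])

object HiveHookSketch {
  // A query that both reads and writes data sets (e.g. CTAS) becomes one process record
  // linking its inputs to its outputs; anything else (e.g. a plain SELECT) yields None.
  def toProcess(q: ExecutedQuery): Option[ProcessRecord] =
    if (q.inputs.isEmpty || q.outputs.isEmpty) None
    else Some(ProcessRecord("hive_process", q.queryText.trim, q.user, q.inputs.distinct, q.outputs.distinct))
}
```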

Page 10. Governance Ready Certification Program
    • Curated group of vendor partners to provide rich and complete features
    • Customers choose the features they want to deploy, à la carte, with low switching costs
    • HDP at the core to provide stability and interoperability
  [Diagram: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization]

Page 11. High Level Roadmap
    • ASF MVP (May): preview of core metadata services (type system, APIs, basic UI, Hive connector)
    • HDP 2.3 (July): GA of core metadata services; preview of the metadata business glossary
    • M10 (Sept): preview of ABAC with Ranger integration; preview of the Sqoop component connector
    • M20: preview of Kafka and Storm connectors, the Governance Ready Certification program, and row-level & column masking
    • HDP 2.4 (Q4 '15): GA of all preview features
  © Hortonworks Inc. 2012

Page 12. Architecture

Page 13. High Level Architecture
  [Diagram: Type System, Repository, and Search DSL; REST API; Bridge to Hive, Sqoop, Storm, and others; Titan / HBase; Solr / Elasticsearch]

Page 14. Technology Stack
    • Knowledge store
        o Titan graph DB
    • Pluggable search backend
        o Elasticsearch
        o Solr
    • Rules engine
        o TBD
    • Audit store
        o YARN ATS (Application Timeline Server), a time-series DB
    • Java 1.7
    • Dashboard
        o TBD

Page 15. APIs: Examples
  Admin
    GET: /admin/stack
    GET: /admin/version
  Entity
    GET: /entities/definition/{guid}
    POST: /entities/submit/{typeName}
    GET: /entities/list/{entityType}
  Metadata Discovery
    GET: /discovery/search/gremlin/{gremlinQuery}
    GET: /discovery/search/relationships/{guid}
    GET: /discovery/search/fullText?text=<query>
    GET: /discovery/getIndexedFields
  Rexster
    GET: /graph/vertices/{id}
    GET: /graph/vertices/properties/{id}
    GET: /graph/vertices
    GET: /graph/vertices/{id}/{direction}
    GET: /graph/edges/{id}
  Types
    POST: /types/submit/{typeName}
    GET: /types/definition/{typeName}
    GET: /types/list
  Hive Lineage
    GET: /bridge/hive/{id}
    GET: /bridge/hive
    POST: /bridge/hive
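
The endpoints above are listed without a host or path prefix. Here is a small Scala sketch that builds request URLs for a few of them. The base URL is a deployment assumption, for example `http://localhost:21000` (Atlas's usual default port) plus any prefix such as `/api/atlas`. The paths are copied from the slide and may differ in released Atlas versions. `AtlasUrls` is a hypothetical helper, not part of any Atlas client.

```scala
import java.net.URLEncoder

// Hypothetical URL builder for some of the endpoints listed on the slide.
// `baseUrl` (host, port and any path prefix) is an assumption; adjust it to your deployment.
final class AtlasUrls(baseUrl: String) {
  private val base = baseUrl.stripSuffix("/")

  // URLEncoder performs form encoding (space -> '+', literal '+' -> "%2B"), so replacing '+'
  // with "%20" afterwards only affects spaces and keeps them valid in paths and query strings.
  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8").replace("+", "%20")

  def version: String = s"$base/admin/version"
  def typeDefinition(typeName: String): String = s"$base/types/definition/${enc(typeName)}"
  def entityDefinition(guid: String): String = s"$base/entities/definition/${enc(guid)}"
  def fullTextSearch(text: String): String = s"$base/discovery/search/fullText?text=${enc(text)}"
}
```

For example, `new AtlasUrls("http://localhost:21000").fullTextSearch("sales fact")` yields `http://localhost:21000/discovery/search/fullText?text=sales%20fact`.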

Page 16. Type System – Overview of Types
    • Class
    • Struct
    • Trait
    • Primitives
    • Collections
        o Map
        o Array
    • Instances (Entity)
        o Referenceable

Page 17. Type System – Data Types

Page 18. Type System – Example Type Definitions
  // Class types: each attribute is declared as "name" ~ (dataType, modifiers...)
  _class("Column") {
    "name" ~ (string, required)
    "dataType" ~ (string, required)
    "sd" ~ ("StorageDesc", required)
  }

  _class("Table", List()) {
    "name" ~ (string, required, indexed)
    "db" ~ ("DB", required)
    "sd" ~ ("StorageDesc", required)
  }

  // Traits: attribute-less tags that can be attached to instances
  _trait("Dimension") {}
  _trait("PII") {}
  _trait("Metric") {}
  _trait("ETL") {}
  _trait("JdbcAccess") {}

  _class("DB") {
    "name" ~ (string, required, indexed, unique)
    "owner" ~ (string)
    "createTime" ~ (int)
  }

  _class("StorageDesc") {
    "inputFormat" ~ (string, required)
    "outputFormat" ~ (string, required)
  }
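
To make the DSL's modifiers concrete, here is a plain-Scala sketch that mirrors the `DB` definition and checks an instance against it. The `Attr`, `ClassType`, `TraitType`, and `Instance` types and the `validate` helper are hypothetical. The real Atlas type system has its own classes and performs more checks. In this sketch, `indexed` and `unique` are carried as metadata only and are not enforced.

```scala
// Hypothetical, simplified mirror of the DSL definitions; not the Atlas typesystem classes.
final case class Attr(name: String, dataType: String, required: Boolean = false,
                      indexed: Boolean = false, unique: Boolean = false)
final case class ClassType(name: String, attrs: Seq[Attr])
final case class TraitType(name: String)

// An instance of a class type, optionally tagged with traits such as "PII".
final case class Instance(typeName: String, values: Map[String, Any], traits: Set[String] = Set.empty)

object MiniTypeSystem {
  // Returns the problems found; an empty list means the instance conforms to its class type.
  // Checks: known type, required attributes present, no undeclared attributes, known traits.
  def validate(inst: Instance, classes: Map[String, ClassType], traits: Map[String, TraitType]): List[String] =
    classes.get(inst.typeName) match {
      case None => List(s"unknown type ${inst.typeName}")
      case Some(ct) =>
        val declared = ct.attrs.map(_.name).toSet
        val missing = ct.attrs.toList.filter(a => a.required && !inst.values.contains(a.name))
        val unknown = inst.values.keys.toList.filterNot(declared.contains).sorted
        val badTraits = inst.traits.toList.filterNot(traits.contains).sorted
        missing.map(a => s"missing required ${a.name}") ++
          unknown.map(k => s"unknown attribute $k") ++
          badTraits.map(t => s"unknown trait $t")
    }
}
```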

Page 19. Repository
    • Graph database
        o Titan, with storage backed by HBase
    • Types and instances are mapped to the graph DB
        o Classes, structs, and traits map to vertices
        o Relationships are mapped to edges
    • Search: plug-in enabled
        o Indexing based on type annotations
        o Solr
        o Elasticsearch
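
The mapping rules above can be illustrated with a toy property graph. In this hypothetical Scala sketch:

- each entity becomes a vertex holding its primitive attributes;
- each reference attribute becomes an edge;
- each trait becomes its own vertex linked to the entity.

The `<type>.<attribute>` edge-label scheme is an assumption, not necessarily Atlas's storage format. Titan is replaced by in-memory buffers.

```scala
import scala.collection.mutable

// Hypothetical property-graph elements; an in-memory stand-in for the Titan graph.
final case class Vertex(id: Int, typeName: String, props: Map[String, String])
final case class Edge(from: Int, to: Int, label: String)

// Hypothetical entity: primitive attributes, references to other entities (attribute -> entity key),
// and the names of traits attached to it.
final case class Entity(key: String, typeName: String, props: Map[String, String],
                        refs: Map[String, String] = Map.empty, traits: Seq[String] = Nil)

final class GraphMapper {
  val vertices = mutable.ArrayBuffer.empty[Vertex]
  val edges = mutable.ArrayBuffer.empty[Edge]
  private val vertexOf = mutable.Map.empty[String, Int]

  private def newVertex(typeName: String, props: Map[String, String]): Int = {
    val id = vertices.size
    vertices += Vertex(id, typeName, props)
    id
  }

  // Adds one entity and returns its vertex id. Referenced entities must be added first,
  // otherwise the lookup below throws NoSuchElementException.
  def add(e: Entity): Int = {
    val id = newVertex(e.typeName, e.props)
    vertexOf(e.key) = id
    // Each reference attribute becomes an edge labelled "<type>.<attribute>".
    for ((attr, targetKey) <- e.refs)
      edges += Edge(id, vertexOf(targetKey), s"${e.typeName}.$attr")
    // Each trait instance becomes its own vertex, linked to the entity's vertex.
    for (traitName <- e.traits)
      edges += Edge(id, newVertex(traitName, Map.empty), s"${e.typeName}.$traitName")
    id
  }
}
```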

Page 20. Search
    • DSL with SQL-like syntax:
        from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
    • Examples
        o from DB
        o DB where name="Reporting" select name, owner
        o DB has name
        o DB is JdbcAccess
        o Column where Column isa PII
        o Table where name="sales_fact", columns
        o Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment
    • Full-text search
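
To show what the clauses mean, here is a hypothetical in-memory evaluator for the `from $type is $trait where $clause select|has $attributes` shape. It has three limitations:

- it takes an already-parsed query object rather than DSL text;
- it supports only equality in `where`;
- it omits `loop`.

Atlas itself translates DSL queries into graph traversals. `Row`, `Query`, and `MiniSearch` are illustrative names only.

```scala
// Hypothetical stored instance: its type, attribute values (as strings) and trait names.
final case class Row(typeName: String, attrs: Map[String, String], traits: Set[String] = Set.empty)

// Hypothetical, already-parsed form of `from $type is $trait where $clause has/select $attributes`.
final case class Query(
    from: String,                           // from $type
    is: Option[String] = None,              // is $trait
    where: Map[String, String] = Map.empty, // where attr="value" (equality only in this sketch)
    has: Seq[String] = Nil,                 // has $attribute
    select: Seq[String] = Nil               // select $attributes; empty keeps every attribute
)

object MiniSearch {
  // Applies each clause as a filter in turn, then projects the selected attributes.
  def run(q: Query, rows: Seq[Row]): Seq[Map[String, String]] =
    rows
      .filter(_.typeName == q.from)
      .filter(r => q.is.forall(r.traits.contains))
      .filter(r => q.where.forall { case (k, v) => r.attrs.get(k).contains(v) })
      .filter(r => q.has.forall(r.attrs.contains))
      .map(r => if (q.select.isEmpty) r.attrs else r.attrs.filter { case (k, _) => q.select.contains(k) })
}
```

For example, `Query("DB", where = Map("name" -> "Reporting"), select = Seq("name", "owner"))` plays the role of `DB where name="Reporting" select name, owner`.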

Page 21. Lineage
    • Uses the Search DSL loop expression
    • Everything results in a search
    • Named queries
        o inputs:
            Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
        o outputs:
            Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest select src.name as src_name, dest.name as dest_name withPath
        o schema:
            Table where name="sales_fact", columns
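
The named `inputs` and `outputs` queries repeatedly walk table → process → table. Here is a breadth-first Scala sketch of the same walk. The `LoadProcess` case class is a hypothetical stand-in for the process type in the slide's queries, modelled here with list-valued inputs and outputs. Every returned path starts at the queried table, mirroring `withPath`.

```scala
import scala.collection.mutable

// Hypothetical load process: reads some tables and writes others.
final case class LoadProcess(name: String, inputTables: Seq[String], outputTables: Seq[String])

object MiniLineage {
  // Upstream walk, repeated until no path can be extended: from a table, find the processes
  // that wrote it, then step to their input tables. Each returned path starts at `table`
  // and ends at one ancestor; tables already on a path are skipped to avoid cycles.
  def inputs(table: String, processes: Seq[LoadProcess]): List[List[String]] = {
    val found = mutable.ListBuffer.empty[List[String]]
    val queue = mutable.Queue(List(table))
    while (queue.nonEmpty) {
      val path = queue.dequeue()
      for {
        p <- processes
        if p.outputTables.contains(path.last)
        src <- p.inputTables
        if !path.contains(src)
      } {
        val extended = path :+ src
        found += extended
        queue.enqueue(extended)
      }
    }
    found.toList
  }

  // Downstream walk: the same traversal with each process's inputs and outputs swapped.
  def outputs(table: String, processes: Seq[LoadProcess]): List[List[String]] =
    inputs(table, processes.map(p => p.copy(inputTables = p.outputTables, outputTables = p.inputTables)))
}
```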

Page 22. Hive Integration
  [Diagram: Apache Atlas with its REST API, the Hive Bridge (client), and the Hive Hook (post-execution)]

Page 23. Apache Atlas Screens

Pages 24–28. [Screenshots of the Apache Atlas UI]

Page 29. Demo: Atlas

Page 30. Atlas UI Demonstration
  HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
  Search DSL
    • Type: DB, Table, Column
    • Tag: PII
    • Keyword
  Results
    • Details
    • Schema
    • Lineage
  Coming Features

Page 31. Ingestion Demo
  Objective
    • Show lineage with Sqoop ingestion of data
        o Custom process instrumentation
        o Use the Hive hook for the CTAS operation
        o Follow lineage in Atlas
    • Metadata model in Atlas
        o The open framework
        o Create custom types
        o Create a custom process
        o Sample code

Page 32. Setup
    • Source system
        o MySQL database
            – DRIVERS
            – TIMESHEET
    • Destination system
        o Single-node HDP 2.3 (Tech Preview)
        o Apache Atlas

Page 33. Steps to Create Metadata
    • Create an Atlas client instance
    • Create type definitions
        – Class types
        – Attributes
        – List the types
    • Instantiate entities
        – Create entities (class type)
        – Search the types
    • Create a process
        – Create a DataSet type
        – Create a Process type
        – Connect a process lineage
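
The demo steps imply an ordering: types come before entities, and the DataSet and Process types come before a process can connect lineage. This hypothetical in-memory Scala sketch enforces that ordering. `DemoClient`, its methods, and the `guid-N` identifiers are illustrative assumptions, not the Atlas client API. The `DataSet` and `Process` base-type names follow the slide.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the demo's client; not the Atlas client API.
final class DemoClient {
  // Registered type name -> its direct supertypes; the two base types exist up front.
  private val types = mutable.LinkedHashMap[String, Set[String]](
    "DataSet" -> Set.empty[String], "Process" -> Set.empty[String])
  // Entity guid -> type name.
  private val entities = mutable.LinkedHashMap.empty[String, String]
  private var nextId = 0
  // Lineage edges recorded as (input guid, process guid, output guid).
  val lineage = mutable.ListBuffer.empty[(String, String, String)]

  // Step: create type definitions. Supertypes must already be registered.
  def createType(name: String, superTypes: Set[String] = Set.empty): Unit = {
    require(superTypes.forall(types.contains), s"unknown supertype in $superTypes")
    types(name) = superTypes
  }

  // Step: list the types, in registration order.
  def listTypes: List[String] = types.keys.toList

  // True if `typeName` is `base` or inherits from it, directly or indirectly.
  private def isA(typeName: String, base: String): Boolean =
    typeName == base || types.getOrElse(typeName, Set.empty[String]).exists(isA(_, base))

  // Step: instantiate an entity of a registered type; returns its guid.
  def createEntity(typeName: String): String = {
    require(types.contains(typeName), s"type $typeName is not registered")
    nextId += 1
    val guid = s"guid-$nextId"
    entities(guid) = typeName
    guid
  }

  // Step: connect a process lineage. The process type must extend Process and every
  // input/output must be an existing entity whose type extends DataSet.
  def createProcess(typeName: String, inputs: Seq[String], outputs: Seq[String]): String = {
    require(isA(typeName, "Process"), s"$typeName does not extend Process")
    require((inputs ++ outputs).forall(g => entities.get(g).exists(isA(_, "DataSet"))),
      "inputs and outputs must be DataSet entities")
    val guid = createEntity(typeName)
    for (i <- inputs; o <- outputs) lineage += ((i, guid, o))
    guid
  }
}
```

In the demo's terms, one could register a hypothetical `mysql_table` and `hive_table` (both extending `DataSet`) and a `sqoop_import` type extending `Process`. Creating one entity per table and then a `sqoop_import` process records a single input→process→output lineage edge.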

Page 34. Attribute Definition
    • Name
    • Data Type
    • Multiplicity
    • Composite
    • isIndexable
    • ReverseAttribute
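
The fields listed above can be modelled as a small Scala record. In this hypothetical sketch, multiplicity is a lower/upper bound on the number of values plus a uniqueness flag. Two attributes form a consistent reverse pair when each names the other. The example `columns`/`table` pair and the `array<Column>` type name are illustrative assumptions, not definitions taken from Atlas.

```scala
// Hypothetical model of the attribute-definition fields listed on the slide.
// Multiplicity is modelled as lower/upper bounds on the number of values plus a uniqueness flag.
final case class Multiplicity(lower: Int, upper: Int, isUnique: Boolean) {
  // True if `values` has an allowed size and, when uniqueness is required, no duplicates.
  def accepts(values: Seq[Any]): Boolean =
    values.size >= lower && values.size <= upper &&
      (!isUnique || values.distinct.size == values.size)
}

final case class AttributeDefinition(
    name: String,
    dataTypeName: String,
    multiplicity: Multiplicity,
    isComposite: Boolean,                // read here as: the value is owned by the containing instance
    isIndexable: Boolean,                // whether the search backend should index this attribute
    reverseAttributeName: Option[String] // attribute on the referenced type that points back, if any
)

object AttributeExamples {
  val Required = Multiplicity(1, 1, isUnique = false)
  val Optional = Multiplicity(0, 1, isUnique = false)
  val UniqueList = Multiplicity(0, Int.MaxValue, isUnique = true)

  // Hypothetical example: a table's columns, owned by the table, each pointing back via "table".
  val tableColumns = AttributeDefinition("columns", "array<Column>", UniqueList,
    isComposite = true, isIndexable = false, reverseAttributeName = Some("table"))

  // The matching single-valued reference on the column side.
  val columnTable = AttributeDefinition("table", "Table", Required,
    isComposite = false, isIndexable = false, reverseAttributeName = Some("columns"))

  // Two attributes form a consistent reverse pair if each names the other as its reverse.
  def isReversePair(a: AttributeDefinition, b: AttributeDefinition): Boolean =
    a.reverseAttributeName.contains(b.name) && b.reverseAttributeName.contains(a.name)
}
```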

Page 35. Questions and Answers

Page 36. Atlas Resources
    • HDP 2.3 Preview Sandbox VM:
        – http://hortonworks.com/hdp/whats-new/
    • Apache Atlas:
        – http://atlas.incubator.apache.org/
        – http://incubator.apache.org/projects/atlas.html
        – https://git-wip-us.apache.org/repos/asf/incubator-atlas.git
    • Partner Workshops:
        – http://hortonworks.com/partners/learn/
    • More to come with the official GA release of HDP 2.3

Page 37. Thank you!
