
Manage traceability with Apache Atlas, a flexible metadata repository


Do you know where your data is?
Do you know who is responsible for a given dataset?
Do you know which application or task modified this entity last Friday?

Apache Atlas helps you manage all the metadata about your data. With Apache Atlas you can trace the lineage between your datasets and the processes that use them.
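To make the three questions above concrete, here is a minimal sketch of the kind of record a metadata repository could keep for one dataset. It is plain Python and does not use the Apache Atlas API; the class names, the HDFS path, the owner, and the process names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Modification:
    """One recorded change to a dataset (hypothetical audit record)."""
    when: date
    by_process: str


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; not an Apache Atlas type."""
    name: str
    location: str  # where the data lives
    owner: str     # who is responsible for it
    history: List[Modification] = field(default_factory=list)

    def modified_by(self, day: date) -> List[str]:
        """Return the processes that modified this dataset on the given day."""
        return [m.by_process for m in self.history if m.when == day]


# Illustrative values only: the path, owner, and process names are made up.
tweets = DatasetMetadata(
    name="tweets",
    location="hdfs:///data/raw/twitter",
    owner="data-platform-team",
    history=[
        Modification(date(2015, 6, 5), "import_twitter"),        # a Friday
        Modification(date(2015, 6, 8), "build_social_network"),  # a Monday
    ],
)

print(tweets.location)                        # where is the data?
print(tweets.owner)                           # who is responsible?
print(tweets.modified_by(date(2015, 6, 5)))   # what modified it last Friday?
```

A real repository would store such records centrally and fill them in from the tools that touch the data, instead of relying on hand-written entries like these.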

Published in: Technology

  1. Manage traceability with Apache Atlas, a flexible metadata repository. Charly Clairmont, Synaltic, @egwada, cclairmont@synaltic.fr, http://synaltic.fr. Copyright Synaltic 2015.
  2. Charly Clairmont: more than ten years of experience in IT, mainly in BI. Cofounder of Altic, now Synaltic. Cofounder of the Hadoop User Group France. Believes in open source to help enterprises create value. Helps open source projects become known through meetups and conferences.
  3. Synaltic: an integrator company focused mainly on data management. Founded in 2004, Synaltic is the merger of two companies, Synotis and Altic. 25 specialists in data management. A Swiss subsidiary based in Lausanne. Our values: commitment, expertise, loyalty. R&D, Training, Support, Project, Expertise. Data Intelligence, Data Platform, Data Governance, Data Exchange.
  4. What about your data? Do you know where your data is? Do you know who is responsible for a given dataset? Do you know which application or task modified this entity last Friday?
  5. Enterprise Data Governance: provide a common approach to data governance across all systems and data within the organization. Transparent, reproducible, auditable, consistent.
  6. Enterprise Data Governance in Hadoop: no specific way to address this requirement. Each project proposes its own way to handle data governance. No integration with existing enterprise frameworks for data governance.
  7. Apache Atlas: Data Classification, Metadata Exchange, Centralized Auditing, Search & Lineage, Security & Policy Engine.
  8. Apache Atlas, Overview
     Data Classification
     ● Business-oriented taxonomy annotations
     ● Relationships between data sets and underlying elements, including source, target, and derivation processes
     ● Export metadata to third-party systems
     Centralized Auditing
     ● Security access information for every application and process
     ● Operational information for execution, steps, and activities
     Search & Lineage (Browse)
     ● Navigation paths to explore the data classification and audit information
     ● Text-based search to locate what is relevant
     ● Visualization of data set lineage
     Security & Policy Engine
     ● Compliance policy at runtime based on data classification schemes
     ● Advanced definition of policies for preventing data derivation
  9. Apache Atlas, Knowledge Store
     Knowledge store categorized with an appropriate business-oriented taxonomy
     ● Data sets & objects
     ● Tables / columns
     ● Logical context
     ● Source, destination
     Supports exchange of metadata between foundation components and third-party applications/governance tools
     Tech: Titan with Apache HBase
  10. Apache Atlas, Data Lifecycle Management: provenance, multi-cluster replication, data set retention/eviction, late data handling, automation. Tech: Apache Falcon.
  11. Apache Atlas, Audit Store
      Historical repository for all governance events
      ● Security: access grant & deny
      ● Operational: data provenance & metrics
      ● Indexed and searchable
      Tech: YARN ATS, Apache HBase, Apache Hive, Solr, ElasticSearch (pluggable)
  12. Apache Atlas, Security: establish global security policies based on data classification.
  13. Apache Atlas, Policy Engine
      Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
      ● Metadata-based rules
      ● Geo-based rules
      ● Time-based rules
      ● Column / attribute prohibitions
      ● Preview: Hive row and column masking
      Tech: Ranger
  14. Apache Atlas, RESTful interface: extensible enterprise classification of data assets, relationships, and policies, organized in a meaningful way and aligned to the business organization. Supports exploration via the user interface. Supports extensibility via API and CLI exposure.
  15. A use case: our process. [Diagram] Stages: Import, Reference data, Analysis. Nodes: Twitter (data source); HDFS: raw data; Hive: url; Hive: hash tags; Hive: users; Hive: tweets; Hive: social network (data platform). Steps: Collect from Twitter; Build social network.
  16. A use case: search based on tables
  17. A use case: search based on services
  18. A use case: table metadata
  19. A use case: lineage
  20. Thank you!
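
The lineage view in the use case can be thought of as a graph walk over datasets and the processes that connect them. The sketch below is plain Python, not the Atlas API. The dataset names and the steps "collect from Twitter" and "build social network" come from the use-case slide; the other two process names and all of the edges are assumptions, since the diagram's arrows did not survive in the transcript.

```python
from collections import deque

# process name -> (input datasets, output datasets)
# Assumed reading of the use-case diagram; "load_tweets" and
# "extract_reference_data" are hypothetical process names.
PROCESSES = {
    "collect_from_twitter": (["twitter"], ["hdfs_raw_data"]),
    "load_tweets": (["hdfs_raw_data"], ["hive_tweets"]),
    "extract_reference_data": (
        ["hive_tweets"],
        ["hive_url", "hive_hash_tags", "hive_users"],
    ),
    "build_social_network": (
        ["hive_tweets", "hive_users"],
        ["hive_social_network"],
    ),
}


def upstream(dataset, processes=PROCESSES):
    """Return every dataset that `dataset` is derived from, directly or not."""
    # Index each output dataset by the inputs of the processes that produce it.
    parents = {}
    for inputs, outputs in processes.values():
        for out in outputs:
            parents.setdefault(out, set()).update(inputs)

    # Breadth-first walk from the dataset back towards its sources.
    seen, queue = set(), deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("hive_social_network")))
```

Under these assumed edges, the social network table traces back through the tweets and users tables to the raw HDFS data and finally to Twitter, which is the kind of chain a lineage screen displays.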
