Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta

Published in: Technology

  1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta. June 30, 2016. Apache Atlas.
  2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. Vision: Enterprise Data Governance Across Platforms. [Diagram labels: Structured, Unstructured; Traditional RDBMS, MPP Appliances, Data Lake; Metadata; Projects 1, 3, 4, 5, 6.] Atlas: Metadata Truth in Hadoop. Data Management along the entire data lifecycle, with integrated provenance and lineage capability. Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities. Interoperable Solutions across the Hadoop ecosystem, through a common metadata store.
  4. Governance Ready Certification Program. [Diagram labels: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization.] Choice: Customers choose the features they want to deploy, a la carte rather than vendor lock-in. Curated & Fast: A selected group of vendor partners provides rich, complementary, and complete features ready to deploy. Agile: Low switching costs, faster deployment and innovation. Centralized: Common SLA and common open metadata store. Flexibility: Interoperability of products through Atlas metadata. Safe: HDP at the core to provide stability and interoperability.
  5. Governance Ready Certification Program. [Diagram labels: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization.] The Apache open source community is committed to collaboration, which is critical for proper data governance. Partners have adopted this commitment and are extending governance capabilities by integrating their products with Atlas, which gives a rich, innovative community a common metadata store. This session will showcase 3 vendors: Waterline, Attivio, and Trifacta.
  6. Additional Atlas Sessions. • BOF: Apache Knox and Apache Ranger provide Hadoop security while Atlas provides a Hadoop metadata store and enterprise compliance. Come learn and discuss security & governance innovations and future directions. Thursday 5-7 PM @ Room 210A
  7. Learn More: • Hortonworks links: http://hortonworks.com/solutions/security-and-governance/ • Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
  8. Waterline Data: The Smart Data Catalog Company
  9. Unlock The Value Of The Data Lake With Waterline Data’s Smart Data Catalog. Time To Value: AUTOMATICALLY catalog data assets across ALL the data AND enable SELF-SERVICE access. Tribal Knowledge Sharing: AUGMENT semantic discovery by CROWDSOURCING tribal data knowledge. Trust: Enable AGILE GOVERNANCE with automated tagging, data stewardship, and SECURE SELF-SERVICE access to data based on role and policy.
  10. Shopping Metaphor For “Managed” Self-Service: Amazon.com. Catalog; Find, Understand And Collaborate; Provision.
  11. Waterline Data Is Like Amazon For The Data Lake. Catalog; Find, Understand And Collaborate; Provision.
  12. Workflow Of Enabling Self-Service Analytics With Hortonworks. [Diagram labels: Hortonworks Atlas And Ranger; Data Prep; Analytics & Visualization; Smart Data Discovery (Profiling, Sensitive Data & Data Lineage Discovery, Automated Tagging); Data Stewardship (Curate Tags); Self-Service Data Catalog (Find, Collaborate And Take Action); Metadata, Tags, Data Lineage; Metadata, Tags, Roles & Access Control; Roles & Access Control.]
  13. Demo
  14. Waterline Data: The Smart Data Catalog Company
  15. UNIFY YOUR DATA ACROSS SILOS. Joe Lichtman, Vice President, Product, jlichtman@attivio.com
  16. © 2016 ATTIVIO | PROPRIETARY AND CONFIDENTIAL. WHO IS ATTIVIO? Attivio unifies your data across silos to provide a 360° view of your business.
  17. LEADER IN SEARCH, DATA DISCOVERY AND TEXT ANALYTICS. Gartner Magic Quadrant For Enterprise Search, August 2015; Forrester Wave: Big Data Search and Knowledge Discovery Solutions, Q3 2015; Forrester Wave: Big Data Text Analytics Platforms, Q2 2016.
  18. SEMANTIC DATA CATALOG. Attivio radically reduces time spent finding and understanding data sources to speed time-to-analytics. • Catalogs all your enterprise information • Identifies what’s most relevant • Unifies all structured and semi-structured sources in a visual model • Provisions leading BI and predictive analytics tools such as Qlik, R, RapidMiner, Spotfire, and Tableau. 58%: of the effort for BI initiatives is wasted on data exploration and integration. 33%: of businesses cite big data discovery as a challenge they are facing. 50%: Businesses use less than half of their available data for BI.
  19. CATALOG ALL YOUR ENTERPRISE INFORMATION • Spiders and extracts metadata for all information types • Automatically catalogs data and content with semantic meaning • Applies human expertise to fine-tune tagging and align with business rules
  20. IDENTIFY THE RIGHT INFORMATION • Delivers natural language and keyword search • Provides an eCommerce-like shopping cart for data • Recommends the most relevant data for your context
  21. UNIFY THE INFORMATION FOR YOUR ANALYTIC CONTEXT • Automatically generates data models • Correlates all structured data and unstructured content • Simplifies provisioning to BI and advanced analytic tools
  22. PROVISION & OPERATIONALIZE DATA AS A STRATEGIC ASSET • Provision directly to agile BI and analytics tools OR • Rationalize the data warehouse for greater simplicity and lower cost • Power domain-specific apps
  23. THANK YOU!
  24. Trifacta + Hortonworks: Apache Atlas Integration
  25. DATA WRANGLING. What is Data Wrangling? [Diagram labels: QUESTION, ANALYZE, INSIGHT; DISCOVER, STRUCTURE, CLEANSE, ENRICH, VALIDATE, PUBLISH.]
  26. Data Wrangling within the Hortonworks Data Lake. [Diagram labels: INGESTION; ACCESS; DATA SOURCES; Transactional Data (banking, credit cards, lending, wealth, mortgages, ledgers, trades, payments); Interaction Data (social, webchat); Analytics; Reporting; Data Product; Models; BUSINESS OPERATIONS; Discovery Zone; Shared Zone; Raw Data Zone.]
  27. Trifacta + Hortonworks & Apache Atlas. [Diagram labels: Governance; Ingestion; Metadata & APIs; Data Wrangling; Analysis & Consumption.]
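
The workflow on slide 12 (automated tagging, steward curation of tags, then access control driven by tags and roles through a common metadata store) can be illustrated with a small in-memory sketch. Everything here is hypothetical: `MetadataStore`, `propose_tag`, `approve_tag`, `allow`, and `can_read` are names invented for this illustration and are not the Apache Atlas or Apache Ranger APIs. The rule that tags are inherited through lineage is also an assumption made for the example, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A data asset (for example a table or file) tracked in the store."""
    name: str
    proposed_tags: set = field(default_factory=set)   # from automated discovery
    approved_tags: set = field(default_factory=set)   # curated by a data steward
    derived_from: list = field(default_factory=list)  # upstream asset names (lineage)


class MetadataStore:
    """Hypothetical shared store that several tools read and write."""

    def __init__(self):
        self.assets = {}
        self.policies = {}  # tag -> set of roles allowed to read tagged assets

    def register(self, name, derived_from=()):
        self.assets[name] = Asset(name, derived_from=list(derived_from))
        return self.assets[name]

    def propose_tag(self, name, tag):
        # Step 1: a discovery tool suggests a tag automatically.
        self.assets[name].proposed_tags.add(tag)

    def approve_tag(self, name, tag):
        # Step 2: a steward curates the suggestion before it takes effect.
        asset = self.assets[name]
        if tag not in asset.proposed_tags:
            raise ValueError(f"{tag!r} was never proposed for {name!r}")
        asset.proposed_tags.discard(tag)
        asset.approved_tags.add(tag)

    def effective_tags(self, name):
        # Assumption: approved tags flow downstream along lineage.
        # Lineage is assumed to be acyclic in this sketch.
        asset = self.assets[name]
        tags = set(asset.approved_tags)
        for parent in asset.derived_from:
            tags |= self.effective_tags(parent)
        return tags

    def allow(self, tag, role):
        self.policies.setdefault(tag, set()).add(role)

    def can_read(self, role, name):
        # Step 3: every tag that has a policy must list the role.
        return all(
            role in self.policies[tag]
            for tag in self.effective_tags(name)
            if tag in self.policies
        )


store = MetadataStore()
store.register("raw.customers")
store.register("shared.customer_summary", derived_from=["raw.customers"])
store.propose_tag("raw.customers", "PII")
store.approve_tag("raw.customers", "PII")
store.allow("PII", "steward")
print(store.can_read("steward", "shared.customer_summary"))  # True
print(store.can_read("analyst", "shared.customer_summary"))  # False
```

Keeping proposed and approved tags separate mirrors the split between automated tagging and data stewardship on the slide: only curated tags influence access decisions in this sketch.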
