Govern This! Data Discovery and the application of data governance with new stack technologies
 

Join Tableau and Cloudera to learn how to apply governance to the discovery layer in an enterprise data hub while still meeting the speed and agility requirements of the business user.

Usage Rights

© All Rights Reserved

  • Today we're in the middle of a shift in how businesses use information. In the past, you'd define a set of business processes, build applications around each of them, and then go about gathering, conforming, and merging the necessary data sets to support those applications. From an infrastructure perspective, you'd be bringing the data over to the compute, often in relational databases. But you'd be leaving quite a lot on the table. The modern realities of business demand a new approach. Today companies need, more than ever, to become information-driven, but given the amount and diversity of information available, and the rate of change in business, it's simply unsustainable to keep moving around and transforming huge volumes of data.
  • The foundational platform that's addressing this wide range of problems today is Apache Hadoop, an open source platform for scalable, fault-tolerant data storage and processing that runs on a cluster of industry-standard servers. But Hadoop, in the beginning, wasn't capable of solving these problems. Originally, Hadoop was just a scalable distributed system for storing and processing large amounts of data. You could bring workloads to an effectively limitless amount and variety of data, provided the only kind of work you wanted to do was batch processing by writing Java code, and provided you liked hiring highly-skilled computer scientists to operate it.
  • Cloudera solved the latter problem with Cloudera Manager, the leading system management application for Apache Hadoop. Customers love Cloudera Manager because it makes the complex simple. Hadoop is more than a dozen services running across many machines, with limitless configuration permutations. With Cloudera Manager, customers can centrally manage and monitor their clusters from a single tool. It provides automated installation and configuration of your cluster. Cloudera Manager is really our many years of Hadoop experience realized in software, and helps you get up and running quickly.
  • Our customers liked the scalability, flexibility, and economic properties of the platform, but, for example, didn't like that they had to move data out to other MPP analytic databases just to run fast SQL queries, so we built Impala, the world's first open source MPP analytic SQL query engine expressly designed for Hadoop. With Impala, you now have a viable open source alternative to proprietary MPP analytic databases, one that also delivers the core scalability, flexibility, and economic benefits of Hadoop. Now, over the past year we've continued to add to the platform, with Search, and Spark for interactive iterative analytics and stream processing. You also get HBase, the online key-value store, to enable real-time applications on the platform. With this range of diverse ways to access your data in Hadoop, far beyond just Java and MapReduce, you can now bring your existing tools and skill sets to the platform. What's even more exciting is that we've recently made it possible for our partners and other 3rd parties to deploy, manage, and monitor their apps in the platform, again leveraging your existing investments while letting you access an even greater breadth and depth of data, all in one place.
  • Of course, none of this would matter if the platform weren't reliable, secure, and manageable. * Hadoop today is highly available and Cloudera provides extensions for automated backup and disaster recovery. * Hadoop has had perimeter security for some time but there was a significant gap in the area of fine-grained role-based access controls, the kind you'd expect from a DBMS. That's why, together with the community, we built and contributed the Apache Sentry project which delivers this security for Hive and Impala today, and why we developed Cloudera Navigator to support metadata management, including things like rights auditing, data lineage, and data discovery native to Hadoop. * And all this in addition to the industry-leading system management and customer support you expect from Cloudera.
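To make the idea of fine-grained, role-based access control concrete, here is a minimal Python sketch. It is not Apache Sentry's API or policy format; the `Privilege` and `AccessPolicy` names, the group and role names, and the `sales` table are all hypothetical, chosen only to illustrate how privileges attach to roles and roles attach to groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """One grant: an action on a table, optionally limited to some columns."""
    action: str                         # e.g. "SELECT"
    table: str
    columns: frozenset = frozenset()    # empty set means "all columns"


class AccessPolicy:
    """Hypothetical policy store: groups map to roles, roles map to privileges."""

    def __init__(self):
        self.role_privileges = {}   # role name -> list of Privilege
        self.group_roles = {}       # group name -> set of role names

    def grant(self, role, privilege):
        self.role_privileges.setdefault(role, []).append(privilege)

    def assign(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, groups, action, table, column):
        """Return True if any role held by any of the groups permits the access."""
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for p in self.role_privileges.get(role, ()):
                    column_ok = not p.columns or column in p.columns
                    if p.action == action and p.table == table and column_ok:
                        return True
        return False


# Illustrative setup: analysts may read two columns of a hypothetical table.
policy = AccessPolicy()
policy.grant("analyst", Privilege("SELECT", "sales", frozenset({"region", "amount"})))
policy.assign("analysts", "analyst")

can_read_amount = policy.is_allowed(["analysts"], "SELECT", "sales", "amount")
can_read_ssn = policy.is_allowed(["analysts"], "SELECT", "sales", "customer_ssn")
```

In this sketch access is denied by default: a request succeeds only when some role explicitly grants it, which is the behavior one would typically expect from DBMS-style access controls.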
  • So you can see a lot has happened in just a few short years. Ultimately what you have here is an enterprise data hub, which has four necessary attributes: * It's Secure and Compliant. In addition to perimeter security and encryption, an EDH offers fine-grained (row and column-level) role-based access controls over data, just like your data warehouse. * It's Governed. You need to understand what data is in your EDH and how it’s used, so an EDH must offer data discovery, data auditing, and data lineage. * It's Unified and Manageable. You need to be able to trust that your data is safe, so an EDH must provide not only native high-availability, fault-tolerance and self-healing storage, but also automated replication and disaster recovery. It must also provide advanced system management to enable distributed multi-tenant performance. * And it's Open. As an EDH makes it possible to cost-effectively retain data for decades, you need to ensure that the foundational infrastructure is based on open source software and an open platform for 3rd parties. Open source ensures that you are not locked in to any particular vendor’s license agreement; nobody can hold your data or applications hostage. An open platform ensures that you’re not locked into a particular vendor’s stack and that you have a choice of what tools to use with the EDH; for example, over 200 ISV products, such as Tableau Software, work with Cloudera today. With an enterprise data hub, our customers are able to store and drive real business impact from more data than they'd ever thought possible.
  • The expansive capabilities of Hadoop and an enterprise data hub (the ability to store, process, and analyze huge quantities of data with varying levels of sensitivity from many different sources: structured, semi-structured, and unstructured) require a robust security capability to manage the range of vulnerabilities that may arise. As data proliferates, many new users of different types require access, and many different types of tools will access the data, raising concerns about ongoing management and compliance. Organizations will need to anticipate how they will ensure data quality throughout the information pipeline, enforce controls that guarantee appropriate access and rights, and move from ungoverned data systems to systems with full administration, visibility, and security that allow them to discover, explore, and consume data with full confidence.
  • Enter Cloudera Navigator, the first fully integrated data management application for Apache Hadoop designed to provide all of the capabilities required for administrators, data managers and analysts to secure, govern, classify and explore the large amounts of diverse data in their Hadoop clusters. Control: Navigator provides the system and data control necessary for compliance and risk management teams to ensure that their organization’s policies extend to critical and sensitive data within Hadoop. IT professionals benefit from the simple, centralized management functions offered by Cloudera Manager, so they gain both system and data control from an integrated end-to-end experience. Visibility: Navigator establishes a centralized system for verifying access permissions across all files and directories within Hadoop. Administrators and operations teams can validate their usage and data access policies by confirming individual and group rights and access. Productivity: Analysts, data scientists and business users easily identify data sets of interest and familiarize themselves with the various structures and formats. As a result, they can more quickly generate insights that benefit the business. Reliability: Navigator Lineage capabilities offer the ability to visually trace the progression of a data set from original source(s) to current state. This gives compliance officers, quality managers, executives and anyone else concerned with data cleanliness a high degree of confidence in the reliability of the data they use for reporting or to make decisions.
  • Tableau's mission is to help people see and understand their data. We have had this mission for over 10 years, and remain completely committed to helping business users discover new insights.
  • Data discovery has evolved. It has always been part of business, but it was typically done on the desktop or in “business server” environments. Business analysts spend most of their time preparing data to do work, rather than doing the work. Governance was, and is, broken! Business users print, email, duplicate, and extract data assets from all over the organization in an attempt to get their job done. The requirements process of traditional BI tools has failed organizations: it is 1) too slow; 2) subject to changing requirements; 3) reliant on a limited few; 4) too inflexible for the needs of the business; 5) costly; and 6) reactive.
  • We made it for everyone. We made it easy so that anyone would want to adopt it.

Govern This! Data Discovery and the application of data governance with new stack technologies Presentation Transcript

  • 1. GOVERN THIS! Data Discovery & the Application of Data Governance. Cloudera and Tableau Software Online Webinar, May 1, 2014. Paul Lilford, Tableau Software; Marc Lobree, Tableau Software; Arlene Boyd, Cloudera; Mark Donsky, Cloudera
  • 2. Agenda • Data Governance Requires a New Approach • From Apache Hadoop to an Enterprise Data Hub • Enterprise-Grade Governance with Cloudera Navigator • Data Discovery and the Application of Data Governance • Live Demo – Tableau Data Discovery • Live Demo – Cloudera Navigator • Q&A ©2014 Cloudera and Tableau Software. All rights reserved.
  • 3. Polling Question: How do you view existing governance processes? 1. Completely appropriate 2. Effective 3. Ineffective but needed 4. Obstructive
  • 4. Polling Question: Are you a line-of-business user or an IT person? 1. Business user 2. IT admin
  • 5. Hadoop and Cloudera’s EDH: A New Approach to Data
  • 6. Expanding Data Requires a New Approach. Then: Bring Data to Compute. Process-centric businesses use: • Structured data mainly • Internal data only • “Important” data only. Now: Bring Compute to Data. Information-centric businesses use all data: multi-structured, internal & external data of all types. [Diagram: compute and data nodes] ©2014 Cloudera, Inc. All rights reserved.
  • 7. From Apache Hadoop to an enterprise data hub. Attributes: ✔ Open Source, Scalable, Flexible, Cost-Effective; ✖ Managed; ✖ Open Architecture; ✖ Secure and Governed. Stack layers: BATCH PROCESSING; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); FILESYSTEM. Components: MAPREDUCE, HDFS. Core Apache Hadoop is great, but… 1) Hard to use and manage. 2) Only supports batch processing. 3) Not comprehensively secure.
  • 8. From Apache Hadoop to an enterprise data hub. Attributes: ✔ Open Source, Scalable, Flexible, Cost-Effective; ✔ Managed; ✖ Open Architecture; ✖ Secure and Governed. Stack layers: BATCH PROCESSING; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); SYSTEM MANAGEMENT; FILESYSTEM. Components: MAPREDUCE, HDFS, CLOUDERA MANAGER.
  • 9. From Apache Hadoop to an enterprise data hub. Attributes: ✔ Open Source, Scalable, Flexible, Cost-Effective; ✔ Managed; ✔ Open Architecture; ✖ Secure and Governed. Stack layers: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); SYSTEM MANAGEMENT; FILESYSTEM; ONLINE NOSQL. Components: MAPREDUCE, IMPALA, SOLR, SPARK, SPARK STREAMING, YARN, HDFS, HBASE, CLOUDERA MANAGER.
  • 10. From Apache Hadoop to an enterprise data hub. Attributes: ✔ Open Source, Scalable, Flexible, Cost-Effective; ✔ Managed; ✔ Open Architecture; ✔ Secure and Governed. Stack layers: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); DATA MANAGEMENT; SYSTEM MANAGEMENT; FILESYSTEM; ONLINE NOSQL. Components: MAPREDUCE, IMPALA, SOLR, SPARK, SPARK STREAMING, YARN, HDFS, HBASE, CLOUDERA NAVIGATOR, CLOUDERA MANAGER, SENTRY.
  • 11. From Apache Hadoop to an enterprise data hub: CLOUDERA’S ENTERPRISE DATA HUB. Attributes: ✔ Open Source, Scalable, Flexible, Cost-Effective; ✔ Managed; ✔ Open Architecture; ✔ Secure and Governed. Stack layers: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); DATA MANAGEMENT; SYSTEM MANAGEMENT; FILESYSTEM; ONLINE NOSQL. Components: MAPREDUCE, IMPALA, SOLR, SPARK, SPARK STREAMING, YARN, HDFS, HBASE, CLOUDERA NAVIGATOR, CLOUDERA MANAGER, SENTRY.
  • 12. Cloudera: Your Trusted Advisor for Big Data. Advance from Strategy to ROI with Best Practices and Peak Performance. Partners; Proactive & Predictive Support; Professional Services; Training.
  • 13. Polling Question: Do you use Hadoop for data discovery? 1. Yes, currently use Hadoop 2. No, but planning to start 3. Currently have no plans
  • 14. Hadoop/EDH Data Management: Cloudera Navigator
  • 15. Problem Statement. 1) Lots of data landing in the enterprise data hub: • Huge quantities with varying levels of sensitivity • Many different sources, structured & unstructured. 2) Many users working with the data in multiple ways: • Users: Compliance Officers, Analysts, Data Scientists, LOB • Tools: BI tools, ETL tools, Hue, and more. 3) Need to effectively control & consume data: • Get visibility & control over the environment • Discover, explore and consume data.
  • 16. Data Management Challenges. Auditing and Access Management: • View, grant, and revoke permissions across the Hadoop stack • Identify access to a data asset around the time of a security breach • Generate an alert when a restricted data asset is accessed. Lineage: • Given a data set, trace back to the original source • Understand the downstream impact of purging/modifying a data set. Metadata Tagging and Discovery: • Search through metadata to find data sets of interest • Given a data set, view schema, metadata and policies.
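The two lineage challenges on this slide (tracing a data set back to its original source, and understanding the downstream impact of changing one) both reduce to traversing a graph of derivations. The Python sketch below illustrates that idea under simple assumptions; it does not use Cloudera Navigator's API, and the `LineageGraph` class and the data set names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: edges point from a source to a derived data set."""

    def __init__(self):
        self.parents = defaultdict(set)    # derived data set -> direct sources
        self.children = defaultdict(set)   # source data set -> direct derivations

    def add_derivation(self, source, derived):
        self.parents[derived].add(source)
        self.children[source].add(derived)

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search; returns every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def original_sources(self, dataset):
        """Ancestors with no recorded parents, i.e. where the data first landed."""
        ancestors = self._reachable(dataset, self.parents)
        if not ancestors:
            return {dataset}   # the data set is itself an original source
        return {d for d in ancestors if not self.parents.get(d)}

    def downstream_impact(self, dataset):
        """Every data set that would be affected by purging or modifying this one."""
        return self._reachable(dataset, self.children)


# Illustrative pipeline with made-up data set names.
lineage = LineageGraph()
lineage.add_derivation("raw_logs", "cleaned_logs")
lineage.add_derivation("cleaned_logs", "daily_report")
lineage.add_derivation("crm_export", "daily_report")

sources = lineage.original_sources("daily_report")
impacted = lineage.downstream_impact("raw_logs")
```

Keeping both a parent map and a child map lets each question be answered with a single traversal in one direction, at the cost of recording every edge twice.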
  • 17. Cloudera Navigator: Data Management Suite for Hadoop and Cloudera’s EDH. Audit & Access Management: ensuring appropriate permissions & auditing on data access. Discovery & Exploration: finding out what data is available and what it looks like. Lineage: tracing data back to its original source. Enterprise Metadata Repository: • Business metadata • Lineage metadata • Operational metadata. [Diagram labels: Audit & Access Mgmt, Lineage, Metadata, Discovery & Exploration; HDFS, HBASE, HIVE; CLOUDERA NAVIGATOR; CDH; ETL, DW, DBMS, DM, …, Self Tooling; REST, XMI]
  • 18. Tableau Data Discovery
  • 19.
  • 20. Data Discovery the new way! • Support the process of discovery and new insights through direct access to data by subject experts • LOB subject experts (empowered for their subject area) • Active IT support and engagement • Security is still fundamental and data is still protected • Flexibility in governance: this is discovery, not production • Better-vetted requirements feed production and more highly governed data types • Help organizations in the move to become data driven
  • 21. But don’t take our word for it! The new normal: • Business Driven • Ease of use • Self reliance • Visual
  • 22. For Everyone: Ease of use leads to adoption across all departments and use cases
  • 23. Polling Question: What percentage of time would you like to spend in actual data discovery? 1. 0-10% 2. 10-20% 3. 20-30% 4. 30%+
  • 24. LIVE DEMO: Tableau Data Discovery
  • 25. LIVE DEMO: Cloudera Navigator
  • 26. Summary • Business driven data discovery is fundamental for all organizations • Move from insight to action - become data driven • Flexibility is key, yet so is scalability, integrated management, security, and governance • Prove it first – data discovery allows you to better vet your solution before you invest • The discovery layer brings IT and business users together in a collaborative form
  • 27. Questions? Use the Chat tab on the left side of your screen to submit questions. Watch this webinar on-demand: www.cloudera.com Contact our presenters: plilford@tableausoftware.com, aboyd@cloudera.com, or contact your account team. Thank you for attending! Connector: Tableau on Cloudera http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#examples_hadoop.html Download Tableau: http://www.tableausoftware.com/ Download CDH (Free Open Source): http://www.cloudera.com/downloads Cloudera and Tableau: http://www.cloudera.com/content/cloudera/en/solutions/partner/Tableau.html