© Cloudera, Inc. All rights reserved.
Optimized Data Management &
Governance with Cloudera 5.7:
Cloudera Navigator 2.6
Mark Donsky | Director of Product Management
The Benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
…can cause data management challenges
Navigator’s Governance Foundation
• Unified auditing
• Comprehensive lineage
• Unified metadata
• Universal policies
Compliance Officer
Track, understand and protect access to sensitive data
• Am I prepared for an audit?
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
Data Stewards & Curators
Manage and organize data assets at Hadoop scale
• How do I efficiently manage data lifecycle, from ingest to purge?
• How do I classify data efficiently?
• How do I make data available to my end users efficiently?
Data Scientists & BI Users
Effortlessly find and trust the data that matters most
• How can I explore data on my own?
• Can I trust what I find?
• How do I use what I find?
• How do I find and use related data sets?
Hadoop Admins & DBAs
Boost user productivity and cluster performance
• How is data being used today?
• How can I optimize for future workloads?
• How can I quickly take advantage of Hadoop risk-free?
Poll
• What’s your top Hadoop data management concern?
• Compliance (e.g., PCI, HIPAA)
• Stewardship (data ingest, lifecycle management, access management, purge)
• Curation (custom metadata tagging)
• Enabling end user self-service (finding, trusting, and using data sets)
• Administration (model and workload optimization)
• Other/not sure yet
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
Trusted for production
• Deployed by hundreds of customers
across multiple industries
• Over three years in production
Compliance-ready
• The only Hadoop distribution to pass a PCI audit
Plays nicely with others
• Integrated with the leading partner
solutions
Poll
• Have you deployed Cloudera Navigator in your environment?
• Yes, in production
• Yes, in dev/test
• No, but we’re planning
• No plans yet
• What’s Navigator?
End-to-End Data Management
Cloudera Navigator + Partners
Lineage | Auditing | Metadata
Augmentation | Consumption
Cloudera Navigator 2.6
What’s New
Cloudera Navigator 2.6
Lineage for business users
Description
• Establish trust: “where did this data set come
from?”
• Determine impact: “who’s using this data
set?”
• Preserves all governance-related details, but
presents them only when needed
Benefits
• ↓ cost: administrators field fewer calls about
data location, usage, trust, and provenance
• ↓ risk: less likely that end users will use the
wrong data sets for their analysis; reduce
reputational risk from customer data breach
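The two lineage questions above map onto two directions of a graph walk: "where did this data set come from?" follows edges upstream, and "who's using this data set?" follows them downstream. The sketch below is only an illustration of that idea, not Navigator's implementation or API; the data set names and the `EDGES` list are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index (source, target) lineage pairs in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first walk: every node reachable from start via neighbors."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical data flow, invented for this example.
EDGES = [
    ("raw_orders", "clean_orders"),
    ("raw_customers", "clean_customers"),
    ("clean_orders", "revenue_report"),
    ("clean_customers", "revenue_report"),
    ("clean_orders", "churn_model"),
]

upstream, downstream = build_graph(EDGES)

# Establish trust: where did revenue_report come from?
origins = reachable("revenue_report", upstream)

# Determine impact: who is using clean_orders?
consumers = reachable("clean_orders", downstream)
```

Keeping both indexes means either question is answered with the same traversal, which is one plausible way to show a business user only the direction they asked about.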
Cloudera Navigator 2.6
Managed metadata
Define metadata properties | Add managed metadata | Search for metadata
Description
• Organize and enforce validation on important
business metadata
• Stewardship: lifecycle stage, retain until, data owner
• Classification: keywords, department, fiscal quarter
Benefits
• ↓ cost: find and trust datasets faster
• ↓ risk: less likely that end users will use the wrong
data sets for their analysis
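To make "enforce validation on important business metadata" concrete, here is a minimal sketch of checking tagged values against a set of defined properties. It is not Navigator's API: the property names are adapted from the examples on this slide, and the allowed values, the `SCHEMA` table, and the `validate` function are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical allowed values; a real deployment would define its own.
ALLOWED_STAGES = {"ingest", "active", "archive", "purge"}
ALLOWED_QUARTERS = {"Q1", "Q2", "Q3", "Q4"}


def _is_iso_date(value):
    """True if value is a string such as '2017-06-30'."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_nonempty_text(value):
    return isinstance(value, str) and value.strip() != ""


# One validator per managed property (stewardship and classification).
SCHEMA = {
    "lifecycle_stage": lambda v: isinstance(v, str) and v in ALLOWED_STAGES,
    "retain_until": _is_iso_date,
    "data_owner": _is_nonempty_text,
    "department": _is_nonempty_text,
    "fiscal_quarter": lambda v: isinstance(v, str) and v in ALLOWED_QUARTERS,
}


def validate(metadata):
    """Return a list of problems; an empty list means the tags are valid."""
    errors = []
    for name, value in metadata.items():
        check = SCHEMA.get(name)
        if check is None:
            errors.append(f"{name}: not a managed property")
        elif not check(value):
            errors.append(f"{name}: invalid value {value!r}")
    return errors


good = validate({"lifecycle_stage": "active", "retain_until": "2017-06-30",
                 "data_owner": "finance-team"})
bad = validate({"lifecycle_stage": "someday", "retain_until": "next year",
                "color": "blue"})
```

Rejecting unknown property names as well as bad values is what separates managed metadata from free-form tags: searches can rely on every value following the same definition.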
Cloudera Navigator 2.6
HDFS metadata purge
Description
• Enables administrators to purge
unused HDFS entities from the Solr
database
• Critical for maintaining optimal
performance of larger deployments
• Invoked as a REST API call
Benefits
• ↓ cost: ensure optimal performance
with minimal administrative effort
Cloudera Navigator Demo
Cloudera Navigator Roadmap Highlights
2H-2016
• Expanded coverage: Spark lineage, Kafka
auditing
• Hadoop SQL workload optimization
• Metadata purge for Hive and operational
metadata
• Sentry ABAC
1H-2017
• Comprehensive governance for the cloud
• Discovery on S3
• Auditing, lineage and metadata on
transient clusters
• Multi-cluster, hybrid cloud support
• Fine-grained access control
• SolrCloud
More Information & Next Steps
Get Started
• Download C5.7: www.cloudera.com/downloads
Release Notes
• www.cloudera.com/documentation/enterprise/latest/topics/rg_release_notes.html
Training Classes
• university.cloudera.com
Questions?

Optimized Data Management with Cloudera 5.7: Understanding data value with Cloudera Navigator


Editor's Notes

  • #3 Key Point: Set up how the benefits of Hadoop make data management and governance hard. Lots of data of all types is coming in from more places, faster and in larger quantities (you can no longer know what data you have at human scale). More users are accessing the data, using more tools, and making faster changes (you can no longer control what is happening to the data and who is doing what).
  • #6 Cloudera Navigator is the only integrated data management and governance platform for Hadoop. It is a critical part of Cloudera Enterprise and is trusted in production by hundreds of our customers across multiple industries (regulated and not). With over two years of development, Cloudera was the first Hadoop vendor to introduce a data management and governance solution. Cloudera Navigator is a mature tool that goes well beyond auditing and metadata collection. Cloudera Navigator and data governance are a key part of passing compliance audits. Cloudera is the only Hadoop distribution to pass a compliance audit (PCI-DSS with Mastercard), and Navigator plays a huge part in that. Cloudera Navigator also features interoperability with the broad partner ecosystem. It integrates with the leading tools for data lineage, policies, audits, quality, and more, so you can manage data both within the Hadoop platform and beyond.
  • #8 Cloudera Navigator has a broad ecosystem of leading third-party tools that integrate seamlessly with Navigator through REST APIs for end-to-end data management. [Image shows Informatica lineage with Navigator]
  • #9 [If you have access to demo, walk through]
  • #13 [If you have access to demo, walk through]