Big Data Warehousing Meetup: Cloudera Navigator


In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.

As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities to critical data that is highly regulated and must adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance and security on Big Data.

Published in: Technology

  1. Cloudera Navigator, Patrick Angeles
  2. Why You Need Cloudera Navigator. Lots of data landing in Cloudera Enterprise: huge quantities; many different sources, structured and unstructured; varying levels of sensitivity. Many users working with the data: administrators and compliance officers; analysts and data scientists; business users. Need to effectively control and consume data: get visibility and control over the environment; discover and explore data.
  3. Cloudera Navigator: Data Management Layer for Cloudera Enterprise. Audit & Access Control: ensuring appropriate permissions and reporting on data access for compliance. Discovery & Exploration: finding out what data is available and what it looks like. Lineage: tracing data back to its original source. Lifecycle Management: migration of data based on policies. Enterprise Metadata Repository: business metadata, lineage metadata, operational metadata. Diagram labels: Cloudera Navigator, CDH, HDFS, HBase, Hive.
  4. Cloudera Navigator 1.0: Data Audit & Access Control. Verify Permissions: view which users and groups have access to files and directories. Audit Configuration: configuration of audit tracking for HDFS, HBase and Hive. Audit Dashboard: simple, queryable interface to view data access. Information Export: export audit information for integration with SIEM tools. Diagram labels: IAM / LDAP system; Cloudera Navigator 1.0; access service; audit log service; view permissions; audit log config; audit log collection; HDFS; HBase; Hive; 3rd-party SIEM / GRC system.
  5. Benefits of Cloudera Navigator 1.0 (Control, Visibility, Integration): verify access permissions to files and directories; report on data access by user and type; store sensitive data; maintain full audit history; the first and only centralized audit tool for Hadoop; view permissions for LDAP/IAM users; export audit data for integration with 3rd-party SIEM tools.
  6. Navigator Subscription: data management layer for Hadoop; centralized audit management and access control; 8x5 or 24x7 support; optional add-on to the Cloudera Enterprise subscription. Diagram labels: Cloudera Enterprise; Navigator Subscription; Cloudera Support; Cloudera Navigator; Cloudera Manager; data audit; access management; basic features; advanced features; CDH core projects; HBase; Impala; Search; Backup & DR.
  7. Navigator 2.0 (Q1 2014): manage and explore your data with Cloudera Navigator 2.0. Data Discovery (what data do we have?) and Annotations/Tags: search, explore, define, and tag data sets. Important for DBAs/data modelers, self-service business analysts, and data scientists. Data Lineage (where did the data come from? where is it used?): for files and tables, MR jobs, Hive queries, Impala queries, Pig scripts, Sqoop load/export. Important for risk and compliance audits; for BI users facing 10K tables in HDFS (which ones are relevant to the source data I need, or the table I'm looking at?); and for data retention policies, where you need to purge not just the source data, but any data that's been derived from it.
  8. Navigator 2.0 - Lineage: audit data access; verify access privileges; search metadata; visualize lineage.
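
The retention-policy point in the Navigator 2.0 slide (purging not just the source data but anything derived from it) comes down to walking a lineage graph downstream from a source. The following is a minimal sketch of that idea in plain Python. It does not use or represent the Cloudera Navigator API; the `derived_datasets` function, the `LINEAGE` mapping, and every dataset name in it are hypothetical and exist only for illustration.

```python
from collections import deque


def derived_datasets(lineage, source):
    """Return every dataset downstream of `source`, in breadth-first order.

    `lineage` maps a dataset name to the datasets produced directly from it.
    The source itself is not included in the result.
    """
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # a dataset may be reachable by several paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key feeds the datasets in its list.
LINEAGE = {
    "/raw/clicks": ["hive.clicks_clean"],
    "hive.clicks_clean": ["hive.daily_sessions", "impala.click_report"],
    "hive.daily_sessions": ["impala.click_report"],
    "/raw/orders": ["hive.orders"],
}

# Everything a retention policy on /raw/clicks would need to purge.
to_purge = ["/raw/clicks"] + derived_datasets(LINEAGE, "/raw/clicks")
print(to_purge)
```

A breadth-first walk with a `seen` set is used so that a dataset reachable through more than one path (here `impala.click_report`) is listed only once. A real deployment would obtain the lineage edges from the metadata repository rather than from a hand-written dictionary.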