1© Cloudera, Inc. All rights reserved.
Bringing Trust and Visibility to Apache Hadoop
Mark Donsky, Product Management, Cloudera
Chang She, Software Engineering, Cloudera
The benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
…Cause trust, visibility, and governance challenges
Business Users
• How do I find what’s relevant?
• Can I trust what I find?
• How can I explore data on my own?
Information Security
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop risk-free and fast?
Building blocks of governance in Hadoop
• Audit Logs
• Lineage
• Data Policies
• Technical Metadata
• Business Metadata
Metadata
Enterprise metadata
The foundation for governance
Metadata gives data context and meaning
Operational
• Job Run-Time Stats
• Report Run Information
• Hardware Usage
• Scheduler Stats
Technical
• Database Schema
• File Definition
• ETL Job Design
• BI Report Definition
• Data Model
Business
• Business Glossary
• Enterprise Taxonomy
• Ontology
Data Lineage
Impact Analysis
Topology Understanding
Data Governance
Compliance Audits
Enterprise metadata
The foundation for governance
Metadata gives data context and meaning so that you can answer the important questions
Unified Metadata Repository: Business, Technical, Operational
What data or information exists?
Where is data being used?
What is the data’s business definition?
Who is responsible for the data?
How is it inter-related to other data?
Who is using the data?
Why do we need this data?
Can we trust this data?
When was this data last updated?
Who are the high-value customers?
How do we define that?
How is high value calculated?
Where is customer data stored and used?
Is the data reliable and accurate?
Technical metadata – what’s available?
Hive
• Query text
• Table name
• Column name
• Data type
• Owner
• Partitions
Pig
• Script name
• Owner
• Creation date
• Last modified date
HDFS
• Permissions
• Owner
• Group
• Creation date
• Last modified date
MR/YARN
• JobID
• Mapper class
• Reducer class
• Inputs
• Outputs
Technical metadata – where can I find it?
Component    Metadata
HDFS         fsimage (ls -lRa /)
Hive         Hive Metastore Server (database metadata tables)
MapReduce    JobTracker
YARN         Job History Server
Oozie        Oozie Server
Pig          JobTracker, Job History Server
Technical metadata – Hive metastore
A collection of structured tables containing technical metadata about Hive databases, tables, views, and columns
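To make "structured tables" a little more concrete, here is a minimal sketch that builds a tiny in-memory stand-in for a metastore database and joins it to list a table's columns, owner, and HDFS location. The table names (DBS, TBLS, SDS, COLUMNS_V2) follow the metastore's backing schema as it is commonly deployed, but this is a heavily simplified subset and the layout can differ between Hive versions. SQLite is used only so the example runs anywhere; the column names order_id and amount are made up for illustration.

```python
import sqlite3

# Assumed, simplified subset of the metastore schema (real tables have more columns).
SCHEMA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, OWNER TEXT, TBL_TYPE TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT,
                         INTEGER_IDX INTEGER);
"""


def build_demo_metastore():
    """Create an in-memory database holding one illustrative Hive table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DBS VALUES (1, 'default')")
    conn.execute(
        "INSERT INTO SDS VALUES (10, 100, '/user/hive/warehouse/salesdata')")
    conn.execute(
        "INSERT INTO TBLS VALUES (1000, 1, 10, 'salesdata', 'admin', 'MANAGED_TABLE')")
    # Hypothetical columns, for illustration only.
    conn.executemany(
        "INSERT INTO COLUMNS_V2 VALUES (?, ?, ?, ?)",
        [(100, "order_id", "bigint", 0), (100, "amount", "double", 1)],
    )
    return conn


def describe_table(conn, db, table):
    """Return (column, type, owner, location) rows for one table, in column order."""
    return conn.execute(
        """
        SELECT c.COLUMN_NAME, c.TYPE_NAME, t.OWNER, s.LOCATION
        FROM TBLS t
        JOIN DBS d ON d.DB_ID = t.DB_ID
        JOIN SDS s ON s.SD_ID = t.SD_ID
        JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
        WHERE d.NAME = ? AND t.TBL_NAME = ?
        ORDER BY c.INTEGER_IDX
        """,
        (db, table),
    ).fetchall()
```

Calling describe_table(build_demo_metastore(), "default", "salesdata") returns one row per column. Against a real deployment the same join would run on the metastore's backing RDBMS, ideally with read-only credentials.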
Technical metadata – HCatalog
• HCatalog uses the Hive Metastore to provide a management layer
• Abstracts the file location and storage format
• Makes formats available to Pig, Hive, MapReduce, etc.
• Also accessible via REST API
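As a small sketch of the REST route, and assuming the API meant here is WebHCat (formerly Templeton), the helper below only builds the URL for describing a table; it does not make the request. The default port 50111, the host name, and the user.name query parameter are assumptions about a typical unsecured setup and should be checked against your cluster.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, db, table, user, port=50111):
    """Build the WebHCat URL that describes one HCatalog table.

    Assumes the /templeton/v1/ddl resource layout and pseudo-authentication
    through the user.name query parameter.
    """
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(db, safe=""), quote(table, safe=""))
    return "http://{}:{}{}?{}".format(
        host, port, path, urlencode({"user.name": user}))
```

For example, webhcat_table_url("gateway.example.com", "default", "salesdata", "training") yields a URL that could be fetched with any HTTP client; a Kerberized cluster would need SPNEGO authentication instead of user.name.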
Business metadata – can we do this in Hadoop?
• Custom metadata is vital for trust and visibility
• Find all files associated with a particular clinical trial
• Locate all statements for high-profile customers
• Where is my sensitive data?
• Where is the protected health information?
• No - Hadoop doesn’t support business metadata
Hadoop Auditing
Hadoop audit logs – what do they look like?
• Logs all file system access requests
• Impala, HBase and other components use a similar format
• Implemented in log4j at the INFO level

HDFS Audit Log
{ "allowed": true,
  "serviceName": "HDFS-1",
  "username": "training",
  "src": "/user",
  "eventTime": 1398544478141,
  "ipAddress": "10.20.187.39",
  "operation": "getfileinfo",
  "dest": null,
  "permissions": null,
  "impersonator": null,
  "delegationTokenId": null
}
HDFS property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit

Hive Audit Log
{ "serviceName": "HIVE-1",
  "username": "admin",
  "impersonator": null,
  "ipAddress": "10.20.187.39",
  "operation": "QUERY",
  "eventTime": 1398402718797,
  "operationText": "select count(*) from salesdata",
  "allowed": true,
  "databaseName": "default",
  "tableName": "salesdata",
  "resourcePath": "/user/hive/warehouse/salesdata",
  "objectType": "TABLE"
}
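Events in this shape are easy to process once parsed. The sketch below assumes each event is stored as one JSON object per line, which is an assumption about the on-disk layout rather than something the slide states, and reuses the two sample events above. It converts eventTime (treated as epoch milliseconds) to a UTC timestamp and counts operations per user.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# The two sample events from the slide, one JSON object per line (assumed layout).
SAMPLE_LINES = [
    '{"allowed": true, "serviceName": "HDFS-1", "username": "training", '
    '"src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", '
    '"operation": "getfileinfo", "dest": null, "permissions": null, '
    '"impersonator": null, "delegationTokenId": null}',
    '{"serviceName": "HIVE-1", "username": "admin", "impersonator": null, '
    '"ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402718797, '
    '"operationText": "select count(*) from salesdata", "allowed": true, '
    '"databaseName": "default", "tableName": "salesdata", '
    '"resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE"}',
]


def parse_events(lines):
    """Parse audit lines into dicts and add a timezone-aware eventTimeUtc field."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # eventTime is treated as milliseconds since the Unix epoch.
        event["eventTimeUtc"] = datetime.fromtimestamp(
            event["eventTime"] / 1000, tz=timezone.utc)
        events.append(event)
    return events


def operations_by_user(events):
    """Count how often each (username, operation) pair occurs."""
    return Counter((e["username"], e["operation"]) for e in events)
```

operations_by_user(parse_events(SAMPLE_LINES)) gives one count for ("training", "getfileinfo") and one for ("admin", "QUERY").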
Hadoop audit logs – where can I find them?
Component            Default Location (CDH)
HDFS Audit Logs      /var/log/hadoop-hdfs/audit
Hive Audit Logs      /var/log/hive/audit
Impala Audit Logs    /var/log/impalad/audit
HBase Audit Logs     /var/log/hbase/audit
• Log files are automatically rotated when a size limit is reached
• Location and size limit are configurable
Hadoop audit logs – limitations
• Consolidation
• Persistence
• Filtering
• Integration
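To illustrate what consolidation and filtering involve when done by hand, here is a minimal sketch that merges per-service event lists into a single timeline and then filters it. It assumes events are already parsed into dicts with the eventTime, username, and allowed fields shown in the sample logs, and that each per-service list is already sorted by eventTime. The function names are hypothetical.

```python
import heapq


def consolidate(*streams):
    """Merge per-service event streams into one timeline ordered by eventTime.

    Each input stream must already be sorted by eventTime.
    """
    return list(heapq.merge(*streams, key=lambda e: e["eventTime"]))


def filter_events(events, username=None, allowed=None):
    """Keep only events matching the given username and/or allowed flag."""
    result = []
    for event in events:
        if username is not None and event.get("username") != username:
            continue
        if allowed is not None and event.get("allowed") != allowed:
            continue
        result.append(event)
    return result
```

Persistence and integration with external tools are not addressed here; the merged list lives only in memory, which is exactly the gap the slide points at.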
Lineage
Lineage – how to track lineage
• This is hard to do by hand: without a tool such as Cloudera Navigator, lineage has to be tracked manually
• But lineage is embedded in Hadoop technical metadata
• Job configurations provide inputs/outputs
• Hive metastore provides location of HDFS directory where data resides
• Hive/Impala queries can be interpreted to provide fine-grained column-level lineage between query inputs and outputs
• Some relationships (e.g., directory–file) are implicit
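The idea that job inputs and outputs already encode lineage can be sketched as a small graph walk. This is an illustration under simple assumptions, not how Cloudera Navigator works: each job is a dict with hypothetical inputs and outputs lists of paths, and column-level lineage from query parsing is out of scope.

```python
from collections import defaultdict


def build_lineage(jobs):
    """Map each output path to the set of paths it was directly derived from."""
    upstream = defaultdict(set)
    for job in jobs:
        for out in job["outputs"]:
            upstream[out].update(job["inputs"])
    return upstream


def all_upstream(upstream, entity):
    """Return every direct and indirect source of an entity."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # guards against cycles and repeat visits
                seen.add(parent)
                stack.append(parent)
    return seen
```

Given one job that writes /staging/sales from /raw/sales and a second that writes /user/hive/warehouse/salesdata from /staging/sales and /raw/customers, all_upstream on the warehouse path returns all three source paths. Reversing the edges gives impact analysis: what is affected downstream if a source changes.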
Data Policies
Data policies – Hadoop limitations
• Information is of limited use unless it is actionable
• There is a treasure trove of actionable information in the metadata that the various Hadoop services emit
• Archival of unused data
• Encryption of sensitive data
• Remediation of incorrect permissions
• Triggers should be configurable based on user-defined criteria
• Hadoop does not offer a sufficient policy engine or action framework
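What a policy engine with user-defined triggers might look like can be sketched in a few lines. Everything here is hypothetical: the entity fields (days_since_access, tags, encrypted, permissions), the thresholds, and the action names are invented for illustration. The sketch only plans actions; it does not archive, encrypt, or change permissions on anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Policy:
    name: str
    matches: Callable[[Dict], bool]  # user-defined trigger over entity metadata
    action: str                      # label of the action to plan


# Hypothetical policies mirroring the three examples on the slide.
DEFAULT_POLICIES = [
    Policy("archive-unused",
           lambda e: e["days_since_access"] > 180, "archive"),
    Policy("encrypt-sensitive",
           lambda e: "sensitive" in e["tags"] and not e["encrypted"], "encrypt"),
    Policy("no-world-writable",
           lambda e: e["permissions"] == "777", "fix-permissions"),
]


def evaluate(policies: List[Policy], entities: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return planned (action, path, policy name) tuples for every match."""
    planned = []
    for entity in entities:
        for policy in policies:
            if policy.matches(entity):
                planned.append((policy.action, entity["path"], policy.name))
    return planned
```

Keeping the trigger as a plain callable makes criteria user-configurable; a real engine would also need scheduling, an audit trail of its own actions, and safe execution of the planned steps.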
Building blocks of trust and visibility in Hadoop
• Audit Logs
• Lineage
• Data Policies
• Technical Metadata
• Business Metadata
Cloudera Navigator
Overview & Demo
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
Governance & Foundational Layer: Business Metadata, Technical Metadata, Lineage, Policies, Audit Logs

Self-Service Discovery & Analytics
Data Scientists & BI Users
Effortlessly find and trust the data that matters most
• Search
• Data definitions
• Analytics
• Profiling

Usage-Driven Model Optimization
Hadoop Administrators & DBAs
Configure Hadoop to boost user productivity
• Migration
• Optimization
• Reporting
• Model maintenance

Compliance-Ready Governance & Protection
Information Security
Track, understand and protect access to sensitive data
• Auditing
• Lineage
• Encryption
• Key management

Active Data Management & Information Lifecycle Management
Data Stewards & Curators
Maximize cluster performance at Hadoop scale with ease
• Classification
• Stewardship
• Backup
• Retention
Trust and visibility is an ecosystem
Enterprise Data Hub: Process, Discover, Model, Serve; Security and Administration; Unlimited Storage
Partner categories: Data Systems, System Integration, Infrastructure, Operational Tools, Applications
More than 1,600 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
Learn more!
Please stop by our booth at P13
• See a demo of Cloudera Enterprise, including our governance solution that’s used by nearly 200 production customers for over two years!
• Find out what makes Cloudera Enterprise the only PCI-certified Hadoop distribution
• Learn about our 1600+ partner ecosystem
Thank You!
@markdonsky
@changhiskhan

DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 

Recently uploaded (20)

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 

Bringing Trust and Visibility to Apache Hadoop

  • 1. Bringing Trust and Visibility to Apache Hadoop. Mark Donsky, Product Management, Cloudera; Chang She, Software Engineering, Cloudera. © Cloudera, Inc. All rights reserved.
  • 2. The benefits of Hadoop... One place for unlimited data: all types; more sources; faster, larger ingestion. Unified, multi-framework data access: more users; more tools; faster changes.
  • 3. …Cause trust, visibility, and governance challenges. Business Users: How do I find what’s relevant? Can I trust what I find? How can I explore data on my own? Information Security: Who’s accessing what data? What are they doing with the data? Is sensitive data governed and protected? Can I meet compliance needs? Database Admins: How is data being used today? How can I optimize for future workloads? How can I take advantage of Hadoop risk-free and fast?
  • 4. Building blocks of governance in Hadoop: Audit Logs, Lineage, Data Policies, Technical Metadata, Business Metadata.
  • 5. Metadata
  • 6. Enterprise metadata: the foundation for governance. Metadata enables you to put context and meaning to data. Operational: Job Run-Time Stats, Report Run Information, Hardware Usage, Scheduler Stats. Technical: Database Schema, File Definition, ETL Job Design, BI Report Definition, Data Model. Business: Business Glossary, Enterprise Taxonomy, Ontology. Also shown on the slide: Data Lineage, Impact Analysis, Topology Understanding, Data Governance, Compliance Audits.
  • 7. Enterprise metadata: the foundation for governance. Metadata enables you to put context and meaning to data to answer the important questions. Business, Technical, Operational: Unified Metadata Repository. What data or information exists? Where is data being used? What is the data’s business definition? Who is responsible for the data? How is it inter-related to other data? Who is using the data? Why do we need this data? Can we trust this data? When was this data last updated? Who are the high-value customers? How do we define that? How is high value calculated? Where is customer data stored and used? Is the data reliable and accurate?
  • 8. Technical metadata – what’s available? Hive: Query Text, Table name, Column name, Data Type, Owner, Partitions. Pig: Script name, Owner, Creation date, Last modified date. HDFS: Permissions, Owner, Group, Creation date, Last modified date. MR/YARN: JobID, Mapper Class, Reducer Class, Inputs, Outputs.
  • 9. Technical metadata – where can I find it? (Component: Metadata) HDFS: fsimage (ls -lRa /). Hive: Hive Metastore Server (database metadata tables). MapReduce: JobTracker. YARN: Job History Server. Oozie: Oozie Server. Pig: JobTracker, Job History Server.
  • 10. Technical metadata – Hive metastore: Collection of structured tables containing technical metadata about Hive databases, tables, views, and columns
  • 11. Technical metadata – HCatalog • HCatalog uses the Hive Metastore to provide a management layer • Abstracts the file location and storage format • Makes formats available to Pig, Hive, MapReduce, etc. • Also accessible via REST API
  • 12. Business metadata – can we do this in Hadoop? • Custom metadata is vital for trust and visibility • Find all files associated with a particular clinical trial • Locate all statements for high-profile customers • Where is my sensitive data? • Where is the protected health information? • No: Hadoop doesn’t support business metadata
  • 13. Hadoop Auditing
  • 14. Hadoop audit logs – what do they look like? • Logs all file system access requests • Impala, HBase and other components use a similar format • Implemented in log4j at the INFO level • HDFS property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit • HDFS Audit Log: { "allowed": true, "serviceName": "HDFS-1", "username": "training", "src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", "operation": "getfileinfo", "dest": null, "permissions": null, "impersonator": null, "delegationTokenId": null } • Hive Audit Log: { "serviceName": "HIVE-1", "username": "admin", "impersonator": null, "ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402718797, "operationText": "select count(*) from salesdata", "allowed": true, "databaseName": "default", "tableName": "salesdata", "resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE" }
  • 15. Hadoop audit logs – where can I find them? (Component: Default Location in CDH) HDFS Audit Logs: /var/log/hadoop-hdfs/audit. Hive Audit Logs: /var/log/hive/audit. Impala Audit Logs: /var/log/impalad/audit. HBase Audit Logs: /var/log/hbase/audit. • Log files are automatically rotated when a size limit is reached • Location and size limit are configurable
  • 16. Hadoop audit logs – limitations • Consolidation • Persistence • Filtering • Integration
  • 17. Lineage
  • 18. Lineage – how to track lineage • You can’t do this easily: unless you’re using a tool like Cloudera Navigator, you need to track lineage manually • But…lineage is embedded in Hadoop technical metadata • Job configurations provide inputs/outputs • Hive metastore provides the location of the HDFS directory where data resides • Hive/Impala queries can be interpreted to provide fine-grained, column-level lineage between query inputs and outputs • Some relationships (e.g., directory–file) are implicit
  • 19. Data Policies
  • 20. Data policies – Hadoop limitations • Information is of limited use unless it is actionable • There is a treasure trove of actionable information in the metadata that the various Hadoop services emit: archival of unused data; encryption of sensitive data; remediation of incorrect permissions • Triggers should be configurable based on user-defined criteria • Hadoop does not offer a sufficient policy engine or action framework
  • 21. Building blocks of trust and visibility in Hadoop: Audit Logs, Lineage, Data Policies, Technical Metadata, Business Metadata.
  • 22. Cloudera Navigator Overview & Demo
  • 23. Cloudera Navigator: The only integrated data management and governance platform for Hadoop. Governance & Foundational Layer: Business Metadata, Technical Metadata, Lineage, Policies, Audit Logs. Self-Service Discovery & Analytics (Data Scientists & BI Users): Effortlessly find and trust the data that matters most. Search, Data definitions, Analytics, Profiling. Usage-Driven Model Optimization (Hadoop Administrators & DBAs): Configure Hadoop to boost user productivity. Migration, Optimization, Reporting, Model maintenance. Compliance-Ready Governance & Protection (Information Security): Track, understand and protect access to sensitive data. Auditing, Lineage, Encryption, Key management. Active Data Management & Information Lifecycle Management (Data Stewards & Curators): Maximize cluster performance at Hadoop scale with ease. Classification, Stewardship, Backup, Retention.
  • 24. Trust and visibility is an ecosystem. More than 1,600 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data. Diagram labels: Data Systems, Enterprise Data Hub, Security and Administration, Unlimited Storage, Process, Discover, Model, Serve, System Integration, Infrastructure, Operational Tools, Applications.
  • 25. Learn more! Please stop by our booth at P13 • See a demo of Cloudera Enterprise, including our governance solution, in production for over two years and used by nearly 200 production customers • Find out what makes Cloudera Enterprise the only PCI-certified Hadoop distribution • Learn about our 1600+ partner ecosystem
  • 26. Thank You! @markdonsky @changhiskhan
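
Slides 14 to 16 show the JSON shape of HDFS and Hive audit events and list consolidation and filtering among the limitations of the raw logs. As a rough illustration only, the Python sketch below merges the two sample events from slide 14 into one time-ordered stream and filters it by user. It assumes one JSON object per line, which the slides do not state, and the helper names (parse_events, consolidate, filter_events, describe) are hypothetical, not part of Hadoop or Cloudera Navigator.

```python
import json
from datetime import datetime, timezone

# Sample events copied from slide 14. Storing one JSON object per line
# is an assumption made for this sketch, not something the slides state.
HDFS_LOG = (
    '{"allowed": true, "serviceName": "HDFS-1", "username": "training", '
    '"src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", '
    '"operation": "getfileinfo", "dest": null, "permissions": null, '
    '"impersonator": null, "delegationTokenId": null}'
)
HIVE_LOG = (
    '{"serviceName": "HIVE-1", "username": "admin", "impersonator": null, '
    '"ipAddress": "10.20.187.39", "operation": "QUERY", '
    '"eventTime": 1398402718797, '
    '"operationText": "select count(*) from salesdata", "allowed": true, '
    '"databaseName": "default", "tableName": "salesdata", '
    '"resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE"}'
)


def parse_events(text):
    """Parse one JSON audit event per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def consolidate(*logs):
    """Merge events from several services, ordered by eventTime (epoch ms)."""
    events = [event for log in logs for event in parse_events(log)]
    return sorted(events, key=lambda event: event["eventTime"])


def filter_events(events, username=None, allowed=None):
    """Keep events that match the given username and/or allowed flag."""
    result = []
    for event in events:
        if username is not None and event.get("username") != username:
            continue
        if allowed is not None and event.get("allowed") != allowed:
            continue
        result.append(event)
    return result


def describe(event):
    """Render one event as a single human-readable line with a UTC timestamp."""
    ts = datetime.fromtimestamp(event["eventTime"] / 1000, tz=timezone.utc)
    # HDFS events carry "src"; Hive events carry "resourcePath".
    target = event.get("src") or event.get("resourcePath")
    return (f"{ts:%Y-%m-%d %H:%M:%S} {event['serviceName']} "
            f"{event['username']} {event['operation']} {target}")


if __name__ == "__main__":
    for event in consolidate(HDFS_LOG, HIVE_LOG):
        print(describe(event))
```

In practice the input would come from the audit log files at the locations listed on slide 15. The sketch does not address persistence or integration with other systems, the remaining limitations on slide 16.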

Editor's Notes

  1. Cloudera partners more broadly and deeply across the Hadoop ecosystem than any other vendor. With over 1200 partners and counting, our partnerships offer: compatibility with your existing tools and skills (160+ certified on Cloudera 5, including all 12 of the 12 Gartner Business Intelligence Magic Quadrant leaders); flexible deployment options (on-premises; public, private, or hybrid cloud; appliances and engineered systems); and partnerships you can trust (deep engineering relationships; comprehensive certification program).