© Cloudera, Inc. All rights reserved. 1
Risk Management for Data
Eddie Garcia
Chief Security Architect
One Platform, Many Workloads
Batch, Interactive, and Real-Time.
Leading performance and usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users
Platform components:
• Process
  • Ingest: Sqoop, Flume, Kafka
  • Transform: MapReduce, Hive, Pig, Spark
• Discover
  • Analytic Database: Impala
  • Search: Solr
• Model
  • Machine Learning: SAS, R, Spark, Mahout
• Serve
  • NoSQL Database: HBase
  • Streaming: Spark Streaming
• Unlimited Storage: HDFS, HBase
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
Comprehensive, Compliance-Ready Security
Authentication, Authorization, Audit, and Compliance
• Perimeter: guarding access to the cluster itself. Technical concepts: authentication. Product: Cloudera Manager
• Access: defining what users and applications can do with data. Technical concepts: authorization. Product: Apache Sentry
• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking. Product: Navigator Encrypt & Key Trustee | Partners
• Visibility: reporting on where data came from and how it’s being used. Technical concepts: auditing, lineage. Product: Cloudera Navigator
What is HDFS encryption and why do I care?
Background
• Customers increasingly want to use HDFS to store sensitive data
• Customers are often mandated to protect data at rest
  • PCI
  • HIPAA
  • National security
  • Company confidential
• Encryption of data at rest helps mitigate certain security threats
  • Rogue administrators (insider threat)
  • Compromised accounts (masquerade attacks)
  • Lost/stolen hard drives
Encryption: core concepts and terminology
Encryption and Cryptography
• AES: a common cryptographic algorithm for encrypting data. Larger key sizes are more secure. HDFS Encryption currently supports AES 128-bit (default) and AES 256-bit keys
• RSA: a cryptographic algorithm commonly used to communicate and exchange encryption keys. Common RSA key sizes are 1024, 2048, and 4096 bits
• GPG: an open source implementation of PGP that uses encryption to secure data exchange
Here a key, there a key, everywhere a key key
• Keys: used to encrypt data
• Key encryption keys: used to encrypt other keys
• Public keys: used to encrypt data; they cannot decrypt it
• Private keys: used to decrypt data that was encrypted with the matching public key
HDFS Encryption, Key Trustee, and Navigator Encrypt use different combinations of these keys.
HDFS Encryption Zones and Keys
• Encryption Zone Keys (EZ Keys): HDFS Encryption uses a randomly generated AES key to protect each Encryption Zone
• Data Encryption Keys (DEKs): generated automatically to encrypt each new file created in an Encryption Zone
• Encrypted Data Encryption Keys (EDEKs): DEKs that have been encrypted with an EZ Key. Encryption and decryption of EDEKs happen entirely on the KMS
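The relationship between EZ Keys, DEKs, and EDEKs can be sketched in a few lines of Python. This is only an illustration of the key hierarchy, not how HDFS implements it: the `ToyKMS` class and `keystream_xor` helper are hypothetical names, and because the Python standard library has no AES, a SHA-256 counter keystream stands in for the AES cipher that HDFS actually uses.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES-CTR (NOT secure): XOR data with a SHA-256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical KMS: EZ Keys never leave this object."""

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name: str) -> None:
        # One randomly generated key per encryption zone.
        self._ez_keys[name] = os.urandom(16)

    def generate_edek(self, key_name: str):
        # A fresh DEK per file, returned only in encrypted form (the EDEK).
        dek = os.urandom(16)
        nonce = os.urandom(16)
        return nonce, keystream_xor(self._ez_keys[key_name], nonce, dek)

    def decrypt_edek(self, key_name: str, nonce: bytes, edek: bytes) -> bytes:
        # Unwrapping the EDEK also happens on the KMS side.
        return keystream_xor(self._ez_keys[key_name], nonce, edek)


kms = ToyKMS()
kms.create_key("myKey")
edek_nonce, edek = kms.generate_edek("myKey")

# A client that is allowed to use the key gets the DEK back and
# encrypts the file contents with it.
dek = kms.decrypt_edek("myKey", edek_nonce, edek)
file_nonce = os.urandom(16)
ciphertext = keystream_xor(dek, file_nonce, b"sensitive record")
recovered = keystream_xor(dek, file_nonce, ciphertext)
print(recovered)
```

The point of the layering is that the stored EDEK is useless without the EZ Key, and the EZ Key stays with the key management side rather than with the data.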
Key Trustee and Hardware Security Module Keys
• Key Trustee Client GPG Key: each client of the Key Trustee Server has a unique GPG key pair
• Key Trustee Server GPG Key: the Key Trustee Server also has a unique GPG key pair
• HSM Keys: private keys that never leave the HSM. They serve as the root of trust by encrypting the keys stored in Key Trustee. HSMs are common in financial services and government, and less common in other industries
KMS and Key Trustee: what’s the difference?
Key Management Service (KMS)
• When encrypting any data, it is important to store your encryption keys securely, away from the encrypted data
• KMS is a Key Management Service that HDFS Encryption uses to store and retrieve encryption keys
• KMS is open source and provides a standard interface for pluggable key providers
• The default key provider for KMS is the Java KeyStore
• The Java KeyStore is not recommended for production key management; it is meant for development and testing
Navigator Key Trustee
• Navigator Key Trustee provides secure, centralized, and scalable key storage and administration
• It is not open source; it is licensed with Cloudera Navigator
• It is the recommended option for production deployments
• It provides the hooks to integrate with Hardware Security Modules where physical tamper resistance is required (FIPS 140-2 Level 3)
• It also provides centralized key management for Navigator Encrypt
KMS and Navigator Key Trustee
• KMS acts as a proxy to Navigator Key Trustee. KMS is built for high performance and high transaction volume, and it provides caching
• KMS understands Hadoop authentication and Kerberos
• KMS Access Control Lists (ACLs) limit which Hadoop users and groups have access to keys
• KMS allows blacklisting of users and groups, including superusers such as hdfs
• KMS creates separation of duties to protect against rogue privileged administrators
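A simplified model of how a whitelist and a blacklist could combine for one key operation is sketched below. It is an assumption-laden illustration, not the KMS implementation: the `KeyOpAcl` class and its field names are hypothetical, and a real deployment configures these rules in the KMS ACL file (`kms-acls.xml`) rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class KeyOpAcl:
    """Hypothetical ACL for a single key operation (for example, decrypting EDEKs)."""

    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)
    blocked_users: set = field(default_factory=set)
    blocked_groups: set = field(default_factory=set)

    def permits(self, user: str, groups) -> bool:
        groups = set(groups)
        # The blacklist is checked first, so it overrides any whitelist match.
        if user in self.blocked_users or groups & self.blocked_groups:
            return False
        return user in self.allowed_users or bool(groups & self.allowed_groups)


# Analysts may decrypt; the hdfs superuser is explicitly denied.
decrypt_acl = KeyOpAcl(allowed_groups={"analysts"}, blocked_users={"hdfs"})

print(decrypt_acl.permits("alice", ["analysts"]))
print(decrypt_acl.permits("hdfs", ["supergroup", "analysts"]))
print(decrypt_acl.permits("bob", ["interns"]))
```

Checking the blacklist before the whitelist is what lets a key administrator deny the HDFS superuser even if that account belongs to an otherwise permitted group.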
Key Trustee and KMS
Prerequisites and pre-planning before you encrypt your cluster
Planning for Encryption and Key Management
• It is always best to set up encryption before you load data (when possible)
• Know where the sensitive data will reside
• Know which processes will need to access the encrypted data
• Plan for extra Key Trustee and KMS servers in the cluster
• When encrypting existing data, you will need extra free storage
• Encrypting data takes time; the larger the data set, the longer it will take
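A rough back-of-the-envelope estimate for the last two points might look like the sketch below. It assumes that existing data is encrypted by copying it into a new encryption zone, so the original and the encrypted copy coexist until the original is deleted. The function names, the replication factor of 3, and the 500 MB/s throughput are illustrative assumptions, not figures from this deck.

```python
def raw_storage_needed_tb(logical_tb: float, replication: int = 3) -> float:
    """Extra raw disk needed to hold an encrypted copy of the data set."""
    return logical_tb * replication


def copy_hours(logical_tb: float, throughput_mb_s: float) -> float:
    """Time to copy the data set at a sustained aggregate throughput."""
    megabytes = logical_tb * 1024 * 1024
    return megabytes / throughput_mb_s / 3600


# Example: a 10 TB data set copied at an assumed 500 MB/s.
extra_tb = raw_storage_needed_tb(10)
hours = copy_hours(10, 500)
print(f"extra raw storage: {extra_tb} TB, copy time: {hours:.1f} h")
```

Measure the real copy throughput on your own cluster before relying on an estimate like this.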
Key Trustee system requirements
• Recommended configuration
  • 2 dedicated Key Trustee Servers (or VMs)* per Hadoop cluster
  • 2 dedicated KMS servers (or VMs)* per Hadoop cluster
• Key Trustee
  • Processor: 1 GHz 64-bit quad core
  • Memory: 8 GB of RAM
  • Storage: 20 GB on moderate to high-performance disk drives
  • Operating system: RHEL-compatible x64 6.4, 6.5, 6.6
  • Requires the default umask of 0022
*VMs are not recommended for production; they are OK for dev/test
Integrating with HSMs
• Requires the HSM vendor-supplied JARs/libraries
• Requires KeyHSM, a Key Trustee service that creates the binding between Key Trustee and the HSM
• Supports the major HSMs
  • SafeNet Luna
  • SafeNet DataSecure, KeySecure
  • Thales nCipher
• Allow additional time for the HSM administrator to provision the accounts, software, and network connectivity
Security Reference Architecture
HDFS Encrypt + Navigator Encrypt + Key Trustee
Encryption Options and Where to Use Them
| Data location | Enterprise key management with optional HSM | Protection against server or disk theft | Separation of duties option (protection against IT admin*) | What to deploy (in addition to KTS) |
| --- | --- | --- | --- | --- |
| HDFS | Navigator Key Trustee | Yes | Yes. Access to keys determined by whitelists & blacklists. Hadoop admin can be denied.* | HDFS Encryption (with KMS proxy) |
| Metadata databases on postgres and mysql: Hive Metastore, CM, Navigator, Sqoop2 | Navigator Key Trustee | Yes | Yes. Protection from root/linux admins: Process-Based Access Control Lists (PBACLs) ensure that only the database processes can decrypt data | Navigator Encrypt using the block-level (dmcrypt) option; PBACLs |
| Temp/spill files for CDH components with native spill file encryption: Impala, MR, HBase, Accumulo, Flume | Not needed (temporary keys are maintained in memory only) | Yes | Yes. Protection from root/linux admins: only the component process can access the key | Enable native temp/spill encryption for each of these components (see CDH documentation) |
| Temp/spill files for components that don’t offer their own temp encryption: Spark, Kafka, Sqoop2, HiveServer2 (map-side joins; workaround: turn this off) | Navigator Key Trustee | Yes | | Navigator Encrypt using the block-level (dmcrypt) option. When a full block device is not available for dmcrypt, a loop file can be used instead. |
| Log files | Not needed | Yes | Yes. Sensitive data deleted before being saved to disk | Log Redaction (new feature in CM 5.4) |
New in 5.5: Key Trustee admin role
HDFS Encryption Workflow
HDFS Encryption, Involved Parties
[Diagram: HDFS, KMS, Key Trustee, zHSM, HSM, Client. Labels: optional, key authorization, file authorization.]
HDFS Encryption, Writing a File
[Diagram: Client, HDFS, and KMS (which connects to Key Trustee), with arrows numbered 1 to 8.]
1. create file
2. generate key
3. encrypted key
4. store encrypted key
5. file handle & encrypted key
6. decrypt encrypted key
7. decrypted key
8. encrypt & write data
HDFS Encryption, Reading a File
[Diagram: Client, HDFS, and KMS (which connects to Key Trustee), with arrows numbered 1 to 5.]
1. open file (passed read permission check)
2. file handle & encrypted key
3. decrypt encrypted key
4. decrypted key
5. read & decrypt data
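The write and read sequences can be traced with a small simulation. This is a sketch under stated assumptions, not Hadoop code: `ToyKMS`, `ToyHDFS`, `write_file`, and `read_file` are hypothetical names, a repeating-key XOR stands in for AES, and the assignment of each numbered step to a pair of parties is one reading of the diagrams (client to HDFS for file operations, HDFS to KMS for key generation, client to KMS for key decryption).

```python
import os
from itertools import cycle


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder for AES (NOT secure): XOR with a repeating key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


class ToyKMS:
    """Holds the zone key and performs key authorization."""

    def __init__(self, authorized_users):
        self._ez_key = os.urandom(16)
        self._authorized = set(authorized_users)

    def generate_encrypted_key(self) -> bytes:
        # Write steps 2-3: create a new key, return only its encrypted form.
        return toy_cipher(self._ez_key, os.urandom(16))

    def decrypt_encrypted_key(self, user: str, edek: bytes) -> bytes:
        # Write steps 6-7, read steps 3-4.
        if user not in self._authorized:
            raise PermissionError(f"key authorization failed for {user}")
        return toy_cipher(self._ez_key, edek)


class ToyHDFS:
    """Stores ciphertext plus encrypted keys and performs file authorization."""

    def __init__(self, kms, superusers=("hdfs",)):
        self._kms = kms
        self._superusers = set(superusers)
        self.files = {}

    def create(self, user: str, path: str) -> bytes:
        edek = self._kms.generate_encrypted_key()          # write steps 2-3
        self.files[path] = {"owner": user, "edek": edek, "data": b""}  # step 4
        return edek                                         # write step 5

    def open(self, user: str, path: str):
        entry = self.files[path]
        if user != entry["owner"] and user not in self._superusers:
            raise PermissionError(f"file authorization failed for {user}")
        return entry["edek"], entry["data"]                 # read steps 1-2


def write_file(hdfs, kms, user, path, plaintext):
    edek = hdfs.create(user, path)                          # write steps 1, 5
    dek = kms.decrypt_encrypted_key(user, edek)             # write steps 6-7
    hdfs.files[path]["data"] = toy_cipher(dek, plaintext)   # write step 8


def read_file(hdfs, kms, user, path):
    edek, ciphertext = hdfs.open(user, path)                # read steps 1-2
    dek = kms.decrypt_encrypted_key(user, edek)             # read steps 3-4
    return toy_cipher(dek, ciphertext)                      # read step 5


kms = ToyKMS(authorized_users={"alice"})
hdfs = ToyHDFS(kms)
write_file(hdfs, kms, "alice", "/zone/report.csv", b"card=4111")
print(read_file(hdfs, kms, "alice", "/zone/report.csv"))

try:
    # The superuser passes file authorization but fails key authorization.
    read_file(hdfs, kms, "hdfs", "/zone/report.csv")
except PermissionError as err:
    print("denied:", err)
```

In this model HDFS only ever holds ciphertext and encrypted keys, which is why two independent checks (file authorization and key authorization) are needed before data can be read.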
HDFS Encryption Implementation and Usage
Enabling HDFS Encryption on a Cluster
• HDFS and MapReduce client hosts need a recent version of libcrypto.so
• To check, run the following command:
$ hadoop checknative
Output:
openssl: true /usr/lib64/libcrypto.so
• If the library is missing, install it:
$ yum install openssl openssl-devel
• The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create this manually)
• OpenSSL provides AES-NI integration for Intel hardware
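When checking many hosts, the output of `hadoop checknative` can be parsed with a short script. The sketch below assumes each library is reported on a line of the form `name: true|false [path]`, as in the sample output above. The exact layout may differ between Hadoop versions, and `parse_checknative` is a hypothetical helper name.

```python
def parse_checknative(output: str) -> dict:
    """Map each reported library name to (loaded, path or None)."""
    libs = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip headers and any line that does not report true/false.
        if sep and fields and fields[0] in ("true", "false"):
            path = fields[1] if len(fields) > 1 else None
            libs[name.strip()] = (fields[0] == "true", path)
    return libs


sample = "openssl: true /usr/lib64/libcrypto.so"
loaded, path = parse_checknative(sample)["openssl"]
print(loaded, path)
```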
Enabling HDFS Encryption on a Cluster
Using Cloudera Manager
1) Add the KMS service: add the Java KeyStore KMS service on a host
2) Enable Java KeyStore KMS for the HDFS service
• HDFS service > Configuration tab
• Scope > HDFS (Service-Wide)
• Category > All
• KMS Service property: select the radio button
3) Save changes
4) Restart the cluster
5) Deploy client configuration
Creating Encryption Zones
• Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
# Create an encryption key for your zone as the application user that will be using the key
$ hadoop key create myKey
# Create a new empty directory and make it an encryption zone
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName myKey -path /zone
# List the encryption zones
$ hdfs crypto -listZones
Adding Files to an Encryption Zone
Remember that zones start empty! You cannot create a zone in a directory that already contains data, so copy existing data in:
$ hadoop distcp /user/dir /user/enczone
• By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination.
• When copying between an unencrypted and an encrypted location, the filesystem checksums will not match because the underlying block data is different.
• Use the -skipcrccheck and -update flags to avoid verifying checksums.
• Also use the distcp flags to preserve all attributes (-prbugpcaxt)
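Putting the flags from the bullets above together, a copy into an encryption zone could be assembled as follows. This sketch only builds and prints the command; `distcp_into_zone_cmd` is a hypothetical helper, and the paths are the example paths from this slide.

```python
import shlex


def distcp_into_zone_cmd(src: str, dst: str) -> list:
    """Build a distcp command that skips CRC checks and preserves attributes."""
    return [
        "hadoop", "distcp",
        "-skipcrccheck", "-update",  # checksums differ across the zone boundary
        "-prbugpcaxt",               # preserve all attributes
        src, dst,
    ]


cmd = distcp_into_zone_cmd("/user/dir", "/user/enczone")
print(shlex.join(cmd))

# To run it on a host with the Hadoop client installed:
#   import subprocess; subprocess.run(cmd, check=True)
```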
HDFS Encryption Zones for CDH
• Our docs have great information!
• Hive - /user/hive
• HBase - /hbase
• Solr - /solr
• Hue - /user/hue
• Spark - /user/spark/applicationHistory
• YARN - /user/history
• http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_component_kms.html
Thank you

More Related Content

What's hot

Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
Cloudera, Inc.
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft Azure
Cloudera, Inc.
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
Cloudera, Inc.
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
Cloudera, Inc.
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
Cloudera, Inc.
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Cloudera, Inc.
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
Cloudera, Inc.
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
Jianwei Li
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_Webinar
Sean Spediacci
 
Secure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game ChangersSecure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game Changers
Cloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Cloudera, Inc.
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Cloudera, Inc.
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
Cloudera, Inc.
 
快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu
Jianwei Li
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
Wei-Chiu Chuang
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Cloudera, Inc.
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera, Inc.
 

What's hot (20)

Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft Azure
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_Webinar
 
Secure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game ChangersSecure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game Changers
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
 
快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 

Viewers also liked

KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
JISC KeepIt project
 
Risk Management in Data Analysis
Risk Management in Data AnalysisRisk Management in Data Analysis
Risk Management in Data Analysis
David Lee
 
Diaku Axon for BCBS239 compliance
Diaku Axon for BCBS239 complianceDiaku Axon for BCBS239 compliance
Diaku Axon for BCBS239 compliance
DariusDiaku
 
Hadoop security landscape
Hadoop security landscapeHadoop security landscape
Hadoop security landscape
Sujee Maniyam
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
Niranjan Pandey
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
tcloudcomputing-tw
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
Cloudera, Inc.
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Shivaji Dutta
 
Success by integrating risk management in data governance
Success by integrating risk management in data governanceSuccess by integrating risk management in data governance
Success by integrating risk management in data governance
Tejasvi Addagada, CBAP
 

Viewers also liked (10)

KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management, by Martin Donn...
 
Risk Management in Data Analysis
Risk Management in Data AnalysisRisk Management in Data Analysis
Risk Management in Data Analysis
 
Diaku Axon for BCBS239 compliance
Diaku Axon for BCBS239 complianceDiaku Axon for BCBS239 compliance
Diaku Axon for BCBS239 compliance
 
Hadoop security landscape
Hadoop security landscapeHadoop security landscape
Hadoop security landscape
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Success by integrating risk management in data governance
Success by integrating risk management in data governanceSuccess by integrating risk management in data governance
Success by integrating risk management in data governance
 

Similar to Risk Management for Data: Secured and Governed

Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
Cloudera, Inc.
 
Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption
Cloudera, Inc.
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
DataWorks Summit
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
Cloudera, Inc.
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
Shravan (Sean) Pabba
 
Managing your secrets in a cloud environment
Managing your secrets in a cloud environmentManaging your secrets in a cloud environment
Managing your secrets in a cloud environment
Taswar Bhatti
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
Wei-Chiu Chuang
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
DataWorks Summit/Hadoop Summit
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
lee tracie
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Big Data Spain
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubDataWorks Summit
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
Cloudera, Inc.
 
Deep Dive: AWS CloudHSM (Classic)
Deep Dive: AWS CloudHSM (Classic)Deep Dive: AWS CloudHSM (Classic)
Deep Dive: AWS CloudHSM (Classic)
Amazon Web Services
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Yahoo!デベロッパーネットワーク
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
Alex Moundalexis
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
Kathleen Ting
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
GoDataDriven
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Stefan Lipp
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
Cloudera, Inc.
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 

Similar to Risk Management for Data: Secured and Governed (20)

Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Managing your secrets in a cloud environment
Managing your secrets in a cloud environmentManaging your secrets in a cloud environment
Managing your secrets in a cloud environment
 
Risk Management for Data: Secured and Governed

  • 1. © Cloudera, Inc. All rights reserved. Risk Management for Data. Eddie Garcia, Chief Security Architect
  • 2. One Platform, Many Workloads
      Batch, interactive, and real-time. Leading performance and usability in one platform.
      • End-to-end analytic workflows
      • Access more data
      • Work with data in new ways
      • Enable new users
      Platform components:
      • Process: Ingest (Sqoop, Flume, Kafka); Transform (MapReduce, Hive, Pig, Spark)
      • Discover: Analytic Database (Impala); Search (Solr)
      • Model: Machine Learning (SAS, R, Spark, Mahout)
      • Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
      • Unlimited Storage: HDFS, HBase
      • Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
  • 3. Comprehensive, Compliance-Ready Security
      Authentication, Authorization, Audit, and Compliance
      • Perimeter: guarding access to the cluster itself. Technical concepts: Authentication
      • Access: defining what users and applications can do with data. Technical concepts: Authorization
      • Data: protecting data in the cluster from unauthorized visibility. Technical concepts: Encryption, Tokenization, Data masking
      • Visibility: reporting on where data came from and how it’s being used. Technical concepts: Auditing, Lineage
      Products: Cloudera Manager, Apache Sentry, Cloudera Navigator, Navigator Encrypt & Key Trustee | Partners
  • 4. What is HDFS encryption and why do I care?
  • 5. Background
      • Customers increasingly want to use HDFS to store sensitive data
      • Customers are often mandated to protect data at rest
          • PCI
          • HIPAA
          • National Security
          • Company confidential
      • Encryption of data at rest helps mitigate certain security threats
          • Rogue administrators (insider threat)
          • Compromised accounts (masquerade attacks)
          • Lost/stolen hard drives
  • 6. Encryption: core concepts and terminology
  • 7. Encryption and Cryptography
      • AES: a common cryptographic algorithm for encrypting data. Larger key sizes are more secure. HDFS Encryption currently supports AES-128 (default) and AES-256 keys
      • RSA: a cryptographic algorithm commonly used to communicate and exchange encryption keys. Common RSA key sizes are 1024, 2048, and 4096 bits
      • GPG: an open source implementation of PGP that uses encryption to secure data exchange
  • 8. Here a key, there a key, everywhere a key key
      • Keys: used to encrypt data
      • Key encryption keys: used to encrypt other keys
      • Public keys: used to encrypt data but cannot decrypt it
      • Private keys: decrypt data that was encrypted with a public key
      HDFS Encryption, Key Trustee, and Navigator Encrypt use different combinations of these keys.
  • 9. HDFS Encryption Zones and Keys
      • Encryption Zone Keys (EZ Keys): HDFS Encryption uses a randomly generated AES key to protect each Encryption Zone
      • Data Encryption Keys (DEKs): generated automatically to encrypt any new file created in the Encryption Zone
      • Encrypted Data Encryption Keys (EDEKs): DEKs that have been encrypted with an EZ Key. Encryption and decryption of EDEKs happens entirely on the KMS
  • 10. Key Trustee and Hardware Security Module Keys
      • Key Trustee Client GPG Key: each client of the Key Trustee server has a unique GPG keypair
      • Key Trustee Server GPG Key: the Key Trustee server also has a unique GPG keypair
      • HSM Keys: private keys that never leave the HSM. They serve as the root of trust by encrypting the keys stored in Key Trustee. HSMs are common in financial services and government, and less common in other industries
  • 11. KMS and Key Trustee: what’s the difference?
  • 12. Key Management Service (KMS)
      • When encrypting any data, it is important to store your encryption keys securely, away from the encrypted data
      • KMS is a Key Management Service that HDFS Encryption uses to store and retrieve encryption keys
      • KMS is open source and provides a standard interface for pluggable key providers
      • The default key provider for KMS is the Java Key Store
      • The Java Key Store is not recommended for production key management; it is meant for development and testing
  • 13. Navigator Key Trustee
      • Provides secure, centralized, and scalable key storage and administration
      • Is not open source; it is licensed with Cloudera Navigator
      • Is the recommended option for production deployments
      • Provides the hooks to integrate with Hardware Security Modules for physically tamper-proof requirements (FIPS 140-2 Level 3)
      • Also provides centralized key management for Navigator Encrypt
  • 14. KMS and Navigator Key Trustee
      • KMS acts as a proxy to Navigator Key Trustee; it is built for high performance and high transaction volumes, and provides caching
      • KMS understands Hadoop authentication and Kerberos
      • KMS Access Control Lists provide mechanisms to limit which Hadoop users and groups have access to keys
      • KMS allows blacklisting of users and groups, including superusers such as hdfs
      • KMS creates separation of duties to protect against rogue privileged administrators
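The whitelist and blacklist behavior described on slide 14 can be illustrated with a small model. This is a minimal sketch in Python, not the Hadoop KMS implementation: the `KeyAcl` class, its field names, and the user and group names are hypothetical, and it assumes the simplified rule that a blacklist match always overrides a whitelist match.

```python
from dataclasses import dataclass, field


@dataclass
class KeyAcl:
    """Hypothetical allow/deny lists for one key operation (e.g. decrypting EDEKs)."""
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)
    blocked_users: set = field(default_factory=set)
    blocked_groups: set = field(default_factory=set)

    def permits(self, user, groups):
        groups = set(groups)
        # Assumption: a blacklist match always wins, even for superusers.
        if user in self.blocked_users or groups & self.blocked_groups:
            return False
        return user in self.allowed_users or bool(groups & self.allowed_groups)


# Illustrative policy: the analytics group may decrypt, the hdfs superuser may not.
decrypt_acl = KeyAcl(allowed_groups={"analytics"}, blocked_users={"hdfs"})

print(decrypt_acl.permits("alice", ["analytics"]))               # True
print(decrypt_acl.permits("hdfs", ["supergroup", "analytics"]))  # False
print(decrypt_acl.permits("bob", ["marketing"]))                 # False
```

Blocking the hdfs superuser at the key level is what gives the separation of duties: an HDFS administrator can still manage files, but cannot obtain the keys needed to read their contents.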
  • 15. Key Trustee and KMS
  • 18. Prerequisites and pre-planning before you encrypt your cluster
  • 19. Planning for Encryption and Key Management
      • It is always best to set up encryption before you load data (when possible)
      • Know where the sensitive data will reside
      • Know which processes will need to access the encrypted data
      • Plan for extra Key Trustee and KMS servers in the cluster
      • When encrypting existing data, you will need extra free storage
      • Encrypting data can take time; the larger the data set, the longer it will take
  • 20. Key Trustee system requirements
      • Recommended configuration
          • 2 dedicated Key Trustee Servers (or VMs)* per Hadoop cluster
          • 2 dedicated KMS servers (or VMs)* per Hadoop cluster
      • Key Trustee
          • Processor: 1 GHz 64-bit quad core
          • Memory: 8 GB of RAM
          • Storage: 20 GB on moderate to high-performance disk drives
          • RHEL-compatible x64 6.4, 6.5, 6.6
          • Requires the default umask of 0022
      *VMs are not recommended for production; they are fine for dev/test
  • 21. Integrating with HSMs
      • Requires the HSM vendor-supplied jars and libraries
      • Requires KeyHSM, a Key Trustee service that creates the binding between Key Trustee and the HSM
      • Supports the major HSMs
          • SafeNet Luna
          • SafeNet DataSecure, KeySecure
          • Thales nCipher
      • Allow additional time for the HSM administrator to provision the accounts, software, and network connectivity
  • 22. Security Reference Architecture
  • 23. HDFS Encrypt + Navigator Encrypt + Key Trustee
  • 24. Encryption Options and Where to Use Them
      • Data location: HDFS
          Enterprise key management with optional HSM: Navigator Key Trustee
          Protection against server or disk theft: Yes
          Separation of duties option (protection against IT admin*): Yes. Access to keys is determined by whitelists and blacklists. Hadoop admin can be denied.*
          What to deploy (in addition to KTS): HDFS Encryption (with KMS proxy)
      • Data location: Metadata databases on PostgreSQL and MySQL (Hive Metastore, CM, Navigator, Sqoop2)
          Enterprise key management with optional HSM: Navigator Key Trustee
          Protection against server or disk theft: Yes
          Separation of duties option (protection against IT admin*): Yes. Protection from root/Linux admins: Process-Based Access Control Lists (PBACLs) ensure that only the database processes can decrypt data
          What to deploy (in addition to KTS): Navigator Encrypt using the block-level (dmcrypt) option; PBACLs
      • Data location: Temp/spill files for CDH components with native spill file encryption (Impala, MR, HBase, Accumulo, Flume)
          Enterprise key management with optional HSM: Not needed (temporary keys are maintained in memory only)
          Protection against server or disk theft: Yes
          Separation of duties option (protection against IT admin*): Yes. Protection from root/Linux admins: only the component process can access the key
          What to deploy (in addition to KTS): Enable native temp/spill encryption for each of these components (see CDH documentation)
      • Data location: Temp/spill files for components that don’t offer their own temp encryption (Spark, Kafka, Sqoop2, Hive Server2 map-side joins; workaround: turn this off)
          Enterprise key management with optional HSM: Navigator Key Trustee
          Protection against server or disk theft: Yes
          What to deploy (in addition to KTS): Navigator Encrypt using the block-level (dmcrypt) option. When a full block device is not available for dmcrypt, a loop file can be used instead
      • Data location: Log files
          Enterprise key management with optional HSM: Not needed
          Protection against server or disk theft: Yes
          Separation of duties option (protection against IT admin*): Yes. Sensitive data is deleted before being saved to disk
          What to deploy (in addition to KTS): Log Redaction (new feature in CM 5.4)
  • 25. New in 5.5: Key Trustee admin role
  • 26. HDFS Encryption Workflow
  • 27. HDFS Encryption, Involved Parties
      Diagram labels: HDFS, KMS, Key Trustee, zHSM, HSM, Client; optional; key authorization; file authorization
  • 28. HDFS Encryption, Writing a File
      Parties: Client, HDFS, KMS (to Key Trustee)
      1. create file
      2. generate key
      3. encrypted key
      4. store encrypted key
      5. file handle & encrypted key
      6. decrypt encrypted key
      7. decrypted key
      8. encrypt & write data
  • 29. HDFS Encryption, Reading a File
      Parties: Client, HDFS, KMS (to Key Trustee)
      1. open file (passed read permission check)
      2. file handle & encrypted key
      3. decrypt encrypted key
      4. decrypted key
      5. read & decrypt data
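The write and read flows on slides 28 and 29 can be traced with a toy simulation. This is a sketch under stated assumptions, not HDFS code: the `Kms` and `Hdfs` classes and the `client_write` and `client_read` helpers are hypothetical, Key Trustee is left out, and `toy_cipher` is an insecure SHA-256 keystream XOR that stands in for AES so the example needs only the standard library.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Stand-in for AES; NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class Kms:
    """Holds encryption zone keys; they never leave this object."""
    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name):
        self._zone_keys[name] = os.urandom(16)

    def generate_edek(self, key_name):
        # Write steps 2-3: make a fresh DEK, return only its encrypted form.
        dek, nonce = os.urandom(16), os.urandom(16)
        return nonce, toy_cipher(self._zone_keys[key_name], nonce, dek)

    def decrypt_edek(self, key_name, edek):
        # Write steps 6-7 / read steps 3-4: unwrap the EDEK for the client.
        nonce, wrapped = edek
        return toy_cipher(self._zone_keys[key_name], nonce, wrapped)


class Hdfs:
    """Stores EDEKs and ciphertext; never sees a DEK or plaintext."""
    def __init__(self, kms, zone_key):
        self.kms, self.zone_key = kms, zone_key
        self.edeks, self.blocks = {}, {}

    def create(self, path):
        # Write steps 1-5: obtain an EDEK, store it with the file, return it.
        self.edeks[path] = self.kms.generate_edek(self.zone_key)
        return self.edeks[path]

    def open(self, path):
        # Read steps 1-2: return the stored EDEK.
        return self.edeks[path]


def client_write(hdfs, kms, path, plaintext):
    edek = hdfs.create(path)
    dek = kms.decrypt_edek(hdfs.zone_key, edek)
    hdfs.blocks[path] = toy_cipher(dek, b"file", plaintext)  # write step 8


def client_read(hdfs, kms, path):
    dek = kms.decrypt_edek(hdfs.zone_key, hdfs.open(path))
    return toy_cipher(dek, b"file", hdfs.blocks[path])       # read step 5


kms = Kms()
kms.create_zone_key("myKey")
hdfs = Hdfs(kms, "myKey")
client_write(hdfs, kms, "/zone/report.txt", b"sensitive record")
print(client_read(hdfs, kms, "/zone/report.txt"))  # b'sensitive record'
```

The point of the split is visible in the classes: `Hdfs` only ever holds EDEKs and ciphertext, while `Kms` only ever holds zone keys, so compromising either one alone does not expose file contents.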
  • 30. HDFS Encryption Implementation and Usage
  • 31. Enabling HDFS Encryption on a Cluster
      • A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts
      • To check, use the following command:
          hadoop checknative
        Output:
          openssl: true /usr/lib64/libcrypto.so
      • To install:
          yum install openssl openssl-devel
      • The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create this manually)
      • OpenSSL provides AES-NI integration for Intel hardware
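The `hadoop checknative` check on slide 31 could be scripted across hosts. The following Python sketch only parses text in the shape shown on the slide; the `openssl_available` function is hypothetical, and the `openssl: false` sample line is an assumed illustration of a failing host rather than captured output.

```python
def openssl_available(checknative_output: str) -> bool:
    """Return True if the 'openssl:' line of the output reports 'true'."""
    for line in checknative_output.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == "openssl":
            # The first token after the colon is the true/false flag.
            return rest.split()[:1] == ["true"]
    return False  # no openssl line found


print(openssl_available("openssl: true /usr/lib64/libcrypto.so"))  # True
print(openssl_available("openssl: false"))                         # False
```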
  • 32. Enabling HDFS Encryption on a Cluster Using Cloudera Manager
      1) Add the KMS service: add the Java KeyStore KMS service on a host
      2) Enable Java KeyStore KMS for the HDFS service
          • HDFS service, Configuration tab
          • Scope > HDFS (Service-Wide)
          • Category > All
          • KMS Service property: select the radio button
      Save changes, restart the cluster, and deploy the client configuration.
  • 33. Creating Encryption Zones
      • Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
          # Create an encryption key for your zone as the application user that will be using the key
          $ hadoop key create myKey
          # Create a new empty directory and make it an encryption zone
          $ hadoop fs -mkdir /zone
          $ hdfs crypto -createZone -keyName myKey -path /zone
          # To see the encryption zones
          $ hdfs crypto -listZones
  • 34. Adding Files to an Encryption Zone
      Remember, zones start empty! You cannot create a zone in a directory that already contains data.
          hadoop distcp /user/dir /user/enczone
      • By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination
      • When copying between an unencrypted and an encrypted location, the filesystem checksums will not match, since the underlying block data is different
      • Use the -skipcrccheck and -update flags to avoid verifying checksums
      • Also use the distcp flags to preserve all attributes (-prbugpcaxt)
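Putting the flags from slide 34 together, the full copy command might be assembled as below. This is a sketch, assuming the flags are combined in this order; the `build_distcp_command` helper is hypothetical, and the paths are the ones from the slide.

```python
import shlex


def build_distcp_command(source, target, preserve="rbugpcaxt"):
    """Return the argv list for copying plain data into an encryption zone."""
    return [
        "hadoop", "distcp",
        "-update",          # the slide pairs this with -skipcrccheck
        "-skipcrccheck",    # ciphertext checksums will not match the source
        f"-p{preserve}",    # preserve file attributes
        source, target,
    ]


argv = build_distcp_command("/user/dir", "/user/enczone")
print(shlex.join(argv))
# hadoop distcp -update -skipcrccheck -prbugpcaxt /user/dir /user/enczone
# To actually run it on a cluster host: subprocess.run(argv, check=True)
```

Because the checksum comparison is skipped, it may be worth verifying the copy another way, for example by comparing file counts and sizes between the source and the zone.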
  • 35. HDFS Encryption Zones for CDH
      • Our docs have great information!
      • Hive: /user/hive
      • HBase: /hbase
      • Solr: /solr
      • Hue: /user/hue
      • Spark: /user/spark/applicationHistory
      • YARN: /user/history
      • http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_component_kms.html
  • 36. Thank you