SlideShare a Scribd company logo
July 27th, 2016
Hadoop Security and Compliance
Derek Sun
Big Data Architect
StampedeCon 2016
©2016 MasterCard. Proprietary and Confidential2
• Securing the infrastructure
• Authentication
• Authorization
• Data Security
• Auditing
• Governance & Metadata
management
• Masking & Redaction
• Compliant procedures
Today’s Discussion
Hadoop Security Governance & Compliance
©2016 MasterCard. Proprietary and Confidential3
Hadoop Security – Securing the infrastructure
• On-premise deployment & network segmentation
• Firewalls, e.g., corporate & host firewalls
• Intrusion detection & prevention, such as Open Network Insights (ONI)
• Up-to-date OS security patches, e.g., twice a year
• Hadoop services separation
1. Cores on master nodes, e.g., Yarn, ZK, NameNode …
2. Worker services on data nodes, e.g., NodeManager, DataNode, Hbase RegionServer …
3. Edge Nodes for authorized regular users, e.g., SSH, NFS gateway, Sqoop, HUE,
Oozie, Thrift/REST servers, users apps …
©2016 MasterCard. Proprietary and Confidential4
Hadoop Security – Authentication
• KERBEROS
1. What is Kerberos?
2. User & service principals
• Every user and service that participates in the Kerberos authentication protocol requires a principal to uniquely identify itself
3. Realm, across Realm trust, e.g., one-way trust from MIT realm to Windows AD domain
• A Kerberos realm is an authentication administrative domain. All principals are assigned to a specific Kerberos realm
4. MIT KDC vs AD KDC
I. Locally vs centrally managed principals
II. Flat vs more structural, e.g., use sub OU for each cluster
III. Maintained by Hadoop admin vs Corporate IT
IV. Potential single point of failure vs a set of AD domain controllers
©2016 MasterCard. Proprietary and Confidential5
Hadoop Security – Authorization
• File systems based - Local & HDFS, e.g., putting administrators in
dfs.permissions.superusergroup in Hadoop-policy.xml or UIs – CM/Ambari
• Service based
1. Set up ACLs in hadoop-policy.xml for HDFS, MR1, Yarn, ZK, Oozie …
2. Setup permissions at different levels for HBase, e.g., table, column or cell level …
• Role based, Apache Sentry, Apache Ranger, Apache KNOX
 Make sure to bypass some services accounts, such as in Hive, you should bypass
Hive, Impala, HDFS, and HUE users
©2016 MasterCard. Proprietary and Confidential6
Hadoop Security – Data Security
• Encryption At-Rest
• Native HDFS encryption, Encryption Zone, EZK, DEK & EDEK, Key Management Server (KMS)
& Key Storage Server
©2016 MasterCard. Proprietary and Confidential7
Hadoop Security – Data Security
• Encryption At-Rest Continue
• MapReduce intermediate data
• Services, e.g., Hive(user/hive), SOLR(/user/solr), HUE(user/hue), Yarn History(user/history),
Spark/Impala shuffle & disk spill data, Kafka log/data folders & etc.
• File system & full disk, e.g., audit log folders
• Use a Hardware Security Module (HSM) as a more secured key storage
©2016 MasterCard. Proprietary and Confidential8
Hadoop Security – Data Security
• Encryption Over-The-Wire (In-Motion)
1. Use TLS/SSL encrypt the communication channel for different protocols, e.g., TCP/IP, RPC, Http
2. TLS/SSL certificate can be used for a specify identity, e.g., per host, service, port & etc.
3. Keep scalability and maintenance effort in mind when apply certificate strategy,
e.g., based on type of applications - JKS for Java apps, and PEM for others
4. Use automatic deployment tools to deploy SSL certificates, e.g., Chef, StackIQ & etc.
5. Set up company own Certificate Authority (CA)
6. Use CA signed certificates for all non-dev environments, self-signed certificates
can be used in the dev environment
7. Disable clear text services once TLS/SSL is enabled, e.g., disable HTTP service on 11000
once Oozie HTTPS is enabled on port 11443
©2016 MasterCard. Proprietary and Confidential9
Hadoop Security – Auditing
• Passive auditing, doesn’t generate alerts, main purpose is to audit certain events to meet
business requirements, e.g., setOwner, setPermission on HDFS folders
• Active auditing, more aggressive, normally generates alerts, e.g., automatically generates email
alerts to InfoSec upon access denial events and configuration changes
• Enable and configure auditing logs for each service in log4j.properties, e.g., HDFS, YARN, HIVE,
Impala & etc.
• Use vendor supported products, such as Cloudera Navigator, to centrally configure, collect,
monitor audit policies/events and aggregate logs
• Segregation of duties set up ACLs properly, enable AES256 strong encryption on
the audit logs, and only auditors and their log collecting process such, as Splunk,
can access the audit log folders
©2016 MasterCard. Proprietary and Confidential10
Governance & Compliance – Governance & Metadata
Management
• Metadata extraction and management, e.g., HDFS folders, files & permissions, Yarn job
metadata, Oozie workflows, Hive queries & etc.
• Searchable with Lineage information attached
• Data Classification based on business perspectives
• Centralized Auditing
• Use tools to satisfy majority of the requirements, e.g., Cloudera Navigator & Apache Atlas
©2016 MasterCard. Proprietary and Confidential11
Governance & Compliance – Masking & Redaction
• HDFS log and query redaction, e.g.,
Hostname: b(([A-Za-z]|[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]).)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-
z0-9-]*[A-Za-z0-9])b
Replace: HOSTNAME.REDACTED
• Audit Server Log masking & redaction for all supported Hadoop services, e.g.,
(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-
9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})|((?:2131|1800|35d{3})d{11})
©2016 MasterCard. Proprietary and Confidential12
Governance & Compliance – Compliant Procedures
• Penetration test & application scan
 All medium & high security vulnerability findings have to be remediated before the
certification deadline
• User roles and groups validation, e.g., audit policies on data access based on groups & roles
• Application log, audit log & change reports, e.g., provide role-based authorization audit
logs once a week to an internal auditor
• Patch management
• Data retention, encryption and key rotation policies
• Other business requirements
©2016 MasterCard. Proprietary and Confidential13
Takeaways
• Securing your Hadoop environments can be lengthy and evolving
• Homegrown processes are needed to satisfy business requirements
• Security is applied end to end in the process
• Big Data is still maturing
• Don’t confuse compliance with security
©2016 MasterCard. Proprietary and Confidential
Contact Us
14
Craig Hibbeler
+1 (636) 439 8186
Craig.Hibbeler@mastercard.com
Derek Sun
+1 (636) 722 5512
Derek.Sun@mastercard.com

More Related Content

What's hot

Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
DataWorks Summit/Hadoop Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
DataWorks Summit/Hadoop Summit
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Cloudera, Inc.
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
DataWorks Summit
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
nvvrajesh
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
Cloudera, Inc.
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right Way
DataWorks Summit
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Spark Summit
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
Shravan (Sean) Pabba
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
Cloudera, Inc.
 
Apache hive essentials
Apache hive essentialsApache hive essentials
Apache hive essentials
Steve Tran
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseCloudera, Inc.
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
DataWorks Summit/Hadoop Summit
 
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
markgrover
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
 

What's hot (20)

Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right Way
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Apache hive essentials
Apache hive essentialsApache hive essentials
Apache hive essentials
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the Enterprise
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 

Viewers also liked

Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
Batch and Real-time EHR updates into Hadoop - StampedeCon 2015Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
StampedeCon
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
StampedeCon
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
StampedeCon
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016
StampedeCon
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
Alex Zeltov
 
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
StampedeCon
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
StampedeCon
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Artem Ervits
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
Cloudera, Inc.
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
DataWorks Summit/Hadoop Summit
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
Hortonworks
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
Madhan Neethiraj
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 

Viewers also liked (18)

Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
Batch and Real-time EHR updates into Hadoop - StampedeCon 2015Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
Batch and Real-time EHR updates into Hadoop - StampedeCon 2015
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 

Similar to Hadoop Security and Compliance - StampedeCon 2016

大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
Jianwei Li
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Yahoo!デベロッパーネットワーク
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Big-Data-as-a-Service (BDaaS) Meetup
 
Pa cloudera manager-api's_extensibility_v2
Pa   cloudera manager-api's_extensibility_v2Pa   cloudera manager-api's_extensibility_v2
Pa cloudera manager-api's_extensibility_v2ClouderaUserGroups
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
Great Wide Open
 
Brief theoretical overview on AWS Components
Brief theoretical overview on AWS ComponentsBrief theoretical overview on AWS Components
Brief theoretical overview on AWS Components
Tech Tutorials
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
Cloudera, Inc.
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
Karan Alang
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
Cloudera, Inc.
 
RightScale Webinar: Key Considerations For Cloud Migration and Portability
RightScale Webinar:  Key Considerations For Cloud Migration and PortabilityRightScale Webinar:  Key Considerations For Cloud Migration and Portability
RightScale Webinar: Key Considerations For Cloud Migration and Portability
RightScale
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
K data
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
markgrover
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
DataWorks Summit
 
Softlayer 07.nov.2014 en
Softlayer 07.nov.2014 enSoftlayer 07.nov.2014 en
Softlayer 07.nov.2014 en
Sergio Alvarez Alonso
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
GoDataDriven
 

Similar to Hadoop Security and Compliance - StampedeCon 2016 (20)

大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
Pa cloudera manager-api's_extensibility_v2
Pa   cloudera manager-api's_extensibility_v2Pa   cloudera manager-api's_extensibility_v2
Pa cloudera manager-api's_extensibility_v2
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Brief theoretical overview on AWS Components
Brief theoretical overview on AWS ComponentsBrief theoretical overview on AWS Components
Brief theoretical overview on AWS Components
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
RightScale Webinar: Key Considerations For Cloud Migration and Portability
RightScale Webinar:  Key Considerations For Cloud Migration and PortabilityRightScale Webinar:  Key Considerations For Cloud Migration and Portability
RightScale Webinar: Key Considerations For Cloud Migration and Portability
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
[2016 데이터 그랜드 컨퍼런스] 5 1(보안,품질). 웨어밸리 data security challenges and its solutio...
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Softlayer_07.Nov.2014_EN
Softlayer_07.Nov.2014_ENSoftlayer_07.Nov.2014_EN
Softlayer_07.Nov.2014_EN
 
Softlayer 07.nov.2014 en
Softlayer 07.nov.2014 enSoftlayer 07.nov.2014 en
Softlayer 07.nov.2014 en
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
StampedeCon
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
StampedeCon
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 

Recently uploaded

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Hadoop Security and Compliance - StampedeCon 2016

  • 1. July 27th, 2016 Hadoop Security and Compliance Derek Sun Big Data Architect StampedeCon 2016
  • 2. ©2016 MasterCard. Proprietary and Confidential2 • Securing the infrastructure • Authentication • Authorization • Data Security • Auditing • Governance & Metadata management • Masking & Redaction • Compliant procedures Today’s Discussion Hadoop Security Governance & Compliance
  • 3. ©2016 MasterCard. Proprietary and Confidential3 Hadoop Security – Securing the infrastructure • On-premise deployment & network segmentation • Firewalls, e.g., corporate & host firewalls • Intrusion detection & prevention, such as Open Network Insights (ONI) • Up-to-date OS security patches, e.g., twice a year • Hadoop services separation 1. Cores on master nodes, e.g., Yarn, ZK, NameNode … 2. Worker services on data nodes, e.g., NodeManager, DataNode, Hbase RegionServer … 3. Edge Nodes for authorized regular users, e.g., SSH, NFS gateway, Sqoop, HUE, Oozie, Thrift/REST servers, users apps …
  • 4. ©2016 MasterCard. Proprietary and Confidential4 Hadoop Security – Authentication • KERBEROS 1. What is Kerberos? 2. User & service principals • Every user and service that participates in the Kerberos authentication protocol requires a principal to uniquely identify itself 3. Realm, across Realm trust, e.g., one-way trust from MIT realm to Windows AD domain • A Kerberos realm is an authentication administrative domain. All principals are assigned to a specific Kerberos realm 4. MIT KDC vs AD KDC I. Locally vs centrally managed principals II. Flat vs more structural, e.g., use sub OU for each cluster III. Maintained by Hadoop admin vs Corporate IT IV. Potential single point of failure vs a set of AD domain controllers
  • 5. ©2016 MasterCard. Proprietary and Confidential5 Hadoop Security – Authorization • File systems based - Local & HDFS, e.g., putting administrators in dfs.permissions.superusergroup in Hadoop-policy.xml or UIs – CM/Ambari • Service based 1. Set up ACLs in hadoop-policy.xml for HDFS, MR1, Yarn, ZK, Oozie … 2. Setup permissions at different levels for HBase, e.g., table, column or cell level … • Role based, Apache Sentry, Apache Ranger, Apache KNOX  Make sure to bypass some services accounts, such as in Hive, you should bypass Hive, Impala, HDFS, and HUE users
  • 6. ©2016 MasterCard. Proprietary and Confidential6 Hadoop Security – Data Security • Encryption At-Rest • Native HDFS encryption, Encryption Zone, EZK, DEK & EDEK, Key Management Server (KMS) & Key Storage Server
  • 7. ©2016 MasterCard. Proprietary and Confidential7 Hadoop Security – Data Security • Encryption At-Rest Continue • MapReduce intermediate data • Services, e.g., Hive(user/hive), SOLR(/user/solr), HUE(user/hue), Yarn History(user/history), Spark/Impala shuffle & disk spill data, Kafka log/data folders & etc. • File system & full disk, e.g., audit log folders • Use a Hardware Security Module (HSM) as a more secured key storage
  • 8. ©2016 MasterCard. Proprietary and Confidential8 Hadoop Security – Data Security • Encryption Over-The-Wire (In-Motion) 1. Use TLS/SSL encrypt the communication channel for different protocols, e.g., TCP/IP, RPC, Http 2. TLS/SSL certificate can be used for a specify identity, e.g., per host, service, port & etc. 3. Keep scalability and maintenance effort in mind when apply certificate strategy, e.g., based on type of applications - JKS for Java apps, and PEM for others 4. Use automatic deployment tools to deploy SSL certificates, e.g., Chef, StackIQ & etc. 5. Set up company own Certificate Authority (CA) 6. Use CA signed certificates for all non-dev environments, self-signed certificates can be used in the dev environment 7. Disable clear text services once TLS/SSL is enabled, e.g., disable HTTP service on 11000 once Oozie HTTPS is enabled on port 11443
  • 9. ©2016 MasterCard. Proprietary and Confidential9 Hadoop Security – Auditing • Passive auditing, doesn’t generate alerts, main purpose is to audit certain events to meet business requirements, e.g., setOwner, setPermission on HDFS folders • Active auditing, more aggressive, normally generates alerts, e.g., automatically generates email alerts to InfoSec upon access denial events and configuration changes • Enable and configure auditing logs for each service in log4j.properties, e.g., HDFS, YARN, HIVE, Impala & etc. • Use vendor supported products, such as Cloudera Navigator, to centrally configure, collect, monitor audit policies/events and aggregate logs • Segregation of duties set up ACLs properly, enable AES256 strong encryption on the audit logs, and only auditors and their log collecting process such, as Splunk, can access the audit log folders
  • 10. ©2016 MasterCard. Proprietary and Confidential10 Governance & Compliance – Governance & Metadata Management • Metadata extraction and management, e.g., HDFS folders, files & permissions, Yarn job metadata, Oozie workflows, Hive queries & etc. • Searchable with Lineage information attached • Data Classification based on business perspectives • Centralized Auditing • Use tools to satisfy majority of the requirements, e.g., Cloudera Navigator & Apache Atlas
  • 11. ©2016 MasterCard. Proprietary and Confidential11 Governance & Compliance – Masking & Redaction • HDFS log and query redaction, e.g., Hostname: b(([A-Za-z]|[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]).)+([A-Za-z0-9]|[A-Za-z0-9][A-Za- z0-9-]*[A-Za-z0-9])b Replace: HOSTNAME.REDACTED • Audit Server Log masking & redaction for all supported Hadoop services, e.g., (4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0- 9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})|((?:2131|1800|35d{3})d{11})
  • 12. ©2016 MasterCard. Proprietary and Confidential12 Governance & Compliance – Compliant Procedures • Penetration test & application scan  All medium & high security vulnerability findings have to be remediated before the certification deadline • User roles and groups validation, e.g., audit policies on data access based on groups & roles • Application log, audit log & change reports, e.g., provide role-based authorization audit logs once a week to an internal auditor • Patch management • Data retention, encryption and key rotation policies • Other business requirements
  • 13. ©2016 MasterCard. Proprietary and Confidential13 Takeaways • Securing your Hadoop environments can be lengthy and evolving • Homegrown processes are needed to satisfy business requirements • Security is applied end to end in the process • Big Data is still maturing • Don’t confuse compliance with security
  • 14. ©2016 MasterCard. Proprietary and Confidential Contact Us 14 Craig Hibbeler +1 (636) 439 8186 Craig.Hibbeler@mastercard.com Derek Sun +1 (636) 722 5512 Derek.Sun@mastercard.com