© 2014 Dataguise Inc. All rights reserved.
Discovering & Protecting 
Sensitive Data in Hadoop 
jeremy@dataguise.com
Goals For Today 
• Big Data for banking, healthcare, tech, govt, education, etc. needs data security (but few have workable approaches in production today)
• Hadoop security approaches (what works and doesn't work from the past, challenges in the present)
• Real-world case studies (data-centric protection)
» Credit card security
» Healthcare data lake (Data-as-a-Service)
» Product analytics in the cloud
Market Overview
Data Growth
• 100% growth and 80% unstructured data by 2015
… finding and classifying sensitive data will get harder
(Chart: data volume in exabytes)
Real-world unstructured data scenarios
• Web comment fields and customer surveys, CRM data
• Patient and doctor medical data in emails, PDFs, doctor's notes
• Voice-to-text files in Hadoop for customer service optimization
• Log data from wellheads and oil drilling sensors
• Web e-commerce pay system
The Importance of Automation
• From 2012 to 2020, enterprise Big Data will grow 7500% in the next 6-8 yrs
• IT headcount for Big Data will grow 1.5x
Why Security in Big Data
Vertical | Refine | Explore | Enrich
Retail & Web | Log Analysis, Site Optimization | Social Network Analysis | Dynamic Pricing; Session & Content Optimization
Retail | Loyalty Program Optimization | Brand & Sentiment Analysis | Dynamic Pricing/Targeted Offer
Intelligence | Threat Identification | Person of Interest Discovery | Cross Jurisdiction Queries
Finance | Risk Modeling & Fraud Identification; Trade Performance Analytics | Surveillance & Fraud Detection; Customer Risk Analysis | Real-time upsell, cross sales marketing offers
Energy | Smart Grid: Production Optimization | Grid Failure Prevention; Smart Meters | Individual Power Grid
Manufacturing | Supply Chain Optimization | Customer Churn Analysis | Dynamic Delivery; Replacement Parts
Healthcare & Payer | Electronic Medical Records (EMPI) | Clinical Trials Analysis | Insurance Premium Determination
Sensitive data types overlaid on these use cases: PCI or financial data, personal health information (PHI), and privacy data
Three Critical Considerations
1. Ensuring Compliance
• The Big Ps (PCI, HIPAA, Privacy), data residency, FERPA, FISMA, FERC, etc.
• 1200 laws in 63 countries
2. Reducing Breach Risk
3. Quantifying both
» How much sensitive data? ("un-announced")
» Who is adding? (ad hoc user directories)
» Who is accessing? (sharing, selling, re-purposing)
The Evolution of Hadoop Projects
Lab Project
• Hadoop as R&D
• Strictly data science
• Zero $$$ or selection of distribution
• Zero recognition of sensitive data or exposure
Proof Stage
• Achieving value
• Data lake cost savings
• Line of business ownership
• Nodal expansion
• Security elements? (unknown to InfoSec)
ROI Validity
• ROI and TCO validity
• Distribution selection and purchase
• The security "A-ha" moment
• Solved with legacy or "penalty box" Hadoop
On Demand Hadoop
• Full scale production
• Ad hoc new uses
• Go faster: Spark, Kafka
• Security sanctified
On-Demand Hadoop
• Without adequate sensitive data protection, customers are left to "penalty boxing" Hadoop
» "Security zones" imposed by InfoSec
» Slows business; costly and cumbersome
• Data-centric protection can set those assets free
Data Protection In Hadoop
Security in Hadoop In Summary
• Like Cloud, Mobile, Virtualization… Big Data drives fundamental new rules in security
» Ad hoc computing, wide open data sets
» Extended users and usages, sharing and selling
» 3 Vs moving to 6 Vs (automation, non-blocking)
• Problem #1 is compliance
» Reporting/auditing/monitoring are as important as, or more important than, data security
• Data-centric protection can help
Hadoop Security Framework
Perimeter: Guarding access to the cluster itself
» Technical concepts: Authentication, Network isolation
Access: Defining what users and applications can do with data
» Technical concepts: Permissions, Authorization
Visibility: Reporting on where data came from and how it's being used
» Technical concepts: Auditing, Lineage
Data: Protecting data in the cluster from unauthorized visibility
» Technical concepts: Encryption, Tokenization, Data masking
• The 4 approaches to address security within Hadoop (Perimeter, Data, Access, Visibility)
• Dataguise discovers & protects at the data layer and provides visibility for audit reporting and data lineage
Kerberos on Hadoop
• Kerberos (developed at MIT) has been the de facto standard for strong authentication/authz
» Protects against user and service spoofing attacks, and allows enforcement of user HDFS access permissions
• What does Kerberos do?
» Establishes identity for clients, hosts, and services
» Prevents impersonation; passwords are never sent over the wire
» Tickets grant cryptographic "permissions" to resources
• Kerberos has been the core of authentication in native Apache Hadoop since 2010
» Used for access to ecosystem services (HDFS, JT, Oozie), for server-to-server traffic auth, etc., BUT complex to manage!
» Lots of steps, for example: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html
MapR Improvements on Auth/Authz
• Vastly simpler
» No requirement for Kerberos in core
» Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)
» Core services secured by default
• Easier integration
» User identity independent of host or operating system
» Local to MapR (no external Kerberos required)
• Faster
» Leverages Intel accelerated hardware crypto
Elements of Data Centric Protection
• 1. Identify which elements you want to protect via:
» Delimiters (structured data), name-value pairs (semi-structured), or a data discovery service (unstructured)
• 2. Automated protection options, applied automatically via:
» Format-preserving encryption (FPE)
» Masking (replace, randomize, Intellimask, static)
» Redaction (nullify)
• 3. Audit strategy
» Sensitive data protection/access/lineage
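The three steps on this slide can be sketched for a single delimited (structured) record. This is a minimal illustration and not Dataguise's implementation: the `POLICY` mapping, the column names, and the `protect_record` helper are all hypothetical, and the "static" mask simply substitutes a fixed value.

```python
# Hypothetical policy: column name -> protection action (names are illustrative).
POLICY = {"ssn": "redact", "name": "static", "zip": "keep"}


def protect_record(line, columns, policy, delimiter=","):
    """Apply a per-column action to one delimited record and log what was done."""
    values = line.split(delimiter)
    audit = []  # (column, action) pairs, kept for the audit trail
    out = []
    for column, value in zip(columns, values):
        action = policy.get(column, "keep")
        if action == "redact":
            value = ""        # redaction: nullify the field
        elif action == "static":
            value = "XXXXX"   # static masking: fixed replacement value
        if action != "keep":
            audit.append((column, action))
        out.append(value)
    return delimiter.join(out), audit


protected, audit = protect_record(
    "Jane Doe,123-45-6789,60601", ["name", "ssn", "zip"], POLICY
)
print(protected)  # XXXXX,,60601
print(audit)      # [('name', 'static'), ('ssn', 'redact')]
```

Returning the audit entries alongside the protected record keeps step 3 (audit) tied to step 2 (protection) rather than bolted on afterwards.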
Discovery
• Within HDFS
» Search for sensitive data per company policy (PII, PCI, …)
» Handle complex data types such as addresses
» Process incrementally (default) to handle only the new content
• In-flight
» Process data on the fly as it is ingested into HDFS
» Plug-in solution for FTP, Flume, Sqoop
» Search for sensitive data per policy (PII, PCI, HIPAA, …)
» NEXT UP: Kafka
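As a rough sketch of policy-driven discovery, the example below scans free text for two element types. It is an assumption-laden illustration rather than the product's detection logic: the two regular expressions and the `find_sensitive` helper are hypothetical, and the Luhn checksum stands in for the kind of validation that separates real card numbers from arbitrary digit runs.

```python
import re

# Illustrative patterns only; a real policy would cover many more element types.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (element type, matched text) pairs found in `text`."""
    hits = [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    # A pattern match alone is noisy, so card candidates must also pass Luhn.
    hits += [("CARD", m.group()) for m in CARD_RE.finditer(text)
             if luhn_valid(m.group())]
    return hits


sample = "Customer 123-45-6789 paid with 4111 1111 1111 1111; order 1234567890123456."
print(find_sensitive(sample))
# [('SSN', '123-45-6789'), ('CARD', '4111 1111 1111 1111')]
```

The 16-digit order number matches the card pattern but fails the checksum, so it is not reported.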
How Discovery Works
• MapReduce or Flume/FTP/Sqoop Agent
» Root directories and drill-downs
» Can scan the entire dataset or incrementally (watermarking)
• Runs pattern, logic, context, algorithm, and ontology filters
• Can utilize white/black lists and reference sets
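Incremental scanning with a watermark can be sketched as follows. This assumes the watermark is simply the latest modification time already processed; the `incremental_scan` helper and the path-to-timestamp listing are hypothetical stand-ins for a real directory listing.

```python
def incremental_scan(files, watermark):
    """Return the paths modified after `watermark` and the advanced watermark.

    `files` maps path -> last-modified timestamp (a hypothetical listing
    of everything under the configured root directories).
    """
    new_paths = sorted(p for p, mtime in files.items() if mtime > watermark)
    new_watermark = max([watermark] + [files[p] for p in new_paths])
    return new_paths, new_watermark


listing = {"/data/a.csv": 100, "/data/b.csv": 205, "/data/c.log": 310}

paths, mark = incremental_scan(listing, watermark=200)
print(paths, mark)  # ['/data/b.csv', '/data/c.log'] 310

# A second pass with the advanced watermark finds nothing new to scan.
print(incremental_scan(listing, mark))  # ([], 310)
```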
Protection Measures
• Protection plan should start with cutting
» What data can we delete/cut?
» What data can be redacted?
» Masking choices
• Consistency
• Realistic-looking data
• Partial reveal (Intellimask), e.g. Credit Card # 4541 **** **** 3241
• What data needs reversibility?
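The partial-reveal example on this slide (4541 **** **** 3241) can be reproduced with a few lines. This is only a sketch of the idea, not how Intellimask is implemented; the `partial_reveal` function and its parameters are hypothetical.

```python
def partial_reveal(card_number, keep_first=4, keep_last=4, mask_char="*"):
    """Mask the middle digits of a card number, preserving separators and length."""
    total = sum(c.isdigit() for c in card_number)
    seen = 0
    out = []
    for c in card_number:
        if c.isdigit():
            seen += 1
            # Hide every digit that is not in the leading or trailing group.
            hidden = keep_first < seen <= total - keep_last
            out.append(mask_char if hidden else c)
        else:
            out.append(c)  # keep spaces and dashes so the format survives
    return "".join(out)


print(partial_reveal("4541 1234 5678 3241"))  # 4541 **** **** 3241
```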
Encryption "vs" Masking
• Encryption:
+ Reversible
+ Trusted with security proofs
+ The first hammer
+ De-centralized architectures
- Complex
- Key management
- Useless without robust authentication and authorization
- Data value destruction
- Needs both encrypt and decrypt tooling
• Masking:
+ Highest security
+ Realistic data
+ Range and value preserving
+ Once and done
+ Scale-out and distributed
+ No performance impact on usage
+ Zero need for authentication, authorization, and key management
- Not as well marketed
- Not reversible
Encryption "vs" Masking
• Masking:
+ Highest security
+ Realistic data
+ Range and value preserving
+ Format-preserving and partial reveals
+ Scale-out and distributed
+ No performance impact on usage
+ Zero need for authentication, authorization, and key management
- Not as well marketed
- Not reversible
- Perceived to grow data
• Encryption:
+ Reversible
+ Trusted with security proofs
+ Format-preserving and partial reveals
+ Scale-out and distributed
+ The first hammer
+ De-centralized architectures
- Complex
- Key management
- Useless without robust authentication and authorization
- Data value destruction
The fundamental decision between masking and encryption comes down to reversibility:
• Some elements in analytics must resolve to the original (e.g. 66.249.22.145 or $34,332.12)
• Some elements are ideal for pseudonyms: Social Security numbers, credit card numbers, names
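For the pseudonym case, one common way to get consistent, non-reversible replacements is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an assumed technique chosen for illustration, not the masking algorithm described in the deck, and the `SECRET` value and `pseudonymize_digits` helper are hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # assumption: in practice a managed secret, not a literal


def pseudonymize_digits(value, secret=SECRET):
    """Replace each digit with a keyed, deterministic digit; keep separators and length."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for c in value:
        if c.isdigit():
            # Derive the replacement digit from the keyed digest bytes.
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(c)
    return "".join(out)


first = pseudonymize_digits("123-45-6789")
second = pseudonymize_digits("123-45-6789")
print(first == second)  # True: the same input always maps to the same pseudonym
```

Because the mapping is deterministic, joins and group-bys on the masked column still line up, but there is no decrypt step. This sketch does not guarantee that two different inputs get different pseudonyms, so it would need extending before being used as a unique index key.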
Real-World Performance
• Leveraging the power of MapReduce to run distributed encryption or masking
• Data volume: 2.2 TB
• Run time: 23 min
• Sensitive data: 8 of 50 columns in 2.2 Bn rows
• Run on a 360-node MapR system
• In old-world database technology, this type of job would have taken days or weeks
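The figures on this slide imply the following throughput. The arithmetic assumes decimal terabytes, a wall-clock run of exactly 23 minutes, and work spread evenly across all 360 nodes; none of these details are stated in the deck.

```python
TB = 10**12  # decimal terabyte; the slide does not say whether TB or TiB is meant

data_bytes = 2.2 * TB
rows = 2.2e9
seconds = 23 * 60
nodes = 360

aggregate_gb_per_s = data_bytes / seconds / 1e9
per_node_mb_per_s = data_bytes / seconds / nodes / 1e6
rows_per_s = rows / seconds
bytes_per_row = data_bytes / rows

print(f"aggregate: {aggregate_gb_per_s:.2f} GB/s")  # aggregate: 1.59 GB/s
print(f"per node:  {per_node_mb_per_s:.2f} MB/s")   # per node:  4.43 MB/s
print(f"rows/s:    {rows_per_s:,.0f}")              # rows/s:    1,594,203
print(f"bytes/row: {bytes_per_row:.0f}")            # bytes/row: 1000
```

The per-node rate is modest, which suggests the short run time comes from parallelism across the cluster rather than from any single fast node.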
Audit Strategy
• Essential to all goals: compliance, breach protection, visibility, and metrics
• Avoids the "gotcha" moment
» Show all sensitive elements (count, location)
» Remediation applied
» Dashboard for fast access to critical policies and drill-downs for file and user action
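The counts, locations, and remediation status listed on this slide amount to a simple roll-up over discovery findings. A minimal sketch, assuming each finding is a (file path, element type, remediation) tuple; the sample paths and values are invented for illustration.

```python
from collections import Counter

# Hypothetical discovery findings: (file path, element type, remediation applied).
findings = [
    ("/data/crm/notes.txt", "SSN", "masked"),
    ("/data/crm/notes.txt", "CARD", "encrypted"),
    ("/data/web/survey.csv", "SSN", None),
]

by_type = Counter(kind for _, kind, _ in findings)    # how much of each element
by_file = Counter(path for path, _, _ in findings)    # where it lives
unprotected = [(path, kind) for path, kind, action in findings if action is None]

print(dict(by_type))  # {'SSN': 2, 'CARD': 1}
print(dict(by_file))  # {'/data/crm/notes.txt': 2, '/data/web/survey.csv': 1}
print(unprotected)    # [('/data/web/survey.csv', 'SSN')]
```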
How It Works: Detection and Protection, In-flight or @Rest
Sources: RDBMS, transaction data, data warehouse, web site, FTP
• DgFlume Agent (Flume plug-in): 1. Detect sensitive data 2. Protect by applying masking/encryption policies
• DgScoop Agent (Sqoop): 1. Detect sensitive data 2. Protect by applying masking/encryption policies
• DgHDFS Agent (Hadoop API; discover/mask/encrypt in the production cluster): 1. Detect sensitive data 2. Protect by applying masking/encryption policies
• DGDiscover-Masker: 1. In DB (Oracle, SQL.. SharePoint, files) 2. Protect by applying masking/encryption policies
• DGHive, HDFS bulk decryption/Java app (Hadoop API): 1. Selective decryption based on user/role and policy
Flow:
1. Data discovery and protection while loaded into HDFS
2. Data masked or encrypted in HDFS with Map/Reduce job
3. Users can now access data
Case Studies
Protecting sensitive data in top credit card firm
Diagram (Source, Data Protection, Analysis):
• Credit card transactions, Omniture files
• Incremental updates to HDFS automatically protected
• Selective access to sensitive data based on role and app
Objectives
• Consolidate existing payment risk analysis inside high-scale, lower-cost Hadoop
• Provide tiered access & authorization for multiple business apps (fraud, risk, cross-sell)
Solution
• MapR Hadoop for a single, reliable, high-performance data analysis platform
• Dataguise consistent masking enables analysis and unique index key values for de-identified data
• Unique ability to output protected data in an adjacent column, or appended with a delimiter inside the existing column, to protect data while governing access via authorization rules
Results & Benefits
• Continuous real-time protection (job runs every 5 mins on ingest)
• Analytics draws on the secure purchasing data of 90 million credit card holders across 127 countries
Protecting personal health info (PHI) in aggregate data lake
Diagram labels: DG FTP Agent; SQL data; FTP; health records
• HDFS authorization controlled through group membership in Active Directory
Objectives
• Reduce costly and preventable readmissions, decrease mortality rates, and improve the quality of life for patients
• Internal data service model: DAaaS (Data Architecture as a Service)
Solution
• Solution needs to protect structured and unstructured source data in database, data warehouse, and flat file structures
• Customer required customization of encryption and key management to fit into their existing corporate infrastructure and security policies
• Dataguise dashboard gives admins an easy way to identify directories/files containing sensitive data
Results & Benefits
• Delivered a cost-effective and easy way to determine where sensitive data resides within the cluster, and how it's been protected
• Seamless access to encrypted data from a variety of data access methods (Hive, Pig, analytic tools)
Global Tech Product Analytics
Diagram labels: smartphone device log collectors; Apache Flume with DG Flume Agent; virtualized DG Secure; protected data in Amazon S3; AWS clouds in Korea, Singapore, US (3), UK, and Ireland
Objectives
• Aggregate logging data (product, usage, user configuration) for all smartphones worldwide
• De-identify personal user info to ensure privacy and compliance with European/US privacy requirements
Solution
• Customer routes all device logging data into 7 global AWS clouds
• Uses Dataguise Flume agent to protect all sensitive data being written to Amazon S3
• Runs Dataguise in AWS; also utilizes Dataguise EMR security agents to selectively decrypt for authorized analytics in AWS
Results & Benefits
• On-demand Hadoop for product analytics, user behavior, supply chain optimization
• High scale-out and high performance paramount
• 100% cloud-based security
Hadoop Data-Protection Checklist
• Discover sensitive data
• Automate protective measures
• Integrate into Hadoop authorization
• With continuous real-time tracking
• Dashboards, reports & auditing
• Automated risk assessment/scoring
• Automated inference protection (roadmap)
Thank You
Jeremy Stieglitz
VP Products
jeremy@dataguise.com

Getting started with Hadoop on the Cloud with Bluemix
Ā 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
Ā 

Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protecting Sensitive Data in Hadoop (Dataguise)

  • 1. Discovering & Protecting Sensitive Data in Hadoop. jeremy@dataguise.com. © 2014 Dataguise Inc. All rights reserved.
  • 2. Goals For Today
      • Big Data for banking, healthcare, tech, government, education, etc. needs data security (but few have workable approaches in production today)
      • Hadoop security approaches (what works and doesn't work from the past, challenges in the present)
      • Real-world case studies (data-centric protection)
          » Credit card security
          » Healthcare data lake (Data-as-a-Service)
          » Product analytics in the cloud
  • 3. Market Overview
  • 4. Data Growth
      • 100% growth and 80% unstructured data by 2015 … finding and classifying sensitive data will get harder
      [Chart: data volume in exabytes]
  • 5. Real-world unstructured data scenarios
      • Web comment fields and customer surveys, CRM data
      • Patient and doctor medical data in emails, PDFs, doctor's notes
      • Voice-to-text files in Hadoop for customer service optimization
      • Log data from wellheads and oil drilling sensors
      • Web e-commerce pay system
  • 6. The Importance of Automation
      • From 2012 to 2020, enterprise Big Data will grow 7500% in the next 6-8 years
      • IT headcount for Big Data will grow 1.5x
  • 7. Why Security in Big Data
      [Table columns: Vertical, Refine, Explore, Enrich. Use cases listed by vertical:]
      • Retail & Web: Log Analysis Site Optimization; Social Network Analysis; Dynamic Pricing; Session & Content Optimization
      • Retail: Loyalty Program Optimization; Brand & Sentiment Analysis; Dynamic Pricing/Targeted Offer
      • Intelligence: Threat Identification; Person of Interest Discovery; Cross Jurisdiction Queries
      • Finance: Risk Modeling & Fraud Identification; Trade Performance Analytics; Surveillance & Fraud Detection; Customer Risk Analysis; Real-time upsell, cross sales marketing offers
      • Energy: Smart Grid: Production Optimization; Grid Failure Prevention; Smart Meters; Individual Power Grid
      • Manufacturing: Supply Chain Optimization; Customer Churn Analysis; Dynamic Delivery; Replacement Parts
      • Healthcare & Payer: Electronic Medical Records (EMPI); Clinical Trials Analysis; Insurance Premium Determination
  • 8. Why Security in Big Data
      [Same table as slide 7, with use cases overlaid by the type of sensitive data involved: PCI or Financial; Privacy data; Personal Health (PHI)]
  • 9. Three Critical Considerations
      1. Ensuring compliance
          • The Big Ps (PCI, HIPAA, Privacy), data residency, FERPA, FISMA, FERC, etc.
          • 1200 laws in 63 countries
      2. Reducing breach risk
      3. Quantifying both
          1. How much sensitive data? ("un-announced")
          2. Who is adding? (ad hoc user directories)
          3. Who is accessing? (sharing, selling, re-purposing)
  • 10. The Evolution of Hadoop Projects
      • Lab Project
          » Hadoop as R&D
          » Strictly data science
          » Zero $$$ or selection of distribution
          » Zero recognition of sensitive data or exposure
      • Proof Stage
          » Achieving value
          » Data lake cost savings
          » Line of business ownership
          » Nodal expansion
          » Security elements? (unknown to InfoSec)
      • ROI Validity
          » ROI and TCO validity
          » Distribution selection and purchase
          » The security "A-Ha" moment
          » Solved with legacy or penalty box Hadoop
      • On Demand Hadoop
          » Full scale production
          » Ad hoc new uses
          » Go faster: Spark, Kafka
          » Security sanctified
  • 11. On-Demand Hadoop
      • Without adequate sensitive data protection, customers are left to "Penalty Boxing" Hadoop
          » "Security zones" imposed by InfoSec
          » Slows business, costly and cumbersome
      • Data-centric protection can set those assets free
  • 12. Data Protection In Hadoop
  • 13. Security in Hadoop: In Summary
      • Like Cloud, Mobile, Virtualization… Big Data drives fundamental new rules in security
          » Ad hoc computing, wide open data sets
          » Extended users and usages, sharing and selling
          » 3 Vs moving to 6 Vs (automation, non-blocking)
      • Problem #1 is compliance
          » Reporting/auditing/monitoring is as important as, or more important than, data security
      • Data-centric protection can help
  • 14. Hadoop Security Framework
      • Perimeter: guarding access to the cluster itself
          » Technical concepts: authentication, network isolation
      • Access: defining what users and applications can do with data
          » Technical concepts: permissions, authorization
      • Visibility: reporting on where data came from and how it's being used
          » Technical concepts: auditing, lineage
      • Data: protecting data in the cluster from unauthorized visibility
          » Technical concepts: encryption, tokenization, data masking
      • The 4 approaches to address security within Hadoop (Perimeter, Data, Access, Visibility)
      • Dataguise discovers & protects at the data layer and provides visibility for audit reporting and data lineage
  • 15. Kerberos on Hadoop
      • Kerberos (developed at MIT) has been the de facto standard for strong authentication/authz
          » Protects against user and service spoofing attacks, and allows for enforcement of user HDFS access permissions
      • What does Kerberos do?
          » Establishes identity for clients, hosts, and services
          » Prevents impersonation; passwords are never sent over the wire
          » Tickets grant cryptographic "permissions" to resources
      • Kerberos has been the core of authentication in native Apache Hadoop since 2010
          » Used to access ecosystem services (HDFS, JT, Oozie), for server-to-server traffic auth, etc. BUT complex to manage!
          » Lots of steps, for example: http://www.cloudera.com/content/cloudera--content/cloudera--docs/CDH4/4.3.0/CDH4--Security--Guide/cdh4sg_topic_3.html
      [Framework callout: Access: defining what users and applications can do with data. Technical concepts: permissions, authorization]
  • 16. MapR Improvements on Auth/Authz
      • Vastly simpler
          » But no requirements for Kerberos in core
          » Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)
          » Core services secured by default
      • Easier integration
          » User identity independent of host or operating system
          » Local to MapR (no external Kerberos required)
      • Faster
          » Leverage Intel accelerated hardware crypto
  • 17. Elements of Data Centric Protection
      1. Identify which elements you want to protect via:
          » Delimiters (structured data), name-value pairs (semi-structured) or data discovery service (unstructured)
      2. Automated protection options. Automatically apply protection via:
          » Format-preserving encryption (FPE)
          » Masking (replace, randomize, intellimask, static)
          » Redaction (nullify)
      3. Audit strategy
          » Sensitive data protection/access/lineage
  • 18. Discovery
      • Within HDFS
          » Search for sensitive data per company policy: PII, PCI, …
          » Handle complex data types such as addresses
          » Process incrementally (default) to handle only the new content
      • In-flight
          » Processing data on the fly as it is ingested into Hadoop HDFS
          » Plug-in solution for FTP, Flume, Sqoop
          » Search for sensitive data per policy: PII, PCI, HIPAA, …
          » NEXT UP: Kafka
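The "process incrementally" point on the Discovery slide can be pictured with a simple watermark. The slides do not describe how the product tracks new content, so the following is only a minimal sketch under an assumed design: each file carries a modification time, and a scan picks up only files newer than the last recorded watermark. `FileInfo`, `select_new_files` and the sample paths are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical listing entry: a path and its modification time (epoch seconds)."""
    path: str
    modified: float


def select_new_files(files, watermark):
    """Return the files modified after `watermark`, plus the advanced watermark.

    Assumption: modification time is a good enough signal for "new content".
    """
    fresh = [f for f in files if f.modified > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((f.modified for f in fresh), default=watermark)
    return fresh, new_watermark


listing = [
    FileInfo("/data/in/a.csv", 100.0),
    FileInfo("/data/in/b.csv", 200.0),
    FileInfo("/data/in/c.csv", 300.0),
]

# First incremental pass: only files newer than the stored watermark are scanned.
fresh, watermark = select_new_files(listing, watermark=150.0)

# A second pass over the same listing finds nothing new.
fresh_again, watermark_again = select_new_files(listing, watermark)
```

A full scan is the same call with a watermark of zero, which is one reason a design like this makes incremental processing a cheap default.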
  • 19. How Discovery Works
      • MapReduce or Flume/FTP/Sqoop agent
          » Root directories and drill-downs
          » Can scan entire dataset or incrementally (watermarking)
      • Runs pattern, logic, context, algorithm, and ontology filters
      • Can utilize white/black lists and reference sets
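To make the combination of "pattern" and "logic" filters concrete, here is a small sketch for one data type, credit card numbers. It is not the vendor's detection logic, which the slides do not show. It assumes a regular expression as the pattern filter and the standard Luhn checksum as the logic filter. `CARD_PATTERN`, `luhn_valid` and `find_card_numbers` are illustrative names.

```python
import re

# Pattern filter: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Logic filter: the Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return digit strings that match the pattern and pass the checksum."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number that passes the Luhn check;
# 1234 5678 9012 3456 matches the pattern but fails the checksum.
found = find_card_numbers("order 4111 1111 1111 1111 shipped, ref 1234 5678 9012 3456")
```

The checksum step is what keeps arbitrary 16-digit identifiers from being flagged. Context filters and reference sets, as listed on the slide, would narrow the results further.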
  • 20. Protection Measures
      • Protection plan should start with cutting
          » What data can we delete/cut?
          » What data can be redacted?
          » Masking choices
              • Consistency
              • Realistic looking data
              • Partial reveal (Intellimask): Credit Card # 4541 **** **** 3241
      • What data needs reversibility?
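The partial-reveal example on the Protection Measures slide (4541 **** **** 3241) can be reproduced with a few lines of code. This is only a sketch of the idea, not the Intellimask implementation. `partial_reveal` and its parameters are hypothetical, and the middle digits of the sample input are invented for the demonstration.

```python
def partial_reveal(value: str, keep_start: int = 4, keep_end: int = 4,
                   mask_char: str = "*") -> str:
    """Mask the middle digits of `value`, leaving separators in place."""
    total_digits = sum(c.isdigit() for c in value)
    if keep_start + keep_end >= total_digits:
        raise ValueError("nothing would be masked")
    out = []
    seen = 0  # number of digits consumed so far
    for c in value:
        if c.isdigit():
            visible = seen < keep_start or seen >= total_digits - keep_end
            out.append(c if visible else mask_char)
            seen += 1
        else:
            out.append(c)  # keep spaces and hyphens so the format is preserved
    return "".join(out)


masked = partial_reveal("4541 1234 5678 3241")  # -> "4541 **** **** 3241"
```

Because the masked digits are simply discarded, this form of protection is not reversible, which is the trade-off the masking and encryption comparison turns on.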
  • 21. Encryption "vs" Masking
      • Encryption:
          + Reversible
          + Trusted with security proofs
          + The first hammer
          + De-centralized architectures
          - Complex
          - Key management
          - Useless without robust authentication and authorization
          - Data value destruction
          - Needs both encrypt-decrypt tooling
      • Masking:
          + Highest security
          + Realistic data
          + Range and value preserving
          + Once and done
          + Scale-out and distributed
          + No performance impact on usage
          + Zero need for authentication and authorization and key management
          - Not as well marketed
          - Not reversible
  • 22. Encryption "vs" Masking
      • Masking:
          + Highest security
          + Realistic data
          + Range and value preserving
          + Format-preserving and partial reveals
          + Scale-out and distributed
          + No performance impact on usage
          + Zero need for authentication and authorization and key management
          - Not as well marketed
          - Not reversible
          - Perceived to grow data
      • Encryption:
          + Reversible
          + Trusted with security proofs
          + Format-preserving and partial reveals
          + Scale-out and distributed
          + The first hammer
          + De-centralized architectures
          - Complex
          - Key management
          - Useless without robust authentication and authorization
          - Data value destruction
      • The fundamental decision between masking and encryption comes down to reversibility:
          » Some elements in analytics must resolve to the original (e.g. 66.249.22.145 or $34,332.12)
          » Some elements are ideal for pseudonyms: Social Security numbers, credit card numbers, names
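For elements suited to pseudonyms, consistency matters: the same input must always map to the same token so that joins and counts still work on de-identified data. The slides do not say how consistent masking is implemented. The sketch below swaps in a common generic technique, a keyed hash (HMAC-SHA256) from the Python standard library. `pseudonymize`, `secret` and the sample values are hypothetical. Unlike the "zero key management" claim made for masking above, this variant does need its secret protected, since anyone holding it could recompute tokens for guessed inputs.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes, length: int = 16) -> str:
    """Map `value` to a stable, non-reversible token (truncated HMAC-SHA256 hex)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


secret = b"demo-only-secret"  # illustration only; never hard-code a real secret

token_a1 = pseudonymize("078-05-1120", secret)
token_a2 = pseudonymize("078-05-1120", secret)  # same input, same token
token_b = pseudonymize("219-09-9999", secret)   # different input, different token
```

A token like this can serve as a join key across tables without exposing the original value. Fields that must resolve back to the original, such as the IP address or dollar amount in the slide's examples, call for reversible encryption instead.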
  • 23. Real-World Performance
      • Leveraging the power of MapReduce to run distributed encryption or masking
      • Data volume: 2.2 TB
      • Run time: 23 min
      • Sensitive data: 8 of 50 columns in 2.2 Bn rows
      • Run on a 360-node MapR system
      • In old-world database technology, this type of job would have taken days or weeks
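The figures on the performance slide imply some throughput numbers that are easy to derive. The arithmetic below uses only the slide's values, under two assumptions the slide does not state: that "TB" means decimal terabytes (10^12 bytes) and that work was spread evenly across the nodes.

```python
# Values taken from the slide.
data_bytes = 2.2e12       # 2.2 TB, assuming decimal terabytes
run_seconds = 23 * 60     # 23 minutes
nodes = 360
rows = 2.2e9              # 2.2 billion rows

# Derived figures.
aggregate_gb_per_s = data_bytes / run_seconds / 1e9        # cluster-wide throughput
per_node_mb_per_s = data_bytes / run_seconds / nodes / 1e6  # assuming an even spread
rows_per_s = rows / run_seconds
bytes_per_row = data_bytes / rows

print(f"aggregate: {aggregate_gb_per_s:.2f} GB/s")
print(f"per node:  {per_node_mb_per_s:.2f} MB/s")
print(f"rows/s:    {rows_per_s:,.0f}")
print(f"row size:  {bytes_per_row:.0f} bytes")
```

That works out to roughly 1.6 GB/s across the cluster, about 4.4 MB/s per node, around 1.6 million rows per second, and an average row of about 1,000 bytes. The modest per-node rate suggests the result comes from parallelism rather than from any single fast machine.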
  • 24. Audit Strategy
      • Essential to all goals: compliance, breach protection, visibility and metrics
      • Avoids the "gotcha" moment
          » Show all sensitive elements (count, location)
          » Remediation applied
          » Dashboard for fast access to critical policies and drill-downs for file and user action
  • 25. How It Works: Detection and Protection In-flight or @Rest
      [Diagram: sources (RDBMS, transaction data, data warehouse, web site) feeding the production cluster via FTP, Flume and Sqoop]
      • DGDiscover-Masker
          1. In DB (Oracle, SQL.. SharePoint, Files)
          2. Protect applying masking/encryption policies
      • FTP, DgFlume Agent (Flume agent plug-in) and DgScoop Agent (Sqoop)
          1. Detect sensitive data
          2. Protect applying masking/encryption policies
      • DgHDFS Agent (Hadoop API: Discover/Mask/Encrypt)
          1. Detect sensitive data
          2. Protect applying masking/encryption policies
      • DGHive, HDFS bulk decryption/Java app (Hadoop API)
          1. Selective decryption based on user/role and policy
      • Flow
          1. Data discovery and protection while loaded into HDFS
          2. Data masked or encrypted in HDFS with Map/Reduce job
          3. Users can now access data
  • 26. Case Studies
  • 27. Protecting sensitive data in top credit card firm
      [Diagram labels: Source Data (Credit Card Transactions, Omniture Files); Protection; Analysis; incremental updates to HDFS automatically protected; selective access to sensitive data based on role and app]
      • Objectives
          » Consolidate existing payment risk analysis inside high-scale, lower-cost Hadoop
          » Provide tiered access authorization for multiple business apps (fraud, risk, cross-sell)
      • Solution
          » MapR Hadoop for a single, reliable, high-performance data analysis platform
          » Dataguise consistent masking enables analysis and unique index key values for de-identified data
          » Unique ability to output protected data in an adjacent column, or appended with a delimiter inside the existing column, to protect data while governing access via authorization rules
      • Results / Benefits
          » Continuous real-time protection (job runs every 5 mins on ingest)
          » Analytics draws on the secure purchasing data of 90 million credit card holders across 127 countries
  • 28. Protecting personal health info (PHI) in aggregate data lake
      [Diagram labels: DG FTP Agent; SQL Data; FTP; Health Records; HDFS; authorization controlled through group membership in Active Directory]
      • Objectives
          » Reduce costly and preventable readmission, decrease mortality rates, and improve the quality of life for patients
          » Internal data service model DAaaS (Data Architecture as a Service)
      • Solution
          » Solution needs to protect structured and unstructured source data in database, data warehouse, and flat file structures
          » Customer required customization of encryption and key management to fit into their existing corporate infrastructure and security policies
          » Dataguise dashboard gives admins an easy way to identify directories/files containing sensitive data
      • Results / Benefits
          » Delivered a cost-effective and easy way to determine where sensitive data resides within the cluster, and how it's been protected
          » Seamless access to encrypted data from a variety of data access methods (Hive, Pig, analytic tools)
  • 29. Global Tech Product Analytics
      [Diagram labels: Smartphone Device Log Collectors; Apache Flume; DG Flume Agent; Virtualized DG Secure; Protected Data; Amazon S3; AWS Clouds in Korea, Singapore, US (3), UK, and Ireland]
      • Objectives
          » Aggregate logging data (product, usage, user configuration) for all smartphones worldwide
          » De-identify personal user info to ensure privacy and compliance with European/US Privacy
      • Solution
          » Customer routes all device logging data into 7 global AWS clouds
          » Uses Dataguise Flume agent to protect all sensitive data being written to Amazon S3
          » Runs Dataguise in AWS, also utilizes Dataguise EMR security agents to selectively decrypt for authorized analytics in AWS
      • Results / Benefits
          » On-demand Hadoop for product analytics, user behavior, supply chain optimization
          » High scale-out and high performance paramount
          » 100% cloud-based security
  • 30. Hadoop Data-Protection Checklist
      • Discover sensitive data
      • Automate protective measures
      • Integrate into Hadoop authorization
      • With continuous real-time tracking
      • Dashboards, reports, auditing
      • Automated risk assessment/scoring
      • Automated inference protection (roadmap)
  • 31. Thank You. Jeremy Stieglitz, VP Products, jeremy@dataguise.com