Fighting Cyber Fraud with Hadoop 
Niel Dunnage 
Senior Solutions Architect
Summary 
• Big Data is an increasingly powerful enterprise asset. This talk explores 
the relationship between big data and cyber security, and how we 
preserve privacy whilst exploiting the advantages of data collection 
and processing. Big Data technologies give both governments and 
corporations powerful tools to offer more efficient and personalized 
services, and their rapid adoption has created tremendous social 
benefits. An unwanted side effect is the rich pickings available to 
those with malicious intentions. Increasingly, the sophisticated cyber 
attacker can exploit the rich array of public data to build detailed 
profiles of their adversaries. 
2 ©2014 Cloudera, Inc. All rights reserved.
Agenda 
• Data: - The new oil 
• Defend your data 
• The security value of Big Data 
Source: Grant Thornton LLP 2014 Corporate General 
Counsel Survey, conducted by American Lawyer Media
Cyber Security:- Data is a valuable commodity 
• DDOS 
• Data Exfiltration 
• Confidential customer records 
• Transaction data 
• Reputation attack 
• False flag 
• Fake data 
• Insider Threat 
False flag: operations designed to deceive in such a way that they 
appear as though they are being carried out by entities, groups or 
nations other than those who actually planned and executed them. 
http://en.wikipedia.org/wiki/False_flag 
The @SQLiNairb hacker has released a database dump from a US 
fantasy football website (http://www.fftoday.com/), claiming that it 
was timed to coincide with the NFL draft. 
@security_511 has continued to support OpSaudi, claiming further 
attacks on websites connected to Saudi Aramco. 
Anonymous Italy and Operation Green Rights (OpGR) have released the 
contents of an email account connected to an Italian steel producer, in 
connection with accusations of pollution against the company. 
Typical Security Layers 
Type                     Example 
Access                   Physical (lock and key), virtual (firewalls, VLANs) 
Authentication           Logins – verify users are who they say they are 
Authorization            Permissions – verify what a user can do 
Encryption at rest       Data protection for files on disk 
Encryption in transport  Data protection on the wire 
Auditing                 Keep track of who accessed what 
Policy / Procedure       Protect against human error and social engineering 
Cloudera’s Approach to Hadoop Security 
Comprehensive 
• Standards-based Authentication 
• Centralized, Granular Authorization 
• Native Data Protection 
• End-to-End Data Audit and Lineage 
Compliance-Ready 
• Meet compliance requirements 
• HIPAA, PCI-DSS, … 
• Encryption and key management 
Transparent 
• Security at the core 
• Minimal performance impact 
• Compatible with new components 
• Insight with compliance 
Defense: - Security Features 
• Hadoop security: Kerberos, with simplified deployment through 
Cloudera Manager 
• Sentry: provides unified authorization with a single policy 
for Hive, Impala and Search 
• HDFS extended ACLs and HBase cell-level access control 
• Navigator Encrypt and Key Trustee deliver compliant data security 
• Via the Gazzang acquisition 
• Navigator provides a data management layer including audit, access 
control reviews, data classification and discovery, and lineage 
Kerberos Security 
Perimeter Security 
• Guarding access to the cluster itself 
• Technical Concepts: 
• Authentication 
• Network isolation 
Kerberos 
• Kerberos: a computer network authentication protocol that works on the basis of 
tickets, allowing nodes to prove their identity to each other securely, with 
extensive use of encryption 
• Messages are exchanged between: 
• Client 
• Server 
• Kerberos Key Distribution Center (KDC) 
• Note: the KDC is not part of Hadoop, but most Linux distros come with the MIT 
Kerberos KDC 
• Passwords are not sent across the network; instead, they are used to compute 
encryption keys 
• Authentication status is cached (no need to send credentials with each request) 
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!) 
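Two of the points above, password-derived keys and clock synchronization, can be illustrated with a small sketch. This is not Kerberos code: it uses the JDK's PBKDF2-HMAC-SHA1 as a stand-in for a password-to-key function, and the object name `KerberosSketch`, the salt format, the iteration count and the 300-second tolerance are all assumptions chosen for illustration.

```scala
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

object KerberosSketch {
  // Derive a 256-bit key from a password, so the password itself never has to be sent.
  // The salt (here realm + principal) and iteration count are illustrative assumptions.
  def deriveKey(password: String, salt: String, iterations: Int = 4096): Array[Byte] = {
    val spec = new PBEKeySpec(password.toCharArray, salt.getBytes("UTF-8"), iterations, 256)
    SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1").generateSecret(spec).getEncoded
  }

  // Accept a remote timestamp only if it is within the allowed skew of the local clock.
  // This is why unsynchronized clocks cause authentication failures.
  def withinSkew(localEpochSec: Long, remoteEpochSec: Long, maxSkewSec: Long = 300L): Boolean =
    math.abs(localEpochSec - remoteEpochSec) <= maxSkewSec
}
```

Both sides can compute the same key from the same password and salt, which is what lets them prove identity without transmitting the password.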
Apache Sentry 
Access Security Sentry 
• Sentry provides unified authorization 
across multiple access paths 
• A single authorization policy will be enforced 
for Impala, Hive and Search 
• Role-based access at Server, Database, Table or 
View granularity 
• Multi-tenant: Separate policies for each 
database / schema 
• Access 
• Defining what users and applications can do with data 
• Technical Concepts: 
• Permissions 
• Authorization
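The idea of one role-based policy checked at server, database or table granularity can be sketched as follows. This is a toy model, not Sentry's API: the names `SentrySketch`, `Privilege` and `Policy`, and the group-to-role-to-privilege layout, are assumptions made for illustration.

```scala
object SentrySketch {
  // A privilege grants an action on a scope; None means "everything" at that level.
  final case class Privilege(server: String, database: Option[String],
                             table: Option[String], action: String)

  // Hypothetical policy: groups map to roles, roles map to privileges.
  final case class Policy(groupRoles: Map[String, Set[String]],
                          rolePrivileges: Map[String, Set[Privilege]]) {
    def isAllowed(groups: Set[String], server: String, database: String,
                  table: String, action: String): Boolean = {
      val roles = groups.flatMap(g => groupRoles.getOrElse(g, Set.empty[String]))
      val privileges = roles.flatMap(r => rolePrivileges.getOrElse(r, Set.empty[Privilege]))
      // Allowed if any privilege covers the requested object and action.
      privileges.exists { p =>
        p.server == server &&
        p.database.forall(_ == database) &&
        p.table.forall(_ == table) &&
        (p.action == action || p.action == "ALL")
      }
    }
  }
}
```

Because every access path would call the same `isAllowed` check, a single policy definition governs all of them, which is the point the slide makes about unified authorization.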
Cloudera Navigator 
Visibility Cloudera Navigator 
• Auditing and Access Management 
• View, grant and revoke permissions across the Hadoop stack 
• Identify access to a data asset around the time of a security breach 
• Generate an alert when a restricted data asset is accessed 
• Lineage 
• Given a data set, trace back to the original source 
• Understand the downstream impact of purging/modifying a data set 
• Metadata Tagging and Discovery 
• Search through metadata to find data sets of interest 
• Given a data set, view schema, metadata and policies 
• Lifecycle Management 
• Automate periodic ingestion of data 
• Compress/encrypt a data set at rest 
• Purge a data set or replicate it to a remote site 
• Visibility 
• Reporting on where data came from and how it’s being used 
• Technical Concepts: 
• Auditing 
• Lineage
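The audit and lineage questions above (who touched an asset around a breach, which restricted assets were accessed, where a data set originally came from) can be sketched over in-memory records. This does not use Cloudera Navigator's API; `AuditSketch`, the `AuditEvent` fields and the parent map are hypothetical shapes chosen for illustration.

```scala
object AuditSketch {
  // Hypothetical audit record: when, who, which asset, what operation.
  final case class AuditEvent(epochSec: Long, user: String, asset: String, operation: String)

  // Events touching `asset` within +/- windowSec of the suspected breach time.
  def accessesAround(events: Seq[AuditEvent], asset: String,
                     breachEpochSec: Long, windowSec: Long): Seq[AuditEvent] =
    events.filter(e => e.asset == asset && math.abs(e.epochSec - breachEpochSec) <= windowSec)

  // Events that touched any asset on the restricted list; each one would raise an alert.
  def restrictedAccesses(events: Seq[AuditEvent], restricted: Set[String]): Seq[AuditEvent] =
    events.filter(e => restricted.contains(e.asset))

  // Lineage: follow "derived from" links back to data sets that have no parents.
  def originalSources(parents: Map[String, Set[String]], dataset: String): Set[String] = {
    def go(node: String, seen: Set[String]): Set[String] =
      if (seen.contains(node)) Set.empty // guard against cycles
      else parents.getOrElse(node, Set.empty[String]) match {
        case ps if ps.isEmpty => Set(node)
        case ps               => ps.flatMap(p => go(p, seen + node))
      }
    go(dataset, Set.empty)
  }
}
```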
Encryption at rest 
[Diagram labels: process-based ACLs; Linux file, directory; AES-256 encryption; encrypt client (Linux server / VM); GPG; Key Trustee Server (Linux server / VM). ©Gazzang gazzang.com/products/cloudencrypt-for-aws] 
Navigator Encrypt and Key Trustee 
• Encrypt any File, Directory 
• AES-256 Encryption 
• Unique Access controls 
• Process Based, NOT users / groups 
• 100% Transparent 
• Separation of Duties 
• Key Management 
• AES encryption keys are stored on a 
separate Key Trustee server 
• If the key manager is breached, the data is safe 
• If the data server is breached, the data is safe
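The separation of keys from data can be illustrated with the JDK's own AES support. This is a sketch, not Navigator Encrypt: the slide only says AES-256, so the GCM mode, the 12-byte nonce layout and the `EncryptSketch` helper names are assumptions made here. The key object stands in for what a separate key server would hold.

```scala
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec

object EncryptSketch {
  private val random = new SecureRandom()

  // Stand-in for the key server: generate a fresh AES-256 key held apart from the data.
  def newKey(): SecretKey = {
    val gen = KeyGenerator.getInstance("AES")
    gen.init(256)
    gen.generateKey()
  }

  // Returns a 12-byte random nonce followed by the ciphertext and authentication tag.
  def encrypt(key: SecretKey, plaintext: Array[Byte]): Array[Byte] = {
    val nonce = new Array[Byte](12)
    random.nextBytes(nonce)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce))
    nonce ++ cipher.doFinal(plaintext)
  }

  // Splits off the nonce and decrypts; throws if the key is wrong or the data was altered.
  def decrypt(key: SecretKey, blob: Array[Byte]): Array[Byte] = {
    val (nonce, body) = blob.splitAt(12)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce))
    cipher.doFinal(body)
  }
}
```

Someone who obtains only the encrypted blob cannot read it, and someone who obtains only the key has nothing to decrypt, which is the reasoning behind the last two bullets.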
Our Design Strategy 
The Enterprise Data Hub 
A fully integrated Hadoop ecosystem 
• One pool of data 
• One metadata model 
• One security framework 
• One set of system resources 
Engines 
• Batch Processing: Spark, MapReduce, Hive & Pig 
• Stream Processing: Spark Streaming 
• Interactive SQL: Cloudera Impala 
• Interactive Search: Cloudera Search 
• Machine Learning: Spark MLlib, Mahout, Oryx 
• Math & Statistics: SAS, R 
Resource Management 
• YARN 
Storage 
• HDFS: text, RCFile, Parquet, Avro, etc. 
• HBase / Accumulo: records 
Integration 
• REST (WebHDFS), File (FUSE), Flume, Sqoop 
Metadata 
• Navigator 
Security 
• Navigator, Sentry 
graph.vertices.filter{case(id, _) => 
id==13669222}.collect 
Select CPU_Met from application WHERE 
(USAGE > 1000) 
LEFT OUTER JOIN ON application_ID where 
application_type IS Non_Critical
Enterprise Data Hub Use Cases 
Innovation and Advantage 
Ask bigger questions in the pursuit of discovering something incredible 
Operational Efficiency 
Perform existing workloads faster, cheaper, better 
©2013 Cloudera, Inc. All Rights Reserved. 
• ETL Acceleration 
• EDW Optimization 
• Active Archive 
• OSINT Analysis 
• Fraud Detection 
• Deep Exploratory BI 
• Historical Compliance 
• Log Processing 
• Performance Management 
• Risk Management
Offence:- Fraud Detection 
Use Cases 
• Distributed parallel execution with chained joins 
• Historical processing at scale 
• Machine learning: malware/anomaly detection, spam filters, etc. 
• Combined real-time and batch predictors 
Fully Automated at scale
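One way to read "combined real-time and batch predictors" is that a batch job summarises history and a real-time step scores each new event against that summary. The sketch below uses a simple z-score rule as the anomaly test; the slide does not name a method, so the rule, the threshold of 3 and the `FraudSketch` names are assumptions.

```scala
object FraudSketch {
  final case class Stats(mean: Double, stdDev: Double)

  // Batch step: summarise historical amounts (population standard deviation).
  def summarise(history: Seq[Double]): Stats = {
    require(history.nonEmpty, "history must not be empty")
    val mean = history.sum / history.size
    val variance = history.map(x => (x - mean) * (x - mean)).sum / history.size
    Stats(mean, math.sqrt(variance))
  }

  // Real-time step: flag an amount more than `threshold` standard deviations from the mean.
  def isAnomalous(stats: Stats, amount: Double, threshold: Double = 3.0): Boolean =
    stats.stdDev > 0 && math.abs(amount - stats.mean) / stats.stdDev > threshold
}
```

In this split, `summarise` would run periodically over the full history, while `isAnomalous` is cheap enough to run on every incoming transaction.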
Big Data Economics 
Ask bigger questions 
• Predictably process large data sets 
• Linear scaling 
• Robust and economic crypto security 
• Creative fail-fast innovation 
• Powers productivity insights 
• Increasing infrastructure ROI 
• Increasing business ROI 
• Defeating fraudulent activity 
• Evaluating risk 
Ingest 
Innovate 
Predict Discover 
Data Ingest 
• NRT Ingest 
• Flume 
• Optimized to flow real-time event data into the Hadoop cluster 
• Spark Streaming for near-real-time micro-batch aggregations 
• Twitter streaming 
• Kafka 
• Log 
• API 
• Bulk Load 
• Sqoop for structured data 
• FUSE file system access 
• API 
• Web / Hue 
• Data Enrichment 
• Flume interceptors 
• Kite Morphlines module 
• Configuration-based interceptors that can enrich data, for example by extracting facets, extracting entities or applying regulatory tags 
[Diagram labels: Client (×4), Agent (×3); stages: collect, enrich, buffer, store]
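The enrichment step can be sketched as a chain of functions that each take an event and return an enriched one. This is plain Scala, not the Flume interceptor or Kite Morphlines API; the `Event` shape, the `regulatory_tag` header and the 16-digit rule are hypothetical.

```scala
object EnrichSketch {
  // Simplified event: string headers plus a text body.
  final case class Event(headers: Map[String, String], body: String)

  // Hypothetical rule: bodies containing a 16-digit number are tagged as PCI data.
  private val cardLike = "\\b\\d{16}\\b".r

  def tagRegulated(event: Event): Event =
    if (cardLike.findFirstIn(event.body).isDefined)
      event.copy(headers = event.headers + ("regulatory_tag" -> "PCI"))
    else event

  // Apply enrichment steps in order, as a configured interceptor chain would.
  def enrich(event: Event, steps: Seq[Event => Event]): Event =
    steps.foldLeft(event)((e, step) => step(e))
}
```

Tagging at ingest time means downstream access policies can key off the header instead of re-scanning the body.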
Near Real time Access to threats 
• View the geographic distribution of Slowloris DDoS attacks, taken from Apache web server logs 
• Help isolate unpatched servers 
• Identify the source of attacks 
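// Count HTTP 408 (Request Timeout) log lines in 10-second windows. 
// LogUtils is the slide's own helper; its arguments are elided in the original. 
LogUtils.createStream(...) 
  .filter(_.getText.contains("408 Error")) 
  .countByWindow(Seconds(10), Seconds(10)) // Spark takes a window length and a slide interval 
// Keep only the keys whose current count exceeds the historic count. 
stream.join(historicCounts).filter { 
  case (word, (curCount, oldCount)) => 
    curCount > oldCount 
} 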
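The comparison in the snippet above can be tried without a cluster. The sketch below applies the same logic to an in-memory window of log lines using plain Scala collections; `SlowlorisSketch` and `LogLine` are hypothetical names, and unlike the inner join on the slide, a client with no history is treated as having a historic count of 0.

```scala
object SlowlorisSketch {
  // Hypothetical, simplified log record: client IP and HTTP status code.
  final case class LogLine(clientIp: String, status: Int)

  // Count HTTP 408 (Request Timeout) responses per client within one window of log lines.
  def count408(window: Seq[LogLine]): Map[String, Int] =
    window.filter(_.status == 408)
      .groupBy(_.clientIp)
      .map { case (ip, lines) => ip -> lines.size }

  // Clients whose current count exceeds their historic count (missing history counts as 0).
  def rising(current: Map[String, Int], historic: Map[String, Int]): Set[String] =
    current.collect { case (ip, n) if n > historic.getOrElse(ip, 0) => ip }.toSet
}
```

The resulting set of client addresses is what would then be geolocated to view the distribution of the attack.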
Machine Learning 
Real-time, large-scale machine learning and predictive analytics 
infrastructure built on Hadoop 
• Collaborative filtering and recommendation 
• Classification and regression 
• Clustering
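As a small illustration of the clustering item, the sketch below runs one iteration of k-means on 2-D points. It is a textbook version in plain Scala, not the implementation used by any of the tools named in this deck; `KMeansSketch` and its `step` function are hypothetical names.

```scala
object KMeansSketch {
  type Point = (Double, Double)

  // Squared Euclidean distance between two points.
  private def dist2(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // One k-means iteration: assign each point to its nearest centroid, then average each group.
  def step(points: Seq[Point], centroids: Seq[Point]): Seq[Point] = {
    require(centroids.nonEmpty, "at least one centroid is required")
    val assigned = points.groupBy(p => centroids.indices.minBy(i => dist2(p, centroids(i))))
    centroids.indices.map { i =>
      assigned.get(i) match {
        case Some(ps) => (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)
        case None     => centroids(i) // keep a centroid that attracted no points
      }
    }
  }
}
```

Repeating `step` until the centroids stop moving gives the usual algorithm; points far from every centroid are candidates for closer inspection.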

 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

Fighting Cyber Fraud with Hadoop

  • 1. Fighting Cyber Fraud with Hadoop. Niel Dunnage, Senior Solutions Architect
  • 2. Summary • Big Data is an increasingly powerful enterprise asset. This talk explores the relationship between big data and cyber security, and how we preserve privacy whilst exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and the rapid adoption of these technologies has created tremendous social benefits. An unwanted side effect is the rich pickings potentially available to those with malicious intentions. Increasingly, the sophisticated cyber attacker is able to exploit the rich array of public data to build detailed profiles of their adversaries.
  • 3. Agenda • Data: the new oil • Defend your data • The security value of Big Data. Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media
  • 4. Cyber Security: Data is a valuable commodity • DDoS • Data exfiltration • Confidential customer records • Transaction data • Reputation attack • False flag • Fake data • Insider threat. False flag: operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them (http://en.wikipedia.org/wiki/False_flag). Examples: The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming that it was timed to coincide with the NFL draft. @security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco. Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.
  • 5. Typical Security Layers (Type: Example) • Access: physical (lock and key), virtual (firewalls, VLANs) • Authentication: logins, to verify users are who they say they are • Authorization: permissions, to verify what a user can do • Encryption at rest: data protection for files on disk • Encryption in transport: data protection on the wire • Auditing: keep track of who accessed what • Policy / Procedure: protect against human error and social engineering
  • 6. Cloudera’s Approach to Hadoop Security • Comprehensive: standards-based authentication; centralized, granular authorization; native data protection; end-to-end data audit and lineage • Compliance-Ready: meet compliance requirements (HIPAA, PCI-DSS, …); encryption and key management • Transparent: security at the core; minimal performance impact; compatible with new components; insight with compliance
  • 7. Defense: Security Features • Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager • Sentry: provides unified authorization with a single policy for Hive, Impala and Search • HDFS extended ACLs and HBase cell-level access control • Navigator Encrypt and Key Trustee deliver compliant data security (via the Gazzang acquisition) • Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage
  • 8. Kerberos Security • Perimeter security: guarding access to the cluster itself • Technical concepts: authentication, network isolation • Kerberos: a computer network authentication protocol that works on the basis of tickets, allowing nodes to prove their identity to each other in a secure manner, using encryption extensively • Messages are exchanged between the client, the server and the Kerberos Key Distribution Center (KDC) • Note that the KDC is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC • Passwords are not sent across the network; instead, passwords are used to compute encryption keys • Authentication status is cached (no need to send credentials with each request) • Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
  • 9. Apache Sentry: Access Security • Access: defining what users and applications can do with data • Technical concepts: permissions, authorization • Sentry provides unified authorization across multiple access paths • A single authorization policy is enforced for Impala, Hive and Search • Role-based access at server, database, table or view granularity • Multi-tenant: separate policies for each database / schema
  • 10. Cloudera Navigator: Visibility • Visibility: reporting on where data came from and how it’s being used • Technical concepts: auditing, lineage • Auditing and access management: view, grant and revoke permissions across the Hadoop stack; identify access to a data asset around the time of a security breach; generate an alert when a restricted data asset is accessed • Lineage: given a data set, trace back to the original source; understand the downstream impact of purging or modifying a data set • Metadata tagging and discovery: search through metadata to find data sets of interest; given a data set, view its schema, metadata and policies • Lifecycle management: automate periodic ingestion of data; compress or encrypt a data set at rest; purge a data set or replicate it to a remote site
  • 11. 11 ©2014 Cloudera, Inc. All rights reserved.
  • 12. Encryption at rest: Navigator Encrypt and Key Trustee • Encrypt any file or directory • AES-256 encryption • Unique access controls • Process based, NOT users / groups • 100% transparent • Separation of duties • Key management • AES encryption keys stored on a separate Key Trustee server • Key manager breach, data is safe • Data server breach, data is safe. [Diagram labels: process-based ACLs; Linux file, directory; AES-256 encryption; Linux server / VM with encrypt client; GPG; Linux server / VM with Key Trustee Server. ©Gazzang, gazzang.com/products/cloudencrypt-for-aws]
  • 13. Our Design Strategy: The Enterprise Data Hub, a fully integrated Hadoop ecosystem • One pool of data • One metadata model • One security framework • One set of system resources. [Diagram labels: Engines: Interactive SQL (Cloudera Impala), Interactive Search (Cloudera Search), Machine Learning (Spark MLlib, Mahout, Oryx), Math & Statistics (SAS, R), Batch Processing (Spark, MapReduce, Hive & Pig), Stream Processing (Spark Streaming); Resource Management (YARN); Storage: HDFS (Text, RCFile, Parquet, Avro, etc.), HBase / Accumulo (records); Integration: REST (WebHDFS), File (Fuse), Flume, Sqoop; Metadata (Navigator); Security (Navigator, Sentry).] Example queries shown on the slide: graph.vertices.filter{case(id, _) => id==13669222}.collect and Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID where application_type IS Non_Critical
  • 14. Enterprise Data Hub Use Cases • Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible • Operational Efficiency: perform existing workloads faster, cheaper, better • ETL Acceleration • EDW Optimization • Active Archive • OSINT Analysis • Fraud Detection • Deep Exploratory BI • Historical Compliance • Log Processing • Performance Management • Risk Management. ©2013 Cloudera, Inc. All Rights Reserved.
  • 15. Offence: Fraud Detection Use Cases • Distributed parallel execution with chained joins • Historical processing at scale • Machine learning: malware/anomaly detection, spam filters, etc. • Combined real-time and batch predictors • Fully automated at scale
  • 16. Big Data Economics: Ask bigger questions • Predictably process large data sets • Linear scaling • Robust and economic crypto security • Creative fail-fast innovation • Powers productivity insights • Increasing infrastructure ROI • Increasing business ROI • Defeating fraudulent activity • Evaluating risk. [Diagram labels: Ingest, Innovate, Predict, Discover]
  • 17. Data Ingest • NRT ingest: Flume (optimized to flow real-time event data into the Hadoop cluster); Spark Streaming for near-real-time micro-batch aggregations; Twitter streaming; Kafka; log; API • Bulk load: Sqoop for structured data; Fuse file system access; API; Web / Hue • Data enrichment: Flume interceptors; Kite Morphlines module; configuration-based interceptors that can enrich data, for example by extracting facets, extracting entities or applying regulatory tags. [Diagram labels: Client, Agent; collect, enrich, buffer, store]
  • 18. Near-Real-Time Access to Threats • View the geographic distribution of a Slowloris DDoS, taken from Apache web server logs • Help isolate unpatched servers • Identify the source of attacks. Code shown on the slide: LogUtils.createStream(...).filter(_.getText.contains("408 Error")).countByWindow(Seconds(10)) and stream.join(historicCounts).filter { case (word, (curCount, oldCount)) => curCount > oldCount }
  • 19. Machine Learning: real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop • Collaborative filtering and recommendation • Classification and regression • Clustering
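The two fragments on slide 18 (count 408 responses in a window, then join against historic counts and keep the keys whose count has risen) can be sketched without a cluster. What follows is a minimal, single-machine Python illustration under stated assumptions, not the Spark Streaming job from the talk: the log lines, the client addresses, the `historic` table and both function names are hypothetical, and the parser assumes the Apache common log format.

```python
from collections import Counter

# Hypothetical sample window of Apache access-log lines
# (client addresses are taken from the reserved documentation ranges).
window = [
    '192.0.2.10 - - [10/Oct/2014:13:55:36 +0000] "-" 408 0',
    '192.0.2.10 - - [10/Oct/2014:13:55:37 +0000] "-" 408 0',
    '192.0.2.10 - - [10/Oct/2014:13:55:38 +0000] "-" 408 0',
    '198.51.100.7 - - [10/Oct/2014:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2014:13:55:39 +0000] "-" 408 0',
]

# Hypothetical historic 408 counts per client for a comparable window.
historic = {"192.0.2.10": 1, "203.0.113.5": 5}


def count_408_by_client(log_lines):
    """Count HTTP 408 responses per client address in one window of log lines."""
    counts = Counter()
    for line in log_lines:
        # Common log format: the status code follows the quoted request string.
        parts = line.split('"')
        fields = parts[0].split()
        if len(parts) < 3 or not fields:
            continue  # line is not in the expected format
        status = parts[2].split()[:1]
        if status == ["408"]:
            counts[fields[0]] += 1
    return counts


def rising_clients(current, historic_counts):
    """Inner-join current and historic counts; keep clients whose count has risen."""
    return {
        client: n
        for client, n in current.items()
        if client in historic_counts and n > historic_counts[client]
    }


current = count_408_by_client(window)
suspects = rising_clients(current, historic)
```

Here `suspects` holds only the clients present in both tables whose current 408 count exceeds the historic one, which mirrors the join-then-filter step on the slide. Keying by client address rather than by country is a simplification; a GeoIP lookup would be needed to facet by geography.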

Editor's Notes

  1. Data is valuable both as an asset to you and to your customers. As the guardian of your customers’ data, you provide services using that data, such as bank accounts and online tax discs. Of course, you need to defend that data on your customers’ behalf if you want to maintain their loyalty. This talk explores how you can do that using Cloudera’s Enterprise Data Hub, and also how you can use this technology to play some offence: using its immense computational power to evaluate how your customers are being subjected to cyber attacks and how you can help them.
  2. Just as data tells a business about purchase behavior and intent, so it is valuable to the bad guys, whether to damage reputation or simply to trade. The bad guys have the advantage of being able to aggregate from numerous data sources without worrying about regulation, only about getting caught. As businesses move their assets and knowledge capital online, these assets are increasingly spread throughout the supply chain. For large enterprises, protecting this supply chain is challenging, especially where it is outsourced. Multi-tenant secure clusters running EDH could be the solution: resources are pooled to create a capability whereby all of the instrumentation and data assets are stored in the same data lake or reservoir, partitioned by robust security.
  3. Let’s take a look at some typical security layers that are used to protect these assets.
  4. Cloudera Enterprise Data Hub provides enterprise-class security for Hadoop, specifically to enable complex and challenging regulatory workloads. It incorporates many upstream features from Intel’s Project Rhino, including encryption at rest and in motion with hardware-enhanced performance, better use of role-based access control, high levels of granularity such as cell-level access control in HBase, and end-to-end audit compliance.
  5. YARN static and dynamic resource pools restrict resource utilization in a shared multi-tenant environment, thus contributing to the availability of the cluster. Encryption helps ensure the integrity and indeed the confidentiality of the data.
  6. All communications between nodes, including remote procedure calls, are authenticated with a valid ticket. The KDC may feature a one-way trust with the corporate directory, or indeed be fully integrated using SSSD.
  7. Role-based access control over the underlying data facilitates multi-tenant access to data (within the enterprise).
  8. Tracking the provenance of your data throughout the storage and processing chain is vital, particularly if that data is subject to compliance regulation such as PCI. Why you need Navigator: • Lots of data landing in Cloudera Enterprise: huge quantities; many different sources, structured and unstructured; varying levels of sensitivity • Many users working with the data: administrators and compliance officers; analysts and data scientists; business users • Need to effectively control and consume data: get visibility and control over the environment; discover and explore data
  9. Encryption in motion: SSL is enabled for services, with authenticated RPC calls on the cluster. The Key Trustee server can be integrated with existing HSMs so that the master encryption keys are both tamper-proof and revocable, and work with existing key management policies. The access controls are process based, which effectively prevents a root user from accessing the unencrypted contents of a file: an important and valuable separation of duties.
  10. Our design strategy is to tightly integrate different processing paradigms into the Hadoop system. Resources are pooled to enable different computation workloads, such as MapReduce and Impala, to utilize common infrastructure. Interactive SQL, batch processing (whether MapReduce or Spark) and stream processing such as Spark Streaming are just other applications that you bring to your data. These are integrated with Hadoop’s existing security and resource management frameworks and are completely interoperable with existing data formats and processing engines such as MapReduce. • One pool of data: storage platforms (HDFS & HBase); open data formats (files & records); shared across multiple processing frameworks • One metadata model: no synchronization of metadata between two different systems (analytical DBMS and Hadoop); the same metadata is used by other components within Hadoop itself (Hive, Pig, Impala, etc.) • One security framework: a single model for all of Hadoop; doesn’t require “turning off” any portion of native Hadoop security • One set of system resources: one set of nodes (storage, CPU, memory); one management console; integrated resource management; scales linearly as capacity or performance needs grow
  11. The Enterprise Data Hub infrastructure can support an array of use cases that would otherwise be locked in expensive, limited-capability silos. Those use cases can be applied to the full data set more productively and at lower cost. As a result, the economics make it feasible to ask those bigger questions. These use cases apply across domains encompassing management, security, HR and business intelligence.
  12. Complex MapReduce jobs are often a chained series of tasks: map, reduce, map, map, reduce and so on. Apache Spark significantly simplifies the coding of these complex pipelines with a common API for both batch and streaming; the programmer can then explicitly write to disk at the optimum time.
  13. Enterprises are increasingly using Hadoop and the economics of Big Data to drive efficiencies in the way they provide and consume IT services. Big Data economics allow the entirety of the structured management metrics from IT infrastructure to be combined with the unstructured supporting commentary. This allows for new types of exploitation such as machine learning and predictive analysis. The innovation begins with continuously ingesting the metrics and supporting commentary that describe current performance. Discovery evaluates the historical patterns of performance that build up over time, using machine learning to construct a model. These patterns in turn provide insight into the outcomes that those signals often predict. Cases include variations in the efficiency of different manufacturers’ disks for variables such as power consumption, developer team code performance, and the impact of training and certification. All of this enables further innovations and gains based on facts.
  14. Flume is a resilient framework for delivering event data to the Hadoop cluster, built from sources, channels and sinks. Kite is a set of libraries, tools and features for building Hadoop applications. Morphlines provides configuration-driven tools that can extract facets, using interceptors on the ingestion pipeline, to enrich records with metadata.
  15. In this sample, all of the Apache web server logs are filtered for HTTP 408 errors. Faceting by country using a GeoIP lookup helps identify the source of the DDoS. Slowloris is an old DDoS trick whereby a web client very slowly makes a connection to the web server; assuming Apache is patched, the Slowloris attack is revealed by filtering on the 408 errors.
  16. This infrastructure can continuously build models from a stream of data at large scale using Apache Hadoop. It also serves queries of those models in real time via an HTTP REST API, and can update models approximately in response to new streaming data. This two-tier design, comprising the Computation Layer and the Serving Layer respectively, implements a lambda architecture. Collaborative filtering works like 'people who search for this also search for that'. Classification is a form of supervised learning, where a value is predicted for new inputs based on known values for previous inputs; it is often used for spam filters. Clustering groups items based on common features, using algorithms such as k-means. Vectorising uses TF-IDF (term frequency, inverse document frequency), which infers how important a word might be in a document. It is often used by search engines to score and rank documents according to a query. The resulting vectors can be classified using an algorithm such as Naive Bayes, or extracted as features that can then be clustered using k-means across a corpus of documents, for example a stream of data from a Twitter channel sharing a hashtag. Choices include Oryx 1 (MapReduce based), Oryx 2 (Spark based) and Spark MLlib. Doing so in memory on Spark is good for iterative algorithms, avoiding the need to materialize the data, and for jobs such as Monte Carlo simulations.
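To make the TF-IDF step in the last note concrete, here is a minimal pure-Python sketch. It assumes one common variant of the weighting, tf × log(N / df), with naive whitespace tokenization; the sample tweets and the `tf_idf` function name are hypothetical. It is not the formula or API of Oryx or Spark MLlib, which may smooth or normalize differently.

```python
import math
from collections import Counter

# Hypothetical tweets sharing a hashtag, standing in for a streamed corpus.
tweets = [
    "#fraud alert phishing email",
    "#fraud card skimming alert",
    "#fraud phishing site reported",
]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in tf.items()}
        )
    return weights


vectors = tf_idf(tweets)
```

A term that appears in every document, such as the shared hashtag, gets a weight of zero, while a term unique to one tweet gets the highest weight in that tweet. These per-document dictionaries are the kind of feature vectors that a classifier such as Naive Bayes or a clustering algorithm such as k-means would then consume.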