Big Data is an increasingly powerful enterprise asset, and this talk will explore the relationship between big data and cyber security: how we preserve privacy whilst exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services. The rapid adoption of these technologies has of course created tremendous social benefits. Unfortunately, an unwanted side effect is the rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker is able to exploit the rich array of public data to build detailed profiles of their adversaries in support of those intentions.
Machine Learning
Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering
Editor's Notes
Data is valuable both as an asset and to your customers. As the guardian of your customers’ data, you provide services using that data, such as bank accounts and online tax discs. Of course you need to defend that data on your customers’ behalf if you want to maintain their loyalty. This talk will explore how you can do that using Cloudera’s Enterprise Data Hub, but also how you can use this technology to play some offence, using its immense computational power to evaluate how your customers are being subjected to cyber attacks and how you can help them.
In the same way that data is indicative to business of purchase behaviour and intent, it is valuable to the bad guys, whether to damage reputation or simply to trade. The bad guys have the advantage of being able to aggregate from numerous data sources without worrying about any regulation other than getting caught. As businesses move their assets and knowledge capital online, these assets are increasingly spread throughout the supply chain. For large enterprises, protecting this supply chain is challenging, especially where it is outsourced.
Multi-tenant secure clusters running EDH could be the solution: resources are pooled to create a capability whereby all of the instrumentation and data assets are stored in the same data lake, or reservoir, partitioned by robust security.
Let’s take a look at some typical security layers that are used to protect these assets.
Cloudera Enterprise Data Hub provides enterprise-class security for Hadoop, specifically to enable complex and challenging regulatory workloads. It incorporates many upstream features from Intel’s Project Rhino, including encryption at rest and in motion with hardware-enhanced performance, better use of role-based access control, high levels of granularity such as cell-level access control in HBase, and end-to-end audit compliance.
YARN static and dynamic resource pools restrict resource utilization in a shared multi-tenant environment, thus contributing to the availability of the cluster
Encryption ensures the integrity and indeed the confidentiality of the data
All communications, including remote procedure calls between nodes, are authenticated with a valid Kerberos ticket. The KDC may feature a one-way trust with the corporate directory, or indeed be fully integrated using SSSD
Role-based access control to the underlying data facilitates multi-tenant (within the enterprise) access to data
Tracking the provenance of your data throughout the storage and processing chain is vital, particularly if that data is subject to compliance regulation such as PCI
Why you need Navigator:
• Lots of data landing in Cloudera Enterprise
– Huge quantities
– Many different sources, structured and unstructured
– Varying levels of sensitivity
• Many users working with the data
– Administrators and compliance officers
– Analysts and data scientists
– Business users
• Need to effectively control and consume data
– Get visibility and control over the environment
– Discover and explore data
Encryption in motion: SSL is enabled for services, with authenticated RPC calls on the cluster. The Key Trustee server can be integrated with existing HSMs so that the master encryption keys are both tamper-proof and revocable and work with existing key management policies. The access controls are process-based, which effectively prevents a root user accessing the unencrypted contents of a file: an important and valuable separation of duties.
Our design strategy is to tightly integrate different processing paradigms into the Hadoop system. Resources are pooled to enable different computation workloads, such as MapReduce and Impala, to utilize common infrastructure. Interactive SQL and batch processing, whether MapReduce, Spark, or stream processing such as Spark Streaming, are just more applications that you bring to your data. These are integrated with Hadoop’s existing security and resource management frameworks and are completely interoperable with existing data formats and processing engines such as MapReduce.
• One pool of data
– Storage platforms (HDFS & HBase)
– Open data formats (files & records)
– Shared across multiple processing frameworks
• One metadata model
– No synchronization of metadata between two different systems (analytical DBMS and Hadoop)
– Same metadata used by other components within Hadoop itself (Hive, Pig, Impala, etc.)
• One security framework
– Single model for all of Hadoop
– Doesn’t require “turning off” any portion of native Hadoop security
• One set of system resources
– One set of nodes: storage, CPU, memory
– One management console
– Integrated resource management
– Scale linearly as capacity or performance needs grow
The Enterprise Data Hub infrastructure can support an array of use cases that would otherwise be locked in expensive, limited-capability silos. Those use cases can be applied to the full data set more productively, at lower cost. As a result, the economics facilitate the overall capability to ask those bigger questions. These use cases apply across domains encompassing management, security, HR, and business intelligence.
Complex MapReduce jobs are often a chained series of tasks: Map, Reduce, Map, Map, Reduce, and so on. Apache Spark significantly simplifies the coding of these complex pipelines: with a common API for both batch and streaming, the programmer can explicitly write to disk at the optimal time, as in the sketch below.
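A minimal PySpark sketch of such a chained pipeline; the input path, the tab-separated field layout, and the threshold of 100 are illustrative assumptions:

```python
# Sketch of a chained pipeline that would be several MapReduce jobs;
# Spark keeps intermediate results in memory and writes out only at the end.
from pyspark import SparkContext

sc = SparkContext(appName="ChainedPipeline")

events = sc.textFile("hdfs:///data/events")           # map: read raw lines
parsed = events.map(lambda line: line.split("\t"))    # map: parse fields
by_user = (parsed.map(lambda f: (f[0], 1))            # map: key by user id
                 .reduceByKey(lambda a, b: a + b))    # reduce: count per user
by_user.cache()                                       # keep in memory for reuse

active = by_user.filter(lambda kv: kv[1] > 100)       # map: further transform
active.saveAsTextFile("hdfs:///out/active-users")     # explicit write, once
```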
Enterprises are increasingly using Hadoop and the economics of Big Data to drive efficiencies in the way they provide and consume IT services. Big Data economics allow the entirety of the structured management metrics from IT infrastructure to be combined with the unstructured supporting commentary.
This allows for new types of exploitation, such as machine learning and predictive analysis. The innovation begins with continuously ingesting the metrics and supporting commentary that describe current performance. Discovery evaluates the historical patterns of performance that build up over time, using machine learning to construct a model. These patterns in turn provide insights into the predictions that those signals often illustrate. Cases include variations in the efficiency of manufacturers’ disks for variables such as power consumption, developer team code performance, and the impact of training and certification. All of which enables further innovations and gains based on facts.
Flume
A resilient framework for delivering event data to the Hadoop cluster, built from Sources, Channels, and Sinks, as sketched below.
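As a rough conceptual sketch only, not Flume’s actual API, the Source/Channel/Sink decoupling can be pictured as producer and consumer threads around a bounded queue:

```python
# Conceptual model of Flume's Source -> Channel -> Sink decoupling,
# using a bounded queue as the channel. Illustration only, not Flume's API.
import queue
import threading

channel = queue.Queue(maxsize=10000)   # the channel buffers events

def source():
    """Reads an event stream and puts events on the channel."""
    for i in range(100):
        channel.put({"body": f"event {i}"})   # blocks if the channel is full

def sink():
    """Drains the channel and delivers events downstream (e.g. to HDFS)."""
    for _ in range(100):
        event = channel.get()
        print("deliver:", event["body"])
        channel.task_done()

threading.Thread(target=source).start()
threading.Thread(target=sink).start()
channel.join()   # events wait safely in the channel if the sink lags behind
```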
Kite is a set of libraries, tools, and features to build Hadoop applications
Morphlines provides configuration-driven tools that can extract facets using interceptors on the ingestion pipeline, enriching records with metadata
In this sample, all of the Apache web server logs are filtered for HTTP 408 errors. Faceting by country using a GeoIP lookup helps identify the source of the DDoS.
Slowloris is an old DDoS trick whereby a web client very slowly makes a connection to the web server; assuming Apache is patched to time out such connections, Slowloris is revealed by filtering on the 408 errors.
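A hedged Python sketch of that analysis; the log path is illustrative and lookup_country() is a hypothetical stand-in for a real GeoIP database query (e.g. a MaxMind database):

```python
# Sketch: filter Apache access logs for HTTP 408 (request timeout) and
# facet the offending client IPs by country.
import re
from collections import Counter

# Common Log Format: ip ident user [date] "request" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')

def lookup_country(ip):
    # Hypothetical stand-in for a real GeoIP database lookup.
    return "GB" if ip.startswith("81.") else "??"

counts = Counter()
with open("access.log") as log:
    for line in log:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "408":          # Slowloris clients time out with 408
            counts[lookup_country(m.group(1))] += 1

for country, n in counts.most_common():
    print(country, n)
```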
Oryx can continuously build models from a stream of data at large scale using Apache Hadoop. It also serves queries of those models in real time via an HTTP REST API, and can approximately update models in response to new streaming data. This two-tier design, comprising the Computation Layer and the Serving Layer respectively, implements a lambda architecture. Collaborative filtering works like “people who searched for this also searched for that”. Classification and regression are forms of supervised learning, where a value is predicted for new inputs based on known values for previous inputs; classification is often used for spam filters. Clustering groups data using algorithms such as k-means based on common features.
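As a minimal sketch of collaborative filtering, here is alternating least squares (ALS) from Spark MLlib; the ratings path and its user,item,rating layout are assumptions:

```python
# Sketch: collaborative filtering with Spark MLlib's ALS recommender.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="CollabFilter")
raw = sc.textFile("hdfs:///data/ratings.csv")          # assumed: user,item,rating
ratings = raw.map(lambda l: l.split(",")) \
             .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2])))

model = ALS.train(ratings, rank=10, iterations=10)     # factorize user-item matrix
print(model.recommendProducts(42, 5))                  # top-5 items for user 42
```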
Vectorising uses TF-IDF, term frequency–inverse document frequency, which infers how important a word might be in a document. The resulting vectors can then be classified using an algorithm such as naive Bayes.
TF-IDF is useful to extract as a feature that can then be clustered using k-means across a corpus of documents. It is often used by search engines to score and rank documents against a query. For example, a stream of data from a Twitter channel sharing a hashtag, as in the sketch below.
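A minimal Spark MLlib sketch of that pipeline: hash documents into term-frequency vectors, weight with IDF, then cluster with k-means. The input path and parameters are illustrative; for classification instead, the same TF-IDF vectors would be wrapped in LabeledPoints and passed to NaiveBayes.train:

```python
# Sketch: TF-IDF vectorization of a document corpus, then k-means clustering.
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF, IDF
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="TfIdfKMeans")
docs = sc.textFile("hdfs:///data/tweets").map(lambda line: line.split())

tf = HashingTF(numFeatures=10000).transform(docs)     # term frequencies per doc
tf.cache()                                            # IDF makes two passes
tfidf = IDF().fit(tf).transform(tf)                   # weight by inverse doc freq

model = KMeans.train(tfidf, k=8, maxIterations=20)    # group similar documents
print(model.clusterCenters[0])                        # inspect one cluster centre
```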
The choices are Oryx 1 (MapReduce-based), Oryx 2 (Spark-based), and Spark MLlib
Doing so in memory on Spark is good for iterative algorithms, avoiding the need to materialize the data, and for jobs such as Monte Carlo simulations
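The classic illustration is a Monte Carlo estimate of pi, where every trial stays in memory and nothing is materialized to disk:

```python
# Sketch: Monte Carlo estimation of pi on Spark. Each trial is independent,
# so the work parallelizes trivially and never touches disk.
import random
from pyspark import SparkContext

sc = SparkContext(appName="MonteCarloPi")
n = 10_000_000

def inside(_):
    # Sample a point in the unit square; test if it falls in the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1.0

hits = sc.parallelize(range(n), 100).filter(inside).count()
print("pi ~=", 4.0 * hits / n)
```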