Advanced Threat Detection on Streaming Data

© 2016 MapR Technologies
Carol McDonald, Solution Architect
Strata + Hadoop World March 2016

Meeting Advanced Threats Head On
•  Solutionary: Managed Security Services Provider
–  Provides Threat Intelligence as a Service

Real-time Detection of Advanced Threats
•  Objective:
–  Provide real-time threat intelligence on trillions of messages per year
–  Store and process large volumes of unstructured security data
–  Combine machine learning and predictive analytics

Event-based Detection of Advanced Threats
[Diagram: Threat Alerts; Store and Process Unstructured Data; Anomaly Detection; Real-time Threat Intelligence; Predictive Analytics; Machine Learning]

Meeting Advanced Threats Head On
•  Challenges:
–  Expanding data storage in an RDBMS was expensive
–  Could not process unstructured data at scale
[Diagram: Challenges: RDBMS Economics; Unstructured Data; Scaling Unstructured Data Processing]

What Did the Solution Need to Do?
•  Data Sources: security feeds, HTTP, syslog, firewall, other
•  Collect Data: ?
•  Process Data: ?
•  Store Data: ?
•  Serve Data: ?

How to do this with High Performance at Scale?
•  Parallel, partitioned = fast, scalable

Solution: Stream Processing Architecture
•  Data Ingest:
–  Kafka or MapR Streams: fast distributed messaging
[Diagram: Sources (security feeds, HTTP, syslog, firewall, other) and Topics in the Data Ingest layer]

Fast Distributed Messaging
•  Topics organize events into categories
•  Topics decouple producers from consumers

Fast Distributed Messaging
•  Topics are partitioned for fast throughput and scalability
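A minimal sketch of the partitioning idea, assuming a simple hash-by-key routing rule: a topic is modeled as a list of partitions, and each event is routed by a stable hash of its key, so events for one key stay in order on one partition while different keys can spread across partitions. This is an in-memory toy, not the Kafka or MapR Streams client API; `PartitionedTopic`, `partition_for`, `publish`, and the sample keys are hypothetical names.

```python
import zlib


class PartitionedTopic:
    """Toy in-memory topic; NOT the Kafka or MapR Streams client API."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list of (key, value) events.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A stable hash (CRC32) keeps the same key on the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        index = self.partition_for(key)
        self.partitions[index].append((key, value))
        return index


# Hypothetical usage: six events from two source IPs.
topic = PartitionedTopic("firewall-events", num_partitions=3)
for i in range(6):
    topic.publish("10.0.0.%d" % (i % 2), "event-%d" % i)
```

Because each partition is independent, separate consumers could read separate partitions in parallel, which is where the throughput gain would come from.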

•  Parallel, Partitioned:
–  Messaging

Complex Event Processing with Storm and Esper
•  Stream Processing:
–  Storm: distributed real-time computation
–  Esper: Complex Event Processing
[Diagram: Data Ingest topics and a Stream Processing topology with Kafka Spout, Parser Bolt, Enrich Bolts, Esper, Kafka Bolt, Topic, Esper Spout, and Alert Bolts; cross-topology correlation of events]
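The parse, enrich, and alert stages can be sketched as a chain of plain Python generators. This is only an illustration of the staged data flow, not the Storm API: the function names, the `k=v` log line format, the `ip_dst` field, and the `assets` lookup table are all assumptions made for the example.

```python
def parse(lines):
    """Parser stage: turn 'k=v k=v' text lines into dict events."""
    for line in lines:
        yield dict(field.split("=", 1) for field in line.split())


def enrich(events, asset_names):
    """Enrich stage: attach a label from a (hypothetical) asset table."""
    for event in events:
        event["asset"] = asset_names.get(event.get("ip_dst"), "unknown")
        yield event


def alert(events):
    """Alert stage: keep only failed logons against known assets."""
    for event in events:
        if event.get("ec_outcome") == "Failure" and event["asset"] != "unknown":
            yield "ALERT failed logon on %s from %s" % (
                event["asset"], event["ip_src"])


# Hypothetical input: two raw log lines and a one-entry asset table.
raw = [
    "ip_src=10.0.0.5 ip_dst=10.0.1.9 ec_activity=Logon ec_outcome=Failure",
    "ip_src=10.0.0.6 ip_dst=10.0.1.9 ec_activity=Logon ec_outcome=Success",
]
assets = {"10.0.1.9": "db-server-1"}
alerts = list(alert(enrich(parse(raw), assets)))
```

In a real topology each stage would run as many parallel tasks; here the stages run in a single process purely to show how events pass from one step to the next.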

Complex Event Processing with Esper
•  Detect a related set or pattern of events within a time window
•  Example pattern: excess login failures
–  Same user, same source login failure
    SELECT * FROM
      Event(ip_src IS NOT NULL
        AND ec_activity = 'Logon'
        AND ec_outcome = 'Failure')
      .std:groupwin(ip_src).win:time(300 sec)
    GROUP BY ip_src HAVING COUNT(*) = 10
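For readers unfamiliar with EPL, the same idea can be approximated in plain Python: keep a per-source sliding window of failure timestamps and flag the event that brings the count to the threshold. This is a rough sketch, not Esper, and Esper's exact output semantics may differ; `LoginFailureDetector` and `on_event` are hypothetical names, while the field names, the 300 second window, and the count of 10 come from the query above.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300
THRESHOLD = 10


class LoginFailureDetector:
    """Counts failed logons per ip_src inside a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.failures = defaultdict(deque)  # ip_src -> timestamps in window

    def on_event(self, event, now):
        """Return True when this event brings ip_src to `threshold` failures."""
        if event.get("ip_src") is None:
            return False
        if (event.get("ec_activity") != "Logon"
                or event.get("ec_outcome") != "Failure"):
            return False
        times = self.failures[event["ip_src"]]
        times.append(now)
        # Drop failures that have fallen out of the time window.
        while times and times[0] <= now - self.window:
            times.popleft()
        return len(times) == self.threshold


# Hypothetical usage: ten failures from one source, one second apart.
detector = LoginFailureDetector()
failure = {"ip_src": "10.0.0.5", "ec_activity": "Logon", "ec_outcome": "Failure"}
results = [detector.on_event(failure, now=t) for t in range(10)]
```

Only the tenth failure inside the window is flagged; failures spaced more than 300 seconds apart never accumulate to the threshold.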

•  Parallel, Partitioned:
–  Messaging
–  Processing

Real-time Detection of Advanced Threats: Examples
•  Data transferred from critical database servers
•  Large traffic flows from a host to a given IP address
•  Employee accessing database servers at unusual hours
•  User logging in from two different countries within a short window
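As one illustration, the last example (logins from two different countries within a short window) can be sketched with a small per-user state table. This is a toy under stated assumptions: the one hour window is an arbitrary choice, the country is assumed to have been resolved from the source IP upstream, and `GeoLoginDetector` and `on_login` are hypothetical names.

```python
class GeoLoginDetector:
    """Flags a user whose consecutive logins come from different
    countries within `window` seconds."""

    def __init__(self, window=3600):
        self.window = window  # assumed value; the slides only say "short"
        self.last_login = {}  # user -> (timestamp, country)

    def on_login(self, user, country, timestamp):
        previous = self.last_login.get(user)
        # Always remember the latest login for the next comparison.
        self.last_login[user] = (timestamp, country)
        if previous is None:
            return False
        prev_time, prev_country = previous
        return (country != prev_country
                and timestamp - prev_time <= self.window)


# Hypothetical usage: alice logs in from two countries ten minutes apart.
geo = GeoLoginDetector(window=3600)
first = geo.on_login("alice", "US", timestamp=0)
second = geo.on_login("alice", "DE", timestamp=600)
```

Here `first` is not flagged because there is no earlier login to compare against, while `second` is flagged because the country changed within the window.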

Complex Event Processing with Storm and Esper
Cross-topology correlation of events

•  NoSQL Storage
–  HBase: fast, scalable storage and caching
–  Elasticsearch: indexing for real-time search analytics
[Diagram: Stream Processing bolts (HDFS Bolt, Index Bolt, HBase Bolt) and NoSQL Storage (MapR-FS, MapR-DB)]

Scalability with HBase (MapR-DB)
Storage Model
•  RDBMS: normalized schema → joins for queries can cause a bottleneck
•  HBase: de-normalized schema → data that is read together is stored together
[Diagram: tables with columns Key, colB, colC]

MapR-DB (HBase API) is Designed to Scale
•  Fast reads and writes by key
•  Data is automatically partitioned by key range
[Diagram: a table split into three key ranges, each holding rows with columns Key, colB, colC]
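Key-range partitioning can be sketched as a sorted list of split keys plus a binary search that picks the partition for a row key. This is an in-memory toy, not the HBase or MapR-DB API; `KeyRangeTable`, the split keys `"h"` and `"p"`, and the sample rows are assumptions for illustration.

```python
import bisect


class KeyRangeTable:
    """Toy table split into regions by sorted split keys."""

    def __init__(self, split_keys):
        # Region i holds keys in [split_keys[i-1], split_keys[i]).
        self.split_keys = sorted(split_keys)
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def region_for(self, row_key):
        # Binary search: O(log n) in the number of regions.
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, columns):
        self.regions[self.region_for(row_key)][row_key] = columns

    def get(self, row_key):
        return self.regions[self.region_for(row_key)].get(row_key)


# Hypothetical usage: three regions split at "h" and "p".
table = KeyRangeTable(split_keys=["h", "p"])
table.put("alice", {"colB": "val"})
table.put("mallory", {"colB": "val"})
table.put("zed", {"colB": "val"})
```

A read or write touches only the one region that owns the key, so regions could live on different servers and be accessed in parallel.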

•  Parallel, Partitioned:
–  Messaging
–  Processing
–  Storage

Serve Data
•  Machine Learning
–  threat modeling
–  anomaly detection
•  Security Analytics
[Diagram: NoSQL Storage (MapR-FS, MapR-DB) and Serve Data]

Data Driven Forensics Investigation
•  What can the data tell us?
–  What happened within a time range?
–  How did the threat get in?
–  What are all the activities associated with a specific IP/user?
–  How much data was affected?
–  Has this occurred elsewhere in the past?
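Two of these questions (activity for a specific IP, and activity within a time range) map naturally onto a scan over sorted keys. The sketch below assumes a composite key of (IP, timestamp), which is an illustrative design choice and not something the slides specify; `EventStore`, `put`, and `scan` are hypothetical names, and the store is an in-memory stand-in for a key-ordered table.

```python
import bisect


class EventStore:
    """Toy sorted store keyed by (ip, timestamp)."""

    def __init__(self):
        self.keys = []    # sorted list of (ip, timestamp) tuples
        self.values = {}  # (ip, timestamp) -> activity description

    def put(self, ip, timestamp, activity):
        key = (ip, timestamp)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = activity

    def scan(self, ip, start, end):
        """Activities for `ip` with start <= timestamp < end, in time order."""
        lo = bisect.bisect_left(self.keys, (ip, start))
        hi = bisect.bisect_left(self.keys, (ip, end))
        return [self.values[k] for k in self.keys[lo:hi]]


# Hypothetical usage: what did 10.0.0.5 do in the first 300 seconds?
store = EventStore()
store.put("10.0.0.5", 100, "logon failure")
store.put("10.0.0.5", 250, "logon success")
store.put("10.0.0.5", 900, "large transfer")
store.put("10.0.0.7", 200, "logon success")
early = store.scan("10.0.0.5", 0, 300)
```

Because keys sharing an IP are adjacent and ordered by time, the scan reads one contiguous slice instead of filtering every event.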

Key to Real Time: Event-based Data Flows
Key to Scale = Parallel Partitioned:
•  Messaging
•  Processing
•  Storage

Building a Complete Data Architecture
[Diagram: Sources/Apps, Stream Processing, and Bulk Processing on top of MapR-FS (Web-Scale Storage), MapR-DB (Database), and MapR Streams (Event Streaming)]

Key to Real Time: Convergence
[Diagram: Converged Data Platform. Apps: Customer Experience, Data Architecture Optimization, Security Investigation & Event Management, Operational Intelligence, Managed Services & Custom Apps. Platform: Event Streaming, Database, Storage. Capabilities: High Availability, Data Protection, Unified Security, Real Time, Multi-tenancy, Unified Management & Monitoring]

Why Hadoop for Security Analytics?
•  Cost effective for storing and analyzing large volumes of data in real time
•  Provides search and query, plus machine learning for activity correlation and anomaly detection
•  When it comes to Hadoop, select an enterprise distribution (e.g. MapR Converged Data Platform) so you can focus on your primary objective

To Learn More:
•  http://learn.mapr.com/

To Learn More:
•  Download example code
–  https://github.com/caroljmcdonald/mapr-streams-sparkstreaming-hbase
•  Read explanation of example code
–  https://www.mapr.com/blog/spark-streaming-hbase

Q&A
@mapr
https://www.mapr.com/blog/author/carol-mcdonald
Engage with us!
mapr-technologies
