© 2016 MapR Technologies
Advanced Threat Detection on Streaming Data
Carol McDonald, Solution Architect
Strata + Hadoop World March 2016
Meeting Advanced Threats Head On
•  Solutionary: Managed Security Services Provider
–  Provides Threat Intelligence as a Service
Real-time Detection of Advanced Threats
•  Objective:
–  Provide real-time threat intelligence on trillions of messages per year
–  Store and process large volumes of unstructured security data
–  Combine machine learning and predictive analytics
Event-based Detection of Advanced Threats
•  Threat Alerts
•  Store and Process Unstructured Data
•  Anomaly Detection
•  Real-time Threat Intelligence
•  Predictive Analytics
•  Machine Learning
Meeting Advanced Threats Head On
•  Challenges:
–  Expanding data storage in an RDBMS is expensive ($$)
–  Could not process unstructured data at scale
[Diagram labels: RDBMS Economics; Unstructured Data; Scaling Unstructured Data Processing Challenges]
What Did the Solution Need to Do?
[Diagram: Data Sources (Security Feeds, HTTP, Syslog, Firewall, Other) → Collect Data (?) → Process Data (?) → Store Data (?) → Serve Data (?)]
How to do this with High Performance at Scale?
•  Parallel, Partitioned = fast, scalable
Solution: Stream Processing Architecture
Data Ingest:
•  Kafka or MapR Streams: fast distributed messaging
[Diagram: Sources (Security Feeds, HTTP, Syslog, Firewall, Other) → Data Ingest → Topics]
Fast Distributed Messaging
•  Topics organize events into categories
•  Topics decouple producers from consumers
Fast Distributed Messaging
•  Topics are partitioned for fast throughput and scalability
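The two messaging ideas (topics decouple producers from consumers, and topics are partitioned) can be sketched in a few lines of Python. This is an in-memory illustration only, not the Kafka or MapR Streams API; the names `Topic`, `publish`, `poll`, the topic name, and the partition count are hypothetical choices made for this sketch.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a partitioned topic (illustration only)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.num_partitions = num_partitions
        # One append-only log per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked per (consumer, partition).
        self.offsets = defaultdict(int)

    def partition_for(self, key):
        # A stable hash keeps every event with the same key in the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def publish(self, key, event):
        """Producer side: append the event and return the partition it landed in."""
        p = self.partition_for(key)
        self.partitions[p].append(event)
        return p

    def poll(self, consumer, partition):
        """Consumer side: return events this consumer has not yet read from one partition."""
        start = self.offsets[(consumer, partition)]
        events = self.partitions[partition][start:]
        self.offsets[(consumer, partition)] = len(self.partitions[partition])
        return events


# Hypothetical usage: the producer never references a consumer directly.
topic = Topic("firewall-events")
p1 = topic.publish("10.0.0.5", {"ip_src": "10.0.0.5", "action": "deny"})
p2 = topic.publish("10.0.0.5", {"ip_src": "10.0.0.5", "action": "allow"})
first_read = topic.poll("alerting", p1)
second_read = topic.poll("alerting", p1)
```

Because each consumer keeps its own read position, a second consumer (say, an indexer) can read the same partition independently, and separate partitions can be read in parallel.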
How to do this with High Performance at Scale?
•  Parallel, Partitioned:
–  Messaging
Complex Event Processing with Storm and Esper
•  Stream Processing:
–  Storm: distributed real-time computation
–  Esper: Complex Event Processing
•  Cross-topology correlation of events
[Diagram: Data Ingest (Topics) → Stream Processing; component labels: Kafka Spout, Parser Bolt, Enrich Bolts, Esper, Kafka Bolt, Topic, Spout, Esper, Alert Bolts]
Complex Event Processing with Esper
•  Detect a related set or pattern of events within a time window
•  Example pattern, Excess Login Failure:
–  Same user, same source login failure
SELECT * FROM
Event(ip_src IS NOT NULL
AND ec_activity='Logon'
AND ec_outcome='Failure')
.std:groupwin(ip_src).win:time(300 sec)
GROUP BY ip_src HAVING COUNT(*) = 10
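To make the windowing logic of the pattern concrete, here is a rough Python sketch of the same idea: count failed logons per source IP over a sliding 300-second window and flag the event that brings the count to 10. It is an approximation for illustration, not Esper's evaluation semantics (for instance, it only evaluates when an event arrives, not when one expires). The class name `ExcessLoginFailureDetector` and the dict-based event shape are assumptions; the field names are taken from the query.

```python
from collections import defaultdict, deque


class ExcessLoginFailureDetector:
    """Sliding-window count of failed logons per source IP (illustrative sketch)."""

    def __init__(self, window_seconds=300, threshold=10):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # ip_src -> timestamps of failures still inside the window
        self.failures = defaultdict(deque)

    def on_event(self, event, now):
        """Return True when this event brings a source IP to exactly `threshold` failures."""
        ip_src = event.get("ip_src")
        if ip_src is None:
            return False
        if event.get("ec_activity") != "Logon" or event.get("ec_outcome") != "Failure":
            return False
        window = self.failures[ip_src]
        window.append(now)
        # Evict failures that have aged out of the time window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) == self.threshold


# Hypothetical usage: ten failures from one source, one second apart.
detector = ExcessLoginFailureDetector()
failure = {"ip_src": "10.0.0.5", "ec_activity": "Logon", "ec_outcome": "Failure"}
alerts = [detector.on_event(failure, now=t) for t in range(10)]
```

Keeping one window per `ip_src` mirrors the grouping in the query, and it is also what lets the work be partitioned by source IP across parallel workers.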
How to do this with High Performance at Scale?
•  Parallel, Partitioned:
–  Processing
Real-time Detection of Advanced Threats: Examples
•  Data transferred from critical database servers
•  Large traffic flows from a host to a given IP address
•  Employee accessing database servers at unusual hours
•  User logging in from two different countries within a short window
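As an illustration of the last example (a user logging in from two different countries within a short window), the check could be sketched as below. This is a simplified assumption-laden sketch, not the detection logic from the talk: it compares each login only with the user's previous login, assumes the country has already been attached to the event upstream (for example by an enrichment step), and uses a hypothetical class name `GeoLoginDetector` and a one-hour default window.

```python
class GeoLoginDetector:
    """Flags a login whose country differs from the user's previous login within a window."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        # user -> (timestamp, country) of the most recent login
        self.last_login = {}

    def on_login(self, user, country, timestamp):
        previous = self.last_login.get(user)
        # Always remember the latest login for the next comparison.
        self.last_login[user] = (timestamp, country)
        if previous is None:
            return False
        prev_time, prev_country = previous
        return country != prev_country and timestamp - prev_time <= self.window_seconds


# Hypothetical usage with timestamps in seconds.
geo = GeoLoginDetector(window_seconds=3600)
r1 = geo.on_login("alice", "US", 0)
r2 = geo.on_login("alice", "US", 600)
r3 = geo.on_login("alice", "DE", 1200)
r4 = geo.on_login("alice", "US", 1200 + 7200)
```

Since the state is keyed by user, events for different users can be handled by different parallel workers without coordination.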
Complex Event Processing with Storm and Esper
Cross-topology correlation of events
Solution: Stream Processing Architecture
•  NoSQL Storage
–  HBase: fast scalable storage and caching
–  Elasticsearch: indexing for real-time search analytics
[Diagram: Stream Processing → HDFS Bolt, Index Bolt, HBase Bolt → NoSQL Storage (MapR-FS, MapR-DB)]
Scalability with HBase (MapR-DB)
Storage model:
•  RDBMS: normalized schema → joins for queries can cause bottlenecks
•  HBase: de-normalized schema → data that is read together is stored together
[Diagram: three example tables, each with columns Key, colB, colC]
MapR-DB (HBase API) is Designed to Scale
•  Fast reads and writes by key!
•  Data is automatically partitioned by key range!
[Diagram: three key ranges, each mapped to its own table partition with columns Key, colB, colC]
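The idea of partitioning by key range can be sketched as follows. This is a toy in-memory model, not the HBase or MapR-DB API: the split points are fixed up front here, whereas the slide says the real system partitions automatically. The class name `KeyRangeTable`, the split keys, and the row keys are hypothetical.

```python
import bisect


class KeyRangeTable:
    """Toy table whose rows are spread over partitions by sorted key range."""

    def __init__(self, split_keys):
        # Sorted split points; partition i holds keys in [split_keys[i-1], split_keys[i]).
        self.split_keys = sorted(split_keys)
        self.partitions = [dict() for _ in range(len(self.split_keys) + 1)]

    def partition_for(self, row_key):
        # Binary search over the split points finds the owning partition.
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, row):
        self.partitions[self.partition_for(row_key)][row_key] = row

    def get(self, row_key):
        return self.partitions[self.partition_for(row_key)].get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (key, row) pairs with start_key <= key < stop_key, in key order."""
        first = self.partition_for(start_key)
        last = self.partition_for(stop_key)
        for p in range(first, last + 1):
            for key in sorted(self.partitions[p]):
                if start_key <= key < stop_key:
                    yield key, self.partitions[p][key]


# Hypothetical usage: three partitions split at "g" and "p".
table = KeyRangeTable(["g", "p"])
table.put("alice#001", {"ec_outcome": "Failure"})
table.put("mallory#001", {"ec_outcome": "Failure"})
table.put("zed#001", {"ec_outcome": "Success"})
scanned = [key for key, _ in table.scan("a", "n")]
```

A read or write by key touches exactly one partition, and a range scan touches only the partitions whose ranges overlap the requested keys, which is why this layout scales out.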
How to do this with High Performance at Scale?
•  Parallel, Partitioned:
–  Storage
Solution: Stream Processing Architecture
Serve Data:
•  Machine Learning
–  threat modeling
–  anomaly detection
•  Security Analytics
[Diagram: NoSQL Storage (MapR-FS, MapR-DB) → Serve Data]
Data Driven Forensics Investigation
•  What can the data tell us?
–  What happened within a time range?
–  How did the threat get in?
–  What are all the activities associated with a specific IP/user?
–  How much data was affected?
–  Has this occurred elsewhere in the past?
Solution: Stream Processing Architecture
Key to Real Time: Event-based Data Flows
Key to Scale = Parallel Partitioned:
•  Messaging
•  Processing
•  Storage
Building a Complete Data Architecture
[Diagram: Sources/Apps; Bulk Processing; Stream Processing; Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams)]
Key to Real Time: Convergence
[Diagram: Converged Data Platform]
•  Apps: Customer Experience; Data Architecture Optimization; Security Investigation & Event Management; Operational Intelligence; Managed Services & Custom Apps
•  Converged Data Platform: Event Streaming, Database, Storage
•  High Availability, Data Protection, Unified Security, Real Time, Multi-tenancy, Unified Management & Monitoring
Why Hadoop for Security Analytics?
•  Cost effective for storing and analyzing large volumes of data in real time
•  Provides search & query, machine learning for activity correlation and anomaly detection
•  When it comes to Hadoop, select an enterprise distribution (e.g. MapR Converged Data Platform) so you can focus on your primary objective
To Learn More:
•  http://learn.mapr.com/
To Learn More:
•  Download example code
–  https://github.com/caroljmcdonald/mapr-streams-sparkstreaming-hbase
•  Read explanation of example code
–  https://www.mapr.com/blog/spark-streaming-hbase
Q&A
Engage with us!
•  @mapr
•  mapr-technologies
•  https://www.mapr.com/blog/author/carol-mcdonald

More Related Content

What's hot

NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
MapR Technologies
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
Carol McDonald
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Carol McDonald
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
MapR Technologies
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Carol McDonald
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
Carol McDonald
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
MapR Technologies
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
Carol McDonald
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
MapR Technologies
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
MapR Technologies
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
Carol McDonald
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
MapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
MapR Technologies
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
MapR Technologies
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Carol McDonald
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
Ian Downard
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
Carol McDonald
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
MapR Technologies
 

What's hot (19)

NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
 

Viewers also liked

Real-Time Data Feeds Using the Streaming API
Real-Time Data Feeds Using the Streaming APIReal-Time Data Feeds Using the Streaming API
Real-Time Data Feeds Using the Streaming API
Salesforce Developers
 
Interoperable Web Services with JAX-WS and WSIT
Interoperable Web Services with JAX-WS and WSITInteroperable Web Services with JAX-WS and WSIT
Interoperable Web Services with JAX-WS and WSIT
Carol McDonald
 
Apache spark when things go wrong
Apache spark   when things go wrongApache spark   when things go wrong
Apache spark when things go wrong
Pawel Szulc
 
Introduction to type classes
Introduction to type classesIntroduction to type classes
Introduction to type classes
Pawel Szulc
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
Carol McDonald
 
Apache spark workshop
Apache spark workshopApache spark workshop
Apache spark workshop
Pawel Szulc
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
Wojciech Pituła
 
Kafka overview and use cases
Kafka overview and use casesKafka overview and use cases
Kafka overview and use cases
Indrajeet Kumar
 
Stock Prediction Using NLP and Deep Learning
Stock Prediction Using NLP and Deep Learning Stock Prediction Using NLP and Deep Learning
Stock Prediction Using NLP and Deep Learning
Keon Kim
 
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
DataStax Academy
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
Ran Silberman
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
Guido Schmutz
 

Viewers also liked (13)

Real-Time Data Feeds Using the Streaming API
Real-Time Data Feeds Using the Streaming APIReal-Time Data Feeds Using the Streaming API
Real-Time Data Feeds Using the Streaming API
 
Interoperable Web Services with JAX-WS and WSIT
Interoperable Web Services with JAX-WS and WSITInteroperable Web Services with JAX-WS and WSIT
Interoperable Web Services with JAX-WS and WSIT
 
Apache spark when things go wrong
Apache spark   when things go wrongApache spark   when things go wrong
Apache spark when things go wrong
 
Introduction to type classes
Introduction to type classesIntroduction to type classes
Introduction to type classes
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Apache spark workshop
Apache spark workshopApache spark workshop
Apache spark workshop
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Kafka overview and use cases
Kafka overview and use casesKafka overview and use cases
Kafka overview and use cases
 
Stock Prediction Using NLP and Deep Learning
Stock Prediction Using NLP and Deep Learning Stock Prediction Using NLP and Deep Learning
Stock Prediction Using NLP and Deep Learning
 
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Pl...
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 

Similar to Advanced Threat Detection on Streaming Data

Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
Nitin Kumar
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
MapR Technologies
 
Streaming in the Extreme
Streaming in the ExtremeStreaming in the Extreme
Streaming in the Extreme
Julius Remigio, CBIP
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
MapR Technologies
 
Is Spark Replacing Hadoop
Is Spark Replacing HadoopIs Spark Replacing Hadoop
Is Spark Replacing Hadoop
MapR Technologies
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data Pipelines
MapR Technologies
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
Mathieu Dumoulin
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
Codemotion
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Matt Stubbs
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
MapR Technologies
 
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
OW2
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
MapR Technologies
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital Transformation
MapR Technologies
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Codemotion
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR Technologies
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
MapR Technologies
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Mathieu Dumoulin
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
DataWorks Summit/Hadoop Summit
 
Handling the Extremes: Scaling and Streaming in Finance
Handling the Extremes: Scaling and Streaming in FinanceHandling the Extremes: Scaling and Streaming in Finance
Handling the Extremes: Scaling and Streaming in Finance
MapR Technologies
 

Similar to Advanced Threat Detection on Streaming Data (20)

Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Streaming in the Extreme
Streaming in the ExtremeStreaming in the Extreme
Streaming in the Extreme
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Is Spark Replacing Hadoop
Is Spark Replacing HadoopIs Spark Replacing Hadoop
Is Spark Replacing Hadoop
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data Pipelines
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
PrEstoCloud : PROACTIVE CLOUD RESOURCES MANAGEMENT AT THE EDGE FOR EFFICIENT ...
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital Transformation
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Handling the Extremes: Scaling and Streaming in Finance
Handling the Extremes: Scaling and Streaming in FinanceHandling the Extremes: Scaling and Streaming in Finance
Handling the Extremes: Scaling and Streaming in Finance
 

More from Carol McDonald

Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
Carol McDonald
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Advanced Threat Detection on Streaming Data

  • 1. Advanced Threat Detection on Streaming Data. Carol McDonald, Solution Architect. Strata + Hadoop World, March 2016. © 2016 MapR Technologies
  • 2. Meeting Advanced Threats Head On. Solutionary, a Managed Security Services Provider, provides Threat Intelligence as a Service.
  • 3. Real-time Detection of Advanced Threats. Objective: provide real-time threat intelligence on trillions of messages per year; store and process lots of unstructured security data; combine machine learning and predictive analytics.
  • 4. Event-based Detection of Advanced Threats. Diagram labels: Threat Alerts; Store and Process Unstructured Data; Anomaly Detection; Real-time Threat Intelligence; Predictive Analytics; Machine Learning.
  • 5. Meeting Advanced Threats Head On. Challenges: expanding data storage in an RDBMS was expensive ($$); unstructured data could not be processed at scale. Diagram labels: Scaling; Unstructured Data Processing; Challenges; RDBMS Economics; Unstructured Data.
  • 6. What Did the Solution Need to Do? Pipeline stages, each marked "?": Data Sources (Security Feeds, HTTP, Syslog, Firewall, Other); Collect Data; Process Data; Store Data; Serve Data.
  • 7. How to do this with High Performance at Scale? Parallel, partitioned = fast, scalable.
  • 8. Solution: Stream Processing Architecture. Data Ingest: Kafka or MapR Streams for fast distributed messaging. Diagram: sources (Security Feeds, HTTP, Syslog, Firewall, Other) feed topics.
  • 9. Fast Distributed Messaging. Topics organize events into categories; topics decouple producers from consumers.
  • 10. Fast Distributed Messaging. Topics are partitioned for fast throughput and scalability.
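
The partitioning idea on slides 9 and 10 can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Kafka or MapR Streams client API: the `Topic` class, `partition_for`, the partition count of 4, and the CRC32 hash are all assumptions made for the sketch (real brokers use their own partitioners). It shows that events with the same key always land in the same partition, so each partition can be consumed independently and in order.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """In-memory stand-in for a partitioned topic (illustrative only)."""

    def __init__(self, name: str, num_partitions: int = NUM_PARTITIONS):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: str, event: dict) -> int:
        # Producers only know the topic; the key decides the partition.
        index = partition_for(key, len(self.partitions))
        self.partitions[index].append(event)
        return index

    def consume(self, index: int) -> list:
        # Each consumer reads one partition, in arrival order.
        return list(self.partitions[index])


firewall = Topic("firewall")
for seq, ip in enumerate(["10.0.0.5", "10.0.0.9", "10.0.0.5", "10.0.0.7"]):
    firewall.publish(ip, {"ip_src": ip, "seq": seq})
```

Keying by source IP here is a design choice for the example: it keeps all events from one host in one partition, which is what a per-host correlation step downstream would want.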
  • 11. How to do this with High Performance at Scale? Parallel, partitioned: messaging.
  • 12. Complex Event Processing with Storm and Esper. Stream Processing: Storm for distributed real-time computation; Esper for Complex Event Processing. Diagram labels: Data Ingest; Topics; Kafka Spout; Parser Bolt; Enrich Bolts; Esper; Kafka Bolt; Esper Spout; Topic; Alert Bolts; cross-topology correlation of events.
  • 13. Complex Event Processing with Esper. Detect a related set or pattern of events within a time window. Example pattern, Excess Login Failure (same user, same source login failure): SELECT * FROM Event(ip_src IS NOT NULL AND ec_activity='Logon' AND ec_outcome = 'Failure').std:groupwin(ip_src).win:time(300 sec) GROUP BY ip_src HAVING COUNT(*) = 10
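
To make the windowing logic of the Excess Login Failure pattern concrete, here is a rough Python approximation of what the query expresses: keep login failures per `ip_src` for 300 seconds and signal when the count in the window reaches 10. The class name `LoginFailureDetector` and the `on_event` method are hypothetical, and the exact expiry and output behaviour of the Esper engine may differ from this sketch.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # from the slide: win:time(300 sec)
THRESHOLD = 10        # from the slide: HAVING COUNT(*) = 10


class LoginFailureDetector:
    """Sliding time window of login failures, grouped by source IP."""

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._failures = defaultdict(deque)  # ip_src -> failure timestamps

    def on_event(self, event: dict, now: float) -> bool:
        """Return True when this event brings its IP's count to the threshold."""
        ip = event.get("ip_src")
        if (ip is None
                or event.get("ec_activity") != "Logon"
                or event.get("ec_outcome") != "Failure"):
            return False  # filtered out, like the Event(...) filter
        window = self._failures[ip]
        window.append(now)
        # Drop failures that have left the time window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) == self.threshold


detector = LoginFailureDetector()
failure = {"ip_src": "10.0.0.5", "ec_activity": "Logon", "ec_outcome": "Failure"}
alerts = [detector.on_event(failure, now=t) for t in range(10)]
```

Because the state is kept per source IP, events can be partitioned by `ip_src` and each partition handled by a separate detector instance.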
  • 14. How to do this with High Performance at Scale? Parallel, partitioned: processing.
  • 15. Real-time Detection of Advanced Threats: Examples. Data transferred from critical database servers; large traffic flows from a host to a given IP address; employee accessing database servers at unusual hours; user logging in from two different countries within a short window.
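
As an illustration of the last example on slide 15 (a user logging in from two different countries within a short window), the following Python sketch keeps the most recent login per user and flags a country change inside the window. The `Login` record, the `country_change_alerts` function, and the one-hour default window are assumptions for the sketch; the slides do not say how Solutionary implemented this rule.

```python
from dataclasses import dataclass


@dataclass
class Login:
    user: str
    country: str
    timestamp: float  # seconds since some epoch


def country_change_alerts(logins, window_seconds=3600):
    """Return (user, previous_country, new_country) for suspicious logins."""
    last_seen = {}  # user -> most recent Login
    alerts = []
    for login in sorted(logins, key=lambda l: l.timestamp):
        prev = last_seen.get(login.user)
        if (prev is not None
                and prev.country != login.country
                and login.timestamp - prev.timestamp <= window_seconds):
            alerts.append((login.user, prev.country, login.country))
        last_seen[login.user] = login
    return alerts


sample = [
    Login("alice", "US", 0),
    Login("alice", "FR", 600),    # 10 minutes later, different country
    Login("bob", "US", 0),
    Login("bob", "US", 100),      # same country, no alert
    Login("carol", "US", 0),
    Login("carol", "DE", 10000),  # outside the window, no alert
]
found = country_change_alerts(sample)
```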
  • 16. Complex Event Processing with Storm and Esper. Cross-topology correlation of events.
  • 17. Solution: Stream Processing Architecture. NoSQL Storage: HBase for fast, scalable storage and caching; Elasticsearch for indexing for real-time search analytics. Diagram labels: Stream Processing; HDFS Bolt; Index Bolt; HBase Bolt; MapR-FS; MapR-DB.
  • 18. Scalability with HBase (MapR-DB). Storage model comparison. RDBMS: normalized schema → joins for queries can cause a bottleneck. HBase: de-normalized schema → data that is read together is stored together. Diagram: three tables with columns Key, colB, colC.
  • 19. MapR-DB (HBase API) is Designed to Scale. Fast reads and writes by key! Data is automatically partitioned by key range! Diagram: three key ranges, each holding a table with columns Key, colB, colC.
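
The key-range partitioning described on slide 19 can be sketched as follows. This is an in-memory illustration only, not the HBase or MapR-DB API: `KeyRangeTable`, its split keys, and the method names are hypothetical. It shows why reads and writes by key stay fast as data grows: a binary search over the split keys routes each row key to exactly one partition.

```python
import bisect


class KeyRangeTable:
    """Rows are routed to partitions by sorted key range (illustrative only)."""

    def __init__(self, split_keys):
        # N split keys define N + 1 contiguous key ranges.
        self.split_keys = sorted(split_keys)
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def region_for(self, row_key: str) -> int:
        # Binary search: O(log N) in the number of key ranges.
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key: str, columns: dict) -> None:
        region = self.regions[self.region_for(row_key)]
        region.setdefault(row_key, {}).update(columns)

    def get(self, row_key: str):
        return self.regions[self.region_for(row_key)].get(row_key)


table = KeyRangeTable(split_keys=["g", "n"])
table.put("alice", {"colB": "val"})
table.put("greg", {"colB": "val", "colC": "val"})
table.put("zed", {"colC": "val"})
```

With split keys "g" and "n", keys below "g" go to the first range, keys from "g" up to (but not including) "n" go to the second, and the rest go to the third.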
  • 20. How to do this with High Performance at Scale? Parallel, partitioned: storage.
  • 21. Solution: Stream Processing Architecture. Serve Data: Machine Learning (threat modeling, anomaly detection); Security Analytics. Diagram labels: NoSQL Storage; MapR-FS; MapR-DB.
  • 22. Data Driven Forensics Investigation. What can the data tell us? What happened within a time range? How did the threat get in? What are all the activities associated with a specific IP/user? How much data was affected? Has this occurred elsewhere in the past?
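
Two of the forensic questions on slide 22 (what happened within a time range, and what activity is associated with a specific IP) map naturally onto a range scan over sorted row keys. The Python sketch below assumes a composite row key of the form `ip|ISO-8601 timestamp`; that key layout, the `EventStore` class, and its methods are illustrative assumptions, not something stated in the slides.

```python
import bisect


def row_key(ip: str, ts: str) -> str:
    """Composite key: all events for one IP sort together, ordered by time."""
    return f"{ip}|{ts}"


class EventStore:
    """Sorted key-value store standing in for a table scanned by key range."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> event

    def put(self, ip: str, ts: str, event: dict) -> None:
        key = row_key(ip, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = event

    def activity(self, ip: str, start_ts: str, stop_ts: str) -> list:
        """Events for `ip` with start_ts <= timestamp < stop_ts, in time order."""
        lo = bisect.bisect_left(self._keys, row_key(ip, start_ts))
        hi = bisect.bisect_left(self._keys, row_key(ip, stop_ts))
        return [self._rows[k] for k in self._keys[lo:hi]]


store = EventStore()
store.put("10.0.0.5", "2016-03-01T10:00:00", {"action": "login failure"})
store.put("10.0.0.5", "2016-03-01T10:05:00", {"action": "login success"})
store.put("10.0.0.5", "2016-03-02T09:00:00", {"action": "db read"})
store.put("10.0.0.50", "2016-03-01T10:01:00", {"action": "login failure"})
march_first = store.activity("10.0.0.5", "2016-03-01T00:00:00", "2016-03-02T00:00:00")
```

ISO-8601 timestamps are used because they sort lexicographically in time order, so a time range becomes a contiguous slice of keys.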
  • 23. Solution: Stream Processing Architecture.
  • 24. Key to Real Time: Event-based Data Flows. Key to Scale = Parallel, Partitioned: Messaging; Processing; Storage.
  • 25. Building a Complete Data Architecture. Diagram labels: Sources/Apps; Stream Processing; Bulk Processing; Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams).
  • 26. Key to Real Time: Convergence. Diagram labels: Apps (Customer Experience; Data Architecture Optimization; Security Investigation & Event Management; Operational Intelligence; Managed Services & Custom Apps); Converged Data Platform (Event Streaming; Database; Storage); High Availability; Data Protection; Unified Security; Real Time; Multi-tenancy; Unified Management & Monitoring.
  • 27. Why Hadoop for Security Analytics? Cost effective for storing and analyzing large volumes of data in real time. Provides search and query, plus machine learning for activity correlation and anomaly detection. When it comes to Hadoop, select an enterprise distribution (e.g. MapR Converged Data Platform) so you can focus on your primary objective.
  • 28. To Learn More: http://learn.mapr.com/
  • 29. To Learn More: Download example code: https://github.com/caroljmcdonald/mapr-streams-sparkstreaming-hbase. Read explanation of example code: https://www.mapr.com/blog/spark-streaming-hbase
  • 30. Q&A. Engage with us! @mapr; mapr-technologies; https://www.mapr.com/blog/author/carol-mcdonald