DC Spark bake off - Realtime TCP Packet Analysis using Spark and Azure Event Hubs

Washington DC Area Apache Spark Interactive
Spark Bake-off
Team Name: Silvio Fiorito
Solution Title: Real-time Packet Analysis using Spark

Team Introductions
• Silvio Fiorito
– Background in development and app security
– Started working with Hadoop in 2012
– Started using Spark at v0.6 in early 2013
– Built a few prototypes for low-latency query services with Spark/Shark and then Spark SQL
– Twitter: @granturing

Solution Overview
• Real-time TCP packet analysis of geographically distributed hosts
– Must support high throughput from many hosts
– 3 demo VMs (2 x Azure and 1 x AWS)
• Local Flume agent pushes events to Azure Event Hub
• Events are partitioned and persisted for up to 7 days
• Spark Streaming app ingests the streams
– Reconstructs packets
– Looks up geo-IP and port descriptions
– Clusters using a pre-trained k-means model
– Saves data to Azure Table Storage and publishes to a Service Bus Topic
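
The per-event steps listed above (reconstruct, look up, cluster) can be illustrated with a small plain-Python sketch. It is not the code from the talk: the deck does not show the event format, the feature vector, or the lookup tables, so the `PORT_DESCRIPTIONS` table, the `CENTROIDS` values, and the choice of (destination port, payload length) as features are all assumptions made for illustration. The geo-IP lookup is left out because it needs an external database, and in the real solution this logic would presumably run inside the Spark Streaming job rather than as standalone functions.

```python
import math
import struct

# Hypothetical lookup table; the deck does not say which port database was used.
PORT_DESCRIPTIONS = {22: "ssh", 80: "http", 443: "https"}

# Hypothetical pre-trained centroids over (destination port, payload length).
CENTROIDS = [(80.0, 500.0), (443.0, 1200.0), (22.0, 100.0)]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    src, dst, seq, ack, off_flags, _window, _checksum, _urgent = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    header_len = (off_flags >> 12) * 4  # data offset is counted in 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "flags": off_flags & 0x00FF,  # CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
        "payload_len": max(0, len(segment) - header_len),
    }


def nearest_centroid(point, centroids) -> int:
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def enrich(segment: bytes) -> dict:
    """Parse one TCP segment, add a port description and a cluster id."""
    event = parse_tcp_header(segment)
    event["port_desc"] = PORT_DESCRIPTIONS.get(event["dst_port"], "unknown")
    features = (float(event["dst_port"]), float(event["payload_len"]))
    event["cluster"] = nearest_centroid(features, CENTROIDS)
    return event
```

Assigning a point to its nearest centroid is what prediction with a trained k-means model amounts to, so a pre-trained model only needs its centroids shipped to the streaming job.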

Sample Dashboard with Power BI

Final Comments & Questions
• With more time
– Add true anomaly detection with MLlib
– Test on hosts with real traffic
– Wire up end-to-end with a d3.js visualization and Spark SQL backend
– Integrate with existing IDS/IPS rules
– Add a lookup for known bad IPs
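
The deck does not say how anomaly detection would be built. One common way to extend a k-means model in that direction, shown here only as an assumption and in plain Python rather than MLlib, is to flag points whose distance to the nearest centroid exceeds a threshold taken from the training data. The function names and the 0.99 quantile are hypothetical.

```python
import math


def nearest_distance(point, centroids) -> float:
    """Euclidean distance from point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def fit_threshold(training_points, centroids, quantile=0.99) -> float:
    """Pick a distance threshold as an empirical quantile of training distances."""
    distances = sorted(nearest_distance(p, centroids) for p in training_points)
    if not distances:
        raise ValueError("need at least one training point")
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return distances[index]


def is_anomaly(point, centroids, threshold) -> bool:
    """Flag a point that lies farther from every centroid than the threshold."""
    return nearest_distance(point, centroids) > threshold
```

The threshold trades false alarms against missed events, so it would need tuning against real traffic, which ties in with the second item in the list above.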
