Enhancing Threat Detection
with Big Data and AI
Michael Armbrust & Burak Yavuz
@michaelarmbrust
Bay Area Cyber Security Meetup – April 2018
2
About Us
Engineers on the StreamTeam @
Committers / PMC Members on
Created:
• Spark SQL – High-level, declarative queries on big data
• Structured Streaming – Low latency, incremental processing
• Databricks Delta - Massive Scale, Transactional Cloud Storage
3
Fast and General Cluster Computing
Scala
SQL
EC2
Kubernetes
Automatic Parallelism and Fault-tolerance
GCP
YARN
4
Apache Spark Philosophy
Unified engine for complete
data applications
High-level user-friendly APIs
SQL Streaming ML Graph
…
5
Write Less Code: Compute an Average
private IntWritable one =
  new IntWritable(1);
private IntWritable output =
  new IntWritable();

protected void map(
    LongWritable key,
    Text value,
    Context context) throws IOException, InterruptedException {
  String[] fields = value.toString().split("\t");
  output.set(Integer.parseInt(fields[1]));
  context.write(one, output);
}

IntWritable one = new IntWritable(1);
DoubleWritable average = new DoubleWritable();

protected void reduce(
    IntWritable key,
    Iterable<IntWritable> values,
    Context context) throws IOException, InterruptedException {
  int sum = 0;
  int count = 0;
  for (IntWritable value : values) {
    sum += value.get();
    count++;
  }
  average.set(sum / (double) count);
  context.write(key, average);
}
data = sc.textFile(...).map(lambda line: line.split("\t"))
data.map(lambda x: (x[0], [x[1], 1])) \
  .reduceByKey(lambda x, y: [x[0] + y[0], x[1] + y[1]]) \
  .map(lambda x: [x[0], x[1][0] / x[1][1]]) \
  .collect()
6
Write Less Code: Compute an Average
Using RDDs
data = sc.textFile(...).map(lambda line: line.split("\t"))
data.map(lambda x: (x[0], [x[1], 1])) \
  .reduceByKey(lambda x, y: [x[0] + y[0], x[1] + y[1]]) \
  .map(lambda x: [x[0], x[1][0] / x[1][1]]) \
  .collect()
Using DataFrames
sqlCtx.table("people") 
.groupBy("name") 
.agg(avg("age")) 
.collect()
Using SQL
SELECT name, avg(age)
FROM people
GROUP BY name
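All three variants above compute the same per-key average via the (sum, count) pattern. As a minimal plain-Python sketch of that logic (no Spark required; the sample data is made up for illustration):

```python
from collections import defaultdict

def average_by_key(rows):
    """Group (name, age) pairs by name and average the ages,
    accumulating (sum, count) per key just like the reduceByKey version."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum, count]
    for name, age in rows:
        totals[name][0] += age
        totals[name][1] += 1
    return {name: s / c for name, (s, c) in totals.items()}

people = [("alice", 30), ("bob", 40), ("alice", 34)]
# average_by_key(people) -> {"alice": 32.0, "bob": 40.0}
```

The DataFrame and SQL versions express exactly this aggregation declaratively, which is what lets Spark optimize it.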
7
Why should I care?
As a business analyst / data scientist / APT hunter, how does this help me?
8
Let's look at what has been
happening in AI…
9
Big Data was the Missing Link for AI
BIG DATA
Customer Data
Emails/Web pages
Click Streams
Sensor data (IoT)
Video/Speech
…
GREAT RESULTS
10
Hardest part of AI isn’t AI
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
The hardest part of AI is Big Data
ML Code
11
Does the same apply to
Security?
12
• Only a few weeks of data
• Very expensive to scale
• Proprietary formats
• No predictions (ML)
Messy data not ready
for analytics
DATA LAKE
Complex ETL
EDW
EDW
EDW
Incident Response
Alerting
Reports
SIEM
Security Data @ Fortune 100 Company
Security Infrastructure
IDS/IPS, DLP, antivirus, load
balancers, proxy servers
Cloud Infrastructure & Apps
AWS, Azure, Google Cloud, Audit Logs
Server Infrastructure
Linux, Unix, Windows
Network Infrastructure
Routers, switches, WAPs,
databases, LDAP
Threat Intelligence Feeds
Trillions of Records
13
DELTA
DATA LAKE
Reporting
Streaming
Analytics
The LOW-LATENCY of streaming
The RELIABILITY & PERFORMANCE of a data warehouse
The SCALE of a data lake
The Delta Architecture
14
An Example Hunt
Raise your hand when you know what we are searching for…
spark.read.table("dns")
  .where("length(query) > 50")
  .groupBy(window($"ts", "5 minutes"), $"src_ip")
  .count()
  .where("count > 20")
Answer: DNS Exfiltration Attack
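The query's logic — flag any source IP that issues more than 20 unusually long DNS queries inside a 5-minute window — can be sketched in plain Python. This mirrors the Spark query above; the record layout and thresholds are illustrative:

```python
from collections import Counter

WINDOW = 5 * 60  # 5-minute tumbling windows, in seconds

def dns_exfil_suspects(records, min_len=50, min_count=20):
    """records: iterable of (ts_seconds, src_ip, query) tuples.
    Returns the set of source IPs exceeding min_count long queries
    within any single window."""
    counts = Counter()
    for ts, src_ip, query in records:
        if len(query) > min_len:  # exfil tools encode data in long names
            counts[(ts // WINDOW, src_ip)] += 1
    return {ip for (_, ip), n in counts.items() if n > min_count}
```

A host tunneling data out over DNS encodes chunks of it in query names, so it produces a burst of abnormally long lookups — exactly what this aggregation surfaces.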
15
What about Future Threats?
Tune for specific SLAs using the same code:
Batch
high latency
execute
on-demand
high throughput
Micro-batch
medium latency
efficient resource
allocation
high throughput
Continuous
millisecond latency
static resource
allocation
16
What about Future Threats?
Tune for specific SLAs using the same code:
Batch
historical
hunting
Micro-batch
human-in-the-loop
alerts
Continuous
automatic
remediation
17
DNS Exfiltration Search
Rewritten as a streaming alert
spark.readStream.table("dns")
  .where("length(query) > 50")
  .groupBy(window($"ts", "5 minutes"), $"src_ip")
  .count()
  .where("count > 20")
  .writeStream
  .foreach(new PagerDutySink)
  .start()
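The foreach sink hands each output row to a user-supplied writer with an open/process/close lifecycle, which is what `PagerDutySink` would implement. A minimal plain-Python stand-in (the class and message format are illustrative; a real sink would call the paging API instead of appending to a list):

```python
class AlertSink:
    """Stand-in for a Structured Streaming foreach writer:
    open() is called once per partition/epoch, process() once per row,
    close() when the partition finishes."""

    def __init__(self):
        self.sent = []

    def open(self, partition_id, epoch_id):
        return True  # accept this partition; return False to skip it

    def process(self, row):
        # A real PagerDutySink would trigger a page here.
        self.sent.append(f"ALERT: {row['src_ip']} ({row['count']} long queries)")

    def close(self, error):
        pass  # release connections, surface errors
```

Because the same query runs in batch, micro-batch, or continuous mode, this one sink covers everything from historical hunting to automatic remediation.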
18
Demo: Hunting in
• Hosted Apache Spark in the Cloud
• Integrated Collaboration
• Monitoring / Alerting
19
15% Discount Code: BASMU
