© 2016 Interset Software Inc. 1
CATCHING BAD GUYS WITH MATH:
REAL WORLD DATA SCIENCE USE CASES FOR
CYBERATTACK DETECTION AND PREVENTION
Machine Learning Ottawa Meetup
Ottawa, September 2016
For internal use only. Confidential.
© 2016 Interset Software Inc. 2
Hey. I’m Stephan Jou. I like analytics.
• CTO at Interset
• Previously: Cognos and IBM’s Business
Analytics CTO Office
• Big data analytics, visualization, cloud,
predictive analytics, data mining, neural
networks, mobile, dashboarding and
semantic search
• M.Sc. in Computational Neuroscience
and Biomedical Engineering, and a dual
B.Sc. in Computer Science and Human
Physiology, all from the University of
Toronto
© 2016 Interset Software Inc. 3
Enterprise Threat
Detection Through The
Science Of
Behavioral Analytics
What I Do
© 2016 Interset Software Inc. 4
About Interset
At Interset, we catch bad guys with math.
• Data science and machine learning on big data
analytics technologies
• Cover multiple cybersecurity use cases
• Based in Ottawa, ON
• Award winning threat detection platform
• Successful deployments across multiple verticals
• Clients include US Intelligence Communities.
And a leader in security analytics.
© 2016 Interset Software Inc. 5
Effective and Principled Math
© 2016 Interset Software Inc. 6
Highly Scalable
© 2016 Interset Software Inc. 7
Intuitive and Consumable
© 2016 Interset Software Inc. 8
What Do These Attacks Have in Common?
© 2016 Interset Software Inc. 9
Lessons:
• Make sure your partners are secure
– Hacked (SQL Injection) a partner with a weak network
– Stole user names and passwords
• Identities & machines are “entities”
– They acted in highly anomalous ways
– Moved large amounts of data
– Moved data to exfiltration points
– At four companies and the US Army!
There was plenty of evidence and time if only it was visible!
“if we do this right, we will make a million dollars each…” “we could have already sold them for Bitcoins which would have been untraceable if we did it right. It could have already been easily an easy 50 grand.”
Who Are These Guys?
© 2016 Interset Software Inc. 10
Lessons:
• Disgruntled insider employees can be a risk
• What were the anomalies?
– Copied 16,000 documents within five days of receiving severance
There was plenty of evidence and time if only it was visible!
Who Are These Two?
© 2016 Interset Software Inc. 11
Lessons:
• Most attacks are from users/identities with proper access
• Attacker stayed under the radar for years
• Third parties (US Intelligence) most often uncover the attack
• What were the anomalies?
– Accessing data not related to his job
– Moving data, over time, in ways that users in the same role were not
– Money problems
There was plenty of evidence and time if only it was visible!
And This Guy?
© 2016 Interset Software Inc. 12
Attackers are Fast
Defenders are Slow
Alert Overload
AV-test.org, 2015
64% of US companies face 10,000+ alerts per month
203 days: breaches take on average 80 days to discover and 123 days to resolve
60% of data is stolen in hours
54% of breaches remain undiscovered after 6 months
Ponemon Institute, 2015
Missed Incidents
8% of incidents are detected by endpoint, firewall & network solutions
Verizon DBIR, 2014
Drowning In Data And Missing Incidents
© 2016 Interset Software Inc. 13
Advanced Threats = Enterprise Where’s “Bad” Waldo
© 2016 Interset Software Inc. 14
Advanced Threats = Enterprise Where’s “Bad” Waldo
© 2016 Interset Software Inc. 15
Kung Fu Move #1: (Big) Data
© 2016 Interset Software Inc. 16
Mandiant APT Attack Lifecycle
© 2016 Interset Software Inc. 17
Standard Approach: Active Directory and Firewall
Identity Store:
Unusual login behavior
Geo-velocity
Network Flow:
High risk exfiltrations
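Geo-velocity, listed above, is commonly understood as the travel speed implied by two consecutive logins from different places. The following is a minimal sketch under that reading, assuming each login carries a timestamp and a latitude/longitude. The function names, the sample coordinates and the 1000 km/h cutoff are illustrative assumptions, not details from the talk.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # hypothetical cutoff, roughly airliner speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_velocity_kmh(login_a, login_b):
    """Speed implied by two logins, each given as (datetime, lat, lon)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = login_a, login_b
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        # Same instant: impossible if the locations differ at all.
        return math.inf if distance > 0 else 0.0
    return distance / hours


# Illustrative logins one hour apart (approximate city coordinates).
ottawa = (datetime(2016, 9, 19, 9, 0), 45.42, -75.70)
beijing = (datetime(2016, 9, 19, 10, 0), 39.90, 116.41)
speed = geo_velocity_kmh(ottawa, beijing)
is_suspicious = speed > MAX_PLAUSIBLE_KMH
print(is_suspicious)
```

A fixed cutoff is itself a threshold rule; a probabilistic model could treat the implied speed as one more feature instead.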
© 2016 Interset Software Inc. 18
Better Data = Faster and Better Detection
Identity Store:
Unusual login behavior
Geo-velocity
Identity Store:
Unusual machine logins
Endpoint:
Unusual data transfers, access
Large local data storage
Endpoint:
Unusual scheduled tasks
Unusual registry writes
Endpoint:
Unusual command line tools
Unusual registry reads
Endpoint:
Unusual network share access
Identity Store:
Unusual machine access
Endpoint:
Unusual use of compression
Unusual, large data transfer
Endpoint:
Unusual command line tools
Data Repositories:
Unusual or dangerous access
Unusual lateral movement
Dangerous behavioral changes
Network Flow:
High risk exfiltrations
© 2016 Interset Software Inc. 19
Kung Fu Move #2: Math
Source: Competing on Analytics, Davenport and Harris, 2007
Standard Reporting
Ad hoc Reporting
Query/Drill Down
Alerts
Forecasting
Simulation
Predictive Modeling
In memory data, fuzzy search, geo spatial
Causality, probabilistic, confidence levels
High fidelity, games, data farming
Larger data sets, nonlinear regression
Rules/triggers, context sensitive, complex events
Query by example, user defined reports
Real time, visualizations, user interaction
Traditional
Optimization Decision complexity, solution speed
New Data
Entity Resolution
Annotation and Tokenization
Relationship, Feature Extraction
People, roles, locations, things
Rules, semantic inferencing, matching
Automated, crowd sourced
Optimization under Uncertainty Quantifying or mitigating risk
Adaptive Analysis
Continual Analysis Responding to local change/feedback
Responding to context
New Methods
Today.
© 2016 Interset Software Inc. 20
Standard Approach – Rules and Thresholds
A Pattern for Increased Monitoring for Intellectual Property Theft by Departing Insiders, Andrew Moore, Carnegie Mellon 2011
© 2016 Interset Software Inc. 21
The Threshold Approach Challenge
Abnormal
Normal
© 2016 Interset Software Inc. 22
The Threshold Approach Challenge
Abnormal
Normal
© 2016 Interset Software Inc. 23
The Threshold Approach Challenge
Abnormal
Normal
© 2016 Interset Software Inc. 24
A Probabilistic Approach
• Computes probability that a value in
a given hour is anomalous
• Bayesian approach
• Explicitly models both normal and
abnormal distributions
• Gaussian, Gamma
• Estimators for both normal and
abnormal based on observation
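The bullets above can be read as a two-class Bayes rule: given an observed value, compare its likelihood under a "normal" model and an "abnormal" model, weighted by a prior. The sketch below assumes a Gaussian for normal behavior and a Gamma for abnormal behavior. The slide names both distributions but does not say which models which, so that pairing, the parameter values and all function names are assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def p_anomalous(x, prior_abnormal, normal_params, abnormal_params):
    """Posterior P(abnormal | x) from Bayes' rule over the two models."""
    like_normal = gaussian_pdf(x, *normal_params)
    like_abnormal = gamma_pdf(x, *abnormal_params)
    numerator = prior_abnormal * like_abnormal
    denominator = numerator + (1 - prior_abnormal) * like_normal
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical hourly file-copy counts: normal around 20, abnormal far larger.
normal_params = (20.0, 5.0)      # mean, standard deviation (assumed)
abnormal_params = (4.0, 50.0)    # shape, scale (assumed)
prior = 0.01                     # assumed prior probability of abnormal hours

typical_hour = p_anomalous(25, prior, normal_params, abnormal_params)
extreme_hour = p_anomalous(200, prior, normal_params, abnormal_params)
print(typical_hour, extreme_hour)
```

Unlike a fixed threshold, the output is a probability that moves smoothly with the observation, and each entity can have its own estimated parameters.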
© 2016 Interset Software Inc. 25
USB drives are marked as
high risk method
Method
The volume of copying is large, compared
to John’s past 30 days and
compared to other sysadmins
Activity
John Sneakypants is a
contractor and sysadmin
with privileged access
User/Machine
These files have a high risk
and importance value
Asset
Behavioral Risk
John Sneakypants is copying an unusually large number of
sensitive files to an external USB drive.
© 2016 Interset Software Inc. 26
Aggregating Behaviors for Entity Risk
Activity
User/Machine Asset Method
Behavioral
Risk Score
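$$R_{entity} = \mathrm{importance}(t) \times \mathrm{vulnerability}(t)$$
User
Asset
Machine
$$R_{behavior} = P(\mathrm{event} \mid y) \times w_y \times \frac{w_u \sum_{u \in U} 2^{-i} \cdot R_u[i] + w_f \sum_{f \in F} 2^{-j} \cdot R_f[j] + w_m \sum_{m \in M} 2^{-k} \cdot R_m[k]}{w_u + w_f + w_m}$$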
© 2016 Interset Software Inc. 27
Activity
User/Machine Asset Method
Behavioral
Risk Score
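$$R_{entity} = \mathrm{importance}(t) \times \mathrm{vulnerability}(t)$$
User
Asset
Machine
$$R_{behavior} = P(\mathrm{event} \mid y) \times w_y \times \frac{w_u \sum_{u \in U} 2^{-i} \cdot R_u[i] + w_f \sum_{f \in F} 2^{-j} \cdot R_f[j] + w_m \sum_{m \in M} 2^{-k} \cdot R_m[k]}{w_u + w_f + w_m}$$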
Aggregating Behaviors for Entity Risk
• John Compromised is accessing an unusual, important network share (risk score: 25)
• … at a time of day when he was almost never active before (risk score: 46)
• … and took from a source code project that has been inactive for months (risk score: 80)
• … and just copied an unusual amount of sensitive files to a cloud service (risk score: 96)
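The behavioral risk formula on this slide multiplies the event probability and model weight by a weighted average of three damped sums (users, files, machines). The sketch below is one way to evaluate it. It assumes the indices i, j, k rank entities by descending risk starting at 1, and that entity risks lie in [0, 1]. Neither assumption is stated on the slide, the input values are made up, and the result is not meant to reproduce the scores in the bullets above.

```python
def damped_sum(risks):
    """Sum of risks ranked high to low, the i-th (from 1) scaled by 2**-i."""
    ranked = sorted(risks, reverse=True)
    return sum(r * 2.0 ** -i for i, r in enumerate(ranked, start=1))


def behavior_risk(p_event, w_y, user_risks, file_risks, machine_risks,
                  w_u=1.0, w_f=1.0, w_m=1.0):
    """R_behavior = P(event|y) * w_y * weighted average of the damped sums."""
    weighted = (w_u * damped_sum(user_risks)
                + w_f * damped_sum(file_risks)
                + w_m * damped_sum(machine_risks))
    return p_event * w_y * weighted / (w_u + w_f + w_m)


# Hypothetical inputs: one user, two files, one machine involved in the event.
score = behavior_risk(
    p_event=0.9, w_y=1.0,
    user_risks=[0.8],
    file_risks=[0.4, 0.6],
    machine_risks=[0.2],
)
print(round(score, 4))
```

With the 2^-i damping, the riskiest entity of each kind dominates and long lists of low-risk entities add little, which keeps the score bounded.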
© 2016 Interset Software Inc. 28
Example of a Story
© 2016 Interset Software Inc. 29
Data Feature Derivation
Roybatty had 30 login failures on Sharepoint-Finance
Roybatty logged into Corp32_ActiveDirectory
Roybatty took 2.5GB from Sharepoint-Finance
Roybatty launched telnet from the command line
Roybatty launched psexec from the command line
(other features)
(other features)
(other features)
Active Directory Logs
SharePoint Logs
Endpoint Logs
Anomaly Detection
Login failure anomaly model
Destination access anomaly model
Story Aggregation
∑ over anomaly probabilities p_1, p_2, p_3, p_4, p_5 with weights w_1, w_2, w_3, w_4, w_5
0.80
Volumetric models
Executable Anomaly Models
Multi-Level Risk Scoring for Reduced Noise and False Positives
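The story aggregation step combines per-model anomaly probabilities p with weights w. The slide shows only a summation symbol, so the normalized weighted average below is an assumed form, and the probabilities and weights attached to the Roybatty events are invented for illustration.

```python
def story_score(anomalies):
    """Weighted average of anomaly probabilities.

    anomalies: list of (description, probability, weight) tuples.
    """
    total_weight = sum(weight for _, _, weight in anomalies)
    if total_weight == 0:
        return 0.0
    return sum(p * weight for _, p, weight in anomalies) / total_weight


def top_contributors(anomalies, n=3):
    """Descriptions of the n anomalies with the largest weighted probability."""
    ranked = sorted(anomalies, key=lambda a: a[1] * a[2], reverse=True)
    return [description for description, _, _ in ranked[:n]]


# Hypothetical model outputs for one user over one time window.
roybatty = [
    ("30 login failures on Sharepoint-Finance", 0.9, 2.0),
    ("took 2.5GB from Sharepoint-Finance", 0.8, 3.0),
    ("launched psexec from the command line", 0.6, 1.0),
]
print(round(story_score(roybatty), 3))
print(top_contributors(roybatty, n=1))
```

Scoring the story rather than each event means a single noisy model output does not raise an alert on its own.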
© 2016 Interset Software Inc. 30
Process Pipeline
Use Case
Data
Exploratory Data Analysis
Production Deployment
Results
• Acquisition
• Cleanup
• Normalization
• Semantics
• Feature engineering
• Model design
• Model validation
• Object model
• Data ingest
• Model development
• Test and performance
© 2016 Interset Software Inc. 31
Exploratory Data Analysis
• 30 days of historical log data
from a new data source
• Interview and validate with
stakeholders
• Validate or tune existing
models
• Design and apply new models
• Dataset management
• Feature engineering
• Documentation
• R and R Studio
• R Markdown
Data
Acquisition
Data
Exploration
Feature
Engineering
Model
Design
Model
Validation
© 2016 Interset Software Inc. 32
Production Deployment
• Update ingest and object model
• R to Spark and Phoenix code
• Models to batch and real-time
• Model efficacy
• Scalability and performance
• Flume and Kafka
• Spark, Phoenix and HBase
• Elasticsearch and Kibana
Object Model
Data Ingest
Model
Development
Model
Development
Test and
Performance
© 2016 Interset Software Inc. 33
ANALYTICS REPORTING
Technology Stack for Big Data Analytics
S3
nginx
Cassandra
Flow
Ingest
(Flume)
Kafka
Active Directory
Workflow (Storm)
HDFS
Analytics
Phoenix + HBase
Spark
Index
Elasticsearch
Auth DB (H2)
Investigator
nginx
Kibana
Agents
Sensors (real-time)
Connectors
Connectors (batch)
ZooKeeper
/reports
/v1
/v2
/rules
/webadmin
/api
/api/es
/api/flow
/login
/investigator
Endpoint events
Endpoint entities
Analytics
Models
Stories
Risk scores
Events
© 2016 Interset Software Inc. 34
Use Case #1: $20B Manufacturer
X
2 Engineers
stole data
1 Year
$1 Million Spent
Large security
vendor failed to
find anything
2 Weeks
Easily
identified the 2
Engineers
Found 3
additional users
stealing data in
North America
Found 8
additional users
stealing data in
China
© 2016 Interset Software Inc. 35
2014/02/12 04:30:10 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/1a6f0f1fa86a9f3bab6b0c45c08cb1b8
2014/02/12 04:30:10 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/1a6f0f1fa86a9f3bab6b0c45c08cb1b8
2014/02/12 04:30:13 aa97936fbf29c14e368bba98731b91cd@9e551a32654b80e97a6d13314e76d898 10.43.32.127/10.231.216.182 sync 01659aa0f0788e3bccd7820f597b57ea/702dc192ef623c4b3059081dbdfafdb8
2014/02/12 04:30:14 04f0663486333d41b1d3b74d6832b931@32e96546e5d6e3c7cd400d8248e8c763 10.43.32.127/10.222.224.128 diff 72c6e05c9738cf7c2ac4c4cbbbc24de4/562d059099b70ce2bdd561b73295f811
2014/02/12 04:30:14 04f0663486333d41b1d3b74d6832b931@32e96546e5d6e3c7cd400d8248e8c763 10.43.32.127/10.222.224.128 diff 72c6e05c9738cf7c2ac4c4cbbbc24de4/29946458dbc115c28d45c55fe697ad2f
2014/02/12 04:30:14 aa97936fbf29c14e368bba98731b91cd@9e551a32654b80e97a6d13314e76d898 10.43.32.127/10.231.216.182 sync 01659aa0f0788e3bccd7820f597b57ea/1fc3a0f36eaf8ea6775f1c1da6feb7f0
2014/02/12 04:30:20 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26
2014/02/12 04:30:31 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/284d07c70388732120f16a478a461832
2014/02/12 04:30:40 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/18bdc00d72c208a40c40f9a195965a89
2014/02/12 04:30:40 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26
2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/c8fdcc251ef840b873f478689700371c
2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/5ac41d116ced8a2fad2475d9c95340cd
2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7b9842f6dc6385575550abb52f49c3fc 10.43.32.127/10.43.84.71 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26
2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/ca0d8e5cb51627a18583c2649a126f1a
2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/c05f02c9804108cc1221d9e76e39f604
2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/a1c7c11b8eaa260ee2abe1fbba926198
2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/d8d6ac09c3d92b1c013cc92e3ffb0dc7
Timestamp Engineer Account IP Address
Action
(diff, sync, print, etc.)
Resource
(project, folder, file, etc.)
The Data: IP Access Logs
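Each log line above has six whitespace-separated fields matching the column labels: date, time, engineer account, an IP pair, an action and a resource. The sketch below parses one line into a typed record. Reading the IP pair as source/destination is an assumption, and the `AccessEvent` type and function name are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class AccessEvent(NamedTuple):
    timestamp: datetime
    account: str
    source_ip: str   # assumed meaning of the first address in the pair
    dest_ip: str     # assumed meaning of the second address in the pair
    action: str
    resource: str


def parse_access_line(line: str) -> AccessEvent:
    """Split one access-log line into its six fields."""
    date_part, time_part, account, ips, action, resource = line.split()
    source_ip, dest_ip = ips.split("/")
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    return AccessEvent(timestamp, account, source_ip, dest_ip, action, resource)


sample = (
    "2014/02/12 04:30:20 "
    "322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c "
    "10.43.32.127/10.43.84.90 print "
    "b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26"
)
event = parse_access_line(sample)

# Per-account action counts are a simple starting feature for volumetric models.
actions_by_account = Counter((e.account, e.action) for e in [event])
print(event.action, event.timestamp.isoformat())
```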
© 2016 Interset Software Inc. 36
The Math: 20+ Probabilistic Threat Models
Four classes of our threat models:
• Sneaking: Doing something out of pattern, unusual time
• Hoarding: Taking more than expected
• Mooching: Dangerous change in give versus take
• Wandering: Accessing unusual or unexpected IP
© 2016 Interset Software Inc. 37
Model: Unusual times
• Monitor, for each user, activity of
interest (e.g. start times of when a
file or window is brought into focus)
• Active times used as input into
Gaussian kernel density estimators
• Times that contain 95% of activity
deemed to be “normal”
• P(y is bad) at a given time is ratio of
expected activity to 95% activity line
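The steps above can be sketched with a hand-rolled Gaussian kernel density estimate over hour-of-day. The slide's wording for P(y is bad) is terse, so the score here, one minus the ratio of estimated density to the density level bounding 95% of activity (clipped at zero), is only one possible reading. The bandwidth, the sample hours and the function names are assumptions, and midnight wraparound is ignored for brevity.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def activity_line(density, grid, coverage=0.95):
    """Density level whose highest-density grid points hold `coverage` of the mass."""
    values = sorted((density(x) for x in grid), reverse=True)
    total = sum(values)
    running = 0.0
    for value in values:
        running += value
        if running >= coverage * total:
            return value
    return values[-1]


def p_unusual_time(hour, density, line):
    """0 inside the normal region, approaching 1 as expected activity vanishes."""
    return max(0.0, 1.0 - density(hour) / line)


# Hypothetical activity start times (hours of the day) for one user.
active_hours = [9.0, 9.5, 10.0, 10.5, 11.0, 13.0, 14.0, 14.5, 15.0, 16.0, 16.5, 17.0]
density = gaussian_kde(active_hours, bandwidth=1.0)
grid = [quarter / 4 for quarter in range(96)]  # every 15 minutes
line = activity_line(density, grid)

mid_morning = p_unusual_time(10.0, density, line)
small_hours = p_unusual_time(3.0, density, line)
print(mid_morning, small_hours)
```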
© 2016 Interset Software Inc. 38
Model: Anomalous Activity Patterns
Engineer 1
• Regularly works six days a week (takes
Sundays off)
• Slight dip during lunches
Engineer 2
• Works five days a week
• Particularly active on Thursdays
© 2016 Interset Software Inc. 39
Model: Anomalous Activity Patterns
Engineer 1
• Starts work fairly early in morning
• Early lunch break
• Sometimes works past midnight
Engineer 2
• Doesn’t work as long hours as Engineer 1
• 9 to 5’er
• Has occasionally worked a little bit after
8pm
© 2016 Interset Software Inc. 40
Model: Unusual IP Access
• Nodes: source code projects
• Edges: engineers accessing
projects
• Edge length: probability of project
access
• Louvain clustering surfaced four
large software groups or teams
• Access between team clusters is
quantifiably abnormal
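Once projects have been grouped into team clusters, cross-cluster access can be quantified per engineer. The sketch below does not implement Louvain clustering: it assumes the cluster label of each project is already known, estimates each engineer's access probability per project from counts, and flags accesses outside the engineer's dominant cluster. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def access_probabilities(accesses):
    """Estimate P(project | engineer) from (engineer, project) access events."""
    counts = defaultdict(Counter)
    for engineer, project in accesses:
        counts[engineer][project] += 1
    return {
        engineer: {p: c / sum(projects.values()) for p, c in projects.items()}
        for engineer, projects in counts.items()
    }


def home_cluster(engineer_probs, project_cluster):
    """Cluster that holds most of an engineer's access probability."""
    mass = Counter()
    for project, prob in engineer_probs.items():
        mass[project_cluster[project]] += prob
    return mass.most_common(1)[0][0]


def cross_cluster_accesses(accesses, project_cluster):
    """List (engineer, project, probability) for accesses outside the home cluster."""
    flagged = []
    for engineer, probs in access_probabilities(accesses).items():
        home = home_cluster(probs, project_cluster)
        for project, prob in probs.items():
            if project_cluster[project] != home:
                flagged.append((engineer, project, prob))
    return flagged


# Hypothetical cluster labels, e.g. as produced by a community detection step.
project_cluster = {"p1": "A", "p2": "A", "q1": "B", "q2": "B"}
accesses = (
    [("alice", "p1")] * 8 + [("alice", "p2")] + [("alice", "q1")]
    + [("bob", "q1")] * 5 + [("bob", "q2")] * 5
)
flagged = cross_cluster_accesses(accesses, project_cluster)
print(flagged)
```

A low probability on a flagged access (here 0.1) is what makes it "quantifiably abnormal" rather than a simple rule violation.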
© 2016 Interset Software Inc. 41
Use Case #2: Defense Contractor
High Probability Anomalous Behavior
Models
• Detected large copies to the portable
hard drive, at an unusual time of day
• Bayesian models to measure and detect
highly improbable events
High Risk File Models
• Detected high risk files, including
PowerPoints used to collect large
amounts of inappropriate content
• Risk aggregation based on suspicious
behaviors and unusual derivative
movement
© 2016 Interset Software Inc. 42
Use Case #3: Life Sciences Company
• Models detected and alerted security that
an IT Administrator was sending
anomalously large amounts of data in
Gmail attachments
• Further investigation determined that the
employee’s behavior was that of
someone planning to leave the company
• Employee admitted to planning to
leave for a competitor
• Employee was fired and all data
recovered
© 2016 Interset Software Inc. 43
Use Case #4: Media Company
• Analyzed cloud repository activity
of 1300 employees and 6000
external clients
• Detected anomalies including:
• Unusual collaborations
• Unusual internal and external
behaviors
• Out-of-country / out-of-city anomalies
• High risk download location
• Unexpected ISP/carrier usage
• Moocher and hoarder models
• High risk file take models
• Administrator and new device
classifiers
• Detected several suspect activities,
including a probable account
takeover
1. Multiple new device registrations in non-home country
2. Out-of-country takes (only downloads and previews, no uploads) concurrent with home country activity
3. Every downloaded file had a high age (had not been touched in a long time)
© 2016 Interset Software Inc. 44
Use Cases Covered by Interset
• 2014, Top 15 IT Company: Predicting Leaving Developers by Analyzing Source Code Use
• 2013, Leading Life Science Co: Finding Employees Leaving with Company Research
• 2013, ITSec Product & Service Provider: Watching Sensitive Co Data Assets & Customer Information
• 2015, Fortune 1000 Gaming Company: Monitoring for Inside/Outside Targeted Attacks on IP
• 2014, Fortune 200 Pharmaceutical: Surfacing Anomalies Between Geo Development Teams
• 2014, Global Capital Management Co: Tracking Unpublished Research & Protecting Future Positions
• 2014, Premier Hedge Fund Co: Monitoring Compliance & Tracking Source Code Trading Algorithms
• 2014, Fortune 500 Defense Contractor: Tracking Sensitive Docs & Verifying Data Classifications
• 2014, Regional Savings Bank: Identifying At-Risk Employees Planning to Quit with Data
• 2014, Intelligence Agency: Predict & Investigate Insider Threats
• 2014, Large RX Network Provider: Monitoring HIPAA Compliance & High Risk Users
• 2014, Top 50 Financial Firm: Tracking Users That Download Then Take Their Laptop Home
Insider	Threat
Continuous	
Monitoring
Insider	Threat Insider	Threat Insider	Threat Insider	Threat
Continuous	
Monitoring
Continuous	
Monitoring
Compliance Insider	Threat
• 2015, Global Security Software Company: Detect and Alert on Anomalies Touching Sensitive IP/Source Code
• 2015, Global Security Software Company: Monitoring & Alerting to Anomalous Activity Touching IP/Source Code
Insider	Threat Targeted	Attack	
Threat
Targeted	Attack	
Threat
Targeted	Attack	
Threat
© 2016 Interset Software Inc. 45
Conclusions
Two Kung Fu Moves
• The right data
• The right math
Data Science Implementation
• Define the data and IP sources
you are most concerned with
• Determine the source data that
captures the related events
• Data science can help prioritize
and focus
© 2016 Interset Software Inc. 46
© 2015 Interset Software Inc.
THANK YOU!
sjou@interset.com
eeksock

2016 09-19 - stephan jou - machine learning meetup v1

  • 1.
    © 2016 IntersetSoftware Inc. 1 © 2016 Interset Software Inc. CATCHING BAD GUYS WITH MATH: REAL WORLD DATA SCIENCE USE CASES FOR CYBERATTACK DETECTION AND PREVENTION Machine Learning Ottawa Meetup Ottawa, September 2016 For internal use only. Confidential.
  • 2.
    © 2016 IntersetSoftware Inc. 2 Hey. I’m Stephan Jou. I like analytics. • CTO at Interset • Previously: Cognos and IBM’s Business Analytics CTO Office • Big data analytics, visualization, cloud, predictive analytics, data mining, neural networks, mobile, dashboarding and semantic search • M.Sc. in Computational Neuroscience and Biomedical Engineering, and a dual B.Sc. in Computer Science and Human Physiology, all from the University of Toronto
  • 3.
    © 2016 IntersetSoftware Inc. 3 Enterprise Threat Detection Through The Science Of Behavioral Analytics What I Do
  • 4.
    © 2016 IntersetSoftware Inc. 4 About Interset At Interset, we catch bad guys with math. • Data science and machine learning on big data analytics technologies • Cover multiple cybersecurity use cases • Based in Ottawa, ON • Award winning threat detection platform • Successful deployments across multiple verticals • Clients include US Intelligence Communities. And a leader in security analytics.
  • 5.
    © 2016 IntersetSoftware Inc. 5 Effective and Principled Math
  • 6.
    © 2016 IntersetSoftware Inc. 6 Highly Scalable
  • 7.
    © 2016 IntersetSoftware Inc. 7 Intuitive and Consumable
  • 8.
    © 2016 IntersetSoftware Inc. 8 What Do These Attacks Have in Common?
  • 9.
    © 2016 IntersetSoftware Inc. 9 Lessons: • Make sure your partners are secure – Hacked (SQL Injection) a partner with a weak network – Stole user names and passwords • Identities & machines are “entities” – They acted in highly anomalous ways – Moved large amounts of data – Moved data to exfiltration points – At four companies and the US Army! There was plenty of evidence and time if only it was visible! “if we do this right, we will make a million dollars each…” “we could have already sold them for Bitcoins which would have been untraceable if we did it right. It could have already been easily an easy 50 grand.” Who Are These Guys?
  • 10.
    © 2016 IntersetSoftware Inc. 10 Lessons: • Disgrunted insiders employees can be at risk • What were the anomalies? – Copied 16,000 documents within five days of receiving severance There was plenty of evidence and time if only it was visible! Who Are These Two?
  • 11.
    © 2016 IntersetSoftware Inc. 11 Lessons: • Most attacks are from users/identities with proper access • Attacker stayed under the radar for years • Third parties (US Intelligence) most often uncovers the attack • What were the anomalies? – Accessing data not related to his job – Moving data in ways that same role users were not – over time – Money problems There was plenty of evidence and time if only it was visible! And This Guy?
  • 12.
    © 2016 IntersetSoftware Inc. 12 Attackers are Fast Defenders are Slow Alert Overload AV-test.org, 2015 64% of US companies face 10,000+ alerts per month 203 Days - Breaches take on average 80 days to discover and 123 days to resolve 60% of data is stolen in hours 54% Of breaches remain undiscovered after 6 months Ponemon Institute, 2015 Missed Incidents 8% of incidents are detected by endpoint, firewall & network solutions Verizon DBIR,2014 Drowning In Data And Missing Incidents
  • 13.
    © 2016 IntersetSoftware Inc. 13 Advanced Threats = Enterprise Where’s “Bad” Waldo
  • 14.
    © 2016 IntersetSoftware Inc. 14 Advanced Threats = Enterprise Where’s “Bad” Waldo
  • 15.
    © 2016 IntersetSoftware Inc. 15 Kung Fu Move #1: (Big) Data
  • 16.
    © 2016 IntersetSoftware Inc. 16 Mandiant APT Attack Lifecycle
  • 17.
    © 2016 IntersetSoftware Inc. 17 Standard Approach: Active Directory and Firewall Identity Store: Unusual login behavior Geo-velocity Network Flow: High risk exfiltrations
  • 18.
    © 2016 IntersetSoftware Inc. 18 Better Data = Faster and Better Detection Identity Store: Unusual login behavior Geo-velocity Identity Store: Unusual machine logins Endpoint: Unusual data transfers, access Large local data storage Endpoint: Unusual scheduled tasks Unusual registry writes Endpoint: Unusual command line tools Unusual registry reads Endpoint: Unusual network share access Identity Store: Unusual machine access Endpoint: Unusual use of compression Unusual, large data transfer Endpoint: Unusual command line tools Data Repositories: Unusual or dangerous access Unusual lateral movement Dangerous behavioral changes Network Flow: High risk exfiltrations
  • 19.
    © 2016 IntersetSoftware Inc. 19 Kung Fu Move #2: Math Source: Competing on Analytics, Davenport and Harris, 2007 Standard Reporting Ad hoc Reporting Query/Drill Down Alerts Forecasting Simulation Predictive Modeling In memory data, fuzzy search, geo spatial Causality, probabilistic, confidence levels High fidelity, games, data farming Larger data sets, nonlinear regression Rules/triggers, context sensitive,complex events Query by example, user defined reports Real time, visualizations, user interaction Traditional Optimization Decision complexity, solution speed NewData Entity Resolution Annotationand Tokenization Relationship, Feature Extraction People, roles, locations, things Rules, semantic inferencing, matching Automated, crowd sourced Optimization under Uncertainty Quantifying or mitigating risk Adaptive Analysis Continual Analysis Responding to local change/feedback Responding to context NewMethods Today.
  • 20.
    © 2016 IntersetSoftware Inc. 20 Standard Approach – Rules and Thresholds A Pattern for Increased Monitoringfor Intellectual Property Theft by Departing Insiders, Andrew Moore, Carnegie Mellon 2011
  • 21.
    © 2016 IntersetSoftware Inc. 21 The Threshold Approach Challenge Abnormal Normal
  • 22.
    © 2016 IntersetSoftware Inc. 22 The Threshold Approach Challenge Abnormal Normal
  • 23.
    © 2016 IntersetSoftware Inc. 23 The Threshold Approach Challenge Abnormal Normal
  • 24.
    © 2016 IntersetSoftware Inc. 24 A Probabilistic Approach • Computes probability that a value in a given hour is anomalous • Bayesian approach • Explicitly models both normal and abnormal distributions • Gaussian, Gamma • Estimators for both normal and abnormal based on observation
  • 25.
    © 2016 IntersetSoftware Inc. 25 USB drives are marked as high risk method Method The volume of copying is large, compared to John’s past 30 days and compared to other sysadmins Activity John Sneakypants is a contractor and sysadmin with privileged access User/Machine These files have a high risk and importance value Asset Behavioral Risk John Sneakypants is copying an unusually large number of sensitive files to an external USB drive.
  • 26.
    © 2016 IntersetSoftware Inc. 26 Aggregating Behaviors for Entity Risk Activity User/Machine Asset Method Behavioral Risk Score Rentity = importance(t)×vulnerability(t) User Asset Machine Rbehavior = P(event | y)× wy × wu 2−i ⋅ Ru[i] u∈U ∑ + wf 2− j ⋅ Rf [ j] f ∈F ∑ + wm 2−k ⋅ Rm[k] m∈M ∑ & ' ( ( ) * + + wu + wf + wm
  • 27.
    © 2016 IntersetSoftware Inc. 27 Activity User/Machine Asset Method Behavioral Risk Score Rentity = importance(t)×vulnerability(t) User Asset Machine Rbehavior = P(event | y)× wy × wu 2−i ⋅ Ru[i] u∈U ∑ + wf 2− j ⋅ Rf [ j] f ∈F ∑ + wm 2−k ⋅ Rm[k] m∈M ∑ & ' ( ( ) * + + wu + wf + wm Aggregating Behaviors for Entity Risk • John Compromised is accessing an unusual, important network share 25 • … at a time of day he was almost never active at before 46 • … and took from a source code project that has been inactive for months 80 • … and just copied an unusual amount of sensitive files to a cloud service 96
  • 28.
    © 2016 IntersetSoftware Inc. 28 Example of a Story
  • 29.
    © 2016 IntersetSoftware Inc. 29 Data Feature Derivation Roybatty had 30 login failures on Sharepoint-Finance Roybatty logged into Corp32_ActiveDirectory Roybatty took 2.5GB from Sharepoint-Finance Roybatty launched telnet from the command line Roybatty launched psexec from the command line (other features) (other features) (other features) Active Directory Logs SharePoint Logs Endpoint Logs Anomaly Detection Login failure anomaly model Destination access anomaly model Story Aggregation ∑ 𝑝" 𝑝# 𝑝$ 𝑝% 𝑝& 𝑤" 𝑤# 𝑤$ 𝑤% 𝑤& 0.80Volumetric models Executable Anomaly Models Multi-Level Risk Scoring for Reduced Noise and False Positives
  • 30.
    © 2016 IntersetSoftware Inc. 30 Process Pipeline Use Case Data Exploratory Data Analysis Production Deployment Results • Acquisition • Cleanup • Normalization • Semantics • Feature engineering • Model design • Model validation • Object model • Data ingest • Model development • Test and performance
  • 31.
    © 2016 IntersetSoftware Inc. 31 Exploratory Data Analysis • 30 days of historical log data from a new data source • Interview and validate with stakeholders • Validate or tune existing models • Design and apply new models • Dataset management • Feature engineering • Documentation • R and R Studio • R Markdown Data Acquisition Data Exploration Feature Engineering Model Design Model Validation
  • 32.
    © 2016 IntersetSoftware Inc. 32 Production Deployment • Update ingest and object model • R to Spark and Phoenix code • Models to batch and real- time • Model efficacy • Scalability and performance • Flume and Kafka • Spark, Phoenix and HBase • Elastic Search and Kibana Object Model Data Ingest Model Development Model Development Test and Performance
  • 33.
    © 2016 IntersetSoftware Inc. 33 ANALYTICS REPORTING Technology Stack for Big Data Analytics S3 nginx Cassandra Flow Ingest (Flume) Kafka Active Directory Workflow (Storm) HDFS Analytics Phoenix + HBase Spark Index Elasticsearch Auth DB (H2) Investigator nginx Kibana AgentsSensors (real-time) Connecto rs Connectors (batch) ZooKeeper/reports /v1 /v2 /rules /webadmin /api /api/es /api/flow /login /investigator Endpoint events Endpoint entities Analytics Models Stories Risk scores Events
  • 34.
    © 2016 IntersetSoftware Inc. 34 Use Case #1: $20B Manufacturer X 2 Engineers stole data 1 Year $1 Million Spent Large security vendor failed to find anything 2 Weeks Easily identified the 2 Engineers Found 3 additional users stealing data in North America Found 8 additional users stealing data in China
  • 35.
    © 2016 IntersetSoftware Inc. 35 2014/02/12 04:30:10 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/1a6f0f1fa86a9f3bab6b0c45c08cb1b8 2014/02/12 04:30:10 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/1a6f0f1fa86a9f3bab6b0c45c08cb1b8 2014/02/12 04:30:13 aa97936fbf29c14e368bba98731b91cd@9e551a32654b80e97a6d13314e76d898 10.43.32.127/10.231.216.182 sync 01659aa0f0788e3bccd7820f597b57ea/702dc192ef623c4b3059081dbdfafdb8 2014/02/12 04:30:14 04f0663486333d41b1d3b74d6832b931@32e96546e5d6e3c7cd400d8248e8c763 10.43.32.127/10.222.224.128 diff 72c6e05c9738cf7c2ac4c4cbbbc24de4/562d059099b70ce2bdd561b73295f811 2014/02/12 04:30:14 04f0663486333d41b1d3b74d6832b931@32e96546e5d6e3c7cd400d8248e8c763 10.43.32.127/10.222.224.128 diff 72c6e05c9738cf7c2ac4c4cbbbc24de4/29946458dbc115c28d45c55fe697ad2f 2014/02/12 04:30:14 aa97936fbf29c14e368bba98731b91cd@9e551a32654b80e97a6d13314e76d898 10.43.32.127/10.231.216.182 sync 01659aa0f0788e3bccd7820f597b57ea/1fc3a0f36eaf8ea6775f1c1da6feb7f0 2014/02/12 04:30:20 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26 2014/02/12 04:30:31 954a270da18f628a3bc955feaefdc82c@e12fa0ef845ea7dbbbe95a383b18ed78 10.43.32.127/10.222.224.96 diff 279fe92803963ed2938bb3b9ae7627d9/284d07c70388732120f16a478a461832 2014/02/12 04:30:40 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/18bdc00d72c208a40c40f9a195965a89 2014/02/12 04:30:40 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26 2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print 
b26db004f30ff812cdeec33552f7e41a/c8fdcc251ef840b873f478689700371c 2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/5ac41d116ced8a2fad2475d9c95340cd 2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7b9842f6dc6385575550abb52f49c3fc 10.43.32.127/10.43.84.71 print b26db004f30ff812cdeec33552f7e41a/726a29f7e17e53ef832839cc56293c26 2014/02/12 04:30:41 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/ca0d8e5cb51627a18583c2649a126f1a 2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/c05f02c9804108cc1221d9e76e39f604 2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/a1c7c11b8eaa260ee2abe1fbba926198 2014/02/12 04:30:42 322afe32500a00443f0485263cd5b35b@7df6e0320568c974862e030ec447e74c 10.43.32.127/10.43.84.90 print b26db004f30ff812cdeec33552f7e41a/d8d6ac09c3d92b1c013cc92e3ffb0dc7 Timestamp Engineer Account IP Address Action (diff, sync, print, etc.) Resource (project, folder, file, etc.) The Data: IP Access Logs
  • 36.
    The Math: 20+ Probabilistic Threat Models
    Four classes of our threat models:
    • Sneaking: Doing something out of pattern, unusual time
    • Hoarding: Taking more than expected
    • Mooching: Dangerous change in give versus take
    • Wandering: Accessing unusual or unexpected IP
  • 37.
    Model: Unusual times
    • Monitor, for each user, activity of interest (e.g. the times at which a file or window is brought into focus)
    • Active times are used as input to Gaussian kernel density estimators
    • Times that contain 95% of activity are deemed “normal”
    • P(y is bad) at a given time is the ratio of expected activity to the 95% activity line
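    The steps above can be sketched roughly as follows. This is one possible reading of the slide, not Interset's implementation: the kernel density estimate is hand-rolled with the standard library, the bandwidth and the sample activity times are made up, the wrap-around at midnight is ignored, and the score (one minus the ratio of expected activity to the 95% line, floored at zero) is an assumed interpretation of the last bullet. All function names are hypothetical.

```python
import math


def make_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate over hour-of-day values."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t):
        return norm * sum(
            math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples
        )

    return density


def activity_line(density, coverage=0.95, points_per_hour=20):
    """Density level whose highest-density region holds `coverage` of the mass."""
    grid = [i / points_per_hour for i in range(24 * points_per_hour + 1)]
    values = sorted((density(t) for t in grid), reverse=True)
    total = sum(values)
    running = 0.0
    for v in values:
        running += v
        if running >= coverage * total:
            return v
    return values[-1]


def unusual_time_score(density, line, t):
    """0 inside the 'normal' region, approaching 1 as expected activity vanishes."""
    return max(0.0, 1.0 - density(t) / line)


# Hypothetical activity times (hours since midnight) for one user.
activity_hours = [8.5, 9.0, 9.2, 9.5, 10.0, 10.5, 11.0, 13.0, 13.5, 14.0,
                  14.5, 15.0, 15.5, 16.0, 16.5, 17.0]
density = make_kde(activity_hours, bandwidth=0.75)
line = activity_line(density)

print(round(unusual_time_score(density, line, 10.0), 3))  # mid-morning
print(round(unusual_time_score(density, line, 3.0), 3))   # 3 a.m.
```

    In practice a library estimator such as SciPy's `gaussian_kde` would likely replace `make_kde`; the hand-rolled version is used here only to keep the sketch dependency-free.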
  • 38.
    Model: Anomalous Activity Patterns
    Engineer 1
    • Regularly works six days a week (takes Sundays off)
    • Slight dip during lunches
    Engineer 2
    • Works five days a week
    • Particularly active on Thursdays
  • 39.
    Model: Anomalous Activity Patterns
    Engineer 1
    • Starts work fairly early in the morning
    • Early lunch break
    • Sometimes works past midnight
    Engineer 2
    • Doesn’t work as long hours as Engineer 1
    • 9 to 5’er
    • Has occasionally worked a little bit after 8pm
  • 40.
    Model: Unusual IP Access
    • Nodes: source code projects
    • Edges: engineers accessing projects
    • Edge length: probability of project access
    • Louvain clustering surfaced four large software groups or teams
    • Access between team clusters is quantifiably abnormal
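    One way to make "quantifiably abnormal" concrete, as a sketch under stated assumptions: given project-to-cluster labels (taken here as a hypothetical input, for example the output of a Louvain step, which is not reimplemented), estimate each engineer's probability of accessing each cluster from history and score a new access by its surprisal. The project names, engineer names, counts, and the add-alpha smoothing are all illustrative choices, not details from the talk.

```python
import math
from collections import Counter, defaultdict

# Hypothetical project -> team cluster labels, e.g. the output of a
# community-detection step such as Louvain (not reimplemented here).
project_cluster = {
    "proj_a": "team_1", "proj_b": "team_1",
    "proj_c": "team_2", "proj_d": "team_2",
}

# Hypothetical access history: (engineer, project) pairs.
history = (
    [("eng_1", "proj_a")] * 40 + [("eng_1", "proj_b")] * 9
    + [("eng_1", "proj_c")] * 1
    + [("eng_2", "proj_c")] * 30 + [("eng_2", "proj_d")] * 20
)


def cluster_access_probabilities(history, project_cluster, alpha=1.0):
    """Per-engineer P(cluster) with add-alpha smoothing over all clusters."""
    clusters = sorted(set(project_cluster.values()))
    counts = defaultdict(Counter)
    for engineer, project in history:
        counts[engineer][project_cluster[project]] += 1
    probs = {}
    for engineer, c in counts.items():
        total = sum(c.values()) + alpha * len(clusters)
        probs[engineer] = {k: (c[k] + alpha) / total for k in clusters}
    return probs


def surprisal_bits(probs, project_cluster, engineer, project):
    """-log2 P(engineer accesses this project's cluster); higher = more unusual."""
    return -math.log2(probs[engineer][project_cluster[project]])


probs = cluster_access_probabilities(history, project_cluster)
within = surprisal_bits(probs, project_cluster, "eng_1", "proj_a")
across = surprisal_bits(probs, project_cluster, "eng_1", "proj_d")
print(round(within, 2), round(across, 2))
```

    Smoothing keeps never-seen clusters at a small nonzero probability, so a first cross-team access scores as highly unusual rather than impossible.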
  • 41.
    Use Case #2: Defense Contractor
    High Probability Anomalous Behavior Models
    • Detected large copies to the portable hard drive, at an unusual time of day
    • Bayesian models to measure and detect highly improbable events
    High Risk File Models
    • Detected high risk files, including PowerPoints used to collect large amounts of inappropriate content
    • Risk aggregation based on suspicious behaviors and unusual derivative movement
  • 42.
    Use Case #3: Life Sciences Company
    • Models detected and alerted security that an IT Administrator was sending anomalously large amounts of data in Gmail attachments
    • Further investigation determined that the employee’s behavior was that of someone planning to leave the company
    • Employee admitted to planning to leave for a competing company
    • Employee was fired and all data recovered
  • 43.
    Use Case #4: Media Company
    • Analyzed cloud repository activity of 1300 employees and 6000 external clients
    • Detected anomalies including:
      • Unusual collaborations
      • Unusual internal and external behaviors
      • Out-of-country / out-of-city anomalies
      • High risk download location
      • Unexpected ISP/carrier usage
      • Moocher and hoarder models
      • High risk file take models
      • Administrator and new device classifiers
    • Detected several suspect activities, including a probable account takeover:
      1. Multiple new device registrations in a non-home country
      2. Out-of-country takes (only downloads and previews, no uploads) concurrent with home country activity
      3. Every downloaded file had a high age (had not been touched in a long time)
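    The second account-takeover signal above (out-of-country takes concurrent with home country activity) could be checked along these lines. This is an illustrative sketch only: the event tuples, country codes, the per-account `home_country` table, the 30-minute window, and the function name are all hypothetical and not taken from the talk.

```python
from datetime import datetime, timedelta

# Hypothetical events: (account, timestamp, country, action).
events = [
    ("acct_1", datetime(2016, 5, 2, 9, 0), "CA", "upload"),
    ("acct_1", datetime(2016, 5, 2, 9, 10), "RO", "download"),
    ("acct_1", datetime(2016, 5, 2, 9, 20), "CA", "preview"),
    ("acct_2", datetime(2016, 5, 2, 9, 5), "CA", "download"),
    ("acct_2", datetime(2016, 5, 3, 22, 0), "FR", "download"),
]
home_country = {"acct_1": "CA", "acct_2": "CA"}  # assumed known per account


def concurrent_foreign_takes(events, home_country, window=timedelta(minutes=30)):
    """Return foreign download/preview events within `window` of home activity."""
    flagged = []
    for account, ts, country, action in events:
        home = home_country[account]
        if country == home or action not in ("download", "preview"):
            continue
        overlaps = any(
            a == account and c == home and abs(t - ts) <= window
            for a, t, c, _ in events
        )
        if overlaps:
            flagged.append((account, ts, country, action))
    return flagged


flagged = concurrent_foreign_takes(events, home_country)
print(flagged)
```

    A lone foreign download (as for `acct_2` here) is not flagged by this check; it only fires when the same account is also active from its home country at about the same time, which is what makes a shared or stolen credential more likely than travel.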
  • 44.
    Use Cases Covered by Interset
    Year | Organization | Use case
    2014 | Top 15 IT Company | Predicting Leaving Developers by Analyzing Source Code Use
    2013 | Leading Life Science Co | Finding Employees Leaving with Company Research
    2013 | ITSec Product & Service Provider | Watching Sensitive Co Data Assets & Customer Information
    2015 | Fortune 1000 Gaming Company | Monitoring for Inside/Outside Targeted Attacks on IP
    2014 | Fortune 200 Pharmaceutical | Surfacing Anomalies Between Geo Development Teams
    2014 | Global Capital Management Co | Tracking Unpublished Research & Protecting Future Positions
    2014 | Premier Hedge Fund Co | Monitoring Compliance & Tracking Source Code Trading Algorithms
    2014 | Fortune 500 Defense Contractor | Tracking Sensitive Docs & Verifying Data Classifications
    2014 | Regional Savings Bank | Identifying At-Risk Employees Planning to Quit with Data
    2014 | Intelligence Agency | Predict & Investigate Insider Threats
    2014 | Large RX Network Provider | Monitoring HIPAA Compliance & High Risk Users
    2014 | Top 50 Financial Firm | Tracking Users That Download Then Take Their Laptop Home
    2015 | Global Security Software Company | Detect and Alert on Anomalies Touching Sensitive IP/Source Code
    2015 | Global Security Software Company | Monitoring & Alerting to Anomalous Activity Touching IP/Source Code
    Category labels on the slide: Insider Threat (7), Continuous Monitoring (3), Compliance (1), Targeted Attack Threat (3)
  • 45.
    Conclusions
    Two Kung Fu Moves
    • The right data
    • The right math
    Data Science Implementation
    • Define the data and IP sources you are most concerned with
    • Determine the source data that captures the related events
    • Data science can help prioritize and focus
  • 46.
    THANK YOU!
    sjou@interset.com
    eeksock